The field of video generation and processing is moving toward more efficient and controllable methods, with recent work focused on reducing latency and improving the quality of generated content. One notable direction accelerates video generation through caching and sparse computation; another develops more versatile video tokenization methods that better capture the spatial and temporal structure of video data. These advances promise faster and more accurate video generation, as well as improved performance on downstream tasks such as action recognition and compression.

Noteworthy papers include EVCtrl, which proposes a lightweight control adapter for efficient video generation, and MixCache, which introduces a mixture-of-cache framework for accelerating video diffusion transformers. Additionally, Compact Attention presents a hardware-aware acceleration framework that exploits structured spatio-temporal sparsity in video data, and Versatile Video Tokenization with Generative 2D Gaussian Splatting tokenizes video via generative 2D Gaussian splatting.
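The caching direction mentioned above rests on a simple observation: features in a diffusion transformer often change little between adjacent denoising steps, so recomputing every block at every step is wasteful. The specific policies used by systems like MixCache are not described here; the following sketch is only a generic illustration of the idea, in which every name (`CachedBlock`, the relative-change `threshold` heuristic) is an assumption rather than any paper's actual design.

```python
import numpy as np

class CachedBlock:
    """Toy wrapper around a transformer block that reuses its cached
    output when the input has changed little since the last full
    compute. Illustrative only: the threshold heuristic is assumed,
    not taken from any specific caching paper."""

    def __init__(self, block_fn, threshold=0.05):
        self.block_fn = block_fn        # the expensive block to wrap
        self.threshold = threshold      # relative-change cutoff for reuse
        self.last_input = None
        self.last_output = None
        self.full_computes = 0          # counts actual block evaluations

    def __call__(self, x):
        if self.last_input is not None:
            # Relative change in the input since the last full compute.
            delta = np.linalg.norm(x - self.last_input) / (
                np.linalg.norm(self.last_input) + 1e-8)
            if delta < self.threshold:
                return self.last_output  # cache hit: skip the block
        # Cache miss: run the block and refresh the cache.
        self.last_input = x.copy()
        self.last_output = self.block_fn(x)
        self.full_computes += 1
        return self.last_output

# Simulated denoising: inputs drift slowly, so most steps hit the cache.
block = CachedBlock(lambda x: 2.0 * x, threshold=0.05)
x = np.ones(4)
y1 = block(x)            # first call always computes
y2 = block(x + 1e-4)     # tiny drift -> cached output reused
y3 = block(x + 1.0)      # large drift -> recompute
print(block.full_computes)
```

A real system would apply such a policy per block and per timestep (and, in a mixture-of-cache design, presumably combine several cache granularities), but the cost/accuracy trade-off is governed by the same reuse decision shown here.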