The field of video and 3D generation is advancing rapidly, with much recent work focused on improving efficiency and reducing computational cost. Novel frameworks and techniques now generate high-quality video and 3D content at a fraction of the previous compute budget. A key line of innovation is sparse attention, which restricts each query to a small subset of keys so that large inputs can be processed efficiently while maintaining visual quality. In parallel, new approaches to video tokenization and dataset condensation are enabling more efficient and effective video analysis and understanding.

These advances stand to unlock new applications for video and 3D generation, from computer vision and robotics to gaming and entertainment. Noteworthy papers in this area include Direct3D-S2, which introduces a scalable 3D generation framework based on sparse volumes; Re-ttention, which pushes attention in visual generation models to very high sparsity levels; and Q-VDiT, a quantization framework designed specifically for video diffusion transformers that reports a 1.9x improvement over current state-of-the-art methods.
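To make the sparse-attention idea concrete, here is a minimal top-k sparse attention sketch in NumPy. This is an illustrative toy, not the mechanism used by Re-ttention or Direct3D-S2: each query keeps only its `topk` highest-scoring keys and masks the rest before the softmax, so the attention weights concentrate on a small subset of positions. The function name and shapes are assumptions for illustration.

```python
import numpy as np

def sparse_topk_attention(q, k, v, topk):
    """Toy top-k sparse attention (illustrative only).

    q, k, v: arrays of shape (seq_len, d).
    Each query row attends only to its topk highest-scoring keys;
    all other scores are masked to -inf before the softmax.
    Returns an array of shape (seq_len, d).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # full (seq_len, seq_len) score matrix
    # Per-row threshold: the k-th largest score in each row.
    kth = np.partition(scores, -topk, axis=-1)[:, -topk:].min(axis=-1, keepdims=True)
    # Mask out everything below the threshold.
    masked = np.where(scores >= kth, scores, -np.inf)
    # Numerically stable softmax over the surviving entries.
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `topk` equal to the sequence length this reduces to ordinary dense attention; shrinking `topk` trades a small accuracy loss for the ability to skip most of the score matrix, which is the efficiency lever the papers above exploit at scale.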