The field of video and image generation continues to advance rapidly, with an emphasis on improving both the quality and the efficiency of generative models. Recent work has produced models that disentangle static and dynamic factors in video, enabling stronger video generation and editing, while image generation has seen parallel gains in output quality and inference efficiency. Diffusion models and latent space scaling feature prominently in these improvements, and several papers apply them to specific tasks such as face reenactment and text-to-image synthesis. Overall, the field is moving toward more powerful and efficient generative models that can serve a wide range of tasks.

Noteworthy papers include DiViD, which introduces a video diffusion framework for explicit static-dynamic factorization; LSSGen, which proposes a framework for efficient text-to-image generation using latent space scaling; and TeEFusion, which distills classifier-free guidance in text-to-image synthesis, enabling faster inference without sacrificing image quality.
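To make the efficiency angle concrete, the sketch below illustrates the general idea behind distilling classifier-free guidance: a standard guided sampler runs two forward passes per denoising step (conditional and unconditional), and a distilled student is trained to match the teacher's guided prediction with a single conditional pass. This is a minimal, generic sketch of the technique, not the TeEFusion implementation; the model callables, function names, and loss choice are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cfg_epsilon(model, x_t, t, cond_emb, uncond_emb, guidance_scale):
    # Standard classifier-free guidance: two forward passes per step,
    # then extrapolate from the unconditional toward the conditional prediction.
    eps_uncond = model(x_t, t, uncond_emb)
    eps_cond = model(x_t, t, cond_emb)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def cfg_distillation_loss(student, teacher, x_t, t, cond_emb, uncond_emb, guidance_scale):
    # Generic distillation objective (illustrative): the student learns to
    # reproduce the teacher's guided prediction with one conditional pass,
    # roughly halving the per-step cost at inference time.
    with torch.no_grad():
        target = cfg_epsilon(teacher, x_t, t, cond_emb, uncond_emb, guidance_scale)
    pred = student(x_t, t, cond_emb)
    return F.mse_loss(pred, target)
```

At sampling time the distilled student is called once per step with only the conditional embedding, which is where the reported speedups over standard guided sampling come from.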