Advances in Video and Image Generation

Video and image generation is advancing rapidly, and much of the work aims to improve both the quality and the efficiency of generative models. On the video side, recent models can explicitly disentangle static factors (such as appearance) from dynamic factors (such as motion), which improves both video generation and video editing. On the image side, models now produce high-quality images at lower computational cost. Diffusion models, latent space scaling, and the distillation of slow samplers into faster ones are particularly promising routes to these goals. Several papers also apply these models to specific tasks, including face reenactment, text-to-image synthesis, video compression, and forecasting. Overall, the field is moving toward more powerful and efficient generative models that apply to a wide range of tasks.

Noteworthy papers include DiViD, which introduces a video diffusion framework for explicit static-dynamic factorization; LSSGen, which proposes a framework for efficient text-to-image generation using latent space scaling in flow and diffusion models; and TeEFusion, which distills classifier-free guidance for text-to-image synthesis, enabling faster inference without sacrificing image quality.
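
For context on what guidance distillation speeds up: standard classifier-free guidance (CFG) runs the denoiser twice per sampling step, once with the text condition and once with a null condition, and then extrapolates from the unconditional prediction toward the conditional one. The sketch below shows only that standard formulation. It is not TeEFusion's method, and the names `cfg_step` and `CountingToyModel` are hypothetical, chosen for illustration; the toy model stands in for a real denoiser and simply counts its forward passes.

```python
from typing import Callable, List, Optional

Vector = List[float]
Denoiser = Callable[[Vector, Optional[str]], Vector]


def cfg_step(model: Denoiser, x: Vector, prompt: str, guidance_scale: float) -> Vector:
    """One classifier-free-guided noise prediction (standard CFG formulation)."""
    # Pass 1: prediction conditioned on the text prompt.
    eps_cond = model(x, prompt)
    # Pass 2: prediction with the null (empty) condition.
    eps_uncond = model(x, None)
    # eps_hat = eps_uncond + w * (eps_cond - eps_uncond)
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


class CountingToyModel:
    """Hypothetical stand-in for a denoiser; it only counts forward passes."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: Vector, cond: Optional[str]) -> Vector:
        self.calls += 1
        # Toy behaviour: a non-null condition shifts every element by +1.
        shift = 1.0 if cond is not None else 0.0
        return [xi + shift for xi in x]


model = CountingToyModel()
guided = cfg_step(model, [0.0, 2.0], "a cat", guidance_scale=3.0)
print(guided, model.calls)  # [3.0, 5.0] 2
```

Because every guided prediction needs both passes, a sampler running N steps pays roughly 2N denoiser evaluations. A guidance-distilled student, in general terms, is trained to reproduce the guided output in a single pass, bringing the cost back to about N evaluations.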

Sources

DiViD: Disentangled Video Diffusion for Static-Dynamic Factorization

Generalist Forecasting with Frozen Video Models via Latent Diffusion

CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models

Generative Distribution Distillation

Conditional Video Generation for High-Efficiency Video Compression

LSSGen: Leveraging Latent Space Scaling in Flow and Diffusion for Efficient Text to Image Generation

Navigating Large-Pose Challenge for High-Fidelity Face Reenactment with Video Diffusion Model

TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance

C2G-KD: PCA-Constrained Generator for Data-Free Knowledge Distillation

TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation

Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis
