Efficient Diffusion Models for Video Generation

The field of diffusion models for video generation is advancing rapidly, with a focus on improving efficiency and reducing computational cost. Recent work has produced frameworks that enable faster and cheaper video generation while preserving quality. SlimDiff introduces a training-free, activation-guided structural compression framework; DriftLite proposes a lightweight, training-free particle-based approach to inference-time scaling of diffusion models; and QuantSparse compresses video diffusion transformers by combining model quantization with attention sparsification. These methods achieve substantial reductions in parameter count, latency, and compute requirements, making diffusion models more practical for real-world applications. Other notable works, such as NeRV-Diffusion and SANA-Video, demonstrate strong video generation quality and efficiency, while DC-Gen and DC-VideoGen show promising results in post-training acceleration through deeply compressed latent spaces. Overall, the field is moving toward more efficient, scalable, and high-quality video generation.
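To make the idea of activation-guided, training-free compression concrete, here is a minimal NumPy sketch of one common variant: ranking a layer's output channels by their mean absolute activation on calibration data and keeping only the strongest ones. This is an illustrative assumption, not SlimDiff's actual algorithm; the function and parameter names (`prune_channels`, `keep_ratio`) are hypothetical.

```python
import numpy as np

def prune_channels(weight, activations, keep_ratio=0.5):
    """Keep the output channels with the largest mean |activation|.

    weight:      (out_channels, in_channels) linear-layer weight
    activations: (batch, out_channels) activations from a calibration pass
    Returns the slimmed weight matrix and the kept channel indices.
    """
    # Per-channel importance score from observed activations (no training needed).
    scores = np.abs(activations).mean(axis=0)
    k = max(1, int(round(keep_ratio * weight.shape[0])))
    # Indices of the top-k channels, sorted to preserve channel order.
    keep = np.sort(np.argsort(scores)[::-1][:k])
    return weight[keep], keep

# Toy calibration pass: later channels fire with larger magnitude.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
acts = rng.normal(size=(16, 8)) * np.arange(1, 9)
W_small, kept = prune_channels(W, acts, keep_ratio=0.5)
print(W_small.shape, kept)
```

In a real pipeline this ranking would be applied per layer across the diffusion backbone, with the remaining weights left untouched, which is what makes the compression training-free.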

Sources

SlimDiff: Training-Free, Activation-Guided Hands-free Slimming of Diffusion Models

DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models

QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

NeRV-Diffusion: Diffuse Implicit Neural Representations for Video Synthesis

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Attention Surgery: An Efficient Recipe to Linearize Your Video Diffusion Transformer

Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel

DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space

DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

FlashOmni: A Unified Sparse Attention Engine for Diffusion Transformers

VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing

HilbertA: Hilbert Attention for Image Generation with Diffusion Models
