Accelerating Diffusion Models for Video Generation

The field of video generation is advancing rapidly, with a focus on improving the efficiency and quality of diffusion models. Researchers are exploring methods to reduce the computational cost and memory demands of these models, enabling their deployment in real-world applications. Notably, techniques such as quantization, adaptive inference-time scaling, and attention acceleration are being developed to improve the performance of diffusion transformers. Together, these advances enable faster and more efficient generation of high-quality videos.
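To make the quantization idea concrete, here is a minimal sketch of symmetric low-bit weight quantization, the basic primitive that quantization work such as QVGen and FPQVAR builds on. This is an illustration only, not the papers' actual methods: the function names and the simple per-tensor scale are assumptions for clarity, whereas the papers use quantization-aware training and floating-point formats.

```python
def quantize_symmetric(weights, num_bits=8):
    """Map float weights to signed integers with a single per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    # Round to the nearest integer grid point and clamp to the signed range.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [qi * scale for qi in q]

w = [0.42, -1.27, 0.05, 0.99]
q, s = quantize_symmetric(w, num_bits=8)
w_hat = dequantize(q, s)
```

Each dequantized weight differs from the original by at most one quantization step (the scale), which is why low-bit inference can preserve quality while shrinking memory and compute.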

Some noteworthy papers in this area include:

- QVGen, which presents a quantization-aware training framework for high-performance, inference-efficient video diffusion models.
- Adaptive Cyclic Diffusion, which introduces a flexible inference framework that dynamically adjusts computational effort during inference.
- Grouping First, Attending Smartly, which proposes a training-free attention acceleration strategy for fast image and video generation.
- DraftAttention, which down-samples each feature map across frames to obtain a larger receptive field and accelerate video diffusion transformers.
- FastCar, which exploits temporal redundancy in auto-regressive video generation and proposes a cache attentive replay mechanism to reduce redundant computation.
- Communication-Efficient Diffusion Denoising Parallelization, which introduces a parallelization method based on a reuse-then-predict mechanism to reduce communication overhead.
- FPQVAR, which proposes an efficient post-training floating-point quantization framework for visual autoregressive models.
- REPA Works Until It Doesn't, which introduces a two-phase schedule of holistic alignment followed by stage-wise termination to accelerate diffusion training.
- Training-Free Efficient Video Generation via Dynamic Token Carving, which presents an inference pipeline combining dynamic attention carving with progressive resolution generation.
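Several of the papers above (DraftAttention, Grouping First Attending Smartly, Dynamic Token Carving) share one underlying idea: run a cheap, low-resolution attention pass first, then spend full-resolution compute only where it matters. The sketch below illustrates that pattern with plain Python lists; the helper names, the 1-D pooling, and the raw dot-product scores are all simplifying assumptions, not the actual mechanisms of any of these papers.

```python
import math

def downsample(tokens, factor):
    """Average-pool a 1-D sequence of feature vectors by `factor`."""
    pooled = []
    for i in range(0, len(tokens), factor):
        block = tokens[i:i + factor]
        dim = len(block[0])
        pooled.append([sum(v[d] for v in block) / len(block) for d in range(dim)])
    return pooled

def attention_scores(q, k):
    """Scaled dot-product scores (softmax omitted for clarity)."""
    d = len(q[0])
    return [[sum(qi[x] * kj[x] for x in range(d)) / math.sqrt(d) for kj in k]
            for qi in q]

# Cheap low-resolution pass: score pooled tokens; a real system would then
# run full-resolution attention only on the top-scoring coarse blocks.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
coarse = downsample(tokens, factor=2)          # 4 tokens -> 2 blocks
scores = attention_scores(coarse, coarse)
```

Because the coarse pass runs on a sequence shortened by the pooling factor, its attention cost shrinks quadratically, which is where the training-free speedups in this line of work come from.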

Sources

QVGen: Pushing the Limit of Quantized Video Generative Models

Adaptive Cyclic Diffusion for Inference Scaling

Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers

DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance

FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge

Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism

FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design

REPA Works Until It Doesn't: Early-Stopped, Holistic Alignment Supercharges Diffusion Training

Training-Free Efficient Video Generation via Dynamic Token Carving
