Advancements in Diffusion Models and Video Processing

The field of diffusion models and video processing is evolving rapidly, with a focus on improving efficiency, quality, and scalability. Hybrid adaptive diffusion serving systems such as HADIS jointly optimize cascade model selection, query routing, and resource allocation, improving response quality by up to 35% while reducing latency violation rates. Bidirectional sparse attention frameworks such as BSA accelerate video diffusion training by dynamically sparsifying both queries and key-value pairs. Other notable advances include quantization-aware scheduling, as in Q-Sched, which preserves full-precision accuracy and delivers substantial gains in image generation quality at a 4x reduction in model size, and generative video compositing models such as GenCompositor, which enable interactive video editing.
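To make the sparsification idea concrete, the sketch below prunes low-salience queries and key-value pairs before a bidirectional (non-causal) attention call. The retention ratios and the norm-based importance score are illustrative assumptions, not BSA's actual selection rule:

```python
import torch
import torch.nn.functional as F

def bidirectional_sparse_attention(q, k, v, q_keep=0.5, kv_keep=0.5):
    # q, k, v: (batch, seq, dim). q_keep / kv_keep are hypothetical
    # retention ratios; the paper's selection criteria may differ.
    B, S, D = q.shape
    n_q = max(1, int(S * q_keep))
    n_kv = max(1, int(S * kv_keep))

    # Importance proxy: per-token L2 norm (an assumption, not BSA's metric).
    q_idx = q.norm(dim=-1).topk(n_q, dim=-1).indices      # (B, n_q)
    kv_idx = k.norm(dim=-1).topk(n_kv, dim=-1).indices    # (B, n_kv)

    def gather(t, idx):
        return t.gather(1, idx.unsqueeze(-1).expand(-1, -1, D))

    q_s = gather(q, q_idx)
    k_s, v_s = gather(k, kv_idx), gather(v, kv_idx)

    # Dense bidirectional attention (no causal mask) over the reduced
    # token sets only: cost O(n_q * n_kv) instead of O(S^2).
    out_s = F.scaled_dot_product_attention(q_s, k_s, v_s)

    # Scatter sparse outputs back; pruned query positions pass their
    # value vector through unchanged (a simplifying fallback).
    out = v.clone()
    out.scatter_(1, q_idx.unsqueeze(-1).expand(-1, -1, D), out_s)
    return out

# Usage: keeping half the queries and half the KV pairs cuts the
# dominant attention term to roughly a quarter of the dense cost.
q = k = v = torch.randn(2, 1024, 64)
out = bidirectional_sparse_attention(q, k, v)  # -> (2, 1024, 64)
```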
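Q-Sched's scheduler itself is paper-specific, but the general shape of quantization-aware calibration can be sketched as fitting a per-timestep correction so a quantized denoiser tracks its full-precision counterpart. The least-squares scalar below is a hypothetical stand-in for the paper's actual procedure:

```python
import torch

def calibrate_step_scales(fp_denoiser, q_denoiser, timesteps, x_calib):
    # Hypothetical calibration: fit a per-step scalar c_t so that
    # c_t * q_denoiser(x, t) best matches fp_denoiser(x, t) in the
    # least-squares sense over a small calibration batch. Closed form
    # for 1-D least squares: c = <eps_q, eps_fp> / <eps_q, eps_q>.
    scales = {}
    with torch.no_grad():
        for t in timesteps:
            eps_fp = fp_denoiser(x_calib, t)
            eps_q = q_denoiser(x_calib, t)
            num = (eps_q * eps_fp).sum()
            den = (eps_q * eps_q).sum().clamp_min(1e-8)
            scales[t] = (num / den).item()
    return scales
```

At sampling time, each quantized prediction would be multiplied by its step's scale before the scheduler update, compensating for systematic under- or over-shoot introduced by quantization.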

Sources

HADIS: Hybrid Adaptive Diffusion Model Serving for Efficient Text-to-Image Generation

Bidirectional Sparse Attention for Faster Video Diffusion Training

Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling

GenCompositor: Generative Video Compositing with Diffusion Transformer

DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off

Fitting Image Diffusion Models on Video Datasets

LMVC: An End-to-End Learned Multiview Video Coding Framework

MICACL: Multi-Instance Category-Aware Contrastive Learning for Long-Tailed Dynamic Facial Expression Recognition

Transition Models: Rethinking the Generative Learning Objective

Few-step Flow for 3D Generation via Marginal-Data Transport Distillation

STADI: Fine-Grained Step-Patch Diffusion Parallelism for Heterogeneous GPUs

Attention of a Kiss: Exploring Attention Maps in Video Diffusion for XAIxArts

Micro-Expression Recognition via Fine-Grained Dynamic Perception

Home-made Diffusion Model from Scratch to Hatch

Video-based Generalized Category Discovery via Memory-Guided Consistency-Aware Contrastive Learning

Video-Based MPAA Rating Prediction: An Attention-Driven Hybrid Architecture Using Contrastive Learning

Sparse Transformer for Ultra-sparse Sampled Video Compressive Sensing

Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video

Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening
