Advances in Video Generation and Editing

The field of video generation and editing is advancing rapidly, with a focus on improving the coherence and consistency of generated videos. Recent research has explored diffusion models, graph structures, and attention mechanisms to enhance the quality and controllability of video generation. A key challenge is maintaining long-term consistency and coherence in generated videos, which recent frameworks address in different ways. Notable papers include InfLVG, which enables coherent long video generation without requiring additional long-form video data and achieves strong consistency and semantic fidelity across scenes, and DanceTogether, which generates photorealistic videos of multi-person interactions while preserving individual identities, outperforming prior art by a significant margin on the TogetherVideoBench benchmark. Other works, such as ATI and MAGREF, propose innovative approaches to controllable video generation and editing, allowing more precise control over the generated content. Overall, the field is moving toward more sophisticated and controllable video generation and editing capabilities, with potential applications in digital production, simulation, and embodied intelligence.

Sources

InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO

Multi-Person Interaction Generation from Two-Person Motion Priors

DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation

Diffusion Model-based Activity Completion for AI Motion Capture from Videos

Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion

Learning World Models for Interactive Video Generation

Autoregression-free video prediction using diffusion model for mitigating error propagation

StateSpaceDiffuser: Bringing Long Context to Diffusion World Models

ATI: Any Trajectory Instruction for Controllable Video Generation

Toward Memory-Aided World Models: Benchmarking via Spatial Consistency

Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing

Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis

MAGREF: Masked Guidance for Any-Reference Video Generation
