Advances in Controllable Video Generation and Human Motion Synthesis

The field of controllable video generation and human motion synthesis is advancing rapidly, with a focus on improving the semantic consistency and realism of generated content. Recent work introduces frameworks that generate high-fidelity videos and motions precisely controlled by external signals such as text descriptions, audio, and music, with potential applications in animation, gaming, and virtual reality. Notable papers in this area include SSG-Dit, a spatial signal guided framework for controllable video generation, and DanceEditor, a framework for iterative, editable music-driven dance generation. MoCo decouples human video generation into separate structure and appearance stages, while OmniHuman-1.5 produces character animations that are semantically coherent and expressive. MotionFlux and PersonaAnimator make further contributions to text-guided motion generation and personalized motion transfer, respectively. Overall, the field is moving toward video and motion generation that is both more realistic and more precisely controllable.

Sources

SSG-Dit: A Spatial Signal Guided Framework for Controllable Video Generation

MDD: A Dataset for Text-and-Music Conditioned Duet Dance Generation

DanceEditor: Towards Iterative Editable Music-driven Dance Generation with Open-Vocabulary Descriptions

MoCo: Motion-Consistent Human Video Generation via Structure-Appearance Decoupling

Controllable Single-shot Animation Blending with Temporal Conditioning

Wan-S2V: Audio-Driven Cinematic Video Generation

AniME: Adaptive Multi-Agent Planning for Long Animation Generation

OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation

MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation

MotionFlux: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment

PersonaAnimator: Personalized Motion Transfer from Unconstrained Videos

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

InfinityHuman: Towards Long-Term Audio-Driven Human

Embracing Aleatoric Uncertainty: Generating Diverse 3D Human Motion
