Advancements in Video and Motion Generation

The field of video and motion generation is advancing rapidly, driven by more efficient and effective methods for producing high-quality videos and motion sequences. Recent research has explored diffusion models, autoregressive models, and related techniques to improve the quality and diversity of generated content. One notable direction is the development of methods that generate videos and motions in real time, with applications in gaming, simulation, and robotics. There is also growing interest in multimodal generation, where models produce videos, motions, and other modalities simultaneously. Noteworthy papers in this area include FR-TTS, which proposes a novel test-time scaling method for image generation, and Generative Action Tell-Tales, which introduces a new evaluation metric for assessing human motion in synthesized videos. Other notable papers include ClusterStyle, YingVideo-MV, GalaxyDiT, FloodDiffusion, LSRS, UniMo, Beyond Flicker, Inference-time Stochastic Refinement of GRU-Normalizing Flow, VideoSSM, Live Avatar, Reward Forcing, Autoregressive Image Generation Needs Only a Few Lines of Cached Tokens, and Deep Forcing. Together, these papers demonstrate significant progress in the field and highlight promising directions for future research and applications.
Sources
ClusterStyle: Modeling Intra-Style Diversity with Prototypical Clustering for Stylized Motion Generation
GalaxyDiT: Efficient Video Generation with Guidance Alignment and Adaptive Proxy in Diffusion Transformers