Advances in Human Motion and Video Generation

The field of human motion and video generation is advancing rapidly, with a focus on more efficient and realistic models. Recent work explores latent-space streaming architectures, causal decoding, and motion-centric representation alignment to improve the quality and temporal consistency of generated videos. There is also growing interest in deploying these models in real-world settings such as poultry farm monitoring, horse stall monitoring, and pedestrian dynamics simulation. Noteworthy papers include LILAC, which achieves low-latency arbitrary motion stylization over long sequences via a streaming VAE-diffusion architecture with causal decoding, and OmniMotion-X, which introduces a versatile multimodal framework for whole-body human motion generation. MoAlign is also notable for its motion-centric alignment framework, which improves the physical commonsense of generated videos.
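To make the streaming and causal-decoding idea concrete, below is a minimal illustrative sketch, not the architecture of LILAC or any listed paper: a decoder that consumes latents one time step at a time and emits a frame per step, so no output frame ever depends on future latents. All class and function names here are hypothetical.

```python
import torch
import torch.nn as nn

# Illustrative sketch of causal streaming decoding over a latent sequence:
# each decoded frame depends only on current and past latents, so frames
# can be emitted incrementally with low latency instead of waiting for the
# whole sequence to arrive. (Hypothetical names and architecture.)
class CausalLatentDecoder(nn.Module):
    def __init__(self, latent_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        # A unidirectional GRU enforces causality: its hidden state
        # summarizes only the latents seen so far.
        self.rnn = nn.GRU(latent_dim, hidden_dim, batch_first=True)
        self.to_frame = nn.Linear(hidden_dim, latent_dim)

    def stream(self, latents: torch.Tensor):
        """Yield one decoded frame per incoming latent, causally."""
        state = None
        for t in range(latents.shape[1]):       # walk the time axis
            step = latents[:, t:t + 1, :]       # (batch, 1, latent_dim)
            out, state = self.rnn(step, state)  # state carries only the past
            yield self.to_frame(out[:, 0])      # emit the frame immediately

decoder = CausalLatentDecoder()
incoming = torch.randn(1, 8, 64)                # eight streamed latents
frames = list(decoder.stream(incoming))         # eight frames, no lookahead
```

The design point is that the recurrent state makes each step's cost constant and independent of sequence length, which is what allows long-sequence, low-latency generation in a streaming setting.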

Sources

LILAC: Long-sequence Incremental Low-latency Arbitrary Motion Stylization via Streaming VAE-Diffusion with Causal Decoding

Poultry Farm Intelligence: An Integrated Multi-Sensor AI Platform for Enhanced Welfare and Productivity

Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset

HumanCM: One Step Human Motion Prediction

From Mannequin to Human: A Pose-Aware and Identity-Preserving Video Generation Framework for Lifelike Clothing Display

Monitoring Horses in Stalls: From Object to Event Detection

Can Image-To-Video Models Simulate Pedestrian Dynamics?

MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models

Video Consistency Distance: Enhancing Temporal Consistency for Image-to-Video Generation via Reward-Based Fine-Tuning

OmniMotion-X: Versatile Multimodal Whole-Body Motion Generation

Is This Tracker On? A Benchmark Protocol for Dynamic Tracking

Evaluating Video Models as Simulators of Multi-Person Pedestrian Trajectories
