Advances in Human Motion Analysis and Generation

The field of human motion analysis and generation is advancing rapidly, with a focus on more accurate and efficient models for understanding and synthesizing human movement. Recent research explores novel architectures, such as transformers and diffusion models, to improve human motion recognition and generation systems. There is also growing interest in incorporating multimodal inputs, such as audio and text, to enable more nuanced, context-dependent motion analysis and generation. Notable papers in this area include VividAnimator, an end-to-end framework for generating high-quality, half-body human animations driven by audio and sparse hand pose conditions, and DEMO, a flow-matching generative framework for audio-driven talking-portrait video synthesis with disentangled, high-fidelity control of lip motion, head pose, and eye gaze. Overall, the field is moving toward more sophisticated and realistic models of human motion, with potential applications in computer vision, robotics, and healthcare.
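To make the flow-matching idea behind methods like DEMO concrete, the sketch below shows the standard flow-matching (rectified-flow) training objective applied to a motion latent: the network regresses the straight-line velocity between a noise sample and a clean latent, given a conditioning signal. This is an illustrative PyTorch sketch under assumed shapes, not DEMO's implementation; `VelocityNet`, the latent and conditioning dimensions, and the audio-feature placeholder are hypothetical.

```python
# Minimal flow-matching training step for a conditional motion-latent model.
# Illustrative only: network, dimensions, and conditioning are placeholders.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Predicts the velocity field v(x_t, t, c) for a noisy motion latent x_t."""
    def __init__(self, latent_dim: int = 64, cond_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 256),
            nn.SiLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, x_t, t, cond):
        # Concatenate noisy latent, timestep, and condition (e.g. audio features).
        return self.net(torch.cat([x_t, t, cond], dim=-1))

def flow_matching_loss(model, x1, cond):
    """Rectified-flow objective: regress the straight-line velocity x1 - x0."""
    x0 = torch.randn_like(x1)           # noise sample at t = 0
    t = torch.rand(x1.shape[0], 1)      # uniform time in [0, 1]
    x_t = (1 - t) * x0 + t * x1         # point on the linear interpolation path
    target_v = x1 - x0                  # constant velocity along that path
    pred_v = model(x_t, t, cond)
    return ((pred_v - target_v) ** 2).mean()

# Usage: one training step on random stand-in data.
model = VelocityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x1 = torch.randn(8, 64)     # "clean" motion latents (placeholder)
cond = torch.randn(8, 32)   # conditioning features, e.g. audio embedding (placeholder)
loss = flow_matching_loss(model, x1, cond)
opt.zero_grad(); loss.backward(); opt.step()
```

At inference time, samples would be drawn by integrating the learned velocity field from noise toward data (for example with a few Euler steps), with the conditioning signal steering attributes such as lip motion, head pose, and gaze in the disentangled setting DEMO describes.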
Sources
DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis
High-Resolution Spatiotemporal Modeling with Global-Local State Space Models for Video-Based Human Pose Estimation
Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback
On the Use of Hierarchical Vision Foundation Models for Low-Cost Human Mesh Recovery and Pose Estimation