Advances in Human Motion Analysis and Generation

The field of human motion analysis and generation is advancing rapidly, with a focus on more accurate and efficient models for understanding and synthesizing human movement. Recent research explores novel architectures, such as transformers and diffusion models, to improve human motion recognition and generation systems. There is also growing interest in incorporating multimodal inputs, such as audio and text, to enable more nuanced, context-dependent motion analysis and generation. Notable papers in this area include VividAnimator, an end-to-end framework for generating high-quality, half-body human animations driven by audio and sparse hand pose conditions, and DEMO, a flow-matching generative framework for audio-driven talking-portrait video synthesis with disentangled, high-fidelity control of lip motion, head pose, and eye gaze. Overall, the field is moving toward more sophisticated and realistic models of human motion, with potential applications in computer vision, robotics, and healthcare.
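To make the flow-matching idea behind methods like DEMO concrete, the sketch below shows the standard flow-matching (rectified-flow) training objective applied to a motion latent: the network regresses the straight-line velocity between a noise sample and a clean latent, given a conditioning signal. This is an illustrative PyTorch sketch under assumed shapes, not DEMO's implementation; `VelocityNet`, the latent and conditioning dimensions, and the audio-feature placeholder are hypothetical.

```python
# Minimal flow-matching training step for a conditional motion-latent model.
# Illustrative only: network, dimensions, and conditioning are placeholders.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Predicts the velocity field v(x_t, t, c) for a noisy motion latent x_t."""
    def __init__(self, latent_dim: int = 64, cond_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 256),
            nn.SiLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, x_t, t, cond):
        # Concatenate noisy latent, timestep, and condition (e.g. audio features).
        return self.net(torch.cat([x_t, t, cond], dim=-1))

def flow_matching_loss(model, x1, cond):
    """Rectified-flow objective: regress the straight-line velocity x1 - x0."""
    x0 = torch.randn_like(x1)           # noise sample at t = 0
    t = torch.rand(x1.shape[0], 1)      # uniform time in [0, 1]
    x_t = (1 - t) * x0 + t * x1         # point on the linear interpolation path
    target_v = x1 - x0                  # constant velocity along that path
    pred_v = model(x_t, t, cond)
    return ((pred_v - target_v) ** 2).mean()

# Usage: one training step on random stand-in data.
model = VelocityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x1 = torch.randn(8, 64)     # "clean" motion latents (placeholder)
cond = torch.randn(8, 32)   # conditioning features, e.g. audio embedding (placeholder)
loss = flow_matching_loss(model, x1, cond)
opt.zero_grad(); loss.backward(); opt.step()
```

At inference time, samples would be drawn by integrating the learned velocity field from noise toward data (for example with a few Euler steps), with the conditioning signal steering attributes such as lip motion, head pose, and gaze in the disentangled setting DEMO describes.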
Sources
DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis
High-Resolution Spatiotemporal Modeling with Global-Local State Space Models for Video-Based Human Pose Estimation
Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback
On the Use of Hierarchical Vision Foundation Models for Low-Cost Human Mesh Recovery and Pose Estimation