Advances in Text-to-Motion Generation

The field of text-to-motion generation is moving toward more controllable and realistic motion synthesis. Researchers are exploring new frameworks and methods to improve the alignment between text inputs and generated motions, such as dual-conditioning paradigms and step-aware reward-guided alignment. Additionally, there is growing interest in generating human motions that are physically plausible and consistent with real-world statistics. Noteworthy papers include MotionDuet, which introduces a multimodal framework for aligning motion generation with video-derived representations; FineXtrol, which proposes a control framework for efficient motion generation guided by fine-grained textual control signals; ReAlign, which addresses the misalignment between text and motion distributions in diffusion models via step-aware reward guidance (sketched below); BRIC, which enables long-term human motion generation by adapting physics controllers to noisy kinematic motion plans at test time; and Learning to Generate Human-Human-Object Interactions from Textual Descriptions, which formulates a new research problem and method for modeling the correlations between two people engaged in a shared interaction involving an object.
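
The step-aware reward guidance mentioned above can be pictured as a classifier-guidance-style denoising loop whose guidance weight depends on the timestep. The sketch below is a minimal, generic illustration under that assumption, not ReAlign's published algorithm; the denoiser, reward model, and all hyperparameters are hypothetical toy stand-ins.

```python
# Minimal sketch of step-aware reward-guided sampling for a text-to-motion
# diffusion model. This is a generic classifier-guidance-style loop, NOT
# ReAlign's actual method; all module names here are toy stand-ins.
import torch
import torch.nn as nn

T = 50  # number of denoising steps

class ToyDenoiser(nn.Module):
    """Stand-in for a pretrained text-conditioned motion denoiser."""
    def __init__(self, motion_dim=66):
        super().__init__()
        self.net = nn.Linear(motion_dim + 1, motion_dim)

    def forward(self, x_t, t, text_emb):
        # Text conditioning is omitted in this toy; only the timestep is fed in.
        t_feat = torch.full_like(x_t[..., :1], t / T)
        return self.net(torch.cat([x_t, t_feat], dim=-1))  # predicts noise

class ToyReward(nn.Module):
    """Stand-in for a learned text-motion alignment reward model."""
    def __init__(self, motion_dim=66):
        super().__init__()
        self.score = nn.Linear(motion_dim, 1)

    def forward(self, x, text_emb):
        return self.score(x).mean()  # scalar reward

def guided_sample(denoiser, reward, text_emb, motion_dim=66, frames=60):
    betas = torch.linspace(1e-4, 2e-2, T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(frames, motion_dim)  # start from Gaussian noise
    for t in reversed(range(T)):
        with torch.no_grad():
            eps = denoiser(x, t, text_emb)
            # Standard DDPM posterior mean from the predicted noise.
            mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
                   / torch.sqrt(alphas[t])

        # Reward guidance: nudge the sample along the reward gradient.
        x_in = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(reward(x_in, text_emb), x_in)[0]
        # "Step-aware" weight: guide more strongly at later (cleaner) steps.
        w = 0.1 * (1 - t / T)

        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = (mean + w * grad + torch.sqrt(betas[t]) * noise).detach()
    return x

if __name__ == "__main__":
    text_emb = torch.randn(512)  # placeholder text embedding
    motion = guided_sample(ToyDenoiser(), ToyReward(), text_emb)
    print(motion.shape)  # (frames, motion_dim)
```

The step-dependent weight grows as denoising progresses, reflecting the intuition that a learned reward is more reliable on less noisy samples; the real method's schedule and reward model will differ.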

Sources

MotionDuet: Dual-Conditioned 3D Human Motion Generation with Video-Regularized Text Learning

FineXtrol: Controllable Motion Generation via Fine-Grained Text

ReAlign: Text-to-Motion Generation via Step-Aware Reward-Guided Alignment

BRIC: Bridging Kinematic Plans and Physical Control at Test Time

Learning to Generate Human-Human-Object Interactions from Textual Descriptions
