Advances in Text-to-Motion Generation

The field of text-to-motion generation is moving toward more controllable and realistic motion synthesis. Researchers are exploring new frameworks and methods to improve the alignment between text inputs and generated motions, such as dual-conditioning paradigms and step-aware reward-guided alignment. Additionally, there is growing interest in generating human motions that are physically plausible and consistent with real-world statistics. Noteworthy papers include MotionDuet, which introduces a multimodal framework for aligning motion generation with video-derived representations; FineXtrol, which proposes a control framework for efficient motion generation guided by fine-grained textual control signals; ReAlign, which addresses the misalignment between text and motion distributions in diffusion models via step-aware reward guidance (sketched below); BRIC, which enables long-term human motion generation by adapting physics controllers to noisy kinematic motion plans at test time; and Learning to Generate Human-Human-Object Interactions from Textual Descriptions, which formulates a new research problem and method for modeling the correlations between two people engaged in a shared interaction involving an object.
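
The step-aware reward guidance mentioned above can be pictured as a classifier-guidance-style denoising loop whose guidance weight depends on the timestep. The sketch below is a minimal, generic illustration under that assumption, not ReAlign's published algorithm; the denoiser, reward model, and all hyperparameters are hypothetical toy stand-ins.

```python
# Minimal sketch of step-aware reward-guided sampling for a text-to-motion
# diffusion model. This is a generic classifier-guidance-style loop, NOT
# ReAlign's actual method; all module names here are toy stand-ins.
import torch
import torch.nn as nn

T = 50  # number of denoising steps

class ToyDenoiser(nn.Module):
    """Stand-in for a pretrained text-conditioned motion denoiser."""
    def __init__(self, motion_dim=66):
        super().__init__()
        self.net = nn.Linear(motion_dim + 1, motion_dim)

    def forward(self, x_t, t, text_emb):
        # Text conditioning is omitted in this toy; only the timestep is fed in.
        t_feat = torch.full_like(x_t[..., :1], t / T)
        return self.net(torch.cat([x_t, t_feat], dim=-1))  # predicts noise

class ToyReward(nn.Module):
    """Stand-in for a learned text-motion alignment reward model."""
    def __init__(self, motion_dim=66):
        super().__init__()
        self.score = nn.Linear(motion_dim, 1)

    def forward(self, x, text_emb):
        return self.score(x).mean()  # scalar reward

def guided_sample(denoiser, reward, text_emb, motion_dim=66, frames=60):
    betas = torch.linspace(1e-4, 2e-2, T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(frames, motion_dim)  # start from Gaussian noise
    for t in reversed(range(T)):
        with torch.no_grad():
            eps = denoiser(x, t, text_emb)
            # Standard DDPM posterior mean from the predicted noise.
            mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
                   / torch.sqrt(alphas[t])

        # Reward guidance: nudge the sample along the reward gradient.
        x_in = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(reward(x_in, text_emb), x_in)[0]
        # "Step-aware" weight: guide more strongly at later (cleaner) steps.
        w = 0.1 * (1 - t / T)

        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = (mean + w * grad + torch.sqrt(betas[t]) * noise).detach()
    return x

if __name__ == "__main__":
    text_emb = torch.randn(512)  # placeholder text embedding
    motion = guided_sample(ToyDenoiser(), ToyReward(), text_emb)
    print(motion.shape)  # (frames, motion_dim)
```

The step-dependent weight grows as denoising progresses, reflecting the intuition that a learned reward is more reliable on less noisy samples; the real method's schedule and reward model will differ.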

Sources

MotionDuet: Dual-Conditioned 3D Human Motion Generation with Video-Regularized Text Learning

FineXtrol: Controllable Motion Generation via Fine-Grained Text

ReAlign: Text-to-Motion Generation via Step-Aware Reward-Guided Alignment

BRIC: Bridging Kinematic Plans and Physical Control at Test Time

Learning to Generate Human-Human-Object Interactions from Textual Descriptions
