Co-Speech Gesture Generation Advances

The field of co-speech gesture generation is moving toward semantic, context-aware approaches: the goal is no longer gestures that are merely rhythmic, but gestures that are semantically coherent and relevant to the accompanying speech. This shift is evident in novel architectures that integrate semantic information at both fine-grained and global levels, enabling the synthesis of gestures that preserve example-specific characteristics while remaining congruent with the speech. Noteworthy papers in this direction include SemGes, which learns semantic coherence and relevance objectives for semantics-aware gesture generation; MECo, which leverages large language models for motion-example-controlled generation; GestureHYDRA, which introduces a hybrid-modality diffusion transformer with cascaded-synchronized retrieval-augmented generation for semantic gesture synthesis; and Real-time Generation of Various Types of Nodding, which predicts both the timing and type of listener nods in real time.
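
To make the shared conditioning pattern concrete, the sketch below shows a minimal diffusion-style denoiser for gesture sequences that fuses frame-level audio features (fine-grained conditioning) with a single utterance-level semantic embedding (global conditioning). This is an illustrative assumption, not the implementation of SemGes, MECo, or GestureHYDRA; all module names, dimensions, and the additive fusion scheme are invented for the example.

```python
# Minimal sketch of two-scale speech conditioning for gesture diffusion.
# Everything here (names, dims, fusion) is a hypothetical illustration.
import torch
import torch.nn as nn

class GestureDenoiser(nn.Module):
    def __init__(self, pose_dim=165, audio_dim=128, sem_dim=512,
                 d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.pose_in = nn.Linear(pose_dim, d_model)
        self.audio_in = nn.Linear(audio_dim, d_model)  # fine-grained, per-frame speech cue
        self.sem_in = nn.Linear(sem_dim, d_model)      # global, utterance-level semantics
        self.time_in = nn.Sequential(nn.Linear(1, d_model), nn.SiLU(),
                                     nn.Linear(d_model, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.pose_out = nn.Linear(d_model, pose_dim)

    def forward(self, noisy_pose, audio_feats, sem_embed, t):
        # noisy_pose: (B, T, pose_dim), audio_feats: (B, T, audio_dim)
        # sem_embed: (B, sem_dim), t: (B,) diffusion timestep in [0, 1]
        h = self.pose_in(noisy_pose) + self.audio_in(audio_feats)  # frame-level fusion
        cond = self.sem_in(sem_embed) + self.time_in(t[:, None])   # global conditioning
        h = h + cond[:, None, :]                                   # broadcast over time
        return self.pose_out(self.backbone(h))                     # predicted noise

# Toy usage: one denoising step on random inputs.
model = GestureDenoiser()
B, T = 2, 60
eps_hat = model(torch.randn(B, T, 165), torch.randn(B, T, 128),
                torch.randn(B, 512), torch.rand(B))
print(eps_hat.shape)  # torch.Size([2, 60, 165])
```

In a full system this denoiser would be trained with a standard diffusion objective and sampled by iterative denoising; the point of the sketch is only the separation of per-frame and utterance-level conditioning signals.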

Sources

SemGes: Semantics-aware Co-Speech Gesture Generation using Semantic Coherence and Relevance Learning

Motion-example-controlled Co-speech Gesture Generation Leveraging Large Language Models

GestureHYDRA: Semantic Co-speech Gesture Synthesis via Hybrid Modality Diffusion Transformer and Cascaded-Synchronized Retrieval-Augmented Generation

Real-time Generation of Various Types of Nodding for Avatar Attentive Listening System
