Developments in Human-AI Interaction and Multimodal Understanding

The field of human-AI interaction is advancing rapidly, with a focus on developing socially intelligent AI systems that can comprehend and generate dyadic behavioral dynamics. Researchers are building models that understand and reproduce human-like interaction, including motion gestures and facial expressions, to enable more intuitive and responsive exchanges between humans and AI. One key line of research is unified motion-language models that treat human motion as a second modality, enabling effective cross-modal interaction and scalable multimodal training (a minimal sketch of this idea follows the list below). Another is virtual humans that generate motion responses in both listening and speaking roles, where achieving real-time, realistic interaction remains a central challenge. Noteworthy papers include:

  • Seamless Interaction, which introduces a large-scale dataset and suite of models for generating dyadic motion gestures and facial expressions.
  • MotionGPT3, which proposes a bimodal motion-language model that achieves competitive performance on both motion understanding and generation tasks.
  • ARIG, which presents an autoregressive framework for real-time interactive head generation.
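
The following is a minimal, hypothetical sketch of the "motion as a second modality" idea mentioned above: discretized motion tokens and text tokens are embedded into one shared sequence so a single transformer backbone can attend across modalities. All names, vocabulary sizes, and layer shapes are illustrative assumptions and do not reflect the actual MotionGPT3 architecture.

```python
# Hypothetical sketch: text and discretized human motion handled as two token
# modalities in one shared model. Sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

TEXT_VOCAB = 32000    # assumed text token vocabulary size
MOTION_VOCAB = 512    # assumed discrete motion codebook size (e.g. from a VQ-VAE)
D_MODEL = 256

class BimodalMotionLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Separate embeddings per modality, projected into one shared space.
        self.text_embed = nn.Embedding(TEXT_VOCAB, D_MODEL)
        self.motion_embed = nn.Embedding(MOTION_VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        # Modality-specific output heads over the shared hidden states.
        self.text_head = nn.Linear(D_MODEL, TEXT_VOCAB)
        self.motion_head = nn.Linear(D_MODEL, MOTION_VOCAB)

    def forward(self, text_ids, motion_ids):
        # Concatenate both modalities into one sequence so the backbone can
        # attend across them (cross-modal interaction).
        x = torch.cat([self.text_embed(text_ids), self.motion_embed(motion_ids)], dim=1)
        seq_len = x.size(1)
        # Causal mask so each position only sees earlier tokens (autoregressive).
        causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.backbone(x, mask=causal_mask)
        text_len = text_ids.size(1)
        # Predict text tokens from the text span and motion codes from the motion span.
        return self.text_head(h[:, :text_len]), self.motion_head(h[:, text_len:])

if __name__ == "__main__":
    model = BimodalMotionLanguageModel()
    text = torch.randint(0, TEXT_VOCAB, (2, 8))       # batch of text token ids
    motion = torch.randint(0, MOTION_VOCAB, (2, 16))  # batch of discrete motion codes
    text_logits, motion_logits = model(text, motion)
    print(text_logits.shape, motion_logits.shape)     # (2, 8, 32000) (2, 16, 512)
```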

Sources

Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset

Synthetically Expressive: Evaluating gesture and voice for emotion and empathy in VR and 2D scenarios

MotionGPT3: Human Motion as a Second Modality

ARIG: Autoregressive Interactive Head Generation for Real-time Conversations
