Advances in Multimodal Interaction and Generation

The field of multimodal interaction and generation is moving towards more nuanced and expressive models, with a focus on evaluating and refining socially intelligent agents. Recent developments have introduced new frameworks for assessing multiparty social behavior, as well as datasets and models for generating high-quality 3D gestures and facial motions. These advancements have the potential to improve applications in virtual reality, computer graphics, and human-computer interaction. Noteworthy papers include:

  • A study introducing a unified framework for evaluating multiparty social behavior, which delivers orthogonal insights into spatial structure, timing alignment, and behavioural variability.
  • A paper presenting a new dataset for multidimensional quality assessment of audio-to-3D gesture generation, along with a model that achieves state-of-the-art performance on that dataset.
  • A work introducing a benchmark for expressive 4D facial motion generation, which provides a rich and extensible dataset for future research.

Sources

Multimodal Quantitative Measures for Multiparty Behaviour Evaluation

Ges-QA: A Multidimensional Quality Assessment Dataset for Audio-to-3D Gesture Generation

Express4D: Expressive, Friendly, and Extensible 4D Facial Motion Generation Benchmark

Say It, See It: A Systematic Evaluation on Speech-Based 3D Content Generation Methods in Augmented Reality
