The field of multimodal interaction and generation is moving toward more nuanced and expressive models, with a growing emphasis on evaluating and refining socially intelligent agents. Recent work has introduced frameworks for assessing multiparty social behavior, along with datasets and models for generating high-quality 3D gestures and facial motion. These advances stand to benefit virtual reality, computer graphics, and human-computer interaction. Noteworthy papers include:
- A study introducing a unified framework for evaluating multiparty social behavior, which delivers orthogonal insights into spatial structure, timing alignment, and behavioral variability (see the illustrative sketch after this list).
- A paper presenting a new dataset for multidimensional quality assessment of audio-to-3D gesture generation, together with an assessment model that achieves state-of-the-art performance on the proposed dataset.
- A work introducing a benchmark for expressive 4D facial motion generation, which provides a rich and extensible dataset for future research.
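To make those three evaluation axes concrete, here is a minimal sketch of how such measures could be computed from tracked participant data. The function names and the specific metric choices (mean pairwise inter-participant distance, best cross-correlation lag, and per-participant motion variance) are illustrative assumptions for exposition, not the framework's actual implementation.

```python
import numpy as np

def spatial_structure(positions):
    """Mean pairwise inter-participant distance across frames.

    positions: (frames, participants, 2) array of 2D positions.
    Illustrative proxy only; not the framework's actual metric.
    """
    # Vector differences between every pair of participants per frame.
    diffs = positions[:, :, None, :] - positions[:, None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)           # (frames, P, P)
    i, j = np.triu_indices(positions.shape[1], k=1)  # unique pairs only
    return dists[:, i, j].mean()

def timing_alignment(signal_a, signal_b, max_lag=30):
    """Lag (in frames) maximizing the correlation of two activity signals.

    Illustrative proxy for timing alignment between two participants.
    """
    a = signal_a - signal_a.mean()
    b = signal_b - signal_b.mean()
    lags = np.arange(-max_lag, max_lag + 1)
    # np.roll wraps circularly; a real implementation would trim edges.
    corrs = [np.corrcoef(np.roll(a, lag), b)[0, 1] for lag in lags]
    return int(lags[int(np.argmax(corrs))])

def behavioral_variability(motion):
    """Mean per-participant variance of frame-to-frame motion magnitude.

    motion: (frames, participants, dims) array of joint or body positions.
    Illustrative proxy only.
    """
    step = np.linalg.norm(np.diff(motion, axis=0), axis=-1)  # (frames-1, P)
    return step.var(axis=0).mean()
```

Under these assumptions, each function probes a distinct aspect of group behavior, which is what makes measures along these axes complementary rather than redundant.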