The field of affective computing is seeing significant advances through the integration of multimodal analysis and large language models. Researchers are increasingly adopting trimodal and multimodal approaches that combine speech, text, and visual data to capture complex emotional cues and improve emotion recognition accuracy. Longitudinal analysis and multi-task learning are becoming essential techniques for modeling temporal changes and comorbid conditions, enabling a more comprehensive understanding of emotional states.

Noteworthy papers include K-EVER^2, a knowledge-enhanced framework for visual emotion reasoning and retrieval that achieves up to a 19% accuracy gain for specific emotions, and the EmoArt dataset, a large-scale, fine-grained emotional dataset for emotion-aware artistic generation containing 132,664 artworks across 56 painting styles.
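To make the trimodal setup concrete, the sketch below shows a minimal late-fusion emotion classifier that projects speech, text, and visual embeddings into a shared space and concatenates them before classification. This is an illustrative assumption of how such systems are often structured, not the method of any paper cited here; the class name, feature dimensions, and seven-class emotion label set are hypothetical.

```python
import torch
import torch.nn as nn

class TrimodalFusionClassifier(nn.Module):
    """Hypothetical late-fusion emotion classifier over speech, text, and
    visual embeddings. Real systems would feed in encoder outputs (e.g. a
    speech encoder, a sentence encoder, and a visual backbone); here the
    input dimensions are placeholder assumptions."""

    def __init__(self, d_speech=512, d_text=768, d_vision=512,
                 d_hidden=256, n_emotions=7):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.proj_speech = nn.Linear(d_speech, d_hidden)
        self.proj_text = nn.Linear(d_text, d_hidden)
        self.proj_vision = nn.Linear(d_vision, d_hidden)
        # Classify from the concatenated (fused) representation.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * d_hidden, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, n_emotions),
        )

    def forward(self, speech, text, vision):
        # Late fusion: concatenate per-modality projections along the feature axis.
        fused = torch.cat(
            [self.proj_speech(speech),
             self.proj_text(text),
             self.proj_vision(vision)],
            dim=-1,
        )
        return self.classifier(fused)

# Toy usage with random utterance-level embeddings for a batch of 4 samples.
model = TrimodalFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 768), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 7])
```

More elaborate variants replace the simple concatenation with graph-based or attention-based fusion so that interactions between modalities can be modeled explicitly.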
Multimodal Affective Analysis and Emotion Recognition
Sources
Enhancing Speech Emotion Recognition with Graph-Based Multimodal Fusion and Prosodic Features for the Speech Emotion Recognition in Naturalistic Conditions Challenge at Interspeech 2025