The field of affective computing is seeing significant advances through the integration of multimodal analysis and large language models. Researchers are increasingly adopting trimodal and multimodal approaches that combine speech, text, and visual data to capture complex emotional cues and improve emotion recognition accuracy. Longitudinal analysis and multi-task learning are becoming essential techniques for modeling temporal changes and comorbid conditions, enabling a more comprehensive understanding of emotional states.

Noteworthy papers include K-EVER^2, a knowledge-enhanced framework for visual emotion reasoning and retrieval that achieves up to a 19% accuracy gain for specific emotions, and the EmoArt dataset, a large-scale, fine-grained emotional dataset for emotion-aware artistic generation containing 132,664 artworks across 56 painting styles.
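To make the trimodal setup concrete, the sketch below shows a minimal late-fusion emotion classifier that projects speech, text, and visual embeddings into a shared space and concatenates them before classification. This is an illustrative assumption of how such systems are often structured, not the method of any paper cited here; the class name, feature dimensions, and seven-class emotion label set are hypothetical.

```python
import torch
import torch.nn as nn

class TrimodalFusionClassifier(nn.Module):
    """Hypothetical late-fusion emotion classifier over speech, text, and
    visual embeddings. Real systems would feed in encoder outputs (e.g. a
    speech encoder, a sentence encoder, and a visual backbone); here the
    input dimensions are placeholder assumptions."""

    def __init__(self, d_speech=512, d_text=768, d_vision=512,
                 d_hidden=256, n_emotions=7):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.proj_speech = nn.Linear(d_speech, d_hidden)
        self.proj_text = nn.Linear(d_text, d_hidden)
        self.proj_vision = nn.Linear(d_vision, d_hidden)
        # Classify from the concatenated (fused) representation.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * d_hidden, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, n_emotions),
        )

    def forward(self, speech, text, vision):
        # Late fusion: concatenate per-modality projections along the feature axis.
        fused = torch.cat(
            [self.proj_speech(speech),
             self.proj_text(text),
             self.proj_vision(vision)],
            dim=-1,
        )
        return self.classifier(fused)

# Toy usage with random utterance-level embeddings for a batch of 4 samples.
model = TrimodalFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 768), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 7])
```

More elaborate variants replace the simple concatenation with graph-based or attention-based fusion so that interactions between modalities can be modeled explicitly.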
Multimodal Affective Analysis and Emotion Recognition
Sources
Enhancing Speech Emotion Recognition with Graph-Based Multimodal Fusion and Prosodic Features for the Speech Emotion Recognition in Naturalistic Conditions Challenge at Interspeech 2025