The field of speech analysis and emotion recognition is moving toward more comprehensive and robust representations of human affect. Researchers are exploring approaches that unify discrete emotion categories with dimensional attributes such as valence and arousal, enabling more accurate and empathetic multimodal reasoning (a minimal sketch of this joint modelling follows the list below). The integration of physiological information, such as phonation excitation and articulatory kinematics, is also being investigated to enhance speech emotion recognition (SER). In parallel, AI-driven assessment models for public speaking skills are gaining importance, with a focus on personalized and scalable feedback. Noteworthy papers include:
- MERaLiON-SER, a robust SER model that surpasses both open-source speech encoders and large Audio-LLMs; and
- Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics, which introduces a portrayed (acted) emotional speech dataset and explores the potential of physiological information for SER.
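
To make the discrete-versus-dimensional distinction concrete, here is a minimal PyTorch sketch of joint modelling: a shared utterance-level embedding feeds a classification head for discrete labels and a regression head for valence/arousal/dominance. All names, dimensions, and the equal loss weighting are illustrative assumptions, not the architecture of any paper cited above.

```python
# Minimal sketch (assumptions, not a cited paper's method): joint discrete
# and dimensional speech emotion modelling on top of a shared embedding.
import torch
import torch.nn as nn

class JointSERHead(nn.Module):
    def __init__(self, feat_dim=768, n_emotions=7):
        super().__init__()
        # Discrete view: one logit per emotion category (e.g. angry, happy, sad).
        self.classifier = nn.Linear(feat_dim, n_emotions)
        # Dimensional view: valence, arousal, dominance as continuous targets.
        self.regressor = nn.Linear(feat_dim, 3)

    def forward(self, pooled_features):
        # pooled_features: (batch, feat_dim) utterance-level embedding,
        # assumed to come from any pretrained speech encoder upstream.
        return self.classifier(pooled_features), self.regressor(pooled_features)

# Training combines both views of affect in a single loss:
model = JointSERHead()
feats = torch.randn(4, 768)                  # stand-in for encoder output
logits, vad = model(feats)
labels = torch.randint(0, 7, (4,))           # discrete emotion labels
targets = torch.rand(4, 3)                   # dimensional annotations in [0, 1]
loss = (nn.functional.cross_entropy(logits, labels)
        + nn.functional.mse_loss(vad, targets))
loss.backward()
```

Sharing the encoder lets the categorical and dimensional views of affect regularize each other, which is the usual motivation for training them jointly.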