Emotion Recognition and Speech Analysis

The field of speech analysis and emotion recognition is moving toward more comprehensive and robust representations of human affect. Researchers are exploring approaches that combine discrete emotion categories with dimensional (e.g. valence-arousal) modelling, enabling more accurate and empathetic multimodal reasoning. The integration of physiological information, such as phonation excitation and articulatory kinematics, is also being investigated as a way to enhance speech emotion recognition (SER). In parallel, AI-driven assessment models for public speaking skills are gaining importance, with a focus on providing personalized and scalable feedback. Noteworthy papers include:

  • MERaLiON-SER, a robust speech emotion recognition model that outperforms existing open-source speech encoders and large Audio-LLMs, and
  • Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics, which introduces a portrayed emotional dataset and explores the potential of physiological information for SER.
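The relationship between discrete and dimensional emotion modelling mentioned above can be sketched with a minimal example. The category set and (valence, arousal) coordinates below are illustrative assumptions, not values from any of the cited papers:

```python
# Hypothetical sketch: relating discrete emotion categories to a
# dimensional (valence, arousal) representation on a [-1, 1] scale.
# Coordinates are illustrative, not taken from any cited dataset.
EMOTION_SPACE = {
    "happy":   ( 0.8,  0.6),
    "sad":     (-0.7, -0.5),
    "angry":   (-0.6,  0.8),
    "neutral": ( 0.0,  0.0),
}

def nearest_discrete_label(valence: float, arousal: float) -> str:
    """Map a dimensional prediction back to the closest discrete label
    by squared Euclidean distance in the valence-arousal plane."""
    return min(
        EMOTION_SPACE,
        key=lambda e: (EMOTION_SPACE[e][0] - valence) ** 2
                    + (EMOTION_SPACE[e][1] - arousal) ** 2,
    )

print(nearest_discrete_label(0.7, 0.5))  # → happy
```

A hybrid SER system in this spirit would predict continuous valence-arousal values and derive a categorical label only when one is needed, keeping the richer dimensional signal available downstream.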

Sources

EMO100DB: An Open Dataset of Improvised Songs with Emotion Data

MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages

Enhancing Public Speaking Skills in Engineering Students Through AI

Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics

The Dynamic Articulatory Model DYNARTmo: Dynamic Movement Generation and Speech Gestures
