The field of speech emotion recognition is moving toward more robust and fine-grained modelling of human emotions. Researchers are developing methods to address challenges such as class imbalance, emotion ambiguity, and missing modalities, and new datasets and frameworks are being introduced to improve the accuracy and reliability of speech emotion recognition models. Noteworthy papers include EmoNet-Voice, which introduces a large-scale pre-training dataset and a novel benchmark dataset for speech emotion detection, and MEDUSA, which proposes a multimodal deep-fusion, multi-stage training framework for speech emotion recognition in naturalistic conditions. Also notable are frameworks such as CIDer and high-performance systems for emotional attribute prediction.
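To make the class-imbalance challenge concrete, the sketch below shows one standard remedy: inverse-frequency loss weighting, where rare emotion classes (e.g. "fear") receive larger loss weights than frequent ones (e.g. "neutral"). This is a generic illustration, not the method of any paper mentioned above; the function name and the example label distribution are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class loss weights inversely proportional to class frequency.

    Scaled so that count * weight = total / n_classes for every class,
    i.e. each emotion class contributes equally to the total loss.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {cls: total / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical skewed emotion label distribution (10 utterances)
labels = ["neutral"] * 6 + ["happy"] * 3 + ["fear"] * 1
weights = inverse_frequency_weights(labels)
# The single "fear" example is up-weighted relative to "neutral"
```

Such weights are typically passed to a weighted cross-entropy loss during training, so misclassifying a rare emotion costs more than misclassifying a common one.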