Advancements in Synthetic Data Generation for Emotion Recognition

Research on emotion recognition is increasingly turning to synthetic data generation to meet the need for high-quality, diverse emotional datasets. Current work generates emotionally rich text with large language models, persona-based conditioning, and contrastive learning. These methods target two problems: the scarcity of high-quality emotional datasets and the subjective nature of emotional expression. Synthetic data could augment or even replace real-world emotional datasets, which may enable more accurate and robust emotion recognition models. Noteworthy papers in this area include PersonaGen, which introduces a framework for generating emotionally rich text with a large language model through multi-stage persona-based conditioning, and SYNTHIA, which presents a dataset of 30,000 backstories derived from 10,000 real social media users, positioned between costly human-curated data and fully synthetic generation. In addition, the Penalty-Adjusted Type-Token Ratio (PATTR) metric has been proposed to measure lexical diversity in synthetic texts, providing a more robust and task-specific evaluation method.
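For context on the lexical diversity metric mentioned above, the following is a minimal sketch of the plain, unadjusted type-token ratio (unique tokens divided by total tokens). It is not PATTR itself: the penalty adjustment is not described in this summary, so none is implemented here. The function name `type_token_ratio`, the regex tokenizer, and the sample sentences are assumptions made for illustration only.

```python
import re


def type_token_ratio(text: str) -> float:
    """Plain type-token ratio: unique tokens / total tokens.

    Illustrative baseline only. The penalty term that PATTR adds
    is not specified in this summary and is NOT implemented here.
    """
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    return len(set(tokens)) / len(tokens)


# Made-up synthetic samples, not taken from any dataset.
repetitive = "I am happy happy happy"
varied = "I feel quietly relieved today"

ttr_repetitive = type_token_ratio(repetitive)
ttr_varied = type_token_ratio(varied)
print(ttr_repetitive, ttr_varied)
```

A lower ratio indicates more repeated vocabulary. Plain type-token ratio is known to fall as texts get longer, which is one common motivation for adjusted variants.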

