Advancements in Synthetic Data Generation for Emotion Recognition

Emotion recognition research is increasingly turning to synthetic data generation, driven by the scarcity of high-quality, diverse emotional datasets and the subjective nature of emotional expression. Researchers are exploring large language models, persona-based conditioning, and contrastive learning approaches to generate emotionally rich text. Synthetic data could augment or replace real-world emotional datasets, enabling more accurate and robust emotion recognition models. Noteworthy papers in this area include PersonaGen, a framework for generating emotionally rich text with a large language model through multi-stage persona-based conditioning, and SYNTHIA, a dataset of 30,000 backstories derived from 10,000 real social media users that bridges the spectrum between costly human-curated data and synthetic generation. In addition, the Penalty-Adjusted Type-Token Ratio (PATTR) metric has been proposed to measure lexical diversity in synthetic texts, providing a more robust and task-specific evaluation method.
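
To make the lexical diversity idea concrete, the sketch below computes a plain type-token ratio (unique tokens divided by total tokens) and a length-penalized variant. The penalized function, its name, and its `target_len` parameter are illustrative assumptions and not the PATTR definition from the paper, which this summary does not state; the sketch only shows how a penalty for deviating from an expected length can lower a diversity score.

```python
def type_token_ratio(text: str) -> float:
    """Plain type-token ratio: unique tokens / total tokens (whitespace tokenization)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def length_penalized_ttr(text: str, target_len: int) -> float:
    """Hypothetical penalized variant (an assumption, not the published PATTR formula).

    The absolute deviation from target_len is added to the denominator, so texts
    that are much shorter or longer than expected score lower.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    penalty = abs(len(tokens) - target_len)
    return len(set(tokens)) / (len(tokens) + penalty)


short_text = "i feel calm today"
print(type_token_ratio(short_text))         # every token is unique
print(length_penalized_ttr(short_text, 4))  # on target length, no penalty applied
print(length_penalized_ttr(short_text, 8))  # shorter than the target, score drops
```

A plain ratio tends to favor short texts, since repeated words become more likely as length grows; penalizing length deviation is one simple way to counter that effect when prompts lead to outputs of varying length.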

Sources

Persona-Based Synthetic Data Generation Using Multi-Stage Conditioning with Large Language Models for Emotion Recognition

Backtranslation and paraphrasing in the LLM era? Comparing data augmentation methods for emotion classification

SYNTHIA: Synthetic Yet Naturally Tailored Human-Inspired PersonAs

A Penalty Goes a Long Way: Measuring Lexical Diversity in Synthetic Texts Under Prompt-Influenced Length Variations

Chinchunmei at SemEval-2025 Task 11: Boosting the Large Language Model's Capability of Emotion Perception using Contrastive Learning

Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs
