Advances in Text Classification and Data Augmentation

Recent work in natural language processing has advanced both text classification and data augmentation. Much of it improves pre-trained language models by fine-tuning them on expanded datasets and pairing them with strategies such as hard voting and dynamic learning rates. Text augmentation and class-imbalance handling have also shown promise in enhancing model performance, particularly in domains with limited data. Together, these techniques could yield more robust and scalable solutions for applications such as scientific text classification, food hazard detection, and wildlife trafficking identification. Noteworthy papers in this area include:

  • A study that achieved state-of-the-art results in scientific text classification by fine-tuning pre-trained language models on an expanded dataset, demonstrating the effectiveness of dataset expansion combined with hard voting.
  • Research proposing a cost-effective approach to identifying wildlife trafficking in online marketplaces, in which large language models generate pseudo-labels; it achieved an F1 score of up to 95% at a lower cost.
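
As a rough illustration of the hard-voting strategy mentioned above, the sketch below combines the label predictions of several classifiers by majority vote. It is a minimal example, not the method of any paper listed here: the function name `hard_vote`, the example labels, and the tie-breaking rule (prefer the earliest-listed model) are all assumptions made for illustration.

```python
from collections import Counter


def hard_vote(predictions):
    """Combine per-model label predictions by majority vote.

    `predictions` is a list with one entry per model; each entry is a list
    of predicted labels, one per sample. Ties are broken in favor of the
    earliest-listed model (an assumption for this sketch).
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    voted = []
    # zip(*predictions) yields, for each sample, the labels from every model.
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        # Pick the first label (in model order) that reached the top count.
        winner = next(label for label in labels if counts[label] == top)
        voted.append(winner)
    return voted


# Hypothetical predictions from three fine-tuned models on three samples.
model_a = ["physics", "biology", "math"]
model_b = ["physics", "chemistry", "math"]
model_c = ["biology", "chemistry", "cs"]

ensemble = hard_vote([model_a, model_b, model_c])
print(ensemble)
```

Hard voting uses only each model's final label, so it works even when the models do not expose comparable probability scores; soft voting, which averages predicted probabilities, is the usual alternative when they do.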

Sources

Advancing Scientific Text Classification: Fine-Tuned Models with Dataset Expansion and Hard-Voting

BrightCookies at SemEval-2025 Task 9: Exploring Data Augmentation for Food Hazard Classification

A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces

Ustnlp16 at SemEval-2025 Task 9: Improving Model Performance through Imbalance Handling and Focal Loss
