Advances in Bias Detection and Mitigation in NLP

The field of Natural Language Processing (NLP) is placing growing emphasis on detecting and mitigating biases in language models. Recent studies highlight the importance of evaluating language models for demographic-targeted social biases and of developing scalable bias-detection methods. Large language models have also been shown to perpetuate harmful stereotypes and biases, particularly in low-resource languages and culturally diverse contexts. To address these issues, researchers are proposing new evaluation frameworks, datasets, and methods for bias detection and mitigation, such as fine-tuning models toward desired distributions and using contrastive learning to capture fine-grained bias (both ideas are sketched below). Notable papers in this area include KurdSTS, which presents a Kurdish semantic textual similarity dataset, and IndiCASA, which introduces a dataset and bias evaluation framework for large language models in the Indian context. Also noteworthy are Evaluating LLMs for Demographic-Targeted Social Bias Detection, a comprehensive framework for assessing how well large language models detect demographic-targeted social biases, and LLM Bias Detection and Mitigation through the Lens of Desired Distributions, which proposes a weighted adaptive loss-based fine-tuning method for aligning language models with desired distributions.
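To make the contrastive-similarity idea concrete, here is a minimal sketch of probing an encoder's demographic sensitivity with counterfactual sentence pairs. It is not the IndiCASA implementation; the encoder name, the example pairs, and the scoring rule are illustrative assumptions.

```python
# Minimal sketch: probe an encoder with counterfactual sentence pairs and
# cosine similarity. Not the IndiCASA implementation; the encoder, pairs,
# and scoring rule are assumptions for illustration only.
from sentence_transformers import SentenceTransformer, util

# Each pair differs only in the demographic attribute (here, gender).
pairs = [
    ("The surgeon said she would operate tomorrow.",
     "The surgeon said he would operate tomorrow."),
    ("The speech therapist greeted her patients.",
     "The speech therapist greeted his patients."),
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works here

for a, b in pairs:
    emb_a, emb_b = model.encode([a, b], convert_to_tensor=True)
    sim = util.cos_sim(emb_a, emb_b).item()
    # An attribute-neutral encoder should map counterfactual pairs to nearly
    # identical embeddings; a large similarity drop is a candidate bias signal.
    print(f"cos_sim = {sim:.3f} | {a} <-> {b}")
```

The desired-distribution idea can likewise be sketched as an auxiliary loss during fine-tuning. The snippet below nudges a masked language model's predictions over a small set of demographic candidate tokens toward a target distribution with a KL penalty; the model, template, candidate set, and weighting coefficient are all hypothetical, and this is not the paper's exact weighted adaptive loss.

```python
# Minimal sketch: align a masked LM's predictions over demographic candidates
# with a desired distribution via an auxiliary KL term. Illustrative only.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

template = "The surgeon said that [MASK] would operate tomorrow."
group_tokens = ["he", "she"]            # demographic candidates (assumed)
desired = torch.tensor([0.5, 0.5])      # desired distribution, e.g. uniform
bias_weight = 0.1                       # hypothetical weighting coefficient

enc = tok(template, return_tensors="pt")
mask_pos = (enc["input_ids"] == tok.mask_token_id).nonzero(as_tuple=True)[1]
group_ids = tok.convert_tokens_to_ids(group_tokens)

logits = model(**enc).logits[0, mask_pos]              # (1, vocab)
group_logp = torch.log_softmax(logits[:, group_ids], dim=-1)

# KL(desired || predicted) over the candidate tokens only; during fine-tuning
# this would be added to the usual task loss so the head drifts toward the
# target mix without retraining the model from scratch.
bias_loss = bias_weight * F.kl_div(group_logp, desired.unsqueeze(0),
                                   reduction="batchmean")
bias_loss.backward()  # in practice, combined with the task loss in a training loop
print(float(bias_loss))
```

In a real fine-tuning run this term would be weighted against the standard language-modeling loss, so the model retains fluency while its predicted demographic distribution moves toward the desired one.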
Sources
IndiCASA: A Dataset and Bias Evaluation Framework in LLMs Using Contrastive Embedding Similarity in the Indian Context
What is a protest anyway? Codebook conceptualization is still a first-order concern in LLM-era classification
Surgeons Are Indian Males and Speech Therapists Are White Females: Auditing Biases in Vision-Language Models for Healthcare Professionals