Addressing Bias in Large Language Models

The field of natural language processing is moving towards a deeper understanding of the biases present in large language models. Recent studies have highlighted the need for more nuanced and effective methods to mitigate these biases, which can perpetuate harmful stereotypes and reinforce existing social inequalities. The use of prompting as a fairness intervention has shown promise, but its effects can be highly model-specific and may not always lead to more diverse and representative outputs. Furthermore, the evaluation of bias in language models is a complex task, and the choice of metrics and testing designs can significantly impact the results.

Noteworthy papers in this area include: Prompting Away Stereotypes, which introduces a pilot benchmark for assessing representational societal bias in text-to-image models and highlights the potential of prompting as a fairness intervention. Bias after Prompting, which investigates the transfer of biases from pre-trained models to adapted models and finds that popular prompt-based mitigation methods do not consistently prevent biases from transferring. Measuring Bias or Measuring the Task, which examines how signaling the evaluative purpose of a task impacts measured gender bias in language models and highlights the brittleness of bias evaluations.

Sources

Prompting Away Stereotypes? Evaluating Bias in Text-to-Image Models for Occupations

Clustering Discourses: Racial Biases in Short Stories about Women Generated by Large Language Models

The Basic B*** Effect: The Use of LLM-based Agents Reduces the Distinctiveness and Diversity of People's Choices

Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases

Mitigation of Gender and Ethnicity Bias in AI-Generated Stories through Model Explanations

Transforming Fashion with AI: A Comparative Study of Large Language Models in Apparel Design

The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models

Benchmarking Gender and Political Bias in Large Language Models

Evaluating and comparing gender bias across four text-to-image models

No for Some, Yes for Others: Persona Prompts and Other Sources of False Refusal in Language Models

Bias after Prompting: Persistent Discrimination in Large Language Models

Simulating Identity, Propagating Bias: Abstraction and Stereotypes in LLM-Generated Text

Built with on top of