Advancements in Fairness and Bias Mitigation in Large Language Models

The field of natural language processing is seeing a significant shift toward addressing fairness and bias concerns in large language models. Recent work focuses on detecting and mitigating fairness violations, particularly intersectional biases and dialect-based quality-of-service harms. Researchers are exploring new approaches to fairness testing, including systematic test generation and quantification methods, to improve the accuracy and reliability of fairness evaluations. These advances can contribute to more equitable and inclusive language models. Noteworthy papers in this area include GenFair, which proposes a metamorphic fairness testing framework to detect fairness violations in large language models; Quantifying Query Fairness Under Unawareness, which introduces a robust fairness estimator that handles multiple sensitive attributes and establishes a reliable protocol for measuring fairness when those attributes are unobserved; and A Framework for Auditing Chatbots for Dialect-Based Quality-of-Service Harms, which presents an auditing framework for LLM-based chatbots and reveals significant dialect-based quality-of-service harms in a widely used chatbot.
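To make the metamorphic-testing idea concrete, the sketch below shows one way such a check could be structured: generate prompt pairs that differ only in a sensitive attribute (here, a dialect marker), query the same model on both, and flag pairs whose response quality diverges. This is a minimal illustration under stated assumptions, not the actual method or API of GenFair or the chatbot-auditing framework; the helper names (make_prompt_pairs, response_quality, model_fn) and the length-based quality proxy are hypothetical.

```python
# Minimal sketch of metamorphic-style fairness testing for an LLM.
# All helper names and the quality proxy are illustrative assumptions,
# not the interfaces of the cited papers.
from typing import Callable, List, Tuple


def make_prompt_pairs(template: str, variants: Tuple[str, str]) -> Tuple[str, str]:
    """Instantiate one metamorphic pair from a template with a {phrase} slot."""
    return template.format(phrase=variants[0]), template.format(phrase=variants[1])


def response_quality(response: str) -> float:
    """Toy quality proxy: word count. A real audit would use task-specific
    scores such as helpfulness ratings, accuracy, or refusal rate."""
    return float(len(response.split()))


def metamorphic_fairness_test(
    model_fn: Callable[[str], str],     # wraps the LLM under test
    templates: List[str],
    variants: Tuple[str, str],          # e.g. standard phrasing vs. dialect phrasing
    tolerance: float = 0.2,
) -> List[dict]:
    """Return prompt pairs whose quality scores differ by more than `tolerance` (relative)."""
    violations = []
    for template in templates:
        prompt_a, prompt_b = make_prompt_pairs(template, variants)
        score_a = response_quality(model_fn(prompt_a))
        score_b = response_quality(model_fn(prompt_b))
        gap = abs(score_a - score_b) / max(score_a, score_b, 1.0)
        if gap > tolerance:
            violations.append({
                "prompt_a": prompt_a, "prompt_b": prompt_b,
                "score_a": score_a, "score_b": score_b, "gap": gap,
            })
    return violations


if __name__ == "__main__":
    # Stand-in model: a real audit would call the chatbot under test here.
    def echo_model(prompt: str) -> str:
        return "Sure, here is a detailed answer." if "standard" in prompt else "Ok."

    templates = ["Please explain compound interest. ({phrase})"]
    report = metamorphic_fairness_test(
        echo_model, templates, ("standard phrasing", "dialect phrasing")
    )
    print(f"{len(report)} potential fairness violation(s) found")
```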

Sources

GenFair: Systematic Test Generation for Fairness Fault Detection in Large Language Models

Quantifying Query Fairness Under Unawareness

A Framework for Auditing Chatbots for Dialect-Based Quality-of-Service Harms
