Advances in Mitigating Bias in AI Systems

The field of artificial intelligence is increasingly focused on building fair and unbiased systems. Recent research has concentrated on identifying and mitigating biases in language models, retrieval-augmented generation, and other AI systems. Studies have shown that biases can be introduced through various means, including data poisoning, prompt injection, and unfairness in tool selection. To address these issues, researchers have proposed novel debiasing methods such as BiasUnlearn and Open-DeBias, along with evaluation benchmarks such as BiasFreeBench, which together demonstrate promising results in reducing bias while preserving language modeling capabilities. Noteworthy papers include Open-DeBias, which introduces a comprehensive benchmark for evaluating biases across a wide range of categories and subgroups, and BiasUnlearn, which proposes a model debiasing framework that achieves targeted debiasing via dual-pathway unlearning mechanisms.
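Several of the papers below construct bias test cases by perturbing the demographic terms in otherwise identical prompts and comparing model behavior across the resulting pairs. A minimal sketch of that counterfactual-pair pattern follows; the templates, group pairs, and `score_fn` are illustrative assumptions, not taken from any of the cited works:

```python
# Counterfactual bias probing: build prompt pairs that differ only in a
# demographic term, then measure how much a model's score changes.

TEMPLATES = [
    "The {group} engineer explained the design.",
    "Everyone agreed the {group} nurse was competent.",
]

GROUP_PAIRS = [("male", "female"), ("young", "old")]

def make_test_cases(templates, group_pairs):
    """Build paired prompts differing only in the demographic term."""
    cases = []
    for template in templates:
        for a, b in group_pairs:
            cases.append((template.format(group=a), template.format(group=b)))
    return cases

def bias_gap(score_fn, cases):
    """Mean absolute score difference across counterfactual pairs.

    score_fn is a stand-in for any per-prompt scorer (e.g. sentiment,
    toxicity, or a model's log-probability of a completion).
    """
    diffs = [abs(score_fn(p) - score_fn(q)) for p, q in cases]
    return sum(diffs) / len(diffs)
```

A score function that treats both members of every pair identically yields a gap of zero; larger gaps indicate that the scorer's output depends on the demographic term alone.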

Sources

Diagnosing the Performance Trade-off in Moral Alignment: A Case Study on Gender Stereotypes

Your RAG is Unfair: Exposing Fairness Vulnerabilities in Retrieval-Augmented Generation via Backdoor Attacks

Open-DeBias: Toward Mitigating Open-Set Bias in Language Models

BTC-SAM: Leveraging LLMs for Generation of Bias Test Cases for Sentiment Analysis Models

humancompatible.detect: a Python Toolkit for Detecting Bias in AI Models

HarmMetric Eval: Benchmarking Metrics and Judges for LLM Harmfulness Assessment

Bias Mitigation or Cultural Commonsense? Evaluating LLMs with a Japanese Dataset

Beyond Genre: Diagnosing Bias in Music Embeddings Using Concept Activation Vectors

Mitigating Biases in Language Models via Bias Unlearning

Assessing Algorithmic Bias in Language-Based Depression Detection: A Comparison of DNN and LLM Approaches

MGen: Millions of Naturally Occurring Generics in Context

Fairness Testing in Retrieval-Augmented Generation: How Small Perturbations Reveal Bias in Small Language Models

Generating Difficult-to-Translate Texts

Deconstructing Self-Bias in LLM-generated Translation Benchmarks

Searching for Difficult-to-Translate Test Examples at Scale

BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses

BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models

Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors

Hearing the Order: Investigating Selection Bias in Large Audio-Language Models

Uncovering Implicit Bias in Large Language Models with Concept Learning Dataset

Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs

Bias beyond Borders: Global Inequalities in AI-Generated Music

The Current State of AI Bias Bounties: An Overview of Existing Programmes and Research
