Safety Evaluation and Bias Mitigation in Large Language Models

The field of Large Language Models (LLMs) is moving toward a greater emphasis on safety evaluation and bias mitigation. Researchers increasingly recognize the need to assess the risks associated with LLMs, including toxicity, bias, and unfairness. Recent studies have explored various aspects of LLM safety evaluation, such as comprehensive frameworks for evaluating demographic biases and benchmarks for testing safeguard robustness. The importance of prioritizing LLM safety evaluation is especially clear in high-stakes settings such as AI-based recruitment. Noteworthy papers in this area include The Scales of Justitia, which provides a systematic overview of recent advances in LLM safety evaluation, and The Biased Samaritan, which introduces a novel method for evaluating demographic biases in generative AI models. In addition, papers such as FORTRESS and the Gender Inclusivity Fairness Index contribute evaluation tools and metrics for assessing LLM safety and bias.
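To make the idea of demographic bias evaluation concrete, the sketch below shows one minimal way such a harness could be structured: fill a prompt template with different demographic terms, score the model's completions, and compare group-level averages. This is an illustrative assumption, not the method of any cited paper; `generate` and `score_toxicity` are hypothetical stand-ins for a real LLM client and a real toxicity classifier.

```python
# Illustrative sketch of a demographic-bias evaluation loop (hypothetical,
# not the procedure of any paper listed below).
from statistics import mean

def generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError

def score_toxicity(text: str) -> float:
    """Hypothetical toxicity score in [0, 1]; replace with a real classifier."""
    raise NotImplementedError

def demographic_gap(template: str, groups: list[str], n_samples: int = 20) -> dict:
    """Average toxicity per demographic group and the largest pairwise gap."""
    per_group = {}
    for group in groups:
        prompt = template.format(group=group)
        scores = [score_toxicity(generate(prompt)) for _ in range(n_samples)]
        per_group[group] = mean(scores)
    gap = max(per_group.values()) - min(per_group.values())
    return {"per_group": per_group, "max_gap": gap}

# Example usage (hypothetical template and groups):
# demographic_gap("Write a short story about a {group} nurse.",
#                 ["young", "elderly", "immigrant"])
```

A large `max_gap` would indicate that the model's outputs are scored very differently depending on the demographic term alone, which is the kind of disparity the evaluation frameworks above aim to surface systematically.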

Sources

The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs

The Biased Samaritan: LLM biases in Perceived Kindness

Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment

Evaluation Should Not Ignore Variation: On the Impact of Reference Set Choice on Summarization Metrics

Which Humans? Inclusivity and Representation in Human-Centered AI

FORTRESS: Frontier Risk Evaluation for National Security and Public Safety

Gender Inclusivity Fairness Index (GIFI): A Multilevel Framework for Evaluating Gender Diversity in Large Language Models

Gender-Neutral Machine Translation Strategies in Practice
