Advances in Mitigating Memorization and Bias in Large Language Models

Research on large language models (LLMs) is increasingly focused on critical concerns around memorization, privacy, and bias, and on new paradigms and frameworks that promote fairness and transparency. A key direction is the isolation of memorized content, making it easier to remove without compromising general language capabilities. Another is the identification of memorized personal data, which enables the dynamic construction of forget sets for machine unlearning and right-to-be-forgotten requests. There is also growing interest in guiding LLM decision-making with fairness reward models, which can down-weight biased reasoning trajectories and favor equitable ones. Illustrative sketches of the latter two ideas follow the paper list below. Noteworthy papers in this area include:

  • A study introducing MemSinks, a new paradigm that isolates memorized content during training so it can be removed more easily;
  • Research presenting a model-agnostic metric for quantifying human-fact associations in LLMs;
  • A framework for training a generalizable Fairness Reward Model that enables trustworthy use of reasoning models in high-stakes decision-making.
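
To make the forget-set idea concrete, the sketch below dynamically builds a forget set from right-to-be-forgotten requests. It is a minimal illustration under stated assumptions, not the paper's method: the token-overlap `association_score` and the `toy_generate` model are stand-ins for a proper model-agnostic memorization metric and a real LLM.

```python
"""Minimal sketch: dynamically building a forget set for machine unlearning.

Assumptions (not from the paper): `generate` is any callable mapping a prompt
to model text, and the association score is a crude token-overlap heuristic
standing in for a proper model-agnostic memorization metric.
"""
from dataclasses import dataclass
from typing import Callable


@dataclass
class ForgetRequest:
    subject: str   # person named in a right-to-be-forgotten request
    fact: str      # personal data that may have been memorized


def association_score(generate: Callable[[str], str], subject: str, fact: str) -> float:
    """Fraction of the fact's tokens the model reproduces when prompted
    about the subject -- a rough proxy for memorization."""
    completion = generate(f"Tell me everything you know about {subject}.").lower()
    fact_tokens = fact.lower().split()
    hits = sum(token in completion for token in fact_tokens)
    return hits / max(len(fact_tokens), 1)


def build_forget_set(generate, requests, threshold=0.6):
    """Keep only the requests the model appears to have memorized, so the
    unlearning step targets actual leakage rather than every request."""
    return [r for r in requests if association_score(generate, r.subject, r.fact) >= threshold]


if __name__ == "__main__":
    # Dummy "model" for demonstration: pretends to have memorized one person.
    def toy_generate(prompt: str) -> str:
        return "Jane Doe lives at 12 Elm Street." if "Jane Doe" in prompt else "I have no information."

    requests = [ForgetRequest("Jane Doe", "lives at 12 Elm Street"),
                ForgetRequest("John Roe", "works at Acme Corp")]
    print(build_forget_set(toy_generate, requests))  # only the Jane Doe entry survives
```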
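
Similarly, the following sketch shows one way a fairness reward could re-weight candidate reasoning trajectories, down-weighting biased ones and favoring equitable ones. The keyword-based `fairness_score` is a placeholder assumption; the Fairness Reward Model described in the paper is a trained, generalizable model.

```python
"""Minimal sketch: re-weighting candidate trajectories with a fairness reward.

Assumption (not from the paper): `fairness_score` is a stub keyword check
returning a value in [0, 1]; a real fairness reward model would be learned.
"""
import math


def fairness_score(trajectory: str) -> float:
    """Stub reward: penalize trajectories that condition on protected attributes."""
    protected = ("gender", "race", "age", "religion")
    return 0.1 if any(term in trajectory.lower() for term in protected) else 0.9


def reweight(trajectories, base_scores, alpha=1.0):
    """Combine the generator's own scores with fairness rewards via a softmax,
    down-weighting biased trajectories and favoring equitable ones."""
    logits = [b + alpha * fairness_score(t) for t, b in zip(trajectories, base_scores)]
    z = sum(math.exp(l) for l in logits)
    return [math.exp(l) / z for l in logits]


if __name__ == "__main__":
    candidates = [
        "Approve the loan because the applicant's income covers repayments.",
        "Deny the loan because applicants of this age group default more often.",
    ]
    weights = reweight(candidates, base_scores=[0.5, 0.7])
    best = candidates[max(range(len(weights)), key=weights.__getitem__)]
    print(weights, "->", best)  # the equitable trajectory receives the higher weight
```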

Sources

Memorization Sinks: Isolating Memorization during LLM Training

What Should LLMs Forget? Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests

Guiding LLM Decision-Making with Fairness Reward Models

Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics

Assessing the Reliability of LLMs Annotations in the Context of Demographic Bias and Model Explanation
