Advances in Large Language Model Unlearning and Fairness

The field of large language models (LLMs) is rapidly evolving, with a growing focus on unlearning and fairness. Recent research highlights the importance of removing sensitive or harmful content from LLMs while preserving their overall utility. Several approaches have been proposed, including attention-shifting frameworks, context-aware unlearning, and controllable machine unlearning via gradient pivoting. These methods aim to balance unlearning efficacy against model fidelity and have shown promising results in experiments. Noteworthy papers in this area include Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting, which introduces a novel attention-shifting framework for selective unlearning, and Context-aware Fairness Evaluation and Mitigation in LLMs, which proposes a dynamic, reversible, pruning-based framework for detecting and mitigating bias in LLMs. Overall, the field is moving toward more robust and fair LLMs that can adapt to changing contexts and requirements.
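To make the efficacy-versus-fidelity trade-off concrete, the sketch below shows the widely used "forget vs. retain" training objective that many unlearning methods start from: ascend on a forget set to suppress targeted content while descending on a retain set to preserve general capability. This is not code from any of the papers listed here; the model name, the `lambda_retain` weight, and the batch format are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not from the papers above) of a
# gradient-difference style unlearning step for a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token              # GPT-2 has no pad token
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

lambda_retain = 1.0  # illustrative weight: unlearning efficacy vs. model fidelity


def lm_loss(batch_texts):
    """Standard next-token cross-entropy loss on a list of strings."""
    enc = tokenizer(batch_texts, return_tensors="pt", padding=True, truncation=True)
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100  # ignore padding in the loss
    return model(**enc, labels=labels).loss


def unlearning_step(forget_batch, retain_batch):
    # Negative sign = gradient ascent on forget data (push memorized content out),
    # positive retain term = keep the model close to its original behavior.
    loss = -lm_loss(forget_batch) + lambda_retain * lm_loss(retain_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Naive ascent of this kind is known to degrade fluency and invite hallucination, which is exactly the failure mode that the more controlled mechanisms surveyed here (attention shifting, context-aware unlearning, gradient pivoting) are designed to avoid.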

Sources

The Right to Be Remembered: Preserving Maximally Truthful Digital Memory in the Age of AI

Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting

Forgetting to Forget: Attention Sink as A Gateway for Backdooring LLM Unlearning

Forget to Know, Remember to Use: Context-Aware Unlearning for Large Language Models

Context-aware Fairness Evaluation and Mitigation in LLMs

Controllable Machine Unlearning via Gradient Pivoting

LLM Unlearning with LLM Beliefs

Graph Unlearning Meets Influence-aware Negative Preference Optimization

LEGO: A Lightweight and Efficient Multiple-Attribute Unlearning Framework for Recommender Systems
