Advances in Machine Unlearning for Large Language Models

The field of machine unlearning for large language models is evolving rapidly, with a focus on methods that remove unwanted knowledge and capabilities while preserving model utility. Recent work has also emphasized how difficult it is to measure unlearning effectiveness, motivating more robust and reliable evaluation frameworks.

One key direction is bi-level optimization, which models the hierarchical structure of the unlearning problem by prioritizing the forget objective over the retain objective; a minimal sketch of this idea follows below. Another active thread uses distillation to robustify unlearning, which has shown promising results in making unlearning more durable at a computational cost well below full retraining.
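To make the hierarchy concrete, here is a minimal, illustrative sketch of a penalty-style bi-level unlearning loop: each outer iteration first takes gradient-ascent steps on the forget loss (the prioritized problem), then takes a retain step under a penalty that discourages re-learning the forget set. The toy model, data, and hyperparameters are assumptions for illustration, not BLUR's published algorithm.

```python
# Minimal sketch of a penalty-style bi-level unlearning loop.
# Toy model and data are stand-ins; this is not the BLUR algorithm itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

# Toy token batches standing in for the forget and retain sets.
forget_x, forget_y = (torch.randint(0, vocab, (8, 16)) for _ in range(2))
retain_x, retain_y = (torch.randint(0, vocab, (8, 16)) for _ in range(2))

def nll(x, y):
    logits = model(x)  # (batch, seq, vocab)
    return F.cross_entropy(logits.flatten(0, 1), y.flatten())

for step in range(10):
    # Lower-level problem (prioritized): push the forget loss UP,
    # i.e. gradient ascent via minimizing its negation.
    for _ in range(3):
        opt.zero_grad()
        (-nll(forget_x, forget_y)).backward()
        opt.step()
    # Upper-level problem: recover utility on retain data, with a penalty
    # term that discourages re-learning the forget set (a penalty-based
    # relaxation of the bi-level constraint).
    opt.zero_grad()
    (nll(retain_x, retain_y) - 1.0 * nll(forget_x, forget_y)).backward()
    opt.step()
```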

Label-only dataset inference frameworks such as CatShift have also been proposed to determine dataset membership without relying on internal model logits, by fine-tuning the model on the suspect data and observing how its outputs shift; a schematic sketch follows below. Researchers have likewise explored guided unlearning and retention via data attribution, which aims to mitigate unintended forgetting while preserving valuable information.
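The label-only idea can be sketched schematically: fine-tune a copy of the model on the suspect dataset, then compare how much its generated text shifts on the suspect data versus a known non-member calibration set, using output strings alone. The helpers `finetune` and `generate` and the string-similarity shift metric below are placeholders, not CatShift's concrete choices.

```python
# Schematic label-only dataset-inference sketch in the spirit of CatShift.
# `finetune` and `generate` are placeholder callables; the paper's shift
# metric and decision procedure are more careful than this illustration.
from difflib import SequenceMatcher

def output_shift(base_model, tuned_model, prompts, generate):
    # Mean textual dissimilarity between two models' generations,
    # computed from output strings only (no logits required).
    shifts = []
    for p in prompts:
        a, b = generate(base_model, p), generate(tuned_model, p)
        shifts.append(1.0 - SequenceMatcher(None, a, b).ratio())
    return sum(shifts) / len(shifts)

def label_only_inference(model, suspect_prompts, nonmember_prompts,
                         finetune, generate):
    # Fine-tune a copy of the model on the suspect data, then measure how
    # much its outputs move on suspect vs. known non-member prompts.
    tuned = finetune(model, suspect_prompts)
    shift_suspect = output_shift(model, tuned, suspect_prompts, generate)
    shift_control = output_shift(model, tuned, nonmember_prompts, generate)
    # Membership is decided by comparing the suspect shift against the
    # calibration shift with a statistical test; that test (and its
    # direction) follow the paper and are left abstract here.
    return shift_suspect, shift_control
```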

Notable papers in this area include: Do LLMs Really Forget?, which proposes a knowledge unlearning evaluation framework that more accurately captures the implicit structure of real-world knowledge; Distillation Robustifies Unlearning, which introduces Unlearn-Noise-Distill-on-Outputs (UNDO), a scalable method that distills an unlearned model into a partially noised copy of itself; and GUARD, which proposes a framework for guided unlearning and retention via data attribution, assigning adaptive unlearning weights to samples to mitigate unintended losses in retention. Rough sketches of the latter two ideas follow.
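First, a sketch of UNDO's noise-then-distill idea. The weight-noising scheme (interpolating each parameter toward fresh noise), the mixing coefficient, and the KL distillation loss are illustrative assumptions rather than the paper's exact recipe.

```python
# Illustrative noise-then-distill loop in the spirit of UNDO; the noising
# scheme and hyperparameters are assumptions, not the paper's exact recipe.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab, dim = 100, 32
teacher = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
# Assume `teacher` has already been unlearned by some base method.

# 1) Make a partially noised copy: interpolate each weight toward fresh
#    noise (alpha=0 keeps the teacher; alpha=1 is a full re-initialization).
alpha = 0.3
student = copy.deepcopy(teacher)
with torch.no_grad():
    for p in student.parameters():
        p.mul_(1 - alpha).add_(alpha * torch.randn_like(p) * p.std())

# 2) Distill on outputs: match the student's token distribution to the
#    unlearned teacher's on a distillation corpus.
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
distill_x = torch.randint(0, vocab, (8, 16))
for step in range(100):
    with torch.no_grad():
        t_logp = F.log_softmax(teacher(distill_x), dim=-1)
    s_logp = F.log_softmax(student(distill_x), dim=-1)
    loss = F.kl_div(s_logp, t_logp, log_target=True, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```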
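Second, a sketch of attribution-weighted unlearning in the spirit of GUARD: each forget sample's unlearning weight shrinks as its overlap with the retain set grows. The gradient-cosine attribution proxy below is an assumption standing in for GUARD's actual attribution function.

```python
# Illustrative attribution-weighted unlearning step in the spirit of GUARD;
# the gradient-cosine proxy and weighting rule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def nll(x, y):
    return F.cross_entropy(model(x).flatten(0, 1), y.flatten())

def flat_grad(loss):
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.flatten() for g in grads])

retain_x, retain_y = (torch.randint(0, vocab, (8, 16)) for _ in range(2))
forget_samples = [(torch.randint(0, vocab, (1, 16)),
                   torch.randint(0, vocab, (1, 16))) for _ in range(4)]

# Attribution proxy: cosine similarity between each forget sample's gradient
# and the retain-set gradient; samples entangled with retained knowledge
# (high similarity) receive smaller unlearning weights.
g_retain = flat_grad(nll(retain_x, retain_y))
weights = []
for fx, fy in forget_samples:
    sim = F.cosine_similarity(flat_grad(nll(fx, fy)), g_retain, dim=0)
    weights.append(1.0 - sim.clamp(min=0).item())

# One weighted gradient-ascent unlearning step over the forget set.
opt.zero_grad()
loss = sum(-w * nll(fx, fy) for w, (fx, fy) in zip(weights, forget_samples))
loss.backward()
opt.step()
```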

Sources

Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness

Hey, That's My Data! Label-Only Dataset Inference in Large Language Models

Distillation Robustifies Unlearning

Info-Coevolution: An Efficient Framework for Data Model Coevolution

BLUR: A Bi-Level Optimization Approach for LLM Unlearning

SoK: Machine Unlearning for Large Language Models

ErrorEraser: Unlearning Data Bias for Improved Continual Learning

Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods

GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models
