Advances in Machine Unlearning and Knowledge Tracing

The field of machine learning is moving toward more robust and controllable models, with particular attention to machine unlearning and knowledge tracing. Researchers are developing methods to remove specific pieces of knowledge or data from trained models while preserving performance on standard tasks, including techniques for post-hoc unlearning such as anchored optimization and knowledge-tracing machine unlearning. There is also growing interest in how learning-time choices in knowledge encoding affect how easily that knowledge can later be unlearned. In a related direction, studies have shown that large language models can generate reversible sentence embeddings that permit exact reconstruction of the original text. Finally, researchers are working on more reliable verification mechanisms for machine unlearning, spanning both behavioral methods (testing what the model outputs) and parametric methods (inspecting the model's weights).

Noteworthy papers in this area include:

Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs, which finds that unlearning leaves persistent fingerprints in a model's outputs.

Train Once, Forget Precisely: Anchored Optimization for Efficient Post-Hoc Unlearning, which introduces a theoretically grounded framework for post-hoc unlearning in deep image classifiers.

Memory Tokens: Large Language Models Can Generate Reversible Sentence Embeddings, which shows that LLMs can generate reversible sentence embeddings enabling exact reconstruction of the original text.
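The core idea behind gradient-based post-hoc unlearning can be illustrated on a toy convex model. The sketch below is a minimal illustration under assumed hyperparameters, not the anchored-optimization method from the paper above: it trains a logistic-regression classifier once, then raises the loss on a designated forget set by gradient ascent while continuing gradient descent on the retain set.

```python
import numpy as np

# Minimal sketch of gradient-based post-hoc unlearning on a toy logistic
# regression model (illustrative only; all hyperparameters are assumptions).
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def loss(w, X, y):
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

def grad(w, X, y):
    return X.T @ (sigmoid(X @ w) - y) / len(y)

# Synthetic data, split into a "retain" set and a "forget" set.
X = rng.normal(size=(200, 5))
y = (X @ rng.normal(size=5) > 0).astype(float)
X_r, y_r, X_f, y_f = X[:150], y[:150], X[150:], y[150:]

# Train once on all data.
w = np.zeros(5)
for _ in range(500):
    w -= 0.5 * grad(w, X, y)

before = loss(w, X_f, y_f)

# Post-hoc unlearning: ascend the loss on the forget set while continuing
# to descend it on the retain set, so retain performance is preserved.
for _ in range(100):
    w += 0.2 * grad(w, X_f, y_f)    # forget set: gradient ascent
    w -= 0.05 * grad(w, X_r, y_r)   # retain set: gradient descent

after = loss(w, X_f, y_f)
print(f"forget-set loss before: {before:.3f}, after: {after:.3f}")
```

The forget-set loss rises after unlearning while the retain-set descent limits collateral damage; real post-hoc unlearning methods for deep models add machinery (e.g., anchoring to the original weights) to make this trade-off precise.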

Sources

Lifting Data-Tracing Machine Unlearning to Knowledge-Tracing for Foundation Models

Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs

A Regret Perspective on Online Selective Generation

RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?

Train Once, Forget Precisely: Anchored Optimization for Efficient Post-Hoc Unlearning

Memory Tokens: Large Language Models Can Generate Reversible Sentence Embeddings

Learning-Time Encoding Shapes Unlearning in LLMs

Towards Reliable Forgetting: A Survey on Machine Unlearning Verification, Challenges, and Future Directions

The Compositional Architecture of Regret in Large Language Models
