Privacy Risks in Large Language Models

The field of large language models is moving towards a greater emphasis on privacy auditing and membership inference. Researchers are developing methods to identify and mitigate privacy risks across settings such as differentially private in-context learning (DP-ICL), reinforcement learning with verifiable rewards (RLVR), LLM-based recommendation, and code completion. A key direction is practical auditing tools that efficiently estimate empirical privacy guarantees. Another is the design of membership inference attacks (MIAs) that reliably infer training-data exposure from a model's behavioral traces. These advances have significant implications for building secure and private large language models. Noteworthy papers include:

  • Tight and Practical Privacy Auditing for Differentially Private In-Context Learning, which presents a tight and efficient privacy auditing framework for DP-ICL systems.
  • GRPO Privacy Is at Risk: A Membership Inference Attack Against Reinforcement Learning With Verifiable Rewards, which proposes a novel membership inference framework specifically designed for RLVR.
  • Membership Inference Attack against Large Language Model-based Recommendation Systems: A New Distillation-based Paradigm, which introduces a knowledge distillation-based MIA paradigm to improve attack performance.
  • Effective Code Membership Inference for Code Completion Models via Adversarial Prompts, which proposes a method combining code-specific adversarial perturbations with deep learning to capture nuanced memorization patterns.
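
To make the auditing and membership-inference directions above concrete, the sketch below pairs a classic loss-threshold membership inference attack with the standard hypothesis-testing bound TPR ≤ e^ε · FPR + δ to turn observed attack rates into an empirical lower bound on ε. This is a generic, illustrative sketch rather than the method of any paper listed here; the `member_losses`, `nonmember_losses`, and threshold values are synthetic placeholders.

```python
import numpy as np

def loss_threshold_mia(member_losses, nonmember_losses, threshold):
    """Classic loss-threshold MIA: predict 'member' when the model's
    loss on a record falls below the threshold."""
    tpr = np.mean(np.asarray(member_losses) < threshold)     # true positive rate
    fpr = np.mean(np.asarray(nonmember_losses) < threshold)  # false positive rate
    return tpr, fpr

def empirical_epsilon_lower_bound(tpr, fpr, delta=1e-5):
    """For any (eps, delta)-DP mechanism, an attacker's TPR is bounded by
    e^eps * FPR + delta, so an observed (TPR, FPR) pair implies
    eps >= ln((TPR - delta) / FPR)."""
    if fpr <= 0 or tpr <= delta:
        return 0.0
    return float(np.log((tpr - delta) / fpr))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic losses: records seen in training tend to have lower loss.
    member_losses = rng.normal(loc=1.0, scale=0.5, size=1000)
    nonmember_losses = rng.normal(loc=1.8, scale=0.5, size=1000)
    tpr, fpr = loss_threshold_mia(member_losses, nonmember_losses, threshold=1.3)
    eps_lb = empirical_epsilon_lower_bound(tpr, fpr)
    print(f"TPR={tpr:.2f}, FPR={fpr:.2f}, empirical epsilon >= {eps_lb:.2f}")
```

A rigorous audit would replace these point estimates with confidence intervals (e.g., Clopper-Pearson) before converting them into an ε lower bound, and would repeat the attack over many inserted canaries rather than a single synthetic split.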

Sources

Tight and Practical Privacy Auditing for Differentially Private In-Context Learning

GRPO Privacy Is at Risk: A Membership Inference Attack Against Reinforcement Learning With Verifiable Rewards

Membership Inference Attack against Large Language Model-based Recommendation Systems: A New Distillation-based Paradigm

Effective Code Membership Inference for Code Completion Models via Adversarial Prompts
