Advances in Data Privacy and Security for Machine Learning

The field of machine learning is placing greater emphasis on data privacy and security, with a focus on developing methods that prevent data leakage and preserve the integrity of trained models. Recent research has explored data cartography for identifying and mitigating memorization hotspots in generative models, as well as frameworks for analyzing and detecting data forging in machine unlearning. There is also growing interest in the learnability of distribution classes in the presence of adaptive adversaries and in the vulnerability of individual test samples to targeted data poisoning attacks. Noteworthy papers in this area include: Not All Samples Are Equal: Quantifying Instance-level Difficulty in Targeted Data Poisoning, which introduces predictive criteria for how difficult a given sample is to poison; The Measure of Deception: An Analysis of Data Forging in Machine Unlearning, which develops a framework for analyzing data forging; and Generative Data Refinement: Just Ask for Better Data, which proposes using pretrained generative models to transform datasets containing undesirable content into refined datasets.
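To make the first idea concrete, below is a minimal sketch of training-dynamics-based data cartography: record each training example's probability on its true label across epochs, then flag examples that the model eventually fits despite persistently low confidence as candidate memorization hotspots. The model, data loader, and thresholds here are illustrative assumptions, not the cited paper's method.

```python
# A minimal sketch, assuming a dataset-maps style of data cartography.
import numpy as np
import torch
import torch.nn.functional as F

def record_training_dynamics(model, loader, optimizer, epochs=5, device="cpu"):
    """Train for a few epochs; return an (epochs, n_examples) array of p(true label)."""
    dynamics = np.zeros((epochs, len(loader.dataset)))
    model.to(device)
    for epoch in range(epochs):
        model.train()
        for x, y, idx in loader:  # the loader must yield each example's dataset index
            x, y = x.to(device), y.to(device)
            logits = model(x)
            loss = F.cross_entropy(logits, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            with torch.no_grad():
                p_true = F.softmax(logits, dim=-1)[torch.arange(len(y), device=device), y]
                dynamics[epoch, idx.numpy()] = p_true.cpu().numpy()
    return dynamics

def flag_hotspots(dynamics, conf_thresh=0.5, fit_thresh=0.9):
    """Examples fitted by the final epoch despite low mean confidence are candidates."""
    confidence = dynamics.mean(axis=0)   # mean p(true label) over training
    variability = dynamics.std(axis=0)   # spread of p(true label), kept for inspection
    fitted = dynamics[-1] > fit_thresh   # confidently fitted by the last epoch
    return np.where((confidence < conf_thresh) & fitted)[0], confidence, variability
```

In practice, the flagged indices would feed a data intervention such as deduplication, filtering, or privacy-preserving training.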
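The core loop of generative data refinement can likewise be sketched, assuming a pretrained instruction-following model is available behind a simple text-in/text-out callable. The `generate` parameter and `REFINE_PROMPT` template are hypothetical placeholders, not an API from the paper.

```python
# A minimal sketch of refining records by asking a generative model to rewrite them.
from typing import Callable, Iterable

REFINE_PROMPT = (
    "Rewrite the following record so that it preserves its meaning and utility "
    "but contains no personal or otherwise sensitive information:\n\n{record}"
)

def refine_dataset(records: Iterable[str], generate: Callable[[str], str]) -> list[str]:
    """Rewrite each record with the supplied model, keeping utility and dropping PII."""
    return [generate(REFINE_PROMPT.format(record=record)) for record in records]
```

Any model client can be dropped in as `generate`; the refined records can then stand in for the originals during training.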

Sources

Data Cartography for Detecting Memorization Hotspots and Guiding Data Interventions in Generative Models

On the Learnability of Distribution Classes with Adaptive Adversaries

The Measure of Deception: An Analysis of Data Forging in Machine Unlearning

Not All Samples Are Equal: Quantifying Instance-level Difficulty in Targeted Data Poisoning

How Far Are We from True Unlearnability?

Generative Data Refinement: Just Ask for Better Data
