Advances in Model Protection and Adversarial Robustness

Model protection and adversarial robustness are evolving quickly, and recent work splits into two main threads. On the protection side, new watermarking schemes, including sequential watermarking and modular token-rank partitioning, aim to establish model ownership and deter unauthorized use. On the robustness side, techniques such as manifold purification and directional orthogonal counterattacks improve resistance to a range of attacks.

Noteworthy papers include SWAP, which watermarks soft prompts sequentially, and WaterMod, which partitions tokens by probability rank into modular classes for probability-balanced LLM watermarking. Breaking the Adversarial Robustness-Performance Trade-off in Text Classification via Manifold Purification uses manifold purification to improve adversarial robustness in text classification without sacrificing clean performance, while Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference diversifies counterattacks through orthogonal exploration to make CLIP-style vision-language models more robust at inference time.
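To make the rank-partitioning idea behind watermarks such as WaterMod concrete, the sketch below favors a keyed residue class of token ranks at generation time and counts how often emitted tokens land in that class at detection time. This is a generic illustration, not the paper's algorithm: the modulus, bias strength, hashing scheme, and toy logits function are all assumptions made for the demo.

```python
# Illustrative rank-partitioned watermark: token ranks under the next-token
# distribution are split into residue classes (rank mod MODULUS); a keyed hash
# of the recent context picks the favored class, whose logits get a small bias.
# Detection counts how often emitted tokens fall in the keyed class.
import hashlib
import numpy as np

VOCAB = 1000     # toy vocabulary size (assumption for the demo)
MODULUS = 4      # number of rank residue classes (hypothetical parameter)
BIAS = 2.0       # logit bias added to the keyed class (hypothetical parameter)
KEY = b"secret"  # watermark key shared by generator and detector

def green_class(context):
    """Derive the favored residue class from a keyed hash of recent tokens."""
    h = hashlib.sha256(KEY + str(context[-4:]).encode("utf8")).digest()
    return h[0] % MODULUS

def token_ranks(logits):
    """Rank every token by logit, with rank 0 for the most likely token."""
    ranks = np.empty(len(logits), dtype=int)
    ranks[np.argsort(-logits)] = np.arange(len(logits))
    return ranks

def watermark_logits(logits, context):
    """Bias tokens whose probability rank falls in the keyed residue class."""
    favored = (token_ranks(logits) % MODULUS) == green_class(context)
    return logits + BIAS * favored

def detection_rate(tokens, logits_fn):
    """Fraction of tokens in the keyed class (about 1/MODULUS if unmarked)."""
    hits = 0
    for t in range(4, len(tokens)):
        context = tokens[:t]
        rank = token_ranks(logits_fn(context))[tokens[t]]
        hits += int(rank % MODULUS == green_class(context))
    return hits / max(1, len(tokens) - 4)

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def logits_fn(ctx):
        """Stand-in for a language model: deterministic pseudo-logits per context."""
        seed = int(hashlib.sha256(str(ctx[-4:]).encode("utf8")).hexdigest(), 16) % (2**32)
        return np.random.default_rng(seed).normal(size=VOCAB)

    tokens = [int(x) for x in rng.integers(0, VOCAB, size=4)]  # random "prompt"
    for _ in range(200):  # greedy decoding with the watermark bias applied
        tokens.append(int(np.argmax(watermark_logits(logits_fn(tokens), tokens))))
    print(f"keyed-class rate in watermarked text: {detection_rate(tokens, logits_fn):.2f}")
    # Expected: close to 1.0 here, versus roughly 1/MODULUS = 0.25 for unmarked text.
```

In unmarked text the keyed-class rate should sit near 1/MODULUS, so a rate well above that baseline signals the watermark; how WaterMod balances the partition against token probabilities is not reproduced here.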
Sources
Breaking the Adversarial Robustness-Performance Trade-off in Text Classification via Manifold Purification
How do data owners say no? A case study of data consent mechanisms in web-scraped vision-language AI training datasets