Advances in Model Protection and Adversarial Robustness

Model protection and adversarial robustness are evolving quickly, and recent work splits into two main threads. On the protection side, new watermarking schemes, including sequential watermarking and modular token-rank partitioning, aim to establish model ownership and deter unauthorized use. On the robustness side, techniques such as manifold purification and directional orthogonal counterattacks improve resistance to a range of attacks.

Noteworthy papers include SWAP, which watermarks soft prompts sequentially, and WaterMod, which partitions tokens by probability rank into modular classes for probability-balanced LLM watermarking. Breaking the Adversarial Robustness-Performance Trade-off in Text Classification via Manifold Purification uses manifold purification to improve adversarial robustness in text classification without sacrificing clean performance, while Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference diversifies counterattacks through orthogonal exploration to make CLIP-style vision-language models more robust at inference time.
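To make the rank-partitioning idea behind watermarks such as WaterMod concrete, the sketch below favors a keyed residue class of token ranks at generation time and counts how often emitted tokens land in that class at detection time. This is a generic illustration, not the paper's algorithm: the modulus, bias strength, hashing scheme, and toy logits function are all assumptions made for the demo.

```python
# Illustrative rank-partitioned watermark: token ranks under the next-token
# distribution are split into residue classes (rank mod MODULUS); a keyed hash
# of the recent context picks the favored class, whose logits get a small bias.
# Detection counts how often emitted tokens fall in the keyed class.
import hashlib
import numpy as np

VOCAB = 1000     # toy vocabulary size (assumption for the demo)
MODULUS = 4      # number of rank residue classes (hypothetical parameter)
BIAS = 2.0       # logit bias added to the keyed class (hypothetical parameter)
KEY = b"secret"  # watermark key shared by generator and detector

def green_class(context):
    """Derive the favored residue class from a keyed hash of recent tokens."""
    h = hashlib.sha256(KEY + str(context[-4:]).encode("utf8")).digest()
    return h[0] % MODULUS

def token_ranks(logits):
    """Rank every token by logit, with rank 0 for the most likely token."""
    ranks = np.empty(len(logits), dtype=int)
    ranks[np.argsort(-logits)] = np.arange(len(logits))
    return ranks

def watermark_logits(logits, context):
    """Bias tokens whose probability rank falls in the keyed residue class."""
    favored = (token_ranks(logits) % MODULUS) == green_class(context)
    return logits + BIAS * favored

def detection_rate(tokens, logits_fn):
    """Fraction of tokens in the keyed class (about 1/MODULUS if unmarked)."""
    hits = 0
    for t in range(4, len(tokens)):
        context = tokens[:t]
        rank = token_ranks(logits_fn(context))[tokens[t]]
        hits += int(rank % MODULUS == green_class(context))
    return hits / max(1, len(tokens) - 4)

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def logits_fn(ctx):
        """Stand-in for a language model: deterministic pseudo-logits per context."""
        seed = int(hashlib.sha256(str(ctx[-4:]).encode("utf8")).hexdigest(), 16) % (2**32)
        return np.random.default_rng(seed).normal(size=VOCAB)

    tokens = [int(x) for x in rng.integers(0, VOCAB, size=4)]  # random "prompt"
    for _ in range(200):  # greedy decoding with the watermark bias applied
        tokens.append(int(np.argmax(watermark_logits(logits_fn(tokens), tokens))))
    print(f"keyed-class rate in watermarked text: {detection_rate(tokens, logits_fn):.2f}")
    # Expected: close to 1.0 here, versus roughly 1/MODULUS = 0.25 for unmarked text.
```

In unmarked text the keyed-class rate should sit near 1/MODULUS, so a rate well above that baseline signals the watermark; how WaterMod balances the partition against token probabilities is not reproduced here.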
Sources
Breaking the Adversarial Robustness-Performance Trade-off in Text Classification via Manifold Purification
How do data owners say no? A case study of data consent mechanisms in web-scraped vision-language AI training datasets