Defending Vision-Language Models Against Adversarial Attacks

Research on vision-language models is increasingly focused on building robust defenses against adversarial attacks, which can seriously undermine the reliability of these models in real-world applications. One key direction is neutralizing adversarial corruptions of the input, for example through diffusion-based purification strategies. Another is detecting and recovering from perturbations, including defenses that combine reinforcement learning with semantic analysis. These advances have the potential to substantially improve the security and trustworthiness of vision-language models. Noteworthy papers in this area include:

  • LightD, which proposes a framework for generating natural adversarial samples against vision-language pre-training models via semantically guided relighting.
  • DiffCAP, which introduces a diffusion-based purification strategy that neutralizes adversarial corruptions in vision-language models; a minimal sketch of the underlying purification idea follows this list.
  • SRD, which presents a reinforcement learning framework that mitigates backdoor behavior in vision-language models without prior knowledge of the triggers.
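
To make the purification idea concrete, the sketch below shows a generic DDPM-style purification loop: the input image is first diffused forward with Gaussian noise so that any adversarial perturbation is drowned out, then denoised back with a pretrained diffusion model before being passed to the vision-language model. This is a minimal illustration of the general technique under stated assumptions, not the DiffCAP algorithm itself; `denoise_model`, `betas`, and `t_star` are hypothetical stand-ins for a pretrained noise predictor, its noise schedule, and the chosen diffusion depth.

```python
import torch

def purify(x, denoise_model, betas, t_star=200):
    """Generic diffusion-based purification sketch (DDPM-style).

    x: image batch in [-1, 1], possibly adversarially perturbed.
    denoise_model(x_t, t): hypothetical pretrained noise predictor.
    betas: 1-D tensor holding the diffusion noise schedule.
    t_star: forward-diffusion depth; larger values wash out stronger
            perturbations but also remove more image detail.
    """
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Forward diffusion: add Gaussian noise up to timestep t_star so the
    # adversarial perturbation is drowned out by the injected noise.
    noise = torch.randn_like(x)
    a_bar_star = alpha_bars[t_star]
    x_t = a_bar_star.sqrt() * x + (1.0 - a_bar_star).sqrt() * noise

    # Reverse diffusion: iteratively denoise back to a clean image estimate.
    with torch.no_grad():
        for t in range(t_star, -1, -1):
            t_batch = torch.full((x.shape[0],), t, device=x.device)
            eps = denoise_model(x_t, t_batch)
            a, a_bar = alphas[t], alpha_bars[t]
            mean = (x_t - (1.0 - a) / (1.0 - a_bar).sqrt() * eps) / a.sqrt()
            if t > 0:
                x_t = mean + betas[t].sqrt() * torch.randn_like(x)
            else:
                x_t = mean
    return x_t.clamp(-1.0, 1.0)
```

The choice of `t_star` trades robustness against fidelity: diffusing further removes stronger perturbations but also erases more of the original image content, so it is typically tuned against the expected attack strength.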

Sources

Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models

AMIA: Automatic Masking and Joint Intention Analysis Makes LVLMs Robust Jailbreak Defenders

DiffCAP: Diffusion-based Cumulative Adversarial Purification for Vision Language Models

BESA: Boosting Encoder Stealing Attack with Perturbation Recovery

SRD: Reinforcement-Learned Semantic Perturbation for Backdoor Defense in VLMs
