Research on vision-language models is increasingly focused on robust defenses against adversarial attacks, which directly affect the reliability of these models in real-world applications. One key direction is neutralizing adversarial corruptions in model inputs, for example through diffusion-based purification strategies. Another is building defenses that detect and recover from perturbations, including approaches based on reinforcement learning and semantic analysis. Together, these advances aim to improve the security and trustworthiness of vision-language models. Noteworthy papers in this area include:
- LightD, which proposes a novel framework for generating natural adversarial samples for vision-language pretraining models via semantically guided relighting.
- DiffCAP, which introduces a diffusion-based purification strategy that neutralizes adversarial corruptions in vision-language models; a minimal sketch of the general purification idea follows this list.
- SRD, which presents a reinforcement learning framework that mitigates backdoor behavior in vision-language models without prior knowledge of triggers.
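To make the purification idea concrete, below is a minimal sketch of generic diffusion-based purification (in the style of DiffPure): the adversarial image is partially diffused with Gaussian noise and then denoised back, which tends to wash out the adversarial perturbation before the image reaches the vision-language model. This is not the DiffCAP implementation; the names `eps_model`, `alphas_cumprod`, and the truncation timestep `t_star` are illustrative assumptions.

```python
# Sketch of diffusion-based purification, assuming a pretrained denoising model
# eps_model(x_t, t) that predicts the noise added at timestep t, and a 1-D tensor
# alphas_cumprod of cumulative noise-schedule products. All names are placeholders.
import torch

@torch.no_grad()
def purify(x_adv, eps_model, alphas_cumprod, t_star=100):
    """Partially diffuse an adversarial image, then denoise it back to remove
    the perturbation while roughly preserving semantic content."""
    # 1) Forward diffusion: add Gaussian noise up to timestep t_star.
    a_bar = alphas_cumprod[t_star]
    x_t = a_bar.sqrt() * x_adv + (1 - a_bar).sqrt() * torch.randn_like(x_adv)

    # 2) Reverse diffusion: deterministically denoise from t_star back to 0.
    for t in reversed(range(t_star)):
        a_bar_t = alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps = eps_model(x_t, torch.tensor([t]))              # predicted noise
        x0_hat = (x_t - (1 - a_bar_t).sqrt() * eps) / a_bar_t.sqrt()
        x0_hat = x0_hat.clamp(-1, 1)                         # keep in image range
        # DDIM-style deterministic step back to the previous timestep.
        x_t = a_bar_prev.sqrt() * x0_hat + (1 - a_bar_prev).sqrt() * eps
    return x_t  # purified image, ready to be passed to the vision-language model
```

The truncation timestep `t_star` controls the trade-off: injecting more noise removes larger adversarial perturbations but also erases more of the clean image content, so it is typically tuned to the expected attack strength.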