Advances in Vision-Language Model Security

The field of vision-language models is evolving rapidly, with a growing focus on security and robustness. Recent research highlights the vulnerability of these models to backdoor attacks, data poisoning, and other adversarial threats. To counter them, researchers are developing defense strategies including regularized instruction tuning, adversarial training, and safety alignment methods, which aim to harden vision-language models against attack while preserving performance on benign inputs. Several papers also introduce new benchmarks and evaluation frameworks for assessing model security, providing a more comprehensive picture of where these models remain vulnerable. Overall, the field is moving toward vision-language models that are secure and reliable enough for real-world deployment. Noteworthy papers include:

Robust Anti-Backdoor Instruction Tuning in LVLMs, which introduces a lightweight defense framework that prevents backdoor attacks during instruction tuning of large vision-language models.

TED-LaST, which proposes a novel defense against adaptive backdoor attacks on deep neural networks.

ALKALI, which exposes a critical geometric blind spot in safety alignment and introduces a comprehensive adversarial benchmark for evaluating the security of large language models.
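Of the defenses mentioned above, adversarial training is the most mechanical: the model is trained on worst-case perturbed inputs rather than clean ones. The sketch below shows the standard PGD-based variant in PyTorch; the toy classifier, random data, and hyperparameters (epsilon, alpha, steps) are illustrative assumptions and are not taken from any of the papers listed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """Craft L-infinity-bounded adversarial examples via projected gradient descent."""
    # Start from a random point inside the epsilon-ball around x.
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the epsilon-ball and valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-epsilon, epsilon)).clamp(0, 1)
    return x_adv.detach()

# Toy stand-ins for a real vision encoder and dataset (assumptions for illustration).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(3):
    x = torch.rand(16, 3, 32, 32)    # random images in [0, 1]
    y = torch.randint(0, 10, (16,))  # random labels
    x_adv = pgd_attack(model, x, y)  # generate adversarial examples on the fly
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()  # train on the perturbed batch
    optimizer.step()
```

Training on `x_adv` instead of `x` is what distinguishes adversarial training from standard fine-tuning; in practice the perturbation would target the image branch of a vision-language model rather than a toy classifier.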

Sources

Robust Anti-Backdoor Instruction Tuning in LVLMs

Coordinated Robustness Evaluation Framework for Vision-Language Models

Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints

SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models

AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin

AdversariaL attacK sAfety aLIgnment (ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement - Introducing Adversarial Vulnerability Quality Index (AVQI)

Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations

DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt

TED-LaST: Towards Robust Backdoor Defense Against Adaptive Attacks
