The field of multimodal research is moving towards enhancing the safety and security of vision-language models (VLMs) and large language models (LLMs). Recent studies have focused on developing innovative methods to address the vulnerabilities of these models, including the use of security tensors, self-aware safety augmentation, and iterative defense-attack training. These approaches aim to improve the robustness of VLMs and LLMs against harmful inputs, such as jailbreak images and typographic prompt injection attacks. Noteworthy papers in this area include:
- CircuitProbe, which introduces a systematic framework to investigate spatiotemporal visual semantics in LVLMs.
- Rainbow Noise, which proposes a robustness benchmark for multimodal harmful-meme detectors and introduces a lightweight Text Denoising Adapter to enhance model resilience.
- Security Tensors as a Cross-Modal Bridge, which introduces security tensors to transfer textual safety alignment to visual processing in LVLMs.
- Self-Aware Safety Augmentation, which proposes a technique to leverage internal semantic understanding to enhance safety recognition in VLMs.
- Secure Tug-of-War, which presents an iterative defense-attack training method to enhance the security of MLLMs.
- Invisible Injections, which demonstrates the vulnerability of VLMs to steganographic prompt injection attacks.
- CapRecover, which proposes a cross-modality feature inversion attack framework to recover high-level semantic content from intermediate features.
- Adversarial-Guided Diffusion, which introduces an approach to generate adversarial images to deceive MLLMs.