Advances in Multimodal Safety and Security

The field of multimodal research is moving towards enhancing the safety and security of vision-language models (VLMs) and large language models (LLMs). Recent studies have focused on developing innovative methods to address the vulnerabilities of these models, including the use of security tensors, self-aware safety augmentation, and iterative defense-attack training. These approaches aim to improve the robustness of VLMs and LLMs against harmful inputs, such as jailbreak images and typographic prompt injection attacks. Noteworthy papers in this area include:

  • CircuitProbe, which introduces a systematic framework to investigate spatiotemporal visual semantics in LVLMs.
  • Rainbow Noise, which proposes a robustness benchmark for multimodal harmful-meme detectors and introduces a lightweight Text Denoising Adapter to enhance model resilience.
  • Security Tensors as a Cross-Modal Bridge, which introduces security tensors to transfer textual safety alignment to visual processing in LVLMs.
  • Self-Aware Safety Augmentation, which proposes a technique to leverage internal semantic understanding to enhance safety recognition in VLMs.
  • Secure Tug-of-War, which presents an iterative defense-attack training method to enhance the security of MLLMs.
  • Invisible Injections, which demonstrates the vulnerability of VLMs to steganographic prompt injection attacks.
  • CapRecover, which proposes a cross-modality feature inversion attack framework to recover high-level semantic content from intermediate features.
  • Adversarial-Guided Diffusion, which introduces an approach to generate adversarial images to deceive MLLMs.

Sources

CircuitProbe: Dissecting Spatiotemporal Visual Semantics with Circuit Tracing

Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content

Text2VLM: Adapting Text-Only Datasets to Evaluate Alignment Training in Visual Language Models

Security Tensors as a Cross-Modal Bridge: Extending Text-Aligned Safety to Vision in LVLM

Self-Aware Safety Augmentation: Leveraging Internal Semantic Understanding to Enhance Safety in Vision-Language Models

Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security

Invisible Injections: Exploiting Vision-Language Models Through Steganographic Prompt Embedding

CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models

Adversarial-Guided Diffusion for Multimodal LLM Attacks

Built with on top of