Multimodal Safety and Jailbreaking in Large Language Models

The field of multimodal large language models (MLLMs) is advancing rapidly, with growing attention to safety and robustness against adversarial attacks. Recent research highlights how vulnerable these models remain to jailbreaks, especially cross-modal attacks that move harmful intent out of the text channel and into images or audio. New methods and frameworks, such as those built on sequential comics and multimodal tree search, show promise for improving safety alignment and detecting risks. At the same time, variational inference frameworks and perceptually simple transformations continue to reveal severe vulnerabilities in current models.

Noteworthy papers include:

- Sequential Comics for Jailbreaking Multimodal Large Language Models: introduces a structured visual-storytelling method that circumvents safety alignment in state-of-the-art models; a sketch of the text-to-panel idea appears after this list.
- VisuoAlign: proposes a framework for multimodal safety alignment via prompt-guided tree search; see the search sketch below.
- IAD-GPT: combines rich text semantics with image-level and pixel-level information for industrial anomaly detection.
- Multimodal Safety Is Asymmetric: investigates jailbreaks in the text-vision setting and develops a black-box jailbreak method built on cross-modal exploits.
- VERA-V: introduces a variational inference framework for jailbreaking vision-language models.
- Style Attack Disguise: proposes a style-based attack in which fonts act as camouflage, exploiting the gap between human and model perception; see the styled-text sketch below.
- Beyond Text: presents a systematic study of multimodal jailbreaks targeting both vision-language and audio-language models through perceptually simple transformations.
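
The sequential-comics attack relies on a broader cross-modal surface: content that a text-level filter would catch can instead be rendered into a sequence of image panels. Below is a minimal, hypothetical sketch of that rendering step using Pillow; the panel layout, wrapping width, and the `render_comic_strip` helper are illustrative assumptions rather than the paper's actual pipeline, and the example captions are deliberately benign.

```python
# A minimal sketch of the cross-modal attack surface that storyboard-style
# jailbreaks rely on: moving an instruction out of the text channel and into
# a sequence of rendered image panels. Layout and sizes are assumptions.
import textwrap
from PIL import Image, ImageDraw, ImageFont

PANEL_W, PANEL_H, MARGIN = 320, 240, 12

def render_comic_strip(captions: list[str]) -> Image.Image:
    """Render each caption as one comic-style panel, laid out side by side."""
    strip = Image.new("RGB", (len(captions) * PANEL_W, PANEL_H), "white")
    font = ImageFont.load_default()
    draw = ImageDraw.Draw(strip)
    for i, caption in enumerate(captions):
        x0 = i * PANEL_W
        # Panel border, so each step reads as a distinct scene.
        draw.rectangle([x0 + 2, 2, x0 + PANEL_W - 3, PANEL_H - 3],
                       outline="black", width=2)
        wrapped = "\n".join(textwrap.wrap(caption, width=40))
        draw.multiline_text((x0 + MARGIN, MARGIN), wrapped,
                            fill="black", font=font)
    return strip

# Benign usage example: a recipe decomposed into sequential panels. A safety
# evaluator would test whether step-by-step visual framing changes how a
# model treats the same content presented as plain text.
strip = render_comic_strip([
    "Panel 1: Gather flour, water, and yeast.",
    "Panel 2: Knead the dough for ten minutes.",
    "Panel 3: Bake at 220 C until golden.",
])
strip.save("strip.png")
```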
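
This summary mentions that VisuoAlign uses prompt-guided tree search but gives no details; the following is a generic best-first search skeleton of the kind such a framework might run to surface risky prompt variants. The `expand` and `risk` callables are placeholders the caller must supply (for example a paraphraser and a safety classifier), and nothing here is taken from the paper itself.

```python
# A generic best-first tree search over candidate prompts, of the kind a
# prompt-guided safety-alignment framework might run. Expansion and risk
# scoring are caller-supplied placeholders, not VisuoAlign's actual method.
import heapq
import itertools
from typing import Callable

def best_first_search(root: str,
                      expand: Callable[[str], list[str]],
                      risk: Callable[[str], float],
                      budget: int = 50) -> list[tuple[float, str]]:
    """Explore prompt variants, returning the highest-risk ones found."""
    counter = itertools.count()              # tie-breaker for the heap
    frontier = [(-risk(root), next(counter), root)]
    flagged: list[tuple[float, str]] = []
    while frontier and budget > 0:
        neg_score, _, prompt = heapq.heappop(frontier)
        flagged.append((-neg_score, prompt))
        budget -= 1
        for child in expand(prompt):         # e.g. paraphrases, added images
            heapq.heappush(frontier, (-risk(child), next(counter), child))
    return sorted(flagged, reverse=True)
```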
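
One concrete, easily reproduced instance of a perceptually simple style transformation is remapping ASCII letters onto Unicode Mathematical Bold code points: a human reads the result as ordinary bold text, while the model receives entirely different code points, which can slip past keyword-level filters. This is a hypothetical illustration of the attack class probed by Style Attack Disguise and Beyond Text, not the transformations those papers actually use.

```python
# A hypothetical "perceptually simple" style transformation: remap ASCII
# letters onto Unicode Mathematical Bold code points (U+1D400 for 'A',
# U+1D41A for 'a'). Humans read the output as ordinary bold text, but the
# code points differ entirely from the ASCII originals.
def to_math_bold(text: str) -> str:
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D400 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D41A + ord(ch) - ord("a")))
        else:
            out.append(ch)  # digits and punctuation pass through unchanged
    return "".join(out)

# Prints the same sentence in mathematical-bold styled characters.
print(to_math_bold("Hello, world"))
```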

Sources

Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling

VisuoAlign: Safety Alignment of LVLMs with Multimodal Tree Search

IAD-GPT: Advancing Visual Knowledge in Multimodal Large Language Model for Industrial Anomaly Detection

Multimodal Safety Is Asymmetric: Cross-Modal Exploits Unlock Black-Box MLLMs Jailbreaks

VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models

Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent

Beyond Text: Multimodal Jailbreaking of Vision-Language and Audio Models through Perceptually Simple Transformations
