Multimodal Safety and Jailbreaking in Large Language Models

The field of multimodal large language models (MLLMs) is advancing rapidly, with growing attention to safety and robustness against adversarial attacks. Recent research highlights how vulnerable these models remain to jailbreaks, especially cross-modal attacks that move harmful intent out of the text channel and into images or audio. New methods and frameworks, such as those built on sequential comics and multimodal tree search, show promise for improving safety alignment and detecting risks. At the same time, variational inference frameworks and perceptually simple transformations continue to reveal severe vulnerabilities in current models.

Noteworthy papers include:

- Sequential Comics for Jailbreaking Multimodal Large Language Models: introduces a structured visual-storytelling method that circumvents safety alignment in state-of-the-art models; a sketch of the text-to-panel idea appears after this list.
- VisuoAlign: proposes a framework for multimodal safety alignment via prompt-guided tree search; see the search sketch below.
- IAD-GPT: combines rich text semantics with image-level and pixel-level information for industrial anomaly detection.
- Multimodal Safety Is Asymmetric: investigates jailbreaks in the text-vision setting and develops a black-box jailbreak method built on cross-modal exploits.
- VERA-V: introduces a variational inference framework for jailbreaking vision-language models.
- Style Attack Disguise: proposes a style-based attack in which fonts act as camouflage, exploiting the gap between human and model perception; see the styled-text sketch below.
- Beyond Text: presents a systematic study of multimodal jailbreaks targeting both vision-language and audio-language models through perceptually simple transformations.
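
The sequential-comics attack relies on a broader cross-modal surface: content that a text-level filter would catch can instead be rendered into a sequence of image panels. Below is a minimal, hypothetical sketch of that rendering step using Pillow; the panel layout, wrapping width, and the `render_comic_strip` helper are illustrative assumptions rather than the paper's actual pipeline, and the example captions are deliberately benign.

```python
# A minimal sketch of the cross-modal attack surface that storyboard-style
# jailbreaks rely on: moving an instruction out of the text channel and into
# a sequence of rendered image panels. Layout and sizes are assumptions.
import textwrap
from PIL import Image, ImageDraw, ImageFont

PANEL_W, PANEL_H, MARGIN = 320, 240, 12

def render_comic_strip(captions: list[str]) -> Image.Image:
    """Render each caption as one comic-style panel, laid out side by side."""
    strip = Image.new("RGB", (len(captions) * PANEL_W, PANEL_H), "white")
    font = ImageFont.load_default()
    draw = ImageDraw.Draw(strip)
    for i, caption in enumerate(captions):
        x0 = i * PANEL_W
        # Panel border, so each step reads as a distinct scene.
        draw.rectangle([x0 + 2, 2, x0 + PANEL_W - 3, PANEL_H - 3],
                       outline="black", width=2)
        wrapped = "\n".join(textwrap.wrap(caption, width=40))
        draw.multiline_text((x0 + MARGIN, MARGIN), wrapped,
                            fill="black", font=font)
    return strip

# Benign usage example: a recipe decomposed into sequential panels. A safety
# evaluator would test whether step-by-step visual framing changes how a
# model treats the same content presented as plain text.
strip = render_comic_strip([
    "Panel 1: Gather flour, water, and yeast.",
    "Panel 2: Knead the dough for ten minutes.",
    "Panel 3: Bake at 220 C until golden.",
])
strip.save("strip.png")
```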
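
This summary mentions that VisuoAlign uses prompt-guided tree search but gives no details; the following is a generic best-first search skeleton of the kind such a framework might run to surface risky prompt variants. The `expand` and `risk` callables are placeholders the caller must supply (for example a paraphraser and a safety classifier), and nothing here is taken from the paper itself.

```python
# A generic best-first tree search over candidate prompts, of the kind a
# prompt-guided safety-alignment framework might run. Expansion and risk
# scoring are caller-supplied placeholders, not VisuoAlign's actual method.
import heapq
import itertools
from typing import Callable

def best_first_search(root: str,
                      expand: Callable[[str], list[str]],
                      risk: Callable[[str], float],
                      budget: int = 50) -> list[tuple[float, str]]:
    """Explore prompt variants, returning the highest-risk ones found."""
    counter = itertools.count()              # tie-breaker for the heap
    frontier = [(-risk(root), next(counter), root)]
    flagged: list[tuple[float, str]] = []
    while frontier and budget > 0:
        neg_score, _, prompt = heapq.heappop(frontier)
        flagged.append((-neg_score, prompt))
        budget -= 1
        for child in expand(prompt):         # e.g. paraphrases, added images
            heapq.heappush(frontier, (-risk(child), next(counter), child))
    return sorted(flagged, reverse=True)
```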
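
One concrete, easily reproduced instance of a perceptually simple style transformation is remapping ASCII letters onto Unicode Mathematical Bold code points: a human reads the result as ordinary bold text, while the model receives entirely different code points, which can slip past keyword-level filters. This is a hypothetical illustration of the attack class probed by Style Attack Disguise and Beyond Text, not the transformations those papers actually use.

```python
# A hypothetical "perceptually simple" style transformation: remap ASCII
# letters onto Unicode Mathematical Bold code points (U+1D400 for 'A',
# U+1D41A for 'a'). Humans read the output as ordinary bold text, but the
# code points differ entirely from the ASCII originals.
def to_math_bold(text: str) -> str:
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D400 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D41A + ord(ch) - ord("a")))
        else:
            out.append(ch)  # digits and punctuation pass through unchanged
    return "".join(out)

# Prints the same sentence in mathematical-bold styled characters.
print(to_math_bold("Hello, world"))
```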

Sources

Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling

VisuoAlign: Safety Alignment of LVLMs with Multimodal Tree Search

IAD-GPT: Advancing Visual Knowledge in Multimodal Large Language Model for Industrial Anomaly Detection

Multimodal Safety Is Asymmetric: Cross-Modal Exploits Unlock Black-Box MLLMs Jailbreaks

VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models

Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent

Beyond Text: Multimodal Jailbreaking of Vision-Language and Audio Models through Perceptually Simple Transformations
