Advances in Multimodal Large Language Model Safety

The field of multimodal large language models (MLLMs) is evolving rapidly, with growing attention to safety evaluation and mitigation. Recent work emphasizes comprehensive safety benchmarks and efficient methods for detecting harmful queries and limiting their impact. Unified safety benchmarks and lightweight detection methods have improved the accuracy and robustness of MLLM safety evaluation, while other work shows that narrowing the modality gap between image and text representations strengthens safety alignment in vision-language models (VLMs); a small illustrative sketch of this gap follows below. In addition, the discovery of visual stitching in VLMs, where models aggregate visual information scattered across training patches, highlights the risk of data poisoning and the need for robust defenses. Overall, the field is moving toward more holistic approaches to safety evaluation and mitigation, with increasing emphasis on robustness, interpretability, and generalizability. Noteworthy papers include OMNIGUARD, which proposes an efficient approach to AI safety moderation across modalities, and HoliSafe, which introduces holistic safety benchmarking and modeling with a safety meta token for VLMs.
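
To make the "modality gap" trend concrete, the sketch below computes one common proxy for it: the distance between the centroids of L2-normalized image and text embeddings. This is a minimal illustration only; the placeholder arrays, dimensions, and function name are assumptions for demonstration and are not taken from any of the cited papers.

```python
import numpy as np

def modality_gap(image_embs: np.ndarray, text_embs: np.ndarray) -> float:
    """Distance between centroids of L2-normalized image and text embeddings.

    A larger value indicates a wider modality gap; the safety-alignment work
    summarized above argues that narrowing this gap helps text-side safety
    behavior transfer to visual inputs. Inputs are (N, d) arrays of embeddings.
    """
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return float(np.linalg.norm(img.mean(axis=0) - txt.mean(axis=0)))

# Placeholder embeddings; in practice these would come from a VLM's vision
# and text encoders (e.g., pooled CLIP-style features).
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(128, 512))
text_embs = rng.normal(size=(128, 512)) + 0.5  # shifted to mimic a gap
print(f"modality gap: {modality_gap(image_embs, text_embs):.3f}")
```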

Sources

USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models

OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities

Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert

Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap

VLMs Can Aggregate Scattered Training Patches

Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning

HoliSafe: Holistic Safety Benchmarking and Modeling with Safety Meta Token for Vision-Language Model
