Multimodal research is moving toward models that remain both robust and well understood under adversarial attack and complex multimodal inputs. Recent developments focus on the security and reliability of vision-language models, with particular emphasis on targeted adversarial attacks and multimodal hate detection. Noteworthy papers include:
- Semantically Guided Adversarial Testing of Vision Models Using Language Models, which proposes a semantics-guided framework for adversarial target selection, using cross-modal knowledge transfer from pretrained language and vision-language models (a sketch of the target-selection idea follows this list).
- TriQDef: Disrupting Semantic and Gradient Alignment to Prevent Adversarial Patch Transferability in Quantized Neural Networks, which introduces a tri-level quantization-aware defense framework designed to disrupt the transferability of patch-based adversarial attacks across quantized neural networks (QNNs); a sketch of one ingredient, a gradient-misalignment penalty, also follows the list.
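
To make the first idea concrete, here is a minimal sketch of semantics-guided target selection, assuming targets are ranked by semantic similarity in a pretrained text-embedding space (CLIP is used here for illustration). The label set and the "most similar but wrong" heuristic are assumptions for the example, not the paper's exact procedure.

```python
# Sketch: pick an adversarial target label via cross-modal text embeddings.
# Assumptions: CLIP text encoder as the semantic space; a small illustrative
# label set; "most similar non-true label" as the selection heuristic.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["tabby cat", "tiger", "golden retriever", "school bus", "banana"]

def pick_adversarial_target(true_label: str) -> str:
    """Return the candidate label most semantically similar to the true
    label, excluding the true label itself (a plausible worst-case target)."""
    inputs = processor(text=[true_label] + labels, return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)   # cosine-normalize embeddings
    sims = emb[0] @ emb[1:].T                    # similarity of true label to each candidate
    sims[[i for i, l in enumerate(labels) if l == true_label]] = -1.0  # mask the true class
    return labels[int(sims.argmax())]

print(pick_adversarial_target("tabby cat"))  # likely "tiger"
```

Selecting a semantically close target tends to probe fine-grained decision boundaries, which is one motivation for guiding target choice with language-model knowledge rather than picking targets at random.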
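For the second paper, the following is a minimal sketch of one plausible ingredient of a gradient-alignment defense: a training penalty that decorrelates a quantized model's input gradients from a full-precision surrogate's, so patches crafted on one transfer poorly to the other. The loss shape, the stand-in models, and `lambda_align` are illustrative assumptions, not TriQDef's actual tri-level formulation.

```python
# Sketch: penalize cosine alignment between input gradients of a defended
# ("quantized") model and a full-precision surrogate during training.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_misalignment_loss(q_model, fp_model, x, y):
    """Mean cosine similarity between the two models' input gradients;
    adding it to the training loss pushes their attack directions apart."""
    x_q = x.clone().requires_grad_(True)
    x_fp = x.clone().requires_grad_(True)
    # Input gradient of the defended model, kept differentiable so the
    # penalty can update its weights.
    g_q = torch.autograd.grad(F.cross_entropy(q_model(x_q), y), x_q, create_graph=True)[0]
    # Input gradient of the frozen full-precision surrogate.
    g_fp = torch.autograd.grad(F.cross_entropy(fp_model(x_fp), y), x_fp)[0]
    return F.cosine_similarity(g_q.flatten(1), g_fp.flatten(1).detach(), dim=1).mean()

# Stand-in networks and a toy batch, just to make the sketch runnable.
q_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
fp_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))

lambda_align = 0.1  # hypothetical penalty weight
loss = F.cross_entropy(q_model(x), y) + lambda_align * gradient_misalignment_loss(q_model, fp_model, x, y)
loss.backward()
```

The intuition is that transferable patch attacks exploit shared gradient directions across models; explicitly disrupting that alignment during training is one way a quantization-aware defense can blunt transfer.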