Multimodal research is moving toward models that remain both robust and well understood under adversarial attack and complex multimodal inputs. Recent developments focus on the security and reliability of vision-language models, with particular emphasis on targeted adversarial attacks and multimodal hate detection. Noteworthy papers include:
- Semantically Guided Adversarial Testing of Vision Models Using Language Models, which proposes a semantics-guided framework for adversarial target selection, using cross-modal knowledge transfer from pretrained language and vision-language models (a sketch of the target-selection idea follows this list).
- TriQDef: Disrupting Semantic and Gradient Alignment to Prevent Adversarial Patch Transferability in Quantized Neural Networks, which introduces a tri-level quantization-aware defense framework designed to disrupt the transferability of patch-based adversarial attacks across quantized neural networks (QNNs); a sketch of one ingredient, a gradient-misalignment penalty, also follows the list.
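
To make the first idea concrete, here is a minimal sketch of semantics-guided target selection, assuming targets are ranked by semantic similarity in a pretrained text-embedding space (CLIP is used here for illustration). The label set and the "most similar but wrong" heuristic are assumptions for the example, not the paper's exact procedure.

```python
# Sketch: pick an adversarial target label via cross-modal text embeddings.
# Assumptions: CLIP text encoder as the semantic space; a small illustrative
# label set; "most similar non-true label" as the selection heuristic.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["tabby cat", "tiger", "golden retriever", "school bus", "banana"]

def pick_adversarial_target(true_label: str) -> str:
    """Return the candidate label most semantically similar to the true
    label, excluding the true label itself (a plausible worst-case target)."""
    inputs = processor(text=[true_label] + labels, return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)   # cosine-normalize embeddings
    sims = emb[0] @ emb[1:].T                    # similarity of true label to each candidate
    sims[[i for i, l in enumerate(labels) if l == true_label]] = -1.0  # mask the true class
    return labels[int(sims.argmax())]

print(pick_adversarial_target("tabby cat"))  # likely "tiger"
```

Selecting a semantically close target tends to probe fine-grained decision boundaries, which is one motivation for guiding target choice with language-model knowledge rather than picking targets at random.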
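For the second paper, the following is a minimal sketch of one plausible ingredient of a gradient-alignment defense: a training penalty that decorrelates a quantized model's input gradients from a full-precision surrogate's, so patches crafted on one transfer poorly to the other. The loss shape, the stand-in models, and `lambda_align` are illustrative assumptions, not TriQDef's actual tri-level formulation.

```python
# Sketch: penalize cosine alignment between input gradients of a defended
# ("quantized") model and a full-precision surrogate during training.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_misalignment_loss(q_model, fp_model, x, y):
    """Mean cosine similarity between the two models' input gradients;
    adding it to the training loss pushes their attack directions apart."""
    x_q = x.clone().requires_grad_(True)
    x_fp = x.clone().requires_grad_(True)
    # Input gradient of the defended model, kept differentiable so the
    # penalty can update its weights.
    g_q = torch.autograd.grad(F.cross_entropy(q_model(x_q), y), x_q, create_graph=True)[0]
    # Input gradient of the frozen full-precision surrogate.
    g_fp = torch.autograd.grad(F.cross_entropy(fp_model(x_fp), y), x_fp)[0]
    return F.cosine_similarity(g_q.flatten(1), g_fp.flatten(1).detach(), dim=1).mean()

# Stand-in networks and a toy batch, just to make the sketch runnable.
q_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
fp_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))

lambda_align = 0.1  # hypothetical penalty weight
loss = F.cross_entropy(q_model(x), y) + lambda_align * gradient_misalignment_loss(q_model, fp_model, x, y)
loss.backward()
```

The intuition is that transferable patch attacks exploit shared gradient directions across models; explicitly disrupting that alignment during training is one way a quantization-aware defense can blunt transfer.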