Advances in Multimodal Fact-Checking and Hallucination Detection
The field of multimodal fact-checking and hallucination detection is evolving rapidly, driven by the need for more robust and accurate methods of verifying multimedia content. Recent work underscores two themes: claims must be evaluated against both visual and textual evidence jointly, and hallucinations in large vision-language models call for finer-grained analysis than coarse pass/fail judgments. Noteworthy papers include MM-FusionNet, which introduces a context-aware dynamic fusion module for multi-modal fake news detection, and SHALE, a scalable benchmark for fine-grained hallucination evaluation in large vision-language models. Together, these advances move the field toward more reliable and trustworthy multimodal fact-checking systems.
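The summary above does not spell out how MM-FusionNet's context-aware dynamic fusion works, but the general idea behind dynamic (gated) fusion can be sketched simply: a learned gate, conditioned on both modalities, decides per example how much weight the visual versus textual features receive. The sketch below is illustrative only; the weights, dimensions, and gating form are assumptions, not the paper's actual architecture.

```python
import numpy as np

# Illustrative sketch of context-aware dynamic fusion (NOT the actual
# MM-FusionNet architecture, which is not detailed in this summary).
# A scalar gate, computed from the concatenated modality features,
# blends image and text representations per example.

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dynamic_fusion(img_feat, txt_feat, w_gate, b_gate):
    """Fuse two modality vectors with a context-dependent scalar gate.

    gate  = sigmoid(w_gate . [img; txt] + b_gate), in (0, 1)
    fused = gate * img_feat + (1 - gate) * txt_feat
    """
    context = np.concatenate([img_feat, txt_feat])
    gate = sigmoid(w_gate @ context + b_gate)  # scalar in (0, 1)
    fused = gate * img_feat + (1.0 - gate) * txt_feat
    return fused, gate

d = 8                                  # feature dimension (illustrative)
img = rng.standard_normal(d)           # stand-in image embedding
txt = rng.standard_normal(d)           # stand-in text embedding
w = rng.standard_normal(2 * d) * 0.1   # hypothetical gate weights
fused, gate = dynamic_fusion(img, txt, w, 0.0)
print(fused.shape, float(gate))        # fused keeps dimension d; gate in (0, 1)
```

Because the gate depends on the inputs, a claim whose image is uninformative can lean on the text, and vice versa, which is the intuition behind "dynamic" fusion as opposed to fixed-weight concatenation.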
Sources
MM-FusionNet: Context-Aware Dynamic Fusion for Multi-modal Fake News Detection with Large Vision-Language Models
Semi-automated Fact-checking in Portuguese: Corpora Enrichment using Retrieval with Claim extraction
ContextGuard-LVLM: Enhancing News Veracity through Fine-grained Cross-modal Contextual Consistency Verification
Bridging Semantic Logic Gaps: A Cognition-Inspired Multimodal Boundary-Preserving Network for Image Manipulation Localization
A Context-aware Attention and Graph Neural Network-based Multimodal Framework for Misogyny Detection
XFacta: Contemporary, Real-World Dataset and Evaluation for Multimodal Misinformation Detection with Multimodal LLMs
HiFACTMix: A Code-Mixed Benchmark and Graph-Aware Model for Evidence-Based Political Claim Verification in Hinglish