Multimodal Misinformation Detection and Explanation

Multimodal misinformation detection is advancing rapidly, with current work focused on models that are both more accurate and more explainable. Recent research highlights the value of integrating multiple modalities, such as text, images, and video, to improve detection performance. There is also growing emphasis on providing transparent, trustworthy explanations for model predictions, which is essential for building trust in AI systems.
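To make the fusion idea concrete, the sketch below shows one common baseline, early fusion over pre-extracted text and image embeddings. All module names, embedding dimensions, and the two-class output are illustrative assumptions, not details taken from any paper listed below.

```python
# Minimal sketch of early-fusion multimodal misinformation detection.
# Dimensions and layer sizes are assumptions chosen for illustration.
import torch
import torch.nn as nn

class EarlyFusionDetector(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, hidden_dim=256):
        super().__init__()
        # Concatenate per-modality embeddings, then classify jointly.
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, 2),  # assumed labels: real vs. fake
        )

    def forward(self, text_emb, image_emb):
        fused = torch.cat([text_emb, image_emb], dim=-1)
        return self.classifier(fused)

# Usage with pre-extracted embeddings (e.g., from separate text and image encoders):
model = EarlyFusionDetector()
logits = model(torch.randn(4, 768), torch.randn(4, 512))  # batch of 4 posts
```

Late fusion (classifying each modality separately and combining scores) is the main alternative; early fusion lets the classifier model interactions between modalities at the cost of requiring aligned embeddings.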

Notable papers in this area include Debunk and Infer, which proposes a multimodal fake news detection framework that leverages debunking knowledge to improve both performance and interpretability, and Towards Explainable Bilingual Multimodal Misinformation Detection and Localization, which introduces a bilingual multimodal framework that jointly performs region-level localization, cross-modal and cross-lingual consistency detection, and natural-language explanation for misinformation analysis.
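As a rough illustration of one component such frameworks include, the sketch below scores cross-modal consistency between text and image embeddings assumed to live in a shared space (as CLIP-style encoders provide). This is a generic sketch, not the cited papers' method, and the flagging threshold is a hypothetical operating point.

```python
# Generic cross-modal consistency check: low text-image similarity is a
# candidate signal of mismatched (potentially misleading) content.
# Assumes embeddings from a shared text-image space; threshold is illustrative.
import torch
import torch.nn.functional as F

def consistency_score(text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between text and image embeddings, shape (batch,).

    Scores near 1.0 suggest the text matches the image; low scores flag a
    potential cross-modal inconsistency worth localizing and explaining.
    """
    return F.cosine_similarity(text_emb, image_emb, dim=-1)

def flag_mismatch(score: torch.Tensor, threshold: float = 0.25) -> torch.Tensor:
    # Hypothetical threshold; in practice it would be tuned on validation data.
    return score < threshold

scores = consistency_score(torch.randn(4, 512), torch.randn(4, 512))
print(flag_mismatch(scores))
```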

These advances have significant implications for building more effective detection systems that can help mitigate the spread of false information online. Overall, the field is moving toward comprehensive, explainable models that can reliably detect multimodal misinformation and justify their verdicts.

Sources

Debunk and Infer: Multimodal Fake News Detection via Diffusion-Generated Evidence and LLM Reasoning

A Decade of News Forum Interactions: Threaded Conversations, Signed Votes, and Topical Tags

Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment

MisinfoTeleGraph: Network-driven Misinformation Detection for German Telegram Messages

Towards Explainable Bilingual Multimodal Misinformation Detection and Localization

EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations

HyperFusion: Hierarchical Multimodal Ensemble Learning for Social Media Popularity Prediction

MVP: Winning Solution to SMP Challenge 2025 Video Track

Embedding-based Retrieval in Multimodal Content Moderation

Multimodal Misinformation Detection Using Early Fusion of Linguistic, Visual, and Social Features

AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
