Advances in Explainable AI and Multimodal Learning

The field of artificial intelligence is moving toward greater transparency and explainability, with a particular focus on multimodal learning and human-AI collaboration. Recent work highlights the importance of designing and evaluating models that genuinely integrate visual and textual cues rather than relying on single-modality signals, a shift driven by the need for accurate and trustworthy decision-making in high-stakes domains such as healthcare and finance. Noteworthy papers in this area include ReVise, which introduces a visual analytic workflow for incremental recourse planning, and MultiSHAP, a model-agnostic, Shapley-based interpretability framework for explaining cross-modal interactions in multimodal AI models. In addition, "Your Model Is Unfair, Are You Even Aware?" and "On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI" underscore the importance of diagnosing and addressing bias and misleading textual signals in AI systems.
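
To make the cross-modal attribution idea concrete, the sketch below computes an exact pairwise Shapley interaction index between one image patch and one text token for a toy multimodal scoring function. It is a minimal illustration of how a MultiSHAP-style analysis could be set up, not the paper's actual implementation; the toy_model, the player names, and the synergy term are invented for the example.

```python
# Minimal sketch of Shapley-style cross-modal interaction attribution,
# in the spirit of MultiSHAP. The value function and feature names below
# are illustrative assumptions, not the published method.
from itertools import combinations
from math import factorial

def shapley_interaction(value, players, i, j):
    """Exact Shapley interaction index for players i and j.

    value: callable mapping a frozenset of present players to a model score.
    players: all player ids (e.g. image patches and text tokens).
    """
    others = [p for p in players if p not in (i, j)]
    n = len(players)
    total = 0.0
    for size in range(len(others) + 1):
        # Shapley interaction weight: |S|! (n - |S| - 2)! / (n - 1)!
        weight = factorial(size) * factorial(n - size - 2) / factorial(n - 1)
        for subset in combinations(others, size):
            s = frozenset(subset)
            delta = (value(s | {i, j}) - value(s | {i})
                     - value(s | {j}) + value(s))
            total += weight * delta
    return total

# Toy "multimodal" score: one image patch and one text token only help jointly.
def toy_model(present):
    score = 0.1 * len(present)                      # weak main effects
    if "img_patch_3" in present and "txt_dog" in present:
        score += 1.0                                # cross-modal synergy
    return score

players = ["img_patch_1", "img_patch_3", "txt_cat", "txt_dog"]
print(shapley_interaction(toy_model, players, "img_patch_3", "txt_dog"))
# A large positive value indicates the image patch and text token interact
# synergistically rather than contributing independently.
```

In this toy setup the additive 0.1 terms cancel inside every coalition difference, so the interaction index isolates the joint image-text contribution, which is the quantity a cross-modal explanation is after.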

Sources

ReVise: A Human-AI Interface for Incremental Algorithmic Recourse

StackLiverNet: A Novel Stacked Ensemble Model for Accurate and Interpretable Liver Disease Detection

On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI

Your Model Is Unfair, Are You Even Aware? Inverse Relationship Between Comprehension and Trust in Explainability Visualizations of Biased ML Models

Correcting Misperceptions at a Glance: Using Data Visualizations to Reduce Political Sectarianism

MultiSHAP: A Shapley-Based Framework for Explaining Cross-Modal Interactions in Multimodal AI Models

Overcoming Algorithm Aversion with Transparency: Can Transparent Predictions Change User Behavior?

Retinal Lipidomics Associations as Candidate Biomarkers for Cardiovascular Health

GlaBoost: A multimodal Structured Framework for Glaucoma Risk Stratification

A Visual Tool for Interactive Model Explanation using Sensitivity Analysis

ML-based Short Physical Performance Battery future score prediction based on questionnaire data

Explaining Similarity in Vision-Language Encoders with Weighted Banzhaf Interactions
