Multimodal Fusion for Enhanced Medical Diagnosis

The field of medical diagnosis is witnessing a significant shift toward the integration of multimodal data, including electronic health records, medical imaging, and wearable sensor streams, driven by the need to improve diagnostic accuracy and patient outcomes. Researchers are exploring approaches to fuse heterogeneous data sources while addressing challenges such as missing modalities, noise, and temporal asynchrony. Notably, self-supervised learning and contrastive learning are being leveraged to improve predictive models and to robustly integrate asynchronous and incomplete multimodal data, while patient-centric multimodal heterogeneous graphs and disease correlation-guided attention layers are also showing promise. These advances could enable more accurate and personalized diagnosis, ultimately leading to better patient care. Noteworthy papers include DAFTED, which proposes an asymmetric fusion strategy for tabular and echocardiographic data in cardiac hypertension diagnosis, achieving an AUC above 0.90, and VL-RiskFormer, a hierarchical stacked vision-language multimodal Transformer that predicts individual health risks for chronic diseases with an average AUROC of 0.90.
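To make the fusion ideas above concrete, the following is a minimal sketch, not code from any of the cited papers, of contrastive multimodal fusion with modality dropout. It assumes pre-extracted imaging and tabular feature vectors; the encoder sizes, dropout probability, and loss weights are illustrative assumptions only.

```python
# Minimal sketch (illustrative, not any paper's official implementation) of
# contrastive multimodal fusion with modality dropout in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveFusionModel(nn.Module):
    def __init__(self, img_dim=512, tab_dim=32, embed_dim=128, n_classes=2,
                 modality_dropout_p=0.3):
        super().__init__()
        # Separate encoders project each modality into a shared embedding space.
        self.img_encoder = nn.Sequential(nn.Linear(img_dim, embed_dim), nn.ReLU(),
                                         nn.Linear(embed_dim, embed_dim))
        self.tab_encoder = nn.Sequential(nn.Linear(tab_dim, embed_dim), nn.ReLU(),
                                         nn.Linear(embed_dim, embed_dim))
        self.classifier = nn.Linear(2 * embed_dim, n_classes)
        self.p = modality_dropout_p

    def forward(self, img_feats, tab_feats):
        z_img = self.img_encoder(img_feats)
        z_tab = self.tab_encoder(tab_feats)
        if self.training:
            # Modality dropout: randomly zero one modality per sample so the
            # model learns to predict from incomplete inputs.
            keep_img = (torch.rand(z_img.size(0), 1, device=z_img.device) > self.p).float()
            keep_tab = (torch.rand(z_tab.size(0), 1, device=z_tab.device) > self.p).float()
            # Never drop both modalities for the same sample.
            keep_tab = torch.clamp(keep_tab + (1 - keep_img) * (1 - keep_tab), max=1.0)
            z_img_d, z_tab_d = z_img * keep_img, z_tab * keep_tab
        else:
            z_img_d, z_tab_d = z_img, z_tab
        logits = self.classifier(torch.cat([z_img_d, z_tab_d], dim=-1))
        return logits, z_img, z_tab


def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss aligning paired modality embeddings."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


# Toy usage: random tensors stand in for image embeddings and clinical tabular data.
model = ContrastiveFusionModel()
img, tab = torch.randn(16, 512), torch.randn(16, 32)
labels = torch.randint(0, 2, (16,))
logits, z_img, z_tab = model(img, tab)
loss = F.cross_entropy(logits, labels) + 0.5 * info_nce(z_img, z_tab)
loss.backward()
```

The contrastive term encourages the two modality embeddings of the same patient to agree, while modality dropout during training makes the classifier robust to missing inputs at inference time; how the cited papers weight and schedule these objectives differs from this simplified sketch.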

Sources

Self-supervised learning of imaging and clinical signatures using a multimodal joint-embedding predictive architecture

Clinical Multi-modal Fusion with Heterogeneous Graph and Disease Correlation Learning for Multi-Disease Prediction

DAFTED: Decoupled Asymmetric Fusion of Tabular and Echocardiographic Data for Cardiac Hypertension Diagnosis

Multimodal Health Risk Prediction System for Chronic Diseases via Vision-Language Fusion and Large Language Models

Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction
