Multimodal Fusion for Enhanced Medical Diagnosis
The field of medical diagnosis is shifting toward the integration of multimodal data, including electronic health records, medical imaging, and wearable sensor streams, driven by the need to improve diagnostic accuracy and patient outcomes. Researchers are exploring approaches to fuse these heterogeneous sources while addressing challenges such as missing modalities, noise, and temporal asynchrony. Notably, self-supervised and contrastive learning are being leveraged to build predictive models that robustly integrate asynchronous and incomplete multimodal data, and patient-centric multimodal heterogeneous graphs with disease correlation-guided attention layers are also showing promise. Together, these advances point toward more accurate and personalized diagnosis and, ultimately, better patient care. Noteworthy papers include DAFTED, which proposes a decoupled asymmetric fusion of tabular and echocardiographic data for cardiac hypertension diagnosis and reports an AUC above 90%, and VL-RiskFormer, a hierarchical stacked visual-language multimodal Transformer that predicts individual chronic-disease risk with an average AUROC of 0.90.
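To make the contrastive-integration idea concrete, here is a minimal sketch, assuming a generic two-modality setup rather than the specific architectures of the cited papers: a tabular (EHR) encoder and an imaging encoder are aligned with an InfoNCE-style loss, and a boolean mask drops patients whose imaging modality is missing. The class name, encoder sizes, temperature, and mask handling are all illustrative assumptions, not details taken from the sources.

```python
# Minimal sketch: contrastive alignment of two modality encoders with a
# missing-modality mask. Illustrative only; not the method of any cited paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoModalityAligner(nn.Module):
    def __init__(self, tab_dim=32, img_dim=128, embed_dim=64, temperature=0.07):
        super().__init__()
        # Small MLP encoders projecting each modality into a shared space.
        self.tab_encoder = nn.Sequential(nn.Linear(tab_dim, 128), nn.ReLU(),
                                         nn.Linear(128, embed_dim))
        self.img_encoder = nn.Sequential(nn.Linear(img_dim, 128), nn.ReLU(),
                                         nn.Linear(128, embed_dim))
        self.temperature = temperature

    def forward(self, tab, img, img_present):
        # Embed and L2-normalize each modality.
        z_tab = F.normalize(self.tab_encoder(tab), dim=-1)
        z_img = F.normalize(self.img_encoder(img), dim=-1)

        # Keep only patients with both modalities observed
        # (assumes at least one complete pair per batch).
        z_tab, z_img = z_tab[img_present], z_img[img_present]

        # InfoNCE: matching (tabular, imaging) pairs from the same patient are
        # positives; all other pairs in the batch serve as negatives.
        logits = z_tab @ z_img.t() / self.temperature
        targets = torch.arange(logits.size(0), device=logits.device)
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

# Toy usage: a batch of 8 patients, 3 of whom lack imaging.
model = TwoModalityAligner()
tab = torch.randn(8, 32)     # tabular / EHR features
img = torch.randn(8, 128)    # pooled imaging features
img_present = torch.tensor([1, 1, 0, 1, 0, 1, 1, 0], dtype=torch.bool)
loss = model(tab, img, img_present)
loss.backward()
```

The same mask-then-align pattern extends naturally to additional modalities, and the cited works build on it with richer objectives such as joint-embedding prediction and graph- or attention-based fusion.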
Sources
Self-supervised learning of imaging and clinical signatures using a multimodal joint-embedding predictive architecture
Clinical Multi-modal Fusion with Heterogeneous Graph and Disease Correlation Learning for Multi-Disease Prediction
DAFTED: Decoupled Asymmetric Fusion of Tabular and Echocardiographic Data for Cardiac Hypertension Diagnosis