Advances in Multimodal Medical Models

Multimodal large language models (MLLMs) and vision-language models are advancing rapidly in medical imaging and diagnostics, showing strong potential for disease classification, medical visual question answering, and diagnostic decision support. Recent work integrates multimodal data, such as images and text, to improve both performance and interpretability, and techniques including cross-modal attention, probabilistic contrastive learning, and multi-task fine-tuning have produced state-of-the-art results across a range of medical applications. Noteworthy papers include MDF-MLLM, which reports a 56% improvement in disease classification accuracy, and InfiMed-Foundation, which demonstrates strong performance on medical visual question answering and diagnostic tasks. Overall, the field is moving toward more robust and generalizable models that can handle diverse medical data and tasks.
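To make the fusion idea concrete, the sketch below shows a minimal cross-modal attention block in which text tokens attend to image patch features, in the spirit of the feature-alignment approaches summarized above. It assumes PyTorch; the class name, dimensions, and the residual-plus-LayerNorm arrangement are illustrative choices, not taken from MDF-MLLM or any other cited paper.

```python
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Illustrative cross-modal attention: text tokens query image patch features.

    All names and dimensions are hypothetical; this is a sketch of the general
    technique, not the architecture of any paper listed under Sources.
    """

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Queries come from the text stream; keys and values from the image stream.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (batch, n_text_tokens, dim); image_feats: (batch, n_patches, dim)
        attended, _ = self.cross_attn(query=text_feats, key=image_feats, value=image_feats)
        # Residual connection keeps the original text representation in the fused output.
        return self.norm(text_feats + attended)


if __name__ == "__main__":
    fusion = CrossModalFusion()
    text = torch.randn(2, 16, 256)   # e.g. question/report token embeddings
    image = torch.randn(2, 49, 256)  # e.g. 7x7 grid of image patch embeddings
    fused = fusion(text, image)
    print(fused.shape)  # torch.Size([2, 16, 256])
```

In a full model, the image features would typically come from a vision encoder and the text tokens from the language model's embedding layer; the fused tokens then feed downstream classification or generation heads.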
Sources
MDF-MLLM: Deep Fusion Through Cross-Modal Feature Alignment for Contextually Aware Fundoscopic Image Classification
InfiMed-Foundation: Pioneering Advanced Multimodal Medical Models with Compute-Efficient Pre-Training and Multi-Stage Fine-Tuning
TREAT-Net: Tabular-Referenced Echocardiography Analysis for Acute Coronary Syndrome Treatment Prediction
Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation
LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology