Advances in Multimodal Medical Image Analysis
Medical image analysis is advancing rapidly through the integration of multimodal information, combining images with their accompanying text reports. Recent work focuses on improving the accuracy and robustness of medical image classification and retrieval, on using large language models to generate visual concepts and support continual learning, and on more efficient methods for multimodal feature fusion and cross-modal attention. Notable papers include Efficient Multi-Slide Visual-Language Feature Fusion for Placental Disease Classification, which introduces a two-stage patch selection module and a hybrid multimodal fusion module to improve diagnostic performance, and Prototype-Enhanced Confidence Modeling for Cross-Modal Medical Image-Report Retrieval, which builds multi-level prototypes for each modality to better capture semantic variability and strengthen retrieval robustness. These advances promise more accurate and reliable medical image analysis, supporting better diagnosis, treatment, and patient outcomes.
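To make the idea of cross-modal attention fusion concrete, here is a minimal numpy sketch of scaled dot-product cross-attention between image patch features and report token features. The function names, the choice of image-as-query, and the residual fusion step are illustrative assumptions, not the actual architecture of any paper above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(image_feats, text_feats):
    """Fuse image patch features with text token features via
    scaled dot-product cross-attention (image queries, text keys/values).

    image_feats: (n_patches, d) array of visual features
    text_feats:  (n_tokens, d) array of report-token features
    Returns a fused (n_patches, d) array.
    """
    d = image_feats.shape[-1]
    # Attention scores: how much each patch attends to each token.
    scores = image_feats @ text_feats.T / np.sqrt(d)
    attn = softmax(scores, axis=-1)          # rows sum to 1
    attended = attn @ text_feats             # text context per patch
    return image_feats + attended            # residual fusion (assumed)

# Usage with random features standing in for real encoder outputs.
rng = np.random.default_rng(0)
fused = cross_modal_attention(rng.normal(size=(4, 8)),
                              rng.normal(size=(6, 8)))
```

In a real system the two feature sets would come from trained image and text encoders, and the attention would typically be multi-head with learned projections; this sketch keeps only the core fusion mechanics.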
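The prototype-based confidence idea can also be sketched briefly: derive several prototype vectors per modality by clustering embeddings, then use a query's similarity to its nearest prototype as a confidence proxy for retrieval. This is a generic illustration assuming plain k-means and cosine similarity; the cited paper's multi-level construction and confidence model are more elaborate.

```python
import numpy as np

def kmeans_prototypes(feats, k, iters=10, seed=0):
    # Simple k-means to extract k prototype vectors from one modality's
    # embeddings (feats: (n, d) float array).
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to its nearest center.
        dists = ((feats[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        # Move each center to the mean of its assigned embeddings.
        for j in range(k):
            pts = feats[assign == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def prototype_confidence(query, prototypes):
    # Cosine similarity between the query and its closest prototype,
    # used here as a retrieval-confidence proxy (an assumption).
    q = query / np.linalg.norm(query)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return float((p @ q).max())
```

A query embedding far from every prototype of the other modality would receive low confidence, which a retrieval system could use to down-weight or flag uncertain matches.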
Sources
NEARL-CLIP: Interacted Query Adaptation with Orthogonal Regularization for Medical Vision-Language Understanding
Small Lesions-aware Bidirectional Multimodal Multiscale Fusion Network for Lung Disease Classification