Advances in Medical Imaging Analysis

Medical imaging analysis is advancing rapidly, with current work concentrated on methods for image classification, segmentation, and retrieval. Recent research applies self-supervised learning, multimodal fusion, and vision-language modeling to improve both accuracy and efficiency. These approaches have shown promising results in applications including disease detection, tumor segmentation, and image captioning. The development of large-scale datasets and benchmarks has also made it easier to evaluate and compare methods, which in turn drives progress in the field.

Noteworthy papers in this area include M3Ret, MedVista3D, and CLAPS. M3Ret presents a unified visual encoder for multimodal medical image retrieval and achieves state-of-the-art performance in zero-shot image-to-image retrieval across modalities. MedVista3D introduces a multi-scale, semantic-enriched vision-language pretraining framework for 3D CT analysis and demonstrates state-of-the-art performance in zero-shot disease classification, report retrieval, and medical visual question answering. CLAPS proposes a CLIP-unified auto-prompt segmentation method for multimodal retinal imaging, performing on par with specialized expert models and surpassing existing benchmarks on most metrics.

Sources

Self-supervised large-scale kidney abnormality detection in drug safety assessment studies

A Multimodal Head and Neck Cancer Dataset for AI-Driven Precision Oncology

Multi-Level CLS Token Fusion for Contrastive Learning in Endoscopy Image Classification

Generalizable Self-supervised Monocular Depth Estimation with Mixture of Low-Rank Experts for Diverse Endoscopic Scenes

M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision

Unified Supervision For Vision-Language Modeling in 3D Computed Tomography

TransForSeg: A Multitask Stereo ViT for Joint Stereo Segmentation and 3D Force Estimation in Catheterization

A Multimodal and Multi-centric Head and Neck Cancer Dataset for Tumor Segmentation and Outcome Prediction

Lesion-Aware Visual-Language Fusion for Automated Image Captioning of Ulcerative Colitis Endoscopic Examinations

MedVista3D: Vision-Language Modeling for Reducing Diagnostic Errors in 3D CT Disease Detection, Understanding and Reporting

SimCroP: Radiograph Representation Learning with Similarity-driven Cross-granularity Pre-training

CLAPS: A CLIP-Unified Auto-Prompt Segmentation for Multi-Modal Retinal Imaging
