The field of medical imaging and vision-language models is advancing rapidly, with a focus on compositional generalization, zero-shot learning, and cross-modal representation learning. Recent work has introduced new benchmarks and models that push the state of the art in medical image segmentation and classification, while techniques such as test-time prompt tuning and debiasing have shown promise in mitigating spurious biases and improving model robustness. Noteworthy papers include CrossMed, a benchmark for evaluating compositional generalization in medical multimodal models; VoxTell, which achieves state-of-the-art zero-shot performance in medical image segmentation; Doubly Debiased Test-Time Prompt Tuning, which mitigates bias in prompt optimization; and OAD-Promoter, which enhances zero-shot visual question answering (VQA) using large language models.
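
As a rough illustration of the test-time prompt tuning idea referenced above, the sketch below adapts only a learnable soft prompt on a single test image by minimizing prediction entropy over augmented views (in the style of generic test-time prompt tuning). This is a minimal sketch, not the method of Doubly Debiased Test-Time Prompt Tuning or any other specific paper; the toy CLIP-like encoders, dimensions, augmentations, and hyperparameters are all illustrative assumptions.

```python
# Minimal sketch of test-time prompt tuning, assuming a generic CLIP-like
# interface. The ToyCLIP encoders, dimensions, and hyperparameters below are
# hypothetical stand-ins, not any published model.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM, N_CTX, N_CLASSES = 512, 4, 10

class ToyCLIP(nn.Module):
    """Stand-in for a frozen vision-language model (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Linear(3 * 32 * 32, EMB_DIM)     # toy image encoder
        self.text_encoder = nn.Linear(N_CTX * EMB_DIM, EMB_DIM)  # toy text encoder
        self.class_tokens = nn.Parameter(torch.randn(N_CLASSES, EMB_DIM),
                                         requires_grad=False)

    def forward(self, images, ctx):
        # Encode images and prompt-conditioned class embeddings, return logits.
        img_feat = F.normalize(self.image_encoder(images.flatten(1)), dim=-1)
        # Broadcast the shared learnable context to every class "prompt".
        prompts = (ctx.unsqueeze(0) + self.class_tokens.unsqueeze(1)).flatten(1)
        txt_feat = F.normalize(self.text_encoder(prompts), dim=-1)
        return 100.0 * img_feat @ txt_feat.t()

def test_time_prompt_tuning(model, image, n_views=8, steps=1, lr=5e-3, keep=0.5):
    """Adapt only the prompt context on one test image via entropy minimization."""
    ctx = nn.Parameter(torch.zeros(N_CTX, EMB_DIM))  # learnable soft prompt
    optimizer = torch.optim.AdamW([ctx], lr=lr)
    # Crude noise "augmentations"; real pipelines would use image transforms.
    views = image.unsqueeze(0) + 0.1 * torch.randn(n_views, *image.shape)
    for _ in range(steps):
        probs = model(views, ctx).softmax(dim=-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1)
        # Keep the most confident (lowest-entropy) views and minimize the
        # entropy of their averaged prediction w.r.t. the prompt only.
        selected = probs[entropy.argsort()[: int(keep * n_views)]]
        avg = selected.mean(0)
        loss = -(avg * avg.clamp_min(1e-8).log()).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        return model(image.unsqueeze(0), ctx).argmax(-1)

model = ToyCLIP().eval()
for p in model.parameters():
    p.requires_grad_(False)  # the backbone stays frozen; only the prompt adapts
pred = test_time_prompt_tuning(model, torch.randn(3, 32, 32))
print("predicted class:", pred.item())
```

In practice the frozen backbone would be a real pretrained vision-language model and the augmentations standard image transforms; because only the short prompt context is updated, adaptation remains cheap at inference time.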