Progress in Test-Time Adaptation for Vision-Language Models

Work on vision-language models is increasingly focused on keeping performance stable under distribution shift and in real-world deployment. Recent test-time adaptation research covers continual-temporal test-time adaptation, risk monitoring, and calibrated foundation models, with the shared aim of making these models more reliable and robust in applications such as image classification and medical image tasks.

Notable papers: BayesTTA proposes a Bayesian adaptation framework for continual-temporal test-time adaptation and is reported to consistently outperform state-of-the-art methods in that setting. StaRFM introduces a unified framework for calibrated and robust foundation models and shows consistent performance gains in vision-language and medical image tasks, with improved calibration and robustness. GS-Bias presents an efficient test-time adaptation paradigm that incorporates global and spatial biases, achieving state-of-the-art performance on 15 benchmark datasets while requiring minimal computational resources.
