Advances in Vision-Language Models and Domain Adaptation

The field of vision-language models is evolving rapidly, with a focus on improving robustness, adaptability, and generalization. Recent work has emphasized leveraging semantic relationships between modalities, mitigating error accumulation in unknown-sample detection, and improving resilience to corruptions. Innovative approaches have also been proposed for zero-shot object counting with rich prompts, few-shot adversarial low-rank fine-tuning of vision-language models, and single-domain generalization for few-shot counting. Benchmarking efforts such as REOBench have further highlighted the vulnerability of current Earth observation foundation models to real-world corruptions, underscoring the need for more robust and reliable models.

Noteworthy papers include "Open Set Domain Adaptation with Vision-language models via Gradient-aware Separation", which proposes a novel approach to open-set domain adaptation built on Contrastive Language-Image Pretraining (CLIP), and "Few-Shot Adversarial Low-Rank Fine-Tuning of Vision-Language Models", which introduces AdvCLIP-LoRA, a method for enhancing the adversarial robustness of CLIP models fine-tuned with LoRA in few-shot settings.
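To make the low-rank fine-tuning idea behind methods like AdvCLIP-LoRA concrete, the following is a minimal NumPy sketch of the generic LoRA mechanism: a frozen weight matrix W is augmented with a trainable low-rank residual B @ A, scaled by alpha / r. The dimensions, rank, and scaling values below are illustrative assumptions, not settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in = 512, 512   # e.g. one attention projection in a CLIP encoder (assumed size)
r, alpha = 8, 16         # LoRA rank and scaling factor (hypothetical values)

W = rng.standard_normal((d_out, d_in))    # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01 # trainable down-projection
B = np.zeros((d_out, r))                  # trainable up-projection, zero-initialized
                                          # so the adapter starts as a no-op

def adapted_forward(x):
    """Forward pass with the LoRA residual: (W + (alpha/r) * B @ A) @ x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted model reproduces the frozen model exactly.
assert np.allclose(adapted_forward(x), W @ x)

full_params = W.size          # parameters touched by full fine-tuning
lora_params = A.size + B.size # parameters trained under LoRA
print(f"trainable params: {lora_params} (LoRA) vs {full_params} (full fine-tuning)")
```

Only A and B are updated during fine-tuning, which is why the approach remains practical in few-shot settings: the number of trainable parameters drops from d_out * d_in to r * (d_out + d_in).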

Sources

Open Set Domain Adaptation with Vision-language models via Gradient-aware Separation

Few-Shot Adversarial Low-Rank Fine-Tuning of Vision-Language Models

Expanding Zero-Shot Object Counting with Rich Prompts

On the Robustness of Medical Vision-Language Models: Are they Truly Generalizable?

Prompt Tuning Vision Language Models with Margin Regularizer for Few-Shot Learning under Distribution Shifts

SD-MAD: Sign-Driven Few-shot Multi-Anomaly Detection in Medical Images

Single Domain Generalization for Few-Shot Counting via Universal Representation Matching

REOBench: Benchmarking Robustness of Earth Observation Foundation Models
