Advances in Vision-Language Models

The field of vision-language models is advancing rapidly, with particular focus on few-shot learning and adaptation to new domains. Recent work introduces methods for transductive few-shot learning, single-domain generalization, and prompt tuning that improve both accuracy and efficiency, promising greater robustness and scalability for vision-language models in real-world applications. Noteworthy papers include:

  • Language-Aware Information Maximization for Transductive Few-Shot CLIP, which proposes a novel loss function for transductive few-shot learning.
  • Target-Oriented Single Domain Generalization, which leverages textual descriptions of the target domain to guide model generalization.
  • Spotlighter, a lightweight token-selection framework that enhances accuracy and efficiency in prompt tuning.
  • Learnable Loss Geometries with Mirror Descent for Scalable and Convergent Meta-Learning, which introduces a novel distance-generating function for meta-learning.
  • CLIP-SVD, a parameter-efficient adaptation technique that leverages Singular Value Decomposition to modify the internal parameter space of CLIP.
  • CaPL, a causality-guided text prompt learning method for CLIP based on visual granulation.
  • Attn-Adapter, a novel online few-shot learning framework that enhances CLIP's adaptability via a dual attention mechanism.
  • AttriPrompt, a dynamic prompt composition learning framework that refines textual semantic representations by leveraging intermediate-layer features of CLIP's vision encoder.
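To make the parameter-efficiency argument concrete, the SVD-based adaptation idea behind CLIP-SVD can be sketched as rescaling the singular values of a frozen pretrained weight matrix while keeping its singular vectors fixed. The snippet below is a minimal illustration of that general technique, not the paper's actual implementation; the matrix shapes and the `scale` vector are hypothetical.

```python
import numpy as np

# Hypothetical sketch: adapt a frozen weight matrix by rescaling its
# singular values, keeping the singular vectors U and V fixed.

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))     # stand-in for a frozen pretrained weight

U, S, Vt = np.linalg.svd(W, full_matrices=False)

scale = np.ones_like(S)             # the only learnable parameters
scale[0] *= 1.1                     # e.g. the result of a few gradient steps

W_adapted = U @ np.diag(S * scale) @ Vt

# Only len(S) parameters are tuned, versus W.size for full fine-tuning.
print(len(S), W.size)               # 8 64
```

With `scale` initialized to ones, `W_adapted` reproduces `W` exactly, so adaptation starts from the pretrained model; the tunable parameter count grows linearly with the matrix dimension rather than quadratically.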

Sources

Language-Aware Information Maximization for Transductive Few-Shot CLIP

Target-Oriented Single Domain Generalization

Spotlighter: Revisiting Prompt Tuning from a Representative Mining View

Learnable Loss Geometries with Mirror Descent for Scalable and Convergent Meta-Learning

MLSD: A Novel Few-Shot Learning Approach to Enhance Cross-Target and Cross-Domain Stance Detection

Singular Value Few-shot Adaptation of Vision-Language Models

Causality-guided Prompt Learning for Vision-language Models via Visual Granulation

Attn-Adapter: Attention Is All You Need for Online Few-shot Learner of Vision-Language Model

AttriPrompt: Dynamic Prompt Composition Learning for CLIP

On the Reproducibility of "FairCLIP: Harnessing Fairness in Vision-Language Learning"
