Advances in Few-Shot Learning and Vision-Language Models

Computer vision research is increasingly focused on few-shot learning and vision-language models. Recent work explores transfer learning, meta-learning, and self-supervised learning to improve few-shot image classification, while vision-language models such as CLIP have shown strong zero-shot recognition and have been fine-tuned for a range of downstream tasks.
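To make the zero-shot setting concrete, the sketch below scores an image against a set of candidate text labels with CLIP, using the Hugging Face `transformers` API. The checkpoint name, image path, and label prompts are illustrative assumptions, not details drawn from the papers surveyed here.

```python
# Minimal sketch: CLIP zero-shot image classification via Hugging Face transformers.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # hypothetical input image
labels = ["a photo of a cat", "a photo of a dog", "a photo of a bird"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarities scaled by the learned temperature;
# a softmax over the candidate labels yields zero-shot class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```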

Noteworthy papers include ViT-ProtoNet, which integrates a Vision Transformer backbone into the Prototypical Network framework for few-shot image classification; Fine-grained Alignment and Interaction Refinement (FAIR), which dynamically aligns localized image features with descriptive language embeddings for fine-grained unsupervised adaptation; and NegRefine, which proposes a negative label refinement framework for zero-shot out-of-distribution (OOD) detection.
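To illustrate the prototypical idea, here is a minimal sketch of the Prototypical Network classification rule with a ViT feature extractor, in the spirit of ViT-ProtoNet: class prototypes are the mean support embeddings, and queries are classified by negative squared Euclidean distance. The `timm` backbone choice and episode shapes are assumptions; the paper's exact architecture and training procedure may differ.

```python
# Minimal sketch: Prototypical Network classification with a ViT backbone.
import torch
import timm

# num_classes=0 makes the timm ViT return pooled features instead of logits.
backbone = timm.create_model("vit_small_patch16_224", pretrained=True, num_classes=0)
backbone.eval()

def protonet_logits(support, support_labels, query, n_way):
    """Classify query images by distance to class prototypes.

    support: (n_support, 3, 224, 224), query: (n_query, 3, 224, 224)
    support_labels: (n_support,) integer class ids in [0, n_way).
    """
    with torch.no_grad():
        z_support = backbone(support)   # (n_support, d) pooled ViT features
        z_query = backbone(query)       # (n_query, d)
    # Prototype = mean embedding of each class's support examples.
    prototypes = torch.stack(
        [z_support[support_labels == c].mean(dim=0) for c in range(n_way)]
    )
    # Negative squared Euclidean distance serves as the classification logit.
    return -torch.cdist(z_query, prototypes).pow(2)

# Example 5-way episode: 1 support and 2 query images per class (random tensors).
support = torch.randn(5, 3, 224, 224)
labels = torch.arange(5)
query = torch.randn(10, 3, 224, 224)
print(protonet_logits(support, labels, query, n_way=5).argmax(dim=1))
```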

These advances could improve real-world applications such as image recognition, object detection, and multimodal vision-language tasks. Overall, the field is moving toward more efficient, effective, and generalizable models that learn from limited data and adapt to new tasks and environments.

Sources

Transfer Learning and Mixup for Fine-Grained Few-Shot Fungi Classification

Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning

Revisiting Pool-based Prompt Learning for Few-shot Class-incremental Learning

ViT-ProtoNet for Few-Shot Image Classification: A Multi-Benchmark Evaluation

Towards Fine-Grained Adaptation of CLIP via a Self-Trained Alignment Score

NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection

Beyond Graph Model: Reliable VLM Fine-Tuning via Random Graph Adapter

Bridge Feature Matching and Cross-Modal Alignment with Mutual-filtering for Zero-shot Anomaly Detection

Clustering-Guided Multi-Layer Contrastive Representation Learning for Citrus Disease Classification

ProtoConNet: Prototypical Augmentation and Alignment for Open-Set Few-Shot Image Classification

Fine-Grained Image Recognition from Scratch with Teacher-Guided Data Augmentation

Cluster Contrast for Unsupervised Visual Representation Learning

Semantic-guided Fine-tuning of Foundation Model for Long-tailed Visual Recognition

GLAD: Generalizable Tuning for Vision-Language Models
