Continual Learning in Vision-Language Models

Research on continual learning in vision-language models is converging on methods that preserve prior knowledge while adapting to new tasks. Recent work addresses class-incremental, domain-incremental, and lifelong learning, with particular emphasis on leveraging pre-trained models and multi-modal supervision. Notable advances include analytic contrastive projection, hierarchical semantic tree anchoring, and language-based anchors, all aimed at mitigating catastrophic forgetting while improving performance.

Noteworthy papers in this area include:

- AnaCP, which proposes analytic contrastive projection to enable incremental feature adaptation without gradient-based training.
- LAVA, which introduces a language-anchored visual alignment framework that preserves relative visual geometry for domain-incremental learning.
- HASTEN, which anchors hierarchical semantic information into class-incremental learning to reduce catastrophic forgetting.
- BOFA, which proposes a bridge-layer orthogonal low-rank fusion framework for CLIP-based class-incremental learning.
- DMC, which introduces a two-stage framework for CLIP-based class-incremental learning, decoupling the adaptation of the vision encoder from the optimization of textual soft prompts.

Sources

Preserving Cross-Modal Consistency for CLIP-based Class-Incremental Learning

BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning

MergeSlide: Continual Model Merging and Task-to-Class Prompt-Aligned Inference for Lifelong Learning on Whole Slide Images

AnaCP: Toward Upper-Bound Continual Learning via Analytic Contrastive Projection

Language as an Anchor: Preserving Relative Visual Geometry for Domain Incremental Learning

Hierarchical Semantic Tree Anchoring for CLIP-Based Class-Incremental Learning
