Continual learning research is moving toward more effective methods for preserving knowledge while adapting vision-language models to new tasks. Recent work addresses class-incremental, domain-incremental, and lifelong learning, with particular emphasis on leveraging pre-trained models and multi-modal supervision. Notable advances include analytic contrastive projection, hierarchical semantic tree anchoring, and language-based anchors, all aimed at mitigating catastrophic forgetting while improving performance.
Noteworthy papers in this area include:

- AnaCP proposes analytic contrastive projection, enabling incremental feature adaptation without gradient-based training.
- LAVA introduces a language-anchored visual alignment framework that preserves relative visual geometry for domain-incremental learning.
- HASTEN anchors hierarchical semantic information into class-incremental learning to reduce catastrophic forgetting.
- BOFA proposes a bridge-layer orthogonal low-rank fusion framework for CLIP-based class-incremental learning.
- DMC introduces a two-stage framework for CLIP-based class-incremental learning that decouples adaptation of the vision encoder from optimization of textual soft prompts.
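To make the idea of gradient-free incremental adaptation concrete, the sketch below shows a generic analytic-learning baseline: a classifier over frozen backbone features updated in closed form via recursive ridge regression, so each new task is absorbed without revisiting old data or running gradient descent. This is an illustrative assumption, not the actual AnaCP algorithm (whose contrastive projection is more involved); the class and function names are hypothetical.

```python
import numpy as np

class AnalyticIncrementalClassifier:
    """Gradient-free incremental linear classifier over frozen features.

    Illustrative sketch of the analytic-learning idea (recursive ridge
    regression), not the exact method of any paper cited above.
    """

    def __init__(self, feat_dim: int, n_classes: int, gamma: float = 1.0):
        # R tracks the inverse of the regularized feature autocorrelation
        # matrix (gamma * I + sum_t X_t^T X_t)^{-1}.
        self.R = np.eye(feat_dim) / gamma
        self.W = np.zeros((feat_dim, n_classes))

    def fit_task(self, X: np.ndarray, Y: np.ndarray) -> None:
        """Absorb one task: X is [n, d] frozen features, Y is [n, C] one-hot."""
        # Woodbury identity: update the inverse without old tasks' data.
        K = self.R @ X.T
        inner = np.linalg.inv(np.eye(X.shape[0]) + X @ K)
        self.R -= K @ inner @ K.T
        # Closed-form weight correction; exactly matches joint ridge regression.
        self.W += self.R @ X.T @ (Y - X @ self.W)

    def predict(self, X: np.ndarray) -> np.ndarray:
        return (X @ self.W).argmax(axis=1)
```

Because the recursion is algebraically exact, the weights after several `fit_task` calls coincide with the ridge solution trained jointly on all tasks, which is why this family of methods sidesteps catastrophic forgetting for the classifier head.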