Continual Learning in Vision-Language Models

Research on vision-language models is increasingly focused on continual learning methods that adapt to new tasks and domains without forgetting previously acquired knowledge. Recent work targets catastrophic forgetting, cross-modal feature drift, and parameter interference. Noteworthy papers include Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models, which grounds the projection of visual information into the language model on the accompanying instructions, and Tackling Distribution Shift in LLM via KILO, which integrates dynamic knowledge graphs with instruction tuning to improve adaptation to new domains while retaining earlier knowledge. A minimal sketch of the general recipe these directions share appears below.
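
To make the shared recipe concrete, here is a minimal, hypothetical sketch, not the method of any paper listed here: a frozen vision-language backbone is assumed, only a small visual projector is trained per task, and a simple replay buffer of earlier-task samples is mixed in to limit forgetting. All module names, dimensions, and the MSE stand-in objective are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class VisualProjector(nn.Module):
    """Small trainable adapter mapping frozen vision features into the
    language model's embedding space; only this module is updated per task."""

    def __init__(self, vision_dim: int = 512, text_dim: int = 768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        return self.proj(vision_feats)


def train_task_with_replay(projector, task_loader, replay_buffer,
                           steps: int = 100, replay_ratio: float = 0.5):
    """Update the projector on a new task while interleaving replayed samples
    from earlier tasks to reduce catastrophic forgetting (illustrative only)."""
    opt = torch.optim.AdamW(projector.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()  # stand-in objective: match target text embeddings
    task_iter = iter(task_loader)
    for _ in range(steps):
        try:
            vision_feats, target_embeds = next(task_iter)
        except StopIteration:
            task_iter = iter(task_loader)
            vision_feats, target_embeds = next(task_iter)
        # Occasionally mix in replayed examples from previous tasks.
        if replay_buffer and torch.rand(1).item() < replay_ratio:
            rv, rt = replay_buffer[torch.randint(len(replay_buffer), (1,)).item()]
            vision_feats = torch.cat([vision_feats, rv])
            target_embeds = torch.cat([target_embeds, rt])
        loss = loss_fn(projector(vision_feats), target_embeds)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Store a few examples from this task for future replay.
    sample_v, sample_t = next(iter(task_loader))
    replay_buffer.append((sample_v[:8], sample_t[:8]))


if __name__ == "__main__":
    torch.manual_seed(0)
    projector = VisualProjector()
    replay_buffer = []
    # Two synthetic "tasks": random vision features paired with target embeddings.
    for task_id in range(2):
        feats, targets = torch.randn(256, 512), torch.randn(256, 768)
        loader = DataLoader(TensorDataset(feats, targets), batch_size=32, shuffle=True)
        train_task_with_replay(projector, loader, replay_buffer)
        print(f"finished task {task_id}, replay buffer size = {len(replay_buffer)}")
```

The design choice illustrated here, keeping the backbone frozen and confining updates to a lightweight projector plus a small replay buffer, is one common way the surveyed work limits parameter interference; the individual papers above differ in how they ground or regularize that adaptation.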

Sources

Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models

Tackling Distribution Shift in LLM via KILO: Knowledge-Instructed Learning for Continual Adaptation

Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting

GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay

CRAM: Large-scale Video Continual Learning with Bootstrapped Compression

Adapting Vision-Language Models Without Labels: A Comprehensive Survey
