The field of vision-language models is moving toward more effective continual learning methods that adapt to new tasks and domains without forgetting previously learned knowledge. Recent research focuses on three recurring challenges: catastrophic forgetting, cross-modal feature drift, and parameter interference. Noteworthy papers include Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models, which proposes a framework that grounds the translation of visual information into language-model inputs on the given instructions, and Tackling Distribution Shift in LLM via KILO, which introduces a knowledge-instructed learning framework that combines dynamic knowledge graphs with instruction tuning to improve adaptation to new domains while retaining previously acquired knowledge.
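
To make the projector idea concrete, below is a minimal PyTorch sketch of an instruction-conditioned visual projector: a module that maps visual features into the language model's embedding space while gating the projection on a pooled instruction embedding. The class name, dimensions, and gating scheme are illustrative assumptions for this sketch, not the architecture described in the cited paper.

```python
import torch
import torch.nn as nn


class InstructionConditionedProjector(nn.Module):
    """Projects visual features into the language model's embedding space,
    gating the projection on a pooled instruction embedding.

    Hypothetical sketch: names, dimensions, and the gating scheme are
    assumptions, not the cited paper's design.
    """

    def __init__(self, vis_dim: int = 1024, instr_dim: int = 768, lm_dim: int = 4096):
        super().__init__()
        self.gate = nn.Linear(instr_dim, vis_dim)  # instruction embedding -> per-channel gate
        self.proj = nn.Sequential(                 # gated visual features -> LM token space
            nn.Linear(vis_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, vis_feats: torch.Tensor, instr_embeds: torch.Tensor) -> torch.Tensor:
        # vis_feats: (batch, num_patches, vis_dim); instr_embeds: (batch, num_tokens, instr_dim)
        pooled = instr_embeds.mean(dim=1)                      # (batch, instr_dim)
        gate = torch.sigmoid(self.gate(pooled)).unsqueeze(1)   # (batch, 1, vis_dim)
        return self.proj(vis_feats * gate)                     # (batch, num_patches, lm_dim)


if __name__ == "__main__":
    projector = InstructionConditionedProjector()
    vis = torch.randn(2, 256, 1024)     # dummy visual-encoder patch features
    instr = torch.randn(2, 32, 768)     # dummy instruction token embeddings
    print(projector(vis, instr).shape)  # torch.Size([2, 256, 4096])
```

In a continual-learning setting, a lightweight projector like this could, for example, be the only component updated per task while the vision encoder and language model stay frozen, which is one common way to limit parameter interference between tasks.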