The field of concept-based models and vision-language integration is evolving rapidly, with a focus on improving interpretability, robustness, and performance. Recent research highlights two recurring problems, concept mislabeling and leakage poisoning, alongside the need for more efficient and effective methods for learning visual concepts. Notable advances include new loss functions, such as the Concept Preference Optimization objective, and novel approaches to visual clue learning, such as Multi-grained Compositional visual Clue Learning. Researchers have also introduced new benchmarks, such as VCBENCH for evaluating multimodal mathematical reasoning, and methods such as Focus-Centric Visual Chain for improving vision-language models' performance in multi-image scenarios.

Noteworthy papers include:

- Avoiding Leakage Poisoning, which introduces MixCEM, a concept-based model that learns to dynamically exploit leaked information.
- Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization, which proposes the Concept Preference Optimization objective to mitigate the negative impact of concept mislabeling.
- Multi-Grained Compositional Visual Clue Learning for Image Intent Recognition, which breaks intent recognition down into visual clue composition and integrates multi-grained features.
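The concept bottleneck models (CBMs) that concept mislabeling and leakage poisoning afflict share a common two-stage structure: an input is first mapped to interpretable concept predictions, and the label is then predicted from those concepts alone. The sketch below illustrates that structure in plain NumPy; all dimensions, weight matrices, and function names are hypothetical, not taken from any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical dimensions: image features -> k interpretable concepts -> label.
d_feat, n_concepts, n_classes = 8, 4, 3
W_concept = rng.normal(size=(d_feat, n_concepts))   # features -> concept scores
W_label = rng.normal(size=(n_concepts, n_classes))  # concepts -> class logits

def concept_bottleneck_predict(x):
    # Stage 1: predict concept probabilities (the interpretable bottleneck).
    c_hat = sigmoid(x @ W_concept)
    # Stage 2: predict the label ONLY from concepts, so every prediction can
    # be explained, or intervened on, at the concept level. "Leakage" refers
    # to task information slipping through c_hat beyond the named concepts.
    logits = c_hat @ W_label
    return c_hat, logits

x = rng.normal(size=d_feat)
c_hat, logits = concept_bottleneck_predict(x)
print(c_hat.shape, logits.shape)  # -> (4,) (3,)
```

Because the label head sees only `c_hat`, a mislabeled concept corrupts downstream predictions directly, which is the failure mode the preference-optimization and leakage-aware approaches above target.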