Advancements in Vision-Language Models for Industrial Anomaly Detection

The field of vision-language models is moving toward more efficient and effective test-time adaptation, enabling pretrained models to generalize to new domains and datasets without retraining. Recent work focuses on improving the robustness and accuracy of vision-language models in industrial anomaly detection, with particular emphasis on few-shot learning and zero-shot anomaly detection. Noteworthy papers include ETTA, which proposes a recursive updating module for dynamic embedding updates at test time; IAD-R1, which introduces a two-stage training strategy to reinforce consistent reasoning in anomaly detection; and the Architectural Co-Design framework, which decouples representation learning from dynamic feature fusion for zero-shot anomaly detection in CLIP. Complementary directions include cache-based enhancement of test-time adaptation and IADGPT, a unified large vision-language model for few-shot anomaly detection, localization, and reasoning via in-context learning. Together, these advances stand to substantially improve the performance of vision-language models on industrial anomaly detection tasks.
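To make the idea of dynamic embedding updates concrete, the sketch below illustrates one common pattern behind cache-style test-time adaptation of a CLIP-like model: class text embeddings are refined with an exponential moving average of confidently classified test features. This is a minimal, hypothetical illustration; the function names, the confidence threshold, and the EMA scheme are assumptions for exposition, not the actual ETTA or Adaptive Cache Enhancement algorithms.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adapt_embeddings(class_embeddings, test_features,
                     momentum=0.99, conf_threshold=0.6, temperature=0.01):
    """Refine class text embeddings with an EMA over confident test features.

    class_embeddings: (C, D) unit-norm text embeddings, one per class.
    test_features:    (N, D) unit-norm image features arriving at test time.
    Returns the updated (C, D) embeddings and the predicted label per sample.
    """
    emb = l2_normalize(class_embeddings.copy())
    preds = []
    for f in test_features:
        logits = emb @ f / temperature      # scaled cosine similarities
        probs = softmax(logits)
        c = int(probs.argmax())
        preds.append(c)
        # Only confident predictions update the cached class embedding,
        # so noisy test samples do not drift the class prototypes.
        if probs[c] >= conf_threshold:
            emb[c] = l2_normalize(momentum * emb[c] + (1 - momentum) * f)
    return emb, np.array(preds)

# Toy usage: 3 classes, 16-d embeddings, 10 streaming test features.
rng = np.random.default_rng(0)
classes = l2_normalize(rng.normal(size=(3, 16)))
stream = l2_normalize(rng.normal(size=(10, 16)))
updated, labels = adapt_embeddings(classes, stream)
print(labels)
```

The design choice worth noting is that adaptation happens purely at inference: no gradients or labels are needed, only a running cache of embeddings, which is what makes such methods cheap enough to run per test sample.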

Sources

ETTA: Efficient Test-Time Adaptation for Vision-Language Models through Dynamic Embedding Updates

Adaptive Cache Enhancement for Test-Time Adaptation of Vision-Language Models

Architectural Co-Design for Zero-Shot Anomaly Detection: Decoupling Representation and Dynamically Fusing Features in CLIP

IAD-R1: Reinforcing Consistent Reasoning in Industrial Anomaly Detection

IADGPT: Unified LVLM for Few-Shot Industrial Anomaly Detection, Localization, and Reasoning via In-Context Learning
