The field of vision-language models is increasingly focused on hallucinations, a critical issue that undermines the reliability and real-world applicability of these models. Researchers are actively exploring mitigation strategies, including test-time adaptation frameworks, reinforcement learning, and hallucination detection systems. A key goal is to develop robust, efficient methods for identifying and reducing hallucinations so that vision-language models can be deployed safely and securely. Noteworthy papers in this area include:
- Transferable Adversarial Attacks on Black-Box Vision-Language Models, which demonstrates the vulnerability of proprietary vision-language models to targeted adversarial attacks.
- A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models, which analyzes visual object hallucination and proposes methods to mitigate it (a minimal object-grounding check in this spirit is sketched after this list).
- Mitigating Image Captioning Hallucinations in Vision-Language Models, which presents a test-time adaptation framework that uses reinforcement learning to reduce captioning hallucinations (see the second sketch below).
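
To make the notion of object hallucination concrete, the following is a minimal sketch of a CHAIR-style grounding check: it compares the objects a caption mentions against the objects detected in the image. The vocabulary, caption, and detected-object set are invented for illustration and are not taken from any of the papers above.

```python
# Minimal object-hallucination check: flag vocabulary objects that the caption
# mentions but that were not detected in the image. Illustrative only.

def hallucinated_objects(caption: str, detected: set[str], vocab: set[str]) -> set[str]:
    """Vocabulary objects the caption mentions that were not detected in the image."""
    mentioned = {w.strip(".,").lower() for w in caption.split()} & vocab
    return mentioned - detected

vocab = {"dog", "cat", "frisbee", "car", "person"}   # objects we know how to check
detected = {"dog", "person"}                         # e.g. from an object detector
caption = "A dog catches a frisbee while a person watches."

print(hallucinated_objects(caption, detected, vocab))  # -> {'frisbee'}
```

Real detection systems use far richer matching (synonyms, segmentation, captioner confidence), but the core comparison is the same: claimed objects versus grounded objects.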
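The second sketch illustrates the general idea behind reinforcement-learning-based test-time adaptation: sample an output, score it with a reward that penalizes ungrounded objects, and update parameters with a REINFORCE-style gradient. This is a toy policy over a handful of candidate captions, assuming PyTorch; a real system would adapt (part of) a VLM's decoder, and nothing here reproduces the paper's actual method.

```python
# Toy test-time adaptation loop with a REINFORCE-style update that penalizes
# hallucinated objects. The candidates, vocabulary, and reward are assumptions
# made purely for illustration.
import torch

candidates = [
    "A dog catches a frisbee in the park.",  # mentions objects not detected in the image
    "A dog runs beside a person.",           # fully grounded caption
]
detected = {"dog", "person"}                  # e.g. output of an object detector
vocab = {"dog", "person", "frisbee", "park"}  # objects we know how to check

def hallucination_penalty(caption: str) -> float:
    """Negative count of vocabulary objects mentioned but not detected (0 = grounded)."""
    words = {w.strip(".,").lower() for w in caption.split()}
    return -float(len((words & vocab) - detected))

logits = torch.zeros(len(candidates), requires_grad=True)  # test-time-adapted parameters
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(100):                          # adaptation loop for a single test image
    dist = torch.distributions.Categorical(logits=logits)
    idx = dist.sample()
    reward = hallucination_penalty(candidates[idx])
    loss = -dist.log_prob(idx) * reward       # REINFORCE: push down hallucinating captions
    opt.zero_grad()
    loss.backward()
    opt.step()

# After adaptation the grounded caption typically has the highest probability.
print(candidates[int(torch.argmax(logits))])
```

The design point this illustrates is that the reward needs no labels at test time: it is computed from the model's own output and an automatic grounding signal, which is what makes adaptation on individual test images feasible.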