Advances in Large Language and Vision-Language Models

The field of large language and vision-language models is evolving rapidly, with growing attention to the stability, interpretability, and reliability of these models. A central concern is hallucination, where a model produces output that is not grounded in its input or in fact. Proposed mitigations include attention-based mechanisms, bias correction, and unlearning techniques, all aimed at making these models more dependable and trustworthy. Notable papers in this area include the following.

Watch the Weights introduces a method for monitoring and controlling fine-tuned language models by interpreting their weights rather than their activations.

EMA Without the Lag proposes a bias-corrected iterate averaging scheme that improves the stability of language model fine-tuning (a sketch follows this list).

MIHBench and SAVER benchmark and mitigate hallucinations in multimodal large language models.

IKOD mitigates visual attention degradation in large vision-language models, while AttnTrace provides attention-based context traceback for long-context LLMs.

Analyzing and Mitigating Object Hallucination proposes an efficient, lightweight unlearning method that reduces object hallucination by unlearning training bias.
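To make the bias-correction idea concrete, below is a minimal sketch of a bias-corrected weight EMA in the spirit of EMA Without the Lag. The function name, decay value, and update form are illustrative assumptions, not the paper's exact scheme; they show the standard Adam-style correction that removes the lag a plain EMA exhibits early in fine-tuning.

```python
# Minimal sketch (assumed, not the paper's exact scheme) of a bias-corrected
# exponential moving average of model weights.

def ema_update(ema_params, params, step, decay=0.999):
    """Update the running EMA in place and return a bias-corrected copy.

    ema_params, params: dicts mapping parameter names to floats or tensors,
                        with ema_params initialized to zeros.
    step: 1-indexed count of updates performed so far (including this one).
    """
    corrected = {}
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
        # Dividing by (1 - decay**step) undoes the zero-initialization bias,
        # so the average tracks recent iterates without lagging behind.
        corrected[name] = ema_params[name] / (1.0 - decay ** step)
    return corrected


# Example with plain floats standing in for weight tensors:
ema = {"w": 0.0}
for t, w in enumerate([1.0, 1.1, 0.9], start=1):
    averaged = ema_update(ema, {"w": w}, step=t)
```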

Sources

Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs

EMA Without the Lag: Bias-Corrected Iterate Averaging Schemes

MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models

SAVER: Mitigating Hallucinations in Large Vision-Language Models via Style-Aware Visual Early Revision

IKOD: Mitigating Visual Attention Degradation in Large Vision-Language Models

AttnTrace: Attention-based Context Traceback for Long-Context LLMs

Analyzing and Mitigating Object Hallucination: A Training Bias Perspective
