Advances in Large Language and Vision-Language Models

The field of large language and vision-language models is evolving rapidly, with growing attention to the stability, interpretability, and reliability of these models. A central concern is hallucination, where a model produces output that is not grounded in its input or in fact. Proposed mitigations include attention-based mechanisms, bias correction, and unlearning techniques, all aimed at making these models more dependable and trustworthy. Notable papers in this area include the following.

Watch the Weights introduces a method for monitoring and controlling fine-tuned language models by interpreting their weights rather than their activations.

EMA Without the Lag proposes a bias-corrected iterate averaging scheme that improves the stability of language model fine-tuning (a sketch follows this list).

MIHBench and SAVER benchmark and mitigate hallucinations in multimodal large language models.

IKOD mitigates visual attention degradation in large vision-language models, while AttnTrace provides attention-based context traceback for long-context LLMs.

Analyzing and Mitigating Object Hallucination proposes an efficient, lightweight unlearning method that reduces object hallucination by unlearning training bias.
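To make the bias-correction idea concrete, below is a minimal sketch of a bias-corrected weight EMA in the spirit of EMA Without the Lag. The function name, decay value, and update form are illustrative assumptions, not the paper's exact scheme; they show the standard Adam-style correction that removes the lag a plain EMA exhibits early in fine-tuning.

```python
# Minimal sketch (assumed, not the paper's exact scheme) of a bias-corrected
# exponential moving average of model weights.

def ema_update(ema_params, params, step, decay=0.999):
    """Update the running EMA in place and return a bias-corrected copy.

    ema_params, params: dicts mapping parameter names to floats or tensors,
                        with ema_params initialized to zeros.
    step: 1-indexed count of updates performed so far (including this one).
    """
    corrected = {}
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
        # Dividing by (1 - decay**step) undoes the zero-initialization bias,
        # so the average tracks recent iterates without lagging behind.
        corrected[name] = ema_params[name] / (1.0 - decay ** step)
    return corrected


# Example with plain floats standing in for weight tensors:
ema = {"w": 0.0}
for t, w in enumerate([1.0, 1.1, 0.9], start=1):
    averaged = ema_update(ema, {"w": w}, step=t)
```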

Sources

Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs

EMA Without the Lag: Bias-Corrected Iterate Averaging Schemes

MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models

SAVER: Mitigating Hallucinations in Large Vision-Language Models via Style-Aware Visual Early Revision

IKOD: Mitigating Visual Attention Degradation in Large Vision-Language Models

AttnTrace: Attention-based Context Traceback for Long-Context LLMs

Analyzing and Mitigating Object Hallucination: A Training Bias Perspective
