Advances in Interpreting and Improving Large Language Models

The field of large language models (LLMs) is advancing rapidly, with growing emphasis on interpretability and reliability. Researchers are developing new tools and methods to investigate the computational processes inside LLMs, such as visualizing internal states and information flow. There is also a growing effort to mitigate hallucinations, using techniques such as entropy analysis of hidden representations, extracting visual facts from intermediate layers, and surfacing variations across model outputs. Another line of work explores LLMs in multimodal tasks, including video anomaly detection and image description generation. Finally, researchers are simplifying Transformer training, for example by enabling training with momentum SGD rather than adaptive optimizers. Noteworthy papers in this area include:

  • InTraVisTo, which introduces a visualization tool for investigating the internal state of Transformer models.
  • ICR Probe, which proposes a novel metric for detecting hallucinations in LLMs by analyzing the hidden state update process.
  • HiProbe-VAD, which leverages pre-trained Multimodal Large Language Models for video anomaly detection without requiring fine-tuning.

These developments have the potential to significantly improve the performance and reliability of LLMs and pave the way for their application to a wide range of tasks.
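The entropy-analysis idea mentioned above can be illustrated with a minimal sketch (the helper names below are hypothetical, not from any of the listed papers): given the per-step logits of a generation, the Shannon entropy of each softmax distribution measures how uncertain the model is at that step, and unusually high-entropy steps can be flagged as candidate hallucination sites.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over one logit vector."""
    shifted = logits - np.max(logits)              # subtract max for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return float(-np.sum(probs * np.log(probs + 1e-12)))

def flag_uncertain_steps(logit_matrix, threshold=2.0):
    """Return (indices of generation steps whose entropy exceeds threshold, all entropies)."""
    entropies = [token_entropy(row) for row in logit_matrix]
    return [i for i, h in enumerate(entropies) if h > threshold], entropies
```

A uniform distribution over a vocabulary of size V has the maximum entropy ln(V), while a sharply peaked distribution has entropy near zero; the threshold here is an illustrative constant, not a value taken from the papers.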
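The tuning-free probing idea can likewise be sketched in toy form: the large model stays frozen, and only a small linear probe is trained on its hidden states to separate normal from anomalous inputs. Here synthetic Gaussian vectors stand in for real hidden states, and the probe is a from-scratch logistic regression; none of this reflects HiProbe-VAD's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for frozen-model hidden states: "normal" and "anomalous"
# inputs produce vectors clustered around different means.
normal = rng.normal(0.0, 1.0, size=(200, 16))
anomalous = rng.normal(1.5, 1.0, size=(200, 16))
X = np.vstack([normal, anomalous])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Lightweight linear probe: logistic regression trained by gradient descent.
# The backbone producing X is never updated.
w = np.zeros(16)
b = 0.0
for _ in range(500):
    z = np.clip(X @ w + b, -30.0, 30.0)        # clip to avoid overflow in exp
    p = 1.0 / (1.0 + np.exp(-z))               # sigmoid probabilities
    grad_w = X.T @ (p - y) / len(y)            # gradient of mean log-loss
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

accuracy = np.mean(((X @ w + b) > 0) == y)     # training accuracy of the probe
```

Because only the 17 probe parameters are trained, this kind of readout is cheap compared with fine-tuning the backbone, which is the appeal of tuning-free probing approaches.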

Sources

InTraVisTo: Inside Transformer Visualisation Tool

Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution

Probing Information Distribution in Transformer Architectures through Entropy Analysis

Extracting Visual Facts from Intermediate Layers for Mitigating Hallucinations in Multimodal Large Language Models

Surfacing Variations to Calibrate Perceived Reliability of MLLM-generated Image Descriptions

Foundation Models and Transformers for Anomaly Detection: A Survey

ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs

HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs

DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD

Explainable Mapper: Charting LLM Embedding Spaces Using Perturbation-Based Explanation and Verification Agents
