Advancements in Evaluating and Improving Language Models

The field of natural language processing is moving toward a deeper account of how much of the input language models genuinely understand. Researchers are developing information-theoretic evaluation methods, centered on mutual information, to measure how models process and preserve input information and to guide fine-tuning that strengthens this ability (a toy sketch of the underlying mutual-information estimate follows the paper list below). Noteworthy papers in this area include:

  • Rethinking the Understanding Ability across LLMs through Mutual Information, which proposes a novel framework for evaluating language models' understanding ability using mutual information.
  • Demystifying Reasoning Dynamics with Mutual Information, which investigates the reasoning dynamics of large reasoning models from an information-theoretic perspective, identifies thinking tokens as peaks of mutual information during reasoning, and proposes methods to improve reasoning performance.
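
Neither paper's estimator is reproduced here; as a rough intuition for the quantity these evaluations are built around, the sketch below estimates the mutual information I(X; Z) between a scalar input feature X and a scalar feature Z of a model representation, using a simple histogram estimator on toy data. The variable names, the binning choice, and the synthetic "faithful"/"lossy" representations are illustrative assumptions, not the cited papers' method.

```python
# Toy sketch: histogram-based estimate of I(X; Z) in nats.
# X stands in for an input-side feature, Z for a (projected) model representation.
import numpy as np

def mutual_information(x, z, bins=16):
    """Estimate I(X; Z) from paired 1-D samples via 2-D histogram binning."""
    joint, _, _ = np.histogram2d(x, z, bins=bins)
    p_xz = joint / joint.sum()                 # joint distribution p(x, z)
    p_x = p_xz.sum(axis=1, keepdims=True)      # marginal p(x), shape (bins, 1)
    p_z = p_xz.sum(axis=0, keepdims=True)      # marginal p(z), shape (1, bins)
    mask = p_xz > 0                            # skip empty cells to avoid log(0)
    return float(np.sum(p_xz[mask] * np.log(p_xz[mask] / (p_x @ p_z)[mask])))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=5000)                       # stand-in for an input feature
    z_faithful = x + 0.1 * rng.normal(size=5000)    # representation that preserves x
    z_lossy = 0.2 * x + rng.normal(size=5000)       # representation that discards most of x
    print("MI(x, z_faithful):", mutual_information(x, z_faithful))
    print("MI(x, z_lossy):   ", mutual_information(x, z_lossy))
```

On this toy data the "faithful" representation yields a clearly higher mutual-information estimate than the "lossy" one, which is the kind of contrast the information-theoretic evaluations above are designed to surface.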

Sources

Rethinking the Understanding Ability across LLMs through Mutual Information

Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning

Learning to Insert [PAUSE] Tokens for Better Reasoning

Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence
