Advances in Language Model Calibration and Evaluation

Research in natural language processing is moving toward a deeper understanding of language model calibration and evaluation. New methods aim to improve the accuracy and fairness of language models, including non-linear scoring models for translation quality evaluation and analyses of entropy calibration in language models. These advances have the potential to improve the performance and reliability of language models across a variety of applications. Noteworthy papers in this area include:

  • On the Entropy Calibration of Language Models, which investigates the scaling behavior of miscalibration in language models and proves that it is theoretically possible to calibrate without tradeoffs.
  • Non-Linear Scoring Model for Translation Quality Evaluation, which presents a calibrated, non-linear scoring model that better reflects how human content consumers perceive translation quality across samples of varying length.
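To make the notion of entropy calibration concrete, here is a minimal sketch. It assumes the common definition that a model is entropy-calibrated when the entropy of its predictive distribution matches its log loss (cross-entropy) on true text; the toy distributions are hypothetical and not taken from the paper.

```python
import math

def entropy(p):
    # Shannon entropy of the model's predictive distribution (in nats)
    return -sum(q * math.log(q) for q in p if q > 0)

def cross_entropy(p, true_dist):
    # Expected log loss of model p measured against the true distribution
    return -sum(t * math.log(q) for t, q in zip(true_dist, p) if t > 0)

model = [0.7, 0.2, 0.1]   # hypothetical next-token distribution
truth = [0.5, 0.3, 0.2]   # hypothetical true next-token distribution

# A calibrated model has gap == 0; a positive gap means the model is
# more confident (lower entropy) than its actual log loss warrants.
gap = cross_entropy(model, truth) - entropy(model)
print(f"entropy calibration gap: {gap:.4f} nats")
```

With these toy numbers the gap is positive (about 0.32 nats), illustrating the overconfidence pattern the paper studies at scale.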

Sources

On the Entropy Calibration of Language Models

Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression

Non-Linear Scoring Model for Translation Quality Evaluation

Quadratic Term Correction on Heaps' Law
