Advances in Language Modeling and Complexity

The field of language modeling is moving towards more efficient and scalable architectures, alongside renewed attention to linguistic complexity, in particular the trade-off between lexicon size and average morphosyntactic complexity. Recent work has highlighted the importance of incorporating regularity into studies of language optimality and has proposed new measures of regularity and processing complexity based on the Minimum Description Length (MDL) approach.
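
As a rough illustration of the MDL idea, the toy score below splits the description length into the bits needed to write down the lexicon and the bits needed to encode expressions with it. The encoding scheme is a simplified assumption chosen only to make the lexicon-size/complexity trade-off concrete; it is not the measure proposed in the paper.

```python
# Toy two-part MDL score: total bits = cost of the lexicon + cost of the
# data given the lexicon. The per-symbol and per-item costs below are
# simplifying assumptions made only for illustration.
import math

def lexicon_cost(lexicon, alphabet_size=26):
    # Bits to spell out every lexical item, symbol by symbol.
    return sum(len(w) * math.log2(alphabet_size) for w in lexicon)

def data_cost(expressions, lexicon):
    # Bits to encode each expression as a sequence of lexicon indices.
    bits_per_item = math.log2(len(lexicon))
    return sum(len(expr) * bits_per_item for expr in expressions)

# A recursive numeral-like system: a small lexicon paid for by longer,
# but highly regular, expressions.
lexicon = ["one", "two", "three", "ten", "hundred"]
expressions = [["three", "ten", "two"],      # "thirty-two"
               ["one", "hundred", "ten"]]    # "one hundred ten"
total = lexicon_cost(lexicon) + data_cost(expressions, lexicon)
print(f"description length: {total:.1f} bits")
```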

Another key direction is the development of new attention mechanisms, such as Higher-order Linear Attention, which realizes higher-order interactions through compact prefix sufficient statistics. This has the potential to improve the expressivity of autoregressive language models while preserving the efficiency of linear attention.
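
The sketch below illustrates the general idea of prefix sufficient statistics: causal linear attention keeps a running sum of key-value outer products (a first-order statistic), and a higher-order variant could additionally accumulate a running tensor over pairs of keys. The feature map and the extra accumulator here are illustrative assumptions, not the exact construction in the Higher-order Linear Attention paper.

```python
# Causal linear attention with constant-size prefix statistics (numpy sketch).
# The first-order statistics below are standard linear attention; the extra
# accumulator is only an example of what a "higher-order" prefix statistic
# could look like.
import numpy as np

def feature_map(x):
    # A common positive feature map (elu(x) + 1).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Maintains S_t = sum_{j<=t} phi(k_j) v_j^T and z_t = sum_{j<=t} phi(k_j),
    so every step costs O(d * d_v) regardless of sequence length."""
    T, d = Q.shape
    phi_Q, phi_K = feature_map(Q), feature_map(K)
    S = np.zeros((d, V.shape[1]))   # running key-value outer products
    z = np.zeros(d)                 # running key sum (normalizer)
    out = np.zeros_like(V)
    for t in range(T):
        S += np.outer(phi_K[t], V[t])
        z += phi_K[t]
        out[t] = (phi_Q[t] @ S) / (phi_Q[t] @ z + 1e-6)
    return out

def higher_order_prefix(K, V):
    """Illustrative higher-order statistic: a running tensor
    sum_{j<=t} k_j (x) k_j (x) v_j that a query could contract against
    to capture pairwise key interactions."""
    d, d_v = K.shape[1], V.shape[1]
    S2 = np.zeros((d, d, d_v))
    for t in range(K.shape[0]):
        S2 += np.einsum('i,j,v->ijv', K[t], K[t], V[t])
    return S2

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(8, 4)), rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
print(linear_attention(Q, K, V).shape)   # (8, 4)
print(higher_order_prefix(K, V).shape)   # (4, 4, 4)
```

Because the prefix statistics have a fixed size, the per-token compute and memory stay constant in the sequence length, which is what keeps the mechanism scalable.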

Additionally, there is growing interest in continuous autoregressive language models, which model language as a sequence of continuous vectors rather than discrete tokens, so that each generative step predicts a vector representing a chunk of tokens instead of a single token. This paradigm shift has been shown to improve the performance-compute trade-off and achieve state-of-the-art results at a lower computational cost.
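
A minimal sketch of this idea, assuming a chunk autoencoder that maps K tokens to one latent vector and a simple regression objective for the autoregressive model; the module choices, sizes, and loss are illustrative assumptions, not the paper's actual design.

```python
# Sketch of a continuous autoregressive LM: an autoencoder compresses each
# chunk of K tokens into one continuous vector, and an autoregressive model
# predicts the next vector from the prefix of vectors.
import torch
import torch.nn as nn

VOCAB, K, D = 1000, 4, 64   # vocabulary size, tokens per chunk, latent dim

class ChunkAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D)
        self.enc = nn.Linear(K * D, D)       # K token embeddings -> one vector
        self.dec = nn.Linear(D, K * VOCAB)   # one vector -> K sets of logits

    def encode(self, chunks):                # chunks: (B, N, K) token ids
        e = self.embed(chunks)               # (B, N, K, D)
        return self.enc(e.flatten(-2))       # (B, N, D) continuous sequence

    def decode(self, z):                     # z: (B, N, D)
        return self.dec(z).view(*z.shape[:-1], K, VOCAB)

class ContinuousAR(nn.Module):
    """Predicts the next chunk vector from the preceding chunk vectors."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(D, D, batch_first=True)
        self.head = nn.Linear(D, D)

    def forward(self, z):                    # z: (B, N, D)
        h, _ = self.rnn(z)
        return self.head(h)

ae, lm = ChunkAutoencoder(), ContinuousAR()
tokens = torch.randint(0, VOCAB, (2, 8, K))            # 8 chunks of K tokens
z = ae.encode(tokens)                                  # (2, 8, D)
pred = lm(z[:, :-1])                                   # predict chunks 2..8
loss = nn.functional.mse_loss(pred, z[:, 1:].detach()) # regression surrogate
print(z.shape, loss.item())
```

Predicting one vector per step in place of K discrete tokens reduces the number of autoregressive steps, which is where the improved performance-compute trade-off comes from.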

Noteworthy papers include:

  • The introduction of Higher-order Linear Attention, which provides a principled and scalable building block for autoregressive language models.
  • The development of Continuous Autoregressive Language Models, which have been shown to significantly improve the performance-compute trade-off.
  • The proposal of a semantic information theory for large language models, which provides a theoretical framework for understanding the information-theoretic principles behind these models.

Sources

Recursive numeral systems are highly regular and easy to process

Higher-order Linear Attention

Continuous Autoregressive Language Models

Forget BIT, It is All about TOKEN: Towards Semantic Information Theory for LLMs

Confounding Factors in Relating Model Performance to Morphology

State Complexity of Multiple Concatenation
