The field of large language models (LLMs) is moving towards more efficient and effective solutions for long-context tasks. Recent research has focused on closing the performance gap between recurrent LLMs, which have linear computational complexity, and self-attention-based LLMs, which have quadratic complexity. This has led to novel methods such as chunk-wise inference and basic reading distillation that enable recurrent or smaller LLMs to process long contexts more effectively. There is also growing interest in constructing long contexts for instruction tuning, with approaches like effortless context construction and multi-level foveation showing promise. Together, these advances have the potential to significantly improve LLM performance on long-context tasks while reducing computational demands. Noteworthy papers include:
- Smooth Reading, which proposes a chunk-wise inference method that substantially narrows the performance gap between recurrent and self-attention LLMs on long-context tasks (see the sketch after this list).
- Basic Reading Distillation, which trains a small model to imitate the basic reading behaviors of LLMs and achieves performance comparable to much larger LLMs.
- Flora, which introduces an effortless long-context construction strategy that enhances the long-context performance of LLMs.
- NeedleChain, which proposes a novel benchmark for measuring the intact long-context reasoning capability of LLMs.
- Self-Foveate, which enhances the diversity and difficulty of instructions synthesized from unsupervised text via multi-level foveation.
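
To make the chunk-wise idea concrete, below is a minimal sketch of chunk-wise inference with a recurrent backbone. It illustrates the general technique rather than the Smooth Reading method itself: a plain PyTorch GRU stands in for a recurrent LLM, and all names and sizes (`chunkwise_forward`, `chunk_len`, the 4096-token input) are hypothetical. The key point is that the recurrent state is carried across chunks, so per-chunk cost stays constant and total cost grows linearly with context length.

```python
# Illustrative sketch only: a GRU stands in for a recurrent (linear-complexity) LLM.
import torch
import torch.nn as nn

hidden_size, vocab_size, chunk_len = 256, 1000, 128

embed = nn.Embedding(vocab_size, hidden_size)
backbone = nn.GRU(hidden_size, hidden_size, batch_first=True)  # recurrent stand-in
head = nn.Linear(hidden_size, vocab_size)

def chunkwise_forward(token_ids: torch.Tensor) -> torch.Tensor:
    """Process a long token sequence chunk by chunk, carrying recurrent state."""
    state = None        # recurrent state persists across chunk boundaries
    logits = []
    for start in range(0, token_ids.size(1), chunk_len):
        chunk = token_ids[:, start:start + chunk_len]
        hidden, state = backbone(embed(chunk), state)  # cost bounded by chunk_len
        logits.append(head(hidden))
    return torch.cat(logits, dim=1)

# Usage: a 4096-token "long context" processed in 128-token chunks.
long_input = torch.randint(0, vocab_size, (1, 4096))
with torch.no_grad():
    out = chunkwise_forward(long_input)
print(out.shape)  # torch.Size([1, 4096, 1000])
```

Because only one chunk is materialized at a time and the state summarizes everything seen so far, memory stays bounded by the chunk size rather than the full context length, which is what lets recurrent models scale to inputs that would be costly for quadratic self-attention.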