Efficient Sequence Modeling and Large Language Models

The field of natural language processing is moving towards more efficient sequence modeling and large language models. Recent work focuses on reducing the computational complexity and memory requirements of these models while maintaining or improving their performance. One key direction is linear and hybrid-linear attention architectures, which can substantially reduce the cost of sequence modeling. Another is novel training methods, such as gradient-based early stopping and bidirectional reconstruction, which improve the efficiency and effectiveness of large language models. There is also growing interest in brain-inspired models, such as SpikingBrain, as a more efficient and scalable alternative to standard transformer-based models. Noteworthy papers include TConstFormer, which achieves constant-time transformer attention with O(1) computation and an O(1) KV cache during autoregressive inference, and SCOUT, a hybrid architecture that compresses tokens locally within fixed-size segments and applies attention only over these compressed representations (a minimal sketch of this idea follows below). Overall, the field is converging on models that are efficient, scalable, and effective on long sequences and complex tasks.
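
To make the segment-compression idea concrete, here is a minimal sketch in the spirit of SCOUT, assuming mean pooling as a stand-in for whatever learned compressor the paper actually uses: tokens are pooled within fixed-size segments, attention runs only over the much shorter compressed sequence, and the segment-level context is broadcast back to the tokens. All class and parameter names here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of segment-compression attention (SCOUT-style).
# Cost drops from O(N^2) to roughly O((N/S)^2) for segment size S.
import torch
import torch.nn as nn


class SegmentCompressionAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, segment_size: int):
        super().__init__()
        self.segment_size = segment_size
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); pad so seq_len divides into segments.
        b, n, d = x.shape
        s = self.segment_size
        pad = (-n) % s
        if pad:
            x = torch.cat([x, x.new_zeros(b, pad, d)], dim=1)
        # Compress each fixed-size segment to a single vector
        # (mean pooling is an assumption standing in for a learned compressor).
        segments = x.view(b, -1, s, d).mean(dim=2)            # (b, n/s, d)
        # Full attention only over the compressed sequence.
        ctx, _ = self.attn(segments, segments, segments)      # (b, n/s, d)
        # Broadcast segment-level context back to token positions and mix in.
        ctx_tokens = ctx.repeat_interleave(s, dim=1)[:, :n]   # (b, n, d)
        return x[:, :n] + ctx_tokens


if __name__ == "__main__":
    layer = SegmentCompressionAttention(d_model=64, n_heads=4, segment_size=16)
    out = layer(torch.randn(2, 100, 64))
    print(out.shape)  # torch.Size([2, 100, 64])
```

The design choice to illustrate is the trade-off these sub-quadratic methods share: attention is exact only over compressed representations, so per-token detail must be recovered by a local mechanism (here, a simple residual mix-in), which is where the hybrid architectures in the sources differ.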

Sources

Yet Unnoticed in LSTM: Binary Tree Based Input Reordering, Weight Regularization, and Gate Nonlinearization

Scaling Legal AI: Benchmarking Mamba and Transformers for Statutory Classification and Case Law Retrieval

From TLinFormer to TConstFormer: The Leap to Constant-Time Transformer Attention: Achieving O(1) Computation and O(1) KV Cache during Autoregressive Inference

Chunked TabPFN: Exact Training-Free In-Context Learning for Long-Context Tabular Data

Memory Limitations of Prompt Tuning in Transformers

Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling

Supervised In-Context Fine-Tuning for Generative Sequence Labeling

SCOUT: Toward Sub-Quadratic Attention via Segment Compression for Optimized Utility in Transformers

DTRNet: Dynamic Token Routing Network to Reduce Quadratic Costs in Transformers

Efficient Large Language Models with Zero-Shot Adjustable Acceleration

Iterative In-Context Learning to Enhance LLMs Abstract Reasoning: The Case-Study of Algebraic Tasks

GradES: Significantly Faster Training in Transformers with Gradient-Based Early Stopping

Training LLMs to be Better Text Embedders through Bidirectional Reconstruction

Just-in-time and distributed task representations in language models

SpikingBrain Technical Report: Spiking Brain-inspired Large Models

Accelerating Large Language Model Inference via Early-Exiting Algorithms

MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML