The field of sequence modeling is moving toward improving the recall and in-context learning abilities of recurrent neural networks (RNNs) and hybrid architectures. Researchers are exploring methods such as post-training state expansion and stochastic window sizes to help RNNs handle long contexts and generalize better. There is also growing interest in applying alternative architectures, such as xLSTMs and diffusion models, to sequence labeling tasks, and in hybrid models that combine the strengths of RNNs and Transformers to improve length generalization and state tracking.

Notable papers in this area include StateX, which introduces a training pipeline for efficiently expanding the states of pre-trained RNNs, and SWAX, which proposes a hybrid architecture consisting of sliding-window attention and xLSTM linear RNN layers. Delayed Attention Training Improves Length Generalization in Transformer-RNN Hybrids is also noteworthy: it proposes a simple yet effective training strategy that mitigates the shortcut reliance of the Transformer components in hybrid models.
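To make the state-expansion idea concrete, here is a minimal sketch of widening the recurrent state of a pre-trained gated linear-recurrence layer and copying the old weights into the corresponding slice, so the expanded layer initially computes the same function before further training. The layer definition, dimensions, and initialization scheme are illustrative assumptions; this is not the StateX pipeline.

```python
# Sketch: expand the recurrent state of a pre-trained linear RNN layer.
# The LinearRNN layer and the zero-initialization of the new state
# dimensions are illustrative assumptions, not the StateX method.
import torch
import torch.nn as nn


class LinearRNN(nn.Module):
    """Element-wise gated linear recurrence with state size `d_state`."""

    def __init__(self, d_model: int, d_state: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_state)  # input and forget gate
        self.out_proj = nn.Linear(d_state, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inp, forget = self.in_proj(x).chunk(2, dim=-1)
        forget = torch.sigmoid(forget)
        state = torch.zeros(x.size(0), inp.size(-1), device=x.device)
        outs = []
        for t in range(x.size(1)):  # sequential scan for clarity
            state = forget[:, t] * state + inp[:, t]
            outs.append(state)
        return self.out_proj(torch.stack(outs, dim=1))


@torch.no_grad()
def expand_state(layer: LinearRNN, new_d_state: int) -> LinearRNN:
    """Copy a pre-trained layer into one with a larger recurrent state."""
    d_model = layer.out_proj.out_features
    old_d = layer.out_proj.in_features
    new = LinearRNN(d_model, new_d_state)
    # Zero all parameters, then copy the old slices: existing state dimensions
    # behave exactly as before, and the new dimensions start out inert.
    for p in new.parameters():
        p.zero_()
    new.in_proj.weight[:old_d] = layer.in_proj.weight[:old_d]                    # input rows
    new.in_proj.weight[new_d_state:new_d_state + old_d] = layer.in_proj.weight[old_d:]  # forget rows
    new.in_proj.bias[:old_d] = layer.in_proj.bias[:old_d]
    new.in_proj.bias[new_d_state:new_d_state + old_d] = layer.in_proj.bias[old_d:]
    new.out_proj.weight[:, :old_d] = layer.out_proj.weight
    new.out_proj.bias.copy_(layer.out_proj.bias)
    return new
```

The expanded layer can then be fine-tuned so the new state dimensions become useful, which is where a dedicated post-training recipe such as StateX's would come in.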
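The hybrid pattern of alternating sliding-window attention with linear RNN layers can be sketched as below. This is not the SWAX implementation: the gated linear recurrence stands in for an xLSTM block, and the window size, widths, and layer counts are illustrative assumptions.

```python
# Sketch: a hybrid stack that interleaves sliding-window causal attention
# with a gated linear-recurrent layer (a simplified stand-in for xLSTM).
import torch
import torch.nn as nn


def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where True marks positions a query may NOT attend to."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    visible = (j <= i) & (j > i - window)  # only the previous `window` tokens
    return ~visible


class SlidingWindowAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, window: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.window = window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = sliding_window_causal_mask(x.size(1), self.window).to(x.device)
        h = self.norm(x)
        out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        return x + out  # residual connection


class GatedLinearRecurrence(nn.Module):
    """Simplified element-wise linear recurrence (stand-in for an xLSTM layer)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 3 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        forget, inp, gate = self.in_proj(h).chunk(3, dim=-1)
        forget = torch.sigmoid(forget)      # per-channel decay in (0, 1)
        state = torch.zeros_like(x[:, 0])   # (batch, d_model) recurrent state
        outs = []
        for t in range(x.size(1)):          # sequential scan for clarity
            state = forget[:, t] * state + (1 - forget[:, t]) * inp[:, t]
            outs.append(state * torch.sigmoid(gate[:, t]))
        return x + self.out_proj(torch.stack(outs, dim=1))  # residual connection


class HybridBlockStack(nn.Module):
    """Alternates attention and recurrent layers, as in attention/RNN hybrids."""

    def __init__(self, d_model=256, n_heads=4, window=128, n_pairs=4):
        super().__init__()
        layers = []
        for _ in range(n_pairs):
            layers.append(SlidingWindowAttention(d_model, n_heads, window))
            layers.append(GatedLinearRecurrence(d_model))
        self.layers = nn.ModuleList(layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x
```

The intended division of labor in such hybrids is that the attention layers handle precise recall within the window while the recurrent layers carry information across arbitrarily long contexts.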
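As a rough illustration of a delayed-attention schedule, the sketch below keeps the attention parameters frozen for an initial number of steps so the recurrent layers learn the task first. This is only an assumption drawn from the paper's title, not its actual recipe; the `"attn"` name check, `delay_steps` threshold, and training loop are hypothetical.

```python
# Sketch: freeze attention parameters early in training, then unfreeze them.
# The name-based selection assumes attention submodules are attributes named
# "attn" (as in the hybrid stack sketched above); adapt as needed.
import torch


def set_attention_trainable(model: torch.nn.Module, trainable: bool) -> None:
    """Toggle gradients for every parameter inside an attention submodule."""
    for name, param in model.named_parameters():
        if "attn" in name:
            param.requires_grad_(trainable)


def train(model, loader, optimizer, loss_fn, delay_steps: int = 1000):
    set_attention_trainable(model, False)         # start with attention frozen
    for step, (inputs, targets) in enumerate(loader):
        if step == delay_steps:
            set_attention_trainable(model, True)  # let attention train later
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
```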