Advances in Efficient and Interpretable Language Modeling

The field of natural language processing is moving towards more efficient and interpretable language models. Researchers are exploring alternatives to traditional self-attention, such as adaptive two-sided Laplace transforms, bidirectional recurrent attention layers, and spectral dictionary token mixers, which aim to match state-of-the-art quality while reducing computational complexity. Another trend is bitrate-efficient and noise-robust speech processing, including variable bitrate residual vector quantization for speech coding and techniques for accelerating neural speech transcription. There is also growing interest in analyzing the underlying mechanisms of transformer-based language models, for example with tools from free probability theory. Noteworthy papers in this area include:

  • Adaptive Two Sided Laplace Transforms, which proposes a learnable, interpretable, and scalable replacement for self-attention (a minimal sketch of the attention-free mixing idea follows this list).
  • Early Attentive Sparsification Accelerates Neural Speech Transcription, which achieves its speedup by sparsifying hidden states early in the model.
  • Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ, which introduces a variable bitrate residual vector quantization framework for noise-robust speech coding (see the RVQ sketch after this list).
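To make the attention-replacement idea more concrete, here is a minimal sketch of an attention-free token mixer that blends each token with its neighbours through a learnable two-sided exponential-decay kernel, so the cost grows linearly rather than quadratically with sequence length. The module name, window size, and decay parameterization are illustrative assumptions, not the formulation used in the Adaptive Two Sided Laplace Transforms paper.

```python
import torch
import torch.nn as nn


class DecayKernelMixer(nn.Module):
    """Attention-free token mixer (illustrative sketch).

    Each token is blended with its neighbours through a learnable
    two-sided exponential-decay kernel, giving O(n * window) cost
    instead of the O(n^2) of softmax attention.
    """

    def __init__(self, dim: int, window: int = 64):
        super().__init__()
        self.window = window
        # One learnable decay rate per channel for left and right context.
        self.log_decay_left = nn.Parameter(torch.zeros(dim))
        self.log_decay_right = nn.Parameter(torch.zeros(dim))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        b, n, d = x.shape
        offsets = torch.arange(1, self.window + 1, device=x.device, dtype=x.dtype)
        # Per-channel decay weights for each offset, shape (window, dim).
        left = torch.exp(-offsets[:, None] * self.log_decay_left.exp())
        right = torch.exp(-offsets[:, None] * self.log_decay_right.exp())

        mixed = x
        for k in range(1, min(self.window, n - 1) + 1):
            pad = x.new_zeros(b, k, d)
            from_left = torch.cat([pad, x[:, :-k, :]], dim=1)   # token k steps behind
            from_right = torch.cat([x[:, k:, :], pad], dim=1)   # token k steps ahead
            mixed = mixed + left[k - 1] * from_left + right[k - 1] * from_right
        return self.proj(mixed)
```

The point the sketch shares with the cited line of work is that token interactions are governed by a small set of learnable decay parameters rather than by pairwise attention scores.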
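The speech-coding bullet rests on residual vector quantization: each stage quantizes the residual left by the previous stage, and a variable bitrate follows from using fewer stages when the residual is already small. The sketch below illustrates this with a simple residual-energy early exit; the threshold rule is an assumption standing in for whatever bitrate-allocation strategy the cited framework actually uses.

```python
import torch


def rvq_encode(x, codebooks, energy_thresh=1e-3):
    """Greedy residual vector quantization with a simple early exit.

    x:          (batch, dim) frames to quantize.
    codebooks:  list of (codebook_size, dim) tensors, one per stage.
    Returns the per-frame code indices that were used and the reconstruction.
    """
    residual = x.clone()
    recon = torch.zeros_like(x)
    codes = []
    for cb in codebooks:
        dists = torch.cdist(residual, cb)   # (batch, codebook_size) pairwise distances
        idx = dists.argmin(dim=1)           # nearest codeword per frame
        chosen = cb[idx]
        recon = recon + chosen
        residual = residual - chosen
        codes.append(idx)
        # Variable bitrate: skip the remaining stages once the residual is negligible.
        if residual.pow(2).mean() < energy_thresh:
            break
    return torch.stack(codes, dim=1), recon


# Usage: 8 stages of 256-entry codebooks over 64-dimensional frames.
codebooks = [torch.randn(256, 64) for _ in range(8)]
codes, recon = rvq_encode(torch.randn(4, 64), codebooks)
```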

Sources

Adaptive Two Sided Laplace Transforms: A Learnable, Interpretable, and Scalable Replacement for Self-Attention

Early Attentive Sparsification Accelerates Neural Speech Transcription

Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ

A Free Probabilistic Framework for Analyzing the Transformer-based Language Models

LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization

Deep generative models as the probability transformation functions

From Pixels and Words to Waves: A Unified Framework for Spectral Dictionary vLLMs

Accurate, fast, cheap: Choose three. Replacing Multi-Head-Attention with Bidirectional Recurrent Attention for Long-Form ASR

CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment

Distilling Normalizing Flows
