Advances in Efficient Large Language Models

The field of large language models is moving toward more efficient and scalable architectures. Researchers are exploring methods that reduce computational cost and memory footprint while preserving model quality. One key direction is quantization, including ternary weight quantization, which compresses weights to a small discrete set of values to shrink model size and speed up inference. Another active area is scaling laws, which predict model performance from parameter count and compute budget and thereby guide how a training budget should be allocated. Noteworthy papers in this area include 'HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space', which introduces a pruning algorithm for mixture-of-experts architectures; 'Tequila: Trapping-free Ternary Quantization for Large Language Models', which proposes a trapping-free optimization method for ternary quantization; and 'xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity', which examines the scaling behavior of xLSTM models and positions them as a competitive, linear-time alternative to Transformers.
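
As a concrete illustration of what ternary quantization does to a weight tensor, the sketch below maps a float matrix to a single per-tensor scale plus codes in {-1, 0, +1}. This is a minimal, generic TWN-style heuristic (threshold at 0.7 times the mean absolute weight), not the Tequila algorithm described in the paper.

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Approximate w as alpha * t with t in {-1, 0, +1} (generic heuristic)."""
    delta = 0.7 * np.mean(np.abs(w))                        # illustrative threshold
    mask = np.abs(w) > delta                                # entries kept non-zero
    t = (np.sign(w) * mask).astype(np.int8)                 # ternary codes
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0   # per-tensor scale
    return alpha, t

# A 4x4 float32 matrix becomes one float scale plus 16 int8 codes,
# roughly a 4x reduction in weight storage before any bit packing.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
alpha, codes = ternary_quantize(w)
w_hat = alpha * codes                                       # dequantized approximation
print(codes)
print("scale:", alpha, "mean abs error:", np.abs(w - w_hat).mean())
```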
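
On the scaling-law side, such predictions are typically expressed as a parametric loss model in parameter count N and training tokens D. The sketch below evaluates the well-known Chinchilla functional form L(N, D) = E + A/N^alpha + B/D^beta, with coefficients commonly cited from that work; it only illustrates how a fitted law is evaluated and does not reproduce the fits from the papers listed here.

```python
def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.69, A: float = 406.4, B: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Chinchilla-style scaling law: predicted loss from model size and data."""
    return E + A / n_params ** alpha + B / n_tokens ** beta

# Example: compare a 1B-parameter model trained on 20B tokens with a
# 7B-parameter model trained on 140B tokens (same tokens-per-parameter ratio).
print(predicted_loss(1e9, 20e9))
print(predicted_loss(7e9, 140e9))
```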

Sources

Scaling Laws for Neural Material Models

HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space

Tequila: Trapping-free Ternary Quantization for Large Language Models

Evaluating the Robustness of Chinchilla Compute-Optimal Scaling

Pretraining Scaling Laws for Generative Evaluations of Language Models

A multiscale analysis of mean-field transformers in the moderate interaction regime

On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs

Transformers through the lens of support-preserving maps between measures

Collaborative Compression for Large-Scale MoE Deployment on Edge

Rethinking RoPE Scaling in Quantized LLM: Theory, Outlier, and Channel-Band Analysis with Weight Rescaling

xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity