Scaling Laws and Efficient Training of Large Language Models

The field of large language models is rapidly advancing, with a focus on understanding and improving the scaling laws that govern their performance. Recent work emphasizes efficient training methods, including zeroth-order optimizers and data reuse, which can substantially reduce the computational cost and environmental impact of training. Notable papers include FZOO, a fast zeroth-order optimizer that reaches Adam-scale speed, and FLoRIST, a federated fine-tuning framework that uses singular value thresholding to achieve mathematically accurate aggregation without high communication or computational overhead. Overall, the field is moving toward more efficient and scalable training methods, which will be crucial for the continued development of large language models. A sketch of the zeroth-order idea follows below.
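To make the zeroth-order idea concrete, here is a minimal NumPy sketch of a generic two-point (SPSA-style) gradient estimator and the plain ZO-SGD loop built on it. This is an illustration of the general technique only, not FZOO's algorithm; the function names (spsa_gradient_estimate, zo_sgd), hyperparameters, and the toy quadratic loss are invented for this example.

```python
import numpy as np

def spsa_gradient_estimate(loss_fn, params, eps=1e-3, rng=None):
    """Two-point zeroth-order (SPSA-style) gradient estimate.

    Perturbs all parameters along a single random direction and uses the
    loss difference to estimate the gradient, so only forward evaluations
    are needed (no backpropagation or activation storage).
    """
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(params.shape)      # random probe direction
    loss_plus = loss_fn(params + eps * z)      # forward evaluation 1
    loss_minus = loss_fn(params - eps * z)     # forward evaluation 2
    return (loss_plus - loss_minus) / (2 * eps) * z

def zo_sgd(loss_fn, params, lr=1e-2, steps=100, eps=1e-3, seed=0):
    """Plain zeroth-order SGD loop built on the estimator above."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        grad_est = spsa_gradient_estimate(loss_fn, params, eps=eps, rng=rng)
        params = params - lr * grad_est
    return params

if __name__ == "__main__":
    # Toy quadratic stand-in for a model's loss surface (illustration only).
    target = np.array([1.0, -2.0, 0.5])
    loss = lambda w: float(np.sum((w - target) ** 2))
    w0 = np.zeros(3)
    w_final = zo_sgd(loss, w0, lr=0.05, steps=500)
    print("final params:", w_final, "loss:", loss(w_final))
```

Because only forward evaluations are used, the memory footprint of such an optimizer is close to that of inference; the appeal of work like FZOO is closing the convergence-speed gap with first-order methods such as Adam.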

Sources

Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning

Scaling Laws of Motion Forecasting and Planning -- A Technical Report

Improved Scaling Laws in Linear Regression via Data Reuse

Effective Data Pruning through Score Extrapolation

FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed

MetaTT: A Global Tensor-Train Adapter for Parameter-Efficient Fine-Tuning

FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models

NoLoCo: No-all-reduce Low Communication Training Method for Large Models

Farseer: A Refined Scaling Law in Large Language Models
