Advances in Neural Network Optimization

The field of neural network optimization is moving toward a deeper understanding of training dynamics and the development of more efficient, robust algorithms. Recent research has analyzed the behavior of neural networks during training, including finite-width corrections to the neural tangent kernel and the role of gating mechanisms in recurrent networks. There has also been growing interest in optimization methods that handle large batch sizes and adapt learning rates during training. Notable papers in this area include:

- Time-Scale Coupling Between States and Parameters in Recurrent Neural Networks, which provides a unified dynamical-systems perspective on how gating couples state evolution with parameter updates.
- Kourkoutas-Beta, an Adam-style optimizer that replaces the fixed second-moment discount beta2 with a layer-wise dynamic value driven by a bounded "sunspike" ratio, improving stability and final loss over fixed-beta2 Adam.
- Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches, which introduces a technique that restores the effectiveness of second-order methods at very large batch sizes, enabling scalable training with improved generalization and faster convergence.
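To make the dynamic-beta2 idea concrete, here is a minimal sketch of an Adam-style update in which beta2 varies per layer based on a bounded gradient-spike ratio. The specific ratio used here (current gradient norm relative to its running average) is an illustrative stand-in, not the exact sunspike definition from the Kourkoutas-Beta paper; all names and constants below are assumptions for the sketch.

```python
import numpy as np

class DynamicBeta2Adam:
    """Sketch of an Adam variant with a layer-wise dynamic beta2.

    When a layer's gradient norm spikes above its running average,
    beta2 is lowered so the second-moment estimate adapts faster;
    in calm regimes beta2 stays near its usual high value.
    """

    def __init__(self, lr=1e-3, beta1=0.9,
                 beta2_min=0.88, beta2_max=0.999, eps=1e-8):
        self.lr, self.beta1 = lr, beta1
        self.beta2_min, self.beta2_max = beta2_min, beta2_max
        self.eps = eps
        self.state = {}  # per-layer: m, v, gradient-norm EMA, step count

    def step(self, name, param, grad):
        st = self.state.setdefault(
            name, {"m": np.zeros_like(param),
                   "v": np.zeros_like(param),
                   "g_ema": 0.0, "t": 0})
        st["t"] += 1
        g_norm = float(np.linalg.norm(grad))
        st["g_ema"] = 0.9 * st["g_ema"] + 0.1 * g_norm
        # Bounded spike ratio in [0, 1): gradient spikes push it toward 1.
        r = g_norm / (g_norm + st["g_ema"] + self.eps)
        # Spiky gradients -> smaller beta2 (faster-adapting second moment).
        beta2 = self.beta2_max - r * (self.beta2_max - self.beta2_min)
        st["m"] = self.beta1 * st["m"] + (1 - self.beta1) * grad
        st["v"] = beta2 * st["v"] + (1 - beta2) * grad ** 2
        m_hat = st["m"] / (1 - self.beta1 ** st["t"])
        v_hat = st["v"] / (1 - beta2 ** st["t"])
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

Keeping the ratio bounded in [0, 1) guarantees beta2 stays inside [beta2_min, beta2_max], so the update never degenerates regardless of how large a gradient spike is.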

Sources

Finite-Width Neural Tangent Kernels from Feynman Diagrams

Time-Scale Coupling Between States and Parameters in Recurrent Neural Networks

Kourkoutas-Beta: A Sunspike-Driven Adam Optimizer with Desert Flair

Explainable Learning Rate Regimes for Stochastic Optimization

Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches

Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size
