Optimization for deep learning is advancing rapidly, with a focus on making the training of deep neural networks more efficient and effective. Recent work leverages the geometric structure of network parameters, incorporates curvature information, and adapts update rules to problem geometry. Notably, lifted training methods and non-Euclidean gradient descent approaches are gaining attention for their potential to mitigate vanishing or exploding gradients and to improve parallelization. There is also growing interest in automatic learning rate selection, preconditioned norms, and stochastic gradient methods. Together, these approaches aim to improve both the training performance and the generalization of deep learning models, and they are being adopted across a broad range of applications.
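As a concrete illustration of the geometry-aware, non-Euclidean updates described above, the following is a minimal NumPy sketch of a Muon-style step for a single 2-D weight matrix: momentum is accumulated and then approximately orthogonalized with a Newton-Schulz iteration, so the step follows a spectral-norm-flavoured geometry rather than the raw Euclidean gradient. The function names, the number of iterations, and the shape-dependent scaling are assumptions of this sketch, not the exact algorithm of any paper cited below.

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5, eps=1e-7):
    """Approximate the semi-orthogonal polar factor of a gradient matrix.

    Dividing by the Frobenius norm keeps the spectral norm <= 1, so the cubic
    Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X pushes all singular
    values toward 1 while preserving the singular vectors of g.
    """
    x = g / (np.linalg.norm(g) + eps)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

def muon_style_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One illustrative Muon-style update for a single 2-D weight matrix:
    momentum accumulation followed by an orthogonalized descent direction."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    # Shape-dependent scaling so the step size transfers across layer shapes;
    # this particular factor is an assumption of the sketch.
    scale = max(1.0, weight.shape[0] / weight.shape[1]) ** 0.5
    weight = weight - lr * scale * update
    return weight, momentum

# Toy usage: a single hidden-layer weight matrix.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 128)) * 0.02
m = np.zeros_like(w)
g = rng.standard_normal(w.shape)  # stand-in for a backpropagated gradient
w, m = muon_style_step(w, g, m)
```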
Some noteworthy papers in this area include:
- An Exploration of Non-Euclidean Gradient Descent: Muon and its Many Variants, which systematically explores alternatives for aggregating norms across layers and derives new variants of the Muon optimizer.
- Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton and Adaptive Methods, which proposes a unified framework generalizing steepest descent, quasi-Newton methods, and adaptive methods through the notion of preconditioned matrix norms (see the sketch after this list).
- Noise-Adaptive Layerwise Learning Rates: Accelerating Geometry-Aware Optimization for Deep Neural Network Training, which introduces a noise-adaptive layerwise learning rate scheme on top of geometry-aware optimization algorithms and substantially accelerates DNN training.
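To make the preconditioned-norm viewpoint concrete, here is a minimal sketch of steepest descent under a quadratic norm induced by a positive-definite preconditioner P: the descent direction becomes P^{-1} g, and different choices of P recover plain gradient descent, Adagrad/Adam-style diagonal scaling, and quasi-Newton-style full-matrix steps. The function name and the specific preconditioner choices are assumptions of this sketch, not the paper's API or algorithm.

```python
import numpy as np

def preconditioned_step(w, g, precond, lr=1e-2, eps=1e-8):
    """Steepest-descent step under the norm ||x||_P = sqrt(x^T P x):
    the update direction is P^{-1} g rather than the raw gradient.
    `precond` returns either a diagonal (1-D) or full (2-D) positive-definite P."""
    P = precond(g)
    if P.ndim == 1:                          # diagonal preconditioner
        return w - lr * g / (P + eps)
    return w - lr * np.linalg.solve(P, g)    # full-matrix preconditioner

rng = np.random.default_rng(0)
w = rng.standard_normal(4)
g = rng.standard_normal(4)

# P = I recovers plain (Euclidean) gradient descent.
w_sgd = preconditioned_step(w, g, lambda grad: np.ones_like(grad))

# A diagonal P built from squared gradients gives an Adagrad/Adam-flavoured step
# (momentum and bias correction omitted for brevity).
w_adaptive = preconditioned_step(w, g, lambda grad: np.sqrt(grad * grad))

# A dense curvature estimate (here a stand-in positive-definite matrix)
# gives a quasi-Newton-style step.
H = np.eye(4) + 0.1 * np.outer(g, g)
w_qn = preconditioned_step(w, g, lambda grad: H)
```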