Optimization and Generalization in Deep Learning

Research in deep learning is converging on a sharper theoretical picture of optimization and generalization. On the optimization side, recent work both proposes new algorithms and analyzes the behavior of existing ones, with particular attention to methods that remain efficient and robust on non-convex landscapes riddled with saddle points. On the generalization side, the focus is on how regularization, normalization, and related techniques shape the capacity and performance of deep neural networks. Noteworthy papers include 'Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules', which proposes a normative framework for deriving learning rules; 'The Hidden Power of Normalization: Exponential Capacity Control in Deep Neural Networks', which offers a theoretical account of why normalization methods succeed; and 'A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization', which gives a rigorous geometric explanation for the effectiveness of variable-elimination algorithms.
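To make the variable-elimination idea concrete, here is a minimal, illustrative sketch rather than the construction from 'A Saddle Point Remedy': a separable least-squares fit y ≈ c·exp(-θt) in which the linear amplitude c is eliminated in closed form, so gradient descent only has to navigate the reduced landscape over θ. The setup, the helper names (`reduced_loss`, `gd_backtracking`), and the use of NumPy with finite-difference gradients are assumptions made purely for illustration.

```python
import numpy as np

# Toy separable least-squares problem: fit y ≈ c * exp(-theta * t).
# The amplitude c enters linearly, so it can be eliminated in closed form;
# gradient descent then runs on the reduced, one-dimensional landscape in theta.

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 100)
y = 2.0 * np.exp(-1.3 * t) + 0.01 * rng.standard_normal(t.size)

def reduced_loss(theta):
    """Loss after eliminating c: c*(theta) is the 1-D linear least-squares solution."""
    phi = np.exp(-theta * t)
    c_star = (phi @ y) / (phi @ phi)      # closed-form optimal amplitude
    r = c_star * phi - y
    return 0.5 * np.mean(r ** 2)

def num_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar-valued function."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return g

def gd_backtracking(f, x0, iters=100, lr0=1.0):
    """Gradient descent with a simple Armijo backtracking line search."""
    x = x0.copy()
    for _ in range(iters):
        g = num_grad(f, x)
        fx, lr = f(x), lr0
        # Shrink the step until the loss decreases sufficiently.
        while f(x - lr * g) > fx - 1e-4 * lr * (g @ g) and lr > 1e-12:
            lr *= 0.5
        x = x - lr * g
    return x

theta = gd_backtracking(lambda th: reduced_loss(th[0]), np.array([0.1]))
phi = np.exp(-theta[0] * t)
print(f"theta ≈ {theta[0]:.3f}, c ≈ {(phi @ y) / (phi @ phi):.3f}")  # should be near 1.3 and 2.0
```

The optimizer here only ever sees the one-dimensional objective in θ, because the linear sub-problem is solved exactly at every evaluation; this is just a toy instance of the elimination strategy whose landscape geometry the cited paper studies in far greater generality.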

Sources

Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules

Exploring Landscapes for Better Minima along Valleys

Learning an Efficient Optimizer via Hybrid-Policy Sub-Trajectory Balance

The Hidden Power of Normalization: Exponential Capacity Control in Deep Neural Networks

Regularization Implies Balancedness in the Deep Linear Network

A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization

Estimation of Toeplitz Covariance Matrices using Overparameterized Gradient Descent

The Geometry of Grokking: Norm Minimization on the Zero-Loss Manifold

Bulk-boundary decomposition of neural networks

Matrix Sensing with Kernel Optimal Loss: Robustness and Optimization Landscape

Neural network initialization with nonlinear characteristics and information on spectral bias

Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold

Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models

Relative entropy estimate and geometric ergodicity for implicit Langevin Monte Carlo

Mean square error analysis of stochastic gradient and variance-reduced sampling algorithms
