Optimization Advances in Deep Learning

The field of deep learning is seeing significant advances in optimization techniques, with a focus on improving generalization, robustness, and convergence. Researchers are exploring novel optimizers, such as those incorporating dynamic scaling and adaptive damping, to improve training efficiency and stability. There is also growing interest in analyzing the convergence behavior of stochastic gradient descent with momentum (SGDM) under various learning rate and batch size schedules, and in integrating scaling laws into deep reinforcement learning (DRL) while balancing scalability against computational cost. Noteworthy papers in this area include:

- ZetA, a deep learning optimizer that extends Adam with dynamic scaling based on the Riemann zeta function, demonstrating improved generalization and robustness.
- Accelerating SGDM via Learning Rate and Batch Size Schedules, which analyzes the convergence of SGDM under dynamic learning rate and batch size schedules, providing a unified theoretical foundation and practical guidance for designing efficient and stable training procedures (an illustrative joint schedule is sketched after this list).
- Neural Network Training via Stochastic Alternating Minimization with Trainable Step Sizes, which updates network parameters block by block in an alternating manner, reducing per-step computational overhead and improving training stability in nonconvex settings (a minimal block-alternating sketch also follows).
- Optimal Growth Schedules for Batch Size and Learning Rate in SGD, which derives growth schedules for the batch size and learning rate that reduce stochastic first-order oracle (SFO) complexity, offering both theoretical insight and practical guidelines for scalable and efficient large-batch training.
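
To make the scheduling ideas concrete, here is a minimal sketch of a joint learning-rate/batch-size schedule of the kind these analyses cover: the batch size grows geometrically at fixed intervals while the learning rate follows cosine decay. The constants (`growth_every`, `growth_factor`, cosine decay) are illustrative assumptions, not the schedules derived in the cited papers.

```python
import math

def lr_batch_schedule(epoch, base_lr=0.1, base_batch=128,
                      total_epochs=100, growth_every=30, growth_factor=2.0):
    """Illustrative joint schedule for SGDM-style training.

    The batch size grows geometrically every `growth_every` epochs while the
    learning rate follows cosine decay over the full run. This is a generic
    recipe, not the optimal schedule derived in the cited papers.
    """
    # Geometric batch-size growth (hypothetical interval and factor).
    batch_size = int(base_batch * growth_factor ** (epoch // growth_every))
    # Cosine learning-rate decay over the full training horizon.
    lr = 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return lr, batch_size

# Inspect the schedule at a few epochs.
for e in (0, 30, 60, 99):
    print(e, lr_batch_schedule(e))
```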

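The alternating-update idea can likewise be sketched on a toy model: each iteration touches only one parameter block, and each block keeps its own step size. The per-block step-size heuristic below (grow on improvement, shrink otherwise) is a simple stand-in for the paper's trainable step sizes, which are not specified in this summary; the model and constants are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer linear model y ~ x @ W1 @ W2, trained by alternating block
# updates: each step modifies only W1 or W2, the generic idea behind
# alternating-minimization training.
n, d, h, k = 256, 20, 10, 5
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, k))
W1 = rng.normal(scale=0.1, size=(d, h))
W2 = rng.normal(scale=0.1, size=(h, k))

def loss(W1, W2):
    return 0.5 * np.mean((X @ W1 @ W2 - Y) ** 2)

step = {"W1": 0.1, "W2": 0.1}           # per-block step sizes (adapted below)
for it in range(200):
    batch = rng.choice(n, size=32, replace=False)
    Xb, Yb = X[batch], Y[batch]
    resid = Xb @ W1 @ W2 - Yb           # shared residual for both block gradients
    if it % 2 == 0:                     # update W1 on even iterations
        grad = Xb.T @ resid @ W2.T / resid.size
        name, W = "W1", W1
    else:                               # update W2 on odd iterations
        grad = (Xb @ W1).T @ resid / resid.size
        name, W = "W2", W2
    before = loss(W1, W2)
    W -= step[name] * grad              # in-place update of the active block
    # Crude step-size adaptation: grow if the full loss dropped, shrink otherwise.
    step[name] *= 1.05 if loss(W1, W2) < before else 0.5

print("final loss:", loss(W1, W2))
```
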
Sources

ZetA: A Riemann Zeta-Scaled Extension of Adam for Deep Learning

Accelerating SGDM via Learning Rate and Batch Size Schedules: A Lyapunov-Based Analysis

Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies

Neural Network Training via Stochastic Alternating Minimization with Trainable Step Sizes

Comparative Analysis of Novel NIRMAL Optimizer Against Adam and SGD with Momentum

Robustly Learning Monotone Single-Index Models

Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization

Optimal Growth Schedules for Batch Size and Learning Rate in SGD that Reduce SFO Complexity

Adaptive Batch Size and Learning Rate Scheduler for Stochastic Gradient Descent Based on Minimization of Stochastic First-order Oracle Complexity

Cumulative Learning Rate Adaptation: Revisiting Path-Based Schedules for SGD and Adam
