Advancements in Optimization and Generalization for Deep Learning

The field of deep learning is moving toward more efficient and effective optimization techniques, with a growing emphasis on generalization and robustness. Recent work introduces new algorithms and refines existing ones, including mixed-mode differentiation for meta-learning, sharpness-aware minimization, and adaptive gradient methods, with promising results in reducing computational cost, improving convergence rates, and enhancing model performance. Several papers bring fresh approaches to existing problems, such as Z-score filtering of gradients within sharpness-aware minimization and the design of novel loss functions. Research has also explored the interplay between optimization and generalization, including studies on the effect of importance weighting under dataset shift and methods to mitigate catastrophic overfitting. Overall, the field is shifting toward more principled and efficient optimization methods, with a strong emphasis on generalization and real-world applicability; a minimal sketch of the sharpness-aware minimization update is given after the list below. Noteworthy papers include:

* Scalable Meta-Learning via Mixed-Mode Differentiation, which proposes a practical algorithm for efficient meta-learning.
* Focal-SAM, which introduces a focal sharpness-aware minimization technique for long-tailed classification.
* Learning from Loss Landscape, which presents a generalizable mixed-precision quantization approach via adaptive sharpness-aware gradient aligning.
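
As context for the sharpness-aware minimization work cited above, the sketch below illustrates the generic two-step SAM update: an ascent step to an approximate worst-case point within a small radius, followed by a descent step using the gradient computed there. The toy quadratic objective, function names, and hyperparameter values are illustrative assumptions and are not taken from any of the listed papers.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic sharpness-aware minimization (SAM) update (illustrative sketch).

    w       : current parameter vector
    grad_fn : callable returning the loss gradient at a given point
    rho     : radius of the neighborhood used for the ascent step
    """
    g = grad_fn(w)
    # Ascent step: move to an approximate worst-case point inside an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent step: update the original parameters with the gradient taken at w + eps.
    return w - lr * grad_fn(w + eps)

# Toy objective f(w) = 0.5 * ||w||^2, whose gradient at w is simply w.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, lambda v: v)
print(w)  # ends up close to the origin, up to a small rho-dependent oscillation
```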

Sources

Scalable Meta-Learning via Mixed-Mode Differentiation

Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification

Adaptively Point-weighting Curriculum Learning

Sharpness-Aware Minimization with Z-Score Gradient Filtering for Neural Networks

Catastrophic Overfitting, Entropy Gap and Participation Ratio: A Noiseless $l^p$ Norm Solution for Fast Adversarial Training

More Optimal Fractional-Order Stochastic Gradient Descent for Non-Convex Optimization Problems

Understand the Effect of Importance Weighting in Deep Learning on Dataset Shift

When Dynamic Data Selection Meets Data Augmentation

SAND: One-Shot Feature Selection with Additive Noise Distortion

SEVA: Leveraging Single-Step Ensemble of Vicinal Augmentations for Test-Time Adaptation

Complexity Lower Bounds of Adaptive Gradient Algorithms for Non-convex Stochastic Optimization under Relaxed Smoothness

Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning

Precise gradient descent training dynamics for finite-width multi-layer neural networks

Stochastic Variational Propagation: Local, Scalable and Efficient Alternative to Backpropagation

Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It
