Advances in Efficient Model Training and Representation

The field of deep learning is moving towards more efficient model training and representation. Recent work focuses on reducing model complexity while preserving performance, using techniques such as sparsification, low-rank training, and dynamic rank adjustment, which can substantially cut the number of trainable parameters and speed up training. There is also growing interest in the underlying dynamics of training, including the role of implicit bias and the importance of layer normalization. Together, these advances promise more accurate and efficient models. Noteworthy papers include 'One Size Does Not Fit All: A Distribution-Aware Sparsification for More Precise Model Merging', which introduces a sparsification strategy that respects the heterogeneity of model parameters, and 'Dynamic Rank Adjustment for Accurate and Efficient Neural Network Training', which proposes a framework for dynamically adjusting the rank of weight matrices during training.
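To make the low-rank and dynamic-rank ideas concrete, here is a minimal sketch, not the implementation from any of the cited papers: a linear layer factored as U @ V, with a truncated-SVD step that shrinks the rank partway through training. The class and method names (LowRankLinear, adjust_rank) are illustrative assumptions, and the example assumes PyTorch.

```python
# Illustrative sketch of low-rank training with dynamic rank adjustment.
# Names and schedule are assumptions, not the cited papers' method.
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        # W (out x in) is represented as U (out x r) @ V (r x in),
        # so trainable parameters drop from out*in to r*(out + in).
        self.U = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.V = nn.Parameter(torch.randn(rank, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.U @ self.V).t()

    @torch.no_grad()
    def adjust_rank(self, new_rank: int) -> None:
        # Truncated SVD of the current product gives the best approximation
        # at the new (smaller) rank; the factors are then reassigned.
        W = self.U @ self.V
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        r = min(new_rank, S.numel())
        self.U = nn.Parameter(U[:, :r] * S[:r])
        self.V = nn.Parameter(Vh[:r, :])


# Usage: start with a generous rank, then tighten it during training.
layer = LowRankLinear(in_features=256, out_features=128, rank=64)
x = torch.randn(8, 256)
y = layer(x)           # shape (8, 128)
layer.adjust_rank(16)  # shrink the rank; optimizer state would need rebuilding
y = layer(x)
```

The design choice illustrated here is the general one behind dynamic-rank methods: a large rank early on preserves expressiveness, while truncation later in training trades a small approximation error for fewer trainable parameters.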

Sources

One Size Does Not Fit All: A Distribution-Aware Sparsification for More Precise Model Merging

Sparsity-Driven Plasticity in Multi-Task Reinforcement Learning

A Globally Optimal Analytic Solution for Semi-Nonnegative Matrix Factorization with Nonnegative or Mixed Inputs

Intrinsic training dynamics of deep neural networks

Exploiting Layer Normalization Fine-tuning in Visual Transformer Foundation Models for Classification

Orthogonal Low Rank Embedding Stabilization

Addendum on data driven regularization by projection

Scaled-Dot-Product Attention as One-Sided Entropic Optimal Transport

Sparse Partial Optimal Transport via Quadratic Regularization

Dynamic Rank Adjustment for Accurate and Efficient Neural Network Training

Towards Scalable Lottery Ticket Networks using Genetic Algorithms

Global Convergence Analysis of Vanilla Gradient Descent for Asymmetric Matrix Completion

Prototype Training with Dual Pseudo-Inverse and Optimized Hidden Activations

Unpacking the Implicit Norm Dynamics of Sharpness-Aware Minimization in Tensorized Models
