Efficient Neural Network Compression and Optimization

Neural network compression and optimization is advancing rapidly, with recent work aimed at reducing memory and computational costs while preserving model accuracy. Current developments center on low-rank factorization, sparse dictionary learning, and dynamic rank allocation, which have shown promising results for compressing large language models and convolutional neural networks. Noteworthy papers include LANCE, which proposes a low-rank activation compression framework for efficient on-device continual learning, and CoSpaDi, which introduces a compression framework based on calibration-guided sparse dictionary learning. In addition, BALF and D-Rank demonstrate that budgeted and layer-wise dynamic rank allocation can compress models without fine-tuning. Together, these advances point toward efficient deployment of neural networks on edge devices and better performance in resource-constrained environments.
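To make the core idea of low-rank factorization concrete, the sketch below (illustrative only, not the method of any paper listed here) factorizes a dense weight matrix with a truncated SVD under a fixed rank budget; the function name and dimensions are hypothetical.

```python
# Minimal sketch (assumption: plain truncated-SVD factorization, not tied to
# LANCE, BALF, or any other listed method). A dense layer y = W @ x is
# approximated by two smaller factors so that y ~= A @ (B @ x).
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Return factors (A, B) with A @ B approximating W and inner dimension `rank`."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out_dim, rank), singular values folded into A
    B = Vt[:rank, :]             # (rank, in_dim)
    return A, B

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((512, 2048))
    A, B = low_rank_factorize(W, rank=64)
    x = rng.standard_normal(2048)
    # Parameter count drops from 512*2048 to 64*(512+2048); the approximation
    # quality depends on how quickly W's singular values decay.
    err = np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x)
    print("relative error:", err)
```

Methods such as budgeted or layer-wise dynamic rank allocation differ in how they choose `rank` per layer under a global budget, rather than fixing it by hand as in this sketch.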

Sources

LANCE: Low Rank Activation Compression for Efficient On-Device Continual Learning

CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning

Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs

BALF: Budgeted Activation-Aware Low-Rank Factorization for Fine-Tuning-Free Model Compression

Six Sigma For Neural Networks: Taguchi-based optimization

Effective Model Pruning

Layer-wise dynamic rank for compressing large language models

Growing Winning Subnetworks, Not Pruning Them: A Paradigm for Density Discovery in Sparse Neural Networks

A Unified Probabilistic Framework for Dictionary Learning with Parsimonious Activation

CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models

Efficient CNN Compression via Multi-method Low Rank Factorization and Feature Map Similarity

Budgeted Broadcast: An Activity-Dependent Pruning Rule for Neural Network Efficiency

Robust Classification of Oral Cancer with Limited Training Data

The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM

ENLighten: Lighten the Transformer, Enable Efficient Optical Acceleration
