Efficient Machine Learning through Data Compression and Pruning

The field of machine learning is moving toward more efficient and scalable methods, with an emphasis on reducing the need for large volumes of labeled data and computational resources. Recent work shows that data compression and pruning techniques can substantially accelerate training, reduce memory usage, and cut storage costs without sacrificing model performance. These advances open new possibilities for distributed and federated learning, as well as tinyML on resource-constrained edge devices. Noteworthy papers in this area include dreaMLearning, which introduces a framework for learning directly from compressed data, and Partial Forward Blocking, which proposes a data pruning paradigm for lossless training acceleration. Pruning by Block Benefit and Quality over Quantity demonstrate that pruning can preserve model performance while reducing computational cost, and AdaDeDup, a hybrid data pruning framework, shows promising results for efficient large-scale object detection training.
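
To make the general idea of data pruning concrete, the sketch below shows a generic score-based pruning step: each training example receives a score (here, simply its current loss) and only the highest-scoring fraction is kept for subsequent epochs. This is an illustrative assumption, not the method of any paper listed under Sources; the function names and the keep_fraction parameter are hypothetical.

```python
# Illustrative sketch of generic score-based data pruning.
# Not the algorithm of any specific paper below; names and parameters are hypothetical.
import numpy as np

def score_examples(losses: np.ndarray) -> np.ndarray:
    """Assign each training example a pruning score; here, simply its current loss."""
    return losses

def prune_dataset(features: np.ndarray, labels: np.ndarray,
                  losses: np.ndarray, keep_fraction: float = 0.5):
    """Keep only the highest-scoring (hardest) examples for later training epochs."""
    scores = score_examples(losses)
    n_keep = max(1, int(keep_fraction * len(scores)))
    keep_idx = np.argsort(scores)[-n_keep:]  # indices of the hardest examples
    return features[keep_idx], labels[keep_idx]

if __name__ == "__main__":
    # Usage: after a warm-up epoch, drop the easiest half of the data.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(1000, 16)), rng.integers(0, 2, size=1000)
    per_example_loss = rng.random(1000)  # stand-in for real per-example training losses
    X_small, y_small = prune_dataset(X, y, per_example_loss, keep_fraction=0.5)
    print(X_small.shape)  # (500, 16)
```

The appeal of this style of pruning is that subsequent epochs run on a smaller dataset, reducing compute and memory proportionally to the fraction removed; the papers above differ mainly in how the scores are computed and how aggressively data can be dropped without hurting accuracy.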

Sources

dreaMLearning: Data Compression Assisted Machine Learning

Partial Forward Blocking: A Novel Data Pruning Paradigm for Lossless Training Acceleration

Pruning by Block Benefit: Exploring the Properties of Vision Transformer Blocks during Domain Adaptation

Quality over Quantity: An Effective Large-Scale Data Reduction Strategy Based on Pointwise V-Information

AdaDeDup: Adaptive Hybrid Data Pruning for Efficient Large-Scale Object Detection Training

When Does Pruning Benefit Vision Representations?
