The field of model compression is moving toward innovative methods for reducing the storage and deployment costs of large models. Researchers are exploring approaches that compress models while preserving performance, such as random orthogonal transformations that decorrelate task-specific parameters and dynamic base model adaptation that improves delta compression. Another key focus is data-free pipelines for ultra-efficient delta compression, which achieve high compression ratios without sacrificing model performance. There is also growing interest in targeted pruning methods that reach extreme sparsity while preserving critical information. Notable papers in this area include:
- RanDeS, which proposes a randomized delta superposition approach to reduce interference among task-specific parameters.
- Dynamic Base model Shift for Delta Compression, which introduces a dynamic base model adaptation method to improve delta compression performance.
- Breaking the Compression Ceiling, which presents a data-free pipeline for ultra-efficient delta compression.
- TRIM, which achieves extreme sparsity with targeted row-wise iterative metric-driven pruning.
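The common idea behind delta compression is to store only the difference between a fine-tuned model and its shared base model, then sparsify or otherwise compress that difference. A minimal sketch of this idea, using a simple top-k magnitude criterion (the function names and the keep ratio here are illustrative assumptions, not the method of any specific paper above):

```python
import numpy as np

def compress_delta(base, finetuned, keep_ratio=0.1):
    """Keep only the largest-magnitude entries of the delta.

    Illustrative sketch: real methods use more sophisticated
    criteria (e.g. row-wise, metric-driven selection).
    """
    delta = finetuned - base
    k = max(1, int(keep_ratio * delta.size))
    # Threshold at the k-th largest absolute value.
    thresh = np.partition(np.abs(delta).ravel(), -k)[-k]
    mask = np.abs(delta) >= thresh
    return delta * mask  # sparse delta (zeros elsewhere)

def reconstruct(base, sparse_delta):
    # The fine-tuned model is approximated as base + sparse delta.
    return base + sparse_delta

rng = np.random.default_rng(0)
base = rng.normal(size=(64, 64))
finetuned = base + 0.01 * rng.normal(size=(64, 64))

sparse = compress_delta(base, finetuned, keep_ratio=0.1)
approx = reconstruct(base, sparse)
sparsity = 1.0 - np.count_nonzero(sparse) / sparse.size
print(f"sparsity: {sparsity:.2f}")
```

Because only the sparse delta is stored per task, many fine-tuned variants can share one copy of the base weights, which is where the storage savings come from.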