Quantization Advances in AI Models

The field of AI model optimization is moving toward more efficient and effective quantization techniques, enabling the deployment of complex models on resource-constrained devices. Recent work focuses on improving the quality of the synthetic data used for calibration, and on refining the calibration and inference stages to reduce performance degradation. Noteworthy papers include DFQ-ViT, which outperforms existing data-free quantization methods for vision transformers without requiring fine-tuning, and SegQuant, a semantics-aware quantization framework for diffusion models that adaptively combines techniques for cross-model versatility. Additionally, Task-Specific Zero-shot Quantization-Aware Training and QuaRC address specific challenges in object detection and edge-device deployment, respectively; a minimal sketch of the calibration idea these methods build on follows below.
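The common pattern behind data-free quantization can be sketched in a few lines: synthesize calibration inputs, observe the value ranges they produce, and derive quantization parameters from those ranges. The sketch below is a minimal, hypothetical illustration in NumPy (uniform 8-bit affine quantization, with Gaussian noise standing in for synthetic calibration data); it is not the method of any paper listed here.

```python
import numpy as np

# Hypothetical stand-in for one layer's weights (not from any cited paper).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=(256, 256)).astype(np.float32)

# Data-free methods replace real inputs with synthetic calibration data;
# plain Gaussian noise is used here purely as a placeholder.
synthetic_calib = rng.normal(0.0, 1.0, size=(64, 256)).astype(np.float32)
activations = synthetic_calib @ weights.T

def affine_quantize(x: np.ndarray, num_bits: int = 8):
    """Uniform affine quantization: map floats to ints via scale/zero-point."""
    qmin, qmax = 0, 2**num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Guard against a degenerate (constant) tensor, where the range is zero.
    scale = (x_max - x_min) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return (q.astype(np.float32) - zero_point) * scale

# Calibration: derive quantization parameters from the synthetic activations,
# then measure the reconstruction error those parameters induce.
q, scale, zp = affine_quantize(activations)
recon = dequantize(q, scale, zp)
print("mean abs error:", np.abs(activations - recon).mean())
```

The research above targets exactly the weak points of this naive recipe: how realistic the synthetic calibration data is, and how the quantization parameters are corrected after the initial fit.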

Sources

DFQ-ViT: Data-Free Quantization for Vision Transformers without Fine-tuning

SegQuant: A Semantics-Aware and Generalizable Quantization Framework for Diffusion Models

Improving Model Classification by Optimizing the Training Dataset

Task-Specific Zero-shot Quantization-Aware Training for Object Detection

Enhancing Quantization-Aware Training on Edge Devices via Relative Entropy Coreset Selection and Cascaded Layer Correction
