The field of large language models is moving toward more efficient and compact models through quantization. Researchers are exploring methods that reduce memory footprint and computational cost by lowering numerical precision, while preserving model quality. The central challenge is balancing the degree of compression against the resulting loss in task performance.
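As a rough illustration of this tradeoff, the sketch below applies symmetric per-tensor int8 post-training quantization to a random weight matrix and measures the rounding error it introduces. The NumPy-based helpers and their names are illustrative assumptions and do not come from any of the papers discussed below.

```python
# Minimal sketch of symmetric per-tensor int8 post-training quantization.
# Helper names and the use of NumPy are illustrative assumptions.
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float weights to int8 using a single symmetric scale."""
    scale = np.max(np.abs(weights)) / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the int8 values."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(512, 512).astype(np.float32)
    q, scale = quantize_int8(w)
    w_hat = dequantize_int8(q, scale)
    # Compression: 4 bytes per weight -> 1 byte, at the cost of rounding error.
    print("mean abs rounding error:", np.mean(np.abs(w - w_hat)))
```

The rounding error printed here is exactly the quantity that more sophisticated schemes try to minimize or compensate for while keeping the compression benefit.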
Noteworthy papers in this area include Error Propagation Mechanisms and Compensation Strategies for Quantized Diffusion, which develops a theoretical framework for how quantization error propagates through diffusion models and proposes a timestep-aware cumulative error compensation scheme; and DLLMQuant, which introduces a post-training quantization framework tailored to diffusion-based large language models, built on techniques such as Temporal-Mask Adaptive Sampling and Certainty-Guided Quantization.
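The error-propagation concern is easy to see numerically: because diffusion sampling is iterative, the small per-step error introduced by quantized weights compounds over the trajectory. The toy loop below illustrates this with a linear stand-in for the denoiser; the step count and int8 scheme are arbitrary assumptions, and it does not implement the compensation or sampling methods of either paper.

```python
# Toy illustration of quantization error compounding across an iterative,
# diffusion-style sampling loop. The linear "denoiser" is a stand-in; this
# is not the compensation scheme from the papers above.
import numpy as np

rng = np.random.default_rng(0)
steps, dim = 20, 64

W = rng.standard_normal((dim, dim)) / np.sqrt(dim)      # stand-in denoiser weights
scale = np.max(np.abs(W)) / 127.0
W_q = np.clip(np.round(W / scale), -127, 127) * scale   # dequantized int8 copy

x_fp = x_q = rng.standard_normal(dim)
for t in range(steps):
    x_fp = W @ x_fp                                      # full-precision trajectory
    x_q = W_q @ x_q                                      # quantized trajectory
    drift = np.linalg.norm(x_q - x_fp) / np.linalg.norm(x_fp)
    print(f"step {t + 1:2d}: relative drift {drift:.2e}")  # drift generally grows with t
```

Compensation schemes of the kind proposed above aim to estimate and cancel this accumulating drift at each timestep rather than letting it build up over the full sampling trajectory.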