The field of model optimization and quantization is advancing rapidly, with a focus on improving the efficiency and accuracy of large language models and vision transformers. Recent developments have centered on novel quantization strategies, such as sparse model inversion and block rotation, which aim to reduce the computational cost and memory requirements of these models. Researchers have also explored stochastic rounding, mixed-precision training, and outlier-aware post-training quantization to further improve the accuracy of quantized models. Noteworthy papers in this area include TetraJet-v2, which introduces an end-to-end 4-bit fully-quantized training method, and DartQuant, which proposes an efficient distribution-aware rotational calibration method for LLM quantization. Overall, these advances have the potential to significantly accelerate the deployment of large-scale models in resource-constrained environments.
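To make one of these ideas concrete, the sketch below shows stochastic rounding inside a simple symmetric 4-bit per-tensor quantizer: each value is rounded up or down with probability proportional to its fractional part, so rounding error is zero in expectation. This is a minimal illustrative example only; the function names, the per-tensor scaling, and the clipping range are assumptions for illustration and do not reproduce the specific methods of TetraJet-v2, DartQuant, or any other cited work.

```python
import torch

def stochastic_round(x: torch.Tensor) -> torch.Tensor:
    """Round each element up or down with probability equal to its
    fractional distance, making the rounding unbiased in expectation."""
    floor = torch.floor(x)
    frac = x - floor
    return floor + (torch.rand_like(x) < frac).to(x.dtype)

def quantize_int4(w: torch.Tensor):
    """Illustrative symmetric per-tensor 4-bit quantization with
    stochastic rounding. Returns integer codes and the dequant scale."""
    qmax = 7  # symmetric signed 4-bit range [-7, 7]
    scale = w.abs().max() / qmax
    q = stochastic_round(w / scale).clamp(-qmax, qmax)
    return q.to(torch.int8), scale  # codes stored in int8 for simplicity

# Example: quantize a random weight matrix and check reconstruction error.
w = torch.randn(256, 256)
q, scale = quantize_int4(w)
w_hat = q.to(w.dtype) * scale
print(f"mean abs reconstruction error: {(w - w_hat).abs().mean():.4f}")
```

Compared with round-to-nearest, this kind of unbiased rounding avoids systematic bias accumulating across many quantized operations, which is one reason it appears in low-precision training recipes.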