Research on large language models (LLMs) is increasingly focused on efficient deployment: reducing model size and inference latency without compromising performance. Toward this end, researchers are exploring techniques such as post-training pruning, quantization, and knowledge distillation. Pruning methods that leverage weight update magnitudes and activation patterns to decide which weights to remove have shown promising results. At the same time, quantization has been found to affect model bias in nuanced ways, underscoring the need to weigh ethical implications alongside efficiency gains. Together, these directions point toward more efficient and scalable LLMs suited to resource-constrained environments. Noteworthy papers include Z-Pruner, which introduces a post-training pruning method, and How Quantization Shapes Bias in Large Language Models, which provides a comprehensive evaluation of how quantization affects model bias.
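
As a rough illustration of the activation-aware pruning criterion mentioned above, the sketch below scores each weight by its magnitude scaled by the L2 norm of the corresponding input activation channel, then zeroes the lowest-scoring weights in each output row. This follows the general spirit of activation-aware magnitude pruning (e.g., Wanda-style scoring); it is not the Z-Pruner algorithm, and the function name, tensor shapes, and per-row sparsity handling are illustrative assumptions.

```python
import torch

def activation_aware_prune(weight: torch.Tensor,
                           activations: torch.Tensor,
                           sparsity: float = 0.5) -> torch.Tensor:
    """Sketch of activation-aware magnitude pruning (not Z-Pruner itself).

    Each weight w_ij is scored by |w_ij| times the L2 norm of input channel j,
    measured on a small calibration set, so weights that multiply consistently
    small activations are pruned first.

    weight:      (out_features, in_features) layer weight matrix.
    activations: (num_samples, in_features) calibration activations.
    sparsity:    fraction of weights to zero out per output row.
    """
    # Per-input-channel activation norm from the calibration set.
    act_norm = activations.norm(p=2, dim=0)             # (in_features,)
    scores = weight.abs() * act_norm.unsqueeze(0)        # (out, in)

    # Zero out the k lowest-scoring weights within each output row.
    k = int(weight.shape[1] * sparsity)
    if k == 0:
        return weight.clone()
    threshold = scores.kthvalue(k, dim=1, keepdim=True).values
    mask = scores > threshold
    return weight * mask


if __name__ == "__main__":
    # Toy usage with random tensors standing in for a real layer and calibration data.
    torch.manual_seed(0)
    W = torch.randn(8, 16)
    X = torch.randn(64, 16)
    W_pruned = activation_aware_prune(W, X, sparsity=0.5)
    print(f"fraction of weights zeroed: {(W_pruned == 0).float().mean():.2f}")
```

Because the criterion only needs weight values and a handful of calibration activations, it can be applied after training without gradient updates, which is what makes this family of methods attractive for quick post-training compression.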