The field of large language models (LLMs) is shifting toward efficient deployment and compression techniques that enable their use in resource-constrained environments. Researchers are exploring methods to reduce the memory and computational requirements of LLMs, including quantization, pruning, and knowledge distillation; the central challenge is preserving model quality while cutting resource demands. Recent work has produced techniques such as outlier-aware weight-only quantization and entropy-encoded weight compression that show promising results on this trade-off. Notable papers include ICQuant, which leverages outlier statistics to design an efficient index coding scheme for outlier-aware weight-only quantization, and EntroLLM, which combines mixed quantization with entropy coding to reduce storage overhead while maintaining model accuracy.
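To make the general idea concrete, the sketch below shows a minimal form of outlier-aware weight-only quantization: the largest-magnitude weights are kept in full precision and addressed through an index list, while the remaining weights are quantized to a narrow integer grid. This is an illustrative simplification, not the actual ICQuant or EntroLLM algorithms; the function names, the percentile-based outlier threshold, and the symmetric uniform quantizer are assumptions chosen for clarity.

```python
# Illustrative sketch of outlier-aware weight-only quantization.
# Outliers are stored in full precision with their indices; inliers are
# quantized to a narrow symmetric integer grid. Not the ICQuant scheme.
import numpy as np

def quantize_with_outliers(w: np.ndarray, n_bits: int = 4, outlier_pct: float = 1.0):
    """Split a weight tensor into quantized inliers plus indexed fp outliers."""
    flat = w.ravel()
    # Treat the largest-magnitude `outlier_pct` percent of weights as outliers.
    threshold = np.percentile(np.abs(flat), 100.0 - outlier_pct)
    outlier_idx = np.flatnonzero(np.abs(flat) > threshold)
    outlier_vals = flat[outlier_idx].copy()

    # Quantize the remaining (inlier) weights with a symmetric uniform grid.
    inliers = np.delete(flat, outlier_idx)
    qmax = 2 ** (n_bits - 1) - 1
    scale = float(np.max(np.abs(inliers))) / qmax if inliers.size else 1.0
    q_inliers = np.clip(np.round(inliers / scale), -qmax - 1, qmax).astype(np.int8)

    return q_inliers, scale, outlier_idx, outlier_vals

def dequantize(q_inliers, scale, outlier_idx, outlier_vals, shape):
    """Reconstruct the dense weight tensor from its compressed parts."""
    flat = np.empty(int(np.prod(shape)), dtype=np.float32)
    inlier_mask = np.ones(flat.size, dtype=bool)
    inlier_mask[outlier_idx] = False
    flat[inlier_mask] = q_inliers.astype(np.float32) * scale
    flat[outlier_idx] = outlier_vals
    return flat.reshape(shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((256, 256)).astype(np.float32)
    w.reshape(-1)[rng.integers(0, w.size, size=200)] *= 20.0  # inject large outliers
    parts = quantize_with_outliers(w, n_bits=4, outlier_pct=0.5)
    w_hat = dequantize(*parts, shape=w.shape)
    print("mean abs reconstruction error:", np.abs(w - w_hat).mean())
```

Because the resulting low-bit codes are far from uniformly distributed, they could in principle be compressed further with a general-purpose entropy coder applied to the packed codes, which is the intuition behind entropy-encoded weight compression approaches such as EntroLLM.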