The field of large language models (LLMs) is moving toward more efficient compression and fine-tuning methods that reduce computational and memory requirements. Recent work has centered on quantization techniques such as grouped lattice vector quantization and mixed-precision quantization, which achieve better trade-offs between model size and accuracy, alongside fine-tuning methods such as token-wise input-output projections and zero-latency fused low-rank adapters, which have shown promising results in reducing latency and improving performance. Together, these advances make it more feasible to deploy large models under stringent resource constraints. Noteworthy papers include Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression, which introduces a lattice-based quantization framework for low-bit weight compression; zFLoRA: Zero-Latency Fused Low-Rank Adapters, which proposes an adapter design that adds zero or negligible inference latency; and LoRAQuant: Mixed-Precision Quantization of LoRA to Ultra-Low Bits, a mixed-precision post-training quantization method tailored to LoRA adapters.
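As a concrete reference point for these ideas, the sketch below shows two standard baselines that the cited methods build on: round-to-nearest group-wise weight quantization with a per-group scale, and merging a LoRA update into the base weight so the adapter adds no extra matrix multiplications at inference time. The function names, group size, and bit width here are illustrative assumptions; the papers above replace the scalar rounding step with learned lattice codebooks or mixed-precision bit allocation, and zFLoRA's fused-adapter design is not reproduced here.

```python
import numpy as np

def quantize_groupwise(w: np.ndarray, bits: int = 4, group_size: int = 128):
    """Round-to-nearest quantization with one scale per group of weights.

    This is the common group-wise baseline, not the lattice or mixed-precision
    schemes from the papers cited above.
    """
    flat = w.reshape(-1, group_size)                 # (num_groups, group_size)
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for signed 4-bit
    scale = np.abs(flat).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)         # avoid division by zero
    q = np.clip(np.round(flat / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_groupwise(q: np.ndarray, scale: np.ndarray, shape):
    """Reconstruct approximate weights from quantized values and scales."""
    return (q.astype(np.float32) * scale).reshape(shape)

def merge_lora(w: np.ndarray, a: np.ndarray, b: np.ndarray,
               alpha: float, rank: int) -> np.ndarray:
    """Fold a LoRA update into the base weight: W' = W + (alpha / r) * B @ A.

    Merging removes the adapter's extra matmuls at inference time, which is
    the latency overhead that fused-adapter approaches target.
    """
    return w + (alpha / rank) * (b @ a)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((256, 256)).astype(np.float32)

    # Group-wise 4-bit quantization and reconstruction error.
    q, s = quantize_groupwise(w, bits=4, group_size=128)
    w_hat = dequantize_groupwise(q, s, w.shape)
    print("4-bit group-wise mean abs error:", np.abs(w - w_hat).mean())

    # Merge a rank-8 LoRA update into the base weight.
    r = 8
    a = rng.standard_normal((r, 256)).astype(np.float32) * 0.01
    b = rng.standard_normal((256, r)).astype(np.float32) * 0.01
    w_merged = merge_lora(w, a, b, alpha=16.0, rank=r)
    print("merged weight shape:", w_merged.shape)
```

In this simplified setting the per-group scale bounds the rounding error within each group, and the merged weight behaves identically to running the base layer plus the adapter; the cited papers improve on exactly these two steps, either by quantizing more cleverly or by keeping adapters separate without paying the usual latency cost.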