Work on large language models (LLMs) is moving towards more efficient key-value (KV) cache management to reduce redundant computation and improve inference performance. Recent developments focus on lossy compression techniques, graph-based eviction strategies, and adaptive cache compression, with the shared goals of minimizing loading delays, reducing memory footprints, and preserving generation quality. Notable papers in this area include AdaptCache, which achieves significant delay savings and quality improvements through lossy KV cache compression, and GraphKV, which introduces a graph-based framework for token selection and adaptive retention. Also noteworthy are KVComp, a high-performance, LLM-aware lossy compression framework, and EvolKV, an evolutionary framework for layer-wise, task-driven KV cache compression.
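
To make the shared idea behind these papers concrete, the sketch below shows a minimal, score-based KV cache eviction step: entries that receive little attention are dropped, and the most important tokens are retained in their original order. This is an illustrative toy in NumPy under simplifying assumptions, not the method of AdaptCache, GraphKV, KVComp, or EvolKV; the function and parameter names (`evict_kv_cache`, `keep_ratio`) are hypothetical.

```python
# Illustrative sketch only: a generic score-based KV cache eviction policy,
# not the algorithm of any specific paper named above.
import numpy as np

def evict_kv_cache(keys, values, attn_scores, keep_ratio=0.5):
    """Keep the most-attended tokens' KV entries and drop the rest.

    keys, values: arrays of shape (seq_len, head_dim)
    attn_scores:  per-token importance, e.g. attention mass received
                  across recent decoding steps, shape (seq_len,)
    keep_ratio:   fraction of the cache to retain (0 < keep_ratio <= 1)
    """
    seq_len = keys.shape[0]
    n_keep = max(1, int(seq_len * keep_ratio))
    # Indices of the n_keep highest-scoring tokens, sorted back into
    # original order so the positional structure of the cache is preserved.
    top = np.sort(np.argsort(attn_scores)[-n_keep:])
    return keys[top], values[top]

# Example: an 8-token cache compressed to 4 entries.
rng = np.random.default_rng(0)
k, v = rng.normal(size=(8, 64)), rng.normal(size=(8, 64))
scores = rng.random(8)
k_small, v_small = evict_kv_cache(k, v, scores, keep_ratio=0.5)
print(k_small.shape)  # (4, 64)
```

The papers above differ in how they compute the importance signal (e.g., graph-based token relationships in GraphKV) and whether the retained entries are additionally compressed lossily (as in AdaptCache and KVComp), but the basic retain-or-evict decision sketched here is the common starting point.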