The field of large language models (LLMs) is moving toward more efficient models, with a focus on reducing token usage and improving reasoning efficiency. The driver is the need to balance accuracy against efficiency in practical applications, where longer chain-of-thought traces and higher token counts translate directly into higher inference latency and memory consumption. Researchers are exploring novel compression methods, such as abstractive token-level compression and entropy-guided training frameworks, that condense reasoning paths while preserving performance. There is also growing interest in defining and optimizing LLM agent efficiency, at both the step level and the trajectory level, to improve interaction efficiency in real-world scenarios.

Notable papers in this area:

- Cmprsr presents a novel prompt compression paradigm and achieves significant improvements in compression ability and downstream task performance (the general pattern is sketched below).
- TokenSqueeze proposes a Long2Short method that condenses reasoning paths while preserving performance and relies exclusively on self-generated data (see the data-selection sketch below).
- Entropy-Guided Reasoning Compression addresses the entropy conflict in compression training and achieves strong compression ratios while maintaining or surpassing baseline accuracy (an entropy-regularized loss is sketched below).
- DEPO introduces a dual-efficiency preference optimization method that jointly rewards succinct responses and fewer action steps, yielding substantial reductions in both token usage and step count (see the preference-scoring sketch below).
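As a rough illustration of the prompt-compression pattern that Cmprsr-style methods target, the sketch below uses a small compressor model to rewrite a long context down to a token budget before the main model answers. The `compressor` and `llm` callables and the instruction wording are placeholders for illustration, not Cmprsr's actual interface or training recipe.

```python
# Schematic of abstractive prompt compression: a small compressor model
# rewrites a long context to a target token budget, and only the compressed
# context is sent to the (more expensive) main model.
# `compressor` and `llm` are placeholder callables (str -> str); they are
# assumptions for this sketch, not part of any published API.
def answer_with_compression(context: str, question: str,
                            compressor, llm, budget: int = 256) -> str:
    compressed = compressor(
        f"Rewrite the following context in at most {budget} tokens, "
        f"keeping only the facts needed to answer questions about it:\n"
        f"{context}"
    )
    return llm(f"Context:\n{compressed}\n\nQuestion: {question}")
```

The design point is that compression cost is paid once by a cheap model, while the savings accrue on every downstream call to the large model.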
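A minimal sketch of the self-generated-data idea behind Long2Short training: sample several chain-of-thought traces from the model itself, discard the incorrect ones, and fine-tune on the shortest correct trace. The `Trace` structure and the selection rule are illustrative assumptions, not TokenSqueeze's published pipeline.

```python
# Long2Short data selection (illustrative): keep the shortest trace that
# still reaches the gold answer, so fine-tuning pulls the model toward
# concise but correct reasoning using only its own outputs.
from dataclasses import dataclass

@dataclass
class Trace:
    text: str       # full chain-of-thought plus final answer
    answer: str     # extracted final answer
    n_tokens: int   # token count of the trace

def select_short_correct(traces: list[Trace], gold: str) -> Trace | None:
    """Return the shortest self-generated trace whose answer matches gold."""
    correct = [t for t in traces if t.answer == gold]
    return min(correct, key=lambda t: t.n_tokens) if correct else None

# Usage: for each training question, sample k traces from the current model,
# then add (question, selected.text) to the fine-tuning set when a trace
# is returned; questions with no correct trace are skipped.
```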
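One plausible way to couple a compression objective with an entropy term, offered only as a stand-in for an entropy-guided training framework: cross-entropy on the shortened trace plus an entropy bonus that keeps the output distribution from collapsing as traces get shorter. The coefficient `beta` and the sign of the entropy term are assumptions; the actual Entropy-Guided Reasoning Compression objective may differ.

```python
# Entropy-regularized compression loss (a minimal PyTorch sketch, assuming
# an entropy *bonus* counteracts the confidence collapse that length
# reduction tends to induce; this is not the paper's exact formulation).
import torch
import torch.nn.functional as F

def compression_loss(logits: torch.Tensor, targets: torch.Tensor,
                     beta: float = 0.01) -> torch.Tensor:
    """logits: (batch, seq, vocab) outputs on the compressed trace;
    targets: (batch, seq) token ids of the compressed trace."""
    ce = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
    probs = logits.softmax(dim=-1)
    # Mean per-token entropy of the predictive distribution.
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean()
    return ce - beta * entropy  # subtracting rewards higher entropy
```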
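To make the dual-efficiency idea concrete, the sketch below scores agent trajectories by correctness, action-step count, and token count, and builds (chosen, rejected) pairs for a DPO-style optimizer. The weights and the scoring rule are illustrative assumptions, not DEPO's actual preference construction or training objective.

```python
# Dual-efficiency preference construction (illustrative): among correct
# trajectories, prefer the one with fewer steps and fewer tokens; never
# prefer an incorrect trajectory. Weights w_step and w_token are assumed.
def efficiency_score(correct: bool, n_steps: int, n_tokens: int,
                     w_step: float = 1.0, w_token: float = 0.001) -> float:
    if not correct:
        return float("-inf")
    return -(w_step * n_steps + w_token * n_tokens)

def build_preference_pair(traj_a: tuple[bool, int, int],
                          traj_b: tuple[bool, int, int]):
    """Return (chosen, rejected) for a DPO-style preference dataset.

    Each trajectory is (correct, n_steps, n_tokens)."""
    sa, sb = efficiency_score(*traj_a), efficiency_score(*traj_b)
    return (traj_a, traj_b) if sa >= sb else (traj_b, traj_a)
```

Scoring at both the response level (tokens) and the trajectory level (steps) is what distinguishes this setup from length-only preference methods.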