The field of large language models is moving toward more efficient reasoning. Recent work aims to reduce the computational cost and latency of chain-of-thought reasoning while preserving, and in some cases improving, accuracy, drawing on novel reinforcement learning methods, low-rank distillation techniques, and dynamic skipping mechanisms. Notable papers in this area include S-GRPO, which proposes a serial-group decaying-reward policy optimization method that trains the model to exit chain-of-thought generation early, and Adaptive GoGI-Skip, which introduces a goal-gradient importance metric and a dynamic skipping mechanism to compress reasoning traces. Other papers, such as Learning to Think and Reinforcing the Diffusion Chain of Lateral Thought, report substantial efficiency gains and accuracy improvements across a range of reasoning benchmarks.
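
To make the early-exit idea concrete, the sketch below shows one way a decaying reward could be assigned across a serial group of exit points sampled along a single chain of thought. The geometric decay schedule, the zero reward for incorrect exits, and names such as ExitSample and serial_group_rewards are illustrative assumptions, not the formulation defined in the S-GRPO paper.

```python
# Hypothetical sketch of serial-group decaying-reward assignment: several
# early-exit points are sampled along one reasoning trace (a "serial group"),
# and correct answers produced at earlier exits receive larger rewards.
# The decay schedule and data layout are illustrative assumptions only.

from dataclasses import dataclass
from typing import List


@dataclass
class ExitSample:
    """One early-exit attempt taken at a given position in the reasoning trace."""
    exit_step: int    # index of the exit point within the serial group (0 = earliest)
    is_correct: bool  # whether the answer produced at this exit matches the reference


def serial_group_rewards(samples: List[ExitSample], decay: float = 0.8) -> List[float]:
    """Assign a reward to each exit attempt in one serial group.

    Correct answers are rewarded, and the reward decays geometrically with the
    exit position, so the policy is nudged toward answering correctly as early
    as possible. Incorrect exits receive zero reward.
    """
    return [
        (decay ** s.exit_step) if s.is_correct else 0.0
        for s in samples
    ]


if __name__ == "__main__":
    group = [
        ExitSample(exit_step=0, is_correct=False),  # exited too early, wrong answer
        ExitSample(exit_step=1, is_correct=True),   # early and correct -> high reward
        ExitSample(exit_step=3, is_correct=True),   # correct but late -> decayed reward
    ]
    print(serial_group_rewards(group))  # [0.0, 0.8, 0.512...]
```

In a full training setup, per-exit rewards like these would presumably be normalized into group-relative advantages in the usual GRPO style before updating the policy; that step is omitted here for brevity.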