The field of large language models is moving toward more efficient reasoning. Researchers are focusing on reducing overthinking, a common issue in which models generate excessively long reasoning paths without any corresponding performance benefit. Various approaches have been proposed, including adaptive reasoning suppression, early termination, and cumulative entropy regulation. These methods aim to dynamically determine the optimal point at which to conclude the thought process, achieving efficient reasoning without sacrificing problem-solving ability.

Notable papers in this area include:

- On the Self-awareness of Large Reasoning Models' Capability Boundaries, which investigates whether large reasoning models possess self-awareness of their capability boundaries and proposes optimization strategies to improve reliability and efficiency.
- SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression, which introduces a training regime that alternates between compressing and expanding the reasoning budget to reduce redundant tokens and increase reasoning density.
- Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling, which proposes a framework that reduces overthinking by decoupling token-level rewards and introducing a curriculum batch scheduling strategy.
- RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning, which guides models toward concise reasoning by strategically recomposing the training data.
- Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners, which introduces a simple yet effective adaptation to reinforcement learning that bridges long chain-of-thought distillation and standard reinforcement learning.
- Entropy After /Think for reasoning model early exiting, which proposes a novel signal for monitoring and deciding whether to exit reasoning early.
- ARS: Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models, which proposes a training-free approach that dynamically suppresses redundant reasoning steps while preserving accuracy.
- ThinkBrake: Mitigating Overthinking in Tool Reasoning, which introduces a training-free decoding heuristic that monitors the log-probability margin between the current top token and the stop token at sentence boundaries (see the sketch after this list).
- Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort, which detects implicit reward hacking by measuring how early a model's reasoning becomes sufficient to pass a verifier.
- Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation, which introduces a novel metric and a reasoning paradigm that help the model dynamically determine the optimal point to conclude its thought process.
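Several of these approaches are training-free, decoding-time interventions. As a rough illustration of the idea behind the ThinkBrake-style check described above (a sketch, not the authors' implementation), the code below compares the log-probability of the current top token with that of the reasoning-stop token at sentence boundaries and forces an early stop when the margin is small. The token ids, threshold value, and decoding loop are hypothetical placeholders; only the margin comparison follows the description in the paper summary.

```python
import torch
import torch.nn.functional as F

# Hypothetical ids; in practice these come from the model's tokenizer.
STOP_TOKEN_ID = 151649          # placeholder id for the "</think>" token
SENTENCE_END_IDS = {13, 30, 0}  # placeholder ids for ".", "?", "!"
MARGIN_THRESHOLD = 2.0          # tunable: small margin => stop token is nearly as likely

def should_brake(logits: torch.Tensor, last_token_id: int) -> bool:
    """ThinkBrake-style margin check (illustrative sketch).

    At a sentence boundary, compare the log-probability of the current
    top token with that of the stop token. If the margin is below a
    threshold, the model is close to stopping anyway, so we cut the
    reasoning short.
    """
    if last_token_id not in SENTENCE_END_IDS:
        return False                       # only intervene at sentence boundaries
    log_probs = F.log_softmax(logits, dim=-1)
    top_lp = log_probs.max().item()
    stop_lp = log_probs[STOP_TOKEN_ID].item()
    return (top_lp - stop_lp) < MARGIN_THRESHOLD

# Simplified use inside a greedy decoding loop (model/input_ids assumed):
# for step in range(max_new_tokens):
#     logits = model(input_ids).logits[0, -1]        # next-token logits
#     if should_brake(logits, input_ids[0, -1].item()):
#         next_token = STOP_TOKEN_ID                 # force "</think>" and exit reasoning
#     else:
#         next_token = int(torch.argmax(logits))
#     input_ids = torch.cat([input_ids, torch.tensor([[next_token]])], dim=-1)
```

Because the check only reads the next-token logits, it needs no retraining, which matches the training-free framing shared by ThinkBrake and ARS.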