Efficient Reasoning in Large Language Models

The field of large language models is moving toward more efficient reasoning. Recent research has highlighted the problem of overthinking, in which models generate unnecessarily long and verbose chains of thought. To address this, approaches such as adaptive reasoning, speculative chain-of-thought, and dynamic adjustment of reasoning depth have been proposed. These methods aim to balance reasoning length against accuracy, cutting inference cost without degrading answer quality. Noteworthy papers in this area include Fast-Slow Thinking for Large Vision-Language Model Reasoning, which reports state-of-the-art accuracy while reducing token usage by up to 67.3%, and ShorterBetter, which enables reasoning language models to discover their own optimal chain-of-thought lengths, yielding up to an 80% reduction in output length while maintaining accuracy.
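As a rough illustration of how such a length objective might be scored, the sketch below computes a length-aware reward in the spirit of ShorterBetter: within a group of sampled responses, the shortest correct one defines a target length, and every sample is penalized in proportion to its deviation from that target. The Sample type, the alpha coefficient, and the group-wise scoring function are illustrative assumptions for this sketch, not the paper's actual formulation.

```python
# Minimal sketch (assumptions, not the ShorterBetter implementation):
# the shortest correct response in a sampled group sets the target length,
# and rewards penalize deviation from it.

from dataclasses import dataclass


@dataclass
class Sample:
    text: str
    num_tokens: int
    is_correct: bool  # e.g., exact match against a reference answer


def length_aware_rewards(samples: list[Sample], alpha: float = 0.001) -> list[float]:
    """Score a group of samples: reward correctness, penalize length deviation."""
    correct_lengths = [s.num_tokens for s in samples if s.is_correct]
    if not correct_lengths:
        # No correct sample in this group: flat penalty for everyone.
        return [-1.0 for _ in samples]
    target = min(correct_lengths)  # shortest correct response sets the target
    rewards = []
    for s in samples:
        base = 1.0 if s.is_correct else -1.0
        rewards.append(base - alpha * abs(s.num_tokens - target))
    return rewards


if __name__ == "__main__":
    group = [
        Sample("long but correct", 900, True),
        Sample("short and correct", 150, True),
        Sample("short but wrong", 120, False),
    ]
    print(length_aware_rewards(group))
```

Deriving the target from the group's own shortest correct response, rather than a fixed global budget, lets the effective length target adapt per problem, which matches the paper's framing of models discovering their own optimal chain-of-thought lengths.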

Sources

Fast-Slow Thinking for Large Vision-Language Model Reasoning

Efficient Reasoning for LLMs through Speculative Chain-of-Thought

Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think

ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning

AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization

Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs
