Efficient Reasoning in Large Language Models

Large language models are moving toward more efficient and effective reasoning. Recent studies show that scaling reasoning can improve factuality, and that adaptive token allocation and token-efficient inference scaling yield substantial performance gains. Cost-aware planning and budget allocation likewise improve reasoning accuracy and efficiency; speculative parallel scaling and length-based reward shaping have achieved strong gains on mathematical reasoning benchmarks; and frameworks such as Plan-and-Budget and Cost-Augmented Monte Carlo Tree Search improve reasoning efficiency across a range of tasks and models.

Two papers are particularly noteworthy. SelfBudgeter proposes a self-adaptive, controllable strategy for efficient reasoning, achieving up to 74.47% response-length compression on the MATH benchmark while keeping accuracy nearly undiminished. A*-Decoding introduces a search-based inference-time strategy that builds on the A* search algorithm to make the most of a fixed compute budget, matching strong inference-scaling baselines while using up to 3x fewer tokens and 30% fewer PRM passes under equivalent compute budgets. Minimal sketches of both ideas follow.
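
To make the budget-then-generate pattern behind SelfBudgeter-style adaptive token allocation concrete, here is a minimal Python sketch. The estimate_budget heuristic, the prompt wording, and the generate interface are illustrative assumptions; the paper itself trains the model to predict its own token budget before answering, rather than relying on a hand-written heuristic.

    def estimate_budget(problem: str, base: int = 128,
                        per_char: float = 0.5, cap: int = 2048) -> int:
        # Hypothetical difficulty proxy: longer problems get more tokens,
        # up to a hard cap. A stand-in for a learned budget predictor.
        return min(cap, base + int(per_char * len(problem)))

    def solve_with_budget(generate, problem: str) -> str:
        # First fix a token budget, then cap generation at that budget.
        # `generate(prompt, max_tokens)` is any text-generation callable.
        budget = estimate_budget(problem)
        prompt = (
            f"Solve the problem. Keep your reasoning under {budget} tokens.\n\n"
            f"Problem: {problem}\nAnswer:"
        )
        return generate(prompt, max_tokens=budget)

Capping max_tokens is what actually bounds cost; the in-prompt instruction merely encourages the model to plan for the budget instead of being truncated mid-thought.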
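
In the same spirit, here is a minimal sketch of A*-style best-first decoding under a fixed token budget, where a process reward model (PRM) scores partial solutions and serves as the search heuristic. The expand, prm_score, and is_final callables and the whitespace token accounting are assumptions made for illustration, not the paper's actual interfaces.

    import heapq
    from itertools import count

    def a_star_decode(prompt, expand, prm_score, is_final, token_budget):
        # Best-first search over partial generations: always extend the
        # partial solution the PRM currently scores highest.
        tie = count()  # tie-breaker so the heap never compares states
        frontier = [(-prm_score(prompt), next(tie), prompt)]
        tokens_used = 0
        while frontier and tokens_used < token_budget:
            _, _, state = heapq.heappop(frontier)
            if is_final(state):
                return state
            for step in expand(state):  # candidate next reasoning steps
                tokens_used += len(step.split())  # crude token accounting
                child = state + step
                heapq.heappush(frontier, (-prm_score(child), next(tie), child))
        return None  # budget exhausted before reaching a final answer

    # Toy usage: grow the string "aaa" one character at a time.
    result = a_star_decode(
        prompt="",
        expand=lambda s: ["a", "b"],
        prm_score=lambda s: s.count("a") / 3,
        is_final=lambda s: s == "aaa",
        token_budget=50,
    )
    assert result == "aaa"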

Sources

Scaling Reasoning can Improve Factuality in Large Language Models

SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning

A*-Decoding: Token-Efficient Inference Scaling

Cost-Augmented Monte Carlo Tree Search for LLM-Assisted Planning

The Energy Cost of Reasoning: Analyzing Energy Usage in LLMs with Test-time Compute

Incorporating Token Usage into Prompting Strategy Evaluation

SSR: Speculative Parallel Scaling Reasoning in Test-time

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning

How do Scaling Laws Apply to Knowledge Graph Engineering Tasks? The Impact of Model Size on Large Language Model Performance
