Efficient Reasoning in Large Language Models

Research on large language models is converging on more efficient and more effective reasoning. Recent work focuses on reducing computational overhead, improving data efficiency, and strengthening generalization across domains, drawing on techniques such as self-distillation, reinforcement learning, and prototype-based reasoning. Noteworthy papers in this area include LearnAlign, which selects reasoning data for reinforcement learning via an improved gradient-alignment criterion; DART, which distills autoregressive chain-of-thought reasoning into silent thought; TreeRL, which incorporates on-policy tree search into reinforcement learning for LLMs; and ProtoReasoning, which leverages prototypical representations to make reasoning more generalizable. Together, these methods report performance gains and improved robustness across a range of reasoning tasks.
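To make the gradient-alignment idea concrete: data-selection methods of this kind score each candidate training example by how well its gradient agrees with a reference gradient (for instance, one averaged over a trusted held-out set) and keep the best-aligned examples. The sketch below is a generic, minimal illustration of that scoring scheme, not LearnAlign's actual algorithm; the function names and toy gradients are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_by_gradient_alignment(example_grads, reference_grad, k):
    """Rank candidate examples by alignment of their per-example
    gradients with a reference gradient, and keep the top-k indices.
    This is a simplified stand-in for gradient-based data selection."""
    ranked = sorted(
        range(len(example_grads)),
        key=lambda i: cosine(example_grads[i], reference_grad),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-D gradients: example 0 points with the reference,
# example 1 against it, example 2 partially aligned.
grads = [[1.0, 0.5], [-1.0, -0.5], [0.2, 1.0]]
ref = [1.0, 0.6]
print(select_by_gradient_alignment(grads, ref, 2))  # → [0, 2]
```

In practice the gradient vectors would come from backpropagation on each candidate example (e.g. per-sample gradients of the policy loss), and selection would run periodically as the model updates, since alignment scores drift during training.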

Sources

History-Aware Cross-Attention Reinforcement: Self-Supervised Multi Turn and Chain-of-Thought Fine-Tuning with vLLM

LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment

DART: Distilling Autoregressive Reasoning to Silent Thought

TreeRL: LLM Reinforcement Learning with On-Policy Tree Search

Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot

Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

Optimizing Length Compression in Large Reasoning Models

Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Truncated Proximal Policy Optimization

ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

CC-LEARN: Cohort-based Consistency Learning
