Efficient Reasoning in Large Language Models

Research on large language models (LLMs) is converging on cheaper, more effective ways to improve reasoning. One line of work transfers reasoning behaviors learned by small models to larger ones, avoiding the cost of running reinforcement learning (RL) directly on the large model. Another explores training paradigms that combine supervised fine-tuning (SFT) and RL rather than treating them as separate stages. Noteworthy papers include RAST, which transfers reasoning behaviors from small models to larger ones; BREAD, which unifies the SFT and RL stages into a single training paradigm; Command-V, which retrofits LLMs with new behaviors via activation profiles; and SRFT, which performs supervised and reinforcement fine-tuning in a single stage. Together, these efforts point toward more efficient, scalable methods for strengthening LLM reasoning.
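To make the small-to-large transfer idea concrete, below is a minimal decoding-time sketch: the token-level logit shift that RL induces on a small model (RL-tuned minus base logits) is added to a larger base model's logits before sampling. The model names, the greedy single-step decoding, and the scaling factor `alpha` are illustrative assumptions rather than the published RAST recipe, and the sketch assumes all three models share a tokenizer.

```python
# Hedged sketch of small-to-large reasoning transfer via logit differences.
# Assumptions: the three models share a vocabulary/tokenizer; "alpha" and the
# model identifiers are placeholders, not values from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

LARGE_BASE = "meta-llama/Llama-2-13b-hf"     # assumed large base model
SMALL_BASE = "meta-llama/Llama-2-7b-hf"      # assumed small base model
SMALL_RL = "path/to/small-rl-tuned-model"    # placeholder for an RL-tuned small model

tok = AutoTokenizer.from_pretrained(LARGE_BASE)
large = AutoModelForCausalLM.from_pretrained(LARGE_BASE)
small_base = AutoModelForCausalLM.from_pretrained(SMALL_BASE)
small_rl = AutoModelForCausalLM.from_pretrained(SMALL_RL)

@torch.no_grad()
def next_token(prompt_ids: torch.Tensor, alpha: float = 1.0) -> int:
    """Greedy next-token step: large-model logits plus the small model's RL-induced shift."""
    logits_large = large(prompt_ids).logits[:, -1, :]
    logits_rl = small_rl(prompt_ids).logits[:, -1, :]
    logits_base = small_base(prompt_ids).logits[:, -1, :]
    combined = logits_large + alpha * (logits_rl - logits_base)
    return int(torch.argmax(combined, dim=-1))

prompt_ids = tok("Q: If 3x + 5 = 20, what is x? A:", return_tensors="pt").input_ids
print(tok.decode(next_token(prompt_ids)))
```

In practice this step would sit inside a full decoding loop; the point of the sketch is only that the expensive RL run happens on the small model, while the large model is used unchanged at inference time.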

Sources

RAST: Reasoning Activation in LLMs via Small-model Transfer

BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning

No Free Lunch: Rethinking Internal Feedback for LLM Reasoning

Command-V: Pasting LLM Behaviors via Activation Profiles

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
