The field of large language models (LLMs) is seeing rapid advances in reasoning capabilities, with a focus on improving performance, efficiency, and interpretability. Researchers are exploring novel approaches to fine-tuning LLMs, leveraging techniques such as reinforcement learning, synthetic data generation, and pruning strategies to enhance model performance. Notably, scalable frameworks and off-policy reinforcement learning algorithms are enabling LLMs to learn from their own reasoning traces and improve their capabilities. Furthermore, the integration of cognitive mapping and programmatic representations shows promise for more human-like planning and problem-solving in LLMs.
Some noteworthy papers in this area include:
- Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models, which introduces a scalable framework for iteratively fine-tuning models on their own reasoning traces (a minimal sketch of this loop appears after the list).
- Training Large Language Models to Reason via EM Policy Gradient, which proposes an off-policy reinforcement learning algorithm for enhancing LLM reasoning.
- Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets, which leverages a novel Process Reward Model to guide generation and improve mathematical reasoning in LLMs.
- Reinforcement Learning for Reasoning in Large Language Models with One Training Example, which demonstrates the effectiveness of reinforcement learning with verifiable reward using a single training example.
- Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving, which introduces a self-training algorithm that teaches LLMs to formulate anticipatory plans before solving problems.
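To make the self-training pattern running through several of these papers concrete, the sketch below shows a "generate, prune, fine-tune" loop in the spirit of Think, Prune, Train, Improve, using a verifiable final-answer check as the pruning criterion (as in the verifiable-reward line of work). This is an illustrative assumption, not the papers' actual implementation: the `sample_traces` and `fine_tune` hooks, the `Problem`/`Trace` structures, and the exact-match reward are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Problem:
    prompt: str
    reference_answer: str


@dataclass
class Trace:
    text: str          # full chain-of-thought produced by the model
    final_answer: str  # final answer extracted from the trace


def verify_answer(predicted: str, reference: str) -> bool:
    # Verifiable reward (illustrative): exact match on the normalized final answer.
    return predicted.strip() == reference.strip()


def self_improvement_round(
    model,
    problems: List[Problem],
    sample_traces: Callable[[object, str, int], List[Trace]],  # hypothetical generation hook
    fine_tune: Callable[[object, List[dict]], object],         # hypothetical training hook
    num_samples: int = 4,
):
    """One 'think -> prune -> train' pass over the model's own reasoning traces."""
    kept = []
    for problem in problems:
        # Think: sample several reasoning traces per problem.
        for trace in sample_traces(model, problem.prompt, num_samples):
            # Prune: keep only traces whose final answer passes the verifiable check.
            if verify_answer(trace.final_answer, problem.reference_answer):
                kept.append({"prompt": problem.prompt, "completion": trace.text})
    # Train: fine-tune on the verified traces and return the updated model.
    return fine_tune(model, kept)


def iterate(model, problems, sample_traces, fine_tune, rounds: int = 3):
    # Improve: each round bootstraps the next from the previous round's model.
    for _ in range(rounds):
        model = self_improvement_round(model, problems, sample_traces, fine_tune)
    return model
```

The verifier is what makes pruning cheap here: no learned reward model is needed when the task (e.g., math with a known answer) admits an automatic correctness check, which is why much of this work targets mathematical reasoning benchmarks.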