Advancements in Large Language Models for Reasoning and Mathematics

Large language models (LLMs) are advancing rapidly in reasoning and mathematics. Recent work improves both accuracy and efficiency through training pipelines that combine supervised fine-tuning (SFT) with reinforcement learning (RL), yielding state-of-the-art results on challenging benchmarks, including mathematical Olympiad competitions. Notably, researchers have found that a prolonged SFT phase can substantially raise accuracy, while a subsequent RL phase can shorten solutions and improve token efficiency. Adaptive guidance and difficulty-aware reinforcement learning frameworks have also been proposed to stabilize training and strengthen reasoning performance. Together, these advances point toward more capable and robust reasoning models. Some notable papers include:

  • A Practical Two-Stage Recipe for Mathematical LLMs, which introduces a systematic methodology for combining SFT and RL to maximize accuracy and token efficiency (an illustrative reward sketch follows this list).
  • KAT-V1, an open-source 40B-parameter model that tackles the overthinking problem in reasoning-intensive tasks through an automatic-thinking training paradigm.
  • GHPO, which proposes a difficulty-aware reinforcement learning framework that adaptively balances direct imitation learning with exploration-based reinforcement learning (sketched after the reward example below).
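
To make the token-efficiency point concrete, the sketch below shows a length-penalized correctness reward of the kind an RL stage might optimize after SFT. This is a minimal Python illustration; the function name, token budget, and penalty weight are assumptions for this digest, not values from the paper.

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    answer: str      # final answer extracted from the generated solution
    num_tokens: int  # length of the generated solution in tokens

def length_penalized_reward(rollout: Rollout, gold_answer: str,
                            token_budget: int = 4096,
                            length_weight: float = 0.2) -> float:
    """Reward correctness, discounted by a penalty that grows with length.

    Illustrative sketch: accuracy comes first, and the RL stage encourages
    token efficiency by paying less for overlong solutions. Only correct
    solutions are penalized for length, so the policy is never rewarded
    for being short but wrong.
    """
    correct = 1.0 if rollout.answer.strip() == gold_answer.strip() else 0.0
    length_penalty = length_weight * min(rollout.num_tokens / token_budget, 1.0)
    return correct * (1.0 - length_penalty)

# A concise correct solution outscores a verbose correct one,
# steering the policy toward shorter chains of thought.
print(length_penalized_reward(Rollout("42", 1024), "42"))  # 0.95
print(length_penalized_reward(Rollout("42", 4096), "42"))  # 0.8
```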

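The adaptive-guidance idea behind GHPO can be sketched the same way: difficulty is estimated from how often the current policy solves a problem, and unsolved problems receive a growing prefix of the reference solution as a hint, blending imitation into the RL objective until the problem becomes learnable. The escalation step and prompt template below are illustrative assumptions, not GHPO's exact schedule.

```python
def next_hint_ratio(pass_rate: float, current_ratio: float,
                    step: float = 0.25) -> float:
    """Decide how much of the reference solution to reveal next round.

    Difficulty is proxied by the pass rate of the current rollouts:
    if any rollout solved the problem, train with pure exploration;
    if none did, escalate the hint by `step` (an illustrative value).
    """
    if pass_rate > 0.0:
        return 0.0                         # solvable: plain RL, no hint
    return min(current_ratio + step, 1.0)  # unsolved: reveal more solution

def build_prompt(question: str, reference_solution: str,
                 hint_ratio: float) -> str:
    """Prepend a prefix of the ground-truth solution as adaptive guidance."""
    if hint_ratio == 0.0:
        return question
    cut = int(len(reference_solution) * hint_ratio)
    return f"{question}\n\nPartial solution:\n{reference_solution[:cut]}"

# A problem no rollout solves gets progressively stronger guidance,
# interpolating between exploration (RL) and imitation (SFT-like) signals.
ratio = 0.0
for pass_rate in (0.0, 0.0, 0.5):
    ratio = next_hint_ratio(pass_rate, ratio)
    print(ratio)  # 0.25, then 0.5, then 0.0 once the problem is solved
```
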
Sources

A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning

KAT-V1: Kwai-AutoThink Technical Report

wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models

Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them

GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning

EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes

Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning

Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training

Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)
