Advancements in Large Language Models

The field of large language models (LLMs) is advancing rapidly, with a strong focus on improving reasoning and generation capabilities. Recent work has centered on enhancing policy optimization strategies such as Group Relative Policy Optimization (GRPO) to better adapt LLMs to diverse tasks. There is also growing interest in applying LLMs to complex numerical problems, such as solving ordinary differential equations, and their use for mathematical reasoning, code generation, and function calling continues to gain prominence. Noteworthy papers in this area include the following (brief illustrative sketches of two of these ideas follow the list):
On the Theory and Practice of GRPO proposes a new algorithm, Trajectory-level Importance-Corrected GRPO (TIC-GRPO), which replaces token-level importance ratios with a single trajectory-level probability ratio.
Compressing Chain-of-Thought in LLMs via Step Entropy introduces a CoT compression framework that uses step entropy to identify redundant reasoning steps.
EmbedGrad optimizes text prompt embeddings through gradient-based refinement to enhance the reasoning capability of LLMs.
GTPO and GRPO-S propose dynamic entropy weighting to create finer-grained reward signals for more precise policy updates.
Multi-module GRPO defines a simple multi-module generalization of GRPO for improving modular programs that combine multiple LM calls.
Making Prompts First-Class Citizens for Adaptive LLM Pipelines describes a language and runtime that make prompts structured, adaptive, first-class components of the execution model.
Exploring Superior Function Calls via Reinforcement Learning presents a reinforcement learning framework that enhances group relative policy optimization for function calling tasks.
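To make the trajectory-level correction concrete, here is a minimal PyTorch sketch of a GRPO-style surrogate loss that uses one importance ratio per sampled response rather than one per token, in the spirit of TIC-GRPO. The function name, tensor shapes, and clipping constant are illustrative assumptions, not the paper's reference implementation.

```python
import torch

def tic_grpo_loss(logp_new, logp_old, rewards, clip_eps=0.2):
    """Sketch of a trajectory-level importance-corrected GRPO surrogate.

    logp_new: (G, T) per-token log-probs of G sampled responses under the
              current policy; logp_old: same shape under the sampling policy.
    rewards:  (G,) scalar reward per response in the group (G > 1 assumed).
    Padding positions are assumed to carry zero log-prob mass.
    """
    # Group-relative advantage: standardize rewards within the group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # Trajectory-level ratio: one ratio per response, not one per token.
    ratio = torch.exp(logp_new.sum(dim=-1) - logp_old.sum(dim=-1))

    # PPO-style clipped surrogate applied at the trajectory level.
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -torch.min(unclipped, clipped).mean()
```

A standard GRPO baseline would instead form `torch.exp(logp_new - logp_old)` per token and average the clipped terms over all tokens; swapping in the summed, trajectory-level ratio is the only change shown here.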

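The step-entropy criterion for chain-of-thought compression can likewise be sketched: score each reasoning step by the average Shannon entropy of the token distributions produced while decoding it, then keep the higher-entropy steps and drop the rest as likely redundant. The `keep_ratio`, the data layout, and the pruning rule below are assumptions for illustration; the paper's exact selection procedure may differ.

```python
import math

def step_entropy(token_distributions):
    """Mean Shannon entropy (nats) over the tokens of one reasoning step.
    token_distributions: list of probability vectors, one per generated token.
    """
    ent = 0.0
    for probs in token_distributions:
        ent += -sum(p * math.log(p) for p in probs if p > 0)
    return ent / max(len(token_distributions), 1)

def compress_cot(steps, keep_ratio=0.6):
    """Keep the highest-entropy fraction of steps, preserving original order.
    steps: list of (step_text, token_distributions) pairs.
    """
    scored = [(step_entropy(dists), i, text) for i, (text, dists) in enumerate(steps)]
    k = max(1, int(len(steps) * keep_ratio))
    kept = sorted(sorted(scored, reverse=True)[:k], key=lambda s: s[1])
    return [text for _, _, text in kept]
```

The sketch only shows the scoring-and-selection pattern; in practice the entropies would come from the model's own decoding distributions rather than being supplied externally.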
Sources

On the Theory and Practice of GRPO: A Trajectory-Corrected Approach with Fast Convergence

From Text to Trajectories: GPT-2 as an ODE Solver via In-Context Learning

CardiffNLP at CLEARS-2025: Prompting Large Language Models for Plain Language and Easy-to-Read Text Rewriting

Compressing Chain-of-Thought in LLMs via Step Entropy

Hide and Seek with LLMs: An Adversarial Game for Sneaky Error Generation and Self-Improving Diagnosis

Error Detection and Correction for Interpretable Mathematics in Large Language Models

EmbedGrad: Gradient-Based Prompt Optimization in Embedding Space for Large Language Models

GTPO: Trajectory-Based Policy Optimization in Large Language Models

GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy

Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs

Making Prompts First-Class Citizens for Adaptive LLM Pipelines

Exploring Superior Function Calls via Reinforcement Learning
