The field of large language models (LLMs) is advancing rapidly, with a strong focus on improving reasoning and generation capabilities. Recent work centers on enhancing policy optimization strategies, such as Group Relative Policy Optimization (GRPO), to better adapt LLMs to diverse tasks. There is also growing interest in using LLMs to solve complex numerical problems, such as ordinary differential equations, and their application to mathematical reasoning, code generation, and function calling is becoming increasingly prominent. Noteworthy papers in this area include:

- On the Theory and Practice of GRPO, which proposes Trajectory-level Importance Corrected GRPO (TIC-GRPO), an algorithm that replaces token-level importance ratios with a single trajectory-level probability ratio (a sketch of the two weighting schemes follows this list).
- Compressing Chain-of-Thought in LLMs via Step Entropy, which introduces a novel CoT compression framework based on step entropy to identify redundant reasoning steps (see the pruning sketch below).
- EmbedGrad, which optimizes text prompt embeddings through gradient-based refinement to enhance the reasoning capability of LLMs (see the embedding-tuning sketch below).
- GTPO and GRPO-S, which propose dynamic entropy weighting to create finer-grained reward signals for more precise policy updates (see the weighting sketch below).
- Multi-module GRPO, which defines a simple multi-module generalization of GRPO for improving modular programs that combine multiple LM calls.
- Making Prompts First-Class Citizens for Adaptive LLM Pipelines, which describes a language and runtime that make prompts structured, adaptive, first-class components of the execution model.
- Exploring Superior Function Calls via Reinforcement Learning, which presents a novel reinforcement learning framework that extends group relative policy optimization for function-calling tasks.
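To make the TIC-GRPO distinction concrete, the following is a minimal sketch of the two weighting schemes under standard notation, where a completion y = (y_1, ..., y_T) is sampled for a prompt x from the old policy; the exact correction terms used in the paper are not reproduced here.

```latex
% Token-level importance ratio, one per generated token (standard GRPO-style objectives):
\[
  r_t(\theta) \;=\; \frac{\pi_\theta(y_t \mid x, y_{<t})}{\pi_{\theta_{\mathrm{old}}}(y_t \mid x, y_{<t})}
\]
% Trajectory-level ratio, a single scalar for the whole completion:
\[
  r(\theta) \;=\; \frac{\pi_\theta(y \mid x)}{\pi_{\theta_{\mathrm{old}}}(y \mid x)}
  \;=\; \prod_{t=1}^{T} \frac{\pi_\theta(y_t \mid x, y_{<t})}{\pi_{\theta_{\mathrm{old}}}(y_t \mid x, y_{<t})}
\]
```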
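The step-entropy idea behind the CoT compression framework can be illustrated with a short sketch: score each reasoning step by the average entropy of the token distributions produced while generating it, then drop the lowest-entropy (most predictable) steps as redundant. The keep_ratio, the per-step aggregation, and the function names below are illustrative assumptions, not the paper's exact procedure.

```python
import math
from typing import List

def step_entropy(token_probs: List[List[float]]) -> float:
    """Mean Shannon entropy over the token distributions emitted for one step.

    token_probs: for each generated token in the step, the model's probability
    distribution over the vocabulary (illustrative; a real implementation would
    read these from the LM's logits).
    """
    entropies = [-sum(p * math.log(p) for p in dist if p > 0.0)
                 for dist in token_probs]
    return sum(entropies) / max(1, len(entropies))

def compress_cot(steps: List[str],
                 per_step_probs: List[List[List[float]]],
                 keep_ratio: float = 0.6) -> List[str]:
    """Keep the highest-entropy reasoning steps and drop the rest.

    Low-entropy steps are treated as redundant (the model was already
    near-certain), which is the intuition the compression framework builds on.
    """
    ranked = sorted(range(len(steps)),
                    key=lambda i: step_entropy(per_step_probs[i]),
                    reverse=True)
    keep = set(ranked[: max(1, int(keep_ratio * len(steps)))])
    return [s for i, s in enumerate(steps) if i in keep]  # preserve original order
```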
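For EmbedGrad, a minimal sketch of gradient-based prompt-embedding refinement is shown below, assuming a frozen causal LM that accepts precomputed input embeddings and returns per-position logits; the model interface, loss, and hyperparameters here are assumptions, not the paper's exact setup.

```python
import torch

def refine_prompt_embeddings(model, prompt_embeds, input_embeds, target_ids,
                             steps: int = 100, lr: float = 1e-3):
    """Gradient-based refinement of a fixed-length prompt embedding.

    model:         a frozen causal LM assumed to accept `inputs_embeds` and
                   return logits of shape (batch, seq_len, vocab).
    prompt_embeds: (prompt_len, hidden) tensor initialized from the text prompt.
    input_embeds:  (batch, seq_len, hidden) embeddings of the task inputs.
    target_ids:    (batch, seq_len) gold token ids used for the loss.
    """
    prompt = prompt_embeds.clone().requires_grad_(True)  # only the prompt is trainable
    opt = torch.optim.Adam([prompt], lr=lr)
    for _ in range(steps):
        batch_prompt = prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        logits = model(inputs_embeds=torch.cat([batch_prompt, input_embeds], dim=1))
        # Score only the task positions; token shifting is omitted for brevity.
        task_logits = logits[:, batch_prompt.size(1):, :]
        loss = torch.nn.functional.cross_entropy(
            task_logits.reshape(-1, task_logits.size(-1)), target_ids.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return prompt.detach()
```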
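Finally, one plausible reading of the dynamic entropy weighting in GTPO and GRPO-S is sketched below: a sequence-level, group-relative advantage is redistributed over tokens in proportion to their generation entropy, yielding a finer-grained per-token signal. The specific weighting and normalization are assumptions for illustration only.

```python
from typing import List

def entropy_weighted_advantages(token_entropies: List[float],
                                sequence_advantage: float) -> List[float]:
    """Spread a sequence-level advantage across tokens by entropy.

    Tokens generated with higher entropy receive a larger share of the credit;
    the exact scheme used by GTPO / GRPO-S may differ from this illustration.
    """
    total = sum(token_entropies)
    if total == 0.0:
        # Degenerate case: fall back to uniform credit assignment.
        return [sequence_advantage / len(token_entropies)] * len(token_entropies)
    return [sequence_advantage * h / total for h in token_entropies]
```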