The field of large language models is moving toward more efficient and stable optimization methods. Recent research targets the trade-off between optimization granularity and training stability, with several new frameworks and techniques reporting state-of-the-art results on standard benchmarks. Notable papers include ESPO, which introduces a policy-optimization framework that reconciles fine-grained control with training stability, and DVPO, which combines conditional risk theory with distributional value modeling to balance robustness and generalization. Two further papers, "Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective" and "Auxiliary-Hyperparameter-Free Sampling: Entropy Equilibrium for Text Generation," contribute new perspectives on reinforcement learning for diffusion LLMs at the sequence level and on sampling without auxiliary hyperparameters, respectively.
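
To make the "conditional risk theory" ingredient concrete: conditional value-at-risk (CVaR) is the standard risk measure in this line of work, and a distributional value model typically predicts a set of return quantiles from which CVaR can be read off. The sketch below is illustrative only; it assumes a quantile-based value head and is not a description of DVPO's actual architecture or objective.

```python
import numpy as np

def cvar_from_quantiles(quantile_values: np.ndarray, alpha: float = 0.25) -> float:
    """Conditional Value-at-Risk (CVaR) of a return distribution
    represented by equally weighted quantile samples, as in
    quantile-regression-style distributional value heads.

    CVaR_alpha = E[Z | Z <= VaR_alpha]: the mean of the worst
    alpha-fraction of outcomes, a standard risk-sensitive statistic.
    """
    q = np.sort(quantile_values)
    k = max(1, int(np.ceil(alpha * len(q))))  # number of worst-case quantiles to keep
    return float(q[:k].mean())

# Toy usage: a value head predicting 8 return quantiles for one state.
quantiles = np.array([-2.0, -0.5, 0.1, 0.4, 0.8, 1.1, 1.5, 2.3])
print("mean value:", quantiles.mean())                       # risk-neutral estimate
print("CVaR_0.25:", cvar_from_quantiles(quantiles, 0.25))    # pessimistic estimate
```

Optimizing CVaR rather than the mean penalizes the worst-case tail of outcomes, which is the usual mechanism by which risk-conditioned objectives buy robustness at some cost in average-case performance.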
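The sampling paper's title suggests keeping generation near an entropy equilibrium without extra knobs such as temperature or top-p. Below is a minimal, hypothetical sketch of entropy-aware truncated sampling in that spirit; the specific rule (truncate to a nucleus whose entropy stays within 1 nat of the full distribution's) and the 1-nat slack are illustrative assumptions, not the paper's algorithm, which by its title needs no such fixed constant.

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def entropy_matched_sample(logits: np.ndarray, rng: np.random.Generator) -> int:
    """Sample a token from an adaptively truncated distribution.

    Illustrative rule (an assumption, not taken from the paper):
    grow a nucleus of top tokens until the renormalized nucleus
    retains the full distribution's entropy to within 1 nat, so the
    truncation level is driven by the model's own uncertainty rather
    than a fixed top-p or temperature hyperparameter.
    """
    p = np.exp(logits - logits.max())
    p /= p.sum()
    h_full = entropy(p)
    order = np.argsort(p)[::-1]  # token ids sorted by descending probability
    for k in range(1, len(p) + 1):
        nucleus = p[order[:k]] / p[order[:k]].sum()
        if entropy(nucleus) >= h_full - 1.0:  # nucleus entropy within 1 nat of full
            return int(rng.choice(order[:k], p=nucleus))
    return int(order[0])  # unreachable fallback: the full distribution always qualifies

# Toy usage: a confident distribution truncates aggressively, a flat one barely at all.
rng = np.random.default_rng(0)
print(entropy_matched_sample(np.array([4.0, 1.0, 0.5, 0.2, 0.1]), rng))
```

The behavior this illustrates is the general goal of entropy-equilibrium sampling: when the model is confident, sampling collapses toward greedy decoding, and when it is uncertain, more of the distribution's mass is kept in play.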