Reasoning capabilities of large language models (LLMs) are advancing rapidly. Recent work focuses on multi-turn problem solving, abstract reasoning, and more accurate, reliable outputs. A central line of research uses reinforcement learning with verifiable rewards (RLVR) to strengthen reasoning, while newer methods aim to mitigate RLVR's limitations, including entropy-aware RLVR variants and counterfactual reasoning to improve generalization. There is also growing interest in more efficient and scalable architectures, such as hierarchical reinforcement learning frameworks and the integration of retrieval-augmented generation systems.

Notable papers include MiroMind-M1, a fully open-source reasoning language model (RLM) that matches or exceeds the performance of existing open-source RLMs; LEAR, which extracts rational evidence via reinforcement learning for retrieval-augmented generation; and 'Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty', which introduces RLCR, a training approach that jointly improves accuracy and calibrated confidence estimation.
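To make the reward structures concrete, the sketch below shows a minimal Python implementation of a binary verifiable reward of the kind used in RLVR, together with a calibration-aware variant in the spirit of RLCR's Brier-style objective. The function names, signatures, and exact penalty term are illustrative assumptions, not the papers' actual implementations.

```python
# Minimal sketch (assumed, not taken from the cited papers): a binary
# verifiable reward as used in RLVR, and a calibration-aware variant
# inspired by RLCR's correctness-plus-calibration objective.

def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """Binary reward: 1.0 if the model's final answer matches the reference, else 0.0."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0


def calibrated_reward(model_answer: str, reference_answer: str,
                      stated_confidence: float) -> float:
    """Correctness minus a Brier-style penalty on the model's stated confidence,
    so the policy is rewarded for being both correct and well calibrated."""
    correct = verifiable_reward(model_answer, reference_answer)
    brier_penalty = (stated_confidence - correct) ** 2
    return correct - brier_penalty


if __name__ == "__main__":
    # A correct answer reported with high confidence scores higher than the
    # same answer reported with hedged confidence.
    print(calibrated_reward("42", "42", stated_confidence=0.9))  # 0.99
    print(calibrated_reward("42", "42", stated_confidence=0.5))  # 0.75
```

Under this kind of objective, a model that guesses correctly but reports low confidence is penalized relative to one that is both correct and confident, which is one way a reward can jointly target accuracy and calibration.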