The field of reinforcement learning is moving toward more efficient and stable methods for policy optimization and value function learning. Recent developments have focused on improving the sample efficiency and convergence of algorithms, with particular emphasis on addressing the challenges of high variance and distribution shift. Noteworthy papers include: A Variance-Reduced Cubic-Regularized Newton for Policy Optimization, which introduces an algorithm that achieves state-of-the-art sample complexity; Relative Entropy Pathwise Policy Optimization, which proposes an efficient on-policy algorithm combining the benefits of pathwise policy gradients with standard on-policy learning; and Kevin: Multi-Turn RL for Generating CUDA Kernels, which demonstrates the effectiveness of multi-turn reinforcement learning for generating and optimizing CUDA kernels.
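To give a sense of how pathwise policy gradients and a relative-entropy constraint might fit together, the snippet below is a minimal sketch of a reparameterized policy update with a KL penalty against a frozen copy of the previous policy, assuming a diagonal Gaussian policy and a learned Q critic. The network shapes, the `dist` helper, and the `beta` coefficient are illustrative assumptions, not details taken from the Relative Entropy Pathwise Policy Optimization paper.

```python
import copy
import torch
import torch.nn as nn

# Illustrative dimensions; not taken from the paper.
obs_dim, act_dim, batch = 8, 2, 64

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 2 * act_dim))
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
old_policy = copy.deepcopy(policy).requires_grad_(False)  # frozen reference for the relative-entropy penalty

def dist(net, obs):
    # Diagonal Gaussian policy head (an assumption for this sketch).
    mean, log_std = net(obs).chunk(2, dim=-1)
    return torch.distributions.Normal(mean, log_std.clamp(-5.0, 2.0).exp())

obs = torch.randn(batch, obs_dim)           # stand-in for an on-policy batch of observations
pi = dist(policy, obs)
act = pi.rsample()                          # reparameterized sample: gradients flow through the action
q = critic(torch.cat([obs, act], dim=-1))   # pathwise gradient of the critic w.r.t. policy parameters
kl = torch.distributions.kl_divergence(pi, dist(old_policy, obs)).sum(-1)

beta = 0.1                                  # illustrative penalty weight
loss = (-q.squeeze(-1) + beta * kl).mean()  # maximize Q while staying close to the previous policy
loss.backward()
```

The key contrast with a standard on-policy (score-function) gradient is that the gradient here flows through the sampled action into the critic, while the KL term plays the trust-region role that clipping or penalties play in standard on-policy methods.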