The field of large language model (LLM) agents is evolving rapidly, with a focus on more robust and data-efficient training methods. Recent research has shifted away from traditional supervised fine-tuning on static trajectories toward dynamic, environment-based exploration. This shift lets agents learn complex behaviors directly from problem instances, improving performance and out-of-distribution generalization. Notable papers in this area include ARM-FM, which introduces a framework for automated, compositional reward design in reinforcement learning that leverages the high-level reasoning capabilities of foundation models, and Information Gain-based Policy Optimization, a simple yet effective RL framework that provides dense, intrinsic supervision for multi-turn agent training and consistently outperforms strong baselines.
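To make the idea of "dense, intrinsic supervision" concrete, here is a minimal sketch of an information-gain-style per-turn reward: the agent maintains a belief over candidate answers, and each turn is rewarded by how much it reduces the entropy of that belief. The belief representation, the multiplicative update, and the candidate scores below are illustrative assumptions for this sketch, not the formulation used in the paper.

```python
# Toy information-gain intrinsic reward for a multi-turn agent.
# All names and the belief-update rule are illustrative assumptions.
import math
from typing import Dict

def entropy(belief: Dict[str, float]) -> float:
    """Shannon entropy (nats) of a normalized belief over candidate answers."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0.0)

def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale nonnegative scores into a probability distribution."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}

def info_gain_reward(prev_belief: Dict[str, float], new_belief: Dict[str, float]) -> float:
    """Dense per-turn reward: how much the latest turn reduced uncertainty."""
    return entropy(prev_belief) - entropy(new_belief)

# Example rollout: the belief sharpens as the agent gathers evidence across
# turns, and every turn yields an intrinsic reward (no sparse final signal needed).
belief = normalize({"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0})
turn_rewards = []
for evidence in [
    {"A": 2.0, "B": 1.0, "C": 1.0, "D": 0.5},  # hypothetical per-turn evidence scores
    {"A": 4.0, "B": 0.5, "C": 0.5, "D": 0.1},
]:
    new_belief = normalize({k: belief[k] * evidence[k] for k in belief})
    turn_rewards.append(info_gain_reward(belief, new_belief))
    belief = new_belief

print([round(r, 3) for r in turn_rewards])  # one dense reward per turn
```

In an actual training loop, per-turn rewards of this kind would be added to (or shaped alongside) the task's outcome reward before policy optimization; the details of that combination are specific to each paper and are not reproduced here.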