The field of large language model-based decision making and exploration is advancing rapidly, with a focus on improving the efficiency and robustness of these models in complex environments. Recent work centers on integrating large language models with reinforcement learning, self-play, and goal-oriented planning to strengthen their ability to reason, search, and adapt in dynamic settings. Notable advances include structured goal planners, self-correction mechanisms, and erasable reinforcement learning, which address limitations of earlier methods. These innovations have yielded gains in performance, sample efficiency, and robustness, paving the way for more autonomous decision making in real-world applications.

Noteworthy papers include AceSearcher, which proposes a cooperative self-play framework that trains a single large language model to alternate between decomposing complex queries and integrating retrieved contexts for answer generation; ReSeek, which introduces a self-correcting framework for training search agents to dynamically identify and recover from erroneous search paths during an episode; and Erase to Improve, which proposes Erasable Reinforcement Learning, a framework that turns fragile reasoning into a robust process by explicitly identifying faulty steps, erasing them, and regenerating the reasoning in place.
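To make the erase-and-regenerate idea concrete, the following is a minimal Python sketch of such a loop, assuming hypothetical `generate_step` and `is_step_faulty` callables standing in for the policy model and the step verifier; it illustrates the general pattern rather than any paper's actual implementation.

```python
# Illustrative sketch of an "erase and regenerate" reasoning loop.
# `generate_step` and `is_step_faulty` are hypothetical placeholders for an
# LLM policy and a step-level verifier; they are not a real library API.

from typing import Callable, List


def erase_and_regenerate(
    question: str,
    generate_step: Callable[[str, List[str]], str],        # next step given question + prefix
    is_step_faulty: Callable[[str, List[str], str], bool],  # verifier for a candidate step
    max_steps: int = 16,
    max_retries: int = 3,
) -> List[str]:
    """Build a reasoning trajectory, erasing and regenerating faulty steps in place."""
    trajectory: List[str] = []
    while len(trajectory) < max_steps:
        step = generate_step(question, trajectory)
        retries = 0
        # If the verifier flags the step, discard (erase) it and regenerate
        # from the same prefix instead of appending it to the trajectory.
        while is_step_faulty(question, trajectory, step) and retries < max_retries:
            step = generate_step(question, trajectory)
            retries += 1
        trajectory.append(step)
        if step.strip().lower().startswith("final answer"):
            break
    return trajectory
```

The key design choice in this pattern is that a flagged step is erased and regenerated from the same reasoning prefix, rather than kept and patched downstream, which keeps later steps from building on a faulty intermediate result.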