The field of reinforcement learning is seeing notable progress on efficiency, with a focus on reducing sample complexity and improving the replicability of algorithms. Recent work has introduced novel frameworks, such as multi-armed sampling, that challenge traditional notions of the exploration-exploitation trade-off. Innovative approaches, such as recycling data to bridge on-policy and off-policy learning and leveraging causal bounds, have demonstrated substantial improvements in sample efficiency. Researchers have also made notable progress in the episodic setting, achieving near-optimal replicable reinforcement learning algorithms. Noteworthy papers include:
- Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound, which establishes a causal bound on the factual loss, yielding up to a 2,427% higher reward ratio and a 96% reduction in experience replay buffer size (a generic sketch of the data-recycling idea appears after this list).
- From Generative to Episodic: Sample-Efficient Replicable Reinforcement Learning, which presents a replicable RL algorithm that bridges the gap between the generative and episodic settings, achieving near-optimal dependence on the state space.
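As a rough illustration only: the causal-bound construction from the first paper is not reproduced here, but the general idea of recycling previously collected (off-policy) experience for a current policy can be sketched with a replay buffer whose stored transitions are reweighted by clipped importance ratios. The class name `RecyclingBuffer`, the `clip` parameter, and the `current_policy_prob` callable are hypothetical names introduced for this sketch and are not taken from the paper.

```python
import random
from collections import deque


class RecyclingBuffer:
    """Toy buffer that recycles off-policy transitions for on-policy-style
    updates by attaching clipped importance weights (a generic illustration,
    not the paper's causal-bound method)."""

    def __init__(self, capacity=10_000, clip=2.0):
        self.buffer = deque(maxlen=capacity)
        self.clip = clip  # hypothetical cap on how much a recycled sample may be upweighted

    def add(self, state, action, reward, behavior_prob):
        # behavior_prob: probability the data-collecting policy assigned to `action` in `state`
        self.buffer.append((state, action, reward, behavior_prob))

    def sample(self, current_policy_prob, batch_size=32):
        """Return recycled transitions with weights
        w = min(pi_current(a|s) / pi_behavior(a|s), clip)."""
        batch = random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        weighted = []
        for state, action, reward, behavior_prob in batch:
            ratio = current_policy_prob(state, action) / max(behavior_prob, 1e-8)
            weighted.append((state, action, reward, min(ratio, self.clip)))
        return weighted


# Minimal usage: data collected under a uniform two-action behavior policy,
# recycled for a policy that now prefers action 1 twice as much as action 0.
buf = RecyclingBuffer()
for step in range(100):
    buf.add(state=step % 5, action=step % 2, reward=1.0, behavior_prob=0.5)
batch = buf.sample(lambda s, a: 2 / 3 if a == 1 else 1 / 3)
```

In this toy version the clip simply limits how much a stale sample can influence an update; the paper itself instead derives a causal bound on the factual loss to decide how recycled data may be reused.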