Reinforcement Learning Efficiency Advances

The field of reinforcement learning is seeing significant advances in efficiency, with a focus on reducing sample complexity and improving the replicability of algorithms. Recent work introduces novel frameworks, such as multi-armed sampling, that challenge traditional notions of the exploration-exploitation trade-off. In addition, approaches that recycle data to bridge on-policy and off-policy learning via causal bounds have demonstrated substantial gains in sample efficiency, and researchers have made notable progress in episodic settings, achieving near-optimal replicable reinforcement learning algorithms. Noteworthy papers include:

  • Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound, which establishes a causal bound on the factual loss, yielding up to a 2,427% higher reward ratio and a 96% reduction in experience replay buffer size (see the sketch after this list).
  • From Generative to Episodic: Sample-Efficient Replicable Reinforcement Learning, which presents a replicable RL algorithm that bridges the gap between generative and episodic settings, achieving near-optimality with respect to the state space.
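
To make the data-recycling idea concrete, the sketch below shows a generic way that off-policy replay data can be folded into an on-policy policy-gradient update. It is not the method from the Turning Sand to Gold paper: the causal bound on the factual loss is not reproduced here; a simple importance-weight cap (the hypothetical MAX_WEIGHT constant) stands in for whatever criterion decides which stored transitions are safe to reuse, and the toy bandit environment is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy 3-armed bandit; arm 2 has the highest mean reward.
TRUE_MEANS = np.array([0.2, 0.5, 0.8])
N_ARMS = 3

theta = np.zeros(N_ARMS)   # policy logits
replay = []                # recycled experience: (arm, reward, behaviour_prob)
LR = 0.1
MAX_WEIGHT = 5.0           # hypothetical cap standing in for a principled reuse bound

for step in range(2000):
    probs = softmax(theta)
    arm = rng.choice(N_ARMS, p=probs)
    reward = rng.normal(TRUE_MEANS[arm], 0.1)
    replay.append((arm, reward, probs[arm]))

    # Fresh on-policy REINFORCE step: grad log pi(a) = e_a - probs.
    grad = -probs
    grad[arm] += 1.0
    theta += LR * reward * grad

    # Recycle a few stored (possibly off-policy) transitions with
    # importance weights; overly off-policy samples are skipped.
    idx = rng.choice(len(replay), size=min(4, len(replay)), replace=False)
    for i in idx:
        arm_o, r_o, b_prob = replay[i]
        probs = softmax(theta)
        w = probs[arm_o] / b_prob      # target prob / behaviour prob
        if w > MAX_WEIGHT:             # reuse only when the weight is moderate
            continue
        grad = -probs
        grad[arm_o] += 1.0
        theta += LR * w * r_o * grad

print("learned policy:", softmax(theta))  # mass should concentrate on the best arm
```

In this sketch the importance weight corrects for the mismatch between the behaviour policy that generated a stored transition and the current policy; the paper's causal bound plays an analogous gatekeeping role but is derived very differently.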

Sources

Multi-Armed Sampling Problem and the End of Exploration

Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound

Reinforcement Learning from Adversarial Preferences in Tabular MDPs

From Generative to Episodic: Sample-Efficient Replicable Reinforcement Learning

Improving Reinforcement Learning Sample-Efficiency using Local Approximation
