Reinforcement Learning Efficiency and Convergence

Reinforcement learning research is seeing significant advances in sample efficiency and convergence guarantees. Recent work points towards algorithms that learn effective policies from fewer environment interactions, with notable contributions in action interpolation, hyperparameter optimization, and dynamic programming. These advances could substantially improve the sample efficiency of reinforcement learning methods and make them more viable for real-world applications. Noteworthy papers include Dynamic Action Interpolation, a universal framework for accelerating reinforcement learning with expert guidance; HyperController, a computationally efficient algorithm for tuning hyperparameters during the training of reinforcement learning neural networks; and Return Capping, which reformulates the CVaR optimisation problem to improve sample efficiency. Overall, the field is moving towards more efficient and stable training methods, with a focus on solutions that accelerate convergence and improve performance.
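
To make the action-interpolation idea concrete, the sketch below shows one plausible instantiation: the executed action is a convex combination of an expert's action and the learned policy's action, with the expert's weight annealed towards zero over training. The linear schedule, function names, and parameters are illustrative assumptions for this digest, not the exact formulation from the Dynamic Action Interpolation paper.

```python
import numpy as np

def interpolation_weight(step, total_steps, start=1.0, end=0.0):
    """Linearly anneal the expert's influence from `start` to `end` (assumed schedule)."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)

def interpolated_action(expert_action, policy_action, step, total_steps):
    """Convex combination of expert and learned actions.

    Early in training the expert dominates, guiding exploration;
    later the learned policy takes over as the weight decays.
    """
    alpha = interpolation_weight(step, total_steps)
    return alpha * np.asarray(expert_action) + (1.0 - alpha) * np.asarray(policy_action)

# Example: a quarter of the way through training, the executed action
# is 75% expert and 25% policy.
print(interpolated_action([1.0, 0.0], [0.0, 1.0], step=250, total_steps=1000))
```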

Sources

Non-Asymptotic Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes

Dynamic Action Interpolation: A Universal Approach for Accelerating Reinforcement Learning with Expert Guidance

$O(1/k)$ Finite-Time Bound for Non-Linear Two-Time-Scale Stochastic Approximation

HyperController: A Hyperparameter Controller for Fast and Stable Training of Reinforcement Learning Neural Networks

DeeP-Mod: Deep Dynamic Programming based Environment Modelling using Feature Extraction

Return Capping: Sample-Efficient CVaR Policy Gradient Optimisation

Approximation to Deep Q-Network by Stochastic Delay Differential Equations

Wasserstein Policy Optimization
