Advances in Offline Reinforcement Learning

The field of reinforcement learning is moving toward tighter integration of offline and online learning. Recent work has focused on the core challenges of distributional shift, inaccurate value estimation, and the need for explicit reward annotations, proposing new learning phases, regularization techniques, and reward annotation frameworks that improve the performance and efficiency of offline reinforcement learning algorithms, with several methods reporting significant gains on standard benchmarks such as the D4RL suite. Noteworthy papers include: Online Pre-Training for Offline-to-Online Reinforcement Learning, which introduces a new learning phase to address inaccurate value estimation; Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data, which proposes an algorithm that gradually decreases Q-values for actions outside the data range; Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning, which trains each layer of the network using local signals during the forward pass, eliminating the need for backward passes; and From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning, which generates intrinsic rewards from expert demonstrations using a simple yet effective embedding discrepancy measure (a sketch of this idea appears below).
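
To make the embedding-discrepancy idea concrete, the sketch below relabels a reward-free offline dataset with intrinsic rewards based on the embedding distance to the nearest expert demonstration. It is a minimal illustration under stated assumptions, not the method from the cited paper: the encoder (a fixed random projection here), the exponential reward shaping, and all names are hypothetical placeholders.

```python
# Illustrative sketch only: the encoder, discrepancy measure, and reward shaping
# below are assumptions, not the implementation from the cited paper.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, EMBED_DIM = 8, 16

# Stand-in encoder: a fixed random projection. In practice this would be a
# learned (e.g. self-distilled) feature network.
W = rng.normal(scale=1.0 / np.sqrt(STATE_DIM), size=(STATE_DIM, EMBED_DIM))

def embed(states: np.ndarray) -> np.ndarray:
    """Map raw states (N, STATE_DIM) to embeddings (N, EMBED_DIM)."""
    return np.tanh(states @ W)

def intrinsic_rewards(states: np.ndarray, expert_states: np.ndarray) -> np.ndarray:
    """Reward each state by its embedding proximity to the closest expert state.

    The discrepancy is the minimum Euclidean distance in embedding space;
    rewards are mapped to (0, 1] so states resembling expert behaviour score
    close to 1.
    """
    z = embed(states)                     # (N, EMBED_DIM)
    z_exp = embed(expert_states)          # (M, EMBED_DIM)
    # Pairwise distances between dataset and expert embeddings: shape (N, M).
    dists = np.linalg.norm(z[:, None, :] - z_exp[None, :, :], axis=-1)
    discrepancy = dists.min(axis=1)       # distance to nearest expert embedding
    return np.exp(-discrepancy)           # smaller discrepancy -> larger reward

# Toy usage: relabel an offline dataset that has no reward annotations.
expert_states = rng.normal(size=(50, STATE_DIM))
dataset_states = rng.normal(size=(200, STATE_DIM))
rewards = intrinsic_rewards(dataset_states, expert_states)
print(rewards.shape, float(rewards.min()), float(rewards.max()))
```

The relabeled rewards could then be fed to any off-the-shelf offline RL algorithm in place of environment rewards, which is the general motivation behind reward annotation frameworks of this kind.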

Sources

Online Pre-Training for Offline-to-Online Reinforcement Learning

Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data

Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps

Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning

Online Training and Pruning of Deep Reinforcement Learning Networks

From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning

GradNetOT: Learning Optimal Transport Maps with GradNets
