Advances in Offline Reinforcement Learning and Causal Policy Learning
The field of reinforcement learning is moving toward the challenges of offline learning, causal policy learning, and multi-objective optimization. Recent work focuses on improving the sample efficiency and robustness of offline reinforcement learning algorithms, with particular emphasis on out-of-distribution actions and generalization. There is also growing interest in incorporating causal reasoning into policy learning, both to correct for hidden confounders and to improve the reliability of learned policies. Noteworthy papers include frameworks for offline model-based planning, such as Reflect-then-Plan, and methods for reducing overestimation bias in deep value-based reinforcement learning, such as Ensemble Elastic DQN.
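To make the overestimation-bias point concrete, here is a minimal Python sketch (plain NumPy, not the Ensemble Elastic DQN algorithm itself) of the general ensemble idea: a target built from a max over a single noisy Q-estimate is biased upward, while aggregating several independent estimates with a min (as in clipped double Q-learning) pulls the target back toward the truth. The ensemble size, noise level, and min-aggregation rule are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: why ensembling Q-estimates reduces overestimation.
# Assumes true Q-values of all actions are 0; any positive target is pure bias.
import numpy as np

rng = np.random.default_rng(0)

true_q = np.zeros(4)      # true Q-values for 4 actions
n_members = 5             # hypothetical ensemble size (illustrative)
noise_std = 1.0           # estimation noise per ensemble member

# Each ensemble member's estimate = truth + independent noise.
ensemble_q = true_q + rng.normal(0.0, noise_std, size=(n_members, 4))

# Single-estimator target: max over one noisy estimate -> biased upward (E[max] > 0).
single_target = ensemble_q[0].max()

# Ensemble target: element-wise min across members before the max -> more conservative.
ensemble_target = ensemble_q.min(axis=0).max()

print(f"single-network target: {single_target:.3f}")   # typically > 0
print(f"ensemble (min) target: {ensemble_target:.3f}")  # closer to, or below, 0
```

Running the sketch shows the single-estimator target sitting above the true value of 0 while the min-over-ensemble target does not; this is the basic mechanism ensemble methods exploit, though the cited paper's specific multi-step "elastic" construction may differ.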
Sources
Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning
BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning
Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning