Advances in Offline Reinforcement Learning and Causal Policy Learning

Recent work in reinforcement learning centers on offline learning, causal policy learning, and multi-objective optimization. One thread improves the sample efficiency and robustness of offline reinforcement learning algorithms, with particular emphasis on handling out-of-distribution (OOD) actions and generalizing beyond the logged data. A second thread incorporates causal reasoning into policy learning, correcting for hidden confounding variables to make learned policies more reliable. Noteworthy papers include Reflect-then-Plan, which frames offline model-based planning through a doubly Bayesian lens, and Ensemble Elastic DQN, a multi-step ensemble method for addressing overestimation bias in deep value-based reinforcement learning.
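
The overestimation bias that Ensemble Elastic DQN targets comes from the max operator in the Q-learning bootstrap, which turns estimation noise into systematic upward bias. A standard countermeasure, used by clipped double Q-learning and many ensemble methods, is to bootstrap from the elementwise minimum over several independently trained Q-heads. The sketch below illustrates that generic device only; it is not the algorithm of the cited paper, and the names `QNetwork` and `pessimistic_td_target` are hypothetical.

```python
# Minimal sketch of ensemble-min pessimistic bootstrapping (a generic
# illustration, not Ensemble Elastic DQN itself).
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """A small MLP Q-head mapping observations to per-action values."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def pessimistic_td_target(
    ensemble: list[QNetwork],
    reward: torch.Tensor,    # shape (batch,)
    next_obs: torch.Tensor,  # shape (batch, obs_dim)
    done: torch.Tensor,      # shape (batch,), 1.0 if terminal
    gamma: float = 0.99,
) -> torch.Tensor:
    """TD target bootstrapped from the minimum over ensemble Q-heads.

    Taking the min over independently trained heads biases the bootstrap
    value downward, counteracting the max-operator overestimation of
    standard Q-learning.
    """
    with torch.no_grad():
        # Per-head next-state Q-values: (n_heads, batch, n_actions).
        q_next = torch.stack([q(next_obs) for q in ensemble])
        # Pessimistic value: min over heads, then greedy max over actions.
        v_next = q_next.min(dim=0).values.max(dim=1).values
        return reward + gamma * (1.0 - done) * v_next


# Usage sketch:
# ensemble = [QNetwork(obs_dim=4, n_actions=2) for _ in range(5)]
# target = pessimistic_td_target(ensemble, reward, next_obs, done)
```

On the causal side, the classic device behind backdoor-adjusted methods is the backdoor adjustment, P(y | do(a)) = sum_z P(y | a, z) P(z), which removes confounding by a variable z when estimating the effect of action a; how any individual paper above instantiates this is beyond what the titles alone confirm.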

Sources

Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic

Learning Design-Score Manifold to Guide Diffusion Models for Offline Optimization

Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning

BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning

Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning

Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library

Reusing Trajectories in Policy Gradients Enables Fast Convergence

Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens

Inverse Design in Distributed Circuits Using Single-Step Reinforcement Learning

FairDICE: Fairness-Driven Offline Multi-Objective Reinforcement Learning

Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood

MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning

How to Provably Improve Return Conditioned Supervised Learning?

Semi-gradient DICE for Offline Constrained Reinforcement Learning

Policy-Based Trajectory Clustering in Offline Reinforcement Learning

MOORL: A Framework for Integrating Offline-Online Reinforcement Learning

Provable Sim-to-Real Transfer via Offline Domain Randomization

Wasserstein Barycenter Soft Actor-Critic
