Advances in Model-Based Reinforcement Learning and Decision-Making

Reinforcement learning research is moving toward agents that are more sample-efficient, safe, and interpretable. Recent work combines model-free and model-based approaches to improve both sample efficiency and safety. Model-based methods, such as model predictive control, can use prior system knowledge to inform and constrain agent decisions, while model-free methods can help correct for mismatch between the model and the real system. There is also growing interest in using expert demonstrations and offline data to guide learning and improve decision-making.

Noteworthy papers in this area include "On the Fundamental Limitations of Dual Static CVaR Decompositions in Markov Decision Processes", which highlights the limitations of dual CVaR decompositions in MDPs; "Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning", which proposes a framework that uses expert demonstrations to guide exploration in RL; and "Kernel Based Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games", which introduces a kernel-based approach to inverse reinforcement learning for mean-field games.
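For readers unfamiliar with the risk measure named in the first paper, the sketch below shows one common empirical estimator of static CVaR: the mean of the worst alpha-fraction of sampled returns. It is only an illustration under stated assumptions. The function name `empirical_cvar`, the lower-tail convention on returns, and the sample data are all hypothetical and are not taken from the paper, whose definitions and dual decomposition may differ.

```python
import math


def empirical_cvar(returns, alpha):
    """Estimate CVaR at level alpha as the mean of the worst returns.

    Assumption: higher returns are better, so the "worst" outcomes are
    the smallest values. The tail holds ceil(alpha * n) samples.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)  # ascending: worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))
    tail = ordered[:k]
    return sum(tail) / k


# Hypothetical sample of episode returns, for illustration only.
returns = [-10.0, -2.0, 0.0, 1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(empirical_cvar(returns, 0.2))  # mean of the two worst returns
print(empirical_cvar(returns, 1.0))  # alpha = 1 recovers the plain mean
```

With alpha set to 1 the estimate equals the ordinary mean, and smaller values of alpha weight the decision toward the worst outcomes, which is why the measure appears in safety-oriented work.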

Sources

Model-free Reinforcement Learning for Model-based Control: Towards Safe, Interpretable and Sample-efficient Agents

On the Fundamental Limitations of Dual Static CVaR Decompositions in Markov Decision Processes

Kernel Based Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games

Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning

Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown

RAD: Retrieval High-quality Demonstrations to Enhance Decision-making

Data-Efficient Safe Policy Improvement Using Parametric Structure

EBaReT: Expert-guided Bag Reward Transformer for Auto Bidding

Pragmatic Policy Development via Interpretable Behavior Cloning

ZORMS-LfD: Learning from Demonstrations with Zeroth-Order Random Matrix Search

Generalized Low-Rank Matrix Contextual Bandits with Graph Information
