Advances in Safe Reinforcement Learning and Multi-Agent Systems

The field of reinforcement learning is moving toward safer and more efficient methods, with a focus on algorithms that can adapt to complex and dynamic environments. Recent research has made significant progress in safe reinforcement learning, developing algorithms and frameworks that maintain safety throughout the learning process. Multi-agent systems are also growing in importance, with work on methods for effective coordination and communication between agents; one notable example is a bi-level reinforcement learning approach for designing recommender mechanisms in Bayesian stochastic games. Noteworthy papers include:

  • A paper that proposes an Optimistic Mirror Descent Primal-Dual algorithm for online safe reinforcement learning with anytime adversarial constraints, achieving optimal regret and strong constraint-violation bounds (a generic primal-dual sketch follows this list).
  • A paper that introduces a framework for apprenticeship learning with prior beliefs using inverse optimization, demonstrating the importance of regularization when learning cost vectors and apprentice policies (see the inverse-optimization sketch further below).
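
To make the primal-dual idea concrete, the following is a minimal, illustrative sketch of a Lagrangian primal-dual update for a tabular constrained MDP. It is not the paper's Optimistic Mirror Descent Primal-Dual algorithm; the Q-value estimates, step sizes, single constraint, and uniform state weighting are assumptions made purely for demonstration.

    import numpy as np

    # Illustrative Lagrangian primal-dual update for a tabular constrained MDP:
    # a mirror-descent (exponentiated-gradient) step on the policy and a
    # projected gradient-ascent step on the dual variable enforcing the cost
    # constraint. Shapes, step sizes, and the single constraint are assumptions.

    n_states, n_actions = 5, 3
    rng = np.random.default_rng(0)

    # Assumed known Q-value estimates for reward and constraint cost
    # (in practice these would be learned from interaction).
    q_reward = rng.uniform(size=(n_states, n_actions))
    q_cost = rng.uniform(size=(n_states, n_actions))
    cost_budget = 0.4            # constraint: expected cost per step <= budget
    eta_pi, eta_lam = 0.5, 0.1   # primal and dual step sizes (assumed)

    policy = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform start
    lam = 0.0                                                  # dual variable

    for t in range(200):
        # Lagrangian advantage: trade reward against the dualized constraint cost.
        lagrangian_q = q_reward - lam * q_cost

        # Mirror descent on the simplex = multiplicative-weights policy update.
        policy *= np.exp(eta_pi * lagrangian_q)
        policy /= policy.sum(axis=1, keepdims=True)

        # Expected per-state cost under the current policy (state distribution
        # assumed uniform here purely for illustration).
        expected_cost = np.mean(np.sum(policy * q_cost, axis=1))

        # Projected gradient ascent on the dual variable: grow lambda while the
        # constraint is violated, shrink it (down to 0) otherwise.
        lam = max(0.0, lam + eta_lam * (expected_cost - cost_budget))

    print(f"final dual variable: {lam:.3f}, expected cost: {expected_cost:.3f}")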

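Similarly, the inverse-optimization view of apprenticeship learning can be sketched as follows: recover a cost vector under which the expert's observed decisions look near-optimal, regularized toward a prior belief. This is only an illustration of the general technique under assumed features, prior, and a plain subgradient solver, not the paper's actual formulation.

    import numpy as np

    # Sketch of inverse optimization for apprenticeship learning with prior
    # beliefs: estimate a cost vector that makes the expert's chosen actions
    # (near-)optimal, regularized toward a prior cost estimate. Features,
    # prior, and the subgradient solver below are assumptions for illustration.

    rng = np.random.default_rng(1)
    n_samples, n_actions, n_features = 50, 4, 3

    # Per-sample action features and the expert's cost-minimizing actions,
    # generated here from a "true" cost vector purely for demonstration.
    features = rng.uniform(size=(n_samples, n_actions, n_features))
    true_cost = np.array([1.0, 0.2, 0.5])
    expert_actions = np.argmin(features @ true_cost, axis=1)

    prior_cost = np.array([0.8, 0.4, 0.4])  # prior belief about the cost vector
    reg = 0.1                                # regularization strength (assumed)
    w = prior_cost.copy()

    for t in range(500):
        action_costs = features @ w                       # (n_samples, n_actions)
        best = np.argmin(action_costs, axis=1)
        idx = np.arange(n_samples)
        # Suboptimality gap of the expert's action under the current estimate.
        gaps = action_costs[idx, expert_actions] - action_costs[idx, best]
        # Subgradient of the gap term plus gradient of the L2 prior regularizer.
        grad = (features[idx, expert_actions] - features[idx, best]).mean(axis=0)
        grad += reg * (w - prior_cost)
        w -= 0.1 / np.sqrt(t + 1) * grad

    print("estimated cost vector:", np.round(w, 3))
    print("mean suboptimality gap:", gaps.mean().round(4))
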
Sources

Comparator-Adaptive $\Phi$-Regret: Improved Bounds, Simpler Algorithms, and Applications to Games

A Survey of Safe Reinforcement Learning and Constrained MDPs: A Technical Survey on Single-Agent and Multi-Agent Safety

Provably Efficient Algorithm for Best Scoring Rule Identification in Online Principal-Agent Information Acquisition

Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning

Asymptotically optimal regret in communicating Markov decision processes

Apprenticeship learning with prior beliefs using inverse optimization

Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing

An Optimistic Algorithm for Online CMDPs with Anytime Adversarial Constraints

A Provable Approach for End-to-End Safe Reinforcement Learning

Pure Exploration with Infinite Answers

Non-Asymptotic Analysis of (Sticky) Track-and-Stop

HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym

Learning Recommender Mechanisms for Bayesian Stochastic Games
