Advances in Reinforcement Learning and Multi-Agent Systems

The field of reinforcement learning and multi-agent systems is moving toward methods that learn more efficiently and reliably in complex environments. Recent work has focused on challenges such as sparse rewards, partial observability, and limited feedback. Notably, researchers have proposed frameworks built on online inverse preference learning, multi-agent on-policy optimization, and kernelized temporal-difference (TD) critics to improve learning in these settings. There is also growing interest in exploring the parameter space of policies to improve performance and stability. Privileged signals, such as those from large language models, are also being investigated as a way to make learning more efficient. Overall, these advances could substantially improve the performance and applicability of reinforcement learning and multi-agent systems in real-world domains.

Noteworthy papers include Preference-Guided Learning for Sparse-Reward Multi-Agent Reinforcement Learning, which proposes a framework for learning in sparse-reward environments; Polychromic Objectives for Reinforcement Learning, which introduces an objective for policy gradient methods that enforces exploration and refinement of diverse generations; and Informed Asymmetric Actor-Critic, which allows the critic to be conditioned on arbitrary privileged signals without requiring access to the full state.
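To make the idea of a kernelized TD critic concrete, here is a minimal Python sketch of generic kernel TD(0). It is not the method or analysis of the RKHS paper listed under Sources. It assumes a Gaussian (RBF) kernel, represents the value function as V(s) = sum_i c_i k(s_i, s), and applies the functional update V <- V + step_size * delta * k(s, .) after each transition, where delta = r + gamma * V(s') - V(s). The names rbf, KernelTDCritic, and run_chain, the five-state deterministic chain, and the choice to merge coefficients for repeated centers (which keeps the dictionary small on discrete states) are all hypothetical choices for illustration.

```python
import math
from collections import defaultdict


def rbf(x: float, y: float, bandwidth: float) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


class KernelTDCritic:
    """Hypothetical critic V(s) = sum_i c_i * k(s_i, s), trained with TD(0)."""

    def __init__(self, gamma: float, step_size: float, bandwidth: float):
        self.gamma = gamma
        self.step_size = step_size
        self.bandwidth = bandwidth
        # Coefficients keyed by kernel center; repeated centers are merged.
        self.coef = defaultdict(float)

    def value(self, s: float) -> float:
        # Evaluate the kernel expansion at state s (0.0 before any update).
        return sum(c * rbf(center, s, self.bandwidth)
                   for center, c in self.coef.items())

    def update(self, s: float, r: float, s_next: float, done: bool) -> float:
        # TD error: delta = r + gamma * V(s') - V(s), with V(terminal) = 0.
        target = r if done else r + self.gamma * self.value(s_next)
        delta = target - self.value(s)
        # Functional step in the RKHS: V <- V + step_size * delta * k(s, .)
        self.coef[s] += self.step_size * delta
        return delta


def run_chain(critic: KernelTDCritic, n_states: int = 5,
              episodes: int = 1000) -> None:
    """Hypothetical deterministic chain 0 -> 1 -> ... -> n_states-1 -> terminal.

    The only reward (1.0) is paid on the final transition into the terminal.
    """
    for _ in range(episodes):
        for s in range(n_states):
            done = s == n_states - 1
            r = 1.0 if done else 0.0
            critic.update(float(s), r, float(s + 1), done)


if __name__ == "__main__":
    critic = KernelTDCritic(gamma=0.9, step_size=0.1, bandwidth=0.4)
    run_chain(critic)
    for s in range(5):
        print(s, round(critic.value(float(s)), 3))
```

On this chain the true values are 0.9 ** (4 - s), so the printed estimates should approach 1.0 for the last state and shrink by a factor of 0.9 per step back toward state 0. Because the Gram matrix of an RBF kernel on distinct states is positive definite, the only point where every TD error vanishes is the true value function; a wider bandwidth spreads each update to neighboring states, which is the smoothing a kernel critic trades for generalization.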

Sources

Preference-Guided Learning for Sparse-Reward Multi-Agent Reinforcement Learning

Learning Admissible Heuristics for A*: Theory and Practice

Sampling Complexity of TD and PPO in RKHS

Polychromic Objectives for Reinforcement Learning

Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space

Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access

Which Rewards Matter? Reward Selection for Reinforcement Learning under Limited Feedback

MM-LMPC: Multi-Modal Learning Model Predictive Control via Bandit-Based Mode Selection
