Advances in Reinforcement Learning for Robust and Adaptive Control

Reinforcement learning (RL) research is moving toward more robust and adaptive control of complex systems such as legged and humanoid robots. Recent work targets two challenges: learning from sparse and delayed rewards, and generalizing RL policies to new environments and tasks. Approaches such as attention-based reward shaping and bidirectional distillation have shown promising results in improving the learning efficiency and robustness of RL agents. In addition, combining multi-expert distillation with RL fine-tuning has produced general and extensible agile locomotion policies for legged robots.

Noteworthy papers include Attention-Based Reward Shaping for Sparse and Delayed Rewards, which proposes a general and robust algorithm for generating shaped rewards; GROQLoco, which presents a scalable, attention-based framework for learning a single generalist locomotion policy across multiple quadruped robots and terrains; and Bidirectional Distillation, which introduces a mixed-play framework for multi-agent generalizable behaviors. Together, these advances could improve the performance and adaptability of RL agents across a wide range of applications.
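
To make the sparse and delayed reward problem concrete, here is a minimal sketch of redistributing a single end-of-episode return across time steps with softmax weights. It is a generic illustration, not the algorithm from any of the papers listed here: the function name `redistribute_return` and the per-step relevance scores are hypothetical, and in practice such scores would come from a learned model.

```python
import math


def redistribute_return(step_scores, episode_return):
    """Spread one delayed episode return over steps using softmax weights.

    step_scores: hypothetical per-step relevance scores (assumed given).
    episode_return: the single reward observed at the end of the episode.
    """
    if not step_scores:
        return []
    peak = max(step_scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in step_scores]
    total = sum(exps)
    # Each step receives a share of the return proportional to its weight,
    # so the shaped rewards sum to the original episode return.
    return [episode_return * e / total for e in exps]


# Hypothetical 4-step episode whose only reward (10.0) arrives at the end.
shaped = redistribute_return([0.1, 2.0, 0.3, 1.5], 10.0)
```

Because the weights sum to one, the total reward per episode is unchanged; only its timing differs, which gives the agent a denser learning signal.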

Sources

Attention-Based Reward Shaping for Sparse and Delayed Rewards

GROQLoco: Generalist and RObot-agnostic Quadruped Locomotion Control using Offline Datasets

Bidirectional Distillation: A Mixed-Play Framework for Multi-Agent Generalizable Behaviors

Parkour in the Wild: Learning a General and Extensible Agile Locomotion Policy Using Multi-expert Distillation and RL Fine-tuning

TD-GRPC: Temporal Difference Learning with Group Relative Policy Constraint for Humanoid Locomotion

Toward Real-World Cooperative and Competitive Soccer with Quadrupedal Robot Teams

Sampling-Based System Identification with Active Exploration for Legged Robot Sim2Real Learning

Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

Reference Free Platform Adaptive Locomotion for Quadrupedal Robots using a Dynamics Conditioned Policy

Motion Priors Reimagined: Adapting Flat-Terrain Skills for Complex Quadruped Mobility

Reward-Aware Proto-Representations in Reinforcement Learning

How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning
