Advances in Reinforcement Learning for Robotics

The field of reinforcement learning (RL) for robotics is moving toward more sample-efficient and robust control methods. Recent research has focused on using implicit human feedback, such as electroencephalography (EEG) signals, to improve policy learning under sparse rewards. There is also growing interest in incorporating morphological symmetries and equivariance into policy learning frameworks to improve sample efficiency and generalization, including partially equivariant Markov decision processes (MDPs) that mitigate error propagation from locally broken symmetries. A further line of work uses reset-based approaches to restore plasticity in RL agents without degrading performance.

Noteworthy papers in this area include:

Reinforcement Learning from Implicit Neural Feedback for Human-Aligned Robot Control, which proposes a framework in which EEG signals provide a continuous, implicit feedback signal for policy learning (a minimal sketch of this idea follows below).

MS-PPO: Morphological-Symmetry-Equivariant Policy for Legged Robot Locomotion, which introduces a policy learning framework that encodes the robot's kinematic structure and morphological symmetries directly into the policy network (see the equivariance sketch below).

AltNet: Addressing the Plasticity-Stability Dilemma in Reinforcement Learning, which presents a reset-based approach that leverages twin networks to restore plasticity without performance degradation (see the final sketch below).
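To make the EEG-feedback idea concrete, here is a minimal sketch of reward shaping from a decoded error-related signal. The function name, the `beta` weight, and the feedback mapping are illustrative assumptions, not the paper's actual method; the point is only that a decoded error probability can densify a sparse reward.

```python
def shaped_reward(env_reward: float, p_error: float, beta: float = 0.5) -> float:
    """Combine a sparse environment reward with implicit human feedback.

    p_error is a hypothetical decoder output in [0, 1]: the probability,
    estimated from the observer's EEG, that the robot's last action was
    perceived as an error (e.g. an error-related-potential classifier).
    """
    feedback = 1.0 - 2.0 * p_error  # map [0, 1] onto [+1, -1]
    return env_reward + beta * feedback


# The sparse task reward is 0 almost everywhere, so the EEG-derived term
# supplies a dense learning signal between task completions.
print(shaped_reward(env_reward=0.0, p_error=0.9))  # -0.4
```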
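MS-PPO builds morphological symmetry into the network weights themselves; the sketch below instead shows the simpler group-averaging construction for a single left-right reflection, which yields the same equivariance property. The 4-dimensional observation, 2-dimensional action, and the permutation/sign patterns are toy assumptions standing in for a real robot's morphology.

```python
import torch
import torch.nn as nn

# Hypothetical left-right reflection for a toy robot: swap mirrored
# joint dimensions and flip lateral components. Both maps are involutions.
OBS_PERM = torch.tensor([1, 0, 3, 2])
OBS_SIGN = torch.tensor([1.0, 1.0, -1.0, -1.0])
ACT_PERM = torch.tensor([1, 0])
ACT_SIGN = torch.tensor([1.0, 1.0])

def reflect_obs(obs):
    return obs[..., OBS_PERM] * OBS_SIGN

def reflect_act(act):
    return act[..., ACT_PERM] * ACT_SIGN

class SymmetrizedPolicy(nn.Module):
    """Equivariance to the reflection group {e, s} by group averaging:
    pi(o) = 0.5 * (f(o) + s^-1 f(s o)). This wraps an arbitrary backbone;
    MS-PPO instead ties the weights directly, which is more efficient."""
    def __init__(self, obs_dim=4, act_dim=2, hidden=64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim)
        )

    def forward(self, obs):
        return 0.5 * (self.f(obs) + reflect_act(self.f(reflect_obs(obs))))

policy = SymmetrizedPolicy()
o = torch.randn(4)
# Equivariance check: pi(s o) == s pi(o)
assert torch.allclose(policy(reflect_obs(o)), reflect_act(policy(o)), atol=1e-6)
```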
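Finally, the sketch below illustrates the general reset-with-twin idea under assumed simplifications (a fixed reset period and a distillation warm start); the actual AltNet schedule and losses may differ. The key point is that the agent never acts with a just-reset network, so plasticity is restored without a performance collapse.

```python
import torch
import torch.nn as nn

def make_net():
    return nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))

class TwinResetAgent:
    """Minimal sketch of a reset-with-twin scheme: the 'active' network
    acts and learns; periodically the 'fresh' twin is re-initialized to
    restore plasticity, warm-started by distilling from the active
    network, and then swapped in as the new actor."""

    def __init__(self, reset_period=10_000, distill_steps=500):
        self.active, self.fresh = make_net(), make_net()
        self.reset_period, self.distill_steps = reset_period, distill_steps
        self.step_count = 0

    def act(self, obs):
        with torch.no_grad():
            return self.active(obs).argmax(-1)

    def maybe_reset(self, replay_obs):
        self.step_count += 1
        if self.step_count % self.reset_period:
            return
        # Re-initialize the twin, then distill current behaviour into it
        # so the swap does not discard what has already been learned.
        self.fresh = make_net()
        opt = torch.optim.Adam(self.fresh.parameters(), lr=1e-3)
        for _ in range(self.distill_steps):
            loss = nn.functional.mse_loss(
                self.fresh(replay_obs), self.active(replay_obs).detach()
            )
            opt.zero_grad()
            loss.backward()
            opt.step()
        # The refreshed network takes over as the actor.
        self.active, self.fresh = self.fresh, self.active

agent = TwinResetAgent(reset_period=2, distill_steps=50)
batch = torch.randn(32, 8)
agent.maybe_reset(batch)
agent.maybe_reset(batch)  # second call triggers one reset-and-swap
print(agent.act(torch.randn(8)))
```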

Sources

Reinforcement Learning from Implicit Neural Feedback for Human-Aligned Robot Control

MS-PPO: Morphological-Symmetry-Equivariant Policy for Legged Robot Locomotion

Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments

AltNet: Addressing the Plasticity-Stability Dilemma in Reinforcement Learning

On the Tension Between Optimality and Adversarial Robustness in Policy Optimization

Does Flatness imply Generalization for Logistic Loss in Univariate Two-Layer ReLU Network?

Variable-Impedance Muscle Coordination under Slow-Rate Control Frequencies and Limited Observation Conditions Evaluated through Legged Locomotion

Towards better dense rewards in Reinforcement Learning Applications
