Advances in Preference-Based Reinforcement Learning

Preference-based reinforcement learning is moving toward methods that stay robust and query-efficient under noisy human feedback. Recent work integrates few-shot expert demonstrations, tri-teaching strategies, and discriminability-aware query selection to reduce the number of preference queries required and to keep policy learning stable under high levels of label noise. Notable examples include TREND, which combines few-shot expert demonstrations with a tri-teaching scheme to mitigate noisy preference labels, and DAPPER, a discriminability-aware policy-to-policy preference-based approach aimed at query-efficient robot skill acquisition. In parallel, combining behavior trees with dynamic movement primitives is improving the interpretability and adaptability of policies learned from demonstration. Together, these directions are making learning from human feedback more sample- and query-efficient.
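
To make the underlying mechanism concrete, the sketch below shows the standard preference-based reward learning step that methods in this area build on: fitting a reward model so that the preferred trajectory segment receives the higher predicted return, via a Bradley-Terry loss over segment pairs. This is a minimal, illustrative sketch only; the network architecture, dimensions, function names, and toy data are assumptions for demonstration and are not taken from TREND, DAPPER, or any other listed paper.

```python
# Minimal sketch of preference-based reward learning (Bradley-Terry over
# trajectory-segment pairs). All sizes and data below are illustrative.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        # Per-step reward; summing over a segment gives its predicted return.
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(model, seg_a, seg_b, label):
    """Bradley-Terry loss: label = 1 if segment A is preferred, 0 if B."""
    ret_a = model(*seg_a).sum(dim=-1)   # predicted return of segment A
    ret_b = model(*seg_b).sum(dim=-1)   # predicted return of segment B
    logits = ret_a - ret_b
    return nn.functional.binary_cross_entropy_with_logits(logits, label.float())

# Toy usage: a batch of 8 preference pairs over 50-step segments.
obs_dim, act_dim, T, B = 10, 4, 50, 8
model = RewardModel(obs_dim, act_dim)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

seg_a = (torch.randn(B, T, obs_dim), torch.randn(B, T, act_dim))
seg_b = (torch.randn(B, T, obs_dim), torch.randn(B, T, act_dim))
labels = torch.randint(0, 2, (B,))

loss = preference_loss(model, seg_a, seg_b, labels)
opt.zero_grad()
loss.backward()
opt.step()
```

The approaches surveyed above differ mainly in what surrounds this step, for example how noisy preference labels are filtered (tri-teaching) or which pairs are queried in the first place (discriminability-aware selection), rather than in the basic preference objective itself.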

Sources

TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations

Policy-labeled Preference Learning: Is Preference Enough for RLHF?

DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition

Lagrange Oscillatory Neural Networks for Constraint Satisfaction and Optimization

Beyond Predefined Actions: Integrating Behavior Trees and Dynamic Movement Primitives for Robot Learning from Demonstration

Preference Optimization for Combinatorial Optimization Problems

A Generative Neural Annealer for Black-Box Combinatorial Optimization
