Advancements in Reinforcement Learning with Preferences and Constraints

Reinforcement learning research is increasingly incorporating preferences and constraints into the learning process. Researchers are exploring ways to fuse rewards with preferences, learn from human feedback, and use physics-informed reward machines to make learning more programmable and efficient. Action-constrained imitation learning and offline imitation learning are also being investigated to ensure safe behavior and improve sample efficiency. Notable papers include:

  • Fusing Rewards and Preferences in Reinforcement Learning, which introduces the Dual-Feedback Actor algorithm that combines individual rewards and pairwise preferences into a single update rule.
  • Learning from Preferences and Mixed Demonstrations in General Settings, which develops a new framing for learning from human data and introduces the LEOPARD algorithm for learning from a broad range of data.
  • Physics-Informed Reward Machines, which introduces a symbolic machine to express complex learning objectives and reward structures for RL agents.
  • Action-Constrained Imitation Learning, which proposes the DTWIL algorithm to address the occupancy-measure mismatch between the expert and the imitator caused by action constraints.
  • Offline Imitation Learning upon Arbitrary Demonstrations by Pre-Training Dynamics Representations, which proposes a pre-training stage that learns dynamics representations and improves imitation learning performance under limited expert data.
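
To make the idea of combining scalar rewards and pairwise preferences in one objective more concrete, here is a minimal sketch. It is not the Dual-Feedback Actor update from the paper listed above, whose details are not given here. It assumes a generic setup: a squared-error term on individual reward samples plus a Bradley-Terry negative log-likelihood on preference pairs. The function names and the `pref_weight` parameter are hypothetical and chosen only for illustration.

```python
import math


def preference_loss(score_preferred, score_rejected):
    """Bradley-Terry negative log-likelihood of one observed preference.

    Equals log(1 + exp(-(score_preferred - score_rejected))), computed in a
    numerically stable way.
    """
    x = -(score_preferred - score_rejected)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def reward_loss(predicted, observed):
    """Squared error between a predicted and an observed scalar reward."""
    return (predicted - observed) ** 2


def fused_loss(reward_samples, preference_samples, pref_weight=1.0):
    """Illustrative combined objective (an assumption, not the paper's rule).

    reward_samples: list of (predicted, observed) reward pairs.
    preference_samples: list of (score_preferred, score_rejected) pairs.
    pref_weight: hypothetical trade-off between the two feedback types.
    """
    r_term = 0.0
    if reward_samples:
        r_term = sum(reward_loss(p, o) for p, o in reward_samples) / len(reward_samples)
    p_term = 0.0
    if preference_samples:
        p_term = sum(preference_loss(a, b) for a, b in preference_samples) / len(preference_samples)
    return r_term + pref_weight * p_term


if __name__ == "__main__":
    # One reward sample with error 1.0 and one correctly ordered preference pair.
    print(fused_loss([(2.0, 1.0)], [(3.0, 0.0)], pref_weight=0.5))
```

In this sketch, either feedback type can be absent: an empty list simply contributes zero, so the same function covers reward-only, preference-only, and mixed data.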

Sources

Fusing Rewards and Preferences in Reinforcement Learning

Learning from Preferences and Mixed Demonstrations in General Settings

Physics-Informed Reward Machines

Action-Constrained Imitation Learning

Offline Imitation Learning upon Arbitrary Demonstrations by Pre-Training Dynamics Representations

Search-Based Credit Assignment for Offline Preference-Based Reinforcement Learning

Demystifying Reward Design in Reinforcement Learning for Upper Extremity Interaction: Practical Guidelines for Biomechanical Simulations in HCI
