Reinforcement learning research is increasingly incorporating preferences and constraints into the learning process. Researchers are fusing scalar rewards with preference feedback, learning from human data, and using physics-informed reward machines to make learning more programmable and efficient. Action-constrained and offline imitation learning are also being investigated to ensure safe behaviors and improve sample efficiency. Notable papers include:
- Fusing Rewards and Preferences in Reinforcement Learning, which introduces the Dual-Feedback Actor algorithm that combines individual rewards and pairwise preferences into a single update rule.
- Learning from Preferences and Mixed Demonstrations in General Settings, which develops a new framing for learning from human data and introduces the LEOPARD algorithm to learn from a broad range of data.
- Physics-Informed Reward Machines, which introduces a symbolic machine to express complex learning objectives and reward structures for RL agents.
- Action-Constrained Imitation Learning, which proposes the DTWIL algorithm to address the occupancy-measure mismatch between the expert and the imitator introduced by action constraints.
- Offline Imitation Learning upon Arbitrary Demonstrations by Pre-Training Dynamics Representations, which adds a pre-training stage that learns dynamics representations from arbitrary demonstrations, improving imitation-learning performance when expert data is limited.
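To make the reward-preference fusion idea concrete, the sketch below blends a scalar-reward TD loss with a Bradley-Terry preference loss into one objective. This is a hypothetical illustration of a dual-feedback update, not the Dual-Feedback Actor paper's exact formulation; the function names, inputs, and the `alpha` weighting are assumptions.

```python
import numpy as np

def combined_feedback_loss(q_pred, q_target, pref_logit_a, pref_logit_b,
                           pref_label, alpha=0.5):
    """Blend a scalar-reward TD loss with a pairwise-preference loss.

    Illustrative sketch of a single update rule in the spirit of the
    Dual-Feedback Actor; the weighting and loss forms are assumptions.
    """
    # Reward channel: squared TD error against a bootstrapped target.
    td_loss = (q_pred - q_target) ** 2
    # Preference channel: Bradley-Terry log-likelihood that segment A
    # is preferred over segment B (pref_label = 1 means A preferred).
    p_a = 1.0 / (1.0 + np.exp(-(pref_logit_a - pref_logit_b)))
    pref_loss = -(pref_label * np.log(p_a)
                  + (1 - pref_label) * np.log(1.0 - p_a))
    # Single scalar objective combining both feedback sources.
    return alpha * td_loss + (1 - alpha) * pref_loss
```

Both feedback channels flow through one scalar loss, so a single gradient step uses individual rewards and pairwise preferences simultaneously.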
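A reward machine, as referenced in the physics-informed reward machines paper, is a finite automaton whose transitions emit rewards when high-level propositions are observed. The toy below shows the plain (non-physics-informed) version of the idea; the class name, transition encoding, and example task are illustrative assumptions, and the physical-constraint machinery of the paper is omitted.

```python
class RewardMachine:
    """Minimal reward-machine sketch: a finite automaton over propositions.

    Illustrative only; the physics-informed variant additionally encodes
    physical knowledge, which this toy omits.
    """

    def __init__(self, transitions, initial_state):
        # transitions: {(state, proposition): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, proposition):
        """Advance on an observed proposition; return the emitted reward."""
        next_state, reward = self.transitions.get(
            (self.state, proposition), (self.state, 0.0))
        self.state = next_state
        return reward

# Example objective: "pick up the key, then open the door".
rm = RewardMachine(
    transitions={
        ("start", "got_key"): ("has_key", 0.1),   # subgoal bonus
        ("has_key", "at_door"): ("done", 1.0),    # task complete
    },
    initial_state="start",
)
```

Because the machine's state tracks which subgoals are done, it can express temporally extended objectives (key before door) that a single scalar reward function cannot.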
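The dynamics-representation pre-training idea can be sketched in miniature: fit a model that predicts the next state from (state, action) pairs drawn from arbitrary, non-expert transitions. The linear least-squares encoder below is a stand-in assumption; the paper's method learns far richer representations.

```python
import numpy as np

def pretrain_dynamics(states, actions, next_states):
    """Fit a linear next-state predictor from arbitrary transitions.

    Toy stand-in for dynamics-representation pre-training: the learned
    map W captures environment dynamics without needing expert data.
    """
    X = np.hstack([states, actions])                 # (N, ds + da) inputs
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W                                          # (ds + da, ds) map

# Transitions from a known linear system: s' = A s + B a.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[1.0], [0.5]])
S = rng.normal(size=(200, 2))
U = rng.normal(size=(200, 1))
S_next = S @ A.T + U @ B.T
W = pretrain_dynamics(S, U, S_next)                   # recovers A, B
```

The point mirrors the paper's premise: dynamics can be learned from any transitions, so scarce expert demonstrations are reserved for learning the policy on top of the pre-trained representation.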