The field of reinforcement learning is moving toward richer, more nuanced reward structures, with a focus on temporal causality and non-Markovian rewards. New methods and frameworks pursue this through automaton-based feedback, temporal logic, and probabilistic reward machines. These advances let agents learn effective policies for tasks with complex temporal dependencies and offer a more scalable, efficient alternative to hand-crafted reward engineering. Noteworthy papers in this area include:
- A paper that incorporates causal information, in the form of Temporal Logic-based Causal Diagrams, into the reward formalism, expediting policy learning and aiding the transfer of task specifications to new environments.
- A paper that guides the learning process with automaton-based feedback, replacing explicit reward functions with preferences derived from a deterministic finite automaton.
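To make the automaton-based reward idea concrete, here is a minimal sketch of a reward machine: a deterministic finite automaton whose transitions carry rewards, so that a non-Markovian task ("do A, then B") becomes Markovian over the product of environment state and automaton state. All state and event names below are illustrative assumptions, not taken from the papers summarized above.

```python
# Hypothetical reward machine for the temporally extended task
# "reach A, then reach B". Keys map (rm_state, event) to
# (next_rm_state, reward); unlisted pairs are self-loops with reward 0.
REWARD_MACHINE = {
    ("u0", "A"): ("u1", 0.0),  # subgoal A reached, no reward yet
    ("u0", "B"): ("u0", 0.0),  # B before A does not count
    ("u1", "A"): ("u1", 0.0),
    ("u1", "B"): ("u2", 1.0),  # B after A: task complete, reward 1
}

def rm_step(rm_state, event):
    """Advance the reward machine on one observed event label."""
    return REWARD_MACHINE.get((rm_state, event), (rm_state, 0.0))

# The event trace B, A, B pays off only on the final B -- a dependency
# on history that no reward defined on the current event alone captures.
state, total = "u0", 0.0
for event in ["B", "A", "B"]:
    state, reward = rm_step(state, event)
    total += reward
```

An RL agent would condition its policy on the pair (environment observation, `state`), which is what lets standard Markovian algorithms handle the temporal dependency.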