Advances in Reinforcement Learning and Automata Theory

The field of reinforcement learning is moving toward richer, more nuanced reward structures, with a focus on temporal causality and non-Markovian rewards. New methods and frameworks pursue this through automaton-based feedback, temporal logic, and probabilistic reward machines. These advances let agents learn effective policies for tasks with complex temporal dependencies, and they offer a more scalable and efficient alternative to hand-crafted reward engineering. Noteworthy papers in this area include:

  • A method that incorporates causal information, in the form of Temporal Logic-based Causal Diagrams, into the reward formalism, expediting policy learning and aiding the transfer of task specifications to new environments.
  • An approach that guides learning with automaton-based feedback, replacing explicit reward functions with preferences derived from a deterministic finite automaton; see the sketches after this list.
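
The common thread in these works is the reward machine: a finite automaton whose state is advanced by high-level events, so that a history-dependent reward becomes Markovian over the product of environment and automaton states. The sketch below is a minimal illustration of that idea; the class name, the exact-match labeling, and the reward-on-accepting-transition convention are our own assumptions, not drawn from any of the papers above.

```python
from typing import Dict, FrozenSet, Set, Tuple

class RewardMachine:
    """A DFA over high-level propositions; its state augments the
    environment observation so a non-Markovian reward becomes Markovian."""

    def __init__(self,
                 initial: int,
                 transitions: Dict[Tuple[int, FrozenSet[str]], int],
                 accepting: Set[int]):
        self.initial = initial
        self.transitions = transitions
        self.accepting = accepting
        self.state = initial

    def reset(self) -> None:
        self.state = self.initial

    def step(self, labels: FrozenSet[str]) -> float:
        """Advance on the set of propositions true in the current
        environment state; emit reward 1.0 only on the transition
        into an accepting state."""
        nxt = self.transitions.get((self.state, labels), self.state)
        reward = 1.0 if nxt in self.accepting and self.state not in self.accepting else 0.0
        self.state = nxt
        return reward

# "Reach A, then reach B": a task no state-wise (Markovian) reward captures.
rm = RewardMachine(
    initial=0,
    transitions={
        (0, frozenset({"at_A"})): 1,  # phase 1: visit A
        (1, frozenset({"at_B"})): 2,  # phase 2: then visit B
    },
    accepting={2},
)

# A tabular learner would condition on the product state, e.g.
# Q[(env_state, rm.state)][action], restoring the Markov property.
```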

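For the preference-based variant, one way to read the automaton-feedback idea is that two trajectories are compared by how far each advances the task automaton, rather than by accumulated scalar reward. The following sketch, reusing the RewardMachine class above, is a hedged illustration: the rank function over machine states is an assumption made here for concreteness, not the paper's actual preference criterion.

```python
from typing import Dict, FrozenSet, Iterable

def run_dfa(rm: RewardMachine, label_seq: Iterable[FrozenSet[str]]) -> int:
    """Replay a trajectory's label sequence through the automaton and
    return the machine state it ends in."""
    rm.reset()
    for labels in label_seq:
        rm.step(labels)
    return rm.state

def prefer(rm: RewardMachine,
           traj_a: Iterable[FrozenSet[str]],
           traj_b: Iterable[FrozenSet[str]],
           rank: Dict[int, int]) -> str:
    """Prefer the trajectory whose final machine state ranks closer to
    acceptance; 'rank' is an illustrative, task-specific progress measure."""
    ra, rb = rank[run_dfa(rm, traj_a)], rank[run_dfa(rm, traj_b)]
    return "a" if ra > rb else "b" if rb > ra else "tie"

# Using the two-phase task above: progress order 0 < 1 < 2.
rank = {0: 0, 1: 1, 2: 2}
traj_a = [frozenset({"at_A"})]                        # reached A only
traj_b = [frozenset({"at_A"}), frozenset({"at_B"})]   # completed the task
assert prefer(rm, traj_a, traj_b, rank) == "b"
```
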
Sources

Expediting Reinforcement Learning by Incorporating Knowledge About Temporal Causality in the Environment

RLAF: Reinforcement Learning from Automaton Feedback

Expressive Reward Synthesis with the Runtime Monitoring Language

Castor Ministerialis

Inference of Deterministic Finite Automata via Q-Learning

Stochastic Languages at Sub-stochastic Cost
