Reinforcement Learning and Sampling Methods

The field of reinforcement learning is increasingly focused on obtaining generalization guarantees, particularly for sequential data and evolving reward functions. Researchers are exploring approaches that provide non-vacuous certificates for modern off-policy algorithms and improve the stability of training. A second area of focus is the development of new sampling methods, including ones that combine amortized and particle-based approaches to better approximate complex distributions. Notable papers in this area include: PAC-Bayesian Reinforcement Learning Trains Generalizable Policies, which derives a novel PAC-Bayesian generalization bound for reinforcement learning; Reinforced sequential Monte Carlo for amortised sampling, which combines amortised and particle-based methods for sampling from distributions defined by unnormalised density functions; Finite-time Convergence Analysis of Actor-Critic with Evolving Reward, which provides the first finite-time convergence analysis of a single-timescale actor-critic algorithm under an evolving reward function; Neural Triangular Transport Maps: A New Approach Towards Sampling in Lattice QCD, which proposes sparse triangular transport maps for sampling in lattice QCD; and Reinforcement Learning with Stochastic Reward Machines, which introduces stochastic reward machines and an algorithm for learning them.
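
To make the particle-based side of this amortised/particle combination concrete, below is a minimal sketch of a tempered sequential Monte Carlo sampler for an unnormalised density, where a broad base distribution stands in for an amortised proposal. The toy bimodal target, the tempering schedule, and all function names are illustrative assumptions, not the method of any paper listed above.

```python
# Minimal tempered SMC sketch (illustrative only, not the papers' algorithms).
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Unnormalised log-density of a toy 1-D bimodal target (modes at +/- 2).
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def smc_sample(n_particles=1000, n_steps=20, step_size=0.5):
    # Broad Gaussian base distribution; in an amortised scheme this would be
    # a learned proposal rather than a fixed one.
    x = rng.normal(0.0, 3.0, size=n_particles)
    log_base = lambda z: -0.5 * (z / 3.0) ** 2
    betas = np.linspace(0.0, 1.0, n_steps + 1)  # tempering schedule

    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        # Importance weights for moving from temperature b_prev to b_next.
        log_w = (b_next - b_prev) * (log_target(x) - log_base(x))
        w = np.exp(log_w - log_w.max())
        w /= w.sum()

        # Multinomial resampling concentrates particles in high-weight regions.
        x = x[rng.choice(n_particles, size=n_particles, p=w)]

        # One Metropolis-Hastings move per particle to restore diversity.
        log_tempered = lambda z: b_next * log_target(z) + (1.0 - b_next) * log_base(z)
        prop = x + step_size * rng.normal(size=n_particles)
        accept = np.log(rng.uniform(size=n_particles)) < log_tempered(prop) - log_tempered(x)
        x = np.where(accept, prop, x)

    return x

samples = smc_sample()
print("mean |x| (should be near 2):", np.abs(samples).mean())
```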

Sources

PAC-Bayesian Reinforcement Learning Trains Generalizable Policies

Reinforced sequential Monte Carlo for amortised sampling

Finite-time Convergence Analysis of Actor-Critic with Evolving Reward

Neural Triangular Transport Maps: A New Approach Towards Sampling in Lattice QCD

Reinforcement Learning with Stochastic Reward Machines
