Reinforcement Learning under Uncertainty and Non-Linearity

The field of reinforcement learning is moving toward the complex challenges posed by non-linear dynamics and partial observability. Researchers are exploring approaches that improve the stability and efficiency of policy optimization algorithms. One notable direction is the incorporation of auxiliary frameworks, such as Koopman operator theory, to learn approximately linear latent-space representations of complex systems. Another significant trend is the development of methods that leverage privileged information available in simulation to improve training in partially observable environments.

Noteworthy papers include:

KIPPO introduces a Koopman-approximation auxiliary network to improve policy learning in continuous control tasks (see the Koopman sketch below).

Guided Policy Optimization co-trains a privileged guider and a partially observed learner, achieving optimality comparable to direct RL (see the co-training sketch below).

AM-PPO enhances PPO by adaptively modulating advantage estimates to stabilize gradient updates (see the advantage-modulation sketch below).

Solving General-Utility Markov Decision Processes contributes the first approach to solving infinite-horizon discounted GUMDPs in the single-trial regime.

Sequential Monte Carlo for Policy Optimization is a policy optimization framework for continuous POMDPs that explicitly addresses the challenge of balancing exploration and exploitation (see the particle-filter sketch below).
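To make the Koopman-style auxiliary idea concrete, here is a minimal PyTorch sketch of an auxiliary head that encourages latent dynamics to be approximately linear, z_{t+1} ≈ K z_t + B a_t. The class name KoopmanAux, the layer sizes, and the way the loss is combined with the policy objective are illustrative assumptions, not KIPPO's actual architecture.

import torch
import torch.nn as nn

class KoopmanAux(nn.Module):
    # Hypothetical auxiliary module: encodes observations into a latent space and
    # penalizes deviation from linear latent dynamics z_{t+1} ≈ K z_t + B a_t.
    def __init__(self, obs_dim: int, act_dim: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, latent_dim)
        )
        self.K = nn.Linear(latent_dim, latent_dim, bias=False)  # state operator
        self.B = nn.Linear(act_dim, latent_dim, bias=False)     # control operator

    def aux_loss(self, obs, act, next_obs):
        z, z_next = self.encoder(obs), self.encoder(next_obs)
        z_pred = self.K(z) + self.B(act)
        return ((z_pred - z_next) ** 2).mean()

# In a PPO-style update this term would simply be added to the usual objective, e.g.
# total_loss = policy_loss + value_coef * value_loss + aux_coef * aux.aux_loss(obs, act, next_obs)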
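The guider-learner idea can be sketched as two policies trained together: a privileged guider that sees the full simulator state and a learner restricted to partial observations. The loss composition below (distillation toward the guider plus a consistency term keeping the guider close to the learner) is a hedged stand-in for discrete actions, not the exact objective of Guided Policy Optimization.

import torch
import torch.nn.functional as F

def co_training_losses(guider_logits, learner_logits, guider_rl_loss,
                       distill_coef=1.0, consistency_coef=0.1):
    # guider_logits come from a policy conditioned on the privileged full state;
    # learner_logits come from a policy conditioned only on observations.
    guider_logp = F.log_softmax(guider_logits, dim=-1)
    learner_logp = F.log_softmax(learner_logits, dim=-1)
    # Learner imitates the (detached) guider.
    distill = F.kl_div(learner_logp, guider_logp.detach().exp(), reduction="batchmean")
    # Guider is regularized toward what the learner can represent from observations alone.
    consistency = F.kl_div(guider_logp, learner_logp.detach().exp(), reduction="batchmean")
    return guider_rl_loss + distill_coef * distill + consistency_coef * consistency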
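Advantage modulation in a PPO update can be illustrated as rescaling the batch of advantage estimates before the clipped surrogate so that gradient magnitudes stay bounded. The standardize-then-squash rule below, with a hypothetical gain alpha, is only a simple stand-in for the idea; it is not AM-PPO's actual alpha-modulation scheme.

import torch

def modulated_advantages(adv: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # Standardize across the batch, then squash with a bounded nonlinearity so
    # outlier advantages cannot produce outsized policy-gradient steps.
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)
    return torch.tanh(alpha * adv) / alpha

def ppo_clip_loss(ratio, adv, clip_eps: float = 0.2):
    adv = modulated_advantages(adv)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -torch.min(unclipped, clipped).mean()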
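One core ingredient of sequential Monte Carlo in continuous POMDPs is maintaining a particle belief over the hidden state. The bootstrap filter step below, with user-supplied transition_sample and obs_likelihood functions (both hypothetical here), shows that ingredient only; it is not the cited paper's full policy-optimization framework.

import numpy as np

def smc_belief_update(particles, weights, action, observation,
                      transition_sample, obs_likelihood, rng):
    # Propagate each particle through the stochastic dynamics model.
    particles = np.stack([transition_sample(p, action, rng) for p in particles])
    # Reweight by how well each propagated particle explains the observation.
    weights = weights * np.array([obs_likelihood(observation, p) for p in particles])
    weights = weights / (weights.sum() + 1e-12)
    # Resample when the effective sample size collapses, to fight weight degeneracy.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(particles):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights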

Sources

KIPPO: Koopman-Inspired Proximal Policy Optimization

Guided Policy Optimization under Partial Observability

AM-PPO: (Advantage) Alpha-Modulation with Proximal Policy Optimization

Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning

Sequential Monte Carlo for Policy Optimization in Continuous POMDPs
