Advances in Bandit Learning and Reinforcement Learning

Bandit learning and reinforcement learning are evolving rapidly, with much of the recent work aimed at complex problems in non-stationary environments. These developments center on making bandit algorithms more efficient and adaptable, incorporating notions such as fairness, regularity, and curriculum learning. Progress has been especially notable in non-stationary settings, where changing dynamics and rewards demand adaptive learning strategies: new metrics such as the Discrepancy of Environment Dynamics, together with prioritized experience replay methods built on them, enable more sample-efficient learning, while constrained feedback models extend bandit algorithms to real-world scenarios where feedback is limited. Overall, the field is moving toward robust, efficient, and adaptable learning algorithms that can handle complex, dynamic environments.

Noteworthy papers include the multi-play combinatorial semi-bandit problem, which lifts the restriction to binary decision spaces in traditional combinatorial semi-bandit formulations; Dynamic Sampling with Curriculum Learning, which targets the particular characteristics of tool learning and achieves superior results; and Discrepancy of Environment Prioritized Experience Replay, which enables more sample-efficient learning in non-stationary environments.
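
The source only names the Discrepancy of Environment Prioritized Experience Replay method; as a rough illustration of the general idea, here is a minimal sketch in Python, assuming transitions are prioritized by how far the observed next state deviates from a dynamics model's prediction. The class name, constructor parameters, and the norm-based discrepancy score are all hypothetical and are not taken from the paper.

```python
import numpy as np

class DiscrepancyPrioritizedReplay:
    """Illustrative replay buffer that samples transitions in proportion
    to a dynamics-discrepancy score (hypothetical API, not the paper's)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling
        self.eps = eps          # keeps every priority strictly positive
        self.buffer = []        # stored (state, action, reward, next_state) tuples
        self.priorities = []    # one discrepancy-based priority per transition

    def add(self, transition, predicted_next_state):
        """Store a transition; its priority is the dynamics model's
        prediction error, used as a proxy for environment change."""
        _, _, _, next_state = transition
        discrepancy = np.linalg.norm(
            np.asarray(next_state) - np.asarray(predicted_next_state)
        )
        if len(self.buffer) >= self.capacity:   # drop the oldest entry when full
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((discrepancy + self.eps) ** self.alpha)

    def sample(self, batch_size):
        """Sample transitions with probability proportional to priority."""
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx]
```

The design intuition behind prioritizing by dynamics prediction error is that transitions collected just after the environment shifts are exactly the ones the current model explains poorly, so replaying them more often directs learning toward the new dynamics.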

Sources

Multi-Play Combinatorial Semi-Bandit Problem

On the Regularity and Fairness of Combinatorial Multi-Armed Bandit

Optimal Algorithms for Bandit Learning in Matching Markets

ToolSample: Dual Dynamic Sampling Methods with Curriculum Learning for RL-based Tool Learning

Sample Efficient Experience Replay in Non-stationary Environments

Constrained Feedback Learning for Non-Stationary Multi-Armed Bandits
