The field of reinforcement learning and bandit algorithms is evolving rapidly, with a focus on developing more efficient, robust, and interpretable methods. Recent research has explored shift-aware upper confidence bound algorithms, adaptive spectral-based linear approaches, and martingale-driven Fisher prompting for sequential test-time adaptation. There has also been growing interest in robust batched bandit algorithms, gradient bandit methods beyond softmax, and EVaR-optimal arm identification in bandits. Noteworthy papers in this area include:

- Q-Learning with Shift-Aware Upper Confidence Bound in Non-Stationary Reinforcement Learning, which proposes a shift-aware Q-learning UCB algorithm for non-stationary reinforcement learning (a sketch of the classical UCB baseline that such shift-aware methods extend appears after this list).
- Balancing Interpretability and Performance in Reinforcement Learning: An Adaptive Spectral Based Linear Approach, which presents a spectral-based linear RL method that extends the ridge regression-based approach through a spectral filter function.
- Technical note on Sequential Test-Time Adaptation via Martingale-Driven Fisher Prompting, which introduces a theoretical framework for M-FISHER, a method for sequential distribution-shift detection and stable adaptation on streaming data.
- Beyond Softmax: A New Perspective on Gradient Bandits, which establishes a link between a class of discrete choice models and the theory of online learning and multi-armed bandits.
- Rethinking Langevin Thompson Sampling from A Stochastic Approximation Perspective, which introduces TS-SA, a stochastic approximation-based Thompson Sampling algorithm for multi-armed bandits (see the classical Thompson Sampling sketch after this list).
- Distributed Algorithms for Multi-Agent Multi-Armed Bandits with Collision, which proposes a distributed algorithm with an adaptive, efficient communication protocol for the stochastic Multiplayer Multi-Armed Bandit problem.
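
As background for the shift-aware UCB work above, the following is a minimal sketch of the classical UCB1 rule on a stationary Bernoulli bandit, the standard baseline that shift-aware variants build on. The Bernoulli reward setup and all names (`ucb1`, `true_means`, etc.) are illustrative assumptions, not code from any of the cited papers.

```python
import math
import random

def ucb1(n_arms, true_means, horizon, seed=0):
    """Classical UCB1 on a Bernoulli bandit (illustrative sketch only;
    the cited paper extends this baseline with a shift-aware bonus)."""
    rng = random.Random(seed)
    counts = [0] * n_arms          # pulls per arm
    values = [0.0] * n_arms        # empirical mean reward per arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1            # play each arm once to initialize
        else:
            # UCB1 index: empirical mean + sqrt(2 ln t / n_i)
            arm = max(range(n_arms),
                      key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]      # running mean update
        total_reward += reward
    return total_reward, counts

if __name__ == "__main__":
    total, pulls = ucb1(n_arms=3, true_means=[0.2, 0.5, 0.7], horizon=5000)
    print("total reward:", total, "pulls per arm:", pulls)
```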
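
Similarly, as background for the TS-SA entry, here is a sketch of classical Beta-Bernoulli Thompson Sampling; the stochastic-approximation variant introduced in the cited paper is not reproduced here, and the setup below is an assumed illustrative example.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Classical Beta-Bernoulli Thompson Sampling (illustrative sketch only;
    TS-SA in the cited paper is a stochastic-approximation variant, not this code)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [1.0] * n_arms   # Beta posterior parameter: successes + 1
    beta = [1.0] * n_arms    # Beta posterior parameter: failures + 1
    total_reward = 0.0

    for _ in range(horizon):
        # Sample a mean estimate per arm from its Beta posterior and play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli reward
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        total_reward += reward
    return total_reward, alpha, beta

if __name__ == "__main__":
    total, a, b = thompson_sampling([0.2, 0.5, 0.7], horizon=5000)
    print("total reward:", total)
```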