The field of multi-armed bandits is seeing rapid progress, with growing attention to collaborative and decentralized learning. Recent work has produced algorithms for efficient exploration and exploitation in complex settings, such as bandit problems with bandit access to multiple predictors, kernelized bandits, and cooperative multi-armed bandits. Several of these approaches attain state-of-the-art regret bounds, often with logarithmic or near-optimal guarantees. Notably, some papers introduce new tools, such as exploration distributions and collaborative hypothesis-testing protocols, that are likely to find use beyond their original settings.
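For context on what "logarithmic regret" means in the simplest setting, here is a minimal sketch of the classic single-agent UCB1 index policy (assuming Bernoulli rewards purely for illustration; none of the papers above is restricted to this model):

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Classic UCB1: pull the arm with the highest optimistic index.

    arm_means holds the true Bernoulli success probabilities (known only
    to the simulator, not the learner). Returns the cumulative expected
    (pseudo-)regret after each round.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # pulls per arm
    sums = [0.0] * k    # cumulative reward per arm
    best = max(arm_means)
    regret, cum = [], 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            # optimism in the face of uncertainty: mean + confidence radius
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        cum += best - arm_means[arm]  # expected regret increment
        regret.append(cum)
    return regret

print(ucb1([0.3, 0.5, 0.7], horizon=10_000)[-1])  # grows ~ logarithmically in t
```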
Noteworthy papers include:
- The paper on Learning-Augmented Algorithms for MTS with Bandit Access to Multiple Predictors, which achieves regret O(OPT^{2/3}) and proves a tight lower bound.
- The work on Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms, which proposes A-CAPELLA, a decentralized algorithm that achieves logarithmic regret in a generalized regime.
- The study on Collaborative Min-Max Regret in Grouped Multi-Armed Bandits, which introduces Col-UCB, an algorithm that dynamically coordinates exploration across groups and achieves optimal minimax and instance-dependent collaborative regret (see the sketch after this list for the shared-statistics intuition).
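The summaries above do not spell out the internals of Col-UCB or A-CAPELLA, so the following is only a hypothetical illustration of the coordination idea behind collaborative regret: groups with overlapping arm sets pool their counts and reward sums, so every pull refines the estimates all groups use, and the min-max objective is the worst group's regret. The function name `grouped_shared_ucb` and its parameters are assumptions made for this sketch, not the paper's algorithm:

```python
import math
import random

def grouped_shared_ucb(group_arms, arm_means, horizon, seed=0):
    """Hypothetical sketch of grouped bandits with pooled statistics.

    group_arms: one list of accessible arm indices per group (groups may
    overlap). All groups share a single table of counts and reward sums,
    so a pull by any group sharpens the estimate every other group uses.
    Returns the worst group's regret, i.e. the min-max objective.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts, sums = [0] * k, [0.0] * k
    regrets = [0.0] * len(group_arms)
    for t in range(1, horizon + 1):
        for g, arms in enumerate(group_arms):
            unseen = [a for a in arms if counts[a] == 0]
            if unseen:
                arm = unseen[0]  # initialize shared statistics
            else:
                # UCB index over this group's arms, using pooled data
                arm = max(arms, key=lambda a: sums[a] / counts[a]
                          + math.sqrt(2 * math.log(t) / counts[a]))
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            # regret is measured against the best arm this group can access
            regrets[g] += max(arm_means[a] for a in arms) - arm_means[arm]
    return max(regrets)

# Two groups sharing arm 1: pooling its samples helps both.
print(grouped_shared_ucb([[0, 1], [1, 2]], [0.4, 0.6, 0.8], horizon=5_000))
```

The point of the shared table is that exploration cost is amortized across groups: a sample of an overlapping arm counts toward every group's confidence interval, which is the kind of coordination a min-max collaborative objective rewards.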