Advances in Bandit Learning and Optimization
The field of bandit learning and optimization is moving toward complex real-world problems by incorporating fairness objectives, budget constraints, and multi-agent decision-making. Recent work designs algorithms that balance exploration and exploitation across a range of settings, including stochastic multi-armed bandits, restless multi-armed bandits, and distributed multi-agent bandits. Noteworthy papers include 'Revisiting Social Welfare in Bandits: UCB is (Nearly) All You Need', which shows that the standard Upper Confidence Bound (UCB) algorithm achieves near-optimal Nash regret, and 'Neural Index Policies for Restless Multi-Action Bandits with Heterogeneous Budgets', which introduces a neural index policy for multi-action restless bandits under heterogeneous budget constraints. These advances could affect applications such as clinical trials, energy communities, and online advertising.
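For context, the sketch below shows the standard UCB1 index rule that the first paper analyzes: play each arm once, then repeatedly pull the arm maximizing the empirical mean plus an exploration bonus. The `pull` callback and the Bernoulli arms in the usage example are illustrative assumptions, not details from the paper.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds.

    `pull(arm)` returns a stochastic reward in [0, 1] for the chosen arm.
    Returns per-arm pull counts and empirical mean rewards.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play each arm once
        else:
            # UCB index: empirical mean + sqrt(2 ln t / n_a) exploration bonus
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update

    return counts, means

# Usage: three hypothetical Bernoulli arms with hidden success probabilities.
probs = [0.3, 0.5, 0.7]
counts, means = ucb1(lambda a: float(random.random() < probs[a]), 3, 10_000)
print(counts, [round(m, 3) for m in means])
```

The paper's contribution concerns the regret measure, not the algorithm: it argues this unmodified index rule already fares well under Nash (geometric-mean) regret, not just the usual average regret.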
Sources
Fair Cost Allocation in Energy Communities: A DLMP-based Bilevel Optimization with a Shapley Value Approach
Multi-Task Surrogate-Assisted Search with Bayesian Competitive Knowledge Transfer for Expensive Optimization