Advances in Bandit Learning and Optimization
The field of bandit learning and optimization is moving toward complex real-world problems by incorporating fairness objectives, budget constraints, and multi-agent decision-making. Recent work designs algorithms that balance exploration and exploitation across a range of settings, including stochastic multi-armed bandits, restless multi-armed bandits, and distributed multi-agent bandits. Noteworthy papers include 'Revisiting Social Welfare in Bandits: UCB is (Nearly) All You Need', which shows that the standard Upper Confidence Bound (UCB) algorithm achieves near-optimal Nash regret, and 'Neural Index Policies for Restless Multi-Action Bandits with Heterogeneous Budgets', which introduces a neural index policy for multi-action restless bandits under heterogeneous budget constraints. These advances could affect applications such as clinical trials, energy communities, and online advertising.
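For context, the sketch below shows the standard UCB1 index rule that the first paper analyzes: play each arm once, then repeatedly pull the arm maximizing the empirical mean plus an exploration bonus. The `pull` callback and the Bernoulli arms in the usage example are illustrative assumptions, not details from the paper.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds.

    `pull(arm)` returns a stochastic reward in [0, 1] for the chosen arm.
    Returns per-arm pull counts and empirical mean rewards.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play each arm once
        else:
            # UCB index: empirical mean + sqrt(2 ln t / n_a) exploration bonus
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update

    return counts, means

# Usage: three hypothetical Bernoulli arms with hidden success probabilities.
probs = [0.3, 0.5, 0.7]
counts, means = ucb1(lambda a: float(random.random() < probs[a]), 3, 10_000)
print(counts, [round(m, 3) for m in means])
```

The paper's contribution concerns the regret measure, not the algorithm: it argues this unmodified index rule already fares well under Nash (geometric-mean) regret, not just the usual average regret.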
Sources
Fair Cost Allocation in Energy Communities: A DLMP-based Bilevel Optimization with a Shapley Value Approach
Multi-Task Surrogate-Assisted Search with Bayesian Competitive Knowledge Transfer for Expensive Optimization