Advances in Online Learning and Decision Making

The field of online learning and decision making is seeing significant developments focused on improving the efficiency and optimality of algorithms across a range of settings. Researchers are tackling complex problems such as combinatorial action spaces, episodic Markov decision processes (MDPs), and multi-objective reinforcement learning. A key direction is the development of near-optimal algorithms that adapt to different environments while providing robust performance guarantees. Notably, the study of the Hedge algorithm has deepened the understanding of its near-optimality in combinatorial settings, new algorithms for episodic MDPs with aggregate bandit feedback have achieved optimal regret bounds, and the convergence of regret matching has been established in potential games and constrained optimization, yielding new insight into its theoretical foundations. Together, these advances are contributing to more efficient and effective online learning and decision-making systems.

Noteworthy papers include "On the Universal Near Optimality of Hedge in Combinatorial Settings", which establishes the near-optimality of the Hedge algorithm in combinatorial settings, and "Adapting to Stochastic and Adversarial Losses in Episodic MDPs with Aggregate Bandit Feedback", which proposes the first best-of-both-worlds algorithms for episodic tabular MDPs with aggregate bandit feedback.
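For readers unfamiliar with the Hedge algorithm discussed above, here is a minimal sketch of its classical multiplicative-weights update over a finite set of experts. This illustrates the textbook algorithm only, not the combinatorial extension analyzed in the cited paper; the function name and the learning-rate parameter `eta` are illustrative choices.

```python
import math

def hedge(losses, eta):
    """Run the Hedge (multiplicative weights) algorithm on a loss sequence.

    losses: list of rounds, each a list of per-expert losses in [0, 1].
    eta: learning rate (> 0).
    Returns the probability vector played in each round.
    """
    num_experts = len(losses[0])
    weights = [1.0] * num_experts
    played = []
    for loss in losses:
        total = sum(weights)
        probs = [w / total for w in weights]
        played.append(probs)
        # Exponential update: experts with smaller loss gain relative weight.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, loss)]
    return played
```

With a suitably tuned `eta`, the cumulative loss of the weighted play is close to that of the best single expert in hindsight, which is the regret guarantee the paper sharpens in combinatorial settings.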
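The regret-matching algorithm whose convergence the third paper studies can likewise be sketched in its basic online form: each round, play actions with probability proportional to their positive cumulative regret. This is a simple single-learner illustration under assumed loss vectors, not the potential-game or constrained-optimization setting of the paper.

```python
def regret_matching(losses):
    """Regret matching on a sequence of per-action loss vectors.

    Each round, actions are played in proportion to their positive
    cumulative regret; the strategy is uniform when no regret is positive.
    Returns the strategy played in each round.
    """
    num_actions = len(losses[0])
    cum_regret = [0.0] * num_actions
    strategies = []
    for loss in losses:
        pos = [max(r, 0.0) for r in cum_regret]
        s = sum(pos)
        probs = [p / s for p in pos] if s > 0 else [1.0 / num_actions] * num_actions
        strategies.append(probs)
        expected = sum(p * l for p, l in zip(probs, loss))
        # Regret of an action = algorithm's expected loss minus that action's loss.
        cum_regret = [r + expected - l for r, l in zip(cum_regret, loss)]
    return strategies
```

When every player in a game runs such a procedure, the empirical play has well-studied convergence properties; the cited paper establishes convergence results of this kind in potential games and constrained optimization.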

Sources

On the Universal Near Optimality of Hedge in Combinatorial Settings

Adapting to Stochastic and Adversarial Losses in Episodic MDPs with Aggregate Bandit Feedback

Convergence of Regret Matching in Potential Games and Constrained Optimization

Instance-Dependent Regret Bounds for Nonstochastic Linear Partial Monitoring

Online Two-Stage Submodular Maximization

Multi-Objective Reinforcement Learning with Max-Min Criterion: A Game-Theoretic Approach

Built with on top of