Advances in Multi-Armed Bandit Algorithms and User Choice Modeling

The field of multi-armed bandit algorithms and user choice modeling continues to advance, with particular focus on improving the exploration-exploitation trade-off and on understanding user behavior. Recent studies have highlighted the limitations of offline evaluation protocols for bandit recommenders and the need for more robust assessment methodologies. There is also growing interest in non-parametric choice models that can accurately capture user preferences and behavior. Noteworthy papers in this area include:

Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation, which exposes significant inadequacies in current offline evaluation protocols.

TS-Insight: Visualizing Thompson Sampling for Verification and XAI, which introduces a visual analytics tool to shed light on the internal decision mechanisms of Thompson Sampling-based algorithms.

A Non-Parametric Choice Model That Learns How Users Choose Between Recommended Options, which proposes a non-parametric method for estimating the choice model and supports robust inference of user preferences.
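
As context for the exploration-exploitation trade-off discussed above, the sketch below shows a minimal Thompson Sampling loop for a Bernoulli bandit. It is illustrative only: the Beta(1, 1) priors, the arm reward rates, and the horizon are assumptions chosen for the example and are not drawn from any of the papers listed here.

```python
import numpy as np

def thompson_sampling_bernoulli(true_rates, horizon=1000, seed=0):
    """Illustrative Thompson Sampling on a Bernoulli bandit with Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_rates)
    # Beta posterior parameters per arm, starting from a uniform Beta(1, 1) prior.
    alpha = np.ones(n_arms)
    beta = np.ones(n_arms)
    total_reward = 0

    for _ in range(horizon):
        # Draw one plausible mean reward per arm from its posterior and play the
        # best draw; exploration comes from posterior uncertainty, not a fixed epsilon.
        theta = rng.beta(alpha, beta)
        arm = int(np.argmax(theta))

        # Observe a Bernoulli reward and update that arm's posterior.
        reward = int(rng.random() < true_rates[arm])
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward

    return alpha, beta, total_reward / horizon

if __name__ == "__main__":
    # Hypothetical arm success probabilities, chosen only for this demo.
    post_alpha, post_beta, avg_reward = thompson_sampling_bernoulli([0.1, 0.5, 0.7])
    print("posterior alpha:", post_alpha)
    print("posterior beta:", post_beta)
    print("average reward:", avg_reward)
```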

Sources

Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation

TS-Insight: Visualizing Thompson Sampling for Verification and XAI

A Non-Parametric Choice Model That Learns How Users Choose Between Recommended Options

Online Learning with Probing for Sequential User-Centric Selection

Learning with Episodic Hypothesis Testing in General Games: A Framework for Equilibrium Selection
