Advances in Multi-Armed Bandit Algorithms and User Choice Modeling

The field of multi-armed bandit algorithms and user choice modeling continues to advance, with particular focus on improving the exploration-exploitation trade-off and on understanding user behavior. Recent studies have highlighted the limitations of offline evaluation protocols for bandit recommenders and the need for more robust assessment methodologies. There is also growing interest in non-parametric choice models that can accurately capture user preferences and behavior. Noteworthy papers in this area include:

Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation, which exposes significant inadequacies in current offline evaluation protocols.

TS-Insight: Visualizing Thompson Sampling for Verification and XAI, which introduces a visual analytics tool to shed light on the internal decision mechanisms of Thompson Sampling-based algorithms.

A Non-Parametric Choice Model That Learns How Users Choose Between Recommended Options, which proposes a non-parametric method for estimating the choice model and supports robust inference of user preferences.
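
As context for the exploration-exploitation trade-off discussed above, the sketch below shows a minimal Thompson Sampling loop for a Bernoulli bandit. It is illustrative only: the Beta(1, 1) priors, the arm reward rates, and the horizon are assumptions chosen for the example and are not drawn from any of the papers listed here.

```python
import numpy as np

def thompson_sampling_bernoulli(true_rates, horizon=1000, seed=0):
    """Illustrative Thompson Sampling on a Bernoulli bandit with Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_rates)
    # Beta posterior parameters per arm, starting from a uniform Beta(1, 1) prior.
    alpha = np.ones(n_arms)
    beta = np.ones(n_arms)
    total_reward = 0

    for _ in range(horizon):
        # Draw one plausible mean reward per arm from its posterior and play the
        # best draw; exploration comes from posterior uncertainty, not a fixed epsilon.
        theta = rng.beta(alpha, beta)
        arm = int(np.argmax(theta))

        # Observe a Bernoulli reward and update that arm's posterior.
        reward = int(rng.random() < true_rates[arm])
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward

    return alpha, beta, total_reward / horizon

if __name__ == "__main__":
    # Hypothetical arm success probabilities, chosen only for this demo.
    post_alpha, post_beta, avg_reward = thompson_sampling_bernoulli([0.1, 0.5, 0.7])
    print("posterior alpha:", post_alpha)
    print("posterior beta:", post_beta)
    print("average reward:", avg_reward)
```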

Sources

Exploitation Over Exploration: Unmasking the Bias in Linear Bandit Recommender Offline Evaluation

TS-Insight: Visualizing Thompson Sampling for Verification and XAI

A Non-Parametric Choice Model That Learns How Users Choose Between Recommended Options

Online Learning with Probing for Sequential User-Centric Selection

Learning with Episodic Hypothesis Testing in General Games: A Framework for Equilibrium Selection
