Lower Bounds for Best Arm Identification and Regret Minimization for Bandit Strategie

Shubhada Agrawal
Anamay Tengse
Friday, 7 Sep 2018, 17:15 to 18:15
A-201 (STCS Seminar Room)
Abstract : The stochastic multi-armed bandit model is a simple abstraction that has been proven useful in many different contexts in statistics and machine learning. The problem is studied in a number of settings. In 1985, Lai and Robbins proposed a lower bound for the regret incurred by any strategy trying to minimize the cumulative regret. Their idea of the proof can be used to prove lower bounds for other bandit problems as well. In 2016, Kaufmann et al. encapsulated their main idea in terms of an inequality, which can be directly used to prove the lower bounds. In this talk, we will look at this inequality and derive the lower bounds for the best arm identification problem and the regret minimization setting.