Best arm identification in the fixed confidence setting

Anirban Bhattacharjee
Friday, 4 Nov 2022, 16:00 to 17:00
Multi-armed bandits are sequential decision-making problems represented as a collection of probability distributions that one can sample from at every time step. One way to approach bandit problems is with the target of minimizing the expected regret (penalty for not sampling from the distribution with the highest mean), given a fixed number of times that one can draw samples from this distribution (called the sampling budget). The second way of approaching these problems is with the target of minimizing the expected number of times these distributions need to be sampled from (called the sampling complexity), in order to declare the "best arm" with reasonable certainty, given the extent of certainty that is desired. We shall look at the "best arm" approach to multi-armed bandits when all the probability distributions belong to a single parameter exponential family, the lower bound on the sampling complexity, and how this lower bound may be asymptotically met.