The Kullback-Leibler Upper Confidence Bound (KLUCB) Algorithm for regret minimization in K-armed bandits

Speaker:
Anirban Bhattacharjee
Organiser:
Sushant Vijayan
Date:
Friday, 25 Sep 2020, 17:15 to 18:15
Venue:
Zoom link: https://zoom.us/j/98132227553?pwd=K2cyQllKVjExdUhlRm0vc0ZHcEt0Zz09
Abstract
The K-armed bandit problem is a sequential decision-making problem in which one repeatedly samples from a given set of K probability distributions (belonging to a known family), informally called the 'arms' of the bandit. The goal is to minimize the regret: the total opportunity cost of not always selecting the arm with the highest expected reward. We shall look at the Kullback-Leibler Upper Confidence Bound (KLUCB) algorithm for regret minimization in K-armed bandits, and see how it meets the asymptotic lower bound on expected regret for this problem.
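To make the setting concrete, here is a minimal sketch of KLUCB for Bernoulli-distributed arms. It assumes the standard form of the index: pull each arm once, then at each step pull the arm whose index is the largest mean q consistent with the observed data, i.e. the largest q with N_a * KL(p_hat_a, q) <= log(t), found by binary search (the exploration term, function names, and the c parameter are illustrative choices, not fixed by the abstract).

```python
import math
import random

def kl_bernoulli(p, q, eps=1e-12):
    # KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability.
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n, t, c=0.0):
    # Largest q in [p_hat, 1] with n * KL(p_hat, q) <= log(t) + c*log(log(t)).
    # KL(p_hat, q) is increasing in q on [p_hat, 1], so binary search works.
    bound = (math.log(t) + c * math.log(max(math.log(t), 1.0))) / n
    lo, hi = p_hat, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if kl_bernoulli(p_hat, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo

def kl_ucb(means, horizon, rng):
    # Run KLUCB on Bernoulli arms with the given true means; return total regret.
    K = len(means)
    counts = [0] * K
    sums = [0.0] * K
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= K:
            a = t - 1  # initialization: pull each arm once
        else:
            a = max(range(K),
                    key=lambda i: kl_ucb_index(sums[i] / counts[i], counts[i], t))
        reward = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += reward
        regret += best - means[a]
    return regret
```

Because the index uses the KL divergence rather than a Hoeffding-style radius, the resulting regret guarantee matches the distribution-dependent lower bound for Bernoulli arms, in which the gap of each suboptimal arm is weighted by the inverse KL divergence to the best arm's mean.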