PAC Subset Selection in Stochastic Multi-armed Bandits

Rahul Vaze
Friday, 1 Nov 2013, 11:00 to 12:00
AG-66 (Lecture Theatre)
Abstract: We consider the problem of selecting, from among n real-valued random variables, a subset of size m of those with the highest means, based on efficiently sampling the random variables. This problem, which we denote Explore-m, finds application in a variety of areas, such as stochastic optimization, simulation and industrial engineering, and on-line advertising. The theoretical basis of our work is an extension of a previous formulation using multi-armed bandits that is devoted to identifying just the single best of n random variables (Explore-1). Under a PAC setting, we provide algorithms for Explore-m and bound their sample complexity.

Our main contribution is the $LUCB$ algorithm, which, interestingly, bears a close resemblance to the well-known $UCB$ algorithm for regret minimization.  We derive an expected sample complexity bound for $LUCB$ that is novel even for single-arm selection. We then improve the problem-dependent constant in this bound through a novel algorithmic variant called $KL-LUCB$.  Experiments affirm the relative efficiency of $KL-LUCB$ over other algorithms for Explore-m. Our contributions also include a lower bound on the worst case sample complexity of such algorithms.

Bio: Shivaram Kalyanakrishnan is a scientist at Yahoo Labs Bangalore. His primary research interests lie in the fields of artificial intelligence and machine learning, spanning areas such as reinforcement learning, agents and multiagent systems, humanoid robotics, multi-armed bandits, and on-line advertising. He obtained his Ph.D. in Computer Science from the University of Texas at Austin (2011), and his B.Tech. in Computer Science and Engineering from the Indian Institute of Technology Madras (2004). He has extensively used robot soccer as a test domain for his research, and has actively contributed to initiatives such as RoboCup and the Reinforcement Learning competitions.