Tata Institute of Fundamental Research

Distribution oblivious risk-aware bandit algorithms

Speaker: Jayakrishnan Nair (Indian Institute of Technology Bombay)
Organiser: Rahul Vaze
Date: Tuesday, 5 May 2020, 16:00 to 17:00
Venue: A-201 (STCS Seminar Room)

(Scan to add to calendar)
Abstract:  Classical multi-armed bandit algorithms are tuned to work well for a pre-specified class of reward distributions, defined via their support, or moment/tail bounds. In this work, we consider the problem of oblivious best arm identification, i.e., where the algorithm has no prior information about the class of reward distributions. We establish fundamental limits on the performance of oblivious algorithms, and further propose algorithms that asymptotically meet these limits. Additionally, we allow for risk-aware arm selection, where we balance the expected reward associated with an arm with the corresponding risk.
This talk is based on joint work with Anmol Kagrecha, Ashutosh Kumar, and Krishna Jagannathan.