Classical multi-armed bandit algorithms are tuned to work well for a pre-specified class of reward distributions, defined via their support or via moment/tail bounds. In this work, we consider the problem of oblivious best arm identification, i.e., where the algorithm has no prior information about the class of reward distributions. We establish fundamental limits on the performance of oblivious algorithms and propose algorithms that asymptotically meet these limits. Additionally, we allow for risk-aware arm selection, where we balance the expected reward associated with an arm against the corresponding risk.
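To make the risk-aware selection criterion concrete, here is a minimal sketch of a fixed-budget, risk-aware best arm identification procedure. The specific risk measure (an empirical CVaR-style tail average), the trade-off weight `xi`, and the uniform-allocation strategy are illustrative assumptions, not the algorithms from the talk.

```python
import numpy as np

def empirical_cvar(samples, alpha=0.95):
    # Empirical tail average: mean of the lowest (1 - alpha) fraction
    # of observed rewards (a CVaR-style risk proxy; assumed form).
    sorted_s = np.sort(samples)
    k = max(1, int(np.ceil((1 - alpha) * len(samples))))
    return sorted_s[:k].mean()

def risk_aware_best_arm(arms, budget, xi=0.5, alpha=0.95, rng=None):
    # Uniformly allocate the sampling budget across arms, then select
    # the arm maximizing a weighted sum of empirical mean and tail
    # average. `xi` trades off mean reward against tail risk.
    rng = rng or np.random.default_rng(0)
    per_arm = budget // len(arms)
    scores = []
    for pull in arms:
        samples = np.array([pull(rng) for _ in range(per_arm)])
        scores.append(samples.mean() + xi * empirical_cvar(samples, alpha))
    return int(np.argmax(scores))

# Two hypothetical arms: a low-variance "safe" arm, and a higher-mean
# but much riskier arm whose left tail drags its score down.
safe = lambda rng: rng.normal(1.0, 0.1)
risky = lambda rng: rng.normal(1.2, 3.0)
best = risk_aware_best_arm([safe, risky], budget=20000)
```

With this weighting, the safe arm wins despite its lower mean, because the risky arm's tail average is heavily negative. Oblivious algorithms face the added difficulty that tail behavior of this kind is unknown in advance.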
This talk is based on joint work with Anmol Kagrecha, Ashutosh Kumar, and Krishna Jagannathan.