| Speaker: | Sayak Ray Chowdhury (IIT Kanpur) |
| Organiser: | Abhishek Sinha |
| Date: | Wednesday, 4 Mar 2026, 16:00 to 17:00 |
| Venue: | A-201 (STCS Seminar Room) |
Traditional regret minimization in multi-armed bandits focuses on maximizing cumulative reward, often overlooking how rewards are distributed across the individuals who receive them. Motivated by applications such as clinical trials and resource allocation, Nash regret has been proposed as a fairness-aware metric based on the geometric mean of expected rewards. In this talk, I will present recent results showing that near-optimal Nash regret can be achieved by simple, general bandit algorithms under mild assumptions in the stochastic setting, along with extensions to a broader class of power-mean fairness objectives. I will then discuss fairness in linear bandits, where we obtain the first Nash regret bounds with optimal dependence on the dimension and introduce a generic meta-algorithm that converts standard linear bandit methods into fairness-aware versions with provable guarantees. Empirical results demonstrate consistent improvements over existing approaches. Overall, this line of work highlights that fairness can be incorporated into bandit learning without sacrificing statistical efficiency.
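To make the geometric-mean idea concrete, here is a minimal sketch (with hypothetical arm means, not from the talk) contrasting Nash regret with standard per-round regret on the same sequence of pulls. One common formulation defines Nash regret as the gap between the best arm's mean and the geometric mean of the expected rewards collected over the horizon:

```python
import math

def nash_regret(mu_star, pulled_means):
    # Nash regret: gap between the best mean and the geometric
    # mean of the expected rewards collected over the horizon.
    T = len(pulled_means)
    geo_mean = math.exp(sum(math.log(m) for m in pulled_means) / T)
    return mu_star - geo_mean

def average_regret(mu_star, pulled_means):
    # Standard per-round regret: gap to the arithmetic mean.
    return mu_star - sum(pulled_means) / len(pulled_means)

# Hypothetical two-arm instance: best arm has mean 0.9, bad arm 0.1.
# A policy that assigns the bad arm to even a few individuals is
# penalized much more by the geometric mean than by the average.
policy = [0.9] * 95 + [0.1] * 5

avg = average_regret(0.9, policy)
nash = nash_regret(0.9, policy)
print(avg, nash)  # Nash regret exceeds average regret here
```

The geometric mean vanishes if any individual receives (near-)zero expected reward, which is why optimizing it discourages sacrificing a minority of participants for higher aggregate reward.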
Short Bio: Sayak Ray Chowdhury is an Assistant Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology Kanpur. His research focuses on sequential decision making under uncertainty, including multi-armed bandits, reinforcement learning, and language model alignment, as well as on privacy and fairness in ML. Prior to joining IIT Kanpur, he was a Postdoctoral Researcher at Microsoft Research India and Boston University. He received his Ph.D. from the Indian Institute of Science (IISc) Bangalore. He is a recipient of the INAE Young Associate award from the Indian National Academy of Engineering.