Learning in Large and Structured Environments: Algorithms, Guarantees and Applications

Abhishek Sinha
Wednesday, 29 May 2024, 16:00 to 17:00
A-201 (STCS Seminar Room)
In this talk, I will present some of my work on designing and analyzing algorithms for learning in large and structured environments, where the state and action spaces are huge or even infinite. I will focus on two main topics: (1) Bayesian optimization for hyperparameter tuning in large-scale machine learning models, and (2) policy optimization for language models using human feedback. For the first topic, I will introduce the Gaussian process optimization framework and design multi-armed bandit algorithms for hyperparameter optimization. I will show sublinear regret bounds for the proposed algorithms that depend on the information complexity of the objective function being optimized. Along the way, I will present a self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension, and discuss some applications of this bound. For the second topic, I will discuss how noisy preference data can harm language model alignment. I will propose a robust loss function for language model policy optimization in the presence of random preference flips. I will show that the policy learned with this loss is provably tolerant to noise, and characterize its sub-optimality gap as a function of the noise rate, the dimension of the policy parameter, and the sample size. I will also demonstrate the empirical performance of the proposed policy on tasks such as dialogue generation and sentiment analysis. I will conclude with some open problems and future research directions in large-scale machine learning.
Short Bio:
Sayak Ray Chowdhury is a postdoctoral researcher at Microsoft Research, India. Prior to this, he was a postdoctoral fellow at Boston University, USA. He obtained his PhD from the Dept of ECE, Indian Institute of Science, where he was a recipient of the Google PhD Fellowship. His research interests include reinforcement learning, Bayesian optimization, multi-armed bandits and differential privacy. Recently, he has been working towards a mathematical and empirical understanding of language models. More details about his research can be found here: https://sites.google.com/view/sayakraychowdhury/home