In this thesis, we study sequential decision-making problems under uncertainty and constraints. We consider four different problems:
- For online regret minimization in linear bandits with \emph{prior} observations, we propose a phased elimination algorithm, \emph{OOPE}, and give a regret upper bound in terms of the eigenspectrum of the offline data's Gram matrix. We also derive regret lower bounds that help establish the minimax optimality of OOPE in certain regimes and improve on existing results. Together, these bounds show that the quality of the offline data is well captured by the eigenspectrum of its Gram matrix (a generic definition of this matrix is displayed after this list).
- In automated bidding systems with \emph{unknown value} and long-term budget and Return-on-Spend (RoS) constraints, we develop a computationally efficient UCB-based algorithm (a generic form of these constraints is displayed after this list). We dispense with restrictive assumptions such as strict Slater feasibility and truthfulness of the auction, and obtain optimal regret and constraint-violation bounds with only logarithmic dependence on the size of the bid domain.
- We model infectious disease spread as a differential game between a planner and the populace, characterizing \emph{open-loop} Nash equilibria via Pontryagin's Minimum Principle (the generic form of these equilibrium conditions is recalled after this list). We use the resulting model to study qualitative properties of the equilibrium and bring out the crucial roles of infection detection rates and of public trust in reported infection numbers.
- For Active Simple Hypothesis Testing (ASHT) with a fixed budget, a problem that has attracted significant interest across research communities such as Simulation, Operations Research, Information Theory, and Bandits, we characterize the minimax error exponent as the value of a zero-sum differential game. This characterization yields a provably optimal algorithm that is more computationally tractable than prior work, and together they constitute significant progress on the problem. However, even this provably optimal algorithm suffers from a \emph{curse of dimensionality}. We therefore propose a more efficient algorithm that leverages a novel link to Blackwell approachability (the relevant approachability condition is recalled after this list); it provably outperforms non-adaptive strategies and is numerically verified to attain the optimal exponent in certain instances.
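For concreteness, the displays below sketch the main objects referred to in this list, in generic notation chosen for illustration and not taken from the individual chapters. First, the Gram matrix of the offline data in the linear bandit problem: given offline feature vectors $x_1, \dots, x_m \in \mathbb{R}^d$, its eigenspectrum quantifies how thoroughly each direction of the parameter space has already been explored,
\[
V_{\mathrm{off}} \;=\; \sum_{i=1}^{m} x_i x_i^{\top}, \qquad \lambda_1(V_{\mathrm{off}}) \;\ge\; \lambda_2(V_{\mathrm{off}}) \;\ge\; \cdots \;\ge\; \lambda_d(V_{\mathrm{off}}) \;\ge\; 0;
\]
intuitively, directions with large eigenvalues are already well explored offline, while directions with small eigenvalues still call for online exploration.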
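Second, a generic form of the long-term constraints in the auto-bidding problem (again with illustrative symbols): over $T$ auction rounds with realized value $v_t$ and payment $z_t$ in round $t$, a budget $B$ and a target Return-on-Spend ratio $\gamma$ impose
\[
\sum_{t=1}^{T} z_t \;\le\; B, \qquad \sum_{t=1}^{T} v_t \;\ge\; \gamma \sum_{t=1}^{T} z_t,
\]
and the bidder must maximize accumulated value while keeping the cumulative violation of these constraints small.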
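Third, a sketch of the open-loop equilibrium conditions delivered by Pontryagin's Minimum Principle, with boundary and transversality conditions omitted: if player $i$ incurs running cost $g_i$ under shared dynamics $\dot{x} = f(x, u)$, then along an equilibrium control profile $u^*$ there exist costate trajectories $p_i$ such that
\[
u_i^*(t) \;\in\; \arg\min_{u_i} H_i\bigl(x(t), u_i, u_{-i}^*(t), p_i(t)\bigr), \qquad \dot{p}_i(t) \;=\; -\frac{\partial H_i}{\partial x}\bigl(x(t), u^*(t), p_i(t)\bigr),
\]
where $H_i = g_i + p_i^{\top} f$ is player $i$'s Hamiltonian.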
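Finally, the link to Blackwell approachability rests on the classical approachability condition: in a repeated game with vector-valued expected payoff $r(p, q) \in \mathbb{R}^k$ under mixed actions $p$ of the decision maker and $q$ of the opponent, a closed convex target set $S \subseteq \mathbb{R}^k$ is approachable if and only if for every $q$ there exists a $p$ with $r(p, q) \in S$. The efficient algorithm mentioned above exploits a condition of this type to steer suitable empirical quantities toward a target set; the specific payoffs and target sets are those constructed in the corresponding chapter.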
The first two problems use only bandit techniques and the third only methods of differential games; the final problem combines differential game theory with a classical bandit setup, a \emph{synthesis} that showcases both distinct and integrated applications of these two frameworks.