BEGIN:VCALENDAR
PRODID:-//eluceo/ical//2.0/EN
VERSION:2.0
CALSCALE:GREGORIAN
BEGIN:VEVENT
UID:www.tcs.tifr.res.in/event/1255
DTSTAMP:20230914T125956Z
SUMMARY:Bandits with Heavy Tails: Algorithms\, Analysis and Optimality
DESCRIPTION:Speaker: Shubhada Agrawal\n\nAbstract: \nMulti-armed bandit (MA
 B) is a popular framework for sequential decision-making in an uncertain e
 nvironment. In the classical setup of MAB\, the algorithm has access to a 
 fixed and finite set K of unknown\, independent probability distributions 
 or arms. At each time step\, having observed the outcomes of all the previ
 ous actions\, the algorithm chooses one of the K arms and receives an inde
 pendent sample drawn from the underlying distribution\, which may be consi
 dered a reward. The algorithm's goal is either to maximize the accumulated
  rewards or to identify the best arm in as few samples as possible for an 
 appropriate notion of best.\nVariants of these classical formulations have
  been widely studied in the literature. Tight lower bounds and optimal alg
 orithms have been developed under the assumption that the arm distribution
 s are either Sub-Gaussian or come from a single parameter exponential fami
 ly (SPEF)\, for example\, Gaussian distributions with a known variance or 
 Bernoulli distributions. However\, in practice\, these distributional assu
 mptions may not hold. Developing lower bounds and optimal algorithms for t
 he general distributions largely remained open\, mainly because of the nee
 d for new tools for the analysis in this generality.\nIn this dissertation
 \, we undertake a detailed study of the MAB problems allowing for all the 
 distributions with a known uniform bound on their (1+\\epsilon)^{th} momen
 ts for some $\\epsilon > 0$. This class subsumes a large class of heavy-ta
 iled distributions. We develop a framework with essential tools and concen
 tration inequalities and use it to design optimal algorithms for three key
  variants of the MAB problem\, including the classical frameworks of regre
 t minimization and best-arm identification.\nA key component of designing 
 an optimal algorithm for MAB is constructing tight\, anytime valid confide
 nce intervals (CIs) for mean. We develop new concentration inequalities to
  this end\, which may be of independent interest.\nThe above results were 
 obtained in collaborations involving Sandeep Juneja\, Wouter M. Koolen and
  Peter Glynn.\n\nZoom link: https://zoom.us/j/94294491152?pwd=MUZiNkk5Sy9
 2Z3VEc3laZUNCTGdDdz09\n
URL:https://www.tcs.tifr.res.in/web/events/1255
DTSTART;TZID=Asia/Kolkata:20221209T113000
DTEND;TZID=Asia/Kolkata:20221209T123000
LOCATION:A201
END:VEVENT
END:VCALENDAR