BEGIN:VCALENDAR
PRODID:-//eluceo/ical//2.0/EN
VERSION:2.0
CALSCALE:GREGORIAN
BEGIN:VEVENT
UID:www.tcs.tifr.res.in/event/889
DTSTAMP:20230914T125942Z
SUMMARY:Regret Analysis of Multi Armed Bandits
DESCRIPTION:Speaker: Anand Deo\n\nAbstract: \nConsider a system with K arms
 \, arm $i$ yielding a payoff $X_{i}$\, according to the distribution $P_{i
 }$. Suppose now\, a gambler who is unaware of $P_{i}$\, samples rewardsfro
 m a sequence of arms\, with the objective of maximizing his total expected
  reward. TheMulti Armed Bandit (MAB) problem consists of finding a (stocha
 stic) sequence of armpulls that achieves a large expected reward. This is 
 equivalent to minimizing regret\, orthe difference between the expected pa
 yoff of a genie strategy\, which knows the best armis beforehand\, and the
  chosen sequence of arm pulls. In 1985\, Lai and Robbins provedthat regret
  grows at least at the rate log(n)\, where n is the total number of armspu
 lled. In this talk\, we will see the construction of simple strategies\, w
 hichasymptotically achieve this bound.  This talk is based on the work of
  Auer\, Cesa-Bianchi\, Fischer (2002). I will assume only basic familiarit
 y with probability for this talk.\n
URL:https://www.tcs.tifr.res.in/web/events/889
DTSTART;TZID=Asia/Kolkata:20180706T171500
DTEND;TZID=Asia/Kolkata:20180706T181500
LOCATION:A-201 (STCS Seminar Room)
END:VEVENT
END:VCALENDAR
