BEGIN:VCALENDAR
PRODID:-//eluceo/ical//2.0/EN
VERSION:2.0
CALSCALE:GREGORIAN
BEGIN:VEVENT
UID:www.tcs.tifr.res.in/event/889
DTSTAMP:20230914T125942Z
SUMMARY:Regret Analysis of Multi-Armed Bandits
DESCRIPTION:Speaker: Anand Deo\n\nAbstract:\nConsider a system with K ar
 ms\, where arm $i$ yields a payoff $X_{i}$ according to the distributi
 on $P_{i}$. Suppose now that a gambler who is unaware of $P_{i}$ sampl
 es rewards from a sequence of arms\, with the objective of maximizing h
 is total expected reward. The Multi-Armed Bandit (MAB) problem consist
 s of finding a (stochastic) sequence of arm pulls that achieves a larg
 e expected reward. This is equivalent to minimizing regret\, or the dif
 ference between the expected payoff of a genie strategy\, which knows t
 he best arm beforehand\, and the chosen sequence of arm pulls. In 1985
 \, Lai and Robbins proved that regret grows at least at the rate log(n
 )\, where n is the total number of arms pulled. In this talk\, we will s
 ee the construction of simple strategies that asymptotically achieve t
 his bound. The talk is based on the work of Auer\, Cesa-Bianchi\, and Fi
 scher (2002). I will assume only basic familiarity with probability.\n
URL:https://www.tcs.tifr.res.in/web/events/889
DTSTART;TZID=Asia/Kolkata:20180706T171500
DTEND;TZID=Asia/Kolkata:20180706T181500
LOCATION:A-201 (STCS Seminar Room)
END:VEVENT
END:VCALENDAR