BEGIN:VCALENDAR
PRODID:-//eluceo/ical//2.0/EN
VERSION:2.0
CALSCALE:GREGORIAN
BEGIN:VEVENT
UID:www.tcs.tifr.res.in/event/1256
DTSTAMP:20230914T125956Z
SUMMARY:Demystifying Approximate Reinforcement Learning Algorithms that use
  epsilon-greedy Exploration
DESCRIPTION:Speaker: Aditya Gopalan (Indian Institute of Science (IISc)\, B
 engaluru)\n\nAbstract:\nIn reinforcement learning\, value-function methods
  such as Q-learning and SARSA(0) with $\\epsilon$-greedy exploration are a
 mong the state of the art\, and their tabular (exact) forms converge to th
 e optimal Q-function under reasonable conditions. However\, with function 
 approximation\, these methods are known to exhibit strange behaviors\, e.g
 .\, policy oscillation and chattering\, convergence to different attractor
 s (possibly even the worst policy) on different runs\, etc.\, apart from t
 he well-known instability of iterates. Accordingly\, a theory to explain t
 hese phenomena has been a long-standing open problem\, even for basic line
 ar function approximation (Sutton\, 1999). Our work uses differential incl
 usion theory to provide the first framework for resolving this problem. We
  further illustrate via numerical examples how this framework helps comple
 tely explain these algorithms' asymptotic behaviors. (Joint work with Guga
 n Thoppe\, IISc)\n
URL:https://www.tcs.tifr.res.in/web/events/1256
DTSTART;TZID=Asia/Kolkata:20221209T143000
DTEND;TZID=Asia/Kolkata:20221209T153000
LOCATION:A201
END:VEVENT
END:VCALENDAR