BEGIN:VCALENDAR
PRODID:-//eluceo/ical//2.0/EN
VERSION:2.0
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
BEGIN:STANDARD
DTSTART:19700101T000000
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
UID:www.tcs.tifr.res.in/event/1716
DTSTAMP:20260508T043152Z
SUMMARY:Teaching your computer to play chess (Part 2): Bandits\, MCTS\, and
  Approximate Policy Iteration
DESCRIPTION:Speaker: Aakash Ghosh (TIFR)\n\nAbstract: \nBuilding on the the
 oretical limitations of pure TD-learning explored in Part 1\, how do moder
 n neural engines like AlphaZero actually master the game? In the second pa
 rt of this series\, we shift our focus to local search framed as a sequent
 ial decision-making problem under uncertainty. We will introduce the Multi
 -Armed Bandit problem and the UCB1 algorithm\, extending these concepts to
  game trees via Monte Carlo Tree Search (MCTS). Finally\, we will deconstr
 uct the AlphaZero framework\, demonstrating how it utilizes MCTS as a form
 al policy improvement operator to perform Approximate Policy Iteration and
  successfully evade the "Deadly Triad" of deep reinforcement learning.\n
URL:https://www.tcs.tifr.res.in/web/events/1716
DTSTART;TZID=Asia/Kolkata:20260508T160000
DTEND;TZID=Asia/Kolkata:20260508T170000
LOCATION:A-201 (STCS Seminar Room)
END:VEVENT
END:VCALENDAR
