BEGIN:VCALENDAR
PRODID:-//eluceo/ical//2.0/EN
VERSION:2.0
CALSCALE:GREGORIAN
BEGIN:VEVENT
UID:1311@www.tcs.tifr.res.in
DTSTAMP:20230914T125958Z
SUMMARY:The unreasonable effectiveness of mathematics in large scale deep l
 earning
DESCRIPTION:Speaker: Greg Yang (Microsoft New England Research and Developm
 ent Center\, USA)\n\nAbstract: \nRecently\, the theory of infinite-width n
 eural networks led to the first technology\, muTransfer\, for tuning enorm
 ous neural networks that are too expensive to train more than once. For ex
 ample\, this allowed us to tune the 6.7 billion parameter version of GPT-3
  using only 7% of its pretraining compute budget\, and with some asterisks
 \, we get a performance comparable to the original GPT-3 model with twice 
 the parameter count. In this talk\, I will explain the core insight behind
  this theory. In fact\, this is an instance of what I call the Optimal Sca
 ling Thesis\, which connects infinite-size limits for general notions of "
 size" to the optimal design of large models in practice. I'll end with sev
 eral concrete key mathematical research questions whose resolutions will h
 ave an incredible impact on the future of AI.\n
URL:https://www.tcs.tifr.res.in/web/events/1311
DTSTART;TZID=Asia/Kolkata:20230712T110000
DTEND;TZID=Asia/Kolkata:20230712T123000
LOCATION:A201
END:VEVENT
END:VCALENDAR
