BEGIN:VCALENDAR
PRODID:-//eluceo/ical//2.0/EN
VERSION:2.0
CALSCALE:GREGORIAN
BEGIN:VEVENT
UID:www.tcs.tifr.res.in/event/1630
DTSTAMP:20251017T064231Z
SUMMARY:Calibrated Language Models Must Hallucinate
DESCRIPTION:Speaker: Nishant Das (TIFR)\n\nAbstract:\nRecent language mode
 ls generate false but plausible-sounding text with surprising frequency. S
 uch “hallucinations” are an obstacle to the usability of language-base
 d AI systems and can harm people who rely upon their outputs. This work sh
 ows that there is an inherent statistical lower bound on the rate at whi
 ch pretrained language models hallucinate certain types of facts\, havin
 g nothin
 g to do with the transformer LM architecture or data quality. For “arbit
 rary” facts whose veracity cannot be determined from the training data\,
  we show that hallucinations must occur at a certain rate for language mod
 els that satisfy a statistical calibration condition appropriate for gener
 ative language models. Specifically\, if the maximum probability of any fa
 ct is bounded\, we show that the probability of generating a hallucination
  is close to the fraction of facts that occur exactly once in the training
  data (a “Good-Turing” estimate)\, even assuming ideal training data w
 ithout errors. One conclusion is that models pretrained to be sufficiently
  good predictors (i.e.\, calibrated) may require post-training to mitigate
  hallucinations on the type of arbitrary facts that tend to appear once in
  the training set. However\, our analysis also suggests that there is no s
 tatistical reason that pretraining will lead to hallucination on facts tha
 t tend to appear more than once in the training data (like references to p
 ublications such as articles and books\, whose hallucinations have been pa
 rticularly notable and problematic) or on systematic facts (like arithmeti
 c calculations). Therefore\, different architectures and learning algorith
 ms may mitigate these latter types of hallucinations.\n
URL:https://www.tcs.tifr.res.in/web/events/1630
DTSTART;TZID=Asia/Kolkata:20251017T160000
DTEND;TZID=Asia/Kolkata:20251017T170000
LOCATION:A-201 (STCS Seminar Room)
END:VEVENT
END:VCALENDAR
