The principles of robust estimation, designed to maintain performance under distributional deviations and data corruption, are a cornerstone of reliable statistical inference. Despite their impressive capabilities, large language models (LLMs) often produce inconsistent outputs when presented with semantically equivalent inputs, undermining their reliability in critical applications. This inconsistency mirrors classical robustness problems in statistics, where small perturbations in input can lead to dramatically different outcomes.
I will present two interconnected lines of work in this area. First, I will discuss robust high-dimensional regression using the density power divergence, a generalization of the maximum likelihood technique, along with a number of fundamental results on the asymptotics and influence functions of the resulting robust estimators. Second, I will discuss my work on the consistency problem---the analogue of robustness for LLMs---which is concerned with measuring and minimizing the sensitivity of LLM outputs to input variations through a combination of controlled synthetic data generation and fine-tuning.
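For readers unfamiliar with the density power divergence, a brief sketch of the standard definition (due to Basu et al.) may help frame the first part of the talk; the tuning parameter $\alpha$ below trades off robustness against efficiency:

```latex
% Density power divergence between densities g (data) and f (model), alpha > 0:
\[
d_\alpha(g, f) = \int \left\{ f^{1+\alpha}(x)
  - \left(1 + \tfrac{1}{\alpha}\right) f^\alpha(x)\, g(x)
  + \tfrac{1}{\alpha}\, g^{1+\alpha}(x) \right\} dx .
\]
% As alpha -> 0, d_alpha recovers the Kullback--Leibler divergence,
% so minimizing it reduces to maximum likelihood estimation;
% larger alpha downweights outlying observations, giving robustness.
```

Minimizing $d_\alpha$ over a parametric family yields the minimum density power divergence estimators whose asymptotics and influence functions the talk examines.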
The talk will conclude with a discussion of open problems at the intersection of classical robustness methods and AI consistency, including robust training techniques and influence function analysis, for ensuring AI system reliability in high-stakes applications.
Short Bio:
Subho Majumdar is co-founder and head of AI at Vijil, a US-based startup that helps enterprises build and operate trustworthy AI agents. Previously, he was a senior scientist in the security research team at Splunk and in the Data Science and AI Research team at AT&T Labs. He has pioneered the use of trustworthy AI methods in multiple companies, written a book, and founded multiple nonprofit efforts in this area. He is a recipient of the International Indian Statistical Association (IISA) Early Career Award in Statistics and Data Sciences. His research interests center on the security and reliability of LLMs and on statistical machine learning.