Information Theoretic Perspectives on Learning Algorithms
R. Narasimhan Lecture
Speaker:
Varun Jog (University of Wisconsin-Madison)
Date:
Wednesday, 2 Jan 2019, 16:00 to 17:00
Venue:
Homi Bhabha Auditorium
Abstract:
In statistical learning theory, generalization error quantifies the degree to which a supervised machine learning algorithm may overfit to its training data. We give an overview of recent work [Xu and Raginsky (2017)] that bounds the generalization error of empirical risk minimization in terms of the mutual information I(S;W) between the algorithm input S and the algorithm output W. We leverage these results to derive generalization error bounds for a broad class of iterative algorithms characterized by bounded, noisy updates with Markovian structure, such as stochastic gradient Langevin dynamics (SGLD). We then describe certain shortcomings of mutual information-based bounds and propose alternative bounds that employ the Wasserstein metric from optimal transport theory. We compare the Wasserstein metric-based bounds with the mutual information-based bounds and show that, for a class of data-generating distributions, the former yield stronger bounds on the generalization error.
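For reference, the mutual information bound mentioned above takes roughly the following form; this is a sketch under the standard assumption that the loss \ell(w, Z) is \sigma-subgaussian under the data distribution \mu for every w, with notation following Xu and Raginsky (2017), and the talk's own statements may differ in details. For a training sample S = (Z_1, \dots, Z_n) drawn i.i.d. from \mu and an algorithm output W with conditional law P_{W|S}, the expected generalization gap

\[
\mathrm{gen}(\mu, P_{W\mid S}) \;=\; \mathbb{E}\big[\, L_\mu(W) - L_S(W) \,\big]
\]

satisfies

\[
\big|\mathrm{gen}(\mu, P_{W\mid S})\big| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(S; W)} .
\]

For a noisy iterative algorithm such as SGLD, one common parameterization of the update (again, possibly differing from the talk's conventions) is

\[
W_{t+1} \;=\; W_t \;-\; \eta_t \,\nabla \ell(W_t, Z_{i_t}) \;+\; \sqrt{\tfrac{2\eta_t}{\beta}}\;\xi_t,
\qquad \xi_t \sim \mathcal{N}(0, I_d),
\]

and the Markovian structure of such updates allows I(S; W_T) to be controlled by summing per-iteration mutual information contributions, which is roughly how bounds for this class of algorithms are obtained.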