Wharton School of the University of Pennsylvania
Pennsylvania, United States.
- Zoom meeting: https://zoom.us/j/97246120231?pwd=OGhsUTY4UnpyblkrcUxHMnlvbGxmdz09
One of the paramount mathematical mysteries of our times is explaining the phenomenon of deep learning, i.e., the training of neural nets. Neural nets can be made to paint in imitation of classical art styles or to play chess better than any machine or human ever has, and they seem to be the closest we have come to achieving "artificial intelligence". But trying to reason about these successes quickly lands us in a plethora of extremely challenging mathematical questions - typically about discrete stochastic processes. Some of these questions remain unsolved even for the smallest neural nets! In this talk we will give a brief overview of the major themes of our work in this direction over the last few years.
First, we will give highlights of some of our major depth hierarchy theorems and landscape results about neural nets. Then we will explain how, for certain nets under mild distributional conditions, our iterative algorithms like "Neuro-Tron", which do not use a gradient oracle, can be proven to train nets in the infinity-norm loss - using as much time/sample complexity as expected from gradient-based methods, but in regimes where usual algorithms like (S)GD remain unproven. Our theorems include the particularly challenging regime of non-realizable data with a net of finite size. Next, we will briefly look at our first-of-its-kind results about sufficient conditions for fast convergence of a standard adaptive gradient deep-learning algorithm, RMSProp.
In the second half of the talk, we will focus on the recent rise of PAC-Bayesian technology in explaining the low risk of certain over-parameterized nets on standardized tests. We will present our recent results in this domain, which give bounds that empirically surpass some of the existing theoretical benchmarks in this field; we achieve this via our new proofs about the key property of noise resilience of nets.
This is joint work with Amitabh Basu (JHU), Ramchandran Muthukumar (JHU), Jiayao Zhang (UPenn), Dan Roy (UToronto, Vector Institute), Pushpendre Rastogi (JHU, Amazon), Soham De (DeepMind, Google), Enayat Ullah (JHU), Jun Yang (UToronto, Vector Institute) and Anup Rao (Adobe).
Bio: Anirbit Mukherjee finished his Ph.D. in applied mathematics at the Johns Hopkins University, advised by Prof. Amitabh Basu. He is now a post-doc in Statistics at Wharton (UPenn) with Prof. Weijie Su. He specializes in deep-learning theory and has been awarded two fellowships from JHU for this research - the Walter L. Robb Fellowship and the inaugural Mathematical Institute for Data Science Fellowship. Earlier, he was a researcher in Quantum Field Theory while doing his undergrad in physics at the Chennai Mathematical Institute (CMI) and his masters in theoretical physics at the Tata Institute of Fundamental Research (TIFR).