SGD through the Lens of Markov Processes: An Algorithmic Stability Perspective

Jatin Batra
Wednesday, 1 Nov 2023, 20:30 to 22:00
via Zoom in A201

In this talk, we delve into the intricate relationships between heavy-tailed distributions, generalization error, and algorithmic stability in the setting of noisy stochastic gradient descent. Recent research has illustrated the emergence of heavy tails in stochastic optimization and their intriguing links to generalization error. However, these studies often rely on challenging topological and statistical assumptions, and empirical evidence has further challenged existing theory by suggesting that the relationship between heavy tails and generalization is not always monotonic. In response, we offer novel insights, exploring the relationship between tail behavior and generalization properties through the lens of algorithmic stability. Our analysis reveals that the stability of stochastic gradient descent (SGD) varies based on how it is measured, leading to interesting conclusions about its behavior.

Expanding upon these findings, we extend the scope to a broader class of objective functions, including non-convex ones. Leveraging Wasserstein stability bounds for heavy-tailed stochastic processes, our research sheds light on the non-monotonic connection between generalization error and heavy tails, offering a more comprehensive perspective.

Additionally, we introduce a unified approach for proving Wasserstein stability bounds in stochastic optimization, emphasizing time-uniform stability and its role in various scenarios, including convex and non-convex losses. Our approach is versatile and applicable to popular optimizers, highlighting the importance of ergodicity.
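The central object of the talk, algorithmic stability of noisy SGD, can be made concrete with a small sketch: run SGD with the same randomness on two "neighbouring" datasets that differ in a single example, and measure how far apart the resulting parameters end up. The following toy example (not from the talk; the one-dimensional least-squares loss, the Student-t noise model standing in for heavy-tailed gradient noise, and all parameter values are illustrative assumptions) shows the idea:

```python
import numpy as np

def noisy_sgd(data, lr=0.05, steps=500, noise_scale=0.01, tail_df=2.0, seed=0):
    """Noisy SGD on the 1-D least-squares loss (w - x_i)^2 / 2.

    Gradient noise is Student-t with `tail_df` degrees of freedom,
    an illustrative stand-in for heavy-tailed noise (smaller
    tail_df => heavier tails). Sketch only, not the talk's setting.
    """
    rng = np.random.default_rng(seed)
    w = 0.0
    n = len(data)
    for _ in range(steps):
        x = data[rng.integers(n)]                       # sample one example
        grad = (w - x) + noise_scale * rng.standard_t(tail_df)
        w -= lr * grad
    return w

# Algorithmic stability: compare outputs on neighbouring datasets
# S and S' that differ in one example. Using the same seed replays
# identical sampling and noise, isolating the data perturbation.
rng = np.random.default_rng(1)
S = rng.normal(size=20)
S_prime = S.copy()
S_prime[0] += 1.0                                       # perturb one point
gap = abs(noisy_sgd(S) - noisy_sgd(S_prime))
print(f"parameter gap on neighbouring datasets: {gap:.4f}")
```

A small gap indicates a stable algorithm; classical results translate such stability bounds into generalization bounds, and the talk's Wasserstein-distance viewpoint replaces this pointwise gap with a distance between the two output distributions.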