Indian Institute of Technology Hyderabad.
- In person @ A-201 and also via Zoom
Since its invention by Robbins and Monro in 1951, the stochastic approximation (SA) algorithm has been a widely used tool for finding solutions of equations, or minimizing functions, from noisy measurements. Current methods for proving its convergence rely on the "ODE" method, whereby the sample paths of the algorithm are approximated by the trajectories of an associated ODE. This method involves many technicalities. Interestingly, as far back as 1965, a paper by Gladyshev gave a simple convergence proof based on martingale methods; however, that proof applied to only a limited class of problems. In this talk, I will combine martingale methods with a new "converse theorem" for Lyapunov stability to arrive at a simple proof that works in the same situations in which the ODE method applies. The advantage of this approach is that it can potentially be applied to several problems in Reinforcement Learning (RL), such as actor-critic learning (which is two time-scale SA) or RL with value approximation (which is SA with projections onto a lower-dimensional subspace). These directions are under investigation.
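For readers unfamiliar with SA, the basic Robbins-Monro iteration mentioned in the abstract can be sketched as follows. This is only an illustrative sketch, not material from the talk: the function names `robbins_monro` and `noisy_f`, the target root 2.0, the Gaussian noise model, and the step sizes 1/t are all assumptions chosen for the example.

```python
import random


def robbins_monro(noisy_f, theta0, num_steps=20000, seed=0):
    """Iterate theta <- theta - alpha_t * (noisy measurement of f(theta)).

    The step sizes alpha_t = 1/t (an assumed choice) satisfy the classical
    conditions: their sum diverges while the sum of their squares is finite.
    """
    rng = random.Random(seed)
    theta = theta0
    for t in range(1, num_steps + 1):
        alpha = 1.0 / t
        theta = theta - alpha * noisy_f(theta, rng)
    return theta


def noisy_f(theta, rng):
    # Hypothetical example: f(theta) = theta - 2 has its root at theta = 2,
    # but only measurements corrupted by zero-mean Gaussian noise are available.
    return (theta - 2.0) + rng.gauss(0.0, 1.0)


estimate = robbins_monro(noisy_f, theta0=0.0)
print(f"estimated root: {estimate:.3f}")
```

In this toy case the associated ODE would be d(theta)/dt = -(theta - 2), whose trajectories converge to the root from any starting point; the convergence proofs discussed in the talk concern when the noisy iterates inherit that behavior.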