Improved Strong Upper Bounds for Policy Iteration

Shivaram Kalyanakrishnan
Hariharan Narayanan
Tuesday, 28 Aug 2018, 14:00 to 15:00
A-201 (STCS Seminar Room)
Abstract: Markov Decision Problems (MDPs) are a well-studied abstraction of sequential decision making. Policy Iteration (PI) is a classical, widely-used family of algorithms to compute an optimal policy for a given MDP. PI is extremely efficient on MDPs typically encountered in practice. PI is also appealing from a theoretical standpoint, since it naturally yields "strong" running time bounds for MDP planning. Strong bounds depend only on the number of states and actions in the MDP, and not on additional parameters such as the discount factor and the size of the real-valued coefficients.
It has proven surprisingly difficult to establish tight theoretical upper bounds on the running time of PI. On MDPs with n states and 2 actions per state, the trivial upper bound on the number of iterations taken by PI is 2^{n}. It was not until 1999---nearly four decades after the PI algorithm was first published---that this trivial bound was improved, and that by a mere linear factor, to O(2^{n}/n). In this talk, I will present a recent line of work that has yielded exponential improvements over existing bounds. Our results include a bound of O(1.6001^{n}) iterations for 2-action MDPs, and (2 + ln(k - 1))^{n} iterations for k-action MDPs, k >=3. These upper bounds are currently the tightest known for the PI family.
I will begin by providing background on PI, and then describe the essence of our analysis. I will also present some open problems in the area. The talk is expected to be widely accessible, since the analysis only uses basic ideas from discrete structures and algorithms.

Bio: Shivaram Kalyanakrishnan is an Assistant Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology Bombay. His research interests include artificial intelligence and machine learning, spanning topics such as sequential decision making, multi-agent learning, multi-armed bandits, and humanoid robotics. Kalyanakrishnan received a Ph.D. in computer science from the University of Texas at Austin. Subsequently was a Research Scientist at Yahoo Labs Bangalore and an INSPIRE Faculty Fellow at the Indian Institute of Science, Bangalore. His contributions to robot soccer have received two Best Student Paper awards at the annual RoboCup competitions. Kalyanakrishnan was also a member of the first study panel of the One Hundred Year Study on Artificial Intelligence (AI100), which in 2016 released its report titled "Artificial Intelligence and Life in 2030".