Two Time Scale Algorithms for Variance-penalized Control and Risk-neutral Semi-Markov Control

Abhijit Gosavi, Missouri Univ. of Science and Technology, Dept. of Engg. Management & Systems Engg., 219 Engineering
Thursday, 16 Dec 2010 (all day)
A-212 (STCS Seminar Room)
The two-time-scale framework has been applied in numerous settings for solving Markov decision processes. Its properties make it possible to develop solution algorithms for problems that are difficult to solve with single-time-scale algorithms such as classical value iteration or Q-Learning. In this talk, we will discuss two applications of this framework. The first is solving a variance-penalized Markov decision process using dynamic programming. The second is developing an actor-critic algorithm that can solve a risk-neutral semi-Markov decision process. For the actor-critic algorithm, we will present numerical results from a case study in airline revenue management (this is joint work with Sean Meyn of the University of Illinois and Susan Murray, Ketaki Kulkarni, and Katie Grantham of Missouri S&T).