Policy Improvement Algorithm for Zero Sum Two Person Stochastic Games of Perfect Information

Sandeep K Juneja
Thursday, 2 Jan 2014, 16:00 to 17:00
Abstract: If the data defining a problem and at least one solution to the problem lie in the same Archimedean ordered field induced by the data, we say that the problem has order field property.
When this property holds one may hope to do only finitely many arithmetic operations on the data to arrive at such a solution. For example if we start with rational data, the value and a pair of optimal strategies for the players in a zero sum two person game have rational entries. This was first noticed by Herman Weyl, and it was a precursor to solving them via the simplex method. For bimatrix games while Nash exhibited an equilibrium in mixed strategies, it was Vorobev and Kuhn who checked that the order field property holds for bimatrix games. Later Lemke and Howson gave the so called linear complementarity algorithm to locate an equilibrium pair in the same data field. For three  person games, Nash with Shapley constructed simple counter examples for lack of order field property. In general stochastic games fail to have order field property.   
In our search for finite arithmetic step algorithms for stochastic games, perfect information stochastic games stand out for their simplicity of optimal strategies. While we know that they have order field property,  the key issue is how to adapt the policy improvement algorithm of Markov decision processes to stochastic games.  We show how to extend this algorithm for zero sum two person stochastic games of perfect information in discounted as well as Cesaro average payoffs and show how one avoids cycling.