Tata Institute of Fundamental Research

Synthesizing POMDP Policies: Sampling Meets Model-checking via Learning

STCS Seminar

Speaker:	Anirban Majumdar (TIFR)
Organiser:	Raghuvansh Saxena
Date:	Tuesday, 2 Jun 2026, 16:00 to 17:00
Venue:	A-201 (STCS Seminar Room)

(Scan to add to calendar)

Abstract:

Partially Observable Markov Decision Processes (POMDPs) are the standard framework for decision-making under uncertainty. While sampling-based methods scale well, they lack formal correctness guarantees, making them unsuitable for safety-critical applications. Conversely, formal synthesis techniques provide correctness-by-construction but often struggle with scalability, as general POMDP synthesis is undecidable. To bridge this gap, we propose a synthesis framework that integrates sampling, automata learning, and model-checking. Inspired by Angluin's $L^*$ algorithm, our approach utilizes sampling as a membership oracle and model-checking as an equivalence oracle. This enables the synthesis of finite-state controllers with formal guarantees, provided the sampling-induced policy is regular. We establish a relative completeness result for this framework. Our experimental results demonstrate that this method successfully solves threshold-safety problems that remain challenging for existing formal synthesis tools.

This is a joint work with Debraj Chakraborty, Sayan Mukherjee, Prince Mathew and Jean-François Raskin.