Publication Details
Search and Explore: Symbiotic Policy Synthesis in POMDPs
ANDRIUSHCHENKO, R.
ALEXANDER, B.
Češka Milan, doc. RNDr., Ph.D. (DITS)
JUNGES, S.
KATOEN, J.
Macák Filip, Ing. (DITS)
partially observable Markov decision processes, finite-state controllers, beliefs, inductive synthesis
This paper marries two state-of-the-art controller synthesis methods for partially observable Markov decision processes (POMDPs), a prominent model in sequential decision making under uncertainty. A central issue is to find a POMDP controller, one that decides solely based on the observations seen so far, that achieves a total expected reward objective. As finding optimal controllers is undecidable, we concentrate on synthesising good finite-state controllers (FSCs). We do so by tightly integrating two modern, orthogonal methods for POMDP controller synthesis: a belief-based and an inductive approach. The former method obtains an FSC from a finite fragment of the so-called belief MDP, an MDP that keeps track of the probabilities of equally observable POMDP states. The latter is an inductive search technique over a set of FSCs, e.g., controllers with a fixed memory size. The key result of this paper is a symbiotic anytime algorithm that tightly integrates both approaches such that each profits from the controllers constructed by the other. Experimental results indicate a substantial improvement in the value of the controllers while significantly reducing the synthesis time and memory footprint.
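
To make the two objects named in the abstract concrete, the following minimal Python sketch shows (i) a Bayesian belief update, i.e., one transition step of the belief MDP, and (ii) a finite-state controller given by its action-selection and memory-update functions. The toy POMDP, all variable names, and the representation are illustrative assumptions for this page only, not the authors' implementation.

from dataclasses import dataclass

# Toy POMDP: transition probabilities P[s][a][s'] and a deterministic
# observation function Z[s] (each state emits one observation).
P = {
    0: {"a": {0: 0.7, 1: 0.3}, "b": {0: 0.2, 1: 0.8}},
    1: {"a": {0: 0.4, 1: 0.6}, "b": {0: 0.9, 1: 0.1}},
}
Z = {0: "x", 1: "y"}  # state -> observation


def belief_update(belief, action, observation):
    """One step of the belief MDP: push the belief through the chosen
    action and condition on the received observation (Bayes' rule)."""
    unnormalised = {}
    for s, p_s in belief.items():
        for s2, p in P[s][action].items():
            if Z[s2] == observation:
                unnormalised[s2] = unnormalised.get(s2, 0.0) + p_s * p
    total = sum(unnormalised.values())
    if total == 0.0:
        return None  # this observation cannot occur under the belief/action
    return {s2: p / total for s2, p in unnormalised.items()}


@dataclass
class FSC:
    """Finite-state controller with a fixed number of memory nodes:
    both the action and the next memory node depend only on the
    current node and the current observation."""
    action: dict  # (node, observation) -> action
    update: dict  # (node, observation) -> next node


if __name__ == "__main__":
    # Uniform initial belief; after playing "a" and observing "y",
    # only state 1 remains possible, so the belief collapses to {1: 1.0}.
    b = {0: 0.5, 1: 0.5}
    print(belief_update(b, "a", "y"))

    # A one-node (memoryless) FSC over the toy POMDP's observations.
    fsc = FSC(action={(0, "x"): "a", (0, "y"): "b"},
              update={(0, "x"): 0, (0, "y"): 0})
    print(fsc.action[(0, "y")])  # -> "b"

The belief-based approach of the paper explores a finite fragment of the MDP induced by such belief updates, while the inductive approach searches over families of FSCs of the above shape; the symbiotic algorithm lets each method reuse the controllers found by the other.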
@inproceedings{BUT185190,
author="ANDRIUSHCHENKO, R. and ALEXANDER, B. and ČEŠKA, M. and JUNGES, S. and KATOEN, J. and MACÁK, F.",
title="Search and Explore: Symbiotic Policy Synthesis in POMDPs",
booktitle="Computer Aided Verification",
year="2023",
series="Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
volume="13966",
pages="113--135",
publisher="Springer Verlag",
address="Cham",
doi="10.1007/978-3-031-37709-9_6",
isbn="978-3-031-37708-2"
}