Task-agnostic constraining in average reward POMDPs
Publication Type
Preprint
Date Issued
2021-03-09
Language
English
Author(s)
Citation
Preprint (2021-03-09)
We study the shape of the average reward as a function over the memoryless stochastic policies in infinite-horizon partially observed Markov decision processes. We show that for any given instantaneous reward function on state-action pairs, there is an optimal policy that satisfies a series of constraints expressed solely in terms of the observation model. Our analysis extends and improves previous descriptions, which considered discounted rewards or covered only special cases.
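The central object of the abstract, the average reward as a function of a memoryless stochastic policy, can be made concrete with a small numerical sketch. The toy POMDP below (two states, two actions, two observations, with made-up kernels; none of these numbers come from the paper) shows how a policy pi(a|o) induces a Markov chain on states, whose stationary distribution yields the long-run average reward:

```python
import numpy as np

# Hypothetical toy POMDP; all kernels below are illustrative, not from the paper.
# P[a, s, s'] : state transition kernel under action a
# O[s, o]     : observation kernel
# r[s, a]     : instantaneous reward on state-action pairs
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def average_reward(pi):
    """Long-run average reward of a memoryless stochastic policy pi[o, a]."""
    # Effective state-to-action kernel: K[s, a] = sum_o O[s, o] pi[o, a].
    K = O @ pi
    # Induced Markov chain on states: T[s, s'] = sum_a K[s, a] P[a, s, s'].
    T = np.einsum('sa,ast->st', K, P)
    # Stationary distribution mu: solve mu T = mu with sum(mu) = 1.
    n = T.shape[0]
    A = np.vstack([T.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    mu = np.linalg.lstsq(A, b, rcond=None)[0]
    # Expected per-step reward in each state, then average under mu.
    rho = np.einsum('sa,sa->s', K, r)
    return float(mu @ rho)

# Uniform memoryless policy: act uniformly at random regardless of observation.
pi_uniform = np.full((2, 2), 0.5)
print(average_reward(pi_uniform))
```

Evaluating this function over the polytope of row-stochastic matrices pi traces out the reward landscape the paper analyzes; the constraints described in the abstract restrict where an optimizer needs to look within that polytope.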