Title: Task-agnostic constraining in average reward POMDPs
Authors: Montúfar, Guido; Rauh, Johannes; Ay, Nihat
Dates: 2021-09-23; 2021-09-23; 2021-03-09
Citation: Preprint (2021-03-09)
Handle: http://hdl.handle.net/11420/10380
Language: en
Type: Preprint; Other
URL: https://www.mis.mpg.de/de/publications/mis-preprints/2021/2021-9.html

Abstract: We study the shape of the average reward as a function over the memoryless stochastic policies in infinite-horizon partially observed Markov decision processes. We show that for any given instantaneous reward function on state-action pairs, there is an optimal policy that satisfies a series of constraints expressed solely in terms of the observation model. Our analysis extends and improves previous descriptions for discounted rewards or which covered only special cases.