This research addresses the problem of selecting a good approximation to the **belief state** (the posterior distribution over unobservable **latent states**), which is needed to reset the state of simulators that support resetting to a specified latent state. The authors cast this as a **conditional distribution-selection** task and develop an algorithm that requires only sampling access to the simulator and to the candidate belief states. Two selection formulations are proposed: **latent state-based selection**, which measures accuracy against the distribution over hidden states, and **observation-based selection**, which measures accuracy against the induced observable dynamics. Crucially, the paper investigates how the selected approximation affects downstream tasks such as estimating Q-values with **Monte-Carlo roll-outs**, distinguishing two protocols: **Single-Reset** and **Repeated-Reset**. The authors find that observation-based selection surprisingly fails to provide guarantees under the natural **Single-Reset** procedure but does provide them under the unconventional **Repeated-Reset** roll-out.
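
To make the contrast between the two roll-out protocols concrete, below is a minimal, self-contained Python sketch (not the authors' code): a toy latent-state simulator, an approximate belief represented only by a sampler, and two Monte-Carlo Q-value estimators. Under one plausible reading of the protocols, **Single-Reset** samples a latent state from the belief once per roll-out and then rolls forward, whereas **Repeated-Reset** re-samples from the belief and resets the simulator before every step; all names here (`ToySim`, `reset_to`, `belief_sampler`, etc.) are illustrative assumptions, not the paper's interface.

```python
import random


class ToySim:
    """Toy simulator with a hidden latent state in {0, 1} and noisy binary observations.
    Purely illustrative; it stands in for the sampling access to the simulator assumed above."""

    def __init__(self, latent=0):
        self.latent = latent

    def reset_to(self, latent):
        # Reset the simulator to a given latent state (the capability the paper relies on).
        self.latent = latent

    def step(self, action):
        # Reward 1 when the action matches the latent state; the latent flips with prob 0.1;
        # the observation reveals the latent state correctly with prob 0.8.
        reward = 1.0 if action == self.latent else 0.0
        if random.random() < 0.1:
            self.latent = 1 - self.latent
        obs = self.latent if random.random() < 0.8 else 1 - self.latent
        return obs, reward


def single_reset_q(sim, belief_sampler, policy, first_action, horizon, n_rollouts):
    """Single-Reset roll-out: draw one latent state per roll-out, reset once, roll forward."""
    total = 0.0
    for _ in range(n_rollouts):
        sim.reset_to(belief_sampler())        # single reset from the approximate belief
        a, ret = first_action, 0.0
        for _ in range(horizon):
            obs, r = sim.step(a)
            ret += r
            a = policy(obs)                   # subsequent actions follow the roll-out policy
        total += ret
    return total / n_rollouts


def repeated_reset_q(sim, belief_sampler, policy, first_action, horizon, n_rollouts):
    """Repeated-Reset roll-out (one plausible reading): re-draw a latent state from the
    approximate belief and reset the simulator before every step of the roll-out."""
    total = 0.0
    for _ in range(n_rollouts):
        a, ret = first_action, 0.0
        for _ in range(horizon):
            sim.reset_to(belief_sampler())    # reset again at every step
            obs, r = sim.step(a)
            ret += r
            a = policy(obs)
        total += ret
    return total / n_rollouts


if __name__ == "__main__":
    sim = ToySim()
    belief = lambda: 0 if random.random() < 0.7 else 1   # approximate belief over the latent state
    policy = lambda obs: obs                              # act on the most recent observation
    print("Single-Reset Q estimate  :", single_reset_q(sim, belief, policy, 0, 10, 2000))
    print("Repeated-Reset Q estimate:", repeated_reset_q(sim, belief, policy, 0, 10, 2000))
```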