This podcast explores building world models with generative neural networks for reinforcement learning. The authors propose a three-component agent inspired by human cognition: a vision model (V) that compresses each visual frame into a compact latent code, a memory model (M) that predicts future latent states, and a small controller (C) that selects actions. Experiments in CarRacing and VizDoom demonstrate the approach's effectiveness, achieving state-of-the-art results; in VizDoom, the agent is trained entirely inside its own hallucinated environment generated by the world model and still transfers to the real game. The paper also discusses challenges such as the agent learning to exploit imperfections of the learned model, and proposes an iterative training procedure to address these limitations. Finally, it explores the implications of this approach for more complex tasks and real-world applications.
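To make the V-M-C split concrete, here is a minimal sketch of the agent's perception-action loop in PyTorch. The overall structure (a VAE vision model, a recurrent world model, and a deliberately tiny controller) follows the paper's design, but the class names, layer sizes, and the simplified linear prediction head (the paper uses a mixture-density output on the RNN) are illustrative assumptions, not the authors' code.

```python
# Illustrative V-M-C agent sketch; names and hyperparameters are assumptions.
import torch
import torch.nn as nn

class Vision(nn.Module):
    """V: compresses a 64x64 RGB frame into a small latent vector z (VAE encoder)."""
    def __init__(self, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),
            nn.Conv2d(128, 256, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.mu = nn.Linear(2 * 2 * 256, z_dim)      # mean head
        self.logvar = nn.Linear(2 * 2 * 256, z_dim)  # log-variance head

    def forward(self, frame):
        h = self.enc(frame)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z ~ N(mu, sigma^2).
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

class Memory(nn.Module):
    """M: an RNN that predicts the next latent z given the current (z, action)."""
    def __init__(self, z_dim=32, a_dim=3, hidden=256):
        super().__init__()
        self.rnn = nn.LSTMCell(z_dim + a_dim, hidden)
        self.pred = nn.Linear(hidden, z_dim)  # stand-in for the paper's MDN head

    def forward(self, z, a, state):
        h, c = self.rnn(torch.cat([z, a], dim=-1), state)
        return self.pred(h), (h, c)

class Controller(nn.Module):
    """C: a single linear layer from [z, h] to actions, kept tiny so its few
    parameters can be trained with an evolution strategy such as CMA-ES."""
    def __init__(self, z_dim=32, hidden=256, a_dim=3):
        super().__init__()
        self.fc = nn.Linear(z_dim + hidden, a_dim)

    def forward(self, z, h):
        return torch.tanh(self.fc(torch.cat([z, h], dim=-1)))

# One step of the perception-action loop.
V, M, C = Vision(), Memory(), Controller()
frame = torch.randn(1, 3, 64, 64)                   # dummy observation
state = (torch.zeros(1, 256), torch.zeros(1, 256))  # initial LSTM state
z = V(frame)                                        # compress the observation
action = C(z, state[0])                             # act on latent z and memory h
z_pred, state = M(z, action, state)                 # predict the next latent state
```

Training inside the "hallucinated" environment then amounts to iterating M on its own predicted latents (feeding z_pred back in place of z) instead of encoding real frames, which is why imperfections in M can be exploited by C.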