Description

This paper investigates whether large language models (LLMs) can effectively use novel information they learn during a single conversation. While models successfully form internal representations of new concepts from context, the study reveals that these representations are often inert and cannot be applied to downstream tasks. Experiments on next-token prediction and adaptive world modeling show that both open-weight and closed-source reasoning models struggle to use this in-context knowledge reliably. Even when a model's latent states reflect a new logical structure, it frequently fails to deploy that structure to solve problems. These findings suggest a significant gap between encoding information and the flexible deployment required for truly adaptable artificial intelligence. Overall, the authors conclude that current architectures need new training methods to move beyond merely encoding new information in context toward functional in-context reasoning.
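To make the next-token-prediction setup concrete, here is a minimal, hypothetical sketch of the kind of probe the description implies: a novel concept is defined only in the prompt, and we check whether the model's next-token distribution actually applies that definition. The model name, the made-up "blicket" rule, and the answer tokens are illustrative assumptions, not the authors' actual experimental setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM could be substituted
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# A novel concept introduced only in context, never seen in training data.
prompt = (
    "Definition: a 'blicket' is any object that is both red and round.\n"
    "The tomato is red and round. Question: is the tomato a blicket? Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    # Logits for the token immediately following the prompt.
    logits = model(**inputs).logits[0, -1]

# Compare probability mass on the rule-consistent vs. rule-inconsistent answer.
probs = torch.softmax(logits, dim=-1)
for answer in [" Yes", " No"]:
    token_id = tokenizer.encode(answer)[0]
    print(f"P({answer!r}) = {probs[token_id].item():.4f}")
```

A model that has merely encoded the definition but cannot deploy it would show little preference for the rule-consistent answer here, which is the gap between representation and use that the paper highlights.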