Description

This position paper argues against the anthropomorphization of intermediate tokens in large language models, commonly referred to as "reasoning traces" or "chains of thought." The authors contend that these outputs are not genuine reflections of human-like thinking but statistically generated token sequences that may lack semantic validity. Research indicates that model performance can improve even when these traces are factually incorrect or nonsensical, suggesting that the connection between a trace and the final answer is often tenuous. Consequently, viewing these tokens as an interpretable window into a model's logic can lead to a dangerous overestimation of its reliability. The authors call on the scientific community to move away from human-centric metaphors and to focus instead on external verification of solutions. By treating intermediate tokens as a computational tool for the model rather than as an explanation for the user, researchers can pursue more effective and honest AI development.
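To make the external-verification stance concrete, below is a minimal sketch in Python. The verify_arithmetic checker, the model_output format, and the arithmetic task are hypothetical illustrations, not an interface from the paper; the point is only that correctness is established by an external check on the final answer, with the trace never parsed or trusted.

# A minimal sketch, assuming a hypothetical model interface, of
# verifying only the final answer: the intermediate tokens are
# treated as opaque computation, not as an explanation.

def verify_arithmetic(question: tuple[int, int], output: dict) -> bool:
    """Return True iff the final answer is correct; the trace is ignored."""
    a, b = question
    return output["answer"] == a + b

# Hypothetical output: the trace is nonsensical, yet the answer is
# right, mirroring the observation that trace validity and answer
# correctness can come apart.
model_output = {
    "trace": "First, 17 plus 25 is 43, so carrying the 3 gives 42.",
    "answer": 42,
}

print(verify_arithmetic((17, 25), model_output))  # True

In this framing, the verifier carries the burden of reliability, so nothing hinges on whether the trace reads like plausible human reasoning.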