Are we doing this again? It looks like we are doing this again.
This time it involves giving LLMs several ‘new’ tasks, including effectively a Tower of Hanoi problem, asking them to specify the answer via individual steps rather than an algorithm, and then calling a failure to properly execute all the steps this way (whether or not they even had enough tokens to do it!) an inability to reason.
The actual work in the paper seems by all accounts to be fine as far as it goes if presented accurately, but the way it is being presented and discussed is not fine.
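A minimal sketch (mine, not from the paper or the post) of why the move-by-move framing runs into token limits: the standard recursive Tower of Hanoi solution emits 2^n − 1 individual moves, so the required output length explodes with disk count regardless of whether the model "knows" the algorithm. The per-move token estimate below is a rough assumption for illustration.

```python
# Hypothetical sketch: enumerating every Tower of Hanoi move vs. token budgets.
# A Tower of Hanoi instance with n disks requires 2**n - 1 moves, so asking a
# model to list each move individually collides with its output-token budget
# long before the underlying algorithm gets any harder to state.

def hanoi_moves(n, source="A", target="C", spare="B"):
    """Yield every individual move (disk, from_peg, to_peg) for n disks."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, source, spare, target)
    yield (n, source, target)
    yield from hanoi_moves(n - 1, spare, target, source)

for n in (7, 10, 15):
    moves = list(hanoi_moves(n))
    assert len(moves) == 2**n - 1
    # Rough assumption: ~5 tokens per move like "move disk 3 from A to C".
    print(f"n={n}: {len(moves)} moves, ~{5 * len(moves)} output tokens")

# n=15 already needs on the order of 160k output tokens just to write the
# move list, which is beyond typical model output limits.
```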
Not Thinking Clearly
Ruben Hassid (12 million views, not how any of this works): BREAKING: Apple just proved AI “reasoning” models like Claude, DeepSeek-R1, and o3-mini don’t actually reason at all.
They just memorize patterns really well.
Here's what Apple discovered:
(hint: we’re not as close to [...]
---
Outline:
(00:53) Not Thinking Clearly
(01:59) Thinking Again
(07:24) Inability to Think
(08:56) In Brief
(10:01) What's In a Name
---
First published:
June 10th, 2025
Source:
https://www.lesswrong.com/posts/tnc7YZdfGXbhoxkwj/give-me-a-reason-ing-model
---
Narrated by TYPE III AUDIO.
---
Images from the article:
The text details technical issues with task design and scoring methodology, focusing on exponential move lists in Tower-of-Hanoi problems and scoring constraints that affect the study's evaluation metrics.
Appears to be a satirical research paper with humorous author names and a comical abstract arguing against AI reasoning capabilities.
The paper's authors are listed as Crimothy Timbleton, Stevephen Pronkeldink, and Grunch Brown.
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.