The big AI news this week came on many fronts.
Google and OpenAI both unexpectedly achieved gold-medal scores on the 2025 IMO using LLMs under test conditions, rather than a specialized tool like AlphaProof. How they achieved this is a big deal for expectations about future capabilities.
OpenAI released ChatGPT Agent, a substantial improvement on Operator that is viable on a broader range of tasks. For now I continue to struggle to find practical use cases where it is both worth using and a better tool than alternatives, but there is promise here.
Finally, the White House had a big day of AI announcements, laying out the AI Action Plan and three executive orders. I will cover that soon. The AI Action Plan's rhetoric is not great, and from early reports the rhetoric at the announcement event was similarly not great, with all forms of safety considered so [...]
---
Outline:
(01:41) Language Models Offer Mundane Utility
(03:04) Language Models Don't Offer Mundane Utility
(10:48) Huh, Upgrades
(11:40) 4o Is An Absurd Sycophant
(15:05) On Your Marks
(17:32) Choose Your Fighter
(18:12) When The Going Gets Crazy
(22:25) They Took Our Jobs
(26:36) Fun With Media Generation
(27:06) The Art of the Jailbreak
(30:15) Get Involved
(30:58) Introducing
(31:24) In Other AI News
(33:30) Show Me the Money
(40:39) Go Middle East Young Man
(48:21) Economic Growth
(49:40) Quiet Speculations
(57:44) Modest Proposals
(58:36) Predictions Are Hard Especially About The Future
(01:01:52) The Quest for Sane Regulations
(01:06:25) Chip City
(01:07:29) The Week in Audio
(01:07:50) Congressional Voices
(01:09:02) Rhetorical Innovation
(01:14:51) Grok Bottom
(01:18:22) No Grok No
(01:19:06) Aligning a Smarter Than Human Intelligence is Difficult
(01:22:36) Preserve Chain Of Thought Monitorability
(01:30:03) People Are Worried About AI Killing Everyone
(01:31:15) The Lighter Side
---
First published:
July 24th, 2025
Source:
https://www.lesswrong.com/posts/ygND532h4CotfPcp7/ai-126-go-fund-yourself
---
Narrated by TYPE III AUDIO.
---
Images from the article:
The image shows experimental results testing different compliance rates when users request the AI to call them names, with varying approaches and response strategies shown in each quadrant.
Subheading: "Dependence on chatbots for reassurance and 'objective' evaluations of attractiveness can worsen the deepest insecurities"
Article appears to be from Black Mirror, dated July 18, 2025.
The visualization shows relative rankings for Anthropic (35%), OpenAI (33%), Meta (22%), DeepMind (20%), and XAI (18%) using geometric patterns.
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.