Description

Life comes at you increasingly fast. Two months after Claude Opus 4.5, we got a substantial upgrade in Claude Opus 4.6. The same day, we got GPT-5.3-Codex.

That used to be something we’d call remarkably fast. It's probably the new normal, until things get even faster than that. Welcome to recursive self-improvement.

Before those releases, I was using Claude Opus 4.5 and Claude Code for essentially everything interesting, and only using GPT-5.2 and Gemini to fill in the gaps or for narrow specific uses.

GPT-5.3-Codex is restricted to Codex, so for other purposes Anthropic and Claude have only extended their lead. This is the first time in a while that a model got upgraded while it was still my clear daily driver.

Anthropic also rolled out several other advances to its ecosystem, including fast mode and an expansion of Cowork to Windows, while OpenAI gave us an app for Codex.

For fully agentic coding, GPT-5.3-Codex and Claude Opus 4.6 both look like substantial upgrades. Both sides claim they’re better, as you would expect. If you’re serious about your coding and have hard problems, you should try out both, and see what combination works [...]

---

Outline:

(01:55) On Your Marks

(17:35) Official Pitches

(17:56) It Compiles

(21:42) It Exploits

(22:45) It Lets You Catch Them All

(23:16) It Does Not Get Eaten By A Grue

(24:10) It Is Overeager

(25:24) It Builds Things

(27:58) Pro Mode

(28:24) Reactions

(28:36) Positive Reactions

(42:12) Negative Reactions

(50:40) Personality Changes

(56:28) On Writing

(59:11) They Banned Prefilling

(01:00:27) A Note On System Cards In General

(01:01:34) Listen All Yall Its Sabotage

(01:05:00) The Codex of Competition

(01:06:22) The Niche of Gemini

(01:07:55) Choose Your Fighter

(01:12:17) Accelerando

---

First published:

February 11th, 2026


Source:

https://www.lesswrong.com/posts/5JNjHNn3DyxaGbv8B/claude-opus-4-6-escalates-things-quickly

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Illuminated gauge displaying numerical values from 1.0 to 5.3.
Benchmark comparison table showing performance scores across five AI models on various tasks.
Graph showing FrontierMath Tiers 1-3 accuracy over time with model releases highlighted.
Line graph.
Bar chart.
Table comparing AI model performance across 17 tasks, showing accuracy, cost, and code metrics.
Scatter plot.
Table comparing AI models across abilities including Humanity, Safety, Assertiveness, and total scores.
Bar graph.
Two scatter plots showing performance vs cost per task for ARC-AGI-1 and ARC-AGI-2 benchmarks.
Five bar graphs comparing average scores across three models for different scientific benchmarks, with MCQ and open-ended question types shown.
Cartoon horse riding astronaut in space with planet.
Top-down pixel art game showing character in wooden cabin interior with sleeping cat.
Dog sitting at table surrounded by fire and tentacled monsters with coffee mug.
Zvi Mowshowitz tweets:
Bar graph.
Two bullet points discussing Claude Opus 4.6 safety analysis and API changes regarding prefill mechanisms.
Poll results showing Claude leading with 54.8%.
