Claude Opus 4.5 did so well on the METR task length graph they’re going to need longer tasks, and we still haven’t scored Gemini 3 Pro or GPT-5.2-Codex. Oh, also there's a GPT-5.2-Codex.

At week's end we did finally get at least a little of a Christmas break. It was nice.

Also nice was that New York Governor Kathy Hochul signed the RAISE Act, giving New York its own version of SB 53. The final version was not what we were hoping it would be, but it still is helpful on the margin.

Various people gave their 2026 predictions. Let's put it this way: Buckle up.

Table of Contents

  1. Language Models Offer Mundane Utility. AI suggests doing the minimum.
  2. Language Models Don’t Offer Mundane Utility. Gemini 3 doesn’t believe in itself.
  3. Huh, Upgrades. ChatGPT gets some personality knobs to turn.
  4. On Your Marks. PostTrainBench shows AIs below human baseline but improving.
  5. Claude Opus 4.5 Joins The METR Graph. Expectations were exceeded.
  6. Sufficiently Advanced Intelligence. You’re good enough, you’re smart enough.
  7. Deepfaketown and Botpocalypse Soon. Don’t worry, the UK PM's got this.
  8. Fun With Media Generation. Slop [...]

---

Outline:

(00:53) Language Models Offer Mundane Utility

(02:15) Language Models Don't Offer Mundane Utility

(02:55) Huh, Upgrades

(03:21) On Your Marks

(05:15) Claude Opus 4.5 Joins The METR Graph

(12:41) Sufficiently Advanced Intelligence

(15:09) Deepfaketown and Botpocalypse Soon

(18:12) Fun With Media Generation

(22:33) You Drive Me Crazy

(25:29) They Took Our Jobs

(28:45) The Art of the Jailbreak

(28:56) Get Involved

(29:12) Introducing

(32:24) In Other AI News

(33:46) Show Me the Money

(34:44) Quiet Speculations

(38:59) Whistling In The Dark

(40:27) Bubble, Bubble, Toil and Trouble

(42:20) Americans Really Dislike AI

(48:30) The Quest for Sane Regulations

(52:32) Chip City

(55:55) The Week in Audio

(56:45) Rhetorical Innovation

(01:00:53) Aligning a Smarter Than Human Intelligence is Difficult

(01:02:58) Mom, Owain Evans Is Turning The Models Evil Again

(01:06:10) Messages From Janusworld

(01:15:22) The Lighter Side

---

First published:

December 25th, 2025


Source:

https://www.lesswrong.com/posts/GHW2rhYtnYgEn3tuq/ai-148-christmas-break

---

Narrated by TYPE III AUDIO.

---

Images from the article:

ChatGPT personalization settings showing style, tone, and characteristic adjustments.
Graph comparing AI model accuracy over time on FrontierMath benchmark, showing Chinese models lagging frontier models.
Graph showing task completion time versus model release date for AI language models from 2019 to 2026.
Graph showing Frontier AI's software R&D capabilities growth from 2019 to 2028, logarithmic scale.
Prediction market showing probability distribution for Claude Opus 4.5 METR time horizon.
Graph showing Epoch Capabilities Index scores from 2022 to 2026, with two trend lines.
Real person in athletic wear and digitally rendered character in superhero suit, both with New York City skyline background.
A bar chart showing voting preferences based on AI policy positions across different voter groups.
Chart 3: Bar graph showing voter support for AI-related national law policies.
Chart 4: Bar graph showing support for President Trump issuing executive order on AI protections.
Diagram showing a two-step process for detecting misaligned LLMs using activation analysis.
Matt Bruenig tweets:
Survey table showing public understanding of how ChatGPT works, broken down by gender and party.