“AI #148: Christmas Break” by Zvi

Description

Claude Opus 4.5 did so well on the METR task length graph they’re going to need longer tasks, and we still haven’t scored Gemini 3 Pro or GPT-5.2-Codex. Oh, also there's a GPT-5.2-Codex.

At week's end we did finally get at least a little of a Christmas break. It was nice.

Also nice was that New York Governor Kathy Hochul signed the RAISE Act, giving New York its own version of SB 53. The final version was not what we were hoping for, but it is still helpful on the margin.

Various people gave their 2026 predictions. Let's put it this way: Buckle up.

Table of Contents

Language Models Offer Mundane Utility. AI suggests doing the minimum.
Language Models Don’t Offer Mundane Utility. Gemini 3 doesn’t believe in itself.
Huh, Upgrades. ChatGPT gets some personality knobs to turn.
On Your Marks. PostTrainBench shows AIs below human baseline but improving.
Claude Opus 4.5 Joins The METR Graph. Expectations were exceeded.
Sufficiently Advanced Intelligence. You’re good enough, you’re smart enough.
Deepfaketown and Botpocalypse Soon. Don’t worry, the UK PM's got this.
Fun With Media Generation. Slop [...]

---

Outline:

(00:53) Language Models Offer Mundane Utility

(02:15) Language Models Don't Offer Mundane Utility

(02:55) Huh, Upgrades

(03:21) On Your Marks

(05:15) Claude Opus 4.5 Joins The METR Graph

(12:41) Sufficiently Advanced Intelligence

(15:09) Deepfaketown and Botpocalypse Soon

(18:12) Fun With Media Generation

(22:33) You Drive Me Crazy

(25:29) They Took Our Jobs