Description

Last night, on the heels of some rather unfortunate incidents involving the Twitter version of Grok 3, xAI released Grok 4. There are some impressive claimed benchmarks. As per usual, I will wait a few days so others can check it out, then offer my take early next week. This post otherwise won’t discuss Grok 4 further.

There are plenty of other things to look into while we wait for that.

I am also not yet covering Anthropic's latest alignment faking paper, which may well get its own post.

Table of Contents

  1. Language Models Offer Mundane Utility. Who is 10x more productive?
  2. Language Models Don’t Offer Mundane Utility. Branching paths.
  3. Huh, Upgrades. DR in the OAI API, plus a tool called Study Together.
  4. Preserve Our History. What are the barriers to availability of Opus 3?
  5. Choose Your Fighter. GPT-4o offers [...]

---

Outline:

(00:43) Language Models Offer Mundane Utility

(05:08) Language Models Don't Offer Mundane Utility

(06:57) Huh, Upgrades

(07:53) Preserve Our History

(11:18) Choose Your Fighter

(12:36) Wouldn't You Prefer A Good Game of Chess

(14:30) Fun With Media Generation

(14:40) No Grok No

(16:29) Deepfaketown and Botpocalypse Soon

(19:15) Unprompted Attention

(20:11) Overcoming Bias

(22:18) Get My Agent On The Line

(23:40) They Took Our Jobs

(27:59) Get Involved

(28:27) Introducing

(30:11) In Other AI News

(32:28) Show Me the Money

(34:59) The Explanation Is Always Transaction Costs

(37:56) Quiet Speculations

(44:23) Genesis

(46:29) The Quest for Sane Regulations

(52:06) Chip City

(52:28) Choosing The Right Regulatory Target

(01:00:42) The Week in Audio

(01:01:15) Rhetorical Innovation

(01:04:33) Aligning a Smarter Than Human Intelligence is Difficult

(01:10:10) Don't Worry We Have Human Oversight

(01:14:09) Don't Worry We Have Chain Of Thought Monitoring

(01:18:47) Sycophancy Is Hard To Fix

(01:21:43) The Lighter Side

---

First published:

July 10th, 2025


Source:

https://www.lesswrong.com/posts/FczrW2kQ7WxGW39Yv/ai-124-grokless-interlude

---

Narrated by TYPE III AUDIO.

---

Images from the article:

This cartoon shows a scientist vaping in a server room, with exaggerated dripping effects.
A workflow diagram showing deep research process and client alerts system.
Bar graphs comparing Gemini 2.5 Flash and Pro models' hint response patterns. This visualization shows how different AI models respond to hints.
Table comparing AI language models with their philosophical and stylistic characteristics across 6 columns. The table shows different models (Opus 4, Opus, GPT-4o, Gemini, Claude Sonnet, Claude Haiku, Grok) and their corresponding traits in categories like existential style, spiritual-cognitive frame, and narrative approach.
The image shows three visualization plots analyzing miss rates and false alarm rates across different thresholds. The main components are two heatmaps at the top showing miss rates (left, in red) and false alarm rates (right, in blue), plus a scatter plot below displaying model predictions with green and red dots representing correct and incorrect predictions respectively. The visualization includes detailed annotations explaining the metrics and interpretation of the results. The bottom scatter plot provides an example case showing the distribution of correct and incorrect model predictions, with decision boundaries marked by dashed lines. The plots are accompanied by a comprehensive legend and explanation of what constitutes ideal performance scenarios.


Pliny the Liberator tweets:
Analysis chart comparing AI models with

The table details characteristics of four AI models (Opus 4, Opus 3/Classic, GPT-4o, Gemini) across categories including existential style, spiritual mode, infection metaphors, and dissolution language.


Table showing precursor and target evaluations for scheming behavior assessment.
David Sacks tweets:
White play button icon on blue background.

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.