Description

Google recently came out with Gemini-2.5-0605, to replace Gemini-2.5-0506, because I mean at this point it has to be the companies intentionally fucking with us, right?

Google: Our updated Gemini 2.5 Pro Preview continues to excel at coding, helping you build more complex web apps. We’ve also added thinking budgets for more control over cost and latency. GA is coming in a couple of weeks…

We’re excited about this latest model and its improved performance. Start building with our new preview as support for the 05-06 preview ends June 19th.

Sundar Pichai (CEO Google): Our latest Gemini 2.5 Pro update is now in preview.

It's better at coding, reasoning, science + math, shows improved performance across key benchmarks (AIDER Polyglot, GPQA, HLE to name a few), and leads @lmarena_ai with a 24pt Elo score jump since the previous version.

We also heard your feedback [...]

---

Outline:

(02:06) What's In A Name

(02:49) On Your Marks

(08:05) Gemini 2.5 Flash Lite Preview

(08:24) The Model Report

(11:33) Safety Testing

(15:17) General Reactions to Gemini 2.5-Pro 0605

(16:48) The Personality Upgrades Are Bad

(19:36) The Lighter Side

---

First published:

June 18th, 2025


Source:

https://www.lesswrong.com/posts/eJaQvvebZwkTAm5yd/gemini-2-5-pro-from-0506-to-0605

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Graph comparing AI model performance versus price per million tokens, highlighting Gemini versions.
Bar graph showing performance across difficulty levels for different software versions.
Contributors list.
Table showing policy and helpfulness violations for different Gemini AI models (ART metrics).
Bar chart.
Bar graph comparing output tokens per second across AI models from different companies.
Benchmark comparison table showing Gemini 2.5 Pro performance across different testing categories.
Comparison table showing features of six Gemini AI models, from 1.5 to 2.5. The table details specifications like input/output modalities, length limits, thinking capabilities, tool support, and knowledge cutoff dates across various Gemini versions (Flash, Pro, and Flash-Lite variants).
Performance comparison table showing Gemini model versions across different capability benchmarks. The table presents comprehensive benchmark results for Gemini 1.5 through 2.5 models, evaluating capabilities in code, reasoning, factuality, multilingualism, math, long-context text, and image understanding.
Comparison table showing AI model performance across different benchmark categories and capabilities. The table compares Gemini 2.5 Pro, o3, o4-mini, Claude 4 Sonnet, Claude 4 Opus, Grok 3 Beta, and DeepSeek R1 across categories like Code, Reasoning, Factuality, Math, Long-context, and Image Understanding.
Table comparing attack success rates against different Gemini AI model versions. The table shows results for various attack techniques (Actor Critic, Beam Search, TAP) tested against different Gemini models (2.0 Flash-Lite, 2.0 Flash, 2.5 Flash, 2.5 Pro) relative to Gemini 1.5 Flash 002, with success rates and percentage changes indicated in parentheses.
Comparison table of AI models' performance metrics across various benchmarks and pricing. The table compares Gemini 2.5 Pro, OpenAI models, Claude Opus 4, Grok 3 Beta, and DeepSeek R1 across multiple categories, including:

- Input/output pricing
- Reasoning & knowledge tests
- Science assessments
- Mathematics evaluations
- Code generation/editing
- Factuality measures
- Visual/video understanding
- Long context handling
- Multilingual performance

Gemini 2.5 Pro shows strong performance across most metrics, with scores marked by emoji reactions. Pricing ranges from $0.55 to $75.00 per million tokens depending on the model and usage type.

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.