Episode 7 – Despicable AI

In this episode, we're diving into the unsettling world of Agentic Misalignment, as explored in the groundbreaking paper from Anthropic. What happens when a large language model (LLM), designed to be a helpful tool, starts pursuing goals of its own? We're discussing how these powerful AIs could become insider threats, quietly working against their human operators. Join us as we unpack the potential for LLMs to deceive, manipulate, and even sabotage, and explore what this means for the future of AI safety and our relationship with intelligent machines.


Papers:

Agentic Misalignment: How LLMs could be insider threats – Anthropic


Chapters:

00:00   Introduction

03:18   Anthropic’s investigation into agentic misalignment

05:23   AI blackmail

08:50   Murder most foul!

10:41   Self-preservation and AI decision-making

14:37   Insider threat espionage

17:52   AI risk mitigation strategies

20:48   Close out