Reward Mismatches in RL Cause Emergent Misalignment

Description

Podcast episode for Reward Mismatches in RL Cause Emergent Misalignment.

* 00:00 - Introduction

* 02:53 - Abstract Of The Paper

* 04:15 - The Problem Statement

* 06:16 - The Inoculation Solution

* 08:48 - Cleaning The Data Versus Cleaning The Environments

* 10:31 - No, All Of This Does Not Solve Our Most Important Problems

* 15:46 - It Does Help On Important Short Term Problems

The Don’t Worry About the Vase Podcast is a listener-supported podcast. To receive new posts and support the cost of creation, consider becoming a free or paid subscriber.

https://open.substack.com/pub/thezvi/p/reward-mismatches-in-rl-cause-emergent?utm_campaign=post-expanded-share&utm_medium=web

Get full access to DWAtV Podcast at dwatvpodcast.substack.com/subscribe
