Listen

Cast

Description

Sean McGregor and I discuss why evaluating AI systems has become so difficult; we cover everything from the breakdown of benchmarking and how incentives shape safety work to what approaches like BenchRisk (his recent paper at NeurIPS) and AI auditing aim to fix as systems move into the real world. We also talk about his history and journey in AI safety, including his PhD on ML for public policy, how he started the AI Incident Database, and what he's working on now: AVERI, a non-profit for frontier model auditing.

Chapters


Links

BenchRisk

AIID

Hot AI Summer

Measuring Generalization

Insurers Exclude AI

Section 230

Relevant Kairos.fm Episodes

Other Links