Description

This episode of "Two Minds, One Model" explores the critical concept of interpretability in AI systems, focusing on Anthropic's research paper "Toy Models of Superposition." Hosts John Jezl and Jon Rocha from Sonoma State University's Computer Science Department delve into why neural networks are often "black boxes" and what this means for AI safety and deployment.

Credits

Cover Art by Brianna Williams

TMOM Intro Music by Danny Meza

A special thank you to these talented artists for their contributions to the show.

----------------------------------------------------

Links and References

Academic Papers

News

Harvard Business School study on companion chatbots

Misc

We mention Waymo a lot in this episode and felt it was important to link to their safety page: https://waymo.com/safety/

Abandoned Episode Titles

"404: Interpretation Not Found"

"Neurons Gone Wild: Spring Break Edition"

"These Aren't the Features You're Looking For”

"Bigger on the Inside"