Principles of Evals: The Future of GenAI Evaluation (E.43)

Description

LLMs are optimized to sound convincing—not to know when they’re wrong. In this episode, Deanna Emery breaks down why hallucinations are fundamentally tied to how language models work, why confidence is often disconnected from correctness, and how better evaluation strategies can make AI systems more reliable in production. We also get into uncertainty, semantic reasoning, and what humans still do better than models.

00:00 — Why LLMs hallucinate confidently
09:00 — The limits of current eval systems
18:00 — Why uncertainty matters in AI
27:00 — Semantic reasoning vs memorization
38:00 — What humans still do better than models

The biggest risk in AI isn’t wrong answers. It’s wrong answers delivered with confidence.
