Brooke Hopkins, Founder at Coval. Coval accelerates AI agent development with automated testing for chat, voice, and other objective-oriented systems.
Many engineering teams are racing to market with AI agents, but slow manual testing holds them back. Teams end up playing whack-a-mole, only to discover that fixing one issue introduces another.
Coval uses automated simulation and evaluation techniques inspired by the autonomous vehicle industry to boost test coverage, speed up development, and validate consistent performance.
Episode Highlights:
- Brooke draws a powerful parallel between autonomous vehicles and voice agents, explaining how both rely on simulation to model real-world complexities. She describes simulation as “real-time synthetic data,” allowing systems to interact with synthetic but realistic environments—be it a city street or a conversation with a frustrated customer.
- Not all simulations are useful: some fail by being too unrealistic (e.g., users tolerating rude agents), while others miss edge cases found in real-world scenarios (e.g., unexpected language switches mid-call). Brooke emphasizes the challenge of simulating both the expected and unexpected to build resilient systems.
- To address confusion around choosing speech models (e.g., STT, TTS, VAD), Coval builds public benchmarks measuring speed, consistency, cost, and audio quality. This helps teams make informed decisions, especially in a fast-moving field where new models launch constantly.
- Brooke reframes evals not as brittle unit tests but as a definition of the product itself, like a living PRD (Product Requirements Document). She advocates for simulation-driven development, where agents are evaluated across scenarios to guide product design, model selection, and compliance (especially in regulated industries like healthcare and finance).
- Coval supports CI/CD pipelines and scheduled evals, enabling teams to run simulations automatically on new releases. Importantly, human review remains essential—especially for subjective metrics like tone, naturalness, or prosody. Brooke calls for a blended system: automation where possible, human judgment where needed.
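To make the idea of scenario-based evals concrete, here is a minimal sketch in Python. It is not Coval's API or implementation. Every name in it (`Scenario`, `toy_agent`, `run_scenario`, `pass_rate`, `SCENARIOS`) is a hypothetical stand-in, and the pass criterion (the agent must say a given phrase) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    """One simulated conversation plus the behavior we expect (hypothetical)."""
    name: str
    user_turns: List[str]
    must_mention: str  # phrase the agent should say at some point


def toy_agent(message: str) -> str:
    """Stand-in agent for illustration; swap in a real chat or voice agent."""
    text = message.lower()
    if "refund" in text:
        return "I can help with a refund. Can you share your order number?"
    if "hola" in text:
        return "Puedo ayudarle en español."
    return "Could you tell me more about the issue?"


def run_scenario(agent: Callable[[str], str], scenario: Scenario) -> bool:
    """Feed each simulated user turn to the agent and check the expectation."""
    replies = [agent(turn) for turn in scenario.user_turns]
    return any(scenario.must_mention.lower() in r.lower() for r in replies)


def pass_rate(agent: Callable[[str], str], scenarios: List[Scenario]) -> float:
    """Fraction of scenarios in which the agent met its expectation."""
    results = [run_scenario(agent, s) for s in scenarios]
    return sum(results) / len(results)


# Expected cases and edge cases side by side, as described in the highlights.
SCENARIOS = [
    Scenario("frustrated refund",
             ["This is ridiculous, I want a refund now!"], "order number"),
    Scenario("mid-call language switch",
             ["I have a billing question", "Hola, ¿habla español?"], "español"),
    Scenario("cancel subscription",
             ["Please cancel my subscription"], "cancel"),
]
```

In a CI job or scheduled run, a script could call `pass_rate(toy_agent, SCENARIOS)` and fail the build when the rate drops below a threshold the team chooses. Subjective qualities such as tone or prosody are outside what a check like this can capture and would still go to human review.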
-----------------------------------------------
Connect with Brooke Hopkins:
https://www.linkedin.com/in/bnhop/
https://www.coval.dev/
Connect with Demetrios:
https://www.linkedin.com/in/dpbrinkm/
Connect with Deepgram:
https://deepgram.com/
https://www.linkedin.com/company/deepgram
https://x.com/deepgramai
https://www.facebook.com/deepgram/
Join the Deepgram Discord Server!
https://discord.com/invite/xWRaCDBtW4