Description

In this episode of Inference Time Tactics, Cooper and Byron break down Neurometric's Thinking Algorithm Leaderboard and what it reveals about building production-ready AI agents. They share why prompt engineering with a single model won't cut it for enterprise use cases, explore the impact of inference-time compute strategies, and discuss what they learned from testing 10 models across real CRM tasks, from surprising token inefficiency to catastrophic failures in SQL generation.

 

We talked about:

 

Resources Mentioned:

CRMArena-Pro from Salesforce:

https://www.salesforce.com/blog/crmarena-pro/

Thinking Algorithm Leaderboard: 

https://leaderboard.neurometric.ai/ 

Connect with Neurometric:
Website: https://www.neurometric.ai/ 

Substack: https://neurometric.substack.com/ 

X: https://x.com/neurometric/ 

Bluesky: https://bsky.app/profile/neurometric.bsky.social

 

Hosts:

Calvin Cooper

https://x.com/cooper_nyc_ 

https://www.linkedin.com/in/coopernyc

 

Guest:

Byron Galbraith

https://x.com/bgalbraith 

https://www.linkedin.com/in/byrongalbraith