Stop Interviews: Use a 90-Minute AI-Graded Skills Test

Description

Replace timezone-juggling interviews with a paid, 90-minute async skills test graded by a calibrated LLM judge. Santi and Kira walk through the complete system: golden-set calibration, pairwise comparison with permutation debiasing, confidence bands for pass/borderline/reject decisions, and targeted human sampling on edge cases. Includes anti-cheat strategies, regional pay bands, and a transparent appeal process. Based on Stanford SCALE research and Chatbot Arena methodology, this episode delivers a deployable hiring pipeline that respects contractor time while ensuring quality.
