Highlights
* A University of Cambridge study found that top AI models like Claude and ChatGPT matched human degree classifications only about 50% of the time when grading university essays.
* AI consistently undervalued top-tier work and overvalued the lowest-ranked essays, exhibiting a "central tendency bias" by assigning middling marks to most submissions.
* AI systems were overly sensitive to linguistic features like essay length, vocabulary variation, and sentence complexity, often rewarding style over substance and deep critical thinking.
* The research reinforces the concern that current assessment tasks may not demand enough "depth, care, and imagination" if AI can score well based on surface-level features.
* AI can serve as a supportive tool for error detection, consistency checks, or triaging feedback, freeing educators for higher-order tasks, but it's not ready for final grading.
* Both students and staff emphasized that human assessment is fundamental to trust, motivation, and the "social contract" of education, which AI cannot replicate.
* School leaders should adopt AI strategically, focusing on enhancing human capabilities and addressing existing workflows, rather than solely automating grading for efficiency.
Mentioned
* Dr. Deborah Talmi
* Dr. Alexandru Marcoci
* Dr. Yael Benn
* Claude
* ChatGPT
* Three Ps of assessment (Product, Process, Performance)
* Cognitive stretch