In this session, we will explore how we evaluated the translation quality of Google’s Gemma model using the MQM framework and a human-in-the-loop review process.
The case study walks through how LLM-generated translations were assessed using a structured error typology, how linguistic quality was benchmarked, and how AI-enhanced workflows can combine automated generation with professional post-editing and evaluation.
We’ll discuss:
How MQM works in real-world AI evaluation (see the scoring sketch after this list)
What kinds of errors LLMs produce across languages
Where AI performs well — and where it still struggles
How to design scalable human-in-the-loop evaluation workflows
What this means for localization vendors and enterprise buyers
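To give a flavor of the first topic, here is a minimal sketch of how an MQM-style score can be derived from annotated errors. The error categories and severity weights shown are illustrative defaults, not necessarily the ones used in the Alconost case study.

```python
# Minimal sketch of MQM-style scoring.
# Severity weights below are illustrative, not the case study's exact values.
from dataclasses import dataclass

SEVERITY_WEIGHTS = {"neutral": 0, "minor": 1, "major": 5, "critical": 10}

@dataclass
class MqmError:
    category: str   # e.g. "accuracy/mistranslation", "fluency/grammar"
    severity: str   # "neutral", "minor", "major", or "critical"

def mqm_score(errors: list[MqmError], word_count: int) -> float:
    """Return a 0-100 quality score: 100 minus penalty points per 100 words."""
    penalty = sum(SEVERITY_WEIGHTS[e.severity] for e in errors)
    return 100.0 - (penalty / word_count) * 100.0

# Example: two errors annotated across 250 translated words.
errors = [
    MqmError("accuracy/mistranslation", "major"),
    MqmError("fluency/punctuation", "minor"),
]
print(mqm_score(errors, word_count=250))  # 97.6
```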
The session is based on a real case study conducted by Alconost’s MT evaluation team using our MQM evaluation tool.
Full case study:
https://alconost.mt/mqm-tool/case-studies/translategemma/