We can imagine our health as a jigsaw, with each piece representing a different aspect of our medical history. These pieces might include blood test results, X-ray images, or the notes a doctor takes as we describe our symptoms. They are ultimately recorded and stored in electronic health records (EHRs). EHRs are a valuable resource: they provide an overview of someone’s health and have the potential to help clinicians and researchers unlock new medical insights. However, there’s a fly in the ointment: the pieces in such records do not always fit together correctly, and they may not fully capture the required information. Some clinical events are incompletely documented, some do not align with related pieces, and some are missing entirely. Dr. Hanieh Razzaghi of the Children’s Hospital of Philadelphia and her colleagues tackled this data quality problem in their innovative work on the PRESERVE study, a research project exploring chronic kidney disease in children (the PRESERVE study itself was led by Drs. Michelle Denburg and Christopher Forrest). Using EHRs from 15 different hospitals across the United States, the team aimed to understand how various treatments might slow the progression of chronic kidney disease. First, however, they had to make sure that the data they were relying on were accurate, reliable, and suitable for the complex analyses required.