Design and refinement of a data quality assessment workflow for a large pediatric research network

Clinical data research networks (CDRNs), which consist of a variety of health care delivery organizations that share deidentified clinical data for research purposes, constitute yet another collaborative mechanism for scientific researchers to pool data and make new discoveries. However, one drawback of CDRN data is that it typically comes from electronic health records (EHRs), which are geared more towards supporting “clinical operations rather than clinical research.” This makes data quality of the utmost importance when pooling such disparate data sources and putting them to effective use. In this research, Khare et al. propose a systematic workflow for assessing the quality of a CDRN’s data before use, a workflow that includes hundreds of systematic data checks and a GitHub-based reporting system to track and correct issues in a more timely fashion. They conclude that their publicly available toolkit has value, though implementers should be advised that “sufficient resources should be dedicated for investigating problems and optimizing data” due to the time-intensive nature of the entire process.

