2CEM DETECTING ERRORS IN A CENTRAL LABORATORY: AN EVALUATION OF AN AUTOMATED METHOD TO ASSIST IN DETECTING ERRORS IN CLINICAL TRIAL DATA

Monday, October 19, 2009
Grand Ballroom, Salons 1 & 2 (Renaissance Hollywood Hotel)
Gregory B. Strylewicz, PhD, University of Washington, Seattle, WA and Jason N. Doctor, PhD, University of Southern California, Los Angeles, CA

Purpose: To develop and evaluate a method that assists in detecting potential errors in laboratory data from an interventional clinical trial in which small data errors may influence the estimated treatment effects.

Method: We used data from a clinical trial investigating the effect of intensive glycemic control on major cardiovascular disease events to construct training and testing datasets. Using the training dataset, we constructed a Bayesian network describing the relationship between a subject's previous fasting glucose and glycated hemoglobin results and their current fasting glucose and glycated hemoglobin results. We introduced errors into the testing dataset using a synthetic error model and then evaluated the Bayesian network's performance in identifying those errors by computing the posterior probability of error in each subject's set of results. These probabilities were then used to construct a receiver operating characteristic (ROC) curve and to compute the area under that curve. All three laboratory experts from Northwest Lipid Research Laboratories were recruited and completed a survey consisting of 200 sets of laboratory results from the testing dataset. Their task was to evaluate each subject's set of results, decide whether the results presented were erroneous, and provide a confidence rating on a 6-point subjective probability scale. We then constructed an ROC curve for each expert by rank ordering their responses and computed the area under that curve using a smoothing function.
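
The area-under-the-curve computation described above can be illustrated with a minimal sketch. It assumes the empirical (unsmoothed) area, computed as the Mann-Whitney statistic with ties counted as one half; the abstract's expert curves used a smoothing function that is not specified here, so this is not the authors' procedure. The function name, the toy scores, and the signed encoding of expert ratings (positive for "erroneous", negative for "not erroneous", magnitude equal to the 1-6 confidence) are all hypothetical.

```python
from typing import Sequence


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical ROC area via the Mann-Whitney statistic.

    scores: higher means "more likely erroneous" (a posterior probability
            of error, or an ordinal expert rating).
    labels: 1 for a result set with an injected error, 0 for a clean set.
    Tied scores between an erroneous and a clean set count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one erroneous and one clean case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): six result sets.
labels = [1, 1, 0, 1, 0, 0]

# Model output: posterior probability of error for each result set.
posterior = [0.9, 0.4, 0.3, 0.6, 0.2, 0.5]
model_auc = auc_from_scores(posterior, labels)

# Expert output: assumed signed encoding of decision and 6-point confidence.
expert_rating = [6, 3, -2, 3, -5, 3]
expert_auc = auc_from_scores(expert_rating, labels)

print(f"model AUC  = {model_auc:.4f}")
print(f"expert AUC = {expert_auc:.4f}")
```

Because a 6-point confidence scale produces many tied ratings, the half-credit rule for ties matters far more for the expert curves than for the continuous posterior probabilities.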

Results: The Bayesian network's overall area under the receiver operating characteristic curve was 0.7948 with a standard error of 0.03, whereas the three laboratory experts had areas under the curve of 0.7285, 0.7292, and 0.7165, with standard errors of 0.04. This difference in performance was statistically significant for all three experts. The human experts were also generally overconfident in their ability to detect errors.

Conclusions: The model described here is, by design, specific to a novel intervention in a particular diabetic population, so its numerical results have limited generalizability. The approach, however, appears to generalize to a wider population, and the results of this study suggest that continuous Bayesian networks, suitably constructed, may serve as an effective tool to assist experts in reviewing voluminous laboratory data.

Candidate for the Lee B. Lusted Student Prize Competition