
Monday, October 22, 2007 - 11:15 AM
A-2

CLINICAL PREDICTION MODELS IN DAILY PRACTICE: PRACTICAL SOLUTIONS FOR MISSING PREDICTOR VALUES

Kristel J.M. Janssen, MSc1, Y. Vergouwe, PhD1, A.R.T. Donders, PhD2, F.E. Harrell, PhD3, Q. Cheng, PhD3, Diederick E. Grobbee, PhD, MD1, and K.G.M. Moons, PhD1. (1) University Medical Center Utrecht, Utrecht, Netherlands, (2) Copernicus Institute, Utrecht University, Utrecht, Netherlands, (3) Vanderbilt University School of Medicine, Nashville, TN

Purpose: Clinical prediction models combine patient characteristics and test results to predict the presence of a disease (diagnosis) or the occurrence of a future event (prognosis). A physician who applies a prediction model to a patient with a missing predictor value needs guidance on how to handle that missing value. We present five strategies for handling missing values and compare their effects on the predictive accuracy of a prediction model.

Methods: We developed and externally validated a seven-predictor model that predicts the presence of deep venous thrombosis (DVT), using 1295 and 532 primary care patients, respectively. In an application set (259 patients) we mimicked three scenarios, in which an important predictor, a weak predictor, or both predictors simultaneously were missing. We used five strategies to handle the missing values: no imputation, applying the submodel of the observed predictors, mean imputation, subgroup mean imputation, and out-of-sample multiple imputation. We compared the accuracy of the strategies in the application set by assessing discrimination (the ability to distinguish between patients with and without the outcome, quantified with the area under the Receiver Operating Characteristic curve (ROC area)) and calibration (the agreement between predicted probabilities and observed frequencies, expressed by a slope and an intercept, ideally equal to 1 and 0, respectively).

Results: The ROC area was 0.90 (95% CI: 0.84-0.96) when there were no missing values in the application set (reference situation). Out-of-sample multiple imputation led to the best ROC area and no imputation to the worst. The calibration slope was 1.06 when there were no missing values (reference). We could not identify a best strategy for improving the slope when there were missing data. The calibration intercept was -0.10 when there were no missing values (reference). No imputation led to the worst intercept and out-of-sample multiple imputation to the best in all three scenarios.
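
As a rough illustration of two of the five strategies (mean imputation and subgroup mean imputation) and of the ROC area used to compare them, here is a minimal Python sketch. The abstract does not report the DVT model itself, so the logistic model, its coefficients, the predictor names (`d_dimer_abnormal`, `calf_difference_cm`), and the small reference data set are all hypothetical and chosen only for illustration. The submodel and out-of-sample multiple imputation strategies are not shown.

```python
import math

# Hypothetical logistic model: illustrative values, NOT the coefficients of the DVT model.
INTERCEPT = -3.0
COEFS = {"d_dimer_abnormal": 2.0, "calf_difference_cm": 0.4}


def predict(patient):
    """Predicted probability of the outcome from the (hypothetical) logistic model."""
    lp = INTERCEPT + sum(COEFS[name] * patient[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-lp))


def mean_impute(patient, reference, name):
    """Fill the missing predictor `name` with its overall mean in the reference data."""
    values = [r[name] for r in reference]
    filled = dict(patient)
    filled[name] = sum(values) / len(values)
    return filled


def subgroup_mean_impute(patient, reference, name, group_by):
    """Fill `name` with its mean among reference patients sharing the patient's `group_by` value.

    Assumes at least one reference patient falls in that subgroup.
    """
    values = [r[name] for r in reference if r[group_by] == patient[group_by]]
    filled = dict(patient)
    filled[name] = sum(values) / len(values)
    return filled


def roc_area(probabilities, outcomes):
    """ROC area as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    cases = [p for p, y in zip(probabilities, outcomes) if y == 1]
    controls = [p for p, y in zip(probabilities, outcomes) if y == 0]
    score = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return score / (len(cases) * len(controls))


# Hypothetical reference data (e.g. from the development set) and a new patient
# whose calf circumference difference was not measured.
reference = [
    {"d_dimer_abnormal": 1, "calf_difference_cm": 4.0},
    {"d_dimer_abnormal": 1, "calf_difference_cm": 2.0},
    {"d_dimer_abnormal": 0, "calf_difference_cm": 1.0},
    {"d_dimer_abnormal": 0, "calf_difference_cm": 1.0},
]
patient = {"d_dimer_abnormal": 1, "calf_difference_cm": None}

by_mean = mean_impute(patient, reference, "calf_difference_cm")
by_subgroup = subgroup_mean_impute(
    patient, reference, "calf_difference_cm", group_by="d_dimer_abnormal"
)
print(round(predict(by_mean), 3), round(predict(by_subgroup), 3))
```

In this toy example the subgroup mean uses only reference patients with an abnormal D-dimer, so the imputed value, and therefore the predicted probability, differs from what the overall mean gives. Applying `roc_area` to the predictions obtained under each strategy in an application set would give the kind of discrimination comparison described in the Results.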