THE VALUE OF MARKERS IN PREDICTION MODELS: NET BENEFIT AND TEST HARM RATHER THAN THE C-INDEX

Monday, October 24, 2011
Grand Ballroom AB (Hyatt Regency Chicago)
Poster Board # 51
(MET) Quantitative Methods and Theoretical Developments

Ben Van Calster, PhD, Katholieke Universiteit Leuven, Leuven, Belgium, Dirk Timmerman, University Hospitals Leuven, Leuven, Belgium and Ewout W. Steyerberg, PhD, Department of Public Health, AE 236, Rotterdam, Netherlands

Purpose: Prediction models are popular in medicine, and many attempts are made to improve models with risk markers or new tests. We aimed to assess methods for evaluating the value of such markers or tests in prediction models.

Method: Models are often compared using the difference in c-indexes (Δc), but the c-index does not reflect model performance in clinical practice, where a decision has to be made about whether or not to give treatment. This decision can be guided by a prediction model together with a sensible risk threshold C. One appealing approach to evaluate models within this framework is the net benefit (NB): NB = (TP – wFP)/N, with TP and FP the numbers of true and false positives, N the sample size, and w the relative cost of false versus true positives. This weight is conveniently derived from the risk threshold as the odds of C, i.e. w = C/(1 – C). Thus, NB corrects the proportion of true positives for a weighted proportion of false positives (Vickers, MDM 2006). A model with higher NB is clinically more useful at a given threshold C. However, if the difference in test harm (ΔTH) outweighs the difference in NB (ΔNB), the model with lower NB may still be preferable. Since ΔNB is a number on the scale of TP/N, 1/ΔNB gives the number of patients that must receive the additional measurements to gain one extra true positive when using the model with the highest NB (the test threshold). If only fewer patients per extra true positive are acceptable, the additional test harm of the superior model is too high.
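The following minimal Python sketch illustrates the computation described above; the function and variable names (net_benefit, y, risk_full, risk_reduced) are hypothetical and not part of the abstract.

import numpy as np

def net_benefit(y, risk, threshold):
    """Net benefit NB = (TP - w*FP)/N at risk threshold C,
    with w = C/(1 - C) the relative cost of false vs. true positives."""
    y = np.asarray(y)
    positive = np.asarray(risk) >= threshold  # classified as high risk
    tp = np.sum(positive & (y == 1))          # true positives
    fp = np.sum(positive & (y == 0))          # false positives
    w = threshold / (1.0 - threshold)         # odds of the threshold C
    return (tp - w * fp) / len(y)

# With outcomes y and predicted risks from the full and reduced models:
# C = 0.10
# delta_nb = net_benefit(y, risk_full, C) - net_benefit(y, risk_reduced, C)
# test_threshold = 1.0 / delta_nb  # patients measured per extra true positive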

Result: We compared two logistic regression models to diagnose malignancy of ovarian tumors: one with 12 predictors and a reduced model in which 6 of these predictors are dropped. In a dataset of 2,757 patients, Δc was 0.015 and ΔNB was 0.006 at a decision threshold C = 0.10. The test threshold was 172 measurements of the extra predictors for one additional true positive.
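As a worked illustration of the link between ΔNB and the test threshold (the unrounded ΔNB is inferred here, not reported): test threshold = 1/ΔNB, so 172 measurements per extra true positive corresponds to an unrounded ΔNB of roughly 1/172 ≈ 0.0058, which rounds to the reported 0.006.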

Conclusion: The net benefit of a marker or test can meaningfully be expressed as a test threshold. This measure may help promote utility-based evaluation of prediction models that require additional measurements of markers or tests.