Purpose: This study aims to determine whether propensity regression techniques can improve the accuracy of comparative prediction models.
Method: Many clinical decisions can be enhanced by accurate predicted risks of important outcomes, particularly when these risks are broken down by treatment. Observational data, however, are difficult to use for this purpose because treatment allocation may be confounded with subject characteristics. Propensity scores have been proposed as a statistical method for addressing such concerns. This study was conducted in a cohort of ~33,000 patients with type 2 diabetes that was previously used to publish a Cox regression model for predicting 6-year risk of overall mortality. That model compared four oral hypoglycemics for which there was a clear treatment bias. The published model was compared with four Cox regression models that included adjustment for, or weighting with, the propensity score: (1) adjustment for a logistic propensity score (Adjustment Logistic); (2) inverse probability of treatment weighting (IPTW) with the logistic propensity score (IPTW Logistic); (3) adjustment for the multinomial propensity scores (Adjustment Multinomial); (4) IPTW with the multinomial propensity scores (IPTW Multinomial). The methods were compared in their ability to accurately predict 6-year mortality as measured by Harrell’s C-statistic (using 100 random cross-validations), calibration curves, and reclassification tables.
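As an illustration of the weighting step only, the following minimal sketch computes unstabilized IPTW weights, assuming each subject's weight is the reciprocal of the estimated probability of the treatment actually received. The function name, the drug labels "A" and "B", and the probabilities are hypothetical and are not taken from the study; whether the study stabilized or truncated its weights is not stated here.

```python
def iptw_weights(propensities, treatments):
    """Return unstabilized inverse probability of treatment weights.

    propensities: list of dicts mapping each treatment label to its
        estimated probability for that subject (hypothetical inputs,
        e.g. from a logistic or multinomial propensity model).
    treatments: list of the treatment label each subject received.
    """
    weights = []
    for probs, received in zip(propensities, treatments):
        p = probs[received]
        if not 0.0 < p <= 1.0:
            raise ValueError("propensity must lie in (0, 1]")
        # Subjects who received an unlikely treatment get a larger weight.
        weights.append(1.0 / p)
    return weights


# Hypothetical two-subject example with made-up probabilities.
example = iptw_weights(
    [{"A": 0.25, "B": 0.75}, {"A": 0.5, "B": 0.5}],
    ["A", "B"],
)
```

In a weighted Cox model, weights of this kind would be supplied as case weights; the adjustment approach instead enters the propensity score as an additional covariate.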
Result: The Adjustment Logistic model offered the best median predictive accuracy and outperformed the other methods in 60 of the 100 cross-validations. Both versions of IPTW produced models that were, on average, less accurate than the published model without propensity adjustment. The C-statistics associated with each of the models, in order from best to worst, were: Adjustment Logistic (0.754), Adjustment Multinomial (0.753), No Propensity (0.752), IPTW Logistic (0.751), IPTW Multinomial (0.737). The median difference in C-statistic between the Adjustment Logistic model and the No Propensity model was 0.001. Similar results were obtained for the other performance measures.
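For readers unfamiliar with the metric, the sketch below shows one simple way to compute Harrell's C-statistic for right-censored data. It assumes a pair of subjects is comparable only when the subject with the shorter follow-up time had an observed event, counts tied predicted risks as half-concordant, and ignores pairs tied in time. The function name and the toy data are hypothetical; this is not the study's code.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up time for each subject.
    events: 1 if the event (death) was observed, 0 if censored.
    risks: predicted risk; higher should mean earlier death.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            # Subject i died strictly before subject j's last follow-up.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third one censored.
toy_c = harrell_c([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.2, 0.4])
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives a sense of scale for differences in the third decimal place.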
Conclusion: Including a covariate for the probability of receiving treatment offered an incremental improvement in bias-corrected predictive accuracy. The benefit was negligible and may not be worth the additional analytic complexity in all circumstances. Interestingly, the use of propensity weighting (IPTW) appeared to harm the predictive accuracy of the published model. These findings call into question the use of propensity adjustment when the goal is to create a comparative prediction model.