|
Methods: We constructed a logistic regression model to assess breast cancer risk and evaluated its performance using the areas under the nonparametric receiver operating characteristic (ROC) curve (AROC) and the precision-recall (PR) curve (APR). Precision and recall measure the positive predictive value (PPV) and sensitivity of a test, respectively. We plotted sensitivity (X-axis) against PPV (Y-axis) at different cut-off points to obtain a PR curve. Our data set consisted of 62,219 mammography abnormalities (510 malignant and 61,709 benign) observed by radiologists. We simulated model outcomes by adding bias to the predicted probability of cancer for malignant cases. First, we added negative bias (underestimating the probability of cancer) to the malignant cases, thereby degrading the model's performance, and measured AROC and APR at various bias values. Second, we compared one of the biased risk assessment models (model-1) with the radiologists' prediction of breast cancer, as measured by Breast Imaging Reporting and Data System (BI-RADS) assessment codes, using both ROC and PR curves.
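As a concrete illustration of these metrics and of the bias simulation, the following is a minimal sketch in Python, assuming scikit-learn is available; the synthetic score distributions and bias values are illustrative placeholders, not the study's actual model or data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Class-imbalanced labels matching the cohort: 510 malignant (1), 61,709 benign (0).
y = np.concatenate([np.ones(510), np.zeros(61_709)])

# Hypothetical risk scores for a strong model (score distributions are assumptions).
p = np.where(y == 1,
             rng.beta(4.0, 2.0, size=y.size),   # malignant: scores skewed high
             rng.beta(1.0, 8.0, size=y.size))   # benign: scores skewed low

for bias in (0.0, 0.1, 0.2, 0.3):
    # Negative bias: underestimate the probability of cancer for malignant cases only.
    p_biased = np.where(y == 1, np.clip(p - bias, 0.0, 1.0), p)
    aroc = roc_auc_score(y, p_biased)           # area under the ROC curve
    apr = average_precision_score(y, p_biased)  # area under the PR curve
    print(f"bias={bias:.1f}  AROC={aroc:.3f}  APR={apr:.3f}")
```

On data this imbalanced, the printed APR typically falls much faster with increasing bias than the AROC does, which is the effect the simulation is designed to expose.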
Results: As we increased the magnitude of the bias, AROC decreased from 0.965 (best performance) to 0.815 (worst performance), whereas the corresponding APR decreased from 0.550 to 0.035 over the same bias values. The AROC values of the radiologists and model-1 were 0.939 and 0.934, respectively, showing no statistically significant difference (p-value = 0.599); the PR curves, however, showed that the radiologists (APR = 0.496) performed significantly better (p-value < 0.001) than model-1 (APR = 0.448).
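The abstract does not state which statistical test produced these p-values; one common option for comparing two correlated APR values is a paired, case-resampled bootstrap, sketched below with hypothetical inputs (y: true labels; s_rad: radiologists' BI-RADS-derived scores; s_model: model-1 risk scores).

```python
import numpy as np
from sklearn.metrics import average_precision_score

def apr_diff_pvalue(y, s_rad, s_model, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for the APR difference between two tests.

    y, s_rad, and s_model are 1-D NumPy arrays over the same abnormalities.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample abnormalities with replacement
        diffs[b] = (average_precision_score(y[idx], s_rad[idx])
                    - average_precision_score(y[idx], s_model[idx]))
    # Two-sided p-value: how often the bootstrap difference crosses zero.
    return 2.0 * min((diffs <= 0).mean(), (diffs >= 0).mean())
```

Resampling whole abnormalities keeps the two scores paired within each case, which is what makes the comparison between the correlated APR values valid.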
Conclusions: ROC curves overestimate the performance of a model when the test data are unbalanced. In contrast, APR avoids this overestimation and can reveal a statistically significant difference between tests that AROC does not detect.
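To see why imbalance inflates ROC-based assessments, consider the class counts above (510 malignant vs. 61,709 benign, roughly 1:121): an operating point that looks strong in ROC space can still have poor precision. The arithmetic below uses an assumed operating point (90% sensitivity, 10% false-positive rate) that is not taken from the paper.

```python
# Class counts from the abstract; the operating point is assumed for illustration.
tp = 0.90 * 510      # expected true positives  = 459
fp = 0.10 * 61_709   # expected false positives ~ 6,171
ppv = tp / (tp + fp)
print(f"PPV = {ppv:.3f}")  # ~0.069: a strong ROC point but a weak PR point
```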