36SDM COMPARING ARTIFICIAL NEURAL NETWORK TRAINING STRATEGIES FOR BREAST CANCER RISK PREDICTION

Sunday, October 19, 2008
Columbus A-C (Hyatt Regency Penns Landing)
Turgay Ayer1, Oguzhan Alagoz, PhD1, Jagpreet Chhatwal, MS1, Jude W. Shavlik1, Charles E. Kahn, Jr., MD, MS2 and Elizabeth S. Burnside, MD, MPH, MS1, (1)University of Wisconsin, Madison, WI, (2)Medical College of Wisconsin, Milwaukee, WI

Purpose: Artificial Neural Networks (ANNs) are currently in clinical use in systems to detect breast cancer and are being proposed for breast cancer risk prediction. The purpose of this study was to compare the performance of two approaches to building ANNs: the conventionally accepted methodology, which uses a training set enriched with cancers, and an alternative approach, which uses a training set representing actual breast cancer prevalence.

Methods: Our dataset consisted of 62,219 consecutively collected mammographic records, matched with our State Cancer Reporting System, and included 510 breast cancers. We built two three-layer feedforward ANNs (Model I and Model II) with 1000 nodes in each hidden layer. Model I was trained on a balanced subset of the mammography records (255 benign and 255 malignant abnormalities), which is the conventional method for training ANNs for breast cancer. Model II was trained on an unbalanced subset (30,855 benign and 255 malignant abnormalities), in which the prevalence of malignant abnormalities reflects clinical practice. For both models, we held out a validation set to prevent overfitting. We tested both models on the remaining 31,109 abnormalities (30,854 benign and 255 malignant).
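The partitioning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the function name `split_indices`, the random shuffling, the fixed seed, and the choice to draw the balanced benign cases from the same training half are all hypothetical, and the validation set mentioned in the abstract is omitted for brevity.

```python
import random


def split_indices(labels, seed=0):
    """Return (balanced_train, prevalence_train, test) index lists.

    labels: list of 0 (benign) / 1 (malignant).
    Assumption: records are assigned at random; the abstract does not
    say how the subsets were actually drawn.
    """
    rng = random.Random(seed)
    malignant = [i for i, y in enumerate(labels) if y == 1]
    benign = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(malignant)
    rng.shuffle(benign)

    # Half of each class goes to training; the larger benign half is
    # used for training so the counts match the abstract.
    n_mal_train = len(malignant) // 2
    n_ben_train = (len(benign) + 1) // 2

    mal_train, mal_test = malignant[:n_mal_train], malignant[n_mal_train:]
    ben_train, ben_test = benign[:n_ben_train], benign[n_ben_train:]

    # Model II style: keep every benign training record (clinical prevalence).
    prevalence_train = mal_train + ben_train
    # Model I style: keep only as many benign records as malignant ones.
    balanced_train = mal_train + ben_train[:len(mal_train)]
    test = mal_test + ben_test
    return balanced_train, prevalence_train, test


# Class counts taken from the abstract: 510 cancers among 62,219 records.
labels = [1] * 510 + [0] * (62219 - 510)
balanced, prevalence, test = split_indices(labels)
```

With the counts from the abstract, this yields 510 balanced training records, 31,110 prevalence-preserving training records, and 31,109 test records.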

Results: We evaluated and compared the performance of the two models using the area under the receiver operating characteristic curve (AUROC) and calibration curves. The AUROC of 0.971 for Model II was significantly better than the AUROC of 0.921 for Model I (P<0.001). Graphical comparison of the calibration curves demonstrates that Model II is better calibrated than Model I (Figure 1).
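For readers unfamiliar with the two metrics, the following is a minimal sketch of how each can be computed from predicted probabilities and true labels. It is not the authors' code: the function names, the rank-based (Mann-Whitney) formulation of AUROC, and the equal-width binning used for the calibration curve are assumptions chosen for illustration, and the significance test reported above is not reproduced.

```python
def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the group of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def calibration_curve(scores, labels, n_bins=10):
    """Return (mean predicted risk, observed malignancy rate) per non-empty bin."""
    bins = [[0.0, 0, 0] for _ in range(n_bins)]  # score sum, positives, count
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 falls in the last bin
        bins[b][0] += s
        bins[b][1] += y
        bins[b][2] += 1
    return [(total / n, pos / n) for total, pos, n in bins if n > 0]


# Toy data for illustration only (not from the study).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auroc(toy_scores, toy_labels)
toy_curve = calibration_curve([0.05, 0.05, 0.95, 0.95], [0, 0, 1, 0], n_bins=2)
```

A well-calibrated model produces points close to the diagonal, where the mean predicted risk in each bin equals the observed malignancy rate.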

Figure 1

Conclusions: We demonstrate that an ANN trained on data reflecting the prevalence of breast cancer in the population performs significantly better than a model trained on a dataset enriched with cancer cases. This finding suggests that the conventional way of training ANNs (on a dataset with high cancer prevalence) may diminish both discrimination and calibration in the prediction of breast cancer risk.

See more of: Poster Session I

See more of: 30th Annual Meeting of the Society for Medical Decision Making (October 19-22, 2008)