Methods: A logistic regression (LR) model and a Bayesian network (BN) were developed based on demographic data and Breast Imaging Reporting and Data System (BI-RADS) descriptors. The American College of Radiology recommends that all radiologists use BI-RADS when interpreting and reporting mammograms. A total of 65,890 consecutive findings were recorded between 1999 and 2004 at an academic tertiary care referral hospital in the National Mammography Database (NMD) format, a national standard for matching mammography practice data with state cancer registries.
We developed a BN with 37 nodes and 42 probabilistic relationships, whose structure was based on radiologists' knowledge and a review of the literature; its conditional probabilities were trained on the NMD data. The BN was tested using 10-fold cross-validation. An LR model was developed from the NMD data using a stepwise selection procedure and was likewise tested using 10-fold cross-validation. The outputs of the two models (probability of malignancy) were compared using the area under the receiver operating characteristic (ROC) curve (Az).
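As an illustration of what "conditional probabilities were trained on data" and 10-fold cross-validation can look like, the sketch below estimates one node's conditional probability table (CPT) by counting and produces out-of-fold predictions. It is a minimal sketch under stated assumptions, not the authors' implementation: the node and field names (`malignant`, `mass_shape`) are hypothetical, add-one (Laplace) smoothing is our choice since the estimator is not stated, and a single-parent table stands in for inference over the full 37-node network.

```python
from collections import Counter, defaultdict
import random

def fit_cpt(records, child, parents, child_values, alpha=1.0):
    """Estimate P(child | parents) by counting, with add-alpha (Laplace) smoothing."""
    counts = defaultdict(Counter)
    for r in records:
        key = tuple(r[p] for p in parents)
        counts[key][r[child]] += 1
    cpt = {}
    for key, c in counts.items():
        total = sum(c.values()) + alpha * len(child_values)
        cpt[key] = {v: (c[v] + alpha) / total for v in child_values}
    return cpt

def predict(cpt, record, parents, child_values, target):
    """Look up P(child = target | parents of this record)."""
    key = tuple(record[p] for p in parents)
    dist = cpt.get(key)
    if dist is None:
        # Parent configuration never seen in training: fall back to uniform.
        return 1.0 / len(child_values)
    return dist[target]

def k_fold_predictions(records, k, child, parents, child_values, target, seed=0):
    """Return one out-of-fold prediction per record using k-fold cross-validation."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # disjoint folds covering every record once
    preds = [0.0] * len(records)
    for fold in folds:
        held_out = set(fold)
        train = [records[i] for i in idx if i not in held_out]
        cpt = fit_cpt(train, child, parents, child_values)  # trained without the held-out fold
        for i in fold:
            preds[i] = predict(cpt, records[i], parents, child_values, target)
    return preds
```

In a full network, the malignancy probability for a finding would come from probabilistic inference over all nodes given the observed descriptors rather than from a single table lookup; the cross-validation loop itself is unchanged.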
Results: Both the BN (Az = 0.940) and the LR model (Az = 0.927) performed better than the average performance of radiologists reported in the literature (Az = 0.85). The BN performed significantly better than the LR model (p < 0.05) in predicting the risk of malignancy. The LR model selected 15 significant BI-RADS descriptors and 5 interaction effects. In addition, the variables most strongly associated with breast cancer risk were identified.
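For readers who want to reproduce an area-under-the-curve figure from pooled out-of-fold predictions, the sketch below computes the empirical (nonparametric) ROC area through the Mann-Whitney U statistic, with average ranks for tied scores. This is an assumption about method: Az often denotes the area under a fitted binormal ROC curve, and the abstract does not say which estimate was used. It also does not state which significance test compared the two areas; because both models score the same findings, a paired method for correlated ROC curves (such as DeLong's test) would be needed, and that is not shown here.

```python
def auc(scores, labels):
    """Empirical ROC area via the Mann-Whitney U statistic.

    labels: 1 = malignant, 0 = benign. Tied scores receive average ranks,
    so a tie between a malignant and a benign finding counts as one half.
    """
    pairs = sorted(zip(scores, labels), key=lambda t: t[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for m in range(i, j + 1):
            ranks[m] = avg
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one malignant and one benign finding")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

The rank-based form runs in O(n log n), which matters at the scale of tens of thousands of findings, where comparing every malignant-benign pair directly would be slow.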
Conclusions: Our BN performs better than a traditional LR model in predicting breast disease from mammography findings and patient risk factors. Unlike our LR model, our BN incorporates expert knowledge in addition to parameters trained on data. The BN also relaxes the statistical assumptions of additivity, linearity of the logits, and absence of multicollinearity. These factors may contribute to the BN's superior performance. Future research will investigate whether the strengths of LR models, such as identifying significant variables and interaction effects, can further improve breast cancer prediction.
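To make the contrast in assumptions concrete, a generic LR model with main effects and selected pairwise interactions can be written as follows, where p is the probability of malignancy, x_j are coded predictors (for example, BI-RADS descriptors and risk factors), and I is the set of selected interaction pairs. The notation is ours and the indices are generic, since the specific selected terms are not listed here. Every term enters additively on the logit scale, which is the additivity and linearity-of-logit assumption. A BN instead factorizes the joint distribution of its nodes into one conditional distribution per node given its parents Pa(X_i); for discrete nodes with full conditional probability tables, this can represent any pattern of dependence on the parents' joint configuration.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}
  = \beta_0 + \sum_{j=1}^{J} \beta_j x_j
  + \sum_{(j,k) \in \mathcal{I}} \gamma_{jk}\, x_j x_k

P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```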