Methods: We developed a Bayesian Network (BN) to predict the risk of invasive and in-situ breast cancer based on 1,533 prospectively collected, consecutively diagnosed breast cancers (573 in-situ and 960 invasive) at the University of California San Francisco Medical Center between 1/6/1997 and 6/29/2007. We combined structured data with natural language processing of dictated reports to extract variables including age, personal and family history of breast cancer, and imaging features described according to the standardized breast imaging lexicon BI-RADS. We trained the BN on these variables using a Tree Augmented Naïve Bayes (TAN) algorithm, validated it using 10-fold cross-validation, and measured its performance in discriminating between invasive and in-situ cancer by estimating the area under the receiver operating characteristic (ROC) curve.
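The evaluation protocol described above (10-fold cross-validation followed by an ROC area estimate) can be illustrated with a minimal sketch. This is not the study's implementation: it substitutes a plain categorical naive Bayes classifier for TAN (so it models no dependencies among features), it runs on synthetic data, and the feature names, sample size, and probabilities in `make_synthetic_cases` are hypothetical values chosen only for illustration. Pooling the held-out scores across folds before computing one AUC is also an assumption; the abstract does not state how fold results were combined.

```python
import math
import random
from collections import defaultdict


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Count-based categorical naive Bayes (a simplified stand-in for TAN)."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(int)  # (feature index, value, class) -> count
    values = defaultdict(set)       # feature index -> values seen in training
    for row, y in zip(rows, labels):
        class_counts[y] += 1
        for j, v in enumerate(row):
            feat_counts[(j, v, y)] += 1
            values[j].add(v)
    return class_counts, feat_counts, values, alpha


def predict_invasive(model, row):
    """Posterior probability of class 1 (invasive), with additive smoothing."""
    class_counts, feat_counts, values, alpha = model
    total = sum(class_counts.values())
    log_post = {}
    for y in (0, 1):
        lp = math.log((class_counts[y] + alpha) / (total + 2 * alpha))
        for j, v in enumerate(row):
            num = feat_counts[(j, v, y)] + alpha
            den = class_counts[y] + alpha * max(len(values[j]), 1)
            lp += math.log(num / den)
        log_post[y] = lp
    m = max(log_post.values())  # subtract the max for numerical stability
    e0, e1 = math.exp(log_post[0] - m), math.exp(log_post[1] - m)
    return e1 / (e0 + e1)


def cross_validated_auc(rows, labels, k=10, seed=0):
    """Score every case with a model trained on the other k-1 folds, then pool."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = [0.0] * len(rows)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_naive_bayes([rows[i] for i in train], [labels[i] for i in train])
        for i in fold:
            scores[i] = predict_invasive(model, rows[i])
    return auc(labels, scores)


def make_synthetic_cases(n, seed=0):
    """Hypothetical data: (age group, mass present, calcifications) -> 1 = invasive."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.6 else 0
        age_group = rng.choice(["<50", "50-64", "65+"])
        mass = rng.random() < (0.7 if y else 0.2)
        calcifications = rng.random() < (0.3 if y else 0.8)
        rows.append((age_group, mass, calcifications))
        labels.append(y)
    return rows, labels


rows, labels = make_synthetic_cases(400, seed=1)
print(round(cross_validated_auc(rows, labels), 3))
```

Each case is scored only by a model that never saw it during training, which is the property that makes the cross-validated AUC an estimate of out-of-sample discrimination.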
Results: The BN discriminated between invasive and in-situ breast carcinoma with an area under the ROC curve of 0.832. Using TAN, we identified conditional dependencies among the mammography features and patient demographic factors that were predictive of invasive disease.
Conclusions: Our BN, which is constructed from the variables observed by radiologists during their daily clinical practice, quantifies the risk of invasive versus in-situ breast cancer. Understanding the risk of invasive disease can aid clinical management decisions, such as the need for increased sampling at biopsy and the appropriate selection of surgical interventions.