BAYESIAN NETWORK VERSUS LOGISTIC REGRESSION MODEL FOR COMPUTER AIDED DIAGNOSIS OF BREAST CANCER

Monday, 16 October 2006 - 5:15 PM

Jagpreet Chhatwal, MS¹, Oguzhan Alagoz, PhD¹, Charles E. Kahn, Jr., MD, MS², and Elizabeth S. Burnside, MD, MPH, MS¹. (1) University of Wisconsin, Madison, WI, (2) Medical College of Wisconsin, Milwaukee, WI

Purpose: We have created two breast cancer risk prediction models based on demographic risk factors and mammography findings. Our objective is to compare the methodology and performance of an artificial intelligence technique (Bayesian network) with a traditional statistical approach (logistic regression) in quantifying the risk of breast cancer.

Methods: A logistic regression (LR) model and a Bayesian network (BN) were developed from demographic data and Breast Imaging Reporting and Data System (BI-RADS) descriptors. The American College of Radiology recommends that all radiologists use BI-RADS when interpreting and reporting mammograms. A total of 65,890 consecutive findings were recorded between 1999 and 2004 at an academic tertiary care referral hospital in the National Mammography Database (NMD) format, a national standard for matching mammography practice with state cancer registries.

We developed a BN with 37 nodes and 42 probabilistic relationships. Its structure was based on radiologists' knowledge and a review of the literature, while its conditional probabilities were trained on NMD data. We tested the BN using 10-fold cross-validation. An LR model was developed from the NMD data using a stepwise selection procedure and was also tested using 10-fold cross-validation. The outputs of the two models (probability of malignancy) were compared using the area under the receiver operating characteristic curve (Az).
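The evaluation procedure described above (10-fold cross-validation followed by a comparison of Az values) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `auc`, `k_fold_indices`, and `cross_validated_scores` are hypothetical, the `fit` and `predict` arguments stand in for whichever model (BN or LR) is being tested, and pooling the held-out predictions from all folds before computing a single Az is an assumption, since the abstract does not state how fold results were combined.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve (Az) via the Mann-Whitney rank statistic.

    labels must be 0 (benign) or 1 (malignant); tied scores get average ranks.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the group of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("Az needs at least one positive and one negative case")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle the indices 0..n_items-1 and deal them into k disjoint folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validated_scores(features, labels, fit, predict, k=10, seed=0):
    """Return one held-out prediction per case.

    fit(train_features, train_labels) -> model and
    predict(model, feature) -> probability are supplied by the caller
    (hypothetical interface; any model can be plugged in).
    """
    scores = [None] * len(labels)
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        for i in test_idx:
            # Each case is scored by a model that never saw it in training.
            scores[i] = predict(model, features[i])
    return scores
```

To compare two models under this sketch, one would call `cross_validated_scores` once per model with the same `seed` so that both see identical folds, then pass each set of pooled scores to `auc`. The significance test behind the reported p-value is not specified in the abstract and is not reproduced here.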

Results: Both the BN (Az = 0.940) and the LR model (Az = 0.927) performed better than the average performance of radiologists reported in the literature (Az = 0.85). The BN performed significantly better (p < 0.05) than the LR model in predicting the risk of malignancy. The LR model selected 15 significant BI-RADS descriptors and 5 interaction effects. In addition, the most important variables associated with the risk of breast cancer were identified.

Conclusions: Our BN performs better than a traditional LR model in predicting breast disease based on mammography findings and patient risk factors. In contrast to our LR model, our BN contains expert knowledge in addition to parameters trained on data. The statistical assumptions of additivity, linearity of the logit, and absence of multicollinearity are also relaxed in the BN. These factors may contribute to the BN's superior performance. Our future research will investigate whether the strengths of LR models, such as identification of significant variables and interaction effects, may contribute to further improvement in breast cancer prediction.
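The assumptions named in the conclusions can be made concrete by writing out the general form of each model. The notation below is an assumption introduced for illustration and does not appear in the abstract: Y is the malignancy outcome, x_j are the predictors, I is the set of interaction pairs retained by stepwise selection, and pa(X_i) denotes the parents of node X_i in the network.

```latex
% Logistic regression: the log-odds is linear and additive in the predictors,
% except where interaction terms are added explicitly.
\[
\log \frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)}
  = \beta_0 + \sum_{j} \beta_j x_j + \sum_{(j,k) \in I} \gamma_{jk}\, x_j x_k
\]

% Bayesian network: the joint distribution factorizes over the graph, and each
% conditional probability table is free to take any form.
\[
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
\]
```

In the first form, any nonlinearity or dependence between predictors must be supplied as an explicit term. In the second, dependence is encoded by the arcs of the graph (42 in the network described here), and no linear form is imposed on the conditional probabilities.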

Session: Concurrent Abstracts C: Methodological Advances and Applications: Regression
Meeting: The 28th Annual Meeting of the Society for Medical Decision Making (October 15-18, 2006)