3I-4 COMPARISON OF LOGISTIC REGRESSION AND BAYESIAN BELIEF NETWORKS FOR PREDICTION OF BREAST CANCER RECURRENCE RISK

Tuesday, October 25, 2016: 11:15 AM
Bayshore Ballroom Salon F, Lobby Level (Westin Bayshore Vancouver)

Annemieke Witteveen, MSc1, Ingrid M.H. Vliegen, PhD1, Prof. Sabine Siesling, PhD2 and Prof. Maarten J. IJzerman, PhD1, (1)University of Twente, Enschede, Netherlands, (2)Comprehensive Cancer Organisation the Netherlands (IKNL), Utrecht, Netherlands
Purpose:   Accurate estimates of breast cancer recurrence risk are required to develop individualized follow-up schemes. Current risk prediction is often based on logistic regression. In this study, logistic regression estimates were compared with estimates obtained from Bayesian Belief Networks (BBNs).

Method:   Women first diagnosed with early breast cancer (T1-3NanyM0) between 2003 and 2006 were selected from the Netherlands Cancer Registry (NCR, N=37,320). Based on the literature and data availability, risk factors for locoregional recurrences (LRRs) and second primary (SP) tumors within five years of first diagnosis were included in both models. Logistic regression was performed in STATA 14.0, and the BBNs were built in Netica (Norsys). BBN structures were developed using three techniques: naïve Bayes, tree-augmented naïve (TAN) Bayes, and correlation, in which two variables were connected if their correlation exceeded 0.3. The BBNs were compared with the logistic regression model using the area under the ROC curve (AUC), and all models were validated using NCR data from 2007-2008 (N=12,308).
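The correlation-based structure step can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the authors' Netica workflow: it assumes Pearson correlation on numerically coded variables, applies the 0.3 cutoff to the absolute correlation, and returns undirected variable pairs (how edge directions were set is not described). The function name `correlation_edges` and the toy data are hypothetical.

```python
import itertools

import numpy as np


def correlation_edges(data, names, threshold=0.3):
    """Return variable pairs whose absolute Pearson correlation exceeds threshold.

    Assumption (not stated in the abstract): Pearson correlation on numerically
    coded variables, compared in absolute value; edges are undirected pairs.
    """
    # Columns are variables, rows are patients.
    corr = np.corrcoef(data, rowvar=False)
    edges = []
    for i, j in itertools.combinations(range(len(names)), 2):
        if abs(corr[i, j]) > threshold:
            edges.append((names[i], names[j]))
    return edges


# Hypothetical toy data: b is a linear function of a; c alternates in sign.
a = np.arange(6, dtype=float)
b = 2 * a + 1
c = np.array([1, -1, 1, -1, 1, -1], dtype=float)
data = np.column_stack([a, b, c])

print(correlation_edges(data, ["a", "b", "c"]))  # → [('a', 'b')]
```

Here the correlation of c with a (and with b) is about -0.29, just under the cutoff in absolute value, so only the a-b pair is connected; lowering the threshold to 0.2 would connect all three pairs.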

Result:   Included variables were age, primary tumor size, involved lymph nodes, grade, hormone status, multifocality, and whether or not patients were treated with radiotherapy, chemotherapy or hormone therapy. The correlation-based BBN structure had the most connections between variables (11, vs. 10 in the TAN-structured BBN). The naïve structure gave the lowest (or joint lowest) AUC in all cases, and logistic regression the highest in all but one, internal validation for SP tumors, where the correlation-based BBN reached 0.695 (Table 1).

Conclusion:   Because SP tumors are independent of the primary tumor, they are harder to predict from conventional clinical data. Although logistic regression does not capture the additional information contained in the dependencies between variables, this analysis suggests that it remains more accurate for estimating the risk of both LRRs and SP tumors.

Table 1. Area under the ROC curve (AUC) for the different models.

                                            Internal validation   External validation
Model                 LRR        SP         LRR        SP         LRR        SP
Logistic regression   0.712      0.647      0.712      0.647      0.701      0.635
BBN (naïve)           0.627      0.584      0.500      0.500      0.574      0.547
BBN (TAN)             0.688      0.616      0.710      0.633      0.664      0.629
BBN (correlation)     0.627      0.586      0.684      0.695      0.618      0.571
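The model rankings stated in the Result section can be checked against the Table 1 values. The short Python sketch below transcribes the AUCs (the first column group, which has no group heading in the table, followed by internal and external validation) and reports, per column, the model or models with the highest and lowest AUC. The column labels and the helper `rank_extremes` are illustrative names, not part of the study.

```python
# AUC values transcribed from Table 1. Column order: first (unlabeled) group
# LRR, SP; internal validation LRR, SP; external validation LRR, SP.
COLUMNS = ["LRR", "SP", "LRR (internal)", "SP (internal)",
           "LRR (external)", "SP (external)"]
AUC = {
    "Logistic regression": [0.712, 0.647, 0.712, 0.647, 0.701, 0.635],
    "BBN (naïve)": [0.627, 0.584, 0.500, 0.500, 0.574, 0.547],
    "BBN (TAN)": [0.688, 0.616, 0.710, 0.633, 0.664, 0.629],
    "BBN (correlation)": [0.627, 0.586, 0.684, 0.695, 0.618, 0.571],
}


def rank_extremes(auc, col):
    """Return (best models, worst models) for one column; ties are all kept."""
    values = {model: row[col] for model, row in auc.items()}
    top, bottom = max(values.values()), min(values.values())
    best = [m for m, v in values.items() if v == top]
    worst = [m for m, v in values.items() if v == bottom]
    return best, worst


for col, label in enumerate(COLUMNS):
    best, worst = rank_extremes(AUC, col)
    print(f"{label:15s} best: {', '.join(best):20s} worst: {', '.join(worst)}")
```

On these values, logistic regression has the highest AUC in five of the six columns (the exception is internal validation for SP, where the correlation-based BBN scores 0.695), and the naïve BBN has the lowest AUC in every column, tied with the correlation-based BBN for LRR in the first group.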