3I-4
COMPARISON OF LOGISTIC REGRESSION AND BAYESIAN BELIEF NETWORKS FOR PREDICTION OF BREAST CANCER RECURRENCE RISK
Method: Women first diagnosed with early breast cancer (T1-3NanyM0) between 2003 and 2006 were selected from the Netherlands Cancer Registry (NCR, N=37,320). Based on the literature and data availability, risk factors for locoregional recurrences (LRRs) and second primary (SP) tumors within five years of first diagnosis were included in both the logistic regression and the Bayesian belief network (BBN) models. Logistic regression was performed in STATA 14.0 and the BBNs were built in Netica (Norsys). Three BBN structures were developed: naïve Bayes, tree-augmented naïve (TAN) Bayes, and a correlation-based structure in which variables with a correlation >0.3 were connected. The BBNs were compared with the logistic regression model using the area under the ROC curve, and all models were validated using NCR data from 2007-2008 (N=12,308).
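The correlation-based structure can be illustrated with a minimal Python sketch. It is not the Netica workflow used in the study: the Pearson coefficient, the use of the absolute value, the function names and the toy data are all assumptions made for illustration, since the abstract states only that a correlation of >0.3 was used to connect variables.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_edges(columns, threshold=0.3):
    """Return variable pairs whose absolute correlation exceeds the threshold.

    Assumption: the absolute value is used, so strong negative
    correlations also produce a connection.
    """
    edges = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Made-up toy values, not registry data.
toy = {
    "tumor_size": [1, 2, 3, 4, 5, 6],
    "nodes": [0, 1, 1, 2, 3, 3],
    "age": [50, 60, 60, 50, 50, 60],
}
edges = correlation_edges(toy)
print(edges)
```

In this toy example only the pair of tumor size and involved nodes passes the threshold, so only that connection would be drawn. The direction of each connection would still have to be chosen separately.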
Result: Included variables were age, primary tumor size, involved lymph nodes, grade, hormone status, multifocality and whether patients were treated with radiotherapy, chemotherapy or hormone therapy. The correlation-based BBN structure had the most connections between variables (11, versus 10 in the TAN-structured BBN). The naïve structure gave the lowest area under the ROC curve in all cases, and logistic regression gave the highest in all but one (Table 1).
Conclusion: As SP tumors are independent of the primary tumor, they are harder to predict using conventional clinical data. Although logistic regression does not provide the additional information on influences between variables that a BBN does, this analysis suggests that it is still more accurate for estimating the risk of both LRRs and SP tumors.
Table 1. Area under the ROC curve for the different models.

| Model | LRR | SP | LRR (internal validation) | SP (internal validation) | LRR (external validation) | SP (external validation) |
|---|---|---|---|---|---|---|
| Logistic regression | 0.712 | 0.647 | 0.712 | 0.647 | 0.701 | 0.635 |
| BBN (naïve) | 0.627 | 0.584 | 0.500 | 0.500 | 0.574 | 0.547 |
| BBN (TAN) | 0.688 | 0.616 | 0.710 | 0.633 | 0.664 | 0.629 |
| BBN (correlation) | 0.627 | 0.586 | 0.684 | 0.695 | 0.618 | 0.571 |
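To help with reading the values in Table 1, the following Python sketch computes the area under the ROC curve from its rank interpretation: the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. The function name and the toy labels and scores are hypothetical, and the study's own values came from STATA and Netica, not from this code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for patients with the event, 0 for patients without it.
    scores: predicted risks, where higher means more likely to have the event.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy values, not registry data.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)

# A model that assigns every patient the same risk cannot discriminate.
constant = roc_auc(labels, [0.2] * 4)
print(auc, constant)
```

Here three of the four positive/negative pairs are ranked correctly, giving 0.75, and the constant-risk model gives 0.5. An area of 0.500 therefore corresponds to no discrimination between patients with and without the event.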