Tuesday, October 21, 2014
Poster Board # PS3-52

Mucahit Cevik, MS, University of Wisconsin Madison, Madison, WI, Mark Craven, PhD, University of Wisconsin - Madison, Madison, WI, Natasha K. Stout, Ph.D., Department of Population Medicine, Boston, MA, Amy Trentham-Dietz, PhD, University of Wisconsin, Madison, WI and Oguzhan Alagoz, PhD, University of Wisconsin-Madison, Madison, WI

Purpose: Calibrating a model to estimate unobservable input parameters often involves evaluating a large number of candidate parameter sets to find suitable values, which makes it a laborious and time-consuming activity. Our objective was to improve the calibration process using machine learning tools such as artificial neural networks (ANNs). ANNs are computational algorithms that have been shown to be successful in prediction; coupled with active learning, they are well suited to efficiently identifying promising candidate input parameter sets.

Methods: As our test bed, we used a representative calibration process for a breast cancer simulation model that involved examining 378,000 input parameter sets, of which 69 were considered to produce good model fit. We first evaluated a random subset of parameter sets and constructed an ANN model to predict which of all possible sets are more likely to generate the desired outputs. We then improved the predictive accuracy of the ANN model with an active learning approach, in which the model is trained iteratively and a small number of input vectors is selected for evaluation by the simulation at every iteration. Active learning allowed us to start with a smaller number of initial simulation runs as our training set and to enlarge the training set gradually by choosing the most promising parameter sets.
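The iterative procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `simulate` function, the `GOOD_THRESHOLD` cutoff, the `TinyANN` class, the 30 x 30 parameter grid, and the batch sizes are all hypothetical stand-ins chosen for illustration. The sketch also assumes the surrogate network predicts a continuous fit score rather than a good/bad label, which the abstract does not specify.

```python
import math
import random


def simulate(params):
    """Hypothetical stand-in for the expensive simulation model.

    Returns a fit score (higher is better) for a 2-parameter input.
    """
    x, y = params
    return -((x - 0.7) ** 2 + (y - 0.3) ** 2)


GOOD_THRESHOLD = -0.005  # assumed cutoff for a "good fit" (illustrative only)


class TinyANN:
    """One-hidden-layer network (tanh units, linear output) trained by SGD."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train(self, data, epochs=30, lr=0.05):
        # Plain stochastic gradient descent on squared error.
        for _ in range(epochs):
            for x, target in data:
                h = self._hidden(x)
                err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - target
                for j, hj in enumerate(h):
                    # Gradient through tanh, using the output weight before its update.
                    grad_h = err * self.w2[j] * (1 - hj * hj)
                    self.w2[j] -= lr * err * hj
                    self.b1[j] -= lr * grad_h
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * grad_h * xi
                self.b2 -= lr * err


def active_learning(candidates, n_initial, batch_size, n_rounds, seed=0):
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    # Step 1: run the simulation on a random initial subset.
    evaluated = {p: simulate(p) for p in pool[:n_initial]}
    model = TinyANN(n_in=2, n_hidden=8, rng=rng)
    for _ in range(n_rounds):
        # Step 2: continue training the surrogate on everything evaluated so far.
        model.train(list(evaluated.items()))
        # Step 3: rank unevaluated candidates by predicted fit score.
        remaining = [p for p in candidates if p not in evaluated]
        if not remaining:
            break
        remaining.sort(key=model.predict, reverse=True)
        # Step 4: simulate only the most promising batch and add it to the training set.
        for p in remaining[:batch_size]:
            evaluated[p] = simulate(p)
    return evaluated


candidates = [(i / 29, j / 29) for i in range(30) for j in range(30)]
evaluated = active_learning(candidates, n_initial=50, batch_size=10, n_rounds=10)
good = [p for p, score in evaluated.items() if score >= GOOD_THRESHOLD]
print(f"evaluated {len(evaluated)} of {len(candidates)}; found {len(good)} good sets")
```

In this sketch the surrogate is retrained from its current weights at each round, and each round spends the simulation budget only on the candidates it ranks highest. How many good sets the toy run finds depends on the random seed and the assumed settings.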

Results: We initially evaluated 2,000 parameter sets and built our ANN model on this training set. Using the active learning approach, we found all 69 good-fitting parameter sets by evaluating only 4,500 of the 378,000 combinations. By comparison, the initial ANN model without active learning required evaluating more than 15,000 input vectors. Figure 1 shows the number of good vectors found by the ANN model alone and by the ANN coupled with active learning as the number of evaluations increases.

Conclusions: For many simulation models that require calibration, evaluating all parameter combinations is prohibitive. Machine learning methods can guide model developers in selecting more promising parameter combinations and hence speed up the calibration process. Our tests on a previously developed breast cancer simulation model showed that evaluating only 1.2% of all combinations would be sufficient to calibrate this model.

Figure 1: Active Learning vs ANN