3I ORAL ABSTRACTS: IMPROVING MODELING RESEARCH

Tuesday, October 25, 2016: 10:30 AM - 12:00 PM
Bayshore Ballroom Salon F, Lobby Level (Westin Bayshore Vancouver)
Moderator:

Mark S. Roberts, MD, MPH
University of Pittsburgh School of Medicine

10:30 AM
3I-1

Yao-Hsuan Chen, Ph.D.1, Daniel Brachey, B.S.2, Matthew Farkas, B.S.2, Shabbir Ahmed, Ph.D.3, Joel Sokol, Ph.D.2, Paul G. Farnham, Ph.D.1, Brian M. Gurbaxani, Ph.D.1 and Stephanie L. Sansom, PhD, MPP, MPH1, (1)Centers for Disease Control and Prevention, Atlanta, GA, (2)School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA, (3)School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA

Purpose:

Modelers can improve unsatisfactory model calibration results by relaxing the requirements for the model parameters or outcomes to be calibrated, but doing so can increase model uncertainty. In this study, we show that implementing the Optimization algorithm-based Calibration Approach (OCA) can significantly improve calibration results without increasing model uncertainty.

Methods:

Using OCA, modelers first transform a calibration task into an optimization problem. In the problem, the objective function quantifies the calibration gap (distance between model outcomes and calibration targets), and constraint functions enforce calibration requirements, such as maintaining feasible bounds of calibration parameters. Modelers then choose an optimization algorithm to minimize the calibration gap, keeping all feasible calibration sets for uncertainty analysis, and report the optimal calibration set for future base-case analysis.
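This formulation can be sketched in miniature as follows. The three-parameter model, targets, and bounds below are hypothetical stand-ins for a real simulation model, and scipy's bounded L-BFGS-B is assumed as the chosen optimization algorithm; a real OCA implementation would wrap the actual model run.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical calibration targets (e.g., prevalence, incidence, deaths).
targets = np.array([1.2e6, 3.8e4, 1.6e4])

def model_outcomes(params):
    # Stand-in for a model run; a real model would simulate disease
    # dynamics and return the outcomes matched to the targets.
    return params * np.array([1e6, 1e5, 1e5])

def calibration_gap(params):
    # Objective function: distance between model outcomes and targets,
    # normalized so each target contributes on a comparable scale.
    return np.sum(((model_outcomes(params) - targets) / targets) ** 2)

# Calibration requirements enter as feasible bounds on each parameter.
bounds = [(0.5, 2.0), (0.1, 1.0), (0.05, 0.5)]
result = minimize(calibration_gap, x0=np.array([1.0, 0.5, 0.2]),
                  method="L-BFGS-B", bounds=bounds)
print(result.x, result.fun)
```

The optimizer's best feasible point would be reported as the optimal calibration set, while the other feasible sets visited during the search can be retained for uncertainty analysis.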

We followed this procedure to calibrate the HIV Optimization and Prevention Economics (HOPE) model, a compartmental model of HIV disease progression and transmission in the United States. We calibrated 123 of 870 parameters to fit three model outcomes (prevalence, incidence, and deaths) to their targets. We compared differences in calibration precision when the standard OCA procedure was varied by (1) the optimization algorithm: pattern search (PS) versus simulated annealing (SA); (2) the search starting point: a random point versus a good point (one with a low objective function value); and (3) the search strategy: a general search over all calibrated parameters versus a concentrated search over prioritized parameters. We used the best calibration solution from the Latin hypercube algorithm and one-way sensitivity analysis results around this solution to inform (2) and (3), respectively.
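The contrast between a general and a concentrated search can be illustrated with a toy objective in which a few parameters dominate the fit, playing the role that one-way sensitivity analysis plays for the real model. Everything here (dimension, weights, starting point) is hypothetical, and Powell's method stands in for the pattern-search-style algorithms used in the study.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_params = 10  # toy stand-in for the 123 calibrated parameters
true = rng.uniform(0.2, 0.8, n_params)
weights = np.geomspace(1.0, 1e-3, n_params)  # a few parameters dominate

def gap(params):
    # Hypothetical calibration gap: weighted distance to the outcomes
    # generated by the (normally unknown) true parameter values.
    return np.sum(weights * (params - true) ** 2)

start = np.full(n_params, 0.5)

# General search: optimize over all parameters at once.
general = minimize(gap, start, method="Powell")

# Concentrated search: a sensitivity ranking (here, the weights) picks
# the most influential parameters; the rest stay at their start values.
top = np.argsort(weights)[::-1][:3]

def gap_top(sub):
    params = start.copy()
    params[top] = sub
    return gap(params)

concentrated = minimize(gap_top, start[top], method="Powell")
print(general.fun, concentrated.fun)
```

With an unlimited budget the general search wins, as here; the study's point is that under a fixed time budget (48 hours), concentrating the search on prioritized parameters closed more of the gap.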

Results:

Table 1 summarizes the results from running each of the six OCA settings for no more than 48 hours to solve the calibration problem. Although the choice of optimization algorithm and starting point did not significantly affect calibration precision, the concentrated search strategy, based on insights from the one-way sensitivity analysis of calibrated parameters, significantly improved calibration performance, decreasing the gaps between model outcomes and their corresponding targets under the standard OCA by 61% ((4.75-1.85)/4.75) for PS and 51% ((5.68-2.81)/5.68) for SA.

Conclusion:

Modelers should explore how different OCA options can close the calibration gap before resorting to relaxing calibration requirements, such as altering the model outcome target values.


10:45 AM
3I-2

Stavroula Chrysanthopoulou, PhD, University of Massachusetts Medical School, N. Worcester, MA
Purpose:

The purpose of this study is to discuss statistical methods for calibrating and assessing the predictive accuracy of continuous-time, dynamic microsimulation models (MSMs) used in medical decision making.

Method:

We apply two fundamental approaches, a Bayesian and an empirical one, to calibrate the Microsimulation Lung Cancer (MILC) model, a streamlined MSM that describes the natural history of lung cancer and predicts important outcomes such as lung cancer incidence and mortality. We compare the two methods in terms of theoretical underpinnings, the overlap in the resulting parameter values, and the validity of the predictions of the final calibrated model each produces.

Furthermore, we discuss statistical methods for an important yet rather overlooked aspect of MSMs, namely the assessment of their predictive accuracy. In particular, we run a simulation study to compare the individual predictions from the calibrated MILC model with simulated outcomes using C-statistics, a group of methods widely used for assessing the predictive accuracy of survival models. We also compare the performance of C-statistics with other methods aimed at testing deviations of the survival distributions predicted by the calibrated MSM from the simulated truth.
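As an illustration of the C-statistic idea applied to individual predictions, here is a small self-contained implementation of Harrell's concordance index on made-up survival data (not the MILC model itself): among usable pairs, it counts how often the individual who fails earlier was assigned the higher predicted risk.

```python
import itertools

def concordance_index(times, events, risk_scores):
    """Harrell's C: among usable pairs, the fraction in which the
    individual with the earlier failure has the higher predicted risk
    (ties in risk count as half-concordant)."""
    concordant, usable = 0.0, 0
    for (t1, e1, r1), (t2, e2, r2) in itertools.combinations(
            zip(times, events, risk_scores), 2):
        # A pair is usable only if the earlier time is an observed event.
        if t1 == t2:
            continue
        if t1 < t2 and not e1:
            continue
        if t2 < t1 and not e2:
            continue
        usable += 1
        earlier_risk, later_risk = (r1, r2) if t1 < t2 else (r2, r1)
        if earlier_risk > later_risk:
            concordant += 1
        elif earlier_risk == later_risk:
            concordant += 0.5
    return concordant / usable

# Hypothetical data: predicted risks against simulated survival times
# (event indicator 0 = censored).
times = [2.0, 5.0, 3.0, 8.0]
events = [1, 1, 0, 1]
scores = [0.9, 0.2, 0.5, 0.4]
print(concordance_index(times, events, scores))  # → 0.75
```

A C near 1 means the model ranks individuals well; the abstract's finding is that this rank-based summary can remain high even when the predicted survival distributions deviate from the simulated truth.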

Result:

While empirical calibration methods prove more efficient, Bayesian methods seem to perform better, especially when calibration targets involve rare outcomes. C-statistics are not very sensitive to deviations of the individual predictions from the simulated truth. Methods based on comparing predicted with observed survival distributions prove more effective for assessing the predictive accuracy of continuous-time MSMs.

Conclusion:

An effective calibration procedure for an MSM should combine an empirical approach, for more efficient specification of plausible values for the model parameters, with a Bayesian method that provides more accurate results by choosing appropriate starting values from the previously defined ranges. In addition, techniques based on comparing predicted with observed survival curves seem to outperform C-statistics with regard to assessing the accuracy of the individual predictions from a continuous-time MSM.

11:00 AM
3I-3

Christoph Zimmer, PhD, Reza Yaesoubi, PhD and Ted Cohen, DPH, MD, MPH, Yale School of Public Health, New Haven, CT

Purpose:

During the period of initial emergence of novel pathogens, the accurate estimation of key epidemic parameters (such as expected number of secondary cases) is challenging because observed metrics (e.g. the number of pathogen-associated hospitalizations) only partially reflect the true state of the epidemic. Stochastic transmission dynamic models are especially useful for guiding decisions during the emergence of novel pathogens given the importance of chance events and the fluctuations in observations when the number of infectious individuals is small. Our goal is to develop and evaluate a method for real-time calibration of stochastic compartmental models using observed, but likely imperfect, epidemic data.

Method:

We develop a calibration method, called Multiple Shooting for Stochastic systems (MSS), that seeks to maximize the likelihood of the epidemic observations. MSS applies a linear noise approximation to describe the size of the fluctuations, and uses each new surveillance observation to update the belief about the true epidemic state. Using simulated novel viral pathogen outbreaks (Figure A), we evaluate our method's performance throughout epidemics of various magnitudes and host population sizes. In this analysis, we assume that the weekly number of newly diagnosed cases is available and serves as an imperfect proxy of disease incidence. We further compare the performance of MSS to that of three state-of-the-art and commonly used benchmark methods: Method A, a likelihood approximation assuming independent Poisson observations; Method B, a particle filter method; and Method C, an ensemble Kalman filter method. We use the Wilcoxon signed-rank test to evaluate the hypothesis that the median relative error of MSS is smaller than that of each benchmark method.
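Benchmark Method A, the independent-Poisson likelihood approximation, can be sketched on a toy outbreak. Here a deterministic SIR mean stands in for the transmission model, all parameter values are hypothetical, and the maximum-likelihood estimate of R0 is found by a simple grid search rather than the methods used in the study.

```python
import numpy as np
from scipy.stats import poisson

def weekly_incidence(r0, n=10000, i0=5, weeks=12, gamma=1.0):
    # Deterministic SIR mean incidence as a stand-in for the epidemic
    # model's expected weekly case counts (time step = one week).
    beta = r0 * gamma
    s, i = n - i0, i0
    out = []
    for _ in range(weeks):
        new_inf = min(beta * s * i / n, s)
        s -= new_inf
        i += new_inf - gamma * i
        out.append(new_inf)
    return np.array(out)

# Simulated surveillance: observed weekly diagnoses are Poisson draws
# around the true incidence (the independent-observation assumption).
rng = np.random.default_rng(1)
true_r0 = 2.0
observed = rng.poisson(weekly_incidence(true_r0))

def log_likelihood(r0):
    mu = np.maximum(weekly_incidence(r0), 1e-9)
    return poisson.logpmf(observed, mu).sum()

grid = np.linspace(1.2, 3.0, 181)
r0_hat = grid[np.argmax([log_likelihood(r) for r in grid])]
print(r0_hat)
```

MSS differs from this benchmark by modeling the correlated fluctuations of the stochastic epidemic (via the linear noise approximation) instead of treating the weekly observations as independent.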

Results:

Our results (Figures B-D) show that MSS produces accurate estimates of the basic reproductive number R0, the effective R0, and the unobserved number of infectious individuals throughout epidemics. MSS also allows for accurate prediction of the number and timing of future cases and the overall attack rate (Figures E-F). The p-values displayed in Figures B-F confirm that, for the majority of scenarios studied here, MSS statistically outperforms the three competing benchmark methods.

Conclusions:

MSS improves on current approaches for model-based parameter estimation and prediction for epidemics and may thus allow for policy makers to respond more effectively and use resources more efficiently in the face of emerging epidemic threats.

11:15 AM
3I-4

Annemieke Witteveen, MSc1, Ingrid M.H. Vliegen, PhD1, Prof. Sabine Siesling, PhD2 and Prof. Maarten J. IJzerman, PhD1, (1)University of Twente, Enschede, Netherlands, (2)Comprehensive Cancer Organisation the Netherlands (IKNL), Utrecht, Netherlands
Purpose:   Accurate breast cancer recurrence risk estimates are required for the development of individualized follow-up schemes. Current risk prediction is often based on logistic regression. In this study, logistic regression estimates were compared with estimates obtained from Bayesian belief networks (BBNs).

Method:   Women first diagnosed with early breast cancer (T1-3NanyM0) between 2003 and 2006 were selected from the Netherlands Cancer Registry (NCR, N=37,320). Based on the literature and the availability of the data, risk factors for locoregional recurrences (LRRs) and second primary (SP) tumors within five years of first diagnosis were included in both models. Logistic regression was performed in Stata 14.0 and the BBNs were built in Netica (Norsys). BBN structures were developed using naïve Bayes, tree-augmented naïve Bayes (TAN), and correlation-based techniques; for the latter, a correlation of >0.3 was used to define connections. The BBNs were compared with the logistic regression model using the area under the ROC curve and validated using NCR data from 2007-2008 (N=12,308).
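The logistic-regression-plus-AUC workflow can be sketched on synthetic data (not NCR data; the two risk factors and coefficients are made up for illustration, and the fit uses plain gradient ascent rather than Stata's estimator).

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical stand-in for registry data: two continuous risk factors
# and a binary five-year recurrence outcome generated from known odds.
n = 2000
X = rng.normal(size=(n, 2))
logits = -2.0 + 1.0 * X[:, 0] + 0.5 * X[:, 1]
y = rng.random(n) < 1 / (1 + np.exp(-logits))

# Fit logistic regression by gradient ascent on the log-likelihood.
Xb = np.column_stack([np.ones(n), X])
beta = np.zeros(3)
for _ in range(5000):
    p = 1 / (1 + np.exp(-Xb @ beta))
    beta += 0.1 * Xb.T @ (y - p) / n

def auc(scores, labels):
    # Area under the ROC curve via the rank-sum (Mann-Whitney) identity.
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels.astype(bool)
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

risk = 1 / (1 + np.exp(-(Xb @ beta)))
print(beta, auc(risk, y))
```

The fitted model is then scored on a held-out period, as the abstract does with the 2007-2008 NCR cohort, and its AUC compared against the BBN variants.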

Result:   Included variables were age, primary tumor size, involved lymph nodes, grade, hormone status, multifocality, and whether or not patients were treated with radio-, chemo-, or hormone therapy. The BBN structure based on correlation had the most connections between variables (11, vs. 10 in the TAN-structured BBN). The naïve structure gave the worst estimates in all cases and logistic regression the best in all but one (Table 1).

Conclusion:   As SP tumors are independent of the primary tumor, they are harder to predict using conventional clinical data. Although logistic regression does not capture the extra information from influences between variables, this analysis suggests that it is still more accurate for risk estimation for both LRRs and SP tumors.

Table 1 Area under the ROC curves for the different models.

                                      Internal validation   External validation
Model                 LRR     SP        LRR      SP           LRR      SP
Logistic regression   0.712   0.647     0.712    0.647        0.701    0.635
BBN (naïve)           0.627   0.584     0.500    0.500        0.574    0.547
BBN (TAN)             0.688   0.616     0.710    0.633        0.664    0.629
BBN (correlation)     0.627   0.586     0.684    0.695        0.618    0.571

11:30 AM
3I-5

Stephen Sy, MS1, Ankur Pandya, PhD1 and Thomas Gaziano, MD, MSc2, (1)Harvard T.H. Chan School of Public Health, Boston, MA, (2)Harvard Medical School, Boston, MA

Purpose:

Disease modelers often conduct external validation using cross-sectional population-level outcomes (e.g., mortality rates), but a unique validation opportunity arises when modelers have access to individual-level longitudinal data.

Methods:

We developed a cardiovascular disease (CVD) micro-simulation model that simulates lifetime CVD incidence and mortality.  The model requires individual-level data, drawing randomly with replacement from a representative individual-level dataset and simulating the remainder of each individual's life.  For this exercise, we used individual-level CVD risk factor data (age, sex, cholesterol, etc.) from the 1999-2000 National Health and Nutrition Examination Survey (NHANES) population, which has follow-up all-cause mortality and CVD mortality data for each individual through 2011.  We validated our simulation model to mortality outcomes using two distinct approaches.

Survival curves: We simulated 1,000,000 individuals through the model and tracked their yearly survival. We compared annual average model population-level all-cause and CVD mortality rates against those observed in the NHANES population. Non-parametric bootstrapping was used to calculate 95% confidence intervals for the observed mortality rates.
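The bootstrap step can be sketched as follows, with synthetic binary outcomes standing in for the NHANES linked mortality data (the cohort size matches the abstract; the mortality probability is illustrative).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observed outcomes: 1 = died within five years, 0 = survived.
deaths = rng.random(2689) < 0.043
observed_rate = deaths.mean()

# Non-parametric bootstrap: resample individuals with replacement and
# take percentiles of the resampled mortality rates as the 95% CI.
boot = np.array([rng.choice(deaths, size=deaths.size, replace=True).mean()
                 for _ in range(2000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(observed_rate, lo, hi)
```

The model's simulated mortality rate is then judged against this interval: falling inside it is the survival-curve notion of validation used here.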

ROC curves: We used the same NHANES population and simulated each individual through the model 1,000 times, calculating the percent of iterations each individual died (all-cause or CVD) at five- and ten-year intervals. Individuals were ranked by these values to characterize model-based risk. We then compared these individual-level model-based risk rankings to observed individual-level mortality outcomes in the NHANES data, treating the model as a diagnostic test for mortality risk (where observed outcomes were the reference standard). Receiver operating characteristic (ROC) curves were constructed to calculate area under the curve (AUC) values.
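The ROC construction described here can be sketched with synthetic risks and outcomes; the model-based risk (the fraction of 1,000 iterations in which an individual died) is simulated directly from an assumed true risk rather than produced by a CVD model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: each individual's true five-year death probability,
# the model-based risk from 1,000 stochastic iterations, and the single
# observed outcome that the linked follow-up data would supply.
n = 1000
true_risk = rng.beta(1, 9, size=n)
model_death_frac = rng.binomial(1000, true_risk) / 1000
observed = rng.random(n) < true_risk

# Treat the model as a diagnostic test: sweep a threshold over the
# model-based risk, with observed deaths as the reference standard.
thresholds = np.unique(model_death_frac)[::-1]
tpr = [(model_death_frac >= t)[observed].mean() for t in thresholds]
fpr = [(model_death_frac >= t)[~observed].mean() for t in thresholds]

# AUC via the trapezoidal rule over the ROC points, anchored at (0, 0).
fpr_all = np.concatenate([[0.0], fpr])
tpr_all = np.concatenate([[0.0], tpr])
auc = np.sum(np.diff(fpr_all) * (tpr_all[1:] + tpr_all[:-1]) / 2)
print(round(auc, 3))
```

As the Conclusion notes, this AUC measures discrimination (ranking) only; the survival-curve comparison above is still needed to check absolute risk.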

Results:

Using survival curves, five-year all-cause mortality for the simulation model compared to NHANES observed outcomes (n=2,689) was 4.6% versus 4.3% (95% CI: 3.7-4.9%); five-year CVD mortality was 1.2% versus 1.1% (0.8-1.4%).  At ten years, corresponding values were 10.9% versus 11.2% (10.3-12.2%) and 2.6% versus 2.2% (1.8-2.7%).  AUCs for all-cause and CVD mortality at 5 years were 0.80 (0.77-0.83) and 0.82 (0.75-0.88) respectively, and at ten years, 0.83 (0.81-0.85) and 0.85 (0.81-0.88) respectively (Figure).

Conclusion:

Solely relying on population-level survival curves could lead to individual-level mismatch of risk and outcomes; AUC performance alone does not take absolute risk into account.  Our CVD model validation exercise demonstrates that both methods in tandem can provide a well-rounded model performance summary.

11:45 AM
3I-6

Fernando Alarid-Escudero, MS, PhD Candidate, Division of Health Policy and Management, University of Minnesota, Minneapolis, MN, Eva A. Enns, MS, PhD, University of Minnesota, Minneapolis, MN, Chung Yin Kong, PhD, Harvard Medical School, Boston, MA and Lauren E. Cipriano, Ph.D., Ivey Business School at Western University, London, ON, Canada

Purpose:  Disease natural history models often contain parameters that are unknown or unobservable for ethical or financial reasons. Calibration is the process of estimating these parameters by matching model outputs to observed clinical or epidemiological data. Our objective is to compare four different calibration methods on how well they recover the true parameters.

Method:  Using a known set of parameters, we used a state-transition model with four health states (Healthy, two stages of illness (S1 and S2), and Dead) to simulate 1,000 individuals over 30 years in a microsimulation fashion. We produced three different sets of targets: survival, disease prevalence, and the log-ratio between the two stages of illness. We repeated this procedure 100 times to generate multiple sets of calibration targets. We then calibrated a cohort version of the model, assuming three input parameters were unknown, using four approaches: (1) a goodness-of-fit (GoF) approach based on absolute differences with equal weights, (2) the same GoF approach with unequal weights, (3) a Bayesian sampling-importance-resampling (SIR) approach, and (4) a Pareto frontier approach. We considered scenarios of varying calibration target data availability, with observations every 1, 2, 5, and 10 years. We compared the calibration approaches using three metrics: (1) the root mean square error (RMSE) between best-fitting input sets and the true parameter values, (2) the proportion of simulations in which the true parameter values are contained within the bounding ellipse of the best-fitting parameters (coverage), and (3) the minimum quantile ellipse that contains the true parameter values.
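The Bayesian SIR approach can be sketched on a one-parameter toy model; the model, prior bounds, and assumed target standard error below are all hypothetical, not the study's state-transition model.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy stand-in for the natural history model: one unknown parameter
# (annual progression probability) mapped to a prevalence-like target.
def model_output(p):
    return 1 - (1 - p) ** 10  # 10-year cumulative progression

true_p = 0.05
target = model_output(true_p)
target_se = 0.02  # assumed sampling error of the calibration target

# Sampling-importance-resampling:
# 1) sample candidate parameters from the prior,
samples = rng.uniform(0.0, 0.2, size=20000)
# 2) weight each candidate by the likelihood of the observed target,
weights = np.exp(-0.5 * ((model_output(samples) - target) / target_se) ** 2)
weights /= weights.sum()
# 3) resample proportionally to the weights to approximate the posterior.
posterior = rng.choice(samples, size=5000, replace=True, p=weights)
print(posterior.mean(), np.percentile(posterior, [2.5, 97.5]))
```

The coverage metric in the abstract then asks whether an ellipse (here, an interval) around the best-fitting values contains the true parameter, across repeated target sets.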

Result:  For the scenario with targets every 5 years (i.e., 18 calibration targets), the Bayesian approach yielded the smallest RMSE, followed by the Pareto frontier approach. The Pareto frontier had the highest coverage, with 94% of the 95% bounding ellipses including the true parameters, followed by GoF with unequal weights at 82%. GoF with equal weights and the Pareto frontier both had the lowest minimum coverage, at 76%. The remaining results for this scenario are shown in the table. As the number of targets increased, all calibration approaches improved.

Conclusion:  Recovering the truth depends on many system and model properties. The choice of calibration targets matters and, contrary to what we expected, more targets may not necessarily be better.

https://smdm.confex.com/data/abstract/smdm/16BEC/Paper_9884_abstract_9241_0.gif