OPTIMIZING PARAMETERS FOR SCREENING MODELS: A REVIEW AND COMPARISON OF DIFFERENT GOODNESS OF FIT CRITERIA
Candidate for the Lee B. Lusted Student Prize Competition
Purpose:
We systematically compared the performance of four commonly used goodness-of-fit (GOF) criteria. The choice of GOF criterion is an important aspect of model calibration (parameter estimation).
Methods:
We implemented four commonly used GOF criteria (sum of squared errors (SSE), Pearson chi-square, Poisson deviance and binomial deviance) in the Nelder-Mead simplex calibration algorithm of the MISCAN-Colon microsimulation model for colorectal cancer. We first used the MISCAN-Colon model with fixed, known parameters to generate a hypothetical dataset of observations for calibration. Next, we used this hypothetical dataset to estimate the sensitivity of the screen test for cancers and for large adenomas with each of the four GOF criteria. To address parameter and stochastic uncertainty, each calibration was repeated with 100 unique sets of random starting values for the calibrated parameters. We assessed the performance of each GOF criterion by comparing the estimated parameters with the parameters used to generate the hypothetical dataset, in terms of bias and root mean squared prediction error (RMSPE). In addition, we compared the computation time of the calibration procedure, measured as the average number of iterations. To assess the robustness of our results, we performed sensitivity analyses on the type and number of parameters to be estimated and on the datasets available for calibration.
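The four criteria can each be written in a few lines. The sketch below is a minimal illustration, assuming the standard textbook forms applied to observed counts o and model-expected counts e (and, for the binomial deviance, the number at risk n, with e = n·p). The MISCAN-Colon implementation may scale or weight these differently, and all function names here are hypothetical.

```python
import math


def _o_log_o_over_e(o, e):
    """o * ln(o / e), using the usual convention 0 * ln(0) = 0."""
    return o * math.log(o / e) if o > 0 else 0.0


def sse(observed, expected):
    """Sum of squared errors: unscaled squared differences."""
    return sum((o - e) ** 2 for o, e in zip(observed, expected))


def pearson_chi_square(observed, expected):
    """Pearson chi-square: each squared difference divided by its expected count."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def poisson_deviance(observed, expected):
    """Poisson deviance: 2 * sum[o ln(o/e) - (o - e)]."""
    return 2.0 * sum(_o_log_o_over_e(o, e) - (o - e) for o, e in zip(observed, expected))


def binomial_deviance(observed, expected, totals):
    """Binomial deviance for o events out of n, where e = n * p is the expected count."""
    return 2.0 * sum(
        _o_log_o_over_e(o, e) + _o_log_o_over_e(n - o, n - e)
        for o, e, n in zip(observed, expected, totals)
    )


# Hypothetical example: two data points with 100 persons each.
obs, exp, n = [30, 12], [25.0, 15.0], [100, 100]
print(sse(obs, exp), pearson_chi_square(obs, exp))
print(poisson_deviance(obs, exp), binomial_deviance(obs, exp, n))
```

Unlike the other three criteria, the SSE does not scale each squared difference by its expected count, so data points with large counts dominate the sum. This is one plausible explanation for why the SSE behaves differently when datasets with differing sample sizes are combined.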
Results:
In the base case scenario, the mean estimated sensitivity of the screen test for cancers ranged from 0.6838 to 0.6843 across the four GOF criteria (underlying value 0.684, Table). For the sensitivity of the screen test for large adenomas, the mean estimate was exactly equal to the underlying value of 0.179 for all four GOF criteria. The RMSPE and the required number of iterations were slightly higher for the SSE than for the other GOF criteria. In all sensitivity analyses, the mean estimated parameters remained close to the underlying values when using the Pearson chi-square, the Poisson deviance or the binomial deviance (Table). In contrast, the SSE criterion produced biased parameter estimates when calibrating highly correlated parameters or when calibrating to datasets with differing sample sizes.
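The bias, RMSPE and iteration counts compared above summarize many calibrations started from random values. The sketch below reproduces that loop on a deliberately tiny, hypothetical problem: a single sensitivity parameter calibrated with the binomial deviance to invented counts of 342 detections among 500 screened cases (342/500 = 0.684), using a plain textbook Nelder-Mead simplex. It assumes RMSPE is the root mean squared difference between each estimate and the underlying value. All names, counts, seeds and tolerances are illustrative assumptions, not those of MISCAN-Colon.

```python
import math
import random


def sensitivity_deviance(p, detected, screened):
    """Binomial deviance of a candidate sensitivity p (hypothetical toy model)."""
    if not 0.0 < p < 1.0:
        return math.inf  # outside the valid probability range
    dev = 0.0
    for o, n in zip(detected, screened):
        e = n * p  # expected detections under sensitivity p
        if o > 0:
            dev += o * math.log(o / e)
        if n - o > 0:
            dev += (n - o) * math.log((n - o) / (n - e))
    return 2.0 * dev


def nelder_mead(f, x0, step=0.1, ftol=1e-12, xtol=1e-8, max_iter=500):
    """Textbook Nelder-Mead simplex; returns (best point, best value, iterations)."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    fvals = [f(v) for v in simplex]
    iterations = 0
    for iterations in range(1, max_iter + 1):
        order = sorted(range(n + 1), key=lambda i: fvals[i])
        simplex = [simplex[i] for i in order]
        fvals = [fvals[i] for i in order]
        size = max(abs(a - b) for v in simplex[1:] for a, b in zip(v, simplex[0]))
        if fvals[-1] - fvals[0] < ftol and size < xtol:
            break  # converged in both function value and simplex size
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < fvals[0]:  # try to expand further in the same direction
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            simplex[-1], fvals[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < fvals[-2]:  # accept the reflection
            simplex[-1], fvals[-1] = reflected, f_r
        else:  # contract towards the better of the reflected and worst points
            target = reflected if f_r < fvals[-1] else worst
            contracted = [c + 0.5 * (t - c) for c, t in zip(centroid, target)]
            f_c = f(contracted)
            if f_c < min(f_r, fvals[-1]):
                simplex[-1], fvals[-1] = contracted, f_c
            else:  # shrink every vertex halfway towards the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (a - b) for a, b in zip(v, best)] for v in simplex[1:]]
                fvals = [fvals[0]] + [f(v) for v in simplex[1:]]
    k = min(range(n + 1), key=lambda i: fvals[i])
    return simplex[k], fvals[k], iterations


def summarize(estimates, true_value):
    """Mean estimate, bias and RMSPE of repeated calibrations against the known value."""
    mean = sum(estimates) / len(estimates)
    rmspe = math.sqrt(sum((x - true_value) ** 2 for x in estimates) / len(estimates))
    return mean, mean - true_value, rmspe


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the random starts are reproducible
    detected, screened = [342], [500]  # hypothetical counts: 342 / 500 = 0.684
    estimates, iteration_counts = [], []
    for _ in range(100):  # 100 random starting values, mirroring the study design
        x0 = [rng.uniform(0.05, 0.85)]
        best, best_value, its = nelder_mead(
            lambda x: sensitivity_deviance(x[0], detected, screened), x0)
        estimates.append(best[0])
        iteration_counts.append(its)
    mean, bias, rmspe = summarize(estimates, 0.684)
    print(f"mean={mean:.4f} bias={bias:.2e} RMSPE={rmspe:.2e} "
          f"mean iterations={sum(iteration_counts) / len(iteration_counts):.1f}")
```

Because the toy counts are noise-free, every start should recover 0.684 closely; the point of the sketch is the bookkeeping (random starts, bias, RMSPE, iteration counts), not the numbers it prints.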
Conclusion:
Among the most commonly applied goodness-of-fit criteria, the likelihood-based criteria (Poisson deviance and binomial deviance) and the Pearson chi-square performed best: they led to accurate parameter estimates under various circumstances. This study demonstrates that the use of the SSE can easily lead to biased parameter estimates.