M-6 MONTE CARLO APPROACH TO CALIBRATION OF DISEASE HISTORY MODELS FOR HEALTH TECHNOLOGY ASSESSMENT: A CASE STUDY

Wednesday, October 27, 2010: 11:30 AM
Grand Ballroom Centre (Sheraton Centre Toronto Hotel)
Ba' Pham, MSc, PhD(c)1, George Tomlinson, PhD2, Paul Grootendorst, PhD3 and Murray D. Krahn, MD, MSc2, (1)Toronto Health Economics and Technology Assessment Collaborative, Toronto, ON, Canada, (2)University of Toronto, Toronto, ON, Canada, (3)Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada

Purpose: Monte Carlo (MC) calibration (i.e., random search) is conceptually simple and frequently used in the development of disease history models for economic evaluation. We evaluate whether MC calibration recovers at least approximately correct values of unknown inputs to a hypothetical model.

Methods: Hypothetical model: a simplified disease history model of pressure ulcers (i.e., bed sores) in individuals receiving home care. The Markov model includes 3 health states (i.e., ulcer stages 0-1, 2, and 3-4) with four transition parameters: the weekly incidence of developing a stage-2 ulcer (λ0), the healing rates of stage-2 (q1) and stage-3-4 ulcers (q3), and the progression rate from stage 2 to stage 3-4 (q2). "True" values of λ0 and q1-q3 were estimated a priori. Base case analysis: Given the incidence λ0, the model was calibrated to observed stage-specific prevalence data to determine the calibration parameters q1-q3. Prevalence was generated from the model using Kolmogorov's forward equations: for observed prevalence, we used the true values of λ0 and q1-q3; for projected prevalence, the true value of λ0 and randomly generated values of q1-q3 (see the sketches below). Sensitivity analysis: MC calibration was evaluated with respect to: i) uncertain incidence λ0; ii) multiple calibration targets (i.e., prevalence observed at multiple time points); iii) target misalignment (i.e., different timing between observed and projected prevalence); iv) goodness-of-fit assessment (i.e., Pearson chi-square and likelihood-ratio fit statistics); v) the acceptance criterion for good-fit parameter sets; vi) the prior ranges of q1-q3; vii) sampling methods (i.e., random or Latin hypercube sampling); and viii) sample size (e.g., 1,000 to 100,000 random parameter sets). Outcome measures: i) the number of good-fit parameter sets from the MC calibration; ii) the number of unbiased good-fit parameter sets (i.e., calibrated q1-q3 within the 95% confidence intervals of their true values); and iii) the relative errors of individual good-fit parameters.
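For concreteness, the following minimal sketch (Python) shows how stage-specific prevalence can be computed from a three-state model of this kind via the Kolmogorov forward equations, here solved with a matrix exponential. All numerical choices are hypothetical, and the abstract does not say to which state a healed stage-3-4 ulcer returns; the sketch assumes it returns to stage 2.

```python
import numpy as np
from scipy.linalg import expm

def generator(lam0, q1, q2, q3):
    """Generator matrix Q for a 3-state ulcer model.

    States: 0 = stage 0-1, 1 = stage 2, 2 = stage 3-4.
    Rates are per week; lam0 corresponds to the abstract's incidence λ0.
    Assumption (not stated in the abstract): healing from stage 3-4 (q3)
    returns the patient to stage 2.
    """
    return np.array([
        [-lam0,        lam0,       0.0],
        [   q1, -(q1 + q2),        q2 ],
        [  0.0,         q3,       -q3 ],
    ])

def prevalence(lam0, q1, q2, q3, t_weeks, p0=(1.0, 0.0, 0.0)):
    """Stage-specific prevalence at time t from the Kolmogorov forward
    equations dp/dt = p(t) Q, solved as p(t) = p(0) @ expm(Q * t).
    The default p0 assumes everyone starts ulcer-free (stage 0-1)."""
    Q = generator(lam0, q1, q2, q3)
    return np.asarray(p0) @ expm(Q * t_weeks)
```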
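Building on the prevalence function above, this second sketch illustrates the base-case MC calibration (random search) loop: draw q1-q3 from uniform prior ranges, project prevalence at the target time points, score each draw with a Pearson chi-square fit statistic, and keep the good-fit parameter sets. The "true" values, prior range, sample sizes, and acceptance criterion here are illustrative assumptions, not the values used in the study.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Illustrative setup (all values hypothetical).
LAM0_TRUE = 0.01                       # weekly incidence λ0, assumed known
Q_TRUE = np.array([0.10, 0.03, 0.05])  # "true" q1, q2, q3
TARGET_WEEKS = [26, 52]                # multiple calibration targets
N_OBS = 500                            # assumed sample size behind observed prevalence
PRIOR_LO, PRIOR_HI = 0.001, 0.3        # uniform prior range for q1-q3

# Observed prevalence generated from the model at the true values,
# as in the base case (uses prevalence() from the sketch above).
observed = [prevalence(LAM0_TRUE, *Q_TRUE, t) for t in TARGET_WEEKS]

def pearson_fit(q):
    """Pearson chi-square fit statistic summed over calibration targets."""
    x2 = 0.0
    for t, obs in zip(TARGET_WEEKS, observed):
        exp = prevalence(LAM0_TRUE, *q, t)
        x2 += N_OBS * np.sum((obs - exp) ** 2 / exp)
    return x2

# Random search: draw parameter sets, keep those passing the fit criterion.
n_draws = 10_000
crit = chi2.ppf(0.95, df=2 * len(TARGET_WEEKS))  # 2 df per 3-category target
draws = rng.uniform(PRIOR_LO, PRIOR_HI, size=(n_draws, 3))
good_fit = [q for q in draws if pearson_fit(q) < crit]
print(f"{len(good_fit)} good-fit parameter sets out of {n_draws}")
```

The retained `good_fit` draws form the ensemble of good-fit parameter sets described in the Results; replacing `rng.uniform` with a Latin hypercube sampler or swapping the Pearson statistic for a likelihood-ratio statistic corresponds to the sensitivity analyses listed above.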

Results: MC calibration yielded an ensemble of good-fit parameter sets, representing post-calibration uncertainty. It performed well given accurate input data, multiple calibration targets, and perfect target alignment; otherwise, the number of biased good-fit parameter sets increased. MC calibration was robust to variation in the goodness-of-fit measure, the acceptance criterion, the prior ranges of the calibration parameters, the sampling method, and the number of random parameter sets.

Conclusions: Our results provide evidence in support of recently proposed components of a standardized calibration reporting checklist, and suggest areas for further methodological development of model calibration.