Purpose: Calibrating disease natural history models involves changing model inputs to match multiple targets. To identify the best-fitting input set(s), the model's fit to each individual target is often combined into an overall “goodness-of-fit” measure. We apply a new approach that uses the principle of Pareto optimality to select best-fitting inputs, and we explore its implications for cost-effectiveness analysis and estimates of decision uncertainty.
Methods: A set of model inputs is Pareto-optimal if no other input set fits every calibration target at least as well and at least one target strictly better. The Pareto frontier is the set of these undominated input sets, none of which is clearly superior to any other. Constructing the Pareto frontier thus identifies best-fitting inputs without collapsing multiple fits into a single measure. We demonstrate the Pareto frontier approach in the calibration of two models: a simple model developed for illustrative purposes and a previously published cost-effectiveness model of transcatheter aortic valve replacement (TAVR). For each model, we identify the input sets on the Pareto frontier and the top input sets ranked by the weighted sum of individual calibration target fits with (i) equal weightings and (ii) a weighting emphasizing a subset of targets. We evaluate the incremental costs and QALYs of the intervention for each group of best-fitting input sets and assess its cost-effectiveness.
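The two selection rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each input set's fit is summarized as one error value per calibration target with lower values meaning better fit, and the names `pareto_frontier_mask`, `top_by_weighted_sum`, `fit_errors`, and `weights` are hypothetical.

```python
import numpy as np


def pareto_frontier_mask(fit_errors):
    """Return a boolean mask marking the undominated (Pareto-optimal) rows.

    fit_errors: array of shape (n_input_sets, n_targets).
    Assumption: lower error means better fit to that target.
    """
    errors = np.asarray(fit_errors, dtype=float)
    n_sets = errors.shape[0]
    on_frontier = np.ones(n_sets, dtype=bool)
    for i in range(n_sets):
        # Another row dominates row i if it is no worse on every target
        # and strictly better on at least one target.
        no_worse = np.all(errors <= errors[i], axis=1)
        strictly_better = np.any(errors < errors[i], axis=1)
        if np.any(no_worse & strictly_better):
            on_frontier[i] = False
    return on_frontier


def top_by_weighted_sum(fit_errors, weights, k):
    """Return indices of the k input sets with the lowest weighted error sum."""
    errors = np.asarray(fit_errors, dtype=float)
    scores = errors @ np.asarray(weights, dtype=float)
    # A stable sort keeps the original order among tied scores.
    return np.argsort(scores, kind="stable")[:k]
```

The pairwise comparison is quadratic in the number of input sets, which is adequate for a sample on the order of 10,000. Note that the frontier needs no weights at all, whereas the ranked list changes whenever `weights` changes.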
Results: After calibrating the simple model, 506 of 10,000 initial input sets were Pareto-optimal. While the 506 top-ranked input sets under the two weighting schemes yielded results localized to different regions of the cost-effectiveness plane, the Pareto frontier set spanned both regions (Figure 1). The choice of selection method therefore produced different estimates of intervention cost-effectiveness and of the level of decision uncertainty. At a willingness-to-pay of $100,000/QALY, the intervention was cost-effective when evaluated over the Pareto frontier and optimal for 70% of Pareto-optimal input sets, but it was not cost-effective when evaluated over the top-ranked input sets under weighting (ii), where it was optimal for just 17% of input sets. The intervention was optimal for 100% of top input sets under weighting (i). Calibrating the previously published TAVR model also yielded differences. At a $100,000/QALY threshold, TAVR was optimal for 38% of the 260 Pareto-optimal input sets, while it was optimal for 55% and 33% of top input sets under weightings (i) and (ii), respectively.
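To make the two reported quantities concrete, the sketch below computes them from per-input-set incremental costs and QALYs. It rests on assumptions the abstract does not state: a two-strategy comparison, "optimal" meaning positive incremental net monetary benefit at the given threshold, and "cost-effective when evaluated over" a group of input sets meaning positive mean incremental net monetary benefit across that group. The function names are hypothetical.

```python
import numpy as np


def share_optimal(delta_costs, delta_qalys, wtp=100_000.0):
    """Fraction of input sets for which the intervention is optimal.

    Assumption: the intervention is optimal for an input set when its
    incremental net monetary benefit, wtp * dQALY - dCost, is positive.
    """
    d_cost = np.asarray(delta_costs, dtype=float)
    d_qaly = np.asarray(delta_qalys, dtype=float)
    inb = wtp * d_qaly - d_cost
    return float(np.mean(inb > 0))


def cost_effective_on_average(delta_costs, delta_qalys, wtp=100_000.0):
    """Whether the mean incremental net monetary benefit is positive.

    Assumption: this is how cost-effectiveness is judged over a group
    of input sets; the source does not specify the rule.
    """
    d_cost = np.asarray(delta_costs, dtype=float)
    d_qaly = np.asarray(delta_qalys, dtype=float)
    mean_inb = wtp * d_qaly.mean() - d_cost.mean()
    return bool(mean_inb > 0)
```

Applying both functions separately to the Pareto-optimal input sets and to each top-ranked group would yield comparisons of the kind reported above, with the share of input sets for which the intervention is optimal serving as a simple indicator of decision uncertainty.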
Conclusions: The method of identifying best-fitting input sets in model calibration has the potential to influence cost-effectiveness conclusions.