ORAL ABSTRACTS: ECONOMIC EVALUATION METHODS
David O. Meltzer, MD, PhD
University of Chicago
Professor of Medicine
Section of Hospital Medicine
Purpose: Parameter estimates used in cost-effectiveness analysis (CEA) are informed by empirical studies. However, existing studies might not be rigorous or directly relevant to the CEA's setting. We quantify the opportunity cost of biased research using a CEA that informed UK antenatal care policies.
Methods: We used a CEA of antenatal prophylaxis strategies in rhesus-negative pregnant women: anti-D prophylaxis for all; for primigravidae only; or for none. Existing anti-D prophylaxis studies are historically controlled, not rigorously designed and conducted, and not directly relevant to the modern era. We incorporated corrections for bias and lack of relevance in parameters used in the CEA. We obtained estimates of crude and bias-corrected anti-D prophylaxis effectiveness from a previously published quantitative bias analysis, which used elicited opinion and constructed prior distributions to represent the biases of each study. We conducted separate CEAs, with and without such bias corrections. We estimated the opportunity cost of lack of rigor and relevance of research with a value of information (VOI) analysis.
Results: Bias corrections changed the point estimates of anti-D prophylaxis effects little but inflated their variances substantially. Without bias corrections, treating all rhesus-negative pregnant women becomes the optimal strategy at a willingness-to-pay (WTP) of £13,000 per quality-adjusted life year (QALY). In the bias-corrected CEA, the optimality threshold increases to £16,000/QALY. In the VOI analysis, effectiveness of anti-D prophylaxis was the most valuable parameter over the examined range of WTP thresholds, reaching a population expected value of partial perfect information (EVPPI) of £700,000 at a WTP of £16,000/QALY. The bias correction on the treatment's effectiveness was the second most valuable parameter in terms of EVPPI, peaking at £620,000 at the same WTP. Other parameters of the CEA combined, without considering the risk of sensitization, account for just over half of the EVPPI of the bias parameter (Figure). Results were analogous in sensitivity analyses that used alternative opinion-elicited bias corrections.
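The EVPPI quantity reported above can be illustrated with a minimal per-person sketch. It assumes two strategies, two independent discrete parameters, and entirely hypothetical values: `theta_dist`, `phi_dist`, and the net-benefit function are illustrative and are not taken from the anti-D analysis. Only the WTP of £16,000/QALY comes from the abstract.

```python
def evppi(strategies, net_benefit, theta_dist, phi_dist):
    """Per-person EVPPI for theta.

    theta_dist and phi_dist are lists of (value, probability) pairs for two
    independent discrete parameters (a simplifying assumption of this sketch).
    """
    def nb_given_theta(strategy, theta):
        # Average net benefit over the remaining uncertain parameter phi.
        return sum(p * net_benefit(strategy, theta, phi) for phi, p in phi_dist)

    # Expected net benefit of the best strategy under current information.
    current = max(
        sum(p * nb_given_theta(s, theta) for theta, p in theta_dist)
        for s in strategies
    )
    # Expected net benefit if theta were learned before choosing a strategy.
    informed = sum(
        p * max(nb_given_theta(s, theta) for s in strategies)
        for theta, p in theta_dist
    )
    return informed - current


WTP = 16_000  # pounds per QALY, the threshold quoted in the abstract

def net_benefit(strategy, theta, phi):
    # Hypothetical model: theta = QALYs gained per woman treated,
    # phi = cost per woman treated; "none" is the reference strategy.
    if strategy == "none":
        return 0.0
    return WTP * theta - phi

theta_dist = [(0.00, 0.5), (0.01, 0.5)]   # hypothetical effect values
phi_dist = [(60.0, 0.5), (100.0, 0.5)]    # hypothetical cost values

per_person = evppi(["none", "all"], net_benefit, theta_dist, phi_dist)
print(per_person)
```

In this toy case, learning the effect parameter is worth about £40 per person. A population EVPPI multiplies the per-person value by the number of people affected by the decision.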
Conclusions: Uncertainty associated with parameters used in CEAs should reflect not only sampling variance of empirical studies, but also the uncertainty stemming from less than perfect design, execution or analysis of such studies. The latter is rarely accounted for in economic evaluations, but can trump the former. Research performed suboptimally can be associated with a large opportunity cost that is routinely left unclear.
Method: We developed a three-state economic model comparing bevacizumab + capecitabine (new strategy) to capecitabine alone (comparator) based on the results of the AVEX trial (Cunningham et al 2013). The three model states were progression-free, progressed and dead and were populated using both partitioned survival and Markov modelling approaches. Since patient-level data were not available, we recreated patient-level data using the methods of Guyot et al (2012) and fit parametric distributions to inform survival estimates. We also sought external study sources to estimate Markov transition probabilities for death from the intermediate state of progression, which in the absence of further data, was assumed to be the same regardless of initial treatment strategy. We contrast the data and assumptions used to populate each model type and demonstrate the implications for the estimated extra costs and effects produced.
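The partitioned survival approach described above derives state occupancy directly from the survival curves: progression-free is PFS(t), progressed is OS(t) minus PFS(t), and dead is 1 minus OS(t). The following is a minimal sketch under stated assumptions: exponential curves, monthly cycles, no discounting, and hypothetical rates and utilities. None of these values come from the AVEX trial or the authors' model.

```python
import math

def partitioned_survival_qalys(pfs_rate, os_rate, months, utility_pf, utility_prog):
    """Return (QALYs progression-free, QALYs progressed) over the horizon.

    pfs_rate and os_rate are monthly hazards of assumed exponential curves.
    """
    qalys_pf = 0.0
    qalys_prog = 0.0
    for month in range(months):
        os_t = math.exp(-os_rate * month)
        # PFS cannot exceed OS, so cap it to keep occupancy non-negative.
        pfs_t = min(math.exp(-pfs_rate * month), os_t)
        qalys_pf += pfs_t * utility_pf / 12            # progression-free state
        qalys_prog += (os_t - pfs_t) * utility_prog / 12  # progressed state
    return qalys_pf, qalys_prog

# Hypothetical arms: the new strategy slows progression more than death.
new_pf, new_prog = partitioned_survival_qalys(0.08, 0.045, 120, 0.8, 0.6)
old_pf, old_prog = partitioned_survival_qalys(0.12, 0.050, 120, 0.8, 0.6)
print(new_pf - old_pf, new_prog - old_prog)
```

Because time in the progressed state is the gap between the two curves, a treatment that extends PFS more than OS gains QALYs in the progression-free state and loses them in the progressed state in this sketch.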
Result: The partitioned survival model and Markov model produced similar incremental costs ($53,209 and $53,902, respectively), and each produced 0.263 QALY gains in the progression-free state for the new strategy. Overall, the partitioned survival analysis produced an incremental 0.186 QALYs for the new strategy, due to 0.077 QALYs lost in the progressed state. In contrast, the Markov model produced an incremental 0.245 QALYs, due to only 0.018 QALYs lost in the progressed state. These differences led to ICERs of $286,121 and $220,027 per QALY gained for the partitioned survival and Markov models, respectively. In the AVEX trial, the PFS gain was larger than the OS gain. The partitioned survival model outcomes appeared to accurately reflect the trial data, while the Markov model survival did not align well with overall survival from the trial without further calibration and assumptions.
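The arithmetic behind the reported ICERs can be retraced from the figures in the abstract. This is a simple check, not the authors' calculation: the function name `icer` is illustrative, and the recomputed values differ slightly from the reported ones, presumably because the published QALYs are rounded to three decimals.

```python
def icer(incremental_cost, incremental_qalys):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    if incremental_qalys <= 0:
        raise ValueError("ICER is not meaningful without a positive QALY gain")
    return incremental_cost / incremental_qalys

# Net QALYs = gain in the progression-free state minus loss in the progressed state.
psm_qalys = 0.263 - 0.077      # partitioned survival model
markov_qalys = 0.263 - 0.018   # Markov model

psm_icer = icer(53_209, psm_qalys)
markov_icer = icer(53_902, markov_qalys)
print(round(psm_icer), round(markov_icer))
```

The same progression-free gain yields a lower ICER in the Markov model only because fewer QALYs are lost in the progressed state.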
Conclusion: Uncertainties about partitioned survival models have often been accompanied by recommendations to pursue a Markov model. Both partitioned survival models and Markov models have limitations that can affect their ability to accurately reflect clinical reality. In this example, we demonstrated a scenario in which a commonly used approach for Markov models can lead to results that overestimate treatment benefits. Our study results suggest it is appropriate to consider both modelling types to address uncertainty in an analysis because they characterize and extrapolate risks differently.
Purpose: Probabilistic sensitivity analysis (PSA) is a recommended approach and is a necessary step for undertaking value of information analysis. However, conducting PSA can be computationally challenging in individual-level state-transition models (I-STMs) because of two levels of uncertainty: first- and second-order. Published guidelines suggest a careful evaluation of a balance between the inner and outer simulation loops. Our purpose was to evaluate the need for such a balance and to find the optimal combination to conduct PSA in I-STMs.
Method: We used a previously published and validated I-STM that evaluated the cost-effectiveness of hepatitis C treatment. Second-order (parameter) uncertainty was defined using the recommended statistical distributions. We conducted PSA multiple times using five different combinations of inner (i.e., first-order) and outer (i.e., second-order) loops, labelled A-E. The total number of computational runs was equal to 1 million in all combinations. Using independent initial random seeds, we ran PSA 20 times for each combination, obtained 20 sets of results, and plotted 20 cost-effectiveness acceptability curves (CEACs). Our rationale was to determine variability in outcomes resulting from joint first- and second-order uncertainty. We also estimated the standard error (SE), from the 20 runs, in mean costs and quality-adjusted life years (QALYs).
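The nested structure of inner and outer loops can be sketched with a toy individual-level model. Everything here is an assumption made for illustration: the single parameter `p_response`, its Beta(80, 20) distribution, the QALY payoffs, and the reduced budget of 10,000 runs (the study used 1 million) are hypothetical and unrelated to the hepatitis C model.

```python
import random
import statistics

def run_psa(n_outer, n_inner, seed):
    """Return one mean-QALY estimate per outer (parameter) draw."""
    rng = random.Random(seed)
    mean_qalys = []
    for _ in range(n_outer):
        # Outer loop: draw a parameter value (second-order uncertainty).
        p_response = rng.betavariate(80, 20)
        total = 0.0
        for _ in range(n_inner):
            # Inner loop: simulate one individual (first-order uncertainty).
            responded = rng.random() < p_response
            total += 1.0 if responded else 0.6  # hypothetical QALYs
        mean_qalys.append(total / n_inner)
    return mean_qalys

BUDGET = 10_000  # total runs held fixed across combinations
for n_outer, n_inner in [(10, 1000), (100, 100), (10_000, 1)]:
    assert n_outer * n_inner == BUDGET
    draws = run_psa(n_outer, n_inner, seed=1)
    print(n_outer, n_inner,
          round(statistics.mean(draws), 3),
          round(statistics.pvariance(draws), 4))
```

With a single inner loop, each outer draw is one individual's outcome, so the spread across draws mixes first-order noise with parameter uncertainty rather than averaging the noise away.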
Result: Figure 1 A-E shows the CEACs associated with each of the 20 sets of PSA runs for each combination of inner and outer loops. As the number of outer loops increases, the variation caused by the random seed falls, so that the CEACs increasingly overlap. For the combination with 1 million outer loops (Fig E), the variability in the CEACs is completely resolved (i.e., the CEACs completely overlap); however, these CEACs are skewed downwards compared to the other combinations. This is because when only 1 inner loop is used, it is not possible to average across individuals, and hence the full impact of first-order uncertainty is retained.
Conclusion: To run PSA in I-STMs, the right balance between inner and outer loops is needed. Extreme combinations will produce misleading results, which would affect decisions regarding cost-effectiveness and the value of further research. Our empirical analysis also indicates that under conditions of constrained computational time, using more outer loops than inner loops should be preferred.
Methods: We illustrate our methods using longitudinal patient data from 1993-2011 from the national cystic fibrosis registry. FEV1% is typically measured at regular clinic visits, and the last measured value is used to determine expected survival and the need for lung transplantation. We contrasted this with an alternative approach: biomarker prediction models were developed to predict FEV1% values based on earlier measurements, and these predictions were used to determine expected survival and the need for transplantation. VOI approaches were applied based on the evolution of prediction uncertainty over time to determine the time-point where more precise information on biomarker levels would be most valuable. Using actual annual FEV1% values, we validated the implication of the VOI methods for optimal timing of biomarker collection.
Results: Decision-making about lung transplant assignments over 18 years for 11,254 patients using predicted values of FEV1% data generated a total of 138,699 life years for the same patients. The VOI model suggested that if only one biomarker measurement is available for prediction, the value of collecting the biomarker annually is very high. However, including more than one past measurement to capture patient history substantially decreases the value of annual collection such that the cost of updating biomarker levels is worthwhile only every three years, at $100K/LY. Furthermore, biomarker collection can be made even more efficient by targeting individuals with larger deterioration of predicted information over time. Actual annually collected FEV1% data validated the VOI-based recommendation.
Conclusions: A VOI approach to determining the optimal time interval between updating biomarker data is feasible and could be applicable to a variety of clinical conditions.
Method: We conducted simulations using a de novo model of a hypothetical health care system. The model comprises three stages: allocation of an initial budget among a pool of initial technologies, consideration of a new technology, and reallocation of resources among initial technologies if the new technology is adopted. The optimal threshold ensures that new technologies are adopted only if the net incremental benefit of adoption and reallocation is positive. Three scenarios were considered: divisible technologies exhibiting constant returns; divisible technologies exhibiting diminishing returns; and non-divisible technologies. For each scenario we estimated the optimal thresholds for net investments and net disinvestments across a range of possible budget impacts. We repeated each scenario using three different initial budgets.
Result: The standard exposition of the cost-effectiveness threshold holds under the following conditions: (a) initial technologies are divisible and exhibit constant returns to scale; (b) a single initial technology remains partially adopted following initial allocation; and (c) the budget impact of each new technology is sufficiently small that reallocation involves expanding or contracting only the partially adopted initial technology. In all other cases, the threshold depends upon whether the new technology is a net investment or net disinvestment and the magnitude of the budget impact. The threshold curve is a piecewise linear function under divisibility and constant returns, a concave function under divisibility and diminishing returns, or a step function under non-divisibility.
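Conditions (a) to (c) can be illustrated with a toy allocation of divisible, constant-returns technologies. This is a sketch under stated assumptions, not the authors' simulation model: the technology list, the budget, and the function names `fund` and `implied_threshold` are hypothetical, and only a net investment funded by contracting existing technologies is considered.

```python
def fund(technologies, budget):
    """Greedy initial allocation: most efficient technologies are funded first.

    technologies: list of (name, max_spend, qalys_per_cost_unit).
    Returns the funded list as (name, spend, qalys_per_cost_unit).
    """
    funded = []
    remaining = budget
    for name, max_spend, eff in sorted(technologies, key=lambda t: t[2], reverse=True):
        spend = min(max_spend, remaining)
        if spend > 0:
            funded.append((name, spend, eff))
        remaining -= spend
    return funded

def implied_threshold(funded, budget_impact):
    """Cost per QALY forgone when budget_impact is freed by contracting
    the least efficient funded technologies first."""
    if budget_impact <= 0:
        raise ValueError("budget impact must be positive in this sketch")
    to_free = budget_impact
    qalys_forgone = 0.0
    for _, spend, eff in sorted(funded, key=lambda t: t[2]):
        cut = min(spend, to_free)
        qalys_forgone += cut * eff
        to_free -= cut
        if to_free <= 0:
            break
    if to_free > 0:
        raise ValueError("budget impact exceeds total funded spend")
    return budget_impact / qalys_forgone

# Hypothetical technologies; B ends up partially adopted with a budget of 150.
techs = [("A", 100, 0.02), ("B", 100, 0.01), ("C", 100, 0.005)]
funded = fund(techs, budget=150)
for impact in (25, 50, 100):
    print(impact, implied_threshold(funded, impact))
```

In this toy case the threshold stays at the cost per QALY of the partially adopted technology while the budget impact is small enough to be absorbed by contracting it alone. Once the impact is large enough to displace a more efficient technology as well, the implied threshold falls.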
Conclusion: The standard exposition of the cost-effectiveness threshold is a special case that holds only under specific conditions. Under other conditions, threshold curves take a different functional form that reduces the scope for new technologies to appear cost-effective.
Method: We used patient-level data from one cross-sectional and one longitudinal dataset, the latter capturing three time points representing different states in disease/treatment trajectory. In the models, the items are represented by nodes, which are connected with edges whenever there is a conditional dependence between them, given the rest of the item responses. We tested the conditional dependence with a chi-square based statistical test, correcting for multiple testing. We also performed model selection, starting with the saturated model (with all possible connections present) and applying stepwise backward selection and the Bayesian Information Criterion (BIC), in order to obtain the best-fitted model structure.
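A simplified version of a chi-square based conditional dependence test can be sketched as follows. The abstract does not specify the exact test, so this is an assumed variant: it conditions on a single categorical variable `z` rather than on all remaining items, sums Pearson chi-square statistics over the strata of `z`, and returns the statistic and degrees of freedom without a p-value. The function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import comb

def conditional_chi_square(records):
    """records: iterable of (x, y, z) categorical triples.

    Returns (statistic, degrees_of_freedom) for testing whether x and y
    are independent given z.
    """
    strata = defaultdict(list)
    for x, y, z in records:
        strata[z].append((x, y))

    statistic, dof = 0.0, 0
    for pairs in strata.values():
        n = len(pairs)
        cell = Counter(pairs)
        row = Counter(x for x, _ in pairs)
        col = Counter(y for _, y in pairs)
        for x in row:
            for y in col:
                expected = row[x] * col[y] / n
                statistic += (cell[(x, y)] - expected) ** 2 / expected
        dof += (len(row) - 1) * (len(col) - 1)
    return statistic, dof

# Number of candidate edges among 10 items.
print(comb(10, 2))

# Toy data: within the single stratum z=0, x and y always agree.
dependent = [(0, 0, 0)] * 10 + [(1, 1, 0)] * 10
print(conditional_chi_square(dependent))
```

Each item pair yields one such test, so a 10-item instrument involves 45 tests, which is why a correction for multiple testing is needed before declaring an edge.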
Result: Using the statistical tests and the selected models, we identified a number of conditional dependencies among the item responses. Out of the 45 possible connections among the 10 items of the PORPUS-U, 22 were included in the selected model based on the cross-sectional dataset. The models selected on the longitudinal data at the three time points included 16, 20 and 20 connections, respectively. Similar results were obtained by the statistical tests. Five item pairs were identified as highly correlated in all 4 models.
Conclusion: Graphical models are powerful statistical tools that can be used for investigating possible deviations from the claimed structural independence among items of utility instruments. As such, they can potentially improve the design and use of these instruments. In this study, we identified dependencies among a number of items in a prostate cancer utility instrument. Further investigation using alternative testing and modeling strategies, applied to data from other instruments, is needed.