## 1C ORAL ABSTRACTS: ECONOMIC EVALUATION METHODS

David O. Meltzer, MD, PhD

Professor of Medicine, Section of Hospital Medicine, University of Chicago

**Purpose:** Parameter estimates used in cost-effectiveness analysis (CEA) are informed by empirical studies. However, existing studies might not be rigorous or directly relevant to the CEA's setting. We quantify the opportunity cost of biased research using a CEA that informed UK antenatal care policies.

**Methods:** We used a CEA of antenatal prophylaxis strategies in rhesus-negative pregnant women: anti-D prophylaxis for all; for primigravidae only; or for none. Existing anti-D prophylaxis studies are only historically controlled, were not rigorously designed and conducted, and are not directly relevant to the modern era. We incorporated corrections for bias and lack of relevance into the parameters used in the CEA. We obtained estimates of crude and bias-corrected anti-D prophylaxis effectiveness from a previously published quantitative bias analysis, which used elicited opinion to construct prior distributions representing the biases of each study. We conducted separate CEAs with and without these bias corrections. We estimated the opportunity cost of the lack of rigor and relevance of research with a value of information (VOI) analysis.
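The mechanics of such a bias correction can be sketched as follows; the scale (log odds ratio), the additive bias model, and every number here are illustrative assumptions, not values from the published bias analysis:

```python
import random
import statistics

random.seed(1)

CRUDE_LOG_OR = -2.0   # crude effectiveness estimate (log odds ratio)
CRUDE_SE = 0.30       # sampling standard error from the study
BIAS_MEAN = 0.10      # elicited mean bias on the log-OR scale
BIAS_SD = 0.50        # elicited uncertainty about that bias

# Bias-corrected draws: subtract a bias draw from each crude draw.
corrected = [
    random.gauss(CRUDE_LOG_OR, CRUDE_SE) - random.gauss(BIAS_MEAN, BIAS_SD)
    for _ in range(100_000)
]

# The point estimate shifts only by the mean bias, but the variance is
# inflated by the bias variance: 0.30**2 + 0.50**2 vs 0.30**2 alone.
print(statistics.mean(corrected))   # close to -2.1
print(statistics.stdev(corrected))  # close to sqrt(0.34), i.e. about 0.58
```

This reproduces the qualitative pattern reported in the Results: point estimates barely move while variances inflate substantially.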

**Results:** Bias corrections changed the point estimates of anti-D prophylaxis effects little but inflated their variances substantially. Without bias corrections, treating all rhesus-negative pregnant women becomes the optimal strategy at a willingness-to-pay (WTP) of £13,000 per quality-adjusted life year (QALY). In the bias-corrected CEA, the optimality threshold increases to £16,000/QALY. In the VOI analysis, the effectiveness of anti-D prophylaxis was the most valuable parameter over the examined range of WTP thresholds, reaching a population expected value of partial perfect information (EVPPI) of £700,000 at a WTP of £16,000/QALY. The bias correction on the treatment's effectiveness was the second most valuable parameter in terms of EVPPI, peaking at £620,000 at the same WTP. All other CEA parameters combined, excluding the risk of sensitization, account for just over half of the EVPPI of the bias parameter (Figure). Results were analogous in sensitivity analyses that used alternative opinion-elicited bias corrections.

**Conclusions:** Uncertainty associated with parameters used in CEAs should reflect not only the sampling variance of empirical studies, but also the uncertainty stemming from less-than-perfect design, execution, or analysis of such studies. The latter is rarely accounted for in economic evaluations, but can trump the former. Research performed suboptimally can be associated with a large opportunity cost that routinely goes unquantified.

**Purpose:** To illustrate differences in assumptions between partitioned survival and Markov model approaches for state-based economic evaluation using an example from a recent reimbursement review.

**Method:** We developed a three-state economic model comparing bevacizumab + capecitabine (new strategy) to capecitabine alone (comparator) based on the results of the AVEX trial (Cunningham et al 2013). The three model states were progression-free, progressed, and dead, and were populated using both partitioned survival and Markov modelling approaches. Since patient-level data were not available, we recreated patient-level data using the methods of Guyot et al (2012) and fit parametric distributions to inform survival estimates. We also sought external study sources to estimate Markov transition probabilities for death from the intermediate state of progression, which in the absence of further data, was assumed to be the same regardless of initial treatment strategy. We contrast the data and assumptions used to populate each model type and demonstrate the implications for the estimated extra costs and effects produced.
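A minimal sketch of the structural difference between the two approaches; the survival curves and transition probabilities below are invented for illustration and are not the AVEX data:

```python
import math

T = 60  # monthly cycles
pfs = [math.exp(-0.05 * t) for t in range(T)]  # progression-free survival
os_ = [math.exp(-0.02 * t) for t in range(T)]  # overall survival

# Partitioned survival: "progressed" occupancy is read directly off the
# two curves, so the model reproduces the fitted OS by construction.
prog_partitioned = [os_[t] - pfs[t] for t in range(T)]

# Markov: occupancy is built from transition probabilities, including an
# externally sourced monthly death risk from the progressed state.
p_progress, p_die_pf, p_die_prog = 0.035, 0.015, 0.04
pf, prog = 1.0, 0.0
prog_markov = []
for _ in range(T):
    prog_markov.append(prog)
    pf, prog = (pf * (1 - p_progress - p_die_pf),
                prog * (1 - p_die_prog) + pf * p_progress)

# Undiscounted years spent in the progressed state under each approach;
# the two disagree unless the Markov model is calibrated to observed OS.
print(sum(prog_partitioned) / 12, sum(prog_markov) / 12)
```

The divergence in progressed-state occupancy is the mechanism behind the differing incremental QALY losses reported in the Result section.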

**Result: **The partitioned survival model and Markov model produced similar incremental costs ($53,209, $53,902, respectively), and each produced 0.263 QALY gains in the progression-free state for the new strategy. Overall, the partitioned survival analysis produced an incremental 0.186 QALYs for the new strategy, due to 0.077 QALYs lost in the progressed state. In contrast, the Markov model produced an incremental 0.245 QALYs, due to only 0.018 QALYs lost in the progressed state. These differences led to ICERs of $286,121 and $220,027 per QALY gained for partitioned survival and Markov models, respectively. In the AVEX trial, PFS gain was larger than the OS gain. The partitioned survival model outcomes appeared to accurately reflect the trial data, while the Markov model survival did not align well with overall survival from the trial without further calibration and assumptions.

**Conclusion:** Uncertainties about partitioned survival models have often been accompanied by recommendations to pursue a Markov model. Both partitioned survival models and Markov models have limitations that can affect their ability to accurately reflect clinical reality. In this example, we demonstrated a scenario in which a commonly used approach for Markov models can lead to results that overestimate treatment benefits. Our study results suggest it is appropriate to consider both modelling types to address uncertainty in an analysis because they characterize and extrapolate risks differently.

**Purpose:** Probabilistic sensitivity analysis (PSA) is a recommended approach and is a necessary step for undertaking value of information analysis. However, conducting PSA can be computationally challenging in individual-level state-transition models (I-STMs) because of two levels of uncertainty: first- and second-order. Published guidelines suggest a careful evaluation of the balance between the inner and outer simulation loops. Our purpose was to evaluate the need for such a balance and to find the optimal combination to conduct PSA in I-STMs.

**Method:** We used a previously published and validated I-STM that evaluated the cost-effectiveness of hepatitis C treatment. Second-order (parameter) uncertainty was defined using the recommended statistical distributions. We conducted PSA multiple times using five different *combinations* of inner (i.e., first-order) and outer (i.e., second-order) loops, labelled A-E. The total number of computational runs was equal to 1 million in all combinations. Using independent initial random seeds, we ran PSA 20 times for each combination, obtained 20 sets of results, and plotted 20 cost-effectiveness acceptability curves (CEACs). Our rationale was to determine variability in outcomes resulting from joint first- and second-order uncertainty. We also estimated the standard error (SE) (from 20 runs) in mean costs and quality-adjusted life years (QALYs).
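The inner/outer loop trade-off can be sketched with a toy model; the distributions, run counts, and function names below are assumptions for illustration, not the hepatitis C I-STM:

```python
import random

random.seed(42)

def simulate_patient(mean_qaly):
    """One first-order (stochastic) draw around a patient-level mean."""
    return random.gauss(mean_qaly, 1.0)

def run_psa(n_outer, n_inner):
    """n_outer second-order parameter draws x n_inner simulated patients."""
    means = []
    for _ in range(n_outer):
        mean_qaly = random.gauss(10.0, 0.5)  # second-order (parameter) draw
        patients = [simulate_patient(mean_qaly) for _ in range(n_inner)]
        means.append(sum(patients) / n_inner)
    return means

# Same total budget of 10,000 runs, two extreme balances:
few_outer = run_psa(n_outer=10, n_inner=1000)    # stable means, few CEAC points
many_outer = run_psa(n_outer=10_000, n_inner=1)  # first-order noise retained

# With one inner loop, the spread of `many_outer` mixes both orders of
# uncertainty (sd near sqrt(0.5**2 + 1.0**2) rather than 0.5), so a CEAC
# built from these draws is distorted by unaveraged first-order noise.
```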

**Result:** Figure 1 A-E shows the CEACs associated with each of the 20 sets of PSA runs for each combination of inner and outer loops. As the number of outer loops increases, the variation caused by the random seed falls, such that there is overlap across the CEACs. For the combination with 1 million outer loops (Fig E), the variability in the CEACs is completely resolved (i.e., the CEACs completely overlap); however, these CEACs are skewed downwards compared to the other combinations. This is because when only 1 inner loop is used it is not possible to average across individuals, and hence the full impact of first-order uncertainty is retained.

**Conclusion:** To run PSA in I-STMs, the right balance between inner and outer loops is needed. Using extreme combinations can produce misleading results, which would affect decisions regarding cost-effectiveness and the value of further research. Our empirical analysis also indicates that under conditions of constrained computational time, using more outer loops than inner loops should be preferred.

**Purpose:** As a patient’s health evolves over time, knowing its level at any point in time through repeated collection of biomarkers may be critical to determining the benefits of an intervention at that time-point; however, repeated biomarker collection is costly and inconvenient. Alternatively, predictions based on patients’ earlier biomarker values may be used to inform dynamic decision-making; however, predicted biomarker levels are uncertain, giving rise to decision uncertainty. Our goal was to develop value of information (VOI) methods to determine at what time-point direct collection of biomarker data would be most valuable. This goal also fits squarely with this year’s SMDM theme “From Uncertainty to Action”.

**Methods:** We illustrate our methods using longitudinal data from 1993-2011 on patients in the cystic fibrosis national registry. FEV_{1}% is typically measured at regular clinic visits, and the last measured value is used to determine expected survival and the need for lung transplantation. We contrasted this with an alternative approach: biomarker prediction models were developed to predict FEV_{1}% values from earlier measurements, and these predictions were used to determine expected survival and the need for transplantation. VOI approaches were applied, based on the evolution of prediction uncertainty over time, to determine the time-point at which more precise information on biomarker levels would be most valuable. Using actual annual FEV_{1}% values, we validated the implications of the VOI methods for the optimal timing of biomarker collection.

**Results:** Decision-making about lung transplant assignment over 18 years for 11,254 patients using predicted FEV_{1}% values generated a total of 138,699 life years for these patients. The VOI model suggested that if only one biomarker measurement is available for prediction, the value of collecting the biomarker annually is very high. However, including more than one past measurement to capture patient history substantially decreases the value of annual collection, such that the cost of updating biomarker levels is worthwhile only every three years at a threshold of $100K per life year. Furthermore, biomarker collection can be made even more efficient by targeting individuals with larger deterioration of predicted information over time. Actual annually collected FEV_{1}% data validated the VOI-based recommendation.
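The timing logic can be sketched with a toy random-walk model in which prediction uncertainty grows with the square root of time since the last measurement; every number and name below is an illustrative assumption, not a value from the registry analysis:

```python
import math

SD_GROWTH = 4.0      # prediction SD (FEV1% points) one year after measuring
VALUE_PER_SD = 60.0  # expected decision value ($) per unit of resolved SD
COST = 400.0         # cost of one biomarker collection

def optimal_interval(max_years=10):
    """Smallest interval (years) at which re-measuring pays for itself:
    collect once the expected value of resolving prediction uncertainty
    exceeds the collection cost."""
    for t in range(1, max_years + 1):
        if VALUE_PER_SD * SD_GROWTH * math.sqrt(t) >= COST:
            return t
    return max_years

print(optimal_interval())
```

Targeting individuals with faster-growing prediction uncertainty corresponds to giving them a larger `SD_GROWTH`, which shortens their optimal interval.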

**Conclusions:** A VOI approach to determining the optimal time interval between updating biomarker data is feasible and could be applicable to a variety of clinical conditions.

**Purpose:** The optimal cost-effectiveness threshold has been subject to much debate. In the standard model, technologies are assumed to be divisible and exhibit constant returns to scale. The threshold is plotted as a linear function through the origin of the cost-effectiveness (CE) plane, implying a single threshold in all circumstances. We consider the implications of departures from the assumptions underlying the standard model, including the possibility of *diminishing* marginal returns to scale or *non-divisibility* of technologies. We also consider if the optimal threshold is dependent upon a new technology’s *budget impact* and whether the new technology constitutes a *net investment* or *net disinvestment*.

**Method:** We conducted simulations using a *de novo* model of a hypothetical health care system. The model comprises three stages: allocation of an initial budget among a pool of initial technologies, consideration of a new technology, and reallocation of resources among initial technologies if the new technology is adopted. The optimal threshold ensures that new technologies are adopted only if the net incremental benefit of adoption and reallocation is positive. Three scenarios were considered: divisible technologies exhibiting constant returns; divisible technologies exhibiting diminishing returns; and non-divisible technologies. For each scenario we estimated the optimal thresholds for net investments and net disinvestments across a range of possible budget impacts. We repeated each scenario using three different initial budgets.
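The first stage (initial allocation under divisibility and constant returns) can be sketched as a simple league-table exercise in which the threshold emerges as the ICER of the partially funded programme; all data are hypothetical:

```python
# (cost per person, QALYs per person, eligible population) -- invented values
techs = [(1000, 0.5, 500), (2000, 0.5, 400), (3000, 0.4, 300)]
budget = 1_200_000

def allocate(techs, budget):
    """Fund technologies in order of increasing ICER until the budget is
    exhausted; under divisibility the last one may be partially funded,
    and its ICER is the shadow price of the budget, i.e. the threshold."""
    remaining, total_qalys, marginal_icer = budget, 0.0, None
    for cost, qaly, pop in sorted(techs, key=lambda t: t[0] / t[1]):
        spend = min(remaining, cost * pop)   # divisibility: partial funding
        total_qalys += spend / cost * qaly
        remaining -= spend
        if 0 < spend < cost * pop:
            marginal_icer = cost / qaly      # the partially adopted programme
        if remaining == 0:
            break
    return total_qalys, marginal_icer

qalys, threshold = allocate(techs, budget)
print(qalys, threshold)
```

Relaxing divisibility (all-or-nothing funding) or constant returns (QALYs per person falling with coverage) breaks this neat shadow-price reading, which is what produces the step and concave threshold curves described in the Result.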

**Result:** The standard exposition of the cost-effectiveness threshold holds under the following conditions: (a) initial technologies are divisible and exhibit constant returns to scale; (b) a single initial technology remains partially adopted following initial allocation; and (c) the budget impact of each new technology is sufficiently small that reallocation involves expanding or contracting only the partially adopted initial technology. In all other cases, the threshold depends upon whether the new technology is a net investment or net disinvestment and the magnitude of the budget impact. The threshold curve is a piecewise linear function under divisibility and constant returns, a concave function under divisibility and diminishing returns, or a step function under non-divisibility.

**Conclusion:** The standard exposition of the cost-effectiveness threshold is a special case that holds only under specific conditions. Under other conditions, threshold curves take a different functional form that reduces the scope for new technologies to appear cost-effective.

**Purpose:** The construction of preference-based utility instruments relies on multi-attribute utility theory, one of the most important components of which is structural independence among the attributes. This means that overlap between items is minimized so that every combination of health states is possible. For example, severe pain with excellent emotional well-being is possible, if unlikely, but high levels of function are not compatible with low mobility, since function and mobility are highly correlated. This property is rarely tested empirically. We used patient data and graphical models (advanced statistical methods used for modelling multivariate associations and interdependencies among random variables) to test the structural independence of a prostate cancer-specific instrument, the Patient-Oriented Prostate Utility Scale (PORPUS-U).

**Method:** We used patient-level data from one cross-sectional and one longitudinal dataset, the latter capturing three time points representing different states in the disease/treatment trajectory. In the models, the items are represented by nodes, which are connected with edges whenever there is a conditional dependence between them, given the rest of the item responses. We tested the conditional dependence with a chi-square based statistical test, correcting for multiple testing. We also performed model selection, starting with the saturated model (with all possible connections present) and applying stepwise backward selection and the Bayesian Information Criterion (BIC), in order to obtain the best-fitted model structure.
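As a simplified stand-in for the conditional test described above, the edge-testing idea can be sketched with a marginal chi-square test of independence between two dichotomized items, compared against the df=1 critical value (3.841 at alpha=0.05); the counts are synthetic, not PORPUS-U data:

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table
    [[a, b], [c, d]] via the shortcut formula n*(ad - bc)^2 / (row and
    column margins)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Synthetic counts: item 1 low/high (rows) vs item 2 low/high (columns)
stat = chi2_2x2(30, 10, 12, 28)
edge_present = stat > 3.841  # keep the edge if dependence is detected
print(round(stat, 2), edge_present)
```

The actual analysis conditions on the remaining item responses and corrects for multiple testing; this marginal version only illustrates how each candidate edge is kept or dropped.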

**Result:** Using the statistical tests and the selected models, we identified a number of conditional dependencies among the item responses. Out of the 45 possible connections among the 10 items of the PORPUS-U, 22 were included in the selected model based on the cross-sectional dataset. Using the longitudinal data, the three time points identified 16, 20, and 20 connections, respectively. Similar results were obtained with the statistical tests. Five item pairs were identified as highly correlated in all four models.

**Conclusion:** Graphical models are powerful statistical tools that can be used for investigating possible deviations from the claimed structural independence among items of utility instruments. As such, they can potentially improve the design and use of these instruments. In this study we identified dependencies among a number of items in a prostate cancer utility instrument. Further investigation, using alternative testing and modeling strategies applied to data from other instruments, is needed.