* Candidate for the Lee B. Lusted Student Prize Competition

**Purpose: **Decision-makers require cost-effectiveness analyses (CEA) for patient subgroups. In non-randomised studies a key challenge is that the treatment allocation and response surface models may be unknown. The propensity score (PS) can be used in matching, inverse probability of treatment weighting (IPTW) and Genetic Matching (GenMatch) to reduce selection bias due to observed characteristics. These methods have not been tested for estimating subgroup effects in CEA.

**Method: **This paper extends previous comparisons of PS methods for CEA to the new context of subgroup analysis. Here, IPTW has potential appeal as it may be relatively statistically efficient. However, if the PS is misspecified, this method may fail to balance covariates in each subgroup. For GenMatch, we propose including subgroup by covariate interaction terms in the balance matrix. We compare these methods in a CEA of Xigris, a pharmaceutical intervention for critically ill patients with sepsis (n=2000), whose effectiveness is anticipated to vary by number of organ failures. In simulation studies we consider the relative performance (covariate balance, bias, root mean squared error [RMSE]) of the methods across the following scenarios: a) Ideal scenario: PS and response surface model correctly specified. b) PS model misspecified by excluding an interaction term. c) Response surface model misspecified, GenMatch given incorrect initial weights for a covariate of high prognostic importance.
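The subgroup balance check at the heart of this comparison can be sketched in a few lines. The Python sketch below uses simulated data with a known propensity score (in practice the PS must be estimated; all variable names and coefficients here are invented, not the case study's): it forms IPTW weights and compares the weighted standardized mean difference of a covariate against its unweighted value within each subgroup.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
subgroup = rng.integers(0, 2, n)          # e.g. two organ-failure strata (hypothetical)
x = rng.normal(size=n)                    # a prognostic covariate (e.g. a severity score)
# True PS depends on the covariate and a subgroup-by-covariate interaction.
logit = -0.3 + 0.8 * x + 0.5 * subgroup * x
ps = 1 / (1 + np.exp(-logit))
treat = rng.binomial(1, ps)

# IPTW weights: 1/ps for treated, 1/(1-ps) for controls.
w = np.where(treat == 1, 1 / ps, 1 / (1 - ps))

def smd(x, treat, w):
    """Weighted standardized mean difference between arms
    (standardized by the unweighted pooled SD)."""
    m1 = np.average(x[treat == 1], weights=w[treat == 1])
    m0 = np.average(x[treat == 0], weights=w[treat == 0])
    s = np.sqrt((x[treat == 1].var() + x[treat == 0].var()) / 2)
    return (m1 - m0) / s

for g in (0, 1):
    mask = subgroup == g
    raw = smd(x[mask], treat[mask], np.ones(mask.sum()))
    wtd = smd(x[mask], treat[mask], w[mask])
    print(f"subgroup {g}: SMD raw {raw:+.3f}, IPTW {wtd:+.3f}")
```

With the true PS the weighted SMD shrinks towards zero in each subgroup; a misspecified PS (e.g. omitting the interaction term, as in scenario b) would leave residual imbalance that this same check reveals.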

**Result: **In the case study, following both PS matching and IPTW, covariate balance was poor on key prognostic variables (e.g. for APACHE II P<0.01). For IPTW the variance on the weights was high for both subgroups. Following GenMatch covariates were balanced in each subgroup (e.g. APACHE II P>0.1). For patients with 2 organ failures, the incremental net benefit (λ=£20,000 per QALY) varied by method: from -£42,422 (95% CI: -47,013 to -37,831) for IPTW to -£17,247 (95% CI: -18,955 to -15,539) for GenMatch. In the simulation study, under the ideal scenario, all three methods performed well with low bias and RMSE. Once the PS was incorrectly specified, the bias for both PS methods was seven times that for GenMatch. When GenMatch was given incorrect weights, it still dominated the other methods on RMSE for both subgroups.
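The incremental net benefit quoted above is a simple linear rescaling of costs and QALYs at the willingness-to-pay threshold, INB = λ·ΔQALY − ΔCost. A minimal sketch with invented increments (not the study's estimates):

```python
# Incremental net benefit at willingness-to-pay lambda: INB = lam * dE - dC.
lam = 20000.0                   # pounds per QALY (threshold used in the abstract)
d_qaly, d_cost = 0.25, 3000.0   # hypothetical incremental QALYs and cost
inb = lam * d_qaly - d_cost
print(inb)  # 2000.0
```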

**Conclusion: **CEA should use methods that achieve covariate balance for each subgroup. GenMatch can help minimise bias across a range of circumstances faced in applied CEA.

**Purpose: ** The purpose of this study was to compare methods of analyzing cost-effectiveness data from nonrandomized atrial fibrillation (AF) patients treated with rhythm control or rate control strategies.

**Method: ** Patients were enrolled from 532 sites in 21 countries between May 2007 and April 2008 in the Registry on Cardiac Rhythm Disorders (RECORD AF). Hospitalizations, complications and cardiovascular events were assigned a Diagnosis Related Group (DRG) and costs estimated by multiplying the relative weight of a DRG by the Medicare base rate in 2008 ($4,893). Outpatient procedures were assigned a CPT4 code for costing, and Redbook 2007 average wholesale price (AWP) was used to cost medications. Lifetime costs were estimated using average Medicare participant per capita expenditure in 2008 ($6,458). Quality adjusted life years (QALYs) were calculated by estimating life expectancy based on Framingham data and multiplying by utility scores from the EQ-5D. Propensity scores were calculated from demographic and clinical history information. Rhythm and rate control patients were matched on propensity scores by use of a greedy matching algorithm. Stabilized inverse probability weighting (IPW) was used to weight both costs and life expectancies. Bootstrap analysis was used to compare costs and estimate incremental cost-effectiveness ratios (ICER). Lifetime costs and life expectancy were discounted at 3%.
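The stabilization step can be sketched as follows (simulated data; the propensity model and coefficients are illustrative, not RECORD AF's). Stabilizing multiplies each inverse-probability weight by the marginal probability of the treatment actually received, which leaves the mean weight near 1 and shrinks the weight variance without changing the target estimand:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)                     # baseline risk summary (hypothetical)
ps = 1 / (1 + np.exp(-(0.2 + 1.2 * x)))    # true propensity of rhythm control
treat = rng.binomial(1, ps)

# Unstabilized IPW weights.
w = np.where(treat == 1, 1 / ps, 1 / (1 - ps))
# Stabilized weights multiply by the marginal treatment probability.
p = treat.mean()
sw = np.where(treat == 1, p / ps, (1 - p) / (1 - ps))

print(f"unstabilized: mean {w.mean():.2f}, var {w.var():.2f}")
print(f"stabilized:   mean {sw.mean():.2f}, var {sw.var():.2f}")
```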

**Result: ** Observed data indicated rhythm control is cost-effective, with 100% of bootstrap replicates in quadrant 1 of the cost-effectiveness plane. Propensity score matching indicated similar costs and QALYs for the two therapies, but only 57% of patients were matched. IPW analysis indicated rhythm control to be cost-effective, with an ICER of $3,977 and a 0.87 probability that the ICER is below $30,000.

**Conclusion: ** Differences between rhythm control and rate control strategies with respect to total lifetime cost and QALYs were small in both the matched (< $140 and < 0.1 for cost and QALYs, respectively) and IPW (< $571 and < 0.15) analyses, and were in the direction of cost-effectiveness of rhythm control: more expensive, but more effective. Results for the observed data were similar and indicated rhythm control was cost-effective. From an economic standpoint the differences in results were small. From a methodological standpoint, the potential confounding in nonrandomized studies may give misleading results. Propensity score matching may result in a severe loss of data, whereas IPW analysis allows the use of all available data.

**Purpose: ** The Analytic Hierarchy Process (AHP) is a technique for multi-criteria analysis. It can help decision-makers to evaluate a finite number of alternative health care technologies under a finite number of outcome measures. We applied the AHP to prioritize the outcome measures of treatments for equinovarus deformity post-stroke, and to prioritize five alternative treatments with regard to these outcome measures.

**Method: ** Using the pairwise comparison technique of the AHP, 140 patients prioritized the outcome measures, and 10 health professionals prioritized the five treatments with regard to these outcome measures. The priorities were used to calculate the relative effectiveness of the treatments. Sensitivity analysis was based on bootstrapping of the participants’ priorities. Relative costs, including the device-related and care-related costs of the treatments, were calculated by applying the direct rating function of the AHP.
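The pairwise comparison step can be sketched numerically. In the AHP, each participant fills in a reciprocal matrix of judgments on Saaty's 1-to-9 scale, and the priorities are the normalized principal eigenvector of that matrix. The sketch below (an invented 3x3 matrix, not the study's data) also computes the consistency ratio conventionally used to screen judgments:

```python
import numpy as np

# Hypothetical pairwise comparison matrix for three outcome measures;
# A[i, j] says how much more important criterion i is than criterion j.
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# AHP priorities: the normalized principal eigenvector of A.
vals, vecs = np.linalg.eig(A)
k = np.argmax(vals.real)
w = np.abs(vecs[:, k].real)
w /= w.sum()

# Consistency index CI = (lambda_max - n) / (n - 1), divided by the
# random index RI (0.58 for n = 3 in Saaty's table) to give the CR.
n = A.shape[0]
ci = (vals[k].real - n) / (n - 1)
cr = ci / 0.58
print("priorities:", np.round(w, 3), "CR:", round(cr, 3))
```

A consistency ratio below 0.1 is the usual acceptance threshold for a set of pairwise judgments.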

**Result: ** Our analysis results in one overall efficiency frontier that takes into account the combined outcomes of the alternative treatments. The impact of the outcome measures on the combined outcomes is determined by their priorities. Functional outcomes (.51) have the highest weight, followed by risk and side effects (.19), comfort (.10), daily effort (.098), cosmetics (.07), and impact of treatment (.03). The overall effectiveness of soft-tissue surgery (.41) is ranked first, followed by orthopedic footwear (.18), ankle-foot orthosis (.15), surface electrostimulation (.14), and finally implanted electrostimulation (.12). Implanted electrostimulation (.35) and soft-tissue surgery (.34) are considered to be most expensive, followed by surface electrostimulation (.26), orthopedic footwear (.03) and ankle-foot orthosis (.02). Based on these priorities of the treatments’ overall effectiveness and costs, an efficiency frontier was drawn that includes decision uncertainty.

**Conclusion: ** The results suggest that the cost-effectiveness of implanted and surface electrostimulation is unfavourable. This new methodology for efficiency frontier analysis allows decision makers to integrate information on the costs and value of health care technologies, and can be applied broadly. It is particularly suitable in the field of early technology assessment, since the AHP supports a systematic estimation of priors about the effectiveness of alternative treatments.

**Purpose: ** Recently, the German HTA agency IQWiG published new guidelines on health-economic evaluations within the German statutory health care system. The goals of this pilot study commissioned by IQWiG in Germany were (1) to apply IQWiG’s new guidelines to the economic evaluation of combination therapy with peginterferon plus ribavirin (PegIFN+RBV) for chronic hepatitis C (CHC) and (2) to assess the feasibility of the efficiency frontier (EF) approach in this case example.

**Method: ** IQWiG’s EF approach assesses the cost-effectiveness of a new treatment within the specific disease area by comparing the new treatment’s incremental cost-effectiveness ratio (ICER) to ICERs of established treatments. We used IQWiG’s EF approach to assess the cost-effectiveness of PegIFN+RBV within the area of CHC comparing against other antiviral treatment regimes. We used a Markov model with a lifelong time horizon to determine health outcomes and costs of all treatment options. Health outcomes included sustained virological response (SVR), lifetime risk of decompensated cirrhosis and quality-adjusted life years (QALY). Model parameters were derived from the published literature and German databases. We adopted the perspective of the community of citizens insured through the statutory health insurance.
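The frontier construction underlying this approach can be sketched as follows. The treatment names below are those of the abstract, but all costs and QALYs are invented for illustration: strategies that are dominated or extendedly dominated drop off the frontier, and each remaining segment's slope is an ICER that a new treatment can be compared against.

```python
def efficiency_frontier(strategies):
    """Return the non-dominated strategies and the ICER of each frontier
    segment. `strategies` is a list of (name, cost, effect) tuples."""
    pts = sorted(strategies, key=lambda s: (s[1], -s[2]))
    frontier = []
    for name, cost, eff in pts:
        # Simple dominance: more costly but no more effective.
        if frontier and eff <= frontier[-1][2]:
            continue
        # Extended dominance: drop prior points whose segment ICER
        # exceeds the ICER of the candidate point.
        while len(frontier) >= 2:
            _, c1, e1 = frontier[-1]
            _, c0, e0 = frontier[-2]
            if (cost - c1) / (eff - e1) < (c1 - c0) / (e1 - e0):
                frontier.pop()
            else:
                break
        frontier.append((name, cost, eff))
    icers = [None] + [
        (frontier[i][1] - frontier[i - 1][1]) / (frontier[i][2] - frontier[i - 1][2])
        for i in range(1, len(frontier))
    ]
    return frontier, icers

# Hypothetical CHC strategies: (name, lifetime cost in EUR, QALYs).
strategies = [("no antiviral", 20000, 10.0),
              ("IFN mono",     28000, 10.8),
              ("IFN+RBV",      34000, 11.3),
              ("PegIFN+RBV",   38000, 12.3)]
frontier, icers = efficiency_frontier(strategies)
for (name, c, e), icer in zip(frontier, icers):
    print(name, c, e, icer)
```

In this toy data the low segment ICER of PegIFN+RBV extendedly dominates the intermediate regimens, mirroring the abstract's finding that the new treatment's ICERs sit well below the last segment of the existing frontier.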

**Result: ** The ICERs of PegIFN+RBV compared to interferon plus ribavirin (IFN+RBV) were EUR 15,000 per additional SVR, EUR 42,000 per case of decompensated cirrhosis avoided, and EUR 4,000 per QALY gained. These ICERs are substantially lower than those of the last segments of the respective EFs (i.e., the ICER of IFN+RBV vs. IFN monotherapy), indicating cost-effectiveness of PegIFN+RBV. The introduction of new genotype-specific treatment guidelines led to cost-savings when compared to IFN+RBV. The EF approach was feasible in the case of CHC treatment, because (1) IQWiG suggests several types of health outcomes, including response rates, prognostic implications, and quality-of-life scores, which can be generated in CHC (i.e., SVR, cirrhosis risk, QALY) and (2) sufficient treatments and evidence existed to generate the EF.

**Conclusion: ** PegIFN+RBV is cost-effective when compared to other established treatments in CHC. The EF approach should be feasible for HTAs in the area of CHC. However, several issues remain to be solved and conclusions derived from HTAs based on IQWiG’s framework may substantially differ from HTAs assuming uniform willingness-to-pay thresholds across the entire health care system. The foundation of IQWiG’s approach, that is, deriving disease-specific ICER thresholds, remains challenged.

**Purpose: ** Our study aims to combine the versatility of the Analytic Hierarchy Process (AHP) with the decision-analytic sophistication of Markov modelling in a new methodology for early technology assessment. As an illustration, we apply this methodology to a new technology to diagnose breast cancer.

**Method: ** Markov modelling is a commonly used approach to support decision making about the application of health care technology. We use a basic Markov model to compare the incremental cost-effectiveness of alternative technologies in terms of their costs and clinical effectiveness. The AHP is a technique for multi-criteria analysis, relatively new in the field of technology assessment. It can integrate both quantitative and qualitative criteria in the assessment of alternative technologies. We applied the AHP to prioritize a more versatile set of outcome measures than Markov models typically capture. These outcome measures include the clinical effectiveness and its determinants, as well as costs, patient comfort and safety. Furthermore, the AHP is applied to predict the performance of the new technology with regard to these outcome measures.
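A basic Markov cohort model of the kind described can be sketched as follows; the states, transition probabilities, utilities, and costs below are invented for illustration, not taken from the breast cancer application. Each cycle, the cohort vector accrues discounted QALYs and costs, then moves according to the transition matrix:

```python
import numpy as np

# Minimal three-state cohort model (well, sick, dead); illustrative numbers.
P = np.array([
    [0.90, 0.07, 0.03],   # transitions from well
    [0.00, 0.85, 0.15],   # transitions from sick
    [0.00, 0.00, 1.00],   # dead is absorbing
])
utility = np.array([0.85, 0.55, 0.0])   # QALY weight per state per cycle
cost = np.array([500.0, 4000.0, 0.0])   # cost per state per cycle

state = np.array([1.0, 0.0, 0.0])       # entire cohort starts well
disc, qalys, costs = 0.97, 0.0, 0.0     # ~3% annual discounting
for t in range(40):                      # 40 one-year cycles
    qalys += disc**t * state @ utility
    costs += disc**t * state @ cost
    state = state @ P
print(f"QALYs {qalys:.2f}, cost {costs:.0f}")
```

AHP-derived weights such as those reported below could replace the single `utility` vector with a weighted combination of several outcome measures per state.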

**Result: ** We systematically estimated priors on the clinical effectiveness of the new technology. In our illustration, estimates of the sensitivity and specificity of the new diagnostic technology were used as inputs to the Markov model. Moreover, prioritized outcome measures including the clinical effectiveness (w = 0.61), patient comfort (w = 0.09) and safety (w = 0.30) could be integrated into one combined outcome measure in the Markov model.

**Conclusion: ** Combining AHP and Markov modelling is particularly valuable in early technology assessment, when evidence about the effectiveness of a health care technology is still missing. Moreover, this combination can be valuable when decision makers are interested in other patient-relevant outcome measures besides the technology’s clinical effectiveness that are not (adequately) captured in mainstream utility measures. These outcome measures can have a strong impact on the successful application of health care technology.

**Purpose: ** Monte Carlo simulation is a commonly used method to account for uncertainty in cost-effectiveness analysis; by convention the number of simulations used is an arbitrary round number, e.g. 1,000. We present a rational approach for determining the appropriate number of simulations to use in any particular analysis, along with a pilot of this method.

**Method: ** First the analyst should specify: (a) the purpose of the analysis; (b) how accurate the results must be; and (c) how many confirmatory simulations should be performed once the output appears to stabilise. The most straightforward means of implementing this approach is through iteration. Eventually additional simulations will have no substantial impact on the output of interest; at this point confirmatory simulations must be performed to check that the stabilisation is permanent rather than due to chance. This method was then piloted on a recently published cost-effectiveness analysis of screening for post-natal depression in UK primary care. The number of simulations required to derive either the ICERs or the net benefit associated with each of the 12 strategies was retrospectively calculated to varying degrees of accuracy.
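The iterate-then-confirm rule can be sketched as follows. This Python sketch (our own simplified stopping rule on a synthetic net-benefit draw, not the published model) keeps simulating until the running mean stays within a chosen tolerance for a specified number of consecutive confirmatory simulations:

```python
import random

def simulations_needed(draw, tol, confirm, max_iter=100000):
    """Run `draw()` until the running mean stays within `tol` of the value
    at which it appeared to stabilise for `confirm` consecutive
    simulations; return (number of simulations, final mean)."""
    total, n, stable, anchor = 0.0, 0, 0, None
    while n < max_iter:
        total += draw()
        n += 1
        mean = total / n
        if anchor is not None and abs(mean - anchor) < tol:
            stable += 1
            if stable >= confirm:
                return n, mean
        else:
            anchor, stable = mean, 0   # output moved: restart confirmation
    return n, total / n

rng = random.Random(7)
# Hypothetical net-benefit draw: true mean 5000, standard deviation 2000.
n, mean = simulations_needed(lambda: rng.gauss(5000, 2000), tol=10, confirm=500)
print(n, round(mean))
```

Tightening `tol` (the analogue of stabilising to the nearest £100 vs £10 in the pilot) sharply increases the number of simulations required, which is exactly the pattern reported in the results.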

**Result: ** The original analysis over 10,000 simulations found three strategies to be dominated and one to be extendedly dominated, resulting in seven ICERs. Each ICER appeared to stabilise to the nearest £1,000 after 1,122 simulations. The ICERs failed to permanently stabilise to the nearest £100 or £10 within the 10,000 simulations available. Net benefit associated with each strategy permanently stabilised to the nearest £100 in 50 simulations, and to the nearest £10 after 5,392 simulations. The net benefit failed to permanently stabilise to the nearest £1 within the 10,000 simulations available. Work is currently being undertaken to run additional simulations from the model in order to investigate these results.

**Conclusion: ** The appropriate number of Monte Carlo simulations varies according to the purpose of the analysis, and is highly dependent upon the level of accuracy required. Adopting a conventional but arbitrary number of simulations generally results in either wasted computing time or spurious precision, as evidenced by the pilot study. We have introduced a simple and rational framework for calculating the appropriate number of Monte Carlo simulations, which could be adopted in future cost-effectiveness analyses.