Purpose: To extend the methods developed by Phelps and Mushlin (MDM, 1988) and demonstrate the power of a ‘rapid’ cost-effectiveness analysis of new diagnostic tests compared with existing tests, based on minimal information and without developing a full decision-analytic modelling framework, which is often complex, time-consuming, and may be an inefficient use of resources.
Method: Using a simplified decision-analytic approach to the complete pathway of care from diagnosis to subsequent treatment, the cost-effectiveness of the diagnostic test under consideration is expressed as a mathematical function of diagnostic accuracy, cost, burden, and the cost-effectiveness of treatment. This function includes only parameters likely to be available during the early stages of test development and allows instantaneous estimation of cost-effectiveness, i.e. it does not require any simulation. Uncertainty in these parameters is accounted for by applying probabilistic sensitivity analysis. Using a clinical example, the cost-effectiveness of magnetic resonance angiography (MRA) compared with digital subtraction angiography (DSA) for the detection of new intracranial aneurysms is assessed in patients with previous subarachnoid hemorrhage.
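For illustration, the following minimal Python sketch shows how such a closed-form incremental net monetary benefit can be combined with probabilistic sensitivity analysis by direct parameter sampling, with no model simulation required; all parameter values and distributions are hypothetical placeholders, not figures from the MRA/DSA example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # PSA draws

# Illustrative parameter distributions (hypothetical values)
sens_new, sens_old = rng.beta(90, 10, n), rng.beta(85, 15, n)  # sensitivities
spec_new, spec_old = rng.beta(95, 5, n), rng.beta(97, 3, n)    # specificities
prev = rng.beta(10, 90, n)              # prevalence in the tested population
c_new, c_old = 400.0, 1200.0            # test costs
nmb_tp = rng.normal(30_000, 5_000, n)   # net benefit of treating a true positive
nmb_fp = rng.normal(-5_000, 1_000, n)   # net (negative) benefit of a false positive

# Closed-form incremental NMB of replacing the existing test with the new one:
# changed detection of true and false positives, plus the test-cost difference.
d_nmb = (prev * (sens_new - sens_old) * nmb_tp
         + (1 - prev) * (spec_old - spec_new) * nmb_fp
         - (c_new - c_old))

print(f"mean incremental NMB: {d_nmb.mean():.0f}")
print(f"95% CI: {np.percentile(d_nmb, [2.5, 97.5])}")
print(f"P(cost-effective): {(d_nmb > 0).mean():.2f}")
```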
Result: The simplified approach produced cost-effectiveness results in line with our previous, much more comprehensive assessment of the cost-effectiveness of MRA compared with DSA. The comprehensive assessment resulted in a net monetary benefit (NMB) of $1,910 (95% CI -1,809 to 5,565) and probabilities of effectiveness and cost-effectiveness of 98% and 87%, respectively, at a willingness-to-pay threshold of $50,000 per QALY. Our simplified approach returned an NMB of $1,779 (95% CI 1,170 to 2,477), with corresponding probabilities of effectiveness and cost-effectiveness of 100% and 98%, respectively. Hence, in our clinical example the simplified approach would provide sufficient information and a clear indication of the potential benefits of replacing DSA with MRA.
Conclusion: Given the increasing abundance of newly developed diagnostic tests, a rapid approximation of the cost-effectiveness of new tests compared with existing tests, at minimal cost, is highly valuable. This low-cost mathematical satisficing approach supports improved use of health care resources by indicating 1) which tests are promising and should be developed further, 2) which tests are not promising and could have their development discontinued, and 3) which tests require more rigorous and comprehensive economic evaluations to obtain improved estimates of cost-effectiveness, at a correspondingly higher use of resources.
Purpose: Multiple imputation (MI) is an attractive approach for addressing missing data in cost-effectiveness analyses (CEA). However, to provide appropriate inferences the imputation model must reflect the data’s structure. CEA alongside cluster randomised trials (CRTs) tend to have complex patterns of missing data. Previous studies have ignored the missingness mechanisms and applied complete-case analysis (CCA) or single-level MI. This paper presents a multilevel MI approach for CEA alongside CRTs and compares the results to those from conventional methods.
Method: We compared the relative performance of alternative methods for handling missing data across a wide range of circumstances. We generated different scenarios with missing costs and health outcomes, using a CEA alongside a CRT with fully observed data. The CRT (4,252 patients, 14 clusters) evaluated an intervention to improve diagnosis of active labour in primiparous women. We constructed scenarios that differed, for example, according to the proportion with missing data (30%, 50%) and the missingness mechanism (Missing Completely at Random (MCAR) or Missing at Random (MAR)). We estimated incremental net benefits (INB) with each method and compared these to the corresponding estimates from the fully observed data, taken to be the ‘true’ INB.
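As an illustration of how such scenarios can be constructed, the Python sketch below imposes MCAR and MAR missingness on simulated clustered data standing in for the trial dataset (all values hypothetical), and shows why complete-case means become biased under MAR when missingness tracks cluster-level costs.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n, k = 4252, 14
cluster = rng.integers(0, k, n)
u = rng.normal(0, 50, k)  # cluster-level cost effects
df = pd.DataFrame({
    "cluster": cluster,
    "cost": 500 + u[cluster] + rng.normal(0, 100, n),
    "qaly": rng.normal(0.8, 0.1, n),
})

# MCAR: every record equally likely to be missing (~30% overall)
mcar_mask = rng.random(n) < 0.30

# MAR: missingness depends on the (observed) cluster effect, ~50% overall
p_mar = 1 / (1 + np.exp(-u[cluster] / 25))
mar_mask = rng.random(n) < p_mar

df_mar = df.copy()
df_mar.loc[mar_mask, ["cost", "qaly"]] = np.nan

# Complete-case analysis drops whole records; under MAR this biases the mean
# cost, because high-cost clusters are under-represented among complete cases.
print(df["cost"].mean(), df_mar["cost"].dropna().mean())
```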
Result: When costs and outcomes were MCAR, all methods gave INBs similar to the ‘true’ estimates. When endpoints were MAR, CCA gave estimates that differed from the ‘true’ INBs. Across all these scenarios, single-level MI provided misleading point estimates and understated the uncertainty surrounding the INBs. Unlike single-level MI, multilevel MI provided both point estimates and precision consistently close to the ‘true’ values, even in more challenging settings, such as high levels of missing data. For example, when 50% of observations had costs and outcomes MAR, the probabilities that the intervention was cost-effective were 0.55 (CCA), 0.50 (single-level MI), and 0.40 (multilevel MI), compared to the ‘true’ estimate of 0.39.
Conclusion: MI methods can appropriately handle missing data in CEA, but it is essential that the imputation model recognises the structure of the cost-effectiveness data. In CEA alongside CRTs, MI can only provide appropriate inferences if the approach reflects the inherent clustering.
Purpose: The presence of heterogeneity in the comparative effects of treatments is not, by itself, enough to call for investments in Patient-Centered Outcomes Research (PCOR). Even in the presence of heterogeneous effects, individual outcomes from one treatment can stochastically dominate outcomes from an alternative, which would imply that PCOR has minimal value. Here, we develop a simple and novel method, called the “Jointness Box” (JB), that may be used to gauge the value of PCOR based on the marginal distributions of counterfactual outcomes obtained in traditional studies, helping to prioritize PCOR.
Methods: Let Q0 and Q1 denote outcomes generated under two treatments. Data from a standard clinical trial, where patients are randomly allocated to one or the other treatment, can be used to identify the marginal distributions of Q0 and Q1, but not their joint distribution, since we lack information on the dependence of Q0 on Q1 at the individual level. However, the identified supports (ranges) of the marginal distributions define a “Jointness Box” (henceforth, JB) representing the plausible spread of heterogeneous treatment effects. In a plot of Q0 against Q1, where the 45-degree line represents the locus of equality of Q0 and Q1 at the individual level, the JB is the area within which the joint distribution of Q0 and Q1 lies. We study two features: 1) JB-dominance, i.e. whether the JB lies entirely above or below the 45-degree line; and 2) JB-area, i.e. the proportion of the full area within the JB that falls above the 45-degree line. Using bootstrap methods, with attention to sampling order statistics, joint distributions of {Max(Q0), Min(Q0)} and {Max(Q1), Min(Q1)} are obtained and used to estimate (1) the likelihood of JB-dominance and (2) the 95% CI for the JB-area. Various microsimulation exercises are set up to study the relationship between the JB-dominance and JB-area criteria and the value of PCOR.
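For illustration, the following Python sketch computes JB-dominance and the JB-area from hypothetical marginal samples; a simple nonparametric bootstrap over patients stands in here for the order-statistic sampling described above.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical marginal samples of outcomes under each treatment
q0 = rng.normal(60, 8, 300)   # outcomes under treatment 0
q1 = rng.normal(70, 6, 300)   # outcomes under treatment 1

def jb_dominates(q0, q1):
    """JB-dominance: the box [min q0, max q0] x [min q1, max q1]
    lies entirely above or entirely below the 45-degree line."""
    return q1.min() > q0.max() or q1.max() < q0.min()

def jb_area_above(q0, q1, m=100_000, seed=0):
    """Proportion of the Jointness Box falling above the 45-degree
    line (q1 > q0), estimated by uniform sampling over the box."""
    g = np.random.default_rng(seed)
    g0 = g.uniform(q0.min(), q0.max(), m)
    g1 = g.uniform(q1.min(), q1.max(), m)
    return np.mean(g1 > g0)

# Bootstrap patients to propagate sampling uncertainty in the box corners
boots = [(jb_dominates(b0, b1), jb_area_above(b0, b1))
         for b0, b1 in ((rng.choice(q0, q0.size), rng.choice(q1, q1.size))
                        for _ in range(200))]
print("likelihood of JB-dominance:", np.mean([d for d, _ in boots]))
print("95% CI for JB-area:", np.percentile([a for _, a in boots], [2.5, 97.5]))
```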
Results: We found that the likelihood of JB-dominance is negatively correlated with the value of PCOR, irrespective of the dependence between Q0 and Q1. Additionally, the JB-area has a U-shaped relationship with the value of PCOR and also varies with the nature of the dependence between Q0 and Q1. The JB metrics are found to be useful tools for envisioning heterogeneity and prioritizing PCOR.
Conclusion: Future work will apply JB metrics to various clinical applications.
Purpose: Choosing the best treatment is challenging when there is more than one reasonable option and each option has good and bad attributes that people may value differently. Our objective was to develop a practical approach to integrating patient preferences with clinical evidence in order to help patients more easily identify the treatments most consistent with their preferences.
Method: We developed a prototype that uses a vector space model to combine quantitative evidence about the impact of different treatment options with patient preferences. The evidence matrix P (m attributes × n treatments) describes the impact of each treatment T1, ..., Tn on each attribute A1, ..., Am affected by these treatments. For each pairwise combination of treatments within each attribute, weights are assigned to each treatment in proportion to the difference (D) between the two treatments’ impact on that attribute (Dt1,t2). The preference attributes of greatest importance to elicit from patients are selected empirically, based on Dt1,t2, and are framed consistently across attributes. Visual analog scales (ranging from 0 to 1) elicit patient preferences for each selected attribute, which are then normalized to create a unique preference vector. Treatments are rank-ordered by multiplying the evidence matrix by the patient preference vector. The evidence matrix can be easily updated to reflect new data, regional data, group-specific data, or different time horizons. Patient preferences can be obtained iteratively for additional attributes, as needed, to help distinguish among treatments.
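A minimal Python sketch of the ranking step follows; the evidence values and preference weights are hypothetical placeholders for illustration, not figures from the report cited in the results.

```python
import numpy as np

# Hypothetical evidence matrix P (attributes x treatments): each entry scores
# how favorably a treatment performs on an attribute (higher = better).
attributes = ["survival", "incontinence", "impotence", "rectal problems"]
treatments = ["surveillance", "prostatectomy", "radiation"]
P = np.array([[0.70, 0.80, 0.85],
              [0.90, 0.50, 0.80],
              [0.85, 0.40, 0.60],
              [0.90, 0.85, 0.55]])

# Patient preferences from visual analog scales (0..1), normalized to sum to 1
raw = np.array([1.0, 0.4, 0.8, 0.3])
w = raw / raw.sum()

# Rank treatments by the preference-weighted evidence score w^T P
scores = w @ P
for t, s in sorted(zip(treatments, scores), key=lambda x: -x[1]):
    print(f"{t}: {s:.3f}")
```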
Result: We created an algorithm that integrates evidence about the impact of treatments for low-risk prostate cancer with individual patient preferences. Three treatments (active surveillance, radical prostatectomy, and radiation treatment) and four attributes (surviving prostate cancer, incontinence, impotence, and rectal problems) were considered as a test case. Using data from a 2011 AHRQ Evidence Report, the most important attributes on which to query patients about their preferences are impotence (1st), rectal problems (2nd), and incontinence (3rd). If patients valued only survival, the preferred treatment would be radiation therapy; if patients valued all four attributes equally, the preferred treatment would be surveillance. The model is sensitive to small changes in preferences.
Conclusion: This new approach to combining individual preferences with evidence minimizes both patient burden and bias on the part of the decision support tool designer, and is generalizable to other preference-sensitive decisions.
Purpose: To demonstrate the potential usefulness of a two-stage approach combining machine-learning and Bayesian techniques for the prediction of heterogeneous treatment effects in the presence of a large number of predictors with potential high-order interactions.
Method: 460 patients from the N9741 clinical trial of treatments for advanced colorectal cancer, with complete response, toxicity, and pharmacogenomic profiles, were included. Survival was imputed for patients alive at last follow-up. In the first stage, random forest algorithms were used to predict survival separately for each treatment group as a function of age, sex, race (white vs. non-white), prior chemotherapy status, and a set of 18 indicator variables containing information about single-nucleotide polymorphisms (SNPs). The resulting treatment-specific survival scores were included, along with treatment assignment indicators, in a second-stage Bayesian GLM (gamma family, log link) predicting survival. The survival scores were designed to capture complex interactions of each treatment with individual characteristics, including genomic data; given the large number of predictors and potential multi-way interactions, direct inclusion of treatment interaction terms would not have been feasible. Counterfactual simulations were conducted by applying the treatment-specific survival scores for treatments not received by each individual to posterior parameter estimates from the Bayesian GLM survival model.
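For illustration, the following Python sketch outlines the two-stage pipeline on simulated stand-in data (all variables and values hypothetical); a frequentist gamma GLM from statsmodels is used here in place of the Bayesian fit described above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n, n_snps = 460, 18

# Hypothetical stand-ins for the trial variables described above
X = pd.DataFrame(rng.integers(0, 2, (n, n_snps)),
                 columns=[f"snp{i}" for i in range(n_snps)])
X["age"] = rng.normal(60, 10, n)
treat = rng.integers(0, 3, n)                        # three regimens
surv = rng.gamma(2.0, 12.0, n) * (1 + 0.02 * treat)  # survival (months)

# Stage 1: one random forest per arm yields treatment-specific survival scores
forests = {g: RandomForestRegressor(n_estimators=200, random_state=0)
                .fit(X[treat == g], surv[treat == g]) for g in range(3)}
score_own = np.array([forests[g].predict(X.iloc[[i]])[0]
                      for i, g in enumerate(treat)])

# Stage 2: gamma GLM (log link) of survival on own-arm score + treatment dummies
D = pd.get_dummies(treat, prefix="trt", drop_first=True).astype(float)
Z = sm.add_constant(pd.concat([pd.Series(score_own, name="score"), D], axis=1))
glm = sm.GLM(surv, Z, family=sm.families.Gamma(sm.families.links.Log())).fit()

def predict_under(arm):
    """Counterfactual: predicted survival for everyone if assigned to `arm`."""
    Zc = Z.copy()
    Zc["score"] = forests[arm].predict(X)
    Zc["trt_1"], Zc["trt_2"] = float(arm == 1), float(arm == 2)
    return glm.predict(Zc)

cf = np.column_stack([predict_under(a) for a in range(3)])
# Share of patients predicted to do better on a regimen they did not receive
print((cf.argmax(axis=1) != treat).mean())
```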
Result: Treatment-specific survival score parameter estimates for two of the three treatments were significantly positive at the 95% posterior probability level, strongly suggesting the presence of treatment effect heterogeneity determined by personal characteristics, including genomic profiles. While overall treatment effect estimates strongly suggested that one regimen was likely to be superior on average, counterfactual simulations predicted that 61 of the 460 patients had at least a 50% chance of benefiting more, in terms of expected survival, from one of the other two regimens.
Conclusion: A two-stage approach combining random forests and a Bayesian GLM was able to identify and estimate treatment effect heterogeneity given a set of predictors (and possible interactions) too large to include directly as regression interaction terms. A subset of patients was identified who were likely to benefit more from a treatment that was not predicted to be the most effective on average.
Purpose: Probabilistic sensitivity analysis (PSA) is an approach recommended by the ISPOR-SMDM Modeling Good Research Practices Task Force and a necessary step for value-of-information analysis. However, conducting PSA can be computationally challenging and often impractical in large-scale patient-level simulation (PLS) models (e.g. microsimulation, discrete-event simulation, agent-based models). Our purpose was to conduct PSA using Latin Hypercube sampling and compare the results with those from the commonly used approach of Monte Carlo sampling.
Method: We developed a Markov PLS model to conduct a cost-effectiveness analysis of hepatitis C treatment, with states including METAVIR fibrosis scores (F0-F4), decompensated cirrhosis, hepatocellular carcinoma, liver transplant, and liver-related death. We used 33 parameters in the PSA, including state transition probabilities, utility weights, and costs. We used two sampling techniques: random sampling (RS) and Latin Hypercube sampling (LHS), a type of stratified sampling. We ran the PSA with different numbers of samples, n = 100 and n = 1000 (2nd-order uncertainty), resulting in RS100, RS1000, LHS100, and LHS1000 strategies, using 1000 iterations within each run (1st-order uncertainty). Using independent initial random seeds, we obtained 20 sets of results for each sampling strategy and estimated the standard error (SE) in the mean cost, QALYs, incremental cost-effectiveness ratios (ICERs), and their lower and upper 95% confidence limits. We compared these outcomes with a "gold standard" (GS), the outcome of extensive random sampling with 100,000 PSA inputs. Finally, we identified influential inputs under each method and plotted cost-effectiveness acceptability curves.
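As an illustration of the two sampling schemes, the Python sketch below contrasts the seed-to-seed variability of PSA mean estimates under random and Latin Hypercube sampling, using scipy's qmc module and a toy stand-in for the patient-level model; all distributions are hypothetical.

```python
import numpy as np
from scipy.stats import norm, qmc

rng = np.random.default_rng(4)
d = 33  # number of PSA parameters (transition probabilities, utilities, costs)

def draw_params(n, method):
    """Draw n parameter sets on (0,1)^d, then map through inverse CDFs.
    All marginals are illustrative normals; a real PSA would use
    beta/gamma/lognormal distributions as appropriate."""
    if method == "lhs":
        u = qmc.LatinHypercube(d=d, seed=rng).random(n)
    else:  # plain random (Monte Carlo) sampling
        u = rng.random((n, d))
    return norm.ppf(np.clip(u, 1e-12, 1 - 1e-12), loc=1.0, scale=0.1)

def run_model(theta):
    """Toy stand-in for the patient-level simulation: mean cost per draw."""
    return 1000.0 * theta[:, :16].mean(axis=1)

# Seed-to-seed variability of the PSA mean cost under each scheme (20 runs)
for method in ("rs", "lhs"):
    means = [run_model(draw_params(1000, method)).mean() for _ in range(20)]
    print(method, np.std(means))  # a smaller spread is expected with LHS
```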
Result: No trend was observed using 100 samples. Using 1000 samples, the SE with LHS decreased relative to RS by 35-43% for costs, 37-48% for QALYs, 13-40% for the confidence intervals of costs, and 27-49% for the confidence intervals of QALYs (table). The total bias in costs and QALYs obtained with all sampling strategies was less than 4% compared to the GS. However, the ICERs obtained with RS100, LHS100, RS1000, and LHS1000 were higher than that obtained with the GS by 44%, 72%, 42%, and 25%, respectively.
Conclusion: Compared with standard Monte Carlo sampling, Latin Hypercube sampling may substantially reduce the sampling variability (SE) in costs and QALYs; however, large samples are needed to reduce bias in ICERs. Results with Latin Hypercube sampling are less dependent on the initial random seed than those with random sampling.