ORAL ABSTRACTS: ADVANCING EVALUATION METHODS AND FRAMEWORKS
Methods: Tool development followed the Ottawa Decision Support Framework and the International Patient Decision Aid Standards. Prior qualitative and survey assessments directed the initial content. Decision aids (DAs) were iteratively modified using feedback from 28 patient interviews; three patient focus groups; patient stakeholders; local research groups; and a national panel of 14 experts in decision science and/or cardiovascular disease. The pilot trial was conducted at three clinics across Denver, Colorado. To test feasibility, recruitment strategies differed by site: 1) chart review of ICD referrals; 2) chart review of electrophysiologists’ (EP) schedules; and 3) clinic staff identification of eligible patients. Intervention patients were mailed the four DAs before discussing ICD therapy with the EP. Patients were interviewed at baseline, one month after meeting with the EP, and three months after enrollment. Primary outcomes were acceptability and feasibility; secondary outcomes included ICD-specific knowledge, decision quality, values concordance, decision conflict, and decision regret.
Results: Twenty-one eligible patients enrolled; 15 were randomized to the intervention and six to the control (usual care). 67% found the DAs to be unbiased, 22% thought they were biased toward ICDs, and 11% thought they were biased toward not getting an ICD. Furthermore, 89% found the DAs helpful and 100% would recommend them to others. The pilot was feasible at all sites; however, using clinic staff to identify eligible patients was more efficient than using chart review. Intervention patients did not have significantly greater knowledge about ICDs (M=14.00, SD=2.62) than controls (M=11.60, SD=3.13, t13=1.57, ns). Intervention patients showed greater concordance between their decision and end-of-life values than controls (71% vs. 29% concordant, p=0.06). Intervention patients did not have significantly different levels of decision conflict (M=17.81, SD=14.97) than controls (M=24.38, SD=25.35, t13=0.64, ns). Decision regret also did not differ significantly (intervention: M=21.88, SD=16.24; controls: M=16.00, SD=19.12; t11=0.59, ns).
Conclusions: Patients felt the DAs provided helpful, balanced information that they would recommend to other patients. Furthermore, using clinic staff to identify eligible patients is an efficient way to get decision aids to patients. The impact of the DAs on the secondary outcomes will be tested in a future, adequately powered trial.
Method: This paper discusses the main MCDA approaches and methods and provides examples of the diverse range of health care applications in use internationally. The common steps for implementing MCDA are explained, which include 1) carefully structuring the decision problem being addressed; 2) ensuring that appropriate criteria are specified; 3) measuring alternatives’ performance accurately; using valid and reliable methods for 4) scoring alternatives and 5) weighting criteria; and 6) presenting MCDA results, including 7) sensitivity analysis, in a form that is relatively easily interpreted and communicated. However, the way these steps are conducted differentiates the MCDA methods.
Result: Most applications of MCDA in health care are based on weighted-sum models. Notwithstanding the popularity of this approach, methodological issues arise at each step of the process for creating and applying such models. In particular, there is a potentially confusing variety of scoring and weighting methods (steps 4 and 5) to choose from. Naturally, all methods (and software implementing them) have their relative strengths and weaknesses (choosing the ‘best’ MCDA method is itself a multi-criteria decision problem!).
When thinking about which scoring and weighting methods to use, consideration needs to be given to: how well methods elicit trade-offs between criteria; the time and resources required to implement alternative methods; the cognitive burden imposed on participants and whether skilled facilitators are required; the need for additional data processing and statistical analysis; the validity of the underlying assumptions relative to decision-makers’ preferences; and whether the outputs produced will satisfy decision-makers’ objectives.
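As an illustration of the weighted-sum model mentioned above, the following is a minimal sketch rather than any particular MCDA method or software. The alternatives, criteria, 0-100 scores, and weights are all hypothetical, and the function name `weighted_sum_scores` is an assumption introduced here.

```python
def weighted_sum_scores(performance, weights):
    """Total score per alternative under a simple weighted-sum model.

    performance: dict mapping alternative -> {criterion: score}
                 (scores assumed to be on a common 0-100 scale)
    weights:     dict mapping criterion -> non-negative weight
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalize so the weights sum to 1 (step 5: weighting criteria).
    norm = {c: w / total_weight for c, w in weights.items()}
    # Step 4 scores are combined into one weighted total per alternative.
    return {
        alt: sum(norm[c] * scores[c] for c in norm)
        for alt, scores in performance.items()
    }


# Hypothetical inputs, for illustration only.
performance = {
    "Treatment A": {"effectiveness": 80, "safety": 60, "cost": 40},
    "Treatment B": {"effectiveness": 60, "safety": 90, "cost": 70},
}
weights = {"effectiveness": 0.5, "safety": 0.3, "cost": 0.2}

scores = weighted_sum_scores(performance, weights)
# Rank alternatives from highest to lowest total score.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

Re-running the function with perturbed weights is one simple way to carry out the sensitivity analysis in step 7.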
Conclusion: As the use of MCDA in health care increases, further research into the development of a framework to help select the most appropriate methods for particular types of health care application would be worthwhile.
Method: An expert panel of five members was formed to provide inputs and guidance on the checklist. We followed the international reporting guideline development framework. A list of items was generated based on a systematic literature review of EQ-5D valuation studies. A modified Delphi panel approach was adopted by asking the expert panel via email to assess independently the content validity, completeness, and wording of these items and suggest any additional items if needed. Upon receiving inputs from the expert panel, items were refined. In the next stage, inputs on the checklist were solicited from the members of the EuroQol Research Foundation who were asked to comment on the checklist and assess how important each item is. If an item was classified as “required” by more than 50% of the participants in the survey, the item was included in the second round of deliberation which decided the final version of the checklist.
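The retention rule described above (an item advances if more than 50% of survey participants classify it as "required") can be sketched as follows. The item names, the response labels other than "required", and the function name are hypothetical; the abstract does not report the actual survey format.

```python
def items_for_second_round(votes, threshold=0.5):
    """Return items rated "required" by more than `threshold` of raters.

    votes: dict mapping item -> list of rating labels, one per participant
    """
    retained = []
    for item, ratings in votes.items():
        if not ratings:
            continue  # no responses: the item cannot meet the rule
        share_required = sum(r == "required" for r in ratings) / len(ratings)
        # Strictly greater than the threshold, matching "more than 50%".
        if share_required > threshold:
            retained.append(item)
    return retained


# Hypothetical survey responses, for illustration only.
votes = {
    "Item 1": ["required", "required", "optional"],
    "Item 2": ["required", "optional", "optional", "not needed"],
    "Item 3": ["required", "optional"],
}
retained = items_for_second_round(votes)
print(retained)
```

An item with exactly half of its ratings as "required" is excluded, because the stated rule is "more than 50%".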
Result: From an initial list of 35 items, 21 items were selected for final inclusion on the checklist, grouped into 7 sections: (1) descriptive system; (2) health states valued; (3) sampling; (4) preference data collection; (5) study sample; (6) modeling; and (7) scoring algorithm.
Conclusion: The CREATE checklist aims to facilitate and promote transparent reporting of valuation studies of MAUIs. This checklist is methodology-oriented and can assist users in their critical appraisal of value sets and help guide research related to the design, execution, and reporting of health valuation studies.
Purpose: Bayesian methods are naturally suited for calibration because they reveal the posterior distributions of the model parameters and their correlations, unlike direct search algorithms [e.g., Nelder-Mead (NM)] that only produce point estimates. However, Bayesian methods are rarely implemented in practice due to technical and computational challenges associated with defining simulation models in specialized software (e.g., BUGS). We propose combining artificial neural network (ANN) metamodeling with Bayesian calibration as a hybrid approach that is efficient and can be quickly scaled to models of arbitrary complexity.
Methods: Our approach involves these steps: (1) conduct a PSA with vague input parameter values, (2) fit an ANN metamodel using the PSA's inputs and outputs, (3) calibrate the ANN metamodel using a Markov chain Monte Carlo (MCMC) sampling algorithm, and (4) obtain the posterior distribution of the calibrated parameters that quantifies all sources of uncertainty not explained by the simulation model or the observed data. We demonstrate our approach with a Markov model for cancer progression. The model has three states (Cancer free, Metastasis, and Dead) and two unknown probabilities that define the transitions from cancer free to metastasis (pMet) and from metastasis to death (pDieMet). We produced 100 survival curves from the Markov model using 100 random parameter sets for pMet and pDieMet. We compared the accuracy of the Hybrid approach in estimating the true parameter values for each parameter set relative to NM calibration. In addition, we initialized NM from 100 random starting points, while we initialized the Hybrid approach from a single starting point.
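To make the three-state model concrete, here is a minimal cohort sketch that turns a (pMet, pDieMet) pair into a survival curve. It is not the authors' implementation: the cycle count, the absence of background mortality from the cancer-free state, and the function names are assumptions made for illustration. The sum-of-squared-errors helper is one possible goodness-of-fit measure a calibration routine could use, not necessarily the one used in the study.

```python
def survival_curve(p_met, p_die_met, n_cycles=60):
    """Proportion of the cohort alive at each cycle (index 0 = start).

    Assumptions (not stated in the abstract): everyone starts cancer free,
    death occurs only from the metastasis state, and probabilities are
    constant per cycle.
    """
    cancer_free, metastasis = 1.0, 0.0
    curve = [cancer_free + metastasis]
    for _ in range(n_cycles):
        new_met = cancer_free * p_met        # cancer free -> metastasis
        new_dead = metastasis * p_die_met    # metastasis -> dead
        cancer_free -= new_met
        metastasis += new_met - new_dead
        curve.append(cancer_free + metastasis)
    return curve


def sum_squared_error(curve_a, curve_b):
    """Squared distance between two survival curves of equal length."""
    return sum((a - b) ** 2 for a, b in zip(curve_a, curve_b))


# Target curve generated from the parameter values reported in the Results.
target = survival_curve(0.0099, 0.0477)
# A hypothetical candidate parameter set, scored against the target.
candidate = survival_curve(0.02, 0.03)
print(sum_squared_error(candidate, target))
```

A direct search such as Nelder-Mead would minimize this error over (pMet, pDieMet), whereas a Bayesian approach would instead use a likelihood to sample the joint posterior of the two parameters.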
Results: The Hybrid approach was more accurate than NM. The mean squared errors for NM were more than 1000 times and 200 times greater than those for the Hybrid approach for pMet and pDieMet, respectively. The Hybrid approach took 2.9 seconds compared to 1.6 seconds for NM. The Figure shows the results of the Hybrid approach for one set of true parameter values indicated by the star (pMet=0.0099 and pDieMet=0.0477). In addition, the Hybrid approach reveals the posterior distribution of pMet and pDieMet and the correlation between them, which cannot be obtained with direct search algorithms like NM.
Conclusions: Bayesian calibration reveals posterior parameter distributions and their correlations for calibrated model parameters. Adding ANN metamodeling can overcome many technical and computational challenges associated with Bayesian calibration.
Method: The HUI2 and HUI3 assign a utility to a health state by aggregating individuals’ preferences into a community scoring function. This is done by taking the average of individuals’ utilities for a health state as “the” utility of that health state. Social choice theory (Arrow, 1951; Sen, 1970) describes the contexts in which various preference aggregation methods are normatively justified. We describe the aggregation procedures used in the HUI2 and HUI3 in the framework of social choice theory and apply results from social choice theory to the HUI context.
Result: Examining the HUI2 and HUI3 through the lens of social choice theory, we investigate the assumptions that must be satisfied to justify relying on the average. For example, excluding individuals based on “illogical” responses (e.g., preferring a state with lower functional capacity to a state with higher functional capacity) creates normative problems for using the average as the mechanism of preference aggregation. Similarly, excluding individuals who report no differences between health states has strong implications for the normative foundations of the aggregation procedure. Social choice theory also provides alternatives to the average. For example, a minimum or maximum aggregation method may identify subgroups of interest while an aggregation method that is a function of the standard deviation may address concerns with equity. Thus, aggregation methods embody different values and we describe policy scenarios in which an alternative aggregation method may be preferred.
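The contrast between the average and the alternative aggregation rules mentioned above can be sketched as follows. The individual utilities are hypothetical, and "mean minus k standard deviations" is only one assumed example of an aggregation method that is a function of the standard deviation; the abstract does not specify a functional form.

```python
from statistics import mean, pstdev


def aggregate(utilities, method="mean", k=1.0):
    """Aggregate individual utilities for one health state.

    method: "mean"          - the average, as used for a community score
            "min" / "max"   - extremes, which may flag subgroups of interest
            "mean_minus_sd" - assumed dispersion-penalized rule: mean - k * SD
    """
    if not utilities:
        raise ValueError("at least one utility is required")
    if method == "mean":
        return mean(utilities)
    if method == "min":
        return min(utilities)
    if method == "max":
        return max(utilities)
    if method == "mean_minus_sd":
        # Population SD; greater disagreement lowers the aggregate value.
        return mean(utilities) - k * pstdev(utilities)
    raise ValueError(f"unknown method: {method}")


# Hypothetical individual utilities for a single health state.
utilities = [0.2, 0.6, 0.7, 0.9]
results = {
    m: aggregate(utilities, m)
    for m in ("mean", "min", "max", "mean_minus_sd")
}
print(results)
```

Reporting several aggregates side by side, as in `results`, is one way to use alternative procedures in sensitivity analysis.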
Conclusion: The current use of the average as the method of preference aggregation is justified under explicit assumptions. This method ignores aspects of decision-making that may be relevant to societal decisions such as the equity of outcomes. There exist theoretically strong alternatives that capture distributional properties ignored by the average. Alternative procedures could be used in sensitivity analysis, ensuring the average is not ignoring some aspect of the heterogeneity of preferences that is relevant to the decision at hand.
Method: Building on our previously developed Expected Value of Individualized Care (EVIC) framework, we conceptualize new decision-relevant metrics to better understand and forecast the expected value of PM. We consider several aspects of behavior at the patient, physician, and payer levels that can inform the rate and manner in which PM innovations diffuse throughout the relevant population. We illustrate this framework and the methods using a retrospective evaluation of the use of the OncotypeDx genomic test among breast cancer patients.
Result: The enriched metrics can help inform many facets of PM decision making, such as evaluating alternative reimbursement levels for PM tests, implementation and education programs for physicians and patients, and decisions around research investments by manufacturers and public entities. We replicated prior published results on the evaluation of OncotypeDx among breast cancer patients, but also illustrated that those results are based on assumptions that are often not met in practice. Instead, we show how incorporating more practical aspects of behavior around PM could lead to drastically different estimates of value. For OncotypeDx, population returns to a social insurer ranged from $17 billion to $37 billion, and revenues for the manufacturer ranged from $4 billion to $10 billion, depending on the nature of reimbursement policies and diffusion patterns.
Conclusion: We believe that the framework and the methods presented can provide decision makers with a more decision-relevant tool to explore the value of PM. There is a growing recognition that data on adoption are important to decision makers. More research is needed to develop prediction models for potential diffusion of PM technologies.