USING MODELING TO MAKE BETTER MEDICAL DECISIONS

Tuesday, October 22, 2013: 1:30 PM - 3:00 PM
Key Ballroom 7,9,10 (Hilton Baltimore)
Category Reference for Presentations
AHE: Applied Health Economics; DEC: Decision Psychology and Shared Decision Making
HSP: Health Services and Policy Research; MET: Quantitative Methods and Theoretical Developments

* Candidate for the Lee B. Lusted Student Prize Competition

Session Chairs:
Torbjorn Wisloff, MSc and Michael W. Kattan, PhD
1:30 PM
L-1
(AHE)
Caterina Conigliani, PhD, Universita' di Roma Tre, Roma, Italy, Andrea Manca, PhD, MSc, The University of York, York, United Kingdom and Andrea Tancredi, PhD, Universita' di Roma 'La Sapienza', Roma, Italy
Purpose: This paper proposes a novel modelling strategy for the analysis of EQ-5D responses, which recognises both the likely dependence between the five dimensions of the questionnaire at the patient level and the fact that the severity levels of each dimension are naturally ordered. We also address the key problem of choosing an appropriate summary measure of agreement between predicted and observed data when these models are used to develop mapping algorithms between patient-reported outcome measures (PROMs).

Methods: Using data from the Health Survey for England (HSE) and the National Health Measurement Study (NHMS), we develop a multivariate ordered probit (MVOP) model for the analysis of the EQ-5D responses and compare its performance against other approaches proposed in the literature, such as response mapping (e.g. multinomial logit, ML) and univariate regression models (applied directly to the EQ-5D index score). Model goodness-of-fit is assessed using the Deviance Information Criterion (DIC), while in-sample and out-of-sample predictive ability (crucial when developing mapping algorithms) is assessed using Bayesian proper scoring rules. Departing from measures based on the predicted mean, such as the (root) mean squared error, scoring rules instead exploit the whole posterior predictive distribution implied by the model, thus reflecting both central tendency and uncertainty in the prediction. The analysis is implemented within a Bayesian framework.
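The latent-variable structure underlying an MVOP can be sketched as a data-generating process: correlated latent normals, one per EQ-5D dimension, are cut at fixed thresholds into ordered severity levels. The exchangeable 0.5 correlation, the cutpoints, and the three-level coding below are invented for illustration and are not fitted values from the HSE/NHMS analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical MVOP data-generating process: five correlated latent normals,
# each cut at fixed thresholds into three ordered severity levels
# (mirroring the 3-level EQ-5D). All numbers are illustrative.
n_dims, n_obs = 5, 1000
corr = np.full((n_dims, n_dims), 0.5) + 0.5 * np.eye(n_dims)  # exchangeable correlation
latent = rng.multivariate_normal(np.zeros(n_dims), corr, size=n_obs)
cutpoints = np.array([0.0, 1.2])                # two cutpoints -> three ordered levels
responses = np.digitize(latent, cutpoints) + 1  # severity levels 1..3 per dimension
```

Estimation then inverts this process, inferring the correlation matrix and cutpoints from the observed ordinal responses, which is what distinguishes the MVOP from five independent ordered probits.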

Results: The MVOP fits the two independent datasets better (DIC: 15,145 for the NHMS and 45,550 for the HSE) than both the ML (DIC: 15,703 for the NHMS and 47,140 for the HSE) and the independent ordered probit for each dimension (DIC: 15,720 for the NHMS and 45,550 for the HSE). Assessment of the posterior predictive distributions shows that the MVOP has better coverage of the central tendency measure (in-sample validation) and better out-of-sample predictive ability (0.531 for the MVOP vs 0.513 for the independent univariate ordered probit vs 0.481 for the ML).

Conclusions: Explicit modelling of both the correlation between responses on the five dimensions of the EQ-5D and the natural ordering of the severity levels within each dimension yields more accurate predictions. Modelling at the response level, rather than at the index-score level, facilitates a more generalisable assessment of EQ-5D responses that is not confounded by the valuation set used in each country.

1:45 PM
L-2
(MET)
Anahita Khojandi1, Lisa Maillart, PhD1, Oleg Prokopyev, PhD1, Mark S. Roberts, MD, MPP2 and Samir Saba, MD1, (1)University of Pittsburgh, Pittsburgh, PA, (2)University of Pittsburgh Graduate School of Public Health, Pittsburgh, PA

Purpose: Cardiac implantable electronic device (CIED) leads fail stochastically, requiring the immediate implantation of a new lead (or leads). Because the total number of concurrently implanted leads (both functioning and failed) is subject to a maximum (i.e., five leads according to current guidelines), whenever a lead fails, it may be beneficial to extract this lead and/or any previously abandoned leads. Extraction, however, carries small but real life-threatening risks that increase with lead dwell time. Therefore, a tradeoff exists between maintaining space for new leads and avoiding risky extractions. Furthermore, surgical lead procedures involve a risk of infection. If an infection occurs, all implanted leads must be extracted. Hence, choosing to leave leads in place at the time of failure may result in risky, mandatory extractions. The purpose of this study is to determine a patient-specific extraction policy to maximize the expected lifetime of a single chamber pacemaker patient using a Markov decision process (MDP) model.

Method: We develop an MDP model to dynamically make extraction decisions at the time of lead failures as a function of patient and all lead ages. We also simulate this process to obtain prediction intervals on measures of interest, including the expected patient lifetime and the likelihood of CIED-related death (as opposed to death from natural causes). Finally, we conduct comparisons to three heuristics commonly used in practice.
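The backbone of such a model is finite-horizon value iteration over patient/lead states. The sketch below uses an invented four-state abstraction with made-up transition probabilities and a survival reward; it shows the recursion, not the study's actual state space (which tracks patient age and all lead ages).

```python
import numpy as np

# Minimal finite-horizon value-iteration sketch for an extraction-decision MDP.
# States index an abstract "lead burden" level (state 3 = absorbing death);
# actions: 0 = abandon failed lead, 1 = extract. All numbers are invented.
n_states, horizon = 4, 50
# transition[a, s, s']: hypothetical transition probabilities under each action
transition = np.array([
    [[0.7, 0.2, 0.1, 0.0], [0.0, 0.6, 0.3, 0.1], [0.0, 0.0, 0.5, 0.5], [0, 0, 0, 1]],
    [[0.9, 0.1, 0.0, 0.0], [0.3, 0.6, 0.1, 0.0], [0.0, 0.4, 0.5, 0.1], [0, 0, 0, 1]],
])
reward = np.array([1.0, 1.0, 1.0, 0.0])  # one period of survival; none after death

value = np.zeros(n_states)
policy = np.zeros((horizon, n_states), dtype=int)
for t in reversed(range(horizon)):
    q = reward[None, :] + transition @ value  # q[a, s]: action-value at time t
    policy[t] = q.argmax(axis=0)              # best action per state and time
    value = q.max(axis=0)                     # expected remaining lifetime
```

The resulting `policy` array is the analogue of the structured rule reported in the Results: the optimal action as a function of (time, state).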

Results: Under the optimal policy, the extraction decision for each lead depends only on its age, the patient's age, its rank among the lead ages, and the total number of implanted leads; i.e., the decision does not depend on the exact ages of all implanted leads. Figure 1 illustrates the optimal lead maintenance policy for a specific single-chamber pacemaker patient. Compared with the heuristic policies, the optimal policy significantly decreases CIED-related deaths and increases expected lifetime; e.g., under the policy in Figure 1, a 60-year-old patient with failed leads of ages 20, 17, 12 and 2 sees an increase of up to 1.5 years in expected lifetime and a decrease of up to 7% in the likelihood of CIED-related death.

Conclusion: Cardiac leads are often referred to as “the weakest link” in implantable cardiac device treatment. Despite its importance, lead maintenance varies widely from practice to practice. We develop an approach that helps clinicians make patient-specific lead extraction/abandonment decisions optimally.

2:00 PM
L-3
(MET)
Nicky J. Welton, PhD, Bristol University, Bristol, United Kingdom
Purpose: To illustrate how Expected Value of Sample Information (EVSI) can be used to assist the prioritisation of future randomised controlled trials when there are multiple competing health technologies; in particular, the decisions of how many arms and which technologies to include, as well as the sample size for each arm.

Methods: EVSI measures the expected net health gain from conducting a new research study given a proposed study design. EVSI relies on a synthesis of the current evidence available on treatment efficacy and a cost-effectiveness model. Network Meta-Analysis (NMA) pools evidence on the relative efficacy of multiple competing health technologies that have been compared in randomised controlled trials forming a connected network of comparisons. The results obtained from NMA provide a coherent basis on which to make comparisons across the entire set of treatments, and NMA is now commonly used to inform decision models that identify the most cost-effective treatment.

We describe methods to evaluate EVSI when the efficacy outcome is binary and the net benefit function is linear on the absolute probability scale. We distinguish between absolute effects (used in the decision model) and relative effects (which the RCT provides information on). The methods allow for heterogeneity in the existing NMA evidence, which forms a hierarchical prior for the result from the new study. We view this hierarchical prior structure as data so that we can obtain a posterior, given new data, in closed form. We use a Taylor series approximation to obtain the updated expectation of the net benefit given new data, without needing an inner simulation step.  
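The general EVSI calculation these methods accelerate can be illustrated with plain nested Monte Carlo for two arms with binary outcomes and conjugate beta-binomial updating; the abstract's contribution is precisely to replace the inner expectation with a closed-form/Taylor-series step, and here the priors, net-benefit scale, and sample size are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Nested Monte Carlo EVSI sketch for two treatments with binary outcomes.
# Beta priors, net-benefit scale and trial size are hypothetical; the method
# in the abstract avoids this brute-force loop via a closed-form update.
prior_a, prior_b = (8, 12), (10, 10)  # Beta priors on response probabilities
n_new = 50                            # proposed sample size per arm
outer = 2000                          # outer simulation draws

def e_nb(ab):
    # net benefit assumed linear in the absolute response probability
    a, b = ab
    return a / (a + b)

current = max(e_nb(prior_a), e_nb(prior_b))  # value of deciding now

gain = 0.0
for _ in range(outer):
    # simulate a hypothetical trial from the prior predictive distribution
    p_a, p_b = rng.beta(*prior_a), rng.beta(*prior_b)
    x_a, x_b = rng.binomial(n_new, p_a), rng.binomial(n_new, p_b)
    # conjugate posterior update, then pick the best arm under the new evidence
    post_a = (prior_a[0] + x_a, prior_a[1] + n_new - x_a)
    post_b = (prior_b[0] + x_b, prior_b[1] + n_new - x_b)
    gain += max(e_nb(post_a), e_nb(post_b))

evsi = gain / outer - current  # expected gain from the proposed design
```

Repeating this over candidate designs (numbers of arms, sample sizes) gives the design-comparison quantity the talk optimises; the closed-form approach makes that outer search computationally feasible.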

Results: We illustrate the approach using as an example a network meta-analysis and cost-effectiveness analysis of 6 competing treatments for bipolar disorders, to identify the optimal number of arms and sample size per arm to include in a new study to inform this decision.

Conclusions: EVSI can be a valuable tool to assist in the prioritisation and optimal design of new research studies when there are multiple competing technologies.

2:15 PM
L-4
(MET)
Thomas Trikalinos, MD1, David Hoaglin, PhD2, Kevin Small, PhD3, Norma Terrin, PhD4 and Christopher H. Schmid, PhD1, (1)Brown University, Providence, RI, (2)Sudbury, MA, (3)NIH, Bethesda, MD, (4)Tufts Medical Center, Boston, MA
Purpose: Existing methods for meta-analysis of diagnostic test accuracy focus primarily on a single index test rather than comparing two or more tests that have been applied to the same patients in paired designs. We develop novel methods for the joint meta-analysis of studies of diagnostic accuracy that compare two or more tests on the same participants.

Method: We extend existing bivariate meta-analysis methods to simultaneously synthesize multiple index tests. The proposed methods respect the natural grouping of data by studies, account for the within-study correlation (induced because tests are applied to the same participants) between the tests’ true-positive rates (TPRs) and between their false-positive rates (FPRs), and allow for between-study correlations between TPRs and FPRs (such as those induced by threshold effects). We focus mainly on algorithms in the Bayesian setting, using discrete (binomial and multinomial) likelihoods. We use as an example a meta-analysis of 11 studies on the screening accuracy of detecting Down syndrome in liveborn infants using two tests: shortened humerus (arm bone), and shortened femur (thigh bone). Secondary analyses included an additional 19 studies on shortened femur only.
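The hierarchical structure described here can be sketched as a simulation: per study, the logit true-positive rates of the two tests are drawn from a bivariate normal (inducing the between-study correlation), and observed counts are then binomial, matching the discrete likelihood. All numbers below are invented, not estimates from the Down syndrome example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Sketch of the hierarchical model for jointly meta-analysing two tests
# applied to the same participants. Means, covariance and study sizes are
# hypothetical; the fitted model would infer them from the 11 studies.
n_studies = 11
mu = np.array([-0.6, -0.5])               # mean logit-TPRs of test 1 and test 2
cov = np.array([[0.3, 0.2], [0.2, 0.3]])  # between-study covariance of logit-TPRs
logit_tpr = rng.multivariate_normal(mu, cov, size=n_studies)
tpr = 1 / (1 + np.exp(-logit_tpr))        # inverse logit, per study and test
n_diseased = rng.integers(30, 120, size=n_studies)
# binomial (discrete) likelihood for the observed true-positive counts
true_positives = rng.binomial(n_diseased[:, None], tpr)
```

A full implementation would add the analogous block for false-positive rates and a within-study correlation term for the paired counts, which is where the credible-interval gains for comparative accuracy come from.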

Result: In the application, separate and joint meta-analyses yielded very similar estimates. For example, in models using the discrete likelihood, the summary TPR for a shortened humerus was 35.3% (95% credible interval [CrI]: 26.9 to 41.8%) with the novel method, and 37.9% (27.7 to 50.3%) when shortened humerus was analyzed on its own. The corresponding numbers for the summary FPR were 4.9% (2.8 to 7.5%) and 4.8% (3.0 to 7.4%).

However, when calculating comparative accuracy, joint meta-analyses resulted in shorter credible intervals compared with separate meta-analyses for each test. In analyses using the discrete likelihood, the difference in the summary TPRs was 0.0% (-8.9, 9.5%; TPR higher for shortened humerus) with the novel method versus 2.6% (-14.7, 19.8%) with separate meta-analyses. The standard deviation of the posterior distribution of the difference in TPR with joint meta-analyses is half of that with separate meta-analyses.

Conclusion: The joint meta-analysis of multiple tests is feasible. It may be preferable to separate analyses for estimating measures of the comparative accuracy of diagnostic tests, which are of primary interest when parameterizing models that compare diagnostic strategies. Simulation and empirical analyses are needed to better define the role of the proposed methodology.

2:30 PM
L-5
(MET)
Joseph F. Levy1, David J. Vanness, Ph.D.2, Yirong Wu, PhD1 and Elizabeth S. Burnside, MD, MPH, MS1, (1)University of Wisconsin-Madison, Madison, WI, (2)Department of Population Health Sciences, Madison, WI

Purpose: To optimize recommendations for biopsy after mammography using random forests and maximum expected utility.   

Methods: We used a dataset of 62,219 mammographic findings matched with cancer registry data to construct a random forest estimating the probability that each finding is malignant (positive).  Random forests consist of an ensemble of classification trees constructed using randomly resampled data and randomly selected subsets of predictor variables and tuned to improve out-of-sample prediction.  We used patient demographic risk factors, radiologist-observed standardized descriptors using the Breast Imaging-Reporting and Data System (BI-RADS) lexicon, radiologist subjective opinion (BI-RADS category 0-5, indicating increasing likelihood of malignancy) and the eventual outcomes (benign/malignant) of the finding to recursively partition the data into groups with different probabilities of malignancy.  We applied previously reported estimates of utilities associated with false positives, true positives and false negatives (relative to true negative) to calculate expected utility associated with different thresholds and used the threshold that maximizes expected utility to determine the “optimal” random forest.
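The expected-utility threshold search at the end of this pipeline can be sketched independently of the forest itself: given predicted malignancy probabilities and utilities for the four outcomes, scan candidate thresholds and keep the one maximizing mean utility. The utilities and simulated probabilities below are invented placeholders, not the published estimates or the fitted forest's output.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch of the expected-utility threshold search. Utilities (relative to a
# true negative) and predicted probabilities are hypothetical; in the study
# they come from the literature and the fitted random forest, respectively.
u_tn, u_fp, u_tp, u_fn = 0.0, -0.05, 0.8, -1.0
p_malignant = rng.beta(0.5, 10, size=5000)     # stand-in predicted probabilities
is_malignant = rng.random(5000) < p_malignant  # simulated true outcomes

def expected_utility(threshold):
    biopsy = p_malignant >= threshold
    outcome_utility = np.where(biopsy,
                               np.where(is_malignant, u_tp, u_fp),
                               np.where(is_malignant, u_fn, u_tn))
    return outcome_utility.mean()

thresholds = np.linspace(0.001, 0.5, 500)
best = thresholds[np.argmax([expected_utility(t) for t in thresholds])]
```

Because missing a cancer (`u_fn`) is penalized far more than an unnecessary biopsy (`u_fp`), the utility-maximizing threshold lands well below 50%, which is the same mechanism that pushes the study's optimal threshold down to 0.4%.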

Results: ROC curves were constructed from the BI-RADS categories assigned by the radiologists and from the predicted malignancy probabilities of the random forest (Figure 1). The radiologists' operating point is conventionally taken to be BI-RADS category 3, corresponding to a threshold (approximately 2% likelihood of malignancy) above which biopsy would be recommended. The random forest improved AUC overall (0.948 vs. 0.935); at the 2% classification threshold, the forest improved sensitivity (85.4% vs. 85.3%) and specificity (97.55% vs. 88.1%). When maximizing expected utility, the optimal threshold of predicted malignancy for the forest was 0.4%, altering sensitivity and specificity to 88.6% and 96.3%, respectively.

Conclusion: Random forests have the potential to improve the accuracy of biopsy recommendations over standard practice. When accounting for the relative consequences of true and false positives and negatives, the threshold for recommending biopsy using a random forest differs from the regular threshold used by radiologists.

2:45 PM
L-6
(MET)
Hawre Jalal, MD, MSc, Michel Boudreaux, MSc and Karen M. Kuntz, ScD, University of Minnesota, Minneapolis, MN
Purpose: Modelers lack a simple tool to examine decision sensitivity (i.e., the change in the probability of a strategy being optimal due to parameter uncertainty). We propose multinomial logistic regression (MNR) metamodeling to reveal decision sensitivity.

Methods: MNR is useful in analyses where the dependent variable is categorical and not ordered. In this study, we apply MNR in a novel way to analyze the probabilistic sensitivity analysis (PSA) of a decision model in order to reveal decision sensitivity. We demonstrate our approach with a previously published decision model for treating a suspected case of herpes simplex encephalopathy. The model compares three strategies: treat everyone, biopsy, and do not treat or biopsy. We performed 10,000 PSA iterations. For the MNR, we treated the model's input parameter values as independent variables and the optimal strategy in each iteration as the dependent variable; in this capacity the MNR is a second (meta) model. Because the regression coefficients are difficult to interpret, we report the marginal effects (ME) as a direct measure of decision sensitivity. The MEs measure the change in the probability of each strategy being optimal due to a one-unit change in each parameter. Furthermore, we developed a new score, the sum of absolute marginal effects (SAME), to combine the ME of a parameter across all the strategies, and compared our results to the expected value of partial perfect information (EVPPI).

Results: The probability of severe sequelae following biopsy was associated with the highest decision sensitivity. The ME of this parameter on biopsy was -0.28, indicating that the probability of biopsy being optimal decreases by 0.28 if the value of this parameter is increased by one standard deviation from its mean. Similarly, all the model parameters were ranked in importance by their ME and SAME scores. In addition, the SAME scores were highly correlated with the EVPPI (correlation coefficient = 0.97) (see Figure).

Conclusion: Regression analysis can be used to evaluate the impact of decision model parameters and is highly correlated with EVPPI results.
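The ME and SAME computations can be sketched directly: given a fitted multinomial logit over standardized PSA draws, perturb one parameter at a time, average the change in each strategy's probability of being optimal, and sum the absolute effects per parameter. The softmax coefficients below are invented stand-ins for a fitted metamodel, and the marginal effects are taken by finite differences rather than analytically.

```python
import numpy as np

rng = np.random.default_rng(4)

# Sketch of MNR-metamodel marginal effects (ME) and the SAME score.
# Coefficients are hypothetical stand-ins for a fitted multinomial logit
# over 3 standardized PSA input parameters and 3 strategies.
n_params, n_strategies = 3, 3
coef = np.array([[0.0, 0.0, 0.0],     # reference strategy
                 [1.5, -0.8, 0.2],
                 [-0.5, 1.1, -0.9]])  # coef[k, j]: effect of parameter j on strategy k
x = rng.standard_normal((5000, n_params))  # standardized PSA parameter draws

def probs(x):
    # softmax over strategies (multinomial logit predicted probabilities)
    z = x @ coef.T
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# finite-difference ME: average change in P(strategy optimal) per unit change
eps = 1e-4
me = np.empty((n_params, n_strategies))
for j in range(n_params):
    x_hi = x.copy()
    x_hi[:, j] += eps
    me[j] = (probs(x_hi) - probs(x)).mean(axis=0) / eps

same = np.abs(me).sum(axis=1)  # one importance score per input parameter
```

Because the strategy probabilities sum to one, the MEs of any parameter sum to zero across strategies, which is why a combined score like SAME needs the absolute values.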