* Candidate for the Lee B. Lusted Student Prize Competition

**Purpose:** Economic evaluations can mask important sources of heterogeneity, which may lead to suboptimal reimbursement decisions for subgroups of patients. For dichotomous characteristics (e.g., sex), heterogeneity can easily be incorporated in economic evaluations using subgroups. However, when the characteristic is continuous (e.g., age), the optimal cut-off point for defining subgroups is uncertain. This study provides a stepwise method to determine cut-off points based on incremental net monetary benefit (iNMB), as opposed to the traditional cut-off based on clinical effectiveness or costs.

**Methods:** First, an economic evaluation is performed on all available evidence, including parameter uncertainty and possible sources of heterogeneity. Incremental cost-effectiveness ratios (ICERs), cost-effectiveness acceptability curves, and the expected value of perfect information are calculated for the total population. When decision uncertainty exists, the second step is to examine the relationship between the source of heterogeneity and the iNMB. The third step is to repeat step 1 for a range of subgroup cut-off points. As an illustration, we use a hypothetical economic evaluation of a new treatment for lung cancer. Heterogeneity is incorporated in terms of the continuous characteristic tumor size (range 1-8 centimeters). The relationship between tumor size and the probability of recurrence is incorporated using regression analysis.

**Results:** First, an ICER of €87,149 per quality-adjusted life year (QALY) was estimated for the total population. Assuming a maximum threshold of €40,000 per QALY, this resulted in a negative iNMB and a cost-effectiveness probability of 27%. Second, regression analysis showed that the iNMB was higher for smaller tumor sizes. The maximum tumor size for which treatment was cost-effective depended on the threshold value for a QALY (Figure 1). Third, narrowing the indication to a subgroup with a maximum tumor size of 4 centimeters decreased the ICER to €23,270. For a threshold value of €40,000 per QALY, the cost-effectiveness probability for this subgroup was 79%.

**Conclusions:** The present study illustrates a possible solution for incorporating continuous heterogeneity characteristics in economic evaluations and decision-making. Instead of basing subgroups on clinical evidence, the relationship between the characteristic and the iNMB is examined for a range of threshold values, giving decision-makers the opportunity to define the subgroups for which the treatment's cost-effectiveness is acceptable. Still, more research is needed on methodology for incorporating and presenting heterogeneity in continuous parameters in economic evaluation.
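The core quantity in the stepwise method above is the incremental net monetary benefit, iNMB = λ·ΔQALY − ΔCost. The following sketch illustrates the third step with entirely invented subgroup estimates (none of these numbers come from the abstract): for each candidate cut-off on tumor size, the subgroup is retained if its iNMB is non-negative at the chosen threshold.

```python
# Hypothetical illustration of step 3: evaluate iNMB for candidate
# subgroup cut-offs on tumor size. All numbers are invented.

def inmb(delta_qaly, delta_cost, threshold):
    """Incremental net monetary benefit: threshold * dQALY - dCost."""
    return threshold * delta_qaly - delta_cost

# Invented subgroup estimates keyed by maximum tumor size (cm):
# (incremental QALYs, incremental cost in euros)
subgroups = {2: (0.60, 14000), 4: (0.45, 14000),
             6: (0.25, 14000), 8: (0.15, 14000)}

threshold = 40000  # euros per QALY
cost_effective = [size for size, (dq, dc) in sorted(subgroups.items())
                  if inmb(dq, dc, threshold) >= 0]
print(cost_effective)  # cut-offs with non-negative iNMB
```

In the actual analysis, the subgroup estimates would come from the probabilistic model, and the search would be repeated over a range of threshold values λ, as in Figure 1.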

**Purpose:** To evaluate the relationship between preferences for quantitative information in risk communication and objective measures of how well the quantitative information is understood.

**Method:** A cross-sectional survey was conducted. Participants were randomly recruited from an adult primary care population. Clinical vignettes were developed to simulate a discussion between a patient and physician regarding the risks and benefits of breast and colorectal cancer screening. The clinical vignettes presented risk information using both probability and frequency formats. After reading the vignettes, participants responded to multiple-choice questions testing their understanding of the information presented. In a second exercise, participants took the Medical Data Interpretation Test (MDIT), a validated instrument that assesses the ability to critically interpret the results of medical studies as they may be presented to the lay public. Upon completion of the clinical vignette exercise and the MDIT, respectively, participants were asked their perceptions of the usefulness of presenting such information with numbers. Univariate and multivariate analyses were performed to evaluate the association of preferences for numeric information with an objective measure of how well the information was understood.

**Results:** There were 359 participants in the study; 70% were white and 27% were black. Twenty-eight percent (28%) had no more than a high school education. Eighty-six percent (86%) and 78%, respectively, strongly agreed or agreed that the quantitative information related to cancer screening and to medical studies was useful. Clinical vignette knowledge scores ranged from 0 to 9 (mean 6.2, SD 2.0). MDIT scores ranged from 1 to 18 (mean 9.5, SD 3.2). In univariate analysis, those who strongly agreed or agreed that quantitative information was helpful were no more likely to correctly interpret the clinical vignettes than those who were neutral or disagreed. However, MDIT scores were higher among those who strongly agreed or agreed that numbers were helpful in communicating information about medical studies (p=0.018). This association persisted after controlling for age, gender, race, education, and income (p=0.049).

**Conclusion:** In this study, the association of perceived usefulness with objective knowledge of the information presented was inconsistent and varied with context. These findings highlight the need to assess patients' level of understanding of quantitative information irrespective of their stated desire for quantitative information.

**Purpose:** As Bayesian statistical approaches have gained broader acceptance within the clinical-trial community, the Centers for Medicare and Medicaid Services (CMS) sought to assess the impact of such techniques on policy-level decision making. We performed a case study of Bayesian approaches in the clinical domain of implantable cardioverter defibrillator (ICD) therapy for the prevention of sudden cardiac death.

**Method:** We considered patient-level data from eight ICD trials representing 6,286 patients and two decades of evidence. We considered two treatment groups (ICD versus control) and four baseline prognostic variables (age, ejection fraction, NYHA class, and ischemia) to capture some of the differences in trial designs. We explored the use of frequentist and Bayesian techniques for combining data from trials (1) without adjustment for (potential) trial effects, (2) adjusting for trial effects using fixed or random effects, and (3) assuming trial-specific baseline hazard functions. We performed sensitivity analyses on the priors used in our Bayesian analyses.

**Result:** Under all model formulations considered, there is evidence of a treatment effect on overall survival. Estimates from Bayesian models are generally similar to those obtained under frequentist models. Under the full Bayesian hierarchical model, which accounts for trial variation in the baseline hazard, main effects, and interaction effects, we found a differential ICD effect across trials. This variation could be due to differences in the devices, in the underlying medical care, or in patient characteristics not currently included in our analysis. When considering only data from a single trial, our results are more sensitive to prior choices; increasing the sample size (by combining data from trials) reduces this sensitivity. We found no evidence of interactions between treatment and any of the prognostic variables.

**Conclusion:** Bayesian models flexibly allow borrowing of information while also allowing for different treatment and subgroup effects across trials. They provide more precise estimates of treatment effects and may reconcile what could be an unexpected result from a single trial. When considering Bayesian estimation, the role of priors should be examined through sensitivity analysis. Incorporating Bayesian techniques into the CMS decision-making process may enable policymakers to harness the power of available evidence, explore subgroup effects within and across trials in a methodologically rigorous manner, and assess the uncertainty in clinical trial findings.
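The prior-sensitivity finding above (single-trial results sensitive to the prior, pooled results much less so) can be illustrated with a deliberately simplified conjugate model. This Beta-Binomial sketch is not the hierarchical survival model used in the study, and all counts are made up.

```python
# Illustrative conjugate sketch (not the study's model): under a
# Beta-Binomial model, the pull of the prior on the posterior mean
# shrinks as trial data are pooled. All counts are hypothetical.

def posterior_mean(events, n, a, b):
    """Posterior mean of an event rate under a Beta(a, b) prior."""
    return (a + events) / (a + b + n)

# Difference between a skeptical prior (mean 0.10) and a flat prior
single_diff = abs(posterior_mean(30, 100, 1, 9) -
                  posterior_mean(30, 100, 1, 1))
pooled_diff = abs(posterior_mean(300, 1000, 1, 9) -
                  posterior_mean(300, 1000, 1, 1))

print(round(single_diff, 4), round(pooled_diff, 4))
```

With ten times the data, the gap between the two posterior means is roughly an order of magnitude smaller, which is the qualitative pattern the sensitivity analyses describe.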

**Purpose:** Simple, easy-to-use, yet well-performing diagnostic models are preferable for successful implementation into daily clinical practice. We therefore present several approaches for cost-sensitive variable selection within the context of ovarian tumor diagnosis using logistic regression.

**Method:** We performed variable selection on data from 1,938 females with an ovarian tumor (542 malignancies) using 31 candidate predictors. Variable cost was scored from 1 to 5, based on time-related and financial constraints, subjectivity, and patient impact. Stepwise selection based on the Akaike information criterion (AIC), Schwarz's Bayesian information criterion (BIC), or the area under the ROC curve (AUC) was considered. To account for variable cost, the penalty term *k*·*p* for AIC (*k*=2) and BIC (*k*=log(*n*)) was replaced by *k*·(Σ*c* + 1), with *p* the number of coefficients and Σ*c* the total cost of the variables in the model. The original cost values, i.e. 1 to 5, were also linearly rescaled to 1 to *C* to vary the impact of variable cost. Cost was accounted for in the AUC criterion by subtracting *m*·Σ*c* from the training AUC (rounded to three decimals), with *m* representing the impact of variable cost. If *C*=1 (AIC/BIC) or *m*=0 (AUC), no penalization for variable cost is induced. One thousand random train-validation splits of the data set were created (70% vs 30%). After variable selection, the training AUC and validation AUC were recorded, as well as the number of selected variables, the total cost Σ*c*, and the average cost per selected variable. We combined results over the 1,000 train-validation splits using box plots and averages.

**Result:** For all three criteria, similar results were obtained. Compared to no penalization, increasing the impact of variable cost by varying *C* or *m* strongly reduced the total cost with only limited reductions in training or validation AUC (e.g. -60% versus -5%). The reduction in total cost was mainly caused by selecting fewer variables. The average cost per selected variable also decreased, as the selection of high-cost predictors was increasingly discouraged.

**Conclusion:** The straightforward incorporation of variable cost into variable selection for logistic regression can result in clearly cheaper and simpler models with limited loss of discriminatory performance. Further work will focus on other applications, sample size, cross-validated AUC, polytomous diagnosis, and methods other than stepwise selection.
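The cost-penalized criterion described in the Methods, replacing the usual AIC/BIC penalty *k*·*p* with *k*·(Σ*c* + 1), can be sketched as follows; the log-likelihoods and per-variable cost scores are invented for illustration.

```python
# Sketch of the cost-penalized AIC idea; log-likelihoods and cost
# scores (on the 1-5 scale) are invented for illustration.

def penalized_aic(log_lik, costs, k=2):
    """AIC with the usual k*p penalty replaced by k*(sum(costs) + 1)."""
    return -2 * log_lik + k * (sum(costs) + 1)

# Two hypothetical candidate models with three predictors each
cheap = penalized_aic(log_lik=-520.0, costs=[1, 1, 2])      # low-cost predictors
expensive = penalized_aic(log_lik=-515.0, costs=[5, 5, 4])  # high-cost predictors

# The cheaper model is preferred despite a slightly worse fit
print(cheap, expensive)
```

Setting all costs to 1 recovers ordinary AIC up to the constant "+ 1" term, which is why *C*=1 induces no cost penalization in the abstract's formulation; the BIC variant would use *k*=log(*n*).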

**Purpose:** Cost-effectiveness analyses (CEA) often use non-randomised studies (NRS) to compare treatment groups. Baseline covariates may therefore be highly imbalanced. Conventional methods for addressing selection bias assume that the treatment selection or response model is correctly specified, but the correct specification is usually unknown. Instead, we develop machine learning techniques that combine non-parametric methods for both matching and covariate adjustment.

**Method:** Machine learning using a non-parametric matching method, Genetic Matching, can improve covariate balance in CEA (Sekhon and Grieve 2008). This paper extends machine learning approaches to CEA with covariate imbalances remaining after Genetic Matching. We use ‘super learning’ (Van der Laan et al., 2007) for post-matching bias adjustment. The ‘super learner’ is a computer algorithm that applies a set of candidate adjustment methods, or learners (e.g. least squares, spline regression), to different portions of the data (test samples). The ‘super learner’ constructs the ‘optimal learner’ by weighting candidate learners according to their relative performance in the remaining data (validation samples). We compare machine learning with propensity score matching. We use a CEA of Pulmonary Artery Catheterisation (PAC) from an RCT (n=1,014) and the corresponding NRS (n=38,000). Identical measures were recorded across the settings for 40 baseline covariates. We match RCT treated cases to NRS controls using propensity score versus Genetic Matching. The super learning approach finds and applies the optimal covariate adjustment after Genetic Matching. We compare cost-effectiveness estimates from these methods to the RCT.

**Result:** The RCT reported mean incremental net benefits (INB) for PAC that were not significantly different from zero (λ=£30,000, INB -£3,000 [95% CI -£22,000 to £12,000]). The NRS results differed by method. Following propensity score matching, covariate balance was poor; PAC was associated with increased mortality and negative INBs (corresponding INB, -£60,000). Covariate balance was much improved with Genetic Matching (baseline probability of death, *p*=0.93), but some imbalances remained. The super learner minimised residual biases following Genetic Matching, and the resulting INBs were similar to those of the RCT. Matching RCT and NRS controls gave similar net benefits, suggesting the main identifying assumption holds in this context.

**Conclusion:** Machine learning provides CEA with flexible methods for matching and post-matching adjustment that avoid parametric assumptions. These methods are doubly robust: if either the matching model or the response surface model is correctly specified, the estimates are consistent.
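The weighting step of the super learner can be illustrated schematically: candidate learners' predictions are combined with weights chosen by their performance on held-out validation data. The toy numbers below are hypothetical and unrelated to the PAC analysis; real super learning cross-validates over many candidate algorithms rather than grid-searching a single convex weight.

```python
# Schematic toy version of the super learner's weighting step; data
# and predictions are invented and unrelated to the PAC analysis.

def mse(pred, truth):
    """Mean squared error of predictions on the validation sample."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

truth = [1.0, 2.0, 3.0, 4.0]        # validation outcomes
learner_a = [1.1, 2.1, 2.9, 4.2]    # candidate learner A predictions
learner_b = [0.5, 2.5, 3.5, 3.0]    # candidate learner B predictions

# Grid-search a convex weight w for A (1 - w for B) on validation data
best_w = min((w / 100 for w in range(101)),
             key=lambda w: mse([w * a + (1 - w) * b
                                for a, b in zip(learner_a, learner_b)], truth))
mixed = [best_w * a + (1 - best_w) * b
         for a, b in zip(learner_a, learner_b)]
print(best_w, round(mse(mixed, truth), 4))
```

Here the combined predictor performs at least as well on the validation data as either candidate alone, which is the property that lets the super learner dominate its individual learners.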