Monday, October 19, 2015: 4:30 PM - 6:00 PM
Grand Ballroom B (Hyatt Regency St. Louis at the Arch)

4:30 PM

Reza Yaesoubi, PhD, MSc, Forrest Crawford, PhD and A. David Paltiel, PhD, Yale School of Public Health, New Haven, CT

Purpose: Although it plays a central role in cost-effectiveness analysis (CEA), society's willingness to invest in an additional unit of health is rarely known to policy makers. Our goal is to develop a statistical method that helps decision-makers determine whether a new healthcare alternative should be considered cost-effective in the absence of an exact value for the willingness-to-pay for health (WTP).

Method: Our method uses a probability density function P to represent the policy maker's uncertain belief about the true value of WTP. It calculates a probability p, the p-value for the hypothesis that the net monetary benefit (NMB) of a new alternative is less than or equal to that of an existing alternative when the true WTP value is randomly drawn from P. If p is less than a desired significance level, we reject this hypothesis and consider the new alternative cost-effective under the WTP belief P. Our method also calculates the expected NMB gain under P if the new alternative is chosen. Together, these quantities allow statistical comparison of the cost-effectiveness of multiple interventions. To demonstrate the method, we consider two hypothetical alternatives that have the same incremental cost-effectiveness ratio of $20,000 per unit of health but result in substantially different health and financial outcomes (Figure A-B). These alternatives also yield identical cost-effectiveness acceptability curves (CEACs) (Figure C-D), a popular tool in CEA when the true value of WTP is unknown.
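The test described above can be approximated by Monte Carlo simulation. The sketch below is one plausible reading, not the authors' implementation: it assumes that paired draws of incremental cost and incremental effect are available (for example, from a probabilistic sensitivity analysis), parameterizes the Gamma belief P by its mean and standard deviation, and estimates p as the share of joint draws in which the incremental NMB is at most zero. The function name `wtp_uncertain_test` and the example inputs are hypothetical.

```python
import numpy as np


def wtp_uncertain_test(delta_cost, delta_effect, wtp_mean, wtp_sd,
                       n_draws=100_000, seed=0):
    """Monte Carlo estimate of p = Pr(incremental NMB <= 0) and of the expected
    incremental NMB when WTP ~ Gamma(mean=wtp_mean, sd=wtp_sd).

    delta_cost, delta_effect: paired draws of incremental cost and incremental
    health effect (new minus existing), e.g. from a probabilistic sensitivity analysis.
    """
    rng = np.random.default_rng(seed)
    delta_cost = np.asarray(delta_cost, dtype=float)
    delta_effect = np.asarray(delta_effect, dtype=float)
    # Gamma with a given mean and SD: shape = (mean/sd)^2, scale = sd^2/mean
    shape = (wtp_mean / wtp_sd) ** 2
    scale = wtp_sd ** 2 / wtp_mean
    wtp = rng.gamma(shape, scale, size=n_draws)
    # pair each WTP draw with a randomly chosen (cost, effect) draw
    k = rng.integers(0, delta_cost.size, size=n_draws)
    d_nmb = wtp * delta_effect[k] - delta_cost[k]
    p_value = float(np.mean(d_nmb <= 0))   # share of draws in which the new option is not better
    expected_gain = float(d_nmb.mean())    # expected NMB gain from choosing the new option
    return p_value, expected_gain


# Hypothetical inputs: incremental effect ~0.5 units, incremental cost ~$10,000 (ICER ~ $20,000)
rng = np.random.default_rng(1)
d_effect = rng.normal(0.5, 0.1, 5_000)
d_cost = rng.normal(10_000, 1_000, 5_000)
p, gain = wtp_uncertain_test(d_cost, d_effect, wtp_mean=50_000, wtp_sd=5_000)
```

Comparing two alternatives against the same comparator then amounts to running the function on each and comparing `expected_gain`; a formal test of the difference in gains would need the paired draws of both alternatives.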

Result: When the policy maker's belief about the WTP value follows a Gamma distribution with mean $50,000 and standard deviation $5,000 (Figure E), both Alternatives 1 and 2 are considered cost-effective at significance level 0.05 (p-value < 0.01, Table). While the CEACs suggest that Alternatives 1 and 2 perform equally well (Figure C-D), our method determines that Alternative 2 has a significantly higher expected NMB gain and hence should be preferred to Alternative 1. In this example, when the policy maker's belief about the WTP value is uninformative (Figure F), neither alternative is considered cost-effective (p-value > 0.2).

Conclusion: We developed a method to statistically evaluate and compare the cost-effectiveness of healthcare alternatives under uncertainty about the WTP value. We showed how our approach can overcome the limitations of CEACs commonly used in CEA. 

4:45 PM

Ying Lin, MS, Shuai Huang, PhD and Shan Liu, PhD, Industrial and Systems Engineering, University of Washington, Seattle, WA

Purpose: Depression is a common, complex, and dynamic mental disorder. Mitigating depression has become a national health priority, as it affects 1 in 10 American adults and is the most common mental illness seen in primary care. Although the growing use of electronic health records (EHRs) provides an unprecedented information infrastructure, the complex dynamics of individuals' depression trajectories and the widely reported heterogeneity of the depressed population are two major challenges in monitoring patients with depression. The objective of this study is to analyze patterns in the depression trajectories of a treatment population and to proactively examine new trajectories in order to monitor treatment outcomes.

Method: Our data, from the Mental Health Research Network, contain 4 years of longitudinal Patient Health Questionnaire-9 (PHQ-9) scores, which assess depression severity. The PHQ-9 scores are linked to the time between observations, provider type, and the patient's age, sex, treatment status, and Charlson comorbidity score. We analyzed more than 6,000 patients in ongoing treatment who had at least four PHQ-9 observations. We first used smoothing splines to model each depression trajectory. We then used k-means clustering, recursive partitioning, and a collaborative degradation model (CDM) to identify subgroup patterns. CDM accounts for the cluster structure underlying the population and for each individual's resemblance to these clusters. Finally, for more than 3,000 patients with at least six PHQ-9 observations, we compared the predictive performance of an individual growth model (IGM), a mixed-effects model (MEM), CDM, and CDM with network regularization (NCDM) on the last two observations of each subject.
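The first two steps (a smoothing spline per trajectory, then k-means on the smoothed curves) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: each spline is evaluated on a common time grid so that curves become equal-length vectors, SciPy's default smoothing factor is used, and values outside a patient's follow-up are held at the boundary value. Recursive partitioning and CDM are not reproduced. The function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from sklearn.cluster import KMeans


def smooth_trajectory(times, scores, grid, smoothing=None):
    """Cubic smoothing spline through one patient's PHQ-9 scores, evaluated on a
    common time grid. Cubic splines need at least four points, matching the
    inclusion rule of >= 4 observations. Outside the patient's follow-up the
    curve is held at its boundary value (ext="const")."""
    spline = UnivariateSpline(times, scores, k=3, s=smoothing, ext="const")
    return spline(grid)


def cluster_trajectories(patients, grid, n_clusters=5, seed=0):
    """patients: iterable of (times, scores) with strictly increasing times."""
    curves = np.vstack([smooth_trajectory(t, y, grid) for t, y in patients])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(curves)
    return km.cluster_centers_, km.labels_
```

With `n_clusters=5`, the returned centroids are the candidate trajectory patterns (stable high, stable low, and so on) that would then be inspected and labeled by hand.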

Result: We found five trajectory patterns in the ongoing-treatment population: stable high, stable low, stable moderate, increasing, and decreasing. The increasing and decreasing groups converge and stabilize at a PHQ-9 score of around 10 to 15. For prediction, the root mean square errors in the test set for IGM, MEM, CDM, and NCDM were 21.98, 6.12, 5.24, and 3.46, respectively.
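The hold-out evaluation behind these root mean square errors (predicting each subject's last two observations) can be sketched as below. Only a simple stand-in for IGM is implemented, and it is an assumption: a per-patient straight line fitted to the earlier observations and extrapolated. MEM, CDM, and NCDM would plug into `holdout_rmse` through the same `forecast` argument. Requiring at least six observations leaves at least four for fitting.

```python
import numpy as np


def igm_forecast(times, scores, n_test=2):
    """Hypothetical individual growth model: a straight line fitted to one
    patient's earlier observations, extrapolated to the held-out ones."""
    slope, intercept = np.polyfit(times[:-n_test], scores[:-n_test], deg=1)
    return intercept + slope * times[-n_test:]


def holdout_rmse(patients, forecast, n_test=2):
    """RMSE over the last n_test observations of every patient."""
    errors = []
    for times, scores in patients:
        times = np.asarray(times, dtype=float)
        scores = np.asarray(scores, dtype=float)
        errors.append(forecast(times, scores, n_test) - scores[-n_test:])
    return float(np.sqrt(np.mean(np.concatenate(errors) ** 2)))
```

Because a per-patient line extrapolates from very few points, a single noisy trajectory can produce large errors, which is one way population-level models can outperform it.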

Conclusion: Using electronic health record data, we established a trajectory-based framework for depression diagnosis and prognosis that adapts to population heterogeneity. Clustering is an effective tool for characterizing trajectory patterns in the depressed population. For prediction, NCDM achieved the highest performance (lowest root mean square error).

5:00 PM

Diana Negoescu, PhD, University of Minnesota, Minneapolis, MN, Heiner Bucher, MPH, Swiss HIV Cohort Study, Basel Institute for Clinical Epidemiology & Biostatistics, Basel, Switzerland and Eran Bendavid, MD, MS, Stanford Health Policy, Centers for Health Policy and Primary Care and Outcomes Research, Department of Medicine, Stanford University, Stanford, CA

Purpose: Early detection of virologic failure in HIV patients on antiretroviral therapy may improve patients' health outcomes and reduce transmission. HIV RNA (viral load) monitoring is a common but costly method for detecting failure. We assess the health benefits and cost-effectiveness of viral load monitoring strategies in resource-limited settings.

Methods: We created a microsimulation model parameterized with longitudinal cohort data. We used Swiss HIV Cohort Study (SHCS) data to estimate (1) predictors of virologic failure and (2) CD4 evolution during virologic failure and on antiretroviral therapy (ART). We modeled the probability of virologic failure as a function of self-reported patient adherence, time on regimen, age, and gender. Adherence was also modeled as a time-varying process that depends on the patient's previous adherence status, age, gender, and education level. Individual CD4 counts were modeled using quantile regression, with CD4 progression depending on failure status, previous CD4 count, time since ART initiation, CD4 nadir, age, and gender. Using the data-informed microsimulation, we simulated 30,000 HIV patients in Uganda for 10 years following ART initiation and evaluated the total costs and QALYs achieved over this 10-year period under four viral load monitoring frequencies: every 3, 4, 6, or 12 months.
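The core mechanism that separates the four monitoring frequencies is the delay between failure onset and the next scheduled viral load test. The toy sketch below isolates only that mechanism and is not the study's model: it assumes a single failure per patient with a hypothetical constant monthly hazard, detection at the first test after onset, and no deaths, regimen switches, or adherence and CD4 dynamics. The function name and all parameter values are illustrative.

```python
import numpy as np


def mean_months_undetected(interval, horizon=120.0, failure_rate=0.01,
                           n_patients=30_000, seed=0):
    """Toy sketch: average months per patient spent in undetected virologic
    failure when viral load is tested every `interval` months.

    Assumptions (hypothetical, not the study's fitted model): a single failure
    per patient with a constant monthly hazard `failure_rate`, detection at the
    first scheduled test after onset, and no deaths or regimen switches.
    """
    rng = np.random.default_rng(seed)
    onset = rng.exponential(1.0 / failure_rate, size=n_patients)  # months after ART start
    next_test = np.ceil(onset / interval) * interval              # first test after onset
    undetected = np.minimum(next_test, horizon) - onset           # capped at the horizon
    return float(np.mean(np.where(onset < horizon, undetected, 0.0)))


for months in (3, 4, 6, 12):
    print(months, round(mean_months_undetected(months), 2))
```

Under these assumptions each failure goes undetected for roughly half a monitoring interval on average, which is why the delay grows with the interval; the full microsimulation adds the costs, QALYs, and patient-level dynamics described above.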

Results: The model was validated by matching the five-year survival and opportunistic-infection-free survival rates of the DART randomized clinical trial to within their 95% confidence intervals. The average total number of months spent in virologic failure per patient over the 10 simulated years ranged from 2.07 for the 3-month policy to 4.25 for the 12-month policy. The percentage of patients who had switched to a second-line regimen by the end of the 10-year period ranged from 31.6% for the 12-month policy to 36.8% for the 3-month policy. Compared with monitoring viral load every 12 months, more frequent monitoring increased QALYs only marginally: 3-monthly monitoring yielded an average gain of 0.0595 QALYs per patient at an incremental cost of $821.
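The incremental cost-effectiveness ratio implied by the 3-month versus 12-month comparison follows directly from the two reported figures. This is only the pairwise ratio between those two strategies, not a full incremental analysis across all four frequencies:

```latex
\text{ICER}_{3\text{ vs }12}
  = \frac{\Delta C}{\Delta E}
  = \frac{\$821}{0.0595\ \text{QALYs}}
  \approx \$13{,}800 \text{ per QALY gained}
```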

Conclusions: In resource-limited settings, viral load monitoring more frequent than once a year costs more per QALY gained than many other HIV interventions. Use of direct person-level data can inform model construction and improve parameter estimation for diverse populations.

5:15 PM

Sze-chuan Suen, MS1, Jeremy D. Goldhaber-Fiebert, PhD2 and Margaret L. Brandeau, PhD1, (1)Department of Management Science and Engineering, Stanford University, Stanford, CA, (2)Stanford Health Policy, Centers for Health Policy and Primary Care and Outcomes Research, Department of Medicine, Stanford University, Stanford, CA

Purpose: Economic evaluations of infectious disease control interventions frequently use dynamic compartmental epidemic models. Such models capture heterogeneity in the risk of infection by stratifying the population into discrete risk groups, thereby approximating what is typically continuous variation in risk. An important open question is whether and how different risk stratification choices influence model predictions.

Method: We developed equivalent Susceptible-Infectious-Susceptible (SIS) dynamic transmission models: an unstratified model and models stratified into high-risk and low-risk groups. All model parameters other than contact rate(s) were identical. The stratified models differed from one another in the proportion of the population that was high risk (a) and in the contact rates of the high- and low-risk groups, although the overall contact rate was equal across all models. The models were equivalent in the sense that, absent intervention, they all produced the same overall prevalence of infectious individuals at all times. We introduced a hypothetical intervention that reduced the contact rate and applied it to a proportion of the population, irrespective of risk group in the stratified models. We addressed two questions: 1) Does the choice of where to discretize risk alter the model-predicted effectiveness (cases averted) of an intervention relative to an unstratified model? 2) If so, how are deviations from the unstratified model's predicted effectiveness related to the choice of discretization? To answer these questions, we chose an example set of model parameters and examined model predictions after discretizing various population distributions of contact rates.
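A minimal sketch of this comparison, assuming an SIS model with proportionate mixing between groups solved by forward Euler, with hypothetical parameters (per-contact transmission probability `beta`, recovery rate `gamma`, coverage, and contact reduction). The high-risk share `a` and the constraint that every model has the same population-average contact rate follow the text, with "overall contact rate" read as the population average. Unlike the study, the sketch does not calibrate the models so that their baseline prevalence curves coincide, so its numbers only illustrate the mechanics of counting cases averted.

```python
import numpy as np


def simulate_sis(fractions, contacts, beta=0.05, gamma=0.1, i0=0.01, years=20.0, dt=0.01):
    """Forward-Euler SIS model with proportionate mixing between groups.

    fractions: share of the population in each group (sums to 1)
    contacts:  contact rate per year in each group
    Returns cumulative new infections per capita over the horizon.
    """
    n = np.asarray(fractions, dtype=float)
    c = np.asarray(contacts, dtype=float)
    infected = i0 * n                       # infectious share of the total population, by group
    cumulative = 0.0
    for _ in range(int(round(years / dt))):
        # chance that a randomly chosen contact is infectious (contacts weighted by c)
        p_inf_contact = np.sum(c * infected) / np.sum(c * n)
        incidence = beta * c * p_inf_contact * (n - infected)
        infected = infected + dt * (incidence - gamma * infected)
        cumulative += dt * incidence.sum()
    return cumulative


def apply_intervention(fractions, contacts, coverage, reduction):
    """Split every group into covered/uncovered parts; covered people cut contacts by `reduction`."""
    new_f, new_c = [], []
    for f, c in zip(fractions, contacts):
        new_f += [f * coverage, f * (1 - coverage)]
        new_c += [c * (1 - reduction), c]
    return new_f, new_c


def cases_averted(fractions, contacts, coverage=0.3, reduction=0.5):
    base = simulate_sis(fractions, contacts)
    treated = simulate_sis(*apply_intervention(fractions, contacts, coverage, reduction))
    return base - treated


# Hypothetical example: population-average contact rate of 10 per year in every model
unstratified = cases_averted([1.0], [10.0])
a = 0.2                                  # high-risk share
c_high = 30.0
c_low = (10.0 - a * c_high) / (1 - a)    # keeps the population-average contact rate at 10
stratified = cases_averted([a, 1 - a], [c_high, c_low])
```

Sweeping `a` from 0 to 1 (recomputing `c_low` each time) and plotting `stratified - unstratified` mirrors the kind of comparison reported for Panel B, once the baseline calibration step is added.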

Result: Among models that produce equivalent epidemic predictions in the absence of intervention, the predicted number of cases averted depends on how the population's distribution of contact rates is discretized into high- and low-risk groups (Figure 1, Panels A and B). Panel A also shows that unstratified models may produce higher estimates of effectiveness than stratified models, and that the size of this difference depends on the underlying distribution of risk. Deviation from the prediction of the unstratified model (a = 0) is largest when a takes intermediate values between 0 and 1 (Panel B).

Conclusion: The choice of how to discretize risk in compartmental epidemic models can influence the predicted effectiveness of interventions. Analysts should carefully examine multiple alternatives and report the range of results.

5:30 PM

Christopher Parker, MSc, ICON Health Economics & Epidemiology, Oxford, United Kingdom and Neil Hawkins, PhD, London School of Hygiene and Tropical Medicine, London, United Kingdom

Purpose: Certain endpoints, such as progression-free survival, are by definition interval censored: the exact time of the event is unknown, and we know only that it occurred between two assessment times. However, this censoring is typically ignored in survival analyses for cost-effectiveness analysis, even though statistical methods that account for interval censoring are well established.

The objective of this study was to investigate, in the context of cost-effectiveness analysis, the potential bias that may occur if interval censoring is not accounted for in survival analysis.

Method: We simulated time-to-event data with interval censoring. For two treatment groups, 10,000 sets of 500 event times (representing a typical trial) were drawn from Weibull distributions with a common shape parameter but different scale parameters. Interval censoring was simulated by assuming that assessments (for example, for progression) were conducted every four months: each actual event time was rounded down and up to the nearest assessment to form the left and right censoring times, respectively.
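One simulated trial arm can be generated as below. The shape, scale, and sample-size values are hypothetical, and whether the 500 event times are per arm or in total is not stated, so the sketch simply takes `n` per arm. Each event is censored to the assessment visits on either side of it, using the Weibull survival function S(t) = exp(-(t / scale) ** shape).

```python
import numpy as np


def simulate_interval_censored(n, shape, scale, interval, rng):
    """Draw Weibull event times and censor them to the surrounding assessment visits.

    Assessments happen at 0, interval, 2*interval, ..., so each event is only
    known to lie in [left, right).
    """
    t = scale * rng.weibull(shape, size=n)     # true (unobserved) event times
    left = np.floor(t / interval) * interval   # last assessment before the event
    right = left + interval                    # first assessment at which it is detected
    return t, left, right


rng = np.random.default_rng(2015)
# hypothetical parameters: common shape, arm-specific scale (months), 4-monthly visits
control = simulate_interval_censored(500, shape=1.2, scale=10.0, interval=4.0, rng=rng)
treated = simulate_interval_censored(500, shape=1.2, scale=14.0, interval=4.0, rng=rng)
```

Repeating this 10,000 times and keeping only `left` and `right` reproduces the observed-data structure; the true times `t` are retained here only for checking.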

The mean time-to-event was estimated using two parametric survival models: (i) one assuming no censoring and (ii) one accounting for the interval censoring. The bias in the mean difference in time-to-event between treatments was assessed by comparing the mean difference estimated from each model with the true value calculated from the distribution parameters. Two scenario analyses were conducted: decreasing the frequency of assessments to every eight months, and increasing the event hazard in the control arm.
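The two Weibull fits can be sketched with maximum likelihood. Two assumptions are made for illustration: model (i) treats the assessment at which the event was detected (the right end of the interval) as the exact event time, and both models are fitted with Nelder-Mead on log-transformed parameters. Model (ii) uses the interval-censored likelihood, in which each event contributes S(left) - S(right). The mean is scale * Gamma(1 + 1/shape); fitting each arm and differencing the means gives the estimated mean difference.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gamma as gamma_fn


def weibull_surv(t, shape, scale):
    return np.exp(-(t / scale) ** shape)


def weibull_mean(shape, scale):
    # E[T] = scale * Gamma(1 + 1/shape)
    return scale * gamma_fn(1.0 + 1.0 / shape)


def _fit(neg_log_lik, start_scale):
    # optimise on the log scale so shape and scale stay positive
    res = minimize(neg_log_lik, x0=[0.0, np.log(start_scale)], method="Nelder-Mead")
    return np.exp(res.x)  # (shape, scale)


def fit_ignoring_censoring(times):
    """(i) Treat each recorded time as an exact event time."""
    def nll(log_params):
        k, lam = np.exp(log_params)
        z = times / lam
        return -np.sum(np.log(k / lam) + (k - 1) * np.log(z) - z ** k)
    return _fit(nll, np.mean(times))


def fit_interval_censored(left, right):
    """(ii) Each event lies in [left, right): likelihood S(left) - S(right)."""
    def nll(log_params):
        k, lam = np.exp(log_params)
        prob = weibull_surv(left, k, lam) - weibull_surv(right, k, lam)
        return -np.sum(np.log(np.clip(prob, 1e-300, None)))
    return _fit(nll, np.mean(right))
```

Recording each event at the detecting visit shifts every time later by part of an assessment interval, which is consistent with the overestimation the study reports when censoring is ignored.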

Result: When interval censoring was ignored, the mean time-to-event difference was overestimated (bias = 1.91 months). The bias was reduced when methods that account for interval censoring were used (bias = 0.51 months). When assessments were made every eight months, accounting for interval censoring reduced the bias from 1.47 months to -0.01 months. Similarly, when the event hazard in the control arm was increased, accounting for interval censoring reduced the bias from 1.75 months to -0.70 months.

Conclusion: Interval censoring is common in clinical studies, resulting from periodic assessments for events such as disease progression. When interval censoring is present, ignoring it yields biased estimates of mean time-to-event and, potentially, of cost-effectiveness. Accounting for interval censoring is therefore important when estimating time-to-event curves for cost-effectiveness analysis.

5:45 PM

Zhanglin Lin Cui, PhD1, Lisa M. Hess, PhD2, Robert J Goodloe, MS1, Gebra Cuyun Carter, PhD1 and Douglas E Faries, PhD1, (1)Eli Lilly and Co, Indianapolis, IN, (2)Eli Lilly and Company/Indiana University, Indianapolis, IN

Purpose: Conventional pairwise propensity score matching (PSM) has significant limitations when applied to multiple cohorts, because the matched patients lack common support across comparisons. This study compares generalized PSM with conventional pairwise PSM in assessing the comparative effectiveness of common regimens used in the second-line treatment of lung cancer in the US.

Method: IMS Oncology patient-level EMR data were used for this study. Eligible patients had a diagnosis of lung cancer (ICD-9-CM 162.2-162.9) between 1/1/2007 and 6/30/2013 and received at least two lines of treatment. Generalized propensity scores were estimated using multinomial logistic regression. A region of common support with sufficient overlap in the covariate distribution and minimum variance of the covariate space was identified. Generalized PSM with replacement was performed on the common support to obtain estimated outcomes under each regimen for each patient. Balance among the cohorts was assessed using absolute standardized differences (ASDs) in covariates. A Cox proportional hazards model was used for survival analysis after generalized PSM, and the results were compared with those after conventional pairwise PSM. Bootstrapping was conducted as a sensitivity analysis.
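The matching steps can be sketched as follows. This is a simplified stand-in, not the study's procedure: generalized propensity scores come from scikit-learn's multinomial logistic regression (with its default regularization), the common support is a simple rectangular rule (each score component must lie within the range observed in every treatment group) rather than the overlap-and-minimum-variance region described above, and each retained patient is matched with replacement to the Euclidean nearest neighbor in score space within every group. The function names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def generalized_psm(X, treatment):
    """Match every patient in the common support to one patient from each
    treatment group (with replacement) on the vector of generalized propensity scores."""
    treatment = np.asarray(treatment)
    groups = np.unique(treatment)
    # multinomial logistic regression; predict_proba columns follow model.classes_ (= groups)
    model = LogisticRegression(max_iter=1000).fit(X, treatment)
    gps = model.predict_proba(X)
    # simple rectangular common support: every GPS component must lie inside the
    # range observed in each treatment group
    lower = np.max([gps[treatment == g].min(axis=0) for g in groups], axis=0)
    upper = np.min([gps[treatment == g].max(axis=0) for g in groups], axis=0)
    kept = np.flatnonzero(np.all((gps >= lower) & (gps <= upper), axis=1))
    matches = np.empty((kept.size, groups.size), dtype=int)
    for j, g in enumerate(groups):
        pool = kept[treatment[kept] == g]
        # Euclidean nearest neighbour in GPS space, sampled with replacement
        dist = np.linalg.norm(gps[kept][:, None, :] - gps[pool][None, :, :], axis=2)
        matches[:, j] = pool[np.argmin(dist, axis=1)]
    return kept, matches, groups


def abs_std_diff(x_a, x_b):
    """Absolute standardized difference of one covariate between two samples."""
    pooled_sd = np.sqrt((np.var(x_a, ddof=1) + np.var(x_b, ddof=1)) / 2)
    return abs(np.mean(x_a) - np.mean(x_b)) / pooled_sd
```

Row `i` of `matches` supplies an observed outcome under every regimen for retained patient `kept[i]`, which is what allows all regimens to be compared on one common set of patients rather than on a different matched sample for each pair.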

Result: Restricting to the five most common lung cancer regimens gave a total sample size of 5,222 patients. Generalized PSM used 61.2% of the patient sample, while conventional pairwise PSM used 24.1-77.1% of the sample across the 10 comparisons. By definition, generalized PSM achieved perfect balance (ASD = 0) among the regimens on every covariate; conventional pairwise PSM achieved acceptable balance (ASDs < 0.1). Using generalized PSM, median overall survival ranged from 5.6 to 8.9 months among the top five regimens, and 8 of the 10 survival comparisons were statistically significant (p < 0.05). Bootstrapping gave similar results. Using conventional pairwise PSM, median overall survival ranged from 5.6 to 9.5 months among the top five regimens, and only 1 of the 10 survival comparisons was statistically significant (p < 0.05). These differences arose from the different matched patient samples and their sizes.

Conclusion: Generalized PSM allows comparisons across multiple cohorts using a common support while removing bias from observed covariates under the ‘no unmeasured confounding’ assumption, and it may have applications in observational studies with multiple cohorts.