H METHODS FOR COMPARATIVE EFFECTIVENESS RESEARCH

Friday, October 19, 2012: 1:00 PM-2:30 PM
Regency Ballroom C (Hyatt Regency)

Session Chairs:
Bruce R. Schackman, PhD and Lisa A. Prosser, M.S., Ph.D.
1:00 PM
H-1
(HSP)
Jennifer Schneider Chafen, M.D., M.S.1, Daniella J. Perlroth, MD2, Cathie Markow, MBA, RN1 and Dena M. Bravata, MD, MS1, (1)Castlight Health, San Francisco, CA, (2)Stanford University, Stanford, CA

Purpose:   Health plans are increasingly offering procedure-specific hospital designations (e.g., Centers-of-Excellence [COE]) to signify high quality care.  Further, many self-insured employers are instituting benefit designs to incentivize employees to preferentially utilize these centers.  Consumers often seek publicly available quality information when choosing a facility for elective surgical procedures. If these data conflict with COE designations, consumer confusion could increase. The purpose of this study is to evaluate the publicly reported quality metrics for facilities designated as COEs.  

Methods:   We evaluated two publicly-reported quality metrics from the healthcare.gov consumer website on patient satisfaction and surgical safety practices for COE-designated facilities for a self-insured employer for five elective surgical procedures (hip replacement, knee replacement, spinal fusion, disc surgery, and bariatric surgery). The patient satisfaction measure used was the percent of patients responding “would definitely recommend this hospital” on the 2011 Hospital Consumer Assessment of Healthcare Providers and Systems [HCAHPS] survey. We only included those facilities in the HCAHPS evaluation if at least 100 patients responded to the survey. The surgical safety measure was a weighted composite score from the 2011 Surgical Care Improvement Project [SCIP]. We only included those facilities in the SCIP composite measure evaluation if at least 30 patients provided data for at least 7 out of 9 measures.  
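As an illustrative sketch only (not the authors' code), the HCAHPS inclusion threshold and the percentile banding described above could be expressed as follows. The facility records, field names, and the strict "percent scoring below" definition of percentile rank are assumptions made for the example; only the 100-respondent threshold comes from the abstract.

```python
from bisect import bisect_left

MIN_HCAHPS_RESPONDENTS = 100  # inclusion threshold stated in the Methods


def percentile_rank(value, all_values):
    """Percent of included facilities scoring strictly below `value` (assumed definition)."""
    ordered = sorted(all_values)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


# Hypothetical facilities; none of these numbers come from the study.
facilities = [
    {"name": "A", "respondents": 250, "recommend_pct": 62.0, "coe": True},
    {"name": "B", "respondents": 80, "recommend_pct": 90.0, "coe": True},  # too few respondents
    {"name": "C", "respondents": 400, "recommend_pct": 71.0, "coe": False},
    {"name": "D", "respondents": 150, "recommend_pct": 78.0, "coe": False},
    {"name": "E", "respondents": 300, "recommend_pct": 85.0, "coe": True},
]

# Apply the inclusion criterion before ranking.
included = [f for f in facilities if f["respondents"] >= MIN_HCAHPS_RESPONDENTS]
scores = [f["recommend_pct"] for f in included]

# Rank each COE against all included facilities, then flag the bottom quartile.
coe_ranks = {
    f["name"]: percentile_rank(f["recommend_pct"], scores)
    for f in included
    if f["coe"]
}
bottom_quartile_coes = [name for name, rank in coe_ranks.items() if rank < 25.0]
share_in_bottom_quartile = len(bottom_quartile_coes) / len(coe_ranks)
```

The same pattern would apply to the SCIP composite, with the inclusion rule swapped for the 30-patient, 7-of-9-measure criterion.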

Results: 3,089 facilities met inclusion criteria for the HCAHPS comparison. 25% of the COEs for all five procedures were in the 0-25th percentile of patient satisfaction (range: 8% for disc surgery to 50% for bariatric surgery). 4% of the COEs for all five procedures were in the 95th-100th percentile (range: 0% for bariatric surgery to 6% for disc surgery and hip replacement).  2,455 facilities met inclusion criteria for the SCIP composite score.  1% of disc surgery and spinal fusion COEs and 11% of bariatric surgery COEs were in the 0-5th percentile.  None of the bariatric surgery COEs and 9% of spinal fusion COEs were above the 95th percentile.  

Conclusions: Health plan COE designations are inconsistent with publicly reported quality metrics—with some COEs performing among the worst facilities in the US. To avoid consumer confusion, employers implementing COE programs should carefully communicate with their employees about how the COEs are selected and how best to incorporate COE designations and publicly reported quality metrics in their decision making.

1:15 PM
H-2
(HSP)
W. Wei, PhD, MS, MBA1, J. Frimpter1, K. Edwardson2, D. Mitchell1 and MG Savella2, (1)Sanofi US, Bridgewater, NJ, (2)Doctor Evidence, LLC, Santa Monica, CA

Purpose: To synthesize real-world evidence on outcomes among patients with type 2 diabetes mellitus (T2DM) who initiated insulin glargine via disposable pen versus vial/syringe.

Method: We performed a meta-analysis of previously reported retrospective studies conducted in 4 different databases with a common data structure framework (consistently defined study design and measures). All four studies included adult T2DM patients previously treated with oral anti-diabetes drugs and/or glucagon-like peptide-1 therapy only, who initiated insulin glargine via disposable pen (GLA-P) or vial/syringe (GLA-V) between 2007 and 2009. All patients had to have continuous health plan enrollment for 6 months prior to insulin initiation (baseline) and 12 months after (follow-up). In each study, baseline differences between GLA-P and GLA-V patients were balanced using stringent 1:1 propensity score matching. Study measures defined consistently across all four studies included 1-year follow-up treatment persistence and adherence, healthcare utilization, and hypoglycemia events. Data were analyzed with random-effects modeling, using a unique evidence synthesis platform (Doctor Evidence®, Santa Monica, CA), with I² to indicate the degree of heterogeneity across studies.

Result: A total of 22,234 patients were pooled, and baseline characteristics for GLA-P (N=11,117) and GLA-V (N=11,117) patients were similar across each individual study. During the 1-year follow-up, GLA-P patients were 25% more likely to be persistent (39.5% vs. 31.5%, p<0.0001, relative risk (RR) = 1.25, 95% confidence interval (CI) 1.15-1.37, I² = 85.7%) and adherent (mean difference = 0.04, 95% CI 0.03-0.05; I² = 10.24%), averaging an additional 30.3 days on treatment (95% CI 21.64-38.99; I² = 81.8%). GLA-P patients were also 24% less likely to have hypoglycemic events (6.4% vs. 8.5%; RR = 0.76, 95% CI 0.69-0.83; I² = 0%) and 15% less likely to have hospital visits (21.7% vs. 25.7%; RR = 0.85, 95% CI 0.81-0.89; I² = 22.61%), but 26% more likely to have endocrinologist visits (22% vs. 17%, RR = 1.26, 95% CI 1.1-1.45; I² = 83.76%). Heterogeneity varied across analyses. Sensitivity analyses yielded results consistent with the primary analysis.
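For readers unfamiliar with the pooling step, the following is a minimal sketch of one common random-effects approach, the DerSimonian-Laird estimator applied to log risk ratios, together with the I² statistic. The abstract does not state which estimator the evidence synthesis platform uses, so this choice is an assumption, and the four sets of study counts are hypothetical rather than taken from the underlying studies.

```python
import math


def log_rr_and_var(events_a, n_a, events_b, n_b):
    """Log risk ratio (group A vs. group B) and its approximate variance."""
    log_rr = math.log((events_a / n_a) / (events_b / n_b))
    var = 1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b
    return log_rr, var


def dersimonian_laird(effects, variances):
    """Random-effects pooling of log RRs; returns pooled RR, 95% CI, and I² (%)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures dispersion of study effects around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "rr": math.exp(pooled),
        "rr_low": math.exp(pooled - 1.96 * se),
        "rr_high": math.exp(pooled + 1.96 * se),
        "i2": i2,
    }


# Hypothetical (pen events, pen N, vial events, vial N) for four matched cohorts.
studies = [
    (400, 1000, 320, 1000),
    (1200, 3000, 900, 3000),
    (900, 2500, 850, 2500),
    (2000, 4500, 1400, 4500),
]
pairs = [log_rr_and_var(*s) for s in studies]
result = dersimonian_laird([p[0] for p in pairs], [p[1] for p in pairs])
```

Note that a random-effects pooled RR generally differs slightly from the crude ratio of pooled percentages, which may be why some reported RRs above do not exactly equal the ratio of the two percentages shown beside them.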

Conclusion: This meta-analysis supports previous findings from individual studies, suggesting improved outcomes associated with disposable pen versus vial/syringe for T2DM patients initiating insulin glargine therapy. Additionally, application of a common data structure across studies, combined with the unique evidence synthesis platform, enables reliable pooling of retrospective database studies and facilitates synthesis of real-world evidence.

1:30 PM
H-3
(HSP)
Michael Rothberg, MD, MPH1, Penelope Pekow, PhD2, Aruna Priya, MA, MSc2, Marya Zilberberg, MD, MPH3, Raquel Belforti, DO2, Richard Brown, MD2, Daniel Skiest, MD2 and Peter K. Lindenauer, MD, MSc2, (1)Department of Medicine, Springfield, MA, (2)Baystate Medical Center (Tufts University), Springfield, MA, (3)University of Massachusetts, Amherst, MA

Purpose:   Clinical prediction instruments generally incorporate clinical data, whereas models derived from administrative data make use of information coded at discharge.  We constructed a mortality model derived from highly detailed administrative data acquired during the first 48 hours of admission.

Methods:   Our dataset included information on all patients aged ≥18 years with a principal diagnosis of pneumonia or a secondary diagnosis of pneumonia paired with a principal diagnosis of sepsis, respiratory failure/arrest or influenza, who were admitted between 07/01/07 and 06/30/10 to 347 hospitals that participated in Premier's Perspective database.  The dataset was divided into a derivation and validation set.  We derived an HGLM inpatient mortality model that included patient demographics, co-morbidities, acute and chronic medications, therapies and diagnostic tests administered in the first 48 hours of admission as well as interaction effects.  The final model was applied to the validation set.

Results:   The dataset included 200,870 patients in the derivation cohort and 50,037 patients in the validation cohort.  In the final multivariable model, 3 demographic factors, 27 comorbidities, 40 medications, 8 diagnostic tests and 10 treatments within the first 48 hours were associated with mortality.  The strongest predictors of mortality were early vasopressors (OR 1.79), early non-invasive ventilation (OR 1.59), and early bicarbonate treatment (OR 1.70).  The model had a c-statistic of 0.85 in both the derivation and validation cohorts.  In the validation cohort, deciles of predicted risk ranged from 0.4% to 33.9% with observed risk over the same deciles from 0.1% to 33.4%. 
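The two performance measures reported above, discrimination (c-statistic) and calibration across groups of predicted risk, can be sketched as below. This is a generic illustration, not the authors' HGLM code; the function names are hypothetical, and the tiny input vectors are made up purely to exercise the functions.

```python
def c_statistic(probs, outcomes):
    """Share of (death, survivor) pairs where the death has the higher predicted risk.

    Ties in predicted risk count as half-concordant. O(n^2); fine for a sketch.
    """
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def calibration_table(probs, outcomes, groups=10):
    """Mean predicted vs. observed risk within equal-sized groups of predicted risk."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    table = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue  # more groups than observations
        mean_pred = sum(probs[i] for i in idx) / len(idx)
        observed = sum(outcomes[i] for i in idx) / len(idx)
        table.append((mean_pred, observed))
    return table


# Made-up predictions and outcomes (1 = died in hospital), for illustration only.
example_probs = [0.1, 0.2, 0.3, 0.8, 0.9]
example_outcomes = [0, 0, 1, 0, 1]
example_c = c_statistic(example_probs, example_outcomes)
```

With `groups=10`, each row of the table corresponds to one decile of predicted risk, which is the comparison summarized in the last sentence of the Results.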

Conclusions:   A multivariable mortality model based on highly detailed administrative data available during the first 48 hours of hospitalization had good discrimination and calibration.  The model could be used for risk-adjustment in observational studies.

1:45 PM
H-4
(HSP)
Kristian Thorlund, PhD, MSc, McMaster University, Vancouver, BC, Canada, Eric Druyts, MSc, University of British Columbia, Vancouver, BC, Canada and Edward J. Mills, PhD, MSc, University of Ottawa, Vancouver, BC, Canada

Purpose: To methodologically review the published literature on rheumatoid arthritis multiple treatment comparison meta-analyses (MTCs) and to identify methodological issues that can explain the substantial discrepancies in the findings of these MTCs.

Methods: We searched MEDLINE for rheumatoid arthritis multiple treatment comparisons. Following the PRISMA guidelines, we extracted a large set of methodological items from the identified reviews. These included, but were not limited to, inclusion/exclusion criteria, information sources (e.g., MEDLINE), choice of efficacy outcomes, approaches to dealing with differing response profiles to available treatments (e.g., DMARD-naïve vs DMARD inadequate response (IR)), approaches to monotherapies versus combination therapies, and approaches to dealing with potential covariate effect modifiers (i.e., sources of heterogeneity).

Results: We identified 13 published MTCs, of which 9 were published since 2009. We identified major discrepancies in the estimated treatment effects across MTCs. For example, some treatments with almost identical effect estimates in one MTC could be significantly different in another. We identified major discrepancies in the inclusion of trials, despite highly similar eligibility criteria and literature searches. The number of included trials was typically much smaller than the number of eligible trials at the time of publication. Six MTCs included patients with differing response profiles, and 3 of these inappropriately lumped DMARD-naïve and DMARD-IR patients in the analyses. Eight MTCs considered both patients on monotherapy and patients on combination therapy (i.e., concomitant DMARD), but only 4 adjusted for the potential effect modification of giving concomitant DMARD. Approximately half of the identified MTCs did not explore potential sources of heterogeneity. Among those that did, the explored sources were inconsistent. Lastly, most MTCs included only one or two efficacy outcomes (e.g., ACR50), and only two considered health-related quality of life outcomes (e.g., HAQ and DAS).

Conclusions: Major inconsistencies exist in the findings of published rheumatoid arthritis MTCs. The identified methodological shortcomings and inconsistencies may explain these discrepant findings. Further, there are many lessons to be learned from the identified shortcomings and the previous publications, which can potentially strengthen the evidence base on comparative effectiveness between biologics for the treatment of rheumatoid arthritis.

2:00 PM
H-5
(AHE)
M.Z. Sadique, PhD1, Richard Grieve, PhD1, D.A. Harrison, PhD2, Mark Jit, PhD3, Elizabeth Allen, PhD4 and K. Rowan, PhD2, (1)London School of Hygiene and Tropical Medicine, London, United Kingdom, (2)Intensive Care National Audit & Research Centre, London, United Kingdom, (3)Health Protection Agency, London, United Kingdom, (4)London School of Hygiene & Tropical Medicine, London, United Kingdom

Purpose: Health care interventions are often targeted using risk prediction models. However, there is a lack of work that both develops alternative risk prediction strategies and evaluates their cost-effectiveness within a single study. This paper develops new risk prediction models and evaluates whether using the risk models in prevention strategies is cost-effective. We illustrate this approach in the Fungal Infection Risk Evaluation (FIRE) study, which developed and validated risk models to identify non-neutropenic, critically ill adult patients at high risk of invasive fungal disease (IFD).

Method: A decision-analytical model was developed to compare alternative strategies to prevent IFD. The alternative prevention strategies comprised assessment according to predicted risk of IFD at up to three decision time points (critical care admission, after 24 hours, end of day 3), with antifungal prophylaxis for those judged ‘high’ risk according to three thresholds, versus no formal risk assessment or prophylaxis, which is current UK practice. Data on risk factors were available for 54,289 eligible admissions to 96 UK adult, general critical care units. Risk models were developed and validated to predict the risk of IFD before hospital discharge. The decision model was populated with estimates of positive predictive value (PPV) and negative predictive value (NPV) from the best-fitting risk model at each time point. Estimates of the effectiveness of antifungal prophylaxis were taken from a systematic review of published RCTs. We projected lifetime cost-effectiveness and the value of further information for groups of parameters (VOPPI).

Result: The baseline risk of IFD was low (0.4%). The best-fitting prognostic model gave PPVs and NPVs that varied across strategies from 0.57% to 1.94% and from 99.65% to 99.95%, respectively. Incremental quality-adjusted life years (QALYs) for the risk assessment strategies compared with current practice were positive but small relative to the incremental costs. Current practice was the strategy with the highest probability of being cost-effective (between 40% and 80%). The VOPPIs were relatively high for PPV or NPV (£4m-£13m) and QALYs (£4m-£12m).
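The very low PPVs reported above follow largely from the low baseline risk. As a hedged illustration, the sketch below applies Bayes' theorem to obtain PPV and NPV from prevalence, sensitivity, and specificity. The 0.4% prevalence is taken from the abstract; the sensitivity and specificity values are hypothetical placeholders, since the abstract does not report them, and the study's own PPV and NPV estimates came from its fitted risk models rather than from this formula.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value: P(disease | flagged high risk)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(prevalence, sensitivity, specificity):
    """Negative predictive value: P(no disease | flagged low risk)."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


BASELINE_RISK = 0.004   # 0.4% baseline risk of IFD, from the abstract
ASSUMED_SENS = 0.80     # hypothetical, not reported in the abstract
ASSUMED_SPEC = 0.80     # hypothetical, not reported in the abstract

example_ppv = ppv(BASELINE_RISK, ASSUMED_SENS, ASSUMED_SPEC)
example_npv = npv(BASELINE_RISK, ASSUMED_SENS, ASSUMED_SPEC)
```

Under these assumed inputs the PPV stays below 2% while the NPV exceeds 99%, the same qualitative pattern as the reported ranges: at such low prevalence, most patients flagged as high risk will not develop IFD.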

Conclusion: It is effective but not cost-effective to formally assess the risk of IFD for non-neutropenic, critically ill adult patients, but the value of further research is high. This integrated approach to developing and evaluating risk models within the same study is useful for informing clinical practice and future research investment. Grant Acknowledgement: NIHR Health Technology Assessment Programme

2:15 PM
H-6
(HSP)
Su-Hsin Chang, PhD1, Carolyn R.T. Stoll, MPH, MSW1, Jihyun Song, PhD1, Esteban J. Varela, MD2, Christopher J. Eagon, MD2 and Graham A. Colditz, MD, DrPH1, (1)Division of Public Health Sciences, Washington University School of Medicine, St. Louis, MO, (2)Division of General Surgery, Washington University School of Medicine, St. Louis, MO

Purpose: To examine and generalize the risks and effectiveness of bariatric surgery using updated data and sophisticated meta-analysis techniques to compare different types of surgery.

Method: This study was conducted according to the established guidelines for meta-analysis. Surgery types considered were Roux-en-Y gastric bypass (RYGB), laparoscopic adjustable gastric banding (LAGB), vertical banded gastroplasty (VBG), and sleeve gastrectomy (SG). Literature searches of Medline, Embase, Scopus, Current Contents, Cochrane Library, and the Clinicaltrials.gov databases between 2003 and 2012 were performed. Articles were screened for both exclusion and inclusion criteria before data extraction occurred. A mixed treatment comparison meta-analysis was conducted for body mass index (BMI) change to take advantage of data reported at different study time points. For the other surgical outcomes (operative mortality, complication and reoperation rates, and percentage of remission of the obesity-attributable comorbidities), both Bayesian hierarchical models and meta-analysis of rare binary event data were used because the number of zero cells for such data is large.

Result: Peri- (< 30 days) and post-operative (≥ 30 days) mortality rates were 17 and 31 deaths per 10,000 patients, respectively. Complication rates were 16% and 11% for randomized trials (RCTs) and observational studies (OBs), respectively. Reoperation rates were 7.6% (RCTs) and 5.9% (OBs). RYGB had the lowest peri-operative mortality and reoperation rates. LAGB had the lowest post-operative mortality and complication rates. In the first 3 years after surgery, BMI loss was, in general, 16, 13, and 13 kg/m², respectively (approximately 36%, 29%, and 29% BMI loss for an individual with a pre-surgery BMI of 45 kg/m²). RYGB was the most effective in terms of weight loss (Figure 1), followed by SG, VBG, and LAGB. Remission rates of the obesity comorbidities were high: type 2 diabetes, 92% for RCTs and 86% for OBs; hypertension, 74% for RCTs and 69% for OBs; and dyslipidemia, 76% for RCTs and 56% for OBs. The effectiveness of the various types of surgery in improving comorbidities corresponds with their effectiveness in weight loss.
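The percentage figures quoted in parentheses above can be checked with simple arithmetic. The short sketch below reproduces them from the reported absolute BMI losses and the stated reference BMI of 45 kg/m², and converts the mortality counts to percentages; the variable and function names are illustrative only.

```python
REFERENCE_BMI = 45.0             # pre-surgery BMI used in the abstract's example
BMI_LOSSES = [16.0, 13.0, 13.0]  # reported BMI losses in kg/m^2


def percent_of_baseline(loss, baseline):
    """Absolute BMI loss expressed as a percentage of pre-surgery BMI."""
    return 100.0 * loss / baseline


bmi_loss_pct = [round(percent_of_baseline(x, REFERENCE_BMI)) for x in BMI_LOSSES]

# Mortality reported as deaths per 10,000 patients, converted to percent.
peri_operative_pct = 100.0 * 17 / 10_000
post_operative_pct = 100.0 * 31 / 10_000
```

Rounded to whole percentages, the BMI losses come to 36%, 29%, and 29%, matching the abstract, and the mortality rates correspond to 0.17% and 0.31%.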

Conclusion: This study provides evidence suggesting that the mortality risk of bariatric surgery is low. Bariatric surgery is also effective for weight loss and for improvement in obesity-related comorbidities. Compared with RYGB, LAGB has lower weight loss efficacy and less effective comorbidity remission outcomes, but also leads to a lower rate of complications.