D-5 MACHINE LEARNING METHODS THAT COMBINE NON-PARAMETRIC MATCHING WITH POST-MATCHING COVARIATE ADJUSTMENT: AN APPLICATION TO COST-EFFECTIVENESS ANALYSIS

Tuesday, October 20, 2009: 2:00 PM
Grand Ballroom, Salon 4 (Renaissance Hollywood Hotel)
Jasjeet S. Sekhon, PhD, UC-Berkeley, Berkeley, CA and Richard Grieve, PhD, London School of Hygiene and Tropical Medicine, London, United Kingdom

Purpose: Cost-effectiveness analyses (CEA) often use non-randomised studies (NRS) to compare treatment groups. Baseline covariates may therefore be highly imbalanced. Conventional methods for addressing selection bias assume the treatment selection or response model is correctly specified, but usually the specification is unknown. Instead, we develop machine learning techniques that combine non-parametric methods for both matching and covariate adjustment.

Method: Machine learning with a non-parametric matching method, Genetic Matching, can improve covariate balance in CEA (Sekhon and Grieve 2008). This paper extends machine learning approaches to CEA where covariate imbalances remain after Genetic Matching. We use ‘super learning’ (van der Laan et al., 2007) for post-matching bias adjustment. The ‘super learner’ is an algorithm that fits a set of candidate adjustment methods, or ‘learners’ (e.g. least squares, spline regression), to different portions of the data (training samples) and constructs the ‘optimal learner’ by weighting the candidates according to their relative predictive performance in the remaining data (validation samples); a sketch of this weighting step is given below. We compare machine learning with propensity score matching. We use a CEA of Pulmonary Artery Catheterisation (PAC) from an RCT (n=1,014) and the corresponding NRS (n=38,000). Identical measures were recorded across the two settings for 40 baseline covariates. We match RCT treated cases to NRS controls using propensity score matching versus Genetic Matching. The super learning approach then finds and applies the optimal covariate adjustment after Genetic Matching. We compare the cost-effectiveness estimates from these methods with those from the RCT.
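As an illustration of the weighting step described above, the following is a minimal super learner sketch in Python. The candidate learners, the scikit-learn estimators, and the non-negative least squares weighting are assumptions for illustration, not the authors' implementation; X_matched and y_matched are hypothetical placeholders for the post-matching covariates and outcomes.

    # Minimal super learner sketch (illustrative; not the authors' code).
    # The candidate learners and the NNLS weighting scheme are assumptions.
    import numpy as np
    from scipy.optimize import nnls
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import cross_val_predict

    def super_learner_weights(X, y, learners, cv=10):
        """Weight candidate learners by cross-validated predictive performance."""
        # Out-of-fold predictions: each candidate is fit on the training folds
        # and predicts the held-out (validation) fold.
        Z = np.column_stack([cross_val_predict(m, X, y, cv=cv) for m in learners])
        # Non-negative least squares chooses weights that minimise the
        # cross-validated squared error of the weighted combination.
        w, _ = nnls(Z, y)
        return w / w.sum() if w.sum() > 0 else np.full(len(learners), 1 / len(learners))

    # Hypothetical usage on the matched sample:
    # learners = [LinearRegression(), GradientBoostingRegressor()]
    # weights = super_learner_weights(X_matched, y_matched, learners)
    # Each candidate is then refit on the full matched sample, and the final
    # prediction is the weighted combination of their predictions.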

Result: The RCT reported mean incremental net benefits (INB) for PAC that were not significantly different from zero (at λ=£30,000, INB -£3,000 [95% CI -£22,000 to £12,000]). The NRS results differed by method. Following propensity score matching, covariate balance was poor; PAC was associated with increased mortality and negative INBs (corresponding INB, -£60,000). Covariate balance improved markedly with Genetic Matching (e.g. baseline probability of death, p=0.93), but some imbalances remained. The super learner minimised residual biases following Genetic Matching, and the resulting INBs were similar to those from the RCT. Matching RCT controls to NRS controls gave similar net benefits, suggesting that the main identifying assumption holds in this context.
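For reference, incremental net benefit at a willingness-to-pay threshold λ per unit of effectiveness (e.g. per QALY) is conventionally defined as

    \mathrm{INB}(\lambda) = \lambda \, \Delta E - \Delta C

where ΔE and ΔC denote the incremental effectiveness and incremental cost of PAC versus control, so a positive INB at λ=£30,000 would favour PAC.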

Conclusion: Machine learning provides CEA with flexible methods for matching and post-matching adjustment that avoid parametric assumptions. These methods are doubly robust: if either the matching (treatment selection) model or the response surface model is correctly specified, the resulting estimates are consistent.

Candidate for the Lee B. Lusted Student Prize Competition