C-5 COMBINING RANDOM FORESTS AND BAYESIAN GLM FOR ESTIMATION OF HETEROGENEOUS TREATMENT EFFECTS

Thursday, October 18, 2012: 2:30 PM
Regency Ballroom D (Hyatt Regency)
Quantitative Methods and Theoretical Developments (MET)

David J. Vanness, Ph.D., Department of Population Health Sciences, Madison, WI

Purpose: To demonstrate the potential usefulness of a two-stage approach combining machine learning and Bayesian techniques for predicting heterogeneous treatment effects in the presence of a large number of predictors with potential high-order interactions.

Method: 460 patients from the N9741 clinical trial of treatment in advanced colorectal cancer with complete response, toxicity and pharmacogenomic profiles were included. Survival was imputed for patients alive at last follow-up. In the first stage, random forest algorithms were used to predict survival separately for each treatment group as a function of age, sex, race (white vs. non-white), prior chemotherapy status and a set of 18 indicator variables containing information about single-nucleotide polymorphisms (SNPs). The resulting treatment-specific survival scores were included, along with treatment assignment indicators, in a second-stage Bayesian GLM (gamma family, log link) predicting survival. The survival scores were designed to capture complex interactions of each treatment with individual characteristics, including genomic data. Given the large number of predictors and potential multi-way interactions, direct inclusion of treatment interaction terms would not have been feasible. Counterfactual simulations were conducted by applying each individual's treatment-specific survival scores for the treatments not received to posterior parameter estimates from the Bayesian GLM survival model.
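
The counterfactual step can be illustrated with a minimal Python sketch. It is not the study's code, and everything specific in it is an assumption made for illustration: the treatment labels A, B and C, the form of the linear predictor (intercept plus a treatment effect plus a treatment-specific slope on the stage-one score), the scale of the scores, and the simulated posterior draws, which stand in for MCMC output and are not trial estimates. The only element taken from the abstract is the gamma GLM with log link, whose mean is the exponential of the linear predictor.

```python
import math
import random

TREATMENTS = ("A", "B", "C")  # hypothetical labels for the three regimens


def expected_survival(draw, treatment, score):
    """Mean of a gamma GLM with log link: exp(linear predictor).

    Assumed linear predictor: intercept + treatment effect + slope * score.
    """
    eta = (
        draw["intercept"]
        + draw["effect"][treatment]
        + draw["slope"][treatment] * score
    )
    return math.exp(eta)


def prob_better_than(draws, scores, reference):
    """For each alternative treatment, the share of posterior draws in which
    its expected survival exceeds that of the reference treatment."""
    probs = {}
    for alt in scores:
        if alt == reference:
            continue
        wins = sum(
            expected_survival(d, alt, scores[alt])
            > expected_survival(d, reference, scores[reference])
            for d in draws
        )
        probs[alt] = wins / len(draws)
    return probs


def simulate_draws(n, seed=0):
    """Stand-in for MCMC output: made-up normal draws, not trial estimates."""
    rng = random.Random(seed)
    return [
        {
            "intercept": rng.gauss(3.0, 0.05),
            "effect": {
                "A": rng.gauss(0.20, 0.05),  # assumed best regimen on average
                "B": 0.0,                    # assumed baseline regimen
                "C": rng.gauss(0.05, 0.05),
            },
            "slope": {t: rng.gauss(0.5, 0.1) for t in TREATMENTS},
        }
        for _ in range(n)
    ]


if __name__ == "__main__":
    draws = simulate_draws(2000)
    # One hypothetical patient: stage-one survival scores under each treatment.
    patient_scores = {"A": -0.4, "B": 0.1, "C": 0.6}
    probs = prob_better_than(draws, patient_scores, reference="A")
    # Flag the patient if any alternative beats the reference in >= 50% of draws.
    flagged = any(p >= 0.5 for p in probs.values())
    print(probs, flagged)
```

In this sketch a patient is flagged when at least one alternative regimen has higher expected survival than the reference regimen in at least half of the posterior draws. That is one possible reading of the 50% criterion and may differ from the rule used in the study.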

Result: Treatment-specific survival score parameter estimates for two of the three treatments were significantly positive at the 95% posterior probability level, strongly suggesting the presence of treatment effect heterogeneity determined by personal characteristics, including genomic profiles. While overall treatment effect estimates strongly suggested that one regimen was likely to be superior on average, counterfactual simulations predicted that 61 of the 460 patients had at least a 50% chance of benefiting more from one of the other two regimens in terms of expected survival.

Conclusion: A two-stage approach combining random forests and Bayesian GLM was able to identify and estimate treatment effect heterogeneity given a set of predictors (and possible interactions) too large to include directly as regression interaction terms. A subset of patients was identified who were likely to benefit more from a treatment that was not predicted to be the most effective on average.