PS4-55 CAUSAL INFERENCE USING AGENT-BASED MODELS AND THE PARAMETRIC G-FORMULA

Wednesday, October 21, 2015
Grand Ballroom EH (Hyatt Regency St. Louis at the Arch)
Poster Board # PS4-55

Eleanor Murray, ScD candidate1, James Robins, MD1, George R. Seage, DSc1, Kenneth Freedberg, MD, MSc2 and Miguel Hernan, MD1, (1)Harvard TH Chan School of Public Health, Boston, MA, (2)Massachusetts General Hospital, Boston, MA

Purpose: For evidence-based medical decisions to be sound, clinicians need to estimate outcome distributions under different candidate strategies. In the absence of randomized trials, two possible approaches are agent-based models (ABMs) and the parametric g-formula. ABMs and the g-formula use a similar mathematical approach but differ in their required assumptions. The parametric g-formula requires strong identifiability and modeling assumptions and is limited to the cohort of interest, but it is agnostic about the presence of unknown variables that do not confound the effect of interest. By making assumptions about those unknown variables, ABMs can be used to estimate effects across populations. However, substantial bias can arise in ABMs when those assumptions are incorrect. We describe this potential source of bias when using ABMs for causal inference.
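For reference, one standard way to write the g-formula for mortality under a static strategy $\bar a = (a_1, \dots, a_K)$ over $K$ follow-up intervals is shown below. The notation is ours, not the authors': $\bar L_k$ is the history of the measured time-varying confounder, $D_k$ indicates death by the end of interval $k$ (with $D_0 = 0$ and empty histories $\bar l_0$, $\bar a_0$), and $h_k$ is the discrete-time hazard. The formula holds under sequential exchangeability given $\bar L$, positivity, and consistency. The parametric g-formula plugs in fitted models for $h_k$ and $f$ and usually evaluates the sum by Monte Carlo simulation, much as an ABM propagates its input probabilities forward.

```latex
\Pr\!\left[D_K^{\bar a} = 1\right]
  = \sum_{k=1}^{K} \sum_{\bar l_k}
    h_k(\bar l_k, \bar a_k)
    \prod_{j=1}^{k-1} \left[1 - h_j(\bar l_j, \bar a_j)\right]
    \prod_{j=1}^{k} f\!\left(l_j \mid \bar l_{j-1}, \bar a_{j-1}, D_{j-1} = 0\right),
\qquad
h_k(\bar l_k, \bar a_k) = \Pr\!\left[D_k = 1 \mid D_{k-1} = 0, \bar L_k = \bar l_k, \bar A_k = \bar a_k\right]
```

In this notation, an unmeasured variable that affects $L$ and mortality but not treatment enters only through the cohort-specific $h_k$ and $f$. The formula does not require that variable to be modeled, but it does require $h_k$ and $f$ to come from the cohort of interest.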

Methods: We describe a simplified example of three cohorts of 100,000 HIV-infected individuals that differ only in the prevalence of an unknown variable associated with mortality and a known time-varying confounder. We used an ABM and the parametric g-formula to estimate 12-month mortality in each cohort under three strategies when treatment is ineffective: treatment with the probability observed in the data (natural course), continuous treatment, and no treatment. We compared ABM results obtained when the input probabilities came directly from the population of interest with ABM results obtained when the inputs were estimated from other sources.
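A minimal sketch of this kind of comparison, under a hypothetical data-generating process: an unmeasured binary U raises both the risk of a binary time-varying confounder L and mortality, treatment depends only on L, and treatment has no effect. Every parameter value (U prevalences of 0.2, 0.5, and 0.05, the transition probabilities, two 6-month intervals) is an illustrative assumption, not a value from this study. The parametric g-formula is represented by its saturated, frequency-based special case, and the "ABM" is represented simply as the same forward simulation run with inputs borrowed from the base-case cohort. Real ABMs are usually structured differently.

```python
import random
from collections import defaultdict

# All parameter values are hypothetical, chosen only to illustrate the mechanism.
K = 2  # two 6-month intervals make up the 12-month horizon (assumption)


def generate_cohort(p_u, n, rng):
    """Simulate one cohort: U is unmeasured, and treatment has no effect on anything."""
    rows = []
    for _ in range(n):
        u = int(rng.random() < p_u)
        l_hist, a_hist, died, prev_l = (), (), 0, 0
        for _ in range(K):
            l = int(rng.random() < 0.1 + 0.4 * u + 0.3 * prev_l)  # confounder depends on U
            a = int(rng.random() < 0.3 + 0.5 * l)  # treatment depends only on L
            l_hist, a_hist = l_hist + (l,), a_hist + (a,)
            if rng.random() < 0.02 + 0.15 * u + 0.06 * l:  # death depends on U and L only
                died = 1
                break
            prev_l = l
        rows.append((l_hist, a_hist, died))
    return rows


def estimate_inputs(rows):
    """Frequency estimates of each conditional probability, given full L and A history."""
    counts = defaultdict(lambda: [0, 0])  # key -> [events, people at risk]

    def tally(key, event):
        counts[key][0] += event
        counts[key][1] += 1

    for l_hist, a_hist, died in rows:
        for k in range(len(l_hist)):  # intervals this person entered alive
            tally(("L", k, l_hist[:k], a_hist[:k]), l_hist[k])
            tally(("A", k, l_hist[:k + 1], a_hist[:k]), a_hist[k])
            tally(("D", k, l_hist[:k + 1], a_hist[:k + 1]), int(died and k == len(l_hist) - 1))
    return {key: events / total for key, (events, total) in counts.items()}


def simulate(inputs, strategy, n, rng):
    """Forward-simulate n agents under a strategy; return the estimated 12-month risk."""
    deaths = 0
    for _ in range(n):
        l_hist, a_hist = (), ()
        for k in range(K):
            l = int(rng.random() < inputs[("L", k, l_hist, a_hist)])
            l_hist += (l,)
            if strategy == "natural":  # treat with the probability observed in the data
                a = int(rng.random() < inputs[("A", k, l_hist, a_hist)])
            else:  # "always" = continuous treatment, "never" = no treatment
                a = int(strategy == "always")
            a_hist += (a,)
            if rng.random() < inputs[("D", k, l_hist, a_hist)]:
                deaths += 1
                break
    return deaths / n


if __name__ == "__main__":
    rng = random.Random(2015)
    prevalence = {"base-case": 0.2, "high-risk": 0.5, "low-risk": 0.05}  # of unmeasured U
    cohorts = {name: generate_cohort(p, 100_000, rng) for name, p in prevalence.items()}
    inputs = {name: estimate_inputs(rows) for name, rows in cohorts.items()}
    for name, rows in cohorts.items():
        observed = sum(died for _, _, died in rows) / len(rows)
        g = [f"{s} {simulate(inputs[name], s, 20_000, rng):.3f}" for s in ("natural", "always", "never")]
        abm = simulate(inputs["base-case"], "natural", 20_000, rng)
        print(f"{name}: observed {observed:.3f} | g-formula {', '.join(g)} | "
              f"ABM with base-case inputs {abm:.3f}")
```

With these assumed parameters, the g-formula estimates should track the observed risk under all three strategies, since treatment has no effect. Running the same simulation on base-case inputs returns roughly the base-case risk for every cohort, which underestimates risk in the high-risk cohort and overestimates it in the low-risk cohort. The estimates condition on the full confounder and treatment history because, once U is averaged over, the risk of death depends on earlier values of L as well as the current one.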

Results: The parametric g-formula correctly estimated the probability of mortality in each of the cohorts under all three treatment strategies (Table). The ABM was highly sensitive to the source of input probabilities and correctly estimated mortality only when all probabilities were estimated directly from the cohort of interest. The ABM estimates for the high-risk and low-risk cohorts were most biased when all inputs came from the base-case cohort (Table), but some bias remained when only the direct effect of treatment was estimated from the base-case cohort (natural course risk: 20.85% in the high-risk cohort, 5.60% in the low-risk cohort).

Conclusions: Using ABMs to estimate effects in a particular population may result in biased estimates when the inputs for the ABM are obtained from another population, even if the causal network linking all variables in the ABM is identical in both populations. The parametric g-formula can provide unbiased estimates for both populations, but also requires data from both.