Candidate for the Lee B. Lusted Student Prize Competition
Biomarkers predict disease risk and are potential intervention targets. They present modeling challenges because they are often: 1) distributed continuously and non-normally; 2) correlated with other individual characteristics; 3) evolving over time with non-linear dependencies on their current levels and covariates; 4) related non-linearly to multiple health outcomes. We develop appropriate estimation procedures and apply them in a microsimulation of hemoglobin and anemia-related outcomes in rural Chinese schoolchildren.
Method:
The microsimulation follows 100,000 4th and 5th graders over 4 years, with transitions estimated from 9 months of follow-up for 12,369 students. We capture the baseline joint distribution of the biomarker (hemoglobin) and covariates by fitting Gaussian copulas with Gaussian marginals for hemoglobin and Gamma marginals for age, conditional on sex and ethnicity. We sample from this fitted distribution to create our simulated cohort. We predict hemoglobin changes using quantile regressions with linear splines of lagged hemoglobin and other covariates. The 1st through 99th percentiles, conditional on covariates, represent the cumulative distribution function of hemoglobin change, from which we repeatedly sample and update to determine individuals' hemoglobin paths over time. Using a similar specification, we employ multinomial logistic regressions to link hemoglobin to the number of outcomes experienced (diarrhea, fever, chills, etc.) and then estimate the likelihood of each combination of outcomes conditional on that number. We sample from this two-part model to determine individual outcome patterns. We compare our modeled outcomes to the empirical data and to alternative models to assess in-sample accuracy and effects on extrapolation.
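The baseline sampling and hemoglobin update steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the copula correlation, the marginal parameters, and the stand-in quantile predictions are all hypothetical, and the conditioning on sex and ethnicity is omitted for brevity.

```python
import numpy as np
from scipy import stats

def sample_baseline(n, rho, hb_mean, hb_sd, age_shape, age_scale, rng):
    """Draw (hemoglobin, age) pairs from a bivariate Gaussian copula.

    Hemoglobin gets a Gaussian marginal and age a Gamma marginal, as in the
    text. Every parameter value passed in is an illustrative assumption.
    """
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)  # correlated normals
    u = stats.norm.cdf(z)                                  # uniform marginals
    hb = stats.norm.ppf(u[:, 0], loc=hb_mean, scale=hb_sd)
    age = stats.gamma.ppf(u[:, 1], a=age_shape, scale=age_scale)
    return hb, age

def sample_change(quantile_preds, rng):
    """Inverse-CDF draw from predicted conditional percentiles.

    quantile_preds has shape (n, 99): the predicted 1st..99th percentiles of
    hemoglobin change for each individual.
    """
    taus = np.arange(1, 100) / 100.0
    # Separately fitted quantile regressions can cross; sorting each row
    # restores a monotone quantile function before interpolating.
    q = np.sort(quantile_preds, axis=1)
    u = rng.uniform(taus[0], taus[-1], size=q.shape[0])
    return np.array([np.interp(ui, taus, qi) for ui, qi in zip(u, q)])

rng = np.random.default_rng(0)

# Hypothetical marginals: hemoglobin in g/L, age in years.
hb, age = sample_baseline(5000, rho=-0.3, hb_mean=125.0, hb_sd=11.0,
                          age_shape=100.0, age_scale=0.105, rng=rng)

# Stand-in for fitted quantile regressions: change depends on lagged
# hemoglobin (mean reversion) plus a percentile-specific offset.
taus = np.arange(1, 100) / 100.0
q_preds = -0.3 * (hb[:, None] - 125.0) + 8.0 * stats.norm.ppf(taus)[None, :]

delta = sample_change(q_preds, rng)
hb_next = hb + delta  # one simulated update step
```

Repeating the last two steps with predictions recomputed from `hb_next` would trace out an individual hemoglobin path over successive periods. Drawing the uniform variate only between the 1st and 99th percentiles truncates the tails, which is one simple way to handle a finite percentile grid.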
Result:
Our model more closely represents the empirical data than the alternatives. The copula for baseline covariates provides a better fit than sampling from independent marginal distributions. Quantile-spline modeled hemoglobin changes match follow-up data better than those from a least squares model without splines. The multinomial outcomes model captures correlations across outcomes better than separate logistic regressions. Evaluating the significance of these differences for 4-year extrapolation, we find that, in the alternative model, anemia levels are 13-14% lower; likelihoods of individual outcomes differ by 3-10%; the prevalence of children with multiple outcomes is 44% lower; and estimated intervention effects are smaller.
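The contrast between the two-part outcomes model and separate per-outcome models can be illustrated with a toy simulation. The sketch below is an assumption-laden stand-in: the count probabilities are invented, each combination of a given size is taken to be equally likely, and hemoglobin and other covariates are left out. It only shows the mechanism by which drawing outcomes independently from matching marginal prevalences can understate how often outcomes co-occur.

```python
import numpy as np

def sample_two_part(n, p_count, rng):
    """Two-part draw: first the number of outcomes, then which ones.

    p_count[k] is the (hypothetical) probability of experiencing k outcomes.
    Given k, a combination of size k is chosen uniformly at random, standing
    in for estimated combination probabilities.
    """
    n_outcomes = len(p_count) - 1
    counts = rng.choice(len(p_count), size=n, p=p_count)
    # Rank random keys within each row; flagging ranks below the count
    # selects a uniformly random subset of exactly that size.
    ranks = rng.random((n, n_outcomes)).argsort(axis=1).argsort(axis=1)
    return ranks < counts[:, None]

rng = np.random.default_rng(1)
n = 20000
p_count = np.array([0.55, 0.20, 0.10, 0.15])  # illustrative values only

joint = sample_two_part(n, p_count, rng)

# Independent alternative: same marginal prevalence for each outcome,
# but outcomes drawn separately, ignoring their correlation.
marginals = joint.mean(axis=0)
indep = rng.random((n, joint.shape[1])) < marginals

multi_joint = (joint.sum(axis=1) >= 2).mean()
multi_indep = (indep.sum(axis=1) >= 2).mean()
print(f"share with 2+ outcomes, two-part: {multi_joint:.3f}")
print(f"share with 2+ outcomes, independent: {multi_indep:.3f}")
```

In this toy setup both approaches reproduce the per-outcome prevalences, yet the independent draws yield a smaller share of individuals with two or more outcomes. The size of that gap here depends entirely on the invented probabilities and says nothing about the magnitudes reported above.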
Conclusion:
Many diseases involve continuously evolving biomarkers (cholesterol, HbA1c) with complex relationships to multiple outcomes (cardiovascular disease, diabetes). Constructing microsimulations using the methods we describe is appropriate and feasible.