MODELING CROSSWALK FUNCTIONS: A BAYESIAN BETA REGRESSION APPROACH

Tuesday, October 26, 2010
Sheraton Hall E/F (Sheraton Centre Toronto Hotel)
David J. Vanness, PhD, Department of Population Health Sciences, Madison, WI and Janel Hanmer, PhD, University of Wisconsin, Madison, WI

Purpose: This study introduces Bayesian beta regression as a method for constructing crosswalk functions for continuous bounded dependent variables with possible ceiling and floor effects.

Method: Crosswalk functions are often used to map individual health status measures to health utilities. Crosswalks based on ordinary least squares (OLS) can produce predictions outside the natural bounds of health utilities. Furthermore, because the conditional variance of a bounded continuous variable must shrink as its mean approaches either bound, health utilities are inherently heteroskedastic, which violates the constant-variance assumption of classical linear regression. Beta regression (Kieschnick and McCullough, 2003; Ferrari and Cribari-Neto, 2004) is a method akin to generalized linear models (GLM) that requires specifying a distribution (Beta) for the dependent variable and link functions relating individual covariates to its conditional mean and variance. A variable y ~ Beta(a, b) has range (0, 1), mean a/(a+b), and variance ab/((a+b)^2(a+b+1)). Bounds other than zero and one can be accommodated by a simple linear transformation. In beta regression, the conditional variance is a quadratic function of the conditional mean (writing mu = a/(a+b) and phi = a+b, the variance equals mu(1-mu)/(1+phi)), thus reflecting the heteroskedasticity inherent in bounded variables. Floor and ceiling effects can be modeled using a two-part specification in which the first part is an ordered categorical model for whether a value lies at the floor, strictly between the bounds, or at the ceiling, and the second part is a beta regression for values strictly between the bounds. We compare the performance of posterior mean, median, and mode estimates from two-part Bayesian beta regression with that of OLS, Tobit, and censored least absolute deviations (CLAD) in constructing a crosswalk from the SF-12 to the EuroQol EQ-5D, using the 2000 and 2002 Medical Expenditure Panel Survey (MEPS).
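The likelihood pieces described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the logit link for the mean, the log link for the precision phi = a + b, the ordered logit used for the first part, and all function and parameter names (beta_moments, two_part_loglik, delta, beta, gamma, cutpoints) are choices made for the example. A Bayesian fit would combine this log-likelihood with priors on delta, beta, gamma, and the cutpoints and then sample the posterior, for example by MCMC.

```python
import math


def beta_moments(a, b):
    """Mean a/(a+b) and variance ab/((a+b)^2 (a+b+1)) of y ~ Beta(a, b)."""
    s = a + b
    return a / s, a * b / (s ** 2 * (s + 1))


def to_unit(y, lower, upper):
    """Linearly rescale y from [lower, upper] to [0, 1]."""
    return (y - lower) / (upper - lower)


def from_unit(z, lower, upper):
    """Inverse of to_unit: map z in [0, 1] back to [lower, upper]."""
    return lower + z * (upper - lower)


def inv_logit(t):
    return 1.0 / (1.0 + math.exp(-t))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def beta_logpdf(y, mu, phi):
    """Beta log-density in mean-precision form: a = mu*phi, b = (1 - mu)*phi.
    In this form Var(y) = mu*(1 - mu)/(1 + phi), a quadratic in mu."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def category_probs(eta, cutpoints):
    """Ordered-logit probabilities of (floor, interior, ceiling); c1 < c2."""
    c1, c2 = cutpoints
    p_floor = inv_logit(c1 - eta)
    p_up_to_interior = inv_logit(c2 - eta)
    return p_floor, p_up_to_interior - p_floor, 1.0 - p_up_to_interior


def two_part_loglik(y, x, delta, beta, gamma, cutpoints, lower=0.0, upper=1.0):
    """Log-likelihood contribution of one observation y with covariates x.

    Part 1 (hypothetical ordered logit, index x.delta): floor/interior/ceiling.
    Part 2 (interior values only): beta regression on the rescaled outcome,
    logit link for the mean (x.beta), log link for the precision (x.gamma).
    """
    p_floor, p_interior, p_ceiling = category_probs(dot(x, delta), cutpoints)
    if y <= lower:
        return math.log(p_floor)
    if y >= upper:
        return math.log(p_ceiling)
    mu = inv_logit(dot(x, beta))
    phi = math.exp(dot(x, gamma))
    z = to_unit(y, lower, upper)
    # -log(upper - lower) is the Jacobian of the linear rescaling.
    return math.log(p_interior) + beta_logpdf(z, mu, phi) - math.log(upper - lower)
```

Working in the mean-precision form makes the quadratic mean-variance relationship explicit, and the Jacobian term keeps the density on the original utility scale when the bounds are not zero and one.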

Result: Qualitatively, the posterior mean, posterior median, and CLAD estimates came closest to reproducing the trimodal density of observed EQ-5D values. In both the estimation and validation datasets, mean absolute prediction error was lowest for the posterior median, followed by CLAD. Mean squared prediction error was lowest for OLS, followed by the posterior mean.
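The split between the two error criteria matches a standard decision-theoretic fact: for a given predictive distribution, the median minimizes expected absolute error and the mean minimizes expected squared error. The Python sketch below illustrates this with a small hypothetical, left-skewed sample with a point mass at 1 (invented values, not MEPS data); it is consistent with the reported pattern but does not reproduce the study's results.

```python
import statistics


def mae(y, pred):
    """Mean absolute prediction error."""
    return sum(abs(a - b) for a, b in zip(y, pred)) / len(y)


def mse(y, pred):
    """Mean squared prediction error."""
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)


# Hypothetical utilities: left-skewed with a point mass at 1 (not MEPS data).
y = [0.20, 0.55, 0.80, 0.85, 1.00, 1.00, 1.00]

# Predict every observation with one summary of the same distribution.
mean_pred = [statistics.mean(y)] * len(y)
median_pred = [statistics.median(y)] * len(y)

print("MAE  mean:", mae(y, mean_pred), " median:", mae(y, median_pred))
print("MSE  mean:", mse(y, mean_pred), " median:", mse(y, median_pred))
```

Because skewed, bounded utilities pull the mean away from the median, a point estimator that does well on one criterion need not do well on the other.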

Conclusion: Bayesian beta regression is a promising approach to modeling continuous, bounded interval measures such as those used in crosswalk functions. The posterior median and mean compare favorably with CLAD, OLS, and Tobit as point estimates for use in crosswalk functions, while the posterior predictive distribution can reflect both first- and second-order individual uncertainty, a feature useful in microsimulation models.
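As a sketch of how the posterior predictive distribution might feed a microsimulation, the Python example below draws one simulated utility per posterior parameter draw. The spread across parameter draws carries second-order (parameter) uncertainty, and sampling the category and the beta outcome carries first-order (individual) variability. It assumes a two-part parameterization chosen for illustration (ordered-logit first part; beta second part with a logit link for the mean and a log link for the precision). The posterior draws are synthetic stand-ins for MCMC output, and the names draw_utility, posterior_predictive, delta, beta, gamma, and cuts are hypothetical.

```python
import math
import random


def inv_logit(t):
    return 1.0 / (1.0 + math.exp(-t))


def linear_index(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))


def draw_utility(x, draw, rng, lower=0.0, upper=1.0):
    """One predictive draw for covariates x under one (hypothetical) posterior
    parameter draw: a dict with keys 'delta', 'beta', 'gamma', 'cuts'."""
    c1, c2 = draw["cuts"]
    eta = linear_index(x, draw["delta"])
    u = rng.random()
    if u < inv_logit(c1 - eta):
        return lower                     # floor category
    if u >= inv_logit(c2 - eta):
        return upper                     # ceiling category
    mu = inv_logit(linear_index(x, draw["beta"]))       # logit mean link
    phi = math.exp(linear_index(x, draw["gamma"]))      # log precision link
    z = rng.betavariate(mu * phi, (1.0 - mu) * phi)     # interior value
    return lower + z * (upper - lower)


def posterior_predictive(x, draws, rng):
    """One simulated utility per posterior draw."""
    return [draw_utility(x, d, rng) for d in draws]


# Synthetic "posterior draws" for an intercept-only model (illustrative only).
rng = random.Random(2010)
draws = [{"delta": [rng.gauss(0.5, 0.1)],
          "beta": [rng.gauss(1.0, 0.1)],
          "gamma": [rng.gauss(1.5, 0.1)],
          "cuts": (-2.0, 1.0)} for _ in range(2000)]
sims = posterior_predictive([1.0], draws, rng)
```

In a microsimulation, each simulated individual would receive draws like these rather than a single point estimate, so the simulated utilities keep the point masses at the bounds as well as the spread within them.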