17CSG IDENTIFYING SIGNIFICANT PREDICTORS OF HIGH HEALTHCARE COSTS IN DIABETES USING A DECISION TREE APPROACH

Wednesday, October 22, 2008
Columbus A-C (Hyatt Regency Penns Landing)
Michael B. Nichol, PhD (1), Tara K. Knight, PhD (1), Joanne Wu, MS (1), Jack Mahoney, MD (2), and Christine Berman, PhD (2); (1) University of Southern California, Los Angeles, CA; (2) Pitney Bowes, Stamford, CT
Purpose: To identify significant factors associated with future high healthcare costs in a large U.S. employer's diabetes population using a decision tree model.
Method: Employees and dependents were included in the analysis if they had a diagnosis code of 250.x or any antidiabetic prescription, together with two years of continuous health coverage during the 2004-2006 study timeframe. Using medical and pharmaceutical claims data, a decision tree model was developed that used demographic, clinical, and health care utilization data from previous years to predict high healthcare cost (≥$10,000) in 2005 and 2006. Three models were constructed: one using 2004 data to predict 2005 (2004-2005), and likewise 2005-2006 and 2004-2006. For each of the three model years, the data were partitioned into training (70%) and validation (30%) samples. Reduction in the Gini index (GI) was used to evaluate candidate splitting rules and to select the best split at each tree node. Relative variable importance was computed by SAS Enterprise Miner 5.2. Significant variables were determined by variable importance and compared across model years.
Results: Twenty-two percent of diabetes patients were classified as high cost. The important variables for the 2004-2005 model included total healthcare cost in 2004 (importance=1.00), number of office visits (importance=0.20), number of drugs taken (importance=0.15), business unit (importance=0.14), years of service (importance=0.12), geographic region (importance=0.12), and peripheral valvular disorders (importance=0.10). The important variables for the 2005-2006 model included total healthcare costs in 2005 (importance=1.20), number of drugs taken (importance=1.00), and Elixhauser comorbidities (importance=0.96).
The important variables for the 2004-2006 model included total healthcare cost in 2005 (importance=1.00), number of drugs taken (importance=0.18), total healthcare cost in 2004 (importance=0.15), compliance with any antidiabetic medication (importance=0.33), and hypothyroidism (importance=0.08). In the validation samples across model years, sensitivity ranged from 70% (2004-2006) to 83% (2005-2006), specificity ranged from 65% (2005-2006) to 79% (2004-2005), and the misclassification rate ranged from 22% (2004-2005) to 31% (2005-2006).
Conclusion: Although some important predictors varied across model years, an increase in the number of drugs taken and total healthcare costs in prior years were consistently identified as important variables for predicting which diabetes patients will have high future healthcare costs. These models demonstrated good predictive accuracy and could be used to create a risk scoring system to target patients who are at risk of high costs and who would benefit most from disease management or patient education programs.
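
The Gini-based split criterion and the validation measures reported in this abstract can be illustrated with a minimal Python sketch. It is not the authors' SAS Enterprise Miner workflow. The function names and the toy labels are hypothetical, chosen only to show how the reduction in Gini index for one candidate split, and the sensitivity, specificity, and misclassification rate of a set of predictions, are computed.

```python
from collections import Counter


def gini(labels):
    """Gini index (impurity) of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_reduction(parent, left, right):
    """Drop in Gini index when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


def validation_metrics(actual, predicted):
    """Sensitivity, specificity, and misclassification rate for 0/1 labels.

    Assumes both classes occur in `actual`, so no denominator is zero.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "misclassification": (fp + fn) / len(pairs),
    }


# Hypothetical toy data, not study data: 1 = high cost (>= $10,000), 0 = otherwise.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left = [1, 1, 1, 0]   # e.g. patients above some prior-year cost threshold
right = [1, 0, 0, 0]  # e.g. patients at or below that threshold
reduction = gini_reduction(parent, left, right)

actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = validation_metrics(actual, predicted)

print(reduction)
print(metrics)
```

In this toy example the split lowers the Gini index from 0.5 to 0.375, a reduction of 0.125. A tree-growing procedure would compare such reductions across candidate splits and keep the largest one at each node.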