Monday, October 22, 2007
P2-21

ON DATA QUALITY AND RISK IN GUIDELINE BASED CLINICAL DECISION SUPPORT SYSTEMS

Sharique Hasan, MS and Rema Padman, PhD. Carnegie Mellon University, Pittsburgh, PA

Purpose: The poor quality of data in electronic medical records and databases poses a risk that these systems may introduce unintended errors into the medical decision-making process. Appropriately assessing the magnitude of the risk posed by data quality is an important but difficult problem because the nature of this risk depends on several complex and interrelated factors.

Methods: To analyze the extent of this problem, we propose a novel probabilistic framework that explicitly models a decision-maker's beliefs in the quality of the underlying data, the nature and distribution of this data, the types of errors introduced into the data, and how the data is processed by clinical decision support systems to produce guidance. Thus, we model how beliefs about data quality affect the accuracy of the statements that make up a clinical guideline. In particular, using the structure of the 'Prevention of Breast Cancer' guideline as our template, we model how the guideline executes each of these statements and how likely each set of statements is, given the underlying population of patients. Integrating this information with our empirical estimates of the model parameters, instantiated using national health and demographic data, we calculate the risk of incorrect medical decisions given beliefs about the quality of the underlying data. Furthermore, we provide measures for understanding the cumulative effect of all data elements on risk, and the marginal effect of each data element, computed as the partial derivative of the total accuracy function with respect to that data element.
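
To make the marginal analysis concrete, the following is a minimal Python sketch of one way such a calculation could be set up. It is not the authors' model. It assumes, purely for illustration, that a guideline path is executed correctly only when every data element it reads is recorded correctly, and that errors in different elements are independent. The path list, the path probabilities, the element names, and the function names are all hypothetical and do not come from the abstract.

```python
from math import prod

# Hypothetical toy guideline. Each path lists the data elements it reads and an
# assumed share of patients who follow it. These values are illustrative only.
PATHS = [
    {"prob": 0.5, "elements": ["age"]},
    {"prob": 0.3, "elements": ["age", "family_history"]},
    {"prob": 0.2, "elements": ["age", "family_history", "last_mammogram_date"]},
]


def guideline_accuracy(quality):
    """Expected accuracy under the stated assumptions.

    quality maps each element name to the believed probability that its
    recorded value is correct. A path counts as correct only if all of the
    elements it reads are correct (independence assumed).
    """
    return sum(
        path["prob"] * prod(quality[e] for e in path["elements"])
        for path in PATHS
    )


def marginal_effect(quality, element):
    """Partial derivative of guideline_accuracy with respect to one element.

    Accuracy here is a sum of products, so the derivative is the same sum
    restricted to paths that read the element, with that factor dropped.
    """
    return sum(
        path["prob"] * prod(quality[e] for e in path["elements"] if e != element)
        for path in PATHS
        if element in path["elements"]
    )


def rank_elements(quality):
    """Elements sorted by decreasing marginal effect on accuracy."""
    return sorted(quality, key=lambda e: marginal_effect(quality, e), reverse=True)


if __name__ == "__main__":
    beliefs = {"age": 0.9, "family_history": 0.9, "last_mammogram_date": 0.9}
    print(guideline_accuracy(beliefs))
    for name in rank_elements(beliefs):
        print(name, marginal_effect(beliefs, name))
```

In this sketch the ranking is driven by how many paths read an element and how likely those paths are, so an element consulted on every path dominates. A model with a different error structure or different path probabilities could rank the elements differently.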

Results: Our example guideline has six data elements, fourteen affirmed and negated statements, and eight paths. It incorporates many elements present in more complex guidelines, using both binary and numeric data, as well as hierarchical data-statement relationships. Application of our framework to this guideline generates a linear relationship between guideline accuracy and data quality. The marginal analysis ranks the six data elements in decreasing order of their value in determining guideline accuracy, which yields a counterintuitively high ranking for the temporal variables.

Conclusions: Our framework gives the decision-maker the ability to assess how uncertainty about data quality translates into the risk of negative medical consequences and to determine which data elements are most critical for minimizing this risk. These results can inform efficient data-quality improvement and risk-minimization strategies.