Purpose: The lack of conclusive information about true disease status (i.e., a gold standard) makes evaluating the operating characteristics of a new diagnostic test problematic. Using an imperfect reference standard to estimate ROC curves can bias estimates of diagnostic accuracy, and hence of the clinical value of the test under evaluation. We sought to evaluate the extent and direction of this bias through simulation.
Method: We simulated values of a continuously scaled reference standard and a new diagnostic test from multivariate normal distributions for diseased and non-diseased individuals, with means differing by disease status. The new test values were simulated over a range of correlations with the reference standard, from -0.3 to 0.5 (conditional on true disease status). The mean for non-diseased patients was fixed at 0; for diseased patients, the mean was set to 3 for the imperfect reference standard and 2 for the new test. Data from diseased and non-diseased patients were combined, and the simulated reference standard values were used to classify disease status when constructing ROC curves for the new test, allowing us to quantify the bias introduced both by error in the reference standard and by correlation between the test and the reference.
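A minimal sketch of this design, under assumptions not stated in the abstract: balanced diseased/non-diseased groups, a common standard deviation of 2 (chosen so the true AUC comes out near the reported 0.76), and a hypothetical cutoff of 1.5 for dichotomizing the continuous reference standard into an observed disease label. The helper names (`auc`, `simulate`, `ref_cut`) are illustrative, not from the original study.

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank statistic."""
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    ranks = scores.argsort().argsort() + 1  # ranks 1..n (ties negligible for continuous data)
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def simulate(rho, n=100_000, sd=2.0, ref_cut=1.5):
    """Draw (reference, test) pairs conditional on true disease status.

    sd=2.0 is an assumption chosen so the true AUC is ~0.76 as reported;
    ref_cut is a hypothetical threshold for dichotomizing the reference.
    Returns (AUC vs. true status, AUC vs. imperfect reference status).
    """
    cov = [[sd**2, rho * sd**2], [rho * sd**2, sd**2]]
    nd = rng.multivariate_normal([0.0, 0.0], cov, n)  # non-diseased: both means 0
    d = rng.multivariate_normal([3.0, 2.0], cov, n)   # diseased: ref mean 3, test mean 2
    ref = np.r_[nd[:, 0], d[:, 0]]
    test = np.r_[nd[:, 1], d[:, 1]]
    truth = np.r_[np.zeros(n, dtype=int), np.ones(n, dtype=int)]
    observed = (ref > ref_cut).astype(int)            # imperfect "disease status"
    return auc(test, truth), auc(test, observed)

for rho in (-0.3, 0.0, 0.3, 0.5):
    true_auc, biased_auc = simulate(rho)
    print(f"rho={rho:+.1f}: AUC vs truth={true_auc:.3f}, AUC vs imperfect ref={biased_auc:.3f}")
```

With these assumptions the sweep over rho shows the same qualitative pattern described below: the apparent AUC falls below the true value for negative and weakly positive correlations and rises above it only as the test-reference correlation grows.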
Result: The true area under the ROC curve (i.e., assuming a perfect gold standard) was 0.76. The estimated area was lowest (0.69) at a correlation of -0.3. For correlations less than 0.3, use of the imperfect reference standard to determine disease status biased the area under the ROC curve downward; only at a correlation of 0.5 was it biased upward (0.80).
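The reported true AUC of 0.76 is consistent with the binormal closed form AUC = Phi((mu_D - mu_ND) / sqrt(sd_ND^2 + sd_D^2)), given the stated test means of 0 and 2 and assuming a common standard deviation of 2 (the abstract does not state the standard deviations; this value is inferred):

```python
from math import erf, sqrt

def binormal_auc(mu_nd, mu_d, sd_nd, sd_d):
    """Closed-form AUC for normally distributed scores (binormal model)."""
    z = (mu_d - mu_nd) / sqrt(sd_nd**2 + sd_d**2)
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

# Test means 0 (non-diseased) and 2 (diseased); SD=2 is an assumption.
print(round(binormal_auc(0.0, 2.0, 2.0, 2.0), 2))  # -> 0.76
```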
Conclusion: In these simulations, an upward bias was observed only when the diagnostic test was at least moderately correlated with the imperfect reference standard. Under conditional independence or negative correlation, using an imperfect reference underestimated the diagnostic accuracy of the test. If the diagnostic accuracy of the reference standard is known and normality assumptions hold, estimates of the correlation between the reference and the test under evaluation may help gauge the extent of the bias introduced into estimates of diagnostic accuracy.
Presented at the 32nd Annual Meeting of the Society for Medical Decision Making.