Monday, 24 October 2005

EVALUATION OF ALGORITHMS TO IDENTIFY BREAST CANCER CASES IN MEDICARE CLAIMS DATA

Heather Taffet Gold, PhD and Huong T. Do, MA. Weill Cornell Medical College, New York, NY

Purpose. To test the generalizability of published algorithms designed to identify incident breast cancer cases in Medicare claims data for conducting analyses of healthcare service utilization.

Methods/Data. We use the Surveillance, Epidemiology, and End Results (SEER) registry data linked with Medicare physician, hospital, and outpatient claims data for breast cancer cases in 1998 and a 5% control sample of Medicare beneficiaries in SEER areas (n≈70,000; 13% cases). SEER is the gold standard for case identification. Each algorithm uses a different combination of diagnosis and procedure codes to classify cases. We apply three algorithms to our data and evaluate the sensitivity and specificity of each compared with the results reported from the earlier data used for algorithm development. We compare algorithm performance by age, stage, race, and SEER region, and test via logistic regression whether adding demographic variables to the algorithm improves the area under the receiver operating characteristic (ROC) curve.
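The core evaluation step described above — scoring a claims-based case-finding rule against the SEER gold standard — can be sketched as follows. The data here are invented toy labels, not the study's SEER-Medicare records, and `sens_spec` is a hypothetical helper name:

```python
# Sketch (hypothetical data): evaluating a claims-based case-finding
# algorithm against a gold-standard case list such as SEER.
def sens_spec(gold, flagged):
    """Sensitivity and specificity of algorithm flags vs. gold-standard labels.

    gold, flagged: parallel sequences of 0/1 indicators per beneficiary.
    """
    tp = sum(1 for g, f in zip(gold, flagged) if g and f)
    tn = sum(1 for g, f in zip(gold, flagged) if not g and not f)
    fp = sum(1 for g, f in zip(gold, flagged) if not g and f)
    fn = sum(1 for g, f in zip(gold, flagged) if g and not f)
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: 4 true cases, 6 controls; the algorithm misses one case
# (a false negative) and falsely flags one control (a false positive).
gold    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
flagged = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
sens, spec = sens_spec(gold, flagged)
print(round(sens, 2), round(spec, 2))  # 0.75 0.83
```

In practice these rates would be computed within strata (age group, stage, race, SEER registry) to reproduce the subgroup comparisons described above.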

Results. Sensitivity of two of the three algorithms applied to the 1998 data is significantly lower than that obtained by the algorithm developers. In addition, sensitivity decreases as age increases (range for women 80+: 51.4-74.4%). Sensitivity is lower for cases with in situ or metastatic disease than for Stage 1 or 2 disease. There is also substantial variation by SEER registry, but differences by race are not significant. Overall specificity is similar to the reported values. Adding age, region, and race variables to the algorithm significantly improves the ROC area (p<0.0001).
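The ROC-area comparison reported above can be illustrated with a rank-based AUC, which equals the probability that a randomly chosen case scores higher than a randomly chosen control (the Mann-Whitney interpretation). The scores and labels below are hypothetical, not the study's fitted logistic-regression output:

```python
# Sketch: rank-based ROC AUC, the quantity compared when demographic
# covariates are added to the claims algorithm. Data are hypothetical.
def roc_auc(labels, scores):
    """AUC = P(random case scores above random control), ties counted 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.8, 0.6, 0.4, 0.5, 0.3, 0.2]   # e.g. predicted case probabilities
print(roc_auc(labels, scores))  # ≈ 0.889
```

An AUC of 0.5 corresponds to a useless classifier and 1.0 to perfect separation; the study's significance test compares AUC with and without the demographic terms.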

Algorithm Source                 Sensitivity (%)          Specificity (%)
                                 Reported     1998        Reported    1998
Nattinger, HSR (1994-5 data)     80.11-80.26  77.4*       99.95       99.9*
Warren, MedCare (1992 data)      62.0         73.7*       99.9        99.7*
Freeman, JClinEpi (1992 data)    90.0         59.0*       99.86       100*

*p<0.0001 for equality of rates
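A test of "equality of rates" like the one footnoted above can be carried out as a pooled two-proportion z-test. The counts below are invented for illustration (the abstract reports only the rates, not the underlying denominators):

```python
# Sketch: two-proportion z-test for H0: p1 == p2, of the kind that could
# underlie the starred comparisons. Counts are hypothetical.
from math import sqrt, erf

def two_prop_z(x1, n1, x2, n2):
    """Z statistic and two-sided p-value, pooled-variance normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))     # pooled standard error
    z = (p1 - p2) / se
    pval = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided tail
    return z, pval

# e.g. 90% sensitivity among 1,000 development-era cases vs. 59% among
# 9,000 cases in newer data:
z, p = two_prop_z(900, 1000, 5310, 9000)
```

With samples of this size, a 31-point drop in sensitivity yields p far below 0.0001, consistent with the footnote.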

Conclusions. Algorithm sensitivity is lower in the 1998 data, indicating that published algorithms may need to be updated to reflect changing patient characteristics or patterns of care. Differential sensitivity by SEER region likely reflects geographic variation in practice patterns. Depending on the algorithm, 3-5% of subjects are misclassified in 1998, with false negatives highest for Freeman's algorithm and lowest for Nattinger's method. Misclassification disproportionately affects older women and those diagnosed with in situ, metastatic, or unknown-stage disease. Because of misclassification bias, these algorithms should be applied cautiously when insurance claims databases outside SEER-Medicare populations are used to assess healthcare utilization and costs of breast cancer care.


Poster Session III
The 27th Annual Meeting of the Society for Medical Decision Making (October 21-24, 2005)