L-5 IMPROVING BIOPSY RECOMMENDATIONS FOLLOWING MAMMOGRAPHY USING RANDOM FORESTS

Tuesday, October 22, 2013: 2:30 PM
Key Ballroom 7,9,10 (Hilton Baltimore)
Quantitative Methods and Theoretical Developments (MET)
Candidate for the Lee B. Lusted Student Prize Competition

Joseph F. Levy (1), David J. Vanness, PhD (2), Yirong Wu, PhD (1), and Elizabeth S. Burnside, MD, MPH, MS (1); (1) University of Wisconsin-Madison, Madison, WI; (2) Department of Population Health Sciences, Madison, WI

Purpose: To optimize recommendations for biopsy after mammography using random forests and maximum expected utility.   

Methods: We used a dataset of 62,219 mammographic findings matched with cancer registry data to construct a random forest estimating the probability that each finding is malignant (positive). A random forest is an ensemble of classification trees, each constructed from randomly resampled data and randomly selected subsets of predictor variables, and tuned to improve out-of-sample prediction. We used patient demographic risk factors, radiologist-observed standardized descriptors from the Breast Imaging-Reporting and Data System (BI-RADS) lexicon, and radiologist subjective opinion (BI-RADS category 0-5, indicating increasing likelihood of malignancy), together with the eventual outcome (benign/malignant) of each finding, to recursively partition the data into groups with different probabilities of malignancy. We applied previously reported estimates of the utilities associated with false positives, true positives and false negatives (relative to a true negative) to calculate the expected utility at different probability thresholds, and selected the threshold that maximized expected utility to define the “optimal” random forest.
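
The threshold selection described above can be illustrated with a minimal sketch. It assumes biopsy is recommended whenever the predicted probability is at or above the threshold. The utility values, the toy probabilities and the function names are hypothetical placeholders chosen for illustration; they are not the previously reported estimates or the study data.

```python
# Hypothetical utilities relative to a true negative (u_tn = 0).
# Placeholder values for illustration only, not the study's estimates.
UTILITIES = {"u_tp": -0.1, "u_fp": -0.01, "u_fn": -1.0, "u_tn": 0.0}


def expected_utility(probs, labels, threshold, u_tp, u_fp, u_fn, u_tn=0.0):
    """Mean utility per finding when biopsy is recommended for probs >= threshold."""
    total = 0.0
    for p, y in zip(probs, labels):
        recommend = p >= threshold
        if recommend and y == 1:
            total += u_tp   # malignant finding sent to biopsy
        elif recommend:
            total += u_fp   # benign finding sent to biopsy
        elif y == 1:
            total += u_fn   # malignant finding missed
        else:
            total += u_tn   # benign finding correctly left alone
    return total / len(labels)


def best_threshold(probs, labels, candidates, **utilities):
    """Return the candidate threshold with the highest expected utility."""
    return max(
        candidates,
        key=lambda t: expected_utility(probs, labels, t, **utilities),
    )


# Toy predicted malignancy probabilities and outcomes (1 = malignant).
probs = [0.001, 0.003, 0.01, 0.05, 0.30, 0.80]
labels = [0, 0, 1, 0, 1, 1]
candidates = [0.002, 0.005, 0.02, 0.1, 0.5]

chosen = best_threshold(probs, labels, candidates, **UTILITIES)
print(chosen)
```

Because a missed malignancy is assumed here to be far more costly than an unnecessary biopsy, the sketch favors a low threshold, which is the same qualitative behavior the expected-utility criterion is meant to capture.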

Results: ROC curves were constructed from the BI-RADS categories assigned by the radiologists and from the malignancy probabilities predicted by the random forest (Figure 1). The radiologists' operating point is conventionally taken to be BI-RADS category 3, which corresponds to the threshold above which biopsy would be recommended (approximately 2% likelihood of malignancy). The random forest improved overall AUC (0.948 vs. 0.935). At the 2% classification threshold, the forest also improved sensitivity (85.4% vs. 85.3%) and specificity (97.55% vs. 88.1%) relative to the radiologists. When considering maximum expected utility, the optimal threshold of predicted malignancy for the forest was 0.4%, which changed sensitivity and specificity to 88.6% and 96.3%, respectively.
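
For readers less familiar with these measures, the following sketch shows how sensitivity, specificity and AUC can be computed from predicted probabilities and known outcomes. The scores and labels are toy values, the function names are hypothetical, and the AUC is computed with the pairwise (Mann-Whitney) formulation; none of this is taken from the study's own code or data.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Probability that a random malignant finding outscores a random benign one.

    Ties between a malignant and a benign score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy predicted malignancy probabilities and outcomes (1 = malignant).
scores = [0.001, 0.003, 0.01, 0.05, 0.30, 0.80]
labels = [0, 0, 1, 0, 1, 1]

sens, spec = sensitivity_specificity(scores, labels, threshold=0.02)
area = auc(scores, labels)
print(sens, spec, area)
```

Lowering the threshold in this sketch can only keep sensitivity the same or raise it, and can only keep specificity the same or lower it, which matches the direction of the change reported between the 2% and 0.4% thresholds.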

Conclusion: Random forests have the potential to improve the accuracy of biopsy recommendations over standard practice. When the relative consequences of true and false positives and negatives are taken into account, the threshold for recommending biopsy using a random forest differs from the conventional threshold used by radiologists.