D-3 DOUBLE READING OF MAMMOGRAMS: EFFECTIVELY PAIRING READERS WITH DIVERSE SKILLS TO IMPROVE PERFORMANCE

Monday, October 21, 2013: 3:00 PM
Key Ballroom 8, 11, 12 (Hilton Baltimore)
Health Services and Policy Research (HSP)
Candidate for the Lee B. Lusted Student Prize Competition

Peter Ayton, PhD, Marwa Gadala, MASc, Lorenzo Strigini, M.Eng and Andrey Povyakalo, PhD, City University London, London, United Kingdom
Purpose: Double reading is standard practice in breast cancer screening programs in at least 12 countries. We retrospectively investigated whether its benefits can be increased by forming complementary reader pairs according to indicators of ability, as recommended in published guidelines.

Method: We used data from an independent UK clinical trial in which 50 readers each read 180 mammograms (60 with cancer and 120 normal). We selected four groups of complementary reader pairs, in which a member of Group A, expected to be more effective, is paired with a member of Group B, expected to be less effective. The complementary AB groups are: (1) high and low experience (recommended by the UK NHS), (2) high and low specificity, (3) high and low sensitivity, and (4) high sensitivity and low specificity readers. For each group, all possible AB double-reading pairs were simulated using the OR recall rule, under which a case is recalled if either reader recalls it. We compared the sensitivities and specificities of these complementary pairs first with those of homogeneous (AA, BB) pairs, and then with each other. Statistical significance was assessed using Welch's t-test and 95% confidence intervals. To weigh sensitivity and specificity against each other, we compared ROC curves, Youden's indices, and positive likelihood ratios.
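The pairing simulation described above can be sketched in Python as follows. This is a minimal illustration under assumptions the abstract does not state: each reader's decisions are held as a list of 0/1 recall flags, one per case, in a hypothetical `readings` dictionary keyed by reader ID, and all helper names are hypothetical. The sketch forms every AB pair and every within-group (AA, BB) pair, applies the OR rule, and computes Welch's t statistic with Welch-Satterthwaite degrees of freedom (it stops short of a p-value to stay within the standard library).

```python
import math
from itertools import combinations, product
from statistics import mean, variance

def pair_recalls(r1, r2):
    """OR rule: a case is recalled if either reader recalls it."""
    return [bool(a or b) for a, b in zip(r1, r2)]

def sens_spec(recalls, truth):
    """Sensitivity and specificity of recall flags against ground truth (1 = cancer)."""
    tp = sum(1 for rec, t in zip(recalls, truth) if rec and t)
    tn = sum(1 for rec, t in zip(recalls, truth) if not rec and not t)
    n_cancer = sum(1 for t in truth if t)
    n_normal = len(truth) - n_cancer
    return tp / n_cancer, tn / n_normal

def complementary_pairs(group_a, group_b):
    """Every AB pair: one reader from Group A, one from Group B."""
    return list(product(group_a, group_b))

def homogeneous_pairs(group_a, group_b):
    """Every AA pair plus every BB pair."""
    return list(combinations(group_a, 2)) + list(combinations(group_b, 2))

def pair_performance(pairs, readings, truth):
    """(sensitivity, specificity) of each reader pair under the OR rule."""
    return [sens_spec(pair_recalls(readings[i], readings[j]), truth) for i, j in pairs]

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Hypothetical toy data: 2 cancers followed by 2 normals, 4 readers.
truth = [1, 1, 0, 0]
readings = {
    "a1": [1, 0, 0, 0], "a2": [1, 1, 1, 0],
    "b1": [0, 0, 0, 0], "b2": [0, 1, 0, 0],
}
ab = pair_performance(complementary_pairs(["a1", "a2"], ["b1", "b2"]), readings, truth)
aa_bb = pair_performance(homogeneous_pairs(["a1", "a2"], ["b1", "b2"]), readings, truth)
print(mean(s for s, _ in ab), mean(s for s, _ in aa_bb))  # mean pair sensitivity: AB vs AA/BB
```

With this toy data the AB pairs reach a mean sensitivity of 0.875 against 0.75 for the homogeneous pairs. The sketch treats pair-level values as independent observations, although pairs that share a reader are correlated.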

Result: Compared with homogeneous pairings, grouping according to sensitivity and according to specificity significantly increased sensitivity, by 3.5% (p=0.0009) and 1.7% (p=0.037) respectively, with no significant effect on specificity. Pairing high-sensitivity with low-specificity readers produced a sensitivity of 0.918 (95% CI: 0.913, 0.924), significantly higher than all other groups. Grouping according to experience produced a sensitivity of 0.852 (95% CI: 0.844, 0.859), significantly lower than all other groups, but also the highest specificity, 0.722 (95% CI: 0.707, 0.737), significantly higher than all other groups. Bootstrap ROC comparison of the partial AUC (pAUC, sensitivity > 0.8) shows that the most extreme readers of Group (1) have the highest performance, although the difference is not significant. However, Youden's indices and positive likelihood ratios show significant differences between the complementary groups, with Group (1) again the highest, followed by Group (3).
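Youden's index and the positive likelihood ratio each combine a sensitivity and a specificity into a single figure: J = sensitivity + specificity - 1, and LR+ = sensitivity / (1 - specificity). The Python sketch below uses hypothetical function names and applies them to the Group (1) point estimates reported above (sensitivity 0.852, specificity 0.722). It only illustrates the arithmetic on pooled point estimates; it is not a reproduction of the study's pair-level analysis.

```python
import math

def youden_index(sensitivity, specificity):
    """Youden's J: sensitivity + specificity - 1 (0 = no better than chance, 1 = perfect)."""
    return sensitivity + specificity - 1

def positive_likelihood_ratio(sensitivity, specificity):
    """LR+: recall rate among cancers divided by recall rate among normals."""
    false_positive_rate = 1 - specificity
    if false_positive_rate == 0:
        return math.inf  # perfect specificity: LR+ is unbounded
    return sensitivity / false_positive_rate

# Group (1) point estimates from the Result paragraph.
sens, spec = 0.852, 0.722
print(round(youden_index(sens, spec), 3))               # 0.574
print(round(positive_likelihood_ratio(sens, spec), 2))  # 3.06
```

Youden's index weights a point of sensitivity and a point of specificity equally, whereas LR+ rewards specificity more steeply as the false-positive rate approaches zero, which is why the two measures can rank groups differently.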

Conclusion: Compared with homogeneous pairings, some forms of pairing by complementary ability can significantly improve sensitivity with no significant effect on specificity. These preliminary results suggest that pairing by sensitivity yields the best clinical performance and should be investigated further. Pairing readers simply according to convenience could be significantly less effective.