Candidate for the Lee B. Lusted Student Prize Competition
Method: We used data from an independent UK clinical trial in which 50 readers each read 180 mammograms (60 with cancer and 120 normal). We selected four groups of complementary reader pairs, in each of which a member of Group A, expected to be more effective, is paired with a member of Group B, expected to be less effective. The complementary AB groups are: (1) high and low experience (recommended by the UK NHS), (2) high and low specificity, (3) high and low sensitivity, and (4) high sensitivity and low specificity readers. For each group, all possible AB double-reading pairs were simulated using the OR recall rule, under which a case is recalled if either reader recalls it. We compared the sensitivities and specificities of these complementary pairs first to those of homogeneous (AA, BB) pairs, and then to each other. Statistical significance was determined using Welch’s t-test and 95% confidence intervals. To weigh sensitivity and specificity benefits against each other, ROC curves, Youden’s indices, and positive likelihood ratios were compared.
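The pairing simulation and the Welch comparison described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names, the 0/1 encoding of recall decisions, and the toy data are all assumptions, and only the Welch t statistic and its degrees of freedom are computed (no p-value).

```python
from itertools import product
from math import sqrt
from statistics import mean, variance


def pair_performance(recalls_a, recalls_b, truth):
    """Sensitivity and specificity of one reader pair under the OR recall rule.

    recalls_a, recalls_b: per-case recall decisions (1 = recall, 0 = no recall).
    truth: per-case ground truth (1 = cancer, 0 = normal).
    All three encodings are assumptions made for this sketch.
    """
    # OR rule: the pair recalls a case if either reader recalls it.
    paired = [a or b for a, b in zip(recalls_a, recalls_b)]
    true_pos = sum(1 for r, t in zip(paired, truth) if r and t)
    true_neg = sum(1 for r, t in zip(paired, truth) if not r and not t)
    n_pos = sum(1 for t in truth if t)
    n_neg = len(truth) - n_pos
    return true_pos / n_pos, true_neg / n_neg


def all_ab_pairs(group_a, group_b, truth):
    """Performance of every possible AB pair (one reader from each group)."""
    return [pair_performance(a, b, truth) for a, b in product(group_a, group_b)]


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Toy example (made-up data): four cases, two with cancer.
truth = [1, 1, 0, 0]
reader_a = [1, 0, 0, 0]
reader_b = [0, 1, 1, 0]
sens, spec = pair_performance(reader_a, reader_b, truth)
```

In this toy case each reader alone detects one of the two cancers, while the pair detects both, at the cost of one false recall contributed by the second reader.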
Result: Grouping according to sensitivity and according to specificity significantly increased sensitivity by 3.5% (p=0.0009) and 1.7% (p=0.037) respectively, compared to homogeneous pairings, with no significant effects on specificity. Grouping high sensitivity and low specificity readers produced a sensitivity of 0.918 (95% CI: 0.913, 0.924), significantly higher than all other groups. Grouping according to experience produced a sensitivity of 0.852 (95% CI: 0.844, 0.859), significantly lower than all other groups, but also a specificity of 0.722 (95% CI: 0.707, 0.737), significantly higher than all other groups. The bootstrap method for ROC comparisons, applied to pAUC (sensitivity >0.8), showed that Group (1)’s most extreme readers had the highest performance, although the difference was not significant. However, Youden’s indices and positive likelihood ratios showed significant differences between the complementary groups, with Group (1) still being the highest, followed by Group (3).
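For reference, the two summary measures used above follow directly from a sensitivity and specificity pair. The sketch below applies the standard definitions to the point estimates reported for the experience-based group; the function names are illustrative, and the resulting values are derived here from those point estimates rather than taken from the study.

```python
def youden_index(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


# Point estimates reported above for grouping by experience.
j = youden_index(0.852, 0.722)
lr_plus = positive_likelihood_ratio(0.852, 0.722)
print(round(j, 3), round(lr_plus, 2))  # prints: 0.574 3.06
```

Both measures reward specificity as well as sensitivity, which is why a group with lower sensitivity but higher specificity can still rank highest on them.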
Conclusion: Some forms of pairing by complementary ability levels can significantly improve sensitivity, with no significant effect on specificity, compared to homogeneous pairings. These preliminary results suggest that pairing by sensitivity yields the best clinical performance and should be further investigated. Pairing readers simply according to convenience could be significantly less effective.