During a ten-month period, 511 patients referred to the melanoma unit of the Department of Dermatology were examined by an expert dermatologist and by five other physicians with various levels of training. The latter group had a neural network-based decision support system at their disposal, which they could use at their discretion. Every patient was examined by the expert dermatologist and by two of the other, computer-supported physicians. For each lesion examined, the system gave a probability assessment of malignancy and a dichotomous suggestion whether or not to excise the lesion. The main study questions were: Is there a difference in benefit between more and less experienced physicians? What is the diagnostic performance of these physicians compared to an expert who does not use the system?
After removal of incomplete data, there were 31 patients with melanomas and 414 patients without melanomas in the study, for a total of 31 melanomas and 3244 benign lesions. The non-expert physicians missed 3 of 31 melanomas altogether, and on the lesions where they did consult the computer, they achieved an overall sensitivity of 84%. The overall specificity of all five physicians was 80%. The computer system by itself achieved a sensitivity of 75% and a specificity of 96%. The expert dermatologist, without computer support, achieved 96% sensitivity and 66% specificity. When the physicians were stratified according to experience, the less experienced physicians obtained 81% sensitivity and 79% specificity, and the more experienced physicians 86% sensitivity and 85% specificity.
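For readers less familiar with the two measures reported above, the following minimal Python sketch shows how sensitivity and specificity are conventionally computed from counts of correct and incorrect decisions. The function names and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of malignant lesions that were correctly flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of benign lesions that were correctly left alone."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, NOT taken from the study.
example_sensitivity = sensitivity(true_pos=8, false_neg=2)
example_specificity = specificity(true_neg=90, false_pos=10)

print(f"sensitivity = {example_sensitivity:.2f}")
print(f"specificity = {example_specificity:.2f}")
```

Sensitivity depends only on the malignant lesions and specificity only on the benign ones, which is why the two values can move in opposite directions for the same examiner.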
Conclusion: There is no significant difference between the sensitivities of experienced and inexperienced physicians; the difference in specificities between these two groups is of borderline significance (p = 0.05). There is, however, a trade-off, and a resulting large discrepancy, between the computer users and the expert dermatologist in both sensitivity (p < 0.1) and specificity (p < 0.00001). This difference can be explained by different decision thresholds in the two groups.
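The role of the decision threshold can be illustrated with a small Python sketch. It assumes that each lesion receives a malignancy score between 0 and 1 and that excision is recommended when the score reaches the threshold. The scores, labels, thresholds, and the helper `rates_at_threshold` are hypothetical and serve only to show the direction of the trade-off; they do not reproduce the study's system or data.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (sensitivity, specificity) when scores >= threshold count as 'excise'.

    labels: 1 = melanoma, 0 = benign lesion.
    """
    pairs = list(zip(scores, labels))
    true_pos = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    false_neg = sum(1 for s, y in pairs if y == 1 and s < threshold)
    true_neg = sum(1 for s, y in pairs if y == 0 and s < threshold)
    false_pos = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return (true_pos / (true_pos + false_neg),
            true_neg / (true_neg + false_pos))


# Hypothetical malignancy scores: four melanomas followed by six benign lesions.
scores = [0.95, 0.80, 0.55, 0.30, 0.70, 0.45, 0.25, 0.10, 0.05, 0.02]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# A low threshold excises more lesions; a high threshold excises fewer.
for threshold in (0.2, 0.5):
    sens, spec = rates_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy example, lowering the threshold raises sensitivity at the cost of specificity, which is the pattern the conclusion attributes to the expert dermatologist relative to the computer users.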