INTER-OBSERVER AGREEMENT IN LUNG SOUND CLASSIFICATION AIDED BY VISUAL REPRESENTATION OF THE SOUNDS
Method(s): We recorded sounds at six locations on the thorax in seven apparently healthy subjects and 13 patients with heart or lung disease. We recruited 28 observers: 16 general practitioners from four countries, four pulmonologists, four Norwegian medical students and an international group of four lung sound researchers. Each recording was presented together with a video of its sound spectrogram. Using a questionnaire, the observers assessed each recording for the presence of crackles and wheezes. We analyzed inter-observer agreement with Fleiss' kappa, both among all observers and within subgroups. We then created a reference standard from the answers of the lung sound researchers and compared each observer's answers with it using Cohen's kappa.
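The two agreement statistics named in the methods can be sketched in plain Python as follows. Fleiss' kappa takes a table with one row per recording and one column per category (for example "crackles present" and "crackles absent"), holding how many observers chose each category. Cohen's kappa compares two label sequences, such as one observer against the reference standard. The function names, the table layout and the example data are illustrative assumptions, not the authors' analysis code, and confidence intervals are not computed here.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a recordings x categories table of rating counts.

    counts[i][j] is the number of observers who put recording i in
    category j; every row must sum to the same number of observers n.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each recording needs the same number (>= 2) of raters")
    n_categories = len(counts[0])

    # Observed agreement per recording: share of observer pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall proportion of ratings in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two raters, e.g. one observer vs a reference."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both raters must label the same non-empty set")
    n = len(a)
    # Observed agreement: fraction of recordings labelled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's own label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one label only")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data: 4 observers rate 3 recordings as [present, absent].
print(round(fleiss_kappa([[4, 0], [1, 3], [0, 4]]), 3))  # → 0.657
# Hypothetical observer vs reference standard (1 = crackles present).
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # → 0.5
```

Both functions divide observed agreement beyond chance by the maximum possible agreement beyond chance; they differ only in how chance agreement is estimated (pooled category proportions for Fleiss, each rater's own marginals for Cohen).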
Result(s): The level of agreement between the 28 observers was K=0.38 (95% CI 0.12-0.63) for wheezes and K=0.41 (CI 0.27-0.53) for crackles. Agreement varied between the subgroups. Among the general practitioners from the UK and from Norway, the kappa values for wheezes were K=0.97 and K=0.59, respectively, and for crackles K=0.51 and K=0.58, respectively, corresponding to moderate to almost perfect agreement. When each observer was compared with the reference standard, the mean kappa was K=0.54 (CI 0.48-0.60) for crackles and K=0.67 (CI 0.56-0.78) for wheezes. Members of the subgroups with the highest multi-rater kappa also had the best agreement with the reference standard. All but four observers reached kappa values >0.4 for both crackles and wheezes when compared with the reference standard.
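The verbal labels in the results ("moderate", "almost perfect") and the >0.4 threshold appear consistent with the widely used Landis and Koch bands for kappa, although the abstract does not state which scale was applied. Assuming that scale, a minimal helper (the name `landis_koch_label` is hypothetical) maps a kappa value to its band:

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to a Landis and Koch agreement band (assumed scale)."""
    if kappa < 0:
        return "poor"
    # Upper bounds are inclusive, following the usual 0.00-0.20, 0.21-0.40, ... bands.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Kappa values reported in the results above.
for k in (0.38, 0.41, 0.97, 0.59, 0.51, 0.58, 0.54, 0.67):
    print(k, landis_koch_label(k))
```

Under this assumed scale, a kappa above 0.4 falls in the "moderate" band or higher.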