PS4-11 INTER-OBSERVER AGREEMENT IN LUNG SOUND CLASSIFICATION AIDED BY VISUAL REPRESENTATION OF THE SOUNDS

Tuesday, June 14, 2016
Exhibition Space (30 Euston Square)
Poster Board # PS4-11

Juan Carlos Aviles Solis, MD, General Practice Research Unit, Tromsø, Norway, Peder A. Halvorsen, MD, PhD, Department of Community Medicine, UiT - The Arctic University of Norway, Tromsø, Norway and Hasse Melbye, MD, PhD, General Practice Research Unit, Tromsø, Norway
Purpose: To explore the level of agreement between healthcare professionals classifying lung sounds with the aid of visual representations of the sounds in the form of spectrograms. We plan to use this method in a large epidemiological study and therefore need to assess its reliability.

Method(s): We obtained sound recordings at six different locations on the thorax from seven apparently healthy subjects and 13 patients with heart or lung disease. We recruited 28 observers: 16 general practitioners from four different countries, four pulmonologists, four Norwegian medical students and an international group of four researchers in the field of lung sounds. Videos of sound spectrograms were presented together with the sounds. On a questionnaire, the observers evaluated each recording for the presence of crackles and wheezes. We analyzed the inter-observer agreement using Fleiss' kappa, both among all of the observers and within subsamples. We then created a reference standard from the answers of the lung sound researchers and compared the answers of each observer against this reference standard using Cohen's kappa.
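
The two statistics named above can be sketched in a few lines of Python. This is a minimal illustration of the standard formulas, not the authors' analysis code: the function names, the 0/1 coding (1 = sound present, 0 = absent) and the toy ratings are all assumptions made for the example, and the abstract does not say how the reference standard was derived from the researchers' answers, so the reference vector here is simply taken as given.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for several observers rating the same recordings.

    ratings: one list per recording, holding one label per observer
    (hypothetical layout; every recording needs the same number of observers).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})

    # counts[i][j] = number of observers who gave recording i category j
    counts = [[Counter(item)[c] for c in categories] for item in ratings]

    # Observed agreement: share of agreeing observer pairs, averaged over recordings
    per_item = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_obs = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_exp = sum(p * p for p in p_cat)

    if p_exp == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (p_obs - p_exp) / (1 - p_exp)


def cohen_kappa(observer, reference):
    """Cohen's kappa between one observer and a reference standard."""
    n = len(observer)
    p_obs = sum(a == b for a, b in zip(observer, reference)) / n

    # Chance agreement from each rater's own marginal frequencies
    c_obs, c_ref = Counter(observer), Counter(reference)
    labels = set(observer) | set(reference)
    p_exp = sum(c_obs[c] * c_ref[c] for c in labels) / (n * n)

    if p_exp == 1.0:
        return float("nan")  # both raters used a single identical category
    return (p_obs - p_exp) / (1 - p_exp)


# Toy data (invented): 4 recordings rated by 3 observers
toy_ratings = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
print(round(fleiss_kappa(toy_ratings), 3))  # 0.333

# Toy data (invented): one observer against a given reference standard
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Both statistics correct the raw proportion of agreement for the agreement expected by chance. Fleiss' kappa pools all observers, whereas Cohen's kappa compares exactly two sets of answers, which is why the latter suits the observer-versus-reference comparison.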

Result(s): The level of agreement between the 28 observers was K=0.38 (95% CI 0.12-0.63) for wheezes and K=0.41 (CI 0.27-0.53) for crackles. The agreement varied between the subsamples. In the two groups of general practitioners from the UK and Norway, the kappa values for wheezes were K=0.97 and K=0.59, respectively, and K=0.51 and K=0.58 for crackles, reaching moderate to almost perfect agreement. The mean kappa when comparing each of the observers to the reference standard was K=0.54 (CI 0.48-0.60) for crackles and K=0.67 (CI 0.56-0.78) for wheezes. The members of the subgroups with the highest multi-rater kappa also had the best agreement with the reference standard. All but four observers reached kappa values >0.4 for both crackles and wheezes when compared to the reference standard.
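
The verbal labels used here ("moderate", "substantial", "almost perfect") match the widely used Landis and Koch bands for kappa. The abstract does not name the scale it applies, so the mapping below is an assumption, and `agreement_label` is a hypothetical helper written only to illustrate how the reported values fall into those bands.

```python
def agreement_label(kappa):
    """Map a kappa value to a Landis and Koch style label (assumed scale)."""
    if kappa < 0:
        return "poor"
    # Upper bound of each band, in ascending order
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Kappa values taken from the results reported above
for name, k in [
    ("all observers, wheezes", 0.38),
    ("all observers, crackles", 0.41),
    ("UK GPs, wheezes", 0.97),
    ("vs. reference, wheezes (mean)", 0.67),
]:
    print(f"{name}: K={k} -> {agreement_label(k)}")
```

Under this assumed scale, the >0.4 threshold mentioned above corresponds to the lower edge of the "moderate" band.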

Conclusion(s): We found mostly moderate to substantial levels of agreement in the classification of lung sounds. Wheezes had higher levels of agreement than crackles. The agreement achieved with this method is comparable to that for other clinical observations, and we therefore consider the method usable in our epidemiological study.