PS4-54 A DATA-GENERATED MARGIN-OF-ERROR TO DECIDE WHEN TWO MEASUREMENTS AGREE

Wednesday, October 21, 2015
Grand Ballroom EH (Hyatt Regency St. Louis at the Arch)
Poster Board # PS4-54

Hongsheng Wu, PhD, Wentworth Institute of Technology, Boston, MA; Timothy Moore, MD, Department of Radiology, University of Nebraska Medical Center, Omaha, NE; Alan Erickson, MD, Division of Rheumatology & Immunology, University of Nebraska Medical Center, Omaha, NE; and Robert Lew, PhD, Department of Veterans Affairs, Boston, MA
Purpose: We repeat a measurement, or ask two clinicians to assess the same subject, to obtain two ratings per subject and detect disagreement and bias. When exact agreement between readers is too stringent and no margin-of-error (MoE) for agreement has been pre-specified, we propose a data-based MoE to mark where agreement ends and disagreement begins.

Method: Estimated from an analysis of variance, the MoE was applied to 125 pairs of Sharp score ratings of x-ray images from a study comparing rheumatoid arthritis therapies. The Sharp score counts narrowing and erosions in the joints of the hands and feet. Using a square-root transform, we extended the MoE to handle outliers. We computed kappa statistics (K), defining agreement as a paired difference <= MoE.
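The abstract does not give the exact formula for the ANOVA-based MoE or for its extension. The Python sketch below assumes one common construction: take the error variance from a one-way ANOVA with subject as the factor and set the MoE at a 95% limit for a single paired difference (a Bland-Altman-style bound that ignores systematic bias). The function names, the z = 1.96 multiplier, and the delta-method back-transform for the extended MoE are illustrative assumptions, not the authors' stated method.

```python
import numpy as np

def moe(a, b, z=1.96):
    """Data-generated margin-of-error for paired ratings a, b.

    Assumed construction (not stated in the abstract): with two
    ratings per subject, the ANOVA error mean square is
    MS_within = mean(d**2) / 2, and MoE = z * SD of a single
    paired difference, ignoring any systematic bias.
    """
    d = np.asarray(a, float) - np.asarray(b, float)
    ms_within = np.mean(d ** 2) / 2.0        # ANOVA error mean square
    return z * np.sqrt(2.0 * ms_within)      # z * SD of a paired difference

def extended_moe(a, b, z=1.96):
    """Extended MoE: the same bound computed on square-root-transformed
    scores (scores assumed nonnegative), which damps outlying large
    ratings.  Mapping the threshold back to the original scale with the
    delta method (|x - y| ~ 2*sqrt(m)*|d'| near a typical score m) is
    an assumption for illustration.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    moe_sqrt = moe(np.sqrt(a), np.sqrt(b), z)   # bound on the sqrt scale
    m = np.mean(np.concatenate([a, b]))         # typical score level
    return 2.0 * np.sqrt(m) * moe_sqrt
```

On the square-root scale a few very large scores no longer dominate the variance estimate, which is consistent with the extended MoE sitting below the raw MoE in the table below.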

Result: Only 10% of the 125 paired ratings agreed exactly (K = 0.02). Defining agreement as a Sharp score difference <= 3 points, 57% of pairs agreed (K = 0.38). The estimated MoE was 7.8; defining agreement as a difference <= 7 points, 74% agreed (K = 0.63). Using the MoE to select discordant pairs, we found systematic differences between the x-ray readers and refined consensus training accordingly. The table below shows that the extended MoE damped outliers and gave more reasonable results than the MoE.
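A sketch of how such a kappa can be computed once agreement is dichotomized at a tolerance, and how the MoE can flag discordant pairs for review. The abstract does not specify the chance model behind its kappas; the cross-pairing estimate of chance agreement and the helper names (tolerance_kappa, discordant_pairs) are assumptions for illustration.

```python
import numpy as np

def tolerance_kappa(a, b, tol):
    """Chance-corrected agreement where 'agree' means |a_i - b_i| <= tol.

    Assumed chance model: p_e is the agreement rate over all cross
    pairings (reader 1 on subject i vs reader 2 on subject j, i != j),
    one standard construction for a tolerance-based kappa.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    p_o = np.mean(np.abs(a - b) <= tol)               # observed agreement
    cross = np.abs(a[:, None] - b[None, :]) <= tol    # all n x n pairings
    p_e = cross[~np.eye(len(a), dtype=bool)].mean()   # chance agreement
    return (p_o - p_e) / (1.0 - p_e)

def discordant_pairs(a, b, margin):
    """Indices of pairs whose difference exceeds the margin-of-error;
    these are the pairs examined for systematic reader differences."""
    d = np.asarray(a, float) - np.asarray(b, float)
    return np.flatnonzero(np.abs(d) > margin)
```

With tol = 0 this reduces to exact agreement under the assumed chance model; with tol set to the MoE it reproduces the dichotomy used for the kappas above.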

X-ray reader differences, D, for wrist, hand, erosions, narrowing, and total Sharp score.

Grouping     Mean(D)   SD(D)   Extended MoE   MoE
Wrist          0.6      2.6        1.4        3.5
Hand           1.0      3.6        3.2        5.2
Erosions       0.8      3.5        2.9        4.6
Narrowing      1.0      4.5        4.0        6.2
Total          1.8      5.2        7.3        7.8

Conclusion: The data-generated MoE identifies a threshold for agreement, flags discordant pairs, and rescales the data to yield a more easily interpreted kappa statistic. The MoE applies to nearly any unstable or imprecise paired measurement.