Inter-Observer Reliability of Classification of Disease

In research, inter-observer reliability is the degree to which different observers agree on classification, measurement, or choice. If a thing cannot be categorized, identified, or measured reliably, then it cannot be used for diagnosis, treatment, or prognosis. It is important to establish reliability before assessing accuracy or utility.
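
As a minimal sketch of how agreement among several observers can be quantified, the following computes Fleiss' kappa, one commonly used chance-corrected agreement statistic for more than two raters. The choice of statistic, the function name `fleiss_kappa`, and the example ratings are assumptions for illustration; they are not taken from the studies summarized here.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for subjects each rated by the same number of observers.

    ratings: list of lists; ratings[i] holds the category label that each
             observer assigned to subject i (hypothetical input layout).
    categories: list of all possible category labels.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    # counts[i][j] = number of observers who put subject i in category j
    counts = []
    for row in ratings:
        tally = Counter(row)
        counts.append([tally[c] for c in categories])

    # Observed agreement: proportion of agreeing observer pairs per subject
    per_subject = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall share of ratings in each category
    total = n_subjects * n_raters
    shares = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_chance = sum(s * s for s in shares)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical example: 4 radiographs, each staged by 3 observers
example = [
    ["stage1", "stage1", "stage2"],
    ["stage2", "stage2", "stage2"],
    ["stage1", "stage2", "stage3"],
    ["stage3", "stage3", "stage3"],
]
print(round(fleiss_kappa(example, ["stage1", "stage2", "stage3"]), 3))
```

A value of 1 indicates perfect agreement, and values near 0 indicate agreement no better than chance.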

In the past, reliability studies would often recruit a few students and trainees to review a large number of images or perform a large number of measurements. We now ask large groups of practicing surgeons to review smaller amounts of information so that we can keep their interest and get more representative answers. With a large enough group of surgeon-observers, we can also randomize what each observer is shown and test how the information presented influences the reliability of choosing between options.
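
The randomization step can be sketched as follows, assuming observers are split as evenly as possible between study arms. The function name `randomize_observers`, the arm labels, and the observer identifiers are hypothetical and serve only to illustrate the idea.

```python
import random


def randomize_observers(observer_ids, arms, seed=None):
    """Randomly assign observers to study arms in near-equal numbers.

    observer_ids: iterable of observer identifiers (hypothetical).
    arms: list of arm labels, e.g. what information each group is shown.
    seed: optional seed so the allocation can be reproduced.
    """
    rng = random.Random(seed)
    shuffled = list(observer_ids)
    rng.shuffle(shuffled)

    # Deal the shuffled observers out to the arms in turn
    assignment = {arm: [] for arm in arms}
    for index, observer in enumerate(shuffled):
        assignment[arms[index % len(arms)]].append(observer)
    return assignment


# Hypothetical example: 10 observers, two arms
groups = randomize_observers(
    range(10), ["radiographs_only", "radiographs_plus_mri"], seed=1
)
for arm, members in groups.items():
    print(arm, sorted(members))
```

Agreement can then be computed separately within each arm and compared between arms.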

Radiological grading of wrist osteoarthritis associated with scaphoid nonunion advanced collapse (SNAC) can be difficult. It was proposed that a comparison radiograph of the contralateral healthy wrist and educational training in the various SNAC stages might improve reliability. However, neither an additional comparison view nor training seems to improve the reliability of SNAC staging in a clinically relevant way (1). There is room for improvement in the way we assess patients with SNAC wrists.

The appearance of early Kienböck disease on radiographs and magnetic resonance imaging (MRI) may be difficult to distinguish from other conditions that affect the lunate. We aimed to assess the inter-observer agreement in the diagnosis of early Kienböck disease when evaluated on different imaging modalities. We found, with the numbers evaluated, a notable but nonsignificant difference in agreement that was in favor of observers who evaluated MRI scans in addition to radiographs compared with radiographs alone (2). More sophisticated imaging did not consistently or substantially improve reliability. Surgeons should be aware that the diagnosis of Kienböck disease in the precollapse stages is not well-defined, as evidenced by the substantial inter-observer variability.

Patients with Madelung deformity exhibit a spectrum of mild to severe deformity and distortion of wrist geometry. It may be difficult to reliably distinguish mild Madelung deformity from normal. One of our studies therefore tested the reliability of the diagnosis of mild Madelung deformity on a single posteroanterior (PA) radiograph and found that the diagnosis of mild Madelung deformity on radiographs is neither accurate nor reliable (3).

One of our studies found that simplification of the Eaton-Glickel (E-G) classification of trapeziometacarpal (TMC) joint arthrosis by eliminating evaluation of the scaphotrapezial (ST) joint slightly, but significantly, improved inter-observer reliability (4). This finding suggests that simpler classifications that focus on a single anatomical area can be more reliable. Providing clinical information about the patient’s symptoms and examination to observers also marginally improved inter-observer reliability. Patient and surgeon factors were also associated with variation in reliability of classification (4).

References

  1. Ten Berg PWL, Drijkoningen T, Guitton TG, Ring D. Does a Comparison View Improve the Reliability of Staging Wrist Osteoarthritis? Hand (N Y). 2017 Sep;12(5):439-445. doi: 10.1177/1558944716677541. Epub 2016 Nov 10. PubMed PMID: 28832197.
  2. van Leeuwen WF, Janssen SJ, Guitton TG, Chen N, Ring D. Interobserver Agreement in Diagnosing Early-Stage Kienböck Disease on Radiographs and Magnetic Resonance Imaging. Hand (N Y). 2017 Nov;12(6):573-578. doi: 10.1177/1558944716677538. Epub 2016 Nov 30. PubMed PMID: 29091489.
  3. Farr S, Guitton TG, Ring D; Science of Variation Group. How Reliable is the Radiographic Diagnosis of Mild Madelung Deformity? J Wrist Surg. 2018 Jul;7(3):227-231. doi: 10.1055/s-0037-1612636. Epub 2017 Dec 14. PubMed PMID: 29922499; PubMed Central PMCID: PMC6005771.
  4. Becker SJ, Bruinsma WE, Guitton TG, van der Horst CM, Strackee SD, Ring D; Science of Variation Group. Interobserver Agreement of the Eaton-Glickel Classification for Trapeziometacarpal and Scaphotrapezial Arthrosis. J Hand Surg Am. 2016 Jan 27. pii: S0363-5023(15)01631-7. doi: 10.1016/j.jhsa.2015.12.028. [Epub ahead of print] PubMed PMID: 26826947.