Plus Disease in Retinopathy of Prematurity: Improving Diagnosis by Ranking Disease Severity and Using Quantitative Image Analysis

Imaging and Informatics in Retinopathy of Prematurity Research Consortium

Research output: Contribution to journal › Article

21 Citations (Scopus)

Abstract

Purpose: To determine expert agreement on relative retinopathy of prematurity (ROP) disease severity and whether computer-based image analysis can model relative disease severity, and to propose consideration of a more continuous severity score for ROP.

Design: We developed 2 databases of clinical images of varying disease severity (100 images and 34 images) as part of the Imaging and Informatics in ROP (i-ROP) cohort study and recruited expert physician, nonexpert physician, and nonphysician graders to classify and perform pairwise comparisons on both databases.

Participants: Six participating expert ROP clinician-scientists, each with a minimum of 10 years of clinical ROP experience and 5 ROP publications, and 5 image graders (3 physicians and 2 nonphysician graders) who analyzed images that were obtained during routine ROP screening in neonatal intensive care units.

Methods: Images in both databases were ranked by average disease classification (classification ranking), by pairwise comparison using the Elo rating method (comparison ranking), and by correlation with the i-ROP computer-based image analysis system.

Main Outcome Measures: Interexpert agreement (weighted κ statistic) compared with the correlation coefficient (CC) between experts on pairwise comparisons and correlation between expert rankings and computer-based image analysis modeling.

Results: There was variable interexpert agreement on diagnostic classification of disease (plus, preplus, or normal) among the 6 experts (mean weighted κ, 0.27; range, 0.06–0.63), but good correlation between experts on comparison ranking of disease severity (mean CC, 0.84; range, 0.74–0.93) on the set of 34 images. Comparison ranking provided a severity ranking that was in good agreement with ranking obtained by classification ranking (CC, 0.92). Comparison ranking on the larger dataset by both expert and nonexpert graders demonstrated good correlation (mean CC, 0.97; range, 0.95–0.98). The i-ROP system was able to model this continuous severity with good correlation (CC, 0.86).

Conclusions: Experts diagnose plus disease on a continuum, with poor absolute agreement on classification but good relative agreement on disease severity. These results suggest that the use of pairwise rankings and a continuous severity score, such as that provided by the i-ROP system, may improve agreement on disease severity in the future.
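The comparison ranking described in the Methods converts expert pairwise judgments into a continuous severity ordering via the Elo rating method. The following is a minimal sketch of that idea, not the authors' implementation; the function name, K-factor, base rating, and sample image IDs are illustrative assumptions.

```python
def elo_rank(comparisons, k=32.0, base_rating=1500.0):
    """Turn pairwise severity judgments into a continuous ranking.

    comparisons: iterable of (more_severe, less_severe) image IDs,
    one tuple per expert pairwise comparison.
    Returns a dict of image ID -> Elo rating; sorting by rating
    gives a relative disease-severity ordering.
    """
    ratings = {}
    for winner, loser in comparisons:
        rw = ratings.setdefault(winner, base_rating)
        rl = ratings.setdefault(loser, base_rating)
        # Expected "win" probability of the more severe image
        # under the logistic Elo model.
        expected_w = 1.0 / (1.0 + 10.0 ** ((rl - rw) / 400.0))
        # Update both ratings in proportion to how surprising the
        # judgment was (actual score: 1 for winner, 0 for loser).
        ratings[winner] = rw + k * (1.0 - expected_w)
        ratings[loser] = rl + k * (0.0 - (1.0 - expected_w))
    return ratings


# Hypothetical judgments: each tuple says the first image was rated
# as showing more severe disease than the second.
pairs = [("img_03", "img_01"), ("img_03", "img_02"), ("img_02", "img_01")]
for image, rating in sorted(elo_rank(pairs).items(), key=lambda kv: -kv[1]):
    print(f"{image}: {rating:.1f}")
```

Sorting by the resulting ratings yields the kind of relative ordering that the study then correlates (as a CC) with classification ranking and with the i-ROP system's continuous severity score.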

Original language: English (US)
Pages (from-to): 2345-2351
Number of pages: 7
Journal: Ophthalmology
Volume: 123
Issue number: 11
DOI: 10.1016/j.ophtha.2016.07.020
State: Published - Nov 1 2016

Fingerprint

  • Retinopathy of Prematurity
  • Informatics
  • Databases
  • Physicians
  • Neonatal Intensive Care Units
  • Publications
  • Cohort Studies
  • Outcome Assessment (Health Care)

ASJC Scopus subject areas

  • Ophthalmology

Cite this

Plus Disease in Retinopathy of Prematurity: Improving Diagnosis by Ranking Disease Severity and Using Quantitative Image Analysis. / Imaging and Informatics in Retinopathy of Prematurity Research Consortium.

In: Ophthalmology, Vol. 123, No. 11, 01.11.2016, p. 2345-2351.

Research output: Contribution to journal › Article

Imaging and Informatics in Retinopathy of Prematurity Research Consortium. / Plus Disease in Retinopathy of Prematurity: Improving Diagnosis by Ranking Disease Severity and Using Quantitative Image Analysis. In: Ophthalmology. 2016; Vol. 123, No. 11. pp. 2345-2351.
@article{53353b2c95be40e2a2d6f8bc87d5b880,
title = "Plus Disease in Retinopathy of Prematurity: Improving Diagnosis by Ranking Disease Severity and Using Quantitative Image Analysis",
abstract = "Purpose To determine expert agreement on relative retinopathy of prematurity (ROP) disease severity and whether computer-based image analysis can model relative disease severity, and to propose consideration of a more continuous severity score for ROP. Design We developed 2 databases of clinical images of varying disease severity (100 images and 34 images) as part of the Imaging and Informatics in ROP (i-ROP) cohort study and recruited expert physician, nonexpert physician, and nonphysician graders to classify and perform pairwise comparisons on both databases. Participants Six participating expert ROP clinician-scientists, each with a minimum of 10 years of clinical ROP experience and 5 ROP publications, and 5 image graders (3 physicians and 2 nonphysician graders) who analyzed images that were obtained during routine ROP screening in neonatal intensive care units. Methods Images in both databases were ranked by average disease classification (classification ranking), by pairwise comparison using the Elo rating method (comparison ranking), and by correlation with the i-ROP computer-based image analysis system. Main Outcome Measures Interexpert agreement (weighted κ statistic) compared with the correlation coefficient (CC) between experts on pairwise comparisons and correlation between expert rankings and computer-based image analysis modeling. Results There was variable interexpert agreement on diagnostic classification of disease (plus, preplus, or normal) among the 6 experts (mean weighted κ, 0.27; range, 0.06–0.63), but good correlation between experts on comparison ranking of disease severity (mean CC, 0.84; range, 0.74–0.93) on the set of 34 images. Comparison ranking provided a severity ranking that was in good agreement with ranking obtained by classification ranking (CC, 0.92). Comparison ranking on the larger dataset by both expert and nonexpert graders demonstrated good correlation (mean CC, 0.97; range, 0.95–0.98). The i-ROP system was able to model this continuous severity with good correlation (CC, 0.86). Conclusions Experts diagnose plus disease on a continuum, with poor absolute agreement on classification but good relative agreement on disease severity. These results suggest that the use of pairwise rankings and a continuous severity score, such as that provided by the i-ROP system, may improve agreement on disease severity in the future.",
author = "{Imaging and Informatics in Retinopathy of Prematurity Research Consortium} and Jayashree Kalpathy-Cramer and Campbell, {J. Peter} and Deniz Erdogmus and Peng Tian and Dharanish Kedarisetti and Chace Moleta and Reynolds, {James D.} and Kelly Hutcheson and Shapiro, {Michael J.} and Repka, {Michael X.} and Philip Ferrone and Kimberly Drenser and Jason Horowitz and Kemal Sonmez and Ryan Swan and Susan Ostmo and Jonas, {Karyn E.} and Chan, {R. V Paul} and Chiang, {Michael F.} and Chiang, {Michael F.} and Susan Ostmo and Kemal Sonmez and Campbell, {J. Peter} and Chan, {R. V Paul} and Karyn Jonas and Jason Horowitz and Osode Coki and Eccles, {Cheryl Ann} and Leora Sarna and Audina Berrocal and Catherin Negron and Kimberly Drenser and Kristi Cumming and Tammy Osentoski and Tammy Check and Mary Zajechowski and Thomas Lee and Evan Kruger and Kathryn McGovern and Charles Simmons and Raghu Murthy and Sharon Galvis and Jerome Rotter and Ida Chen and Xiaohui Li and Kent Taylor and Kaye Roll and Jayashree Kalpathy-Cramer and Deniz Erdogmus and Martinez-Castellanos, {Maria Ana}"
year = "2016",
month = "11",
day = "1",
doi = "10.1016/j.ophtha.2016.07.020",
language = "English (US)",
volume = "123",
pages = "2345--2351",
journal = "Ophthalmology",
issn = "0161-6420",
publisher = "Elsevier Inc.",
number = "11",

}

TY - JOUR

T1 - Plus Disease in Retinopathy of Prematurity

T2 - Improving Diagnosis by Ranking Disease Severity and Using Quantitative Image Analysis

AU - Imaging and Informatics in Retinopathy of Prematurity Research Consortium

AU - Kalpathy-Cramer, Jayashree

AU - Campbell, J. Peter

AU - Erdogmus, Deniz

AU - Tian, Peng

AU - Kedarisetti, Dharanish

AU - Moleta, Chace

AU - Reynolds, James D.

AU - Hutcheson, Kelly

AU - Shapiro, Michael J.

AU - Repka, Michael X.

AU - Ferrone, Philip

AU - Drenser, Kimberly

AU - Horowitz, Jason

AU - Sonmez, Kemal

AU - Swan, Ryan

AU - Ostmo, Susan

AU - Jonas, Karyn E.

AU - Chan, R. V Paul

AU - Chiang, Michael F.

AU - Chiang, Michael F.

AU - Ostmo, Susan

AU - Sonmez, Kemal

AU - Campbell, J. Peter

AU - Chan, R. V Paul

AU - Jonas, Karyn

AU - Horowitz, Jason

AU - Coki, Osode

AU - Eccles, Cheryl Ann

AU - Sarna, Leora

AU - Berrocal, Audina

AU - Negron, Catherin

AU - Drenser, Kimberly

AU - Cumming, Kristi

AU - Osentoski, Tammy

AU - Check, Tammy

AU - Zajechowski, Mary

AU - Lee, Thomas

AU - Kruger, Evan

AU - McGovern, Kathryn

AU - Simmons, Charles

AU - Murthy, Raghu

AU - Galvis, Sharon

AU - Rotter, Jerome

AU - Chen, Ida

AU - Li, Xiaohui

AU - Taylor, Kent

AU - Roll, Kaye

AU - Kalpathy-Cramer, Jayashree

AU - Erdogmus, Deniz

AU - Martinez-Castellanos, Maria Ana

PY - 2016/11/1

Y1 - 2016/11/1

N2 - Purpose To determine expert agreement on relative retinopathy of prematurity (ROP) disease severity and whether computer-based image analysis can model relative disease severity, and to propose consideration of a more continuous severity score for ROP. Design We developed 2 databases of clinical images of varying disease severity (100 images and 34 images) as part of the Imaging and Informatics in ROP (i-ROP) cohort study and recruited expert physician, nonexpert physician, and nonphysician graders to classify and perform pairwise comparisons on both databases. Participants Six participating expert ROP clinician-scientists, each with a minimum of 10 years of clinical ROP experience and 5 ROP publications, and 5 image graders (3 physicians and 2 nonphysician graders) who analyzed images that were obtained during routine ROP screening in neonatal intensive care units. Methods Images in both databases were ranked by average disease classification (classification ranking), by pairwise comparison using the Elo rating method (comparison ranking), and by correlation with the i-ROP computer-based image analysis system. Main Outcome Measures Interexpert agreement (weighted κ statistic) compared with the correlation coefficient (CC) between experts on pairwise comparisons and correlation between expert rankings and computer-based image analysis modeling. Results There was variable interexpert agreement on diagnostic classification of disease (plus, preplus, or normal) among the 6 experts (mean weighted κ, 0.27; range, 0.06–0.63), but good correlation between experts on comparison ranking of disease severity (mean CC, 0.84; range, 0.74–0.93) on the set of 34 images. Comparison ranking provided a severity ranking that was in good agreement with ranking obtained by classification ranking (CC, 0.92). Comparison ranking on the larger dataset by both expert and nonexpert graders demonstrated good correlation (mean CC, 0.97; range, 0.95–0.98). The i-ROP system was able to model this continuous severity with good correlation (CC, 0.86). Conclusions Experts diagnose plus disease on a continuum, with poor absolute agreement on classification but good relative agreement on disease severity. These results suggest that the use of pairwise rankings and a continuous severity score, such as that provided by the i-ROP system, may improve agreement on disease severity in the future.

AB - Purpose To determine expert agreement on relative retinopathy of prematurity (ROP) disease severity and whether computer-based image analysis can model relative disease severity, and to propose consideration of a more continuous severity score for ROP. Design We developed 2 databases of clinical images of varying disease severity (100 images and 34 images) as part of the Imaging and Informatics in ROP (i-ROP) cohort study and recruited expert physician, nonexpert physician, and nonphysician graders to classify and perform pairwise comparisons on both databases. Participants Six participating expert ROP clinician-scientists, each with a minimum of 10 years of clinical ROP experience and 5 ROP publications, and 5 image graders (3 physicians and 2 nonphysician graders) who analyzed images that were obtained during routine ROP screening in neonatal intensive care units. Methods Images in both databases were ranked by average disease classification (classification ranking), by pairwise comparison using the Elo rating method (comparison ranking), and by correlation with the i-ROP computer-based image analysis system. Main Outcome Measures Interexpert agreement (weighted κ statistic) compared with the correlation coefficient (CC) between experts on pairwise comparisons and correlation between expert rankings and computer-based image analysis modeling. Results There was variable interexpert agreement on diagnostic classification of disease (plus, preplus, or normal) among the 6 experts (mean weighted κ, 0.27; range, 0.06–0.63), but good correlation between experts on comparison ranking of disease severity (mean CC, 0.84; range, 0.74–0.93) on the set of 34 images. Comparison ranking provided a severity ranking that was in good agreement with ranking obtained by classification ranking (CC, 0.92). Comparison ranking on the larger dataset by both expert and nonexpert graders demonstrated good correlation (mean CC, 0.97; range, 0.95–0.98). The i-ROP system was able to model this continuous severity with good correlation (CC, 0.86). Conclusions Experts diagnose plus disease on a continuum, with poor absolute agreement on classification but good relative agreement on disease severity. These results suggest that the use of pairwise rankings and a continuous severity score, such as that provided by the i-ROP system, may improve agreement on disease severity in the future.

UR - http://www.scopus.com/inward/record.url?scp=84994123709&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84994123709&partnerID=8YFLogxK

U2 - 10.1016/j.ophtha.2016.07.020

DO - 10.1016/j.ophtha.2016.07.020

M3 - Article

C2 - 27566853

AN - SCOPUS:84994123709

VL - 123

SP - 2345

EP - 2351

JO - Ophthalmology

JF - Ophthalmology

SN - 0161-6420

IS - 11

ER -