Multireader, multicase receiver operating characteristic analysis: An empirical comparison of five methods

Nancy A. Obuchowski, Sergey V. Beiden, Kevin S. Berbaum, Stephen L. Hillis, Hemant Ishwaran, Hae Hiang Song, Robert F. Wagner

Research output: Contribution to journal (Article)

67 Citations (Scopus)

Abstract

Rationale and objectives: Several statistical methods have been developed for analyzing multireader, multicase (MRMC) receiver operating characteristic (ROC) studies. The objective of this article is to increase awareness of these methods and to determine whether their results are concordant for published datasets.

Materials and methods: Data from three previously published studies were reanalyzed using five MRMC methods. For each method, three quantities were reported: the 95% confidence intervals (CIs) for the mean of the readers' ROC areas for each diagnostic test, the P value for the comparison of the diagnostic tests' mean accuracies, and the 95% CIs for the mean difference in ROC areas of the diagnostic tests.

Results: Important differences in P values and CIs were seen when using parametric versus nonparametric estimates of accuracy, and there were the expected differences for random-reader versus fixed-reader models. Controlling for these differences, the Dorfman-Berbaum-Metz (DBM), Obuchowski-Rockette, Beiden-Wagner-Campbell, and Song's multivariate Wilcoxon-Mann-Whitney (WMW) methods gave almost identical results for the fixed-reader model. For the random-reader model, the DBM, Obuchowski-Rockette, and Beiden-Wagner-Campbell methods yielded approximately the same inferences, but the CIs for the Beiden-Wagner-Campbell method tended to be broader. Ishwaran's hierarchical ROC sometimes yielded significance not found with other methods. Song's modification of DBM's jackknifing algorithm sometimes led to different conclusions than the original DBM algorithm.

Conclusion: In choosing and applying MRMC methods, it is important to recognize: (1) the distinction between random-reader and fixed-reader models, the uncertainties accounted for by each, and thus the level of generalizability expected from each; (2) the assumptions made by the various MRMC methods; and (3) the limitations of a five- or six-reader study when the reader variability is great.
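As a rough illustration of two ingredients named in the abstract, the nonparametric (Wilcoxon-Mann-Whitney) estimate of the ROC area and case-level jackknifing, the sketch below computes both for a single reader and a single test. It is not code from the article: the function names and the confidence ratings are hypothetical, and a full DBM analysis would go on to feed such pseudovalues, computed for every reader and test, into a mixed-effects analysis of variance, which is not shown here.

```python
from statistics import mean


def wmw_auc(diseased, nondiseased):
    """Nonparametric (Wilcoxon-Mann-Whitney) ROC area.

    Fraction of diseased/non-diseased case pairs in which the diseased
    case received the higher rating; tied pairs count as one half.
    """
    total = 0.0
    for x in diseased:
        for y in nondiseased:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(diseased) * len(nondiseased))


def jackknife_pseudovalues(diseased, nondiseased):
    """Case-level jackknife pseudovalues of the WMW ROC area.

    For each case, the area is recomputed with that case left out and
    converted to the pseudovalue n * full - (n - 1) * leave_one_out.
    """
    n = len(diseased) + len(nondiseased)
    full = wmw_auc(diseased, nondiseased)
    pseudo = []
    for i in range(len(diseased)):
        loo = wmw_auc(diseased[:i] + diseased[i + 1:], nondiseased)
        pseudo.append(n * full - (n - 1) * loo)
    for j in range(len(nondiseased)):
        loo = wmw_auc(diseased, nondiseased[:j] + nondiseased[j + 1:])
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo


# Hypothetical 5-point confidence ratings from one reader using one test.
diseased_ratings = [4, 5, 3, 5]
nondiseased_ratings = [1, 2, 3, 2]

auc = wmw_auc(diseased_ratings, nondiseased_ratings)
pseudovalues = jackknife_pseudovalues(diseased_ratings, nondiseased_ratings)

print(f"WMW ROC area: {auc:.5f}")
print(f"Mean of jackknife pseudovalues: {mean(pseudovalues):.5f}")
```

With the WMW estimate, the mean of the pseudovalues reproduces the full-sample area, so the pseudovalues can be read as per-case contributions to a reader's accuracy. The double loop is quadratic in the number of cases, which is fine for a sketch but would usually be replaced by a rank-based computation for large studies.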

Original language: English
Pages (from-to): 980-995
Number of pages: 16
Journal: Academic Radiology
Volume: 11
Issue number: 9
DOIs: https://doi.org/10.1016/j.acra.2004.04.014
State: Published - Sep 1 2004
Externally published: Yes

Keywords

  • diagnostic accuracy
  • multireader multicase (MRMC) study
  • multireader study
  • Receiver operating characteristic (ROC) curve
  • ROC analysis

ASJC Scopus subject areas

  • Radiology, Nuclear Medicine and Imaging

Cite this

Obuchowski, N. A., Beiden, S. V., Berbaum, K. S., Hillis, S. L., Ishwaran, H., Song, H. H., & Wagner, R. F. (2004). Multireader, multicase receiver operating characteristic analysis: An empirical comparison of five methods. Academic Radiology, 11(9), 980-995. https://doi.org/10.1016/j.acra.2004.04.014

@article{0787ed42482640a887ef2986b16438b1,
title = "Multireader, multicase receiver operating characteristic analysis: An empirical comparison of five methods",
keywords = "diagnostic accuracy, multireader multicase (MRMC) study, multireader study, Receiver operating characteristic (ROC) curve, ROC analysis",
author = "Obuchowski, {Nancy A.} and Beiden, {Sergey V.} and Berbaum, {Kevin S.} and Hillis, {Stephen L.} and Ishwaran, Hemant and Song, {Hae Hiang} and Wagner, {Robert F.}",
year = "2004",
month = "9",
day = "1",
doi = "10.1016/j.acra.2004.04.014",
language = "English",
volume = "11",
pages = "980--995",
journal = "Academic Radiology",
issn = "1076-6332",
publisher = "Elsevier USA",
number = "9",

}

TY - JOUR

T1 - Multireader, multicase receiver operating characteristic analysis

T2 - An empirical comparison of five methods

AU - Obuchowski, Nancy A.

AU - Beiden, Sergey V.

AU - Berbaum, Kevin S.

AU - Hillis, Stephen L.

AU - Ishwaran, Hemant

AU - Song, Hae Hiang

AU - Wagner, Robert F.

PY - 2004/9/1

Y1 - 2004/9/1

KW - diagnostic accuracy

KW - multireader multicase (MRMC) study

KW - multireader study

KW - Receiver operating characteristic (ROC) curve

KW - ROC analysis

UR - http://www.scopus.com/inward/record.url?scp=4544318699&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4544318699&partnerID=8YFLogxK

U2 - 10.1016/j.acra.2004.04.014

DO - 10.1016/j.acra.2004.04.014

M3 - Article

C2 - 15350579

AN - SCOPUS:4544318699

VL - 11

SP - 980

EP - 995

JO - Academic Radiology

JF - Academic Radiology

SN - 1076-6332

IS - 9

ER -