Evaluating the impact of undetected disk errors in RAID systems

Eric W D Rozier, Wendy Belluomini, Veera Deenadhayalan, Jim Hafner, K. K. Rao, Pin Zhou

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

18 Citations (Scopus)

Abstract

Despite the reliability of modern disks, recent studies have made it clear that a new class of faults, Undetected Disk Errors (UDEs), also known as silent data corruption events, becomes a real challenge as storage capacity scales. While RAID systems have proven effective in protecting data from traditional disk failures, silent data corruption events remain a significant problem unaddressed by RAID. We present a fault model for UDEs and a hybrid framework for simulating UDEs in large-scale systems. The framework combines a multi-resolution discrete event simulator with numerical solvers. Our implementation enables us to model arbitrary storage systems and workloads and to estimate the rate of undetected data corruptions. We present results for several systems and workloads, from gigascale to petascale. These results indicate that corruption from UDEs is a significant problem in the absence of protection schemes and that such schemes dramatically decrease the rate of undetected data corruption.

Original language: English
Title of host publication: Proceedings of the International Conference on Dependable Systems and Networks
Pages: 83-92
Number of pages: 10
DOIs: https://doi.org/10.1109/DSN.2009.5270353
State: Published - Nov 26 2009
Event: 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009 - Lisbon, Portugal
Duration: Jun 29 2009 to Jul 2 2009

Other

Other: 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009
Country: Portugal
City: Lisbon
Period: 6/29/09 to 7/2/09

Fingerprint

Large scale systems
Simulators

Keywords

  • Modeling
  • Silent data corruption
  • Simulation
  • Undetected disk errors

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Software

Cite this

Rozier, E. W. D., Belluomini, W., Deenadhayalan, V., Hafner, J., Rao, K. K., & Zhou, P. (2009). Evaluating the impact of undetected disk errors in RAID systems. In Proceedings of the International Conference on Dependable Systems and Networks (pp. 83-92). [5270353] https://doi.org/10.1109/DSN.2009.5270353

@inproceedings{78124372c129433c9cd80db322523523,
title = "Evaluating the impact of undetected disk errors in RAID systems",
abstract = "Despite the reliability of modern disks, recent studies have made it clear that a new class of faults, Undetected Disk Errors (UDEs), also known as silent data corruption events, becomes a real challenge as storage capacity scales. While RAID systems have proven effective in protecting data from traditional disk failures, silent data corruption events remain a significant problem unaddressed by RAID. We present a fault model for UDEs and a hybrid framework for simulating UDEs in large-scale systems. The framework combines a multi-resolution discrete event simulator with numerical solvers. Our implementation enables us to model arbitrary storage systems and workloads and to estimate the rate of undetected data corruptions. We present results for several systems and workloads, from gigascale to petascale. These results indicate that corruption from UDEs is a significant problem in the absence of protection schemes and that such schemes dramatically decrease the rate of undetected data corruption.",
keywords = "Modeling, Silent data corruption, Simulation, Undetected disk errors",
author = "Rozier, {Eric W D} and Wendy Belluomini and Veera Deenadhayalan and Jim Hafner and Rao, {K. K.} and Pin Zhou",
year = "2009",
month = "11",
day = "26",
doi = "10.1109/DSN.2009.5270353",
language = "English",
isbn = "9781424444212",
pages = "83--92",
booktitle = "Proceedings of the International Conference on Dependable Systems and Networks",

}

TY - GEN

T1 - Evaluating the impact of undetected disk errors in RAID systems

AU - Rozier, Eric W D

AU - Belluomini, Wendy

AU - Deenadhayalan, Veera

AU - Hafner, Jim

AU - Rao, K. K.

AU - Zhou, Pin

PY - 2009/11/26

Y1 - 2009/11/26

N2 - Despite the reliability of modern disks, recent studies have made it clear that a new class of faults, Undetected Disk Errors (UDEs), also known as silent data corruption events, becomes a real challenge as storage capacity scales. While RAID systems have proven effective in protecting data from traditional disk failures, silent data corruption events remain a significant problem unaddressed by RAID. We present a fault model for UDEs and a hybrid framework for simulating UDEs in large-scale systems. The framework combines a multi-resolution discrete event simulator with numerical solvers. Our implementation enables us to model arbitrary storage systems and workloads and to estimate the rate of undetected data corruptions. We present results for several systems and workloads, from gigascale to petascale. These results indicate that corruption from UDEs is a significant problem in the absence of protection schemes and that such schemes dramatically decrease the rate of undetected data corruption.

AB - Despite the reliability of modern disks, recent studies have made it clear that a new class of faults, Undetected Disk Errors (UDEs), also known as silent data corruption events, becomes a real challenge as storage capacity scales. While RAID systems have proven effective in protecting data from traditional disk failures, silent data corruption events remain a significant problem unaddressed by RAID. We present a fault model for UDEs and a hybrid framework for simulating UDEs in large-scale systems. The framework combines a multi-resolution discrete event simulator with numerical solvers. Our implementation enables us to model arbitrary storage systems and workloads and to estimate the rate of undetected data corruptions. We present results for several systems and workloads, from gigascale to petascale. These results indicate that corruption from UDEs is a significant problem in the absence of protection schemes and that such schemes dramatically decrease the rate of undetected data corruption.

KW - Modeling

KW - Silent data corruption

KW - Simulation

KW - Undetected disk errors

UR - http://www.scopus.com/inward/record.url?scp=70450198064&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70450198064&partnerID=8YFLogxK

U2 - 10.1109/DSN.2009.5270353

DO - 10.1109/DSN.2009.5270353

M3 - Conference contribution

SN - 9781424444212

SP - 83

EP - 92

BT - Proceedings of the International Conference on Dependable Systems and Networks

ER -