Evaluating the impact of undetected disk errors in RAID systems

Eric W.D. Rozier, Wendy Belluomini, Veera Deenadhayalan, Jim Hafner, K. K. Rao, Pin Zhou

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

19 Scopus citations

Abstract

Despite the reliability of modern disks, recent studies have made it clear that a new class of faults, Undetected Disk Errors (UDEs), also known as silent data corruption events, becomes a real challenge as storage capacity scales. While RAID systems have proven effective in protecting data from traditional disk failures, silent data corruption events remain a significant problem that RAID does not address. We present a fault model for UDEs and a hybrid framework for simulating UDEs in large-scale systems. The framework combines a multi-resolution discrete event simulator with numerical solvers. Our implementation enables us to model arbitrary storage systems and workloads and to estimate the rate of undetected data corruption. We present results for several systems and workloads, from gigascale to petascale. These results indicate that corruption from UDEs is a significant problem in the absence of protection schemes and that such schemes dramatically decrease the rate of undetected data corruption.
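To make the abstract's central quantity, the rate of undetected data corruption, more concrete, here is a toy Monte Carlo sketch. It is not the authors' fault model or their multi-resolution discrete event simulator; it is a hypothetical illustration under stated assumptions. The only UDE it models is a dropped write (the disk reports success but leaves the old contents in place), operations are uniformly random over independent blocks, and the "protection scheme" is a hypothetical read-time check that catches stale data with probability `detection_coverage` and then repairs the block from redundancy. The names `simulate`, `SimResult`, `p_dropped_write`, and `detection_coverage` are illustrative only.

```python
import random
from dataclasses import dataclass


@dataclass
class SimResult:
    reads: int
    undetected_corruptions: int

    @property
    def undetected_rate(self) -> float:
        """Undetected corruptions per read (0.0 if there were no reads)."""
        return self.undetected_corruptions / self.reads if self.reads else 0.0


def simulate(num_blocks: int, num_ops: int, write_fraction: float,
             p_dropped_write: float, detection_coverage: float,
             seed: int = 0) -> SimResult:
    """Toy Monte Carlo of dropped writes (one simple kind of UDE).

    Hypothetical model: each block is either current or stale. A stale block
    holds old data because its last write was silently dropped. Reading a
    stale block returns wrong data; a hypothetical protection check catches
    it with probability `detection_coverage`, otherwise the corruption goes
    undetected and is delivered to the caller.
    """
    rng = random.Random(seed)
    stale = [False] * num_blocks
    reads = 0
    undetected = 0
    for _ in range(num_ops):
        block = rng.randrange(num_blocks)
        if rng.random() < write_fraction:
            # The write is either silently dropped (old contents stay on disk
            # while the write reports success) or lands and makes the block current.
            stale[block] = rng.random() < p_dropped_write
        else:
            reads += 1
            if stale[block]:
                if rng.random() < detection_coverage:
                    # Detected: assume the block is rebuilt from redundancy.
                    stale[block] = False
                else:
                    undetected += 1
    return SimResult(reads, undetected)


if __name__ == "__main__":
    # Compare no protection (coverage 0.0) with a hypothetical 99% detector.
    for coverage in (0.0, 0.99):
        r = simulate(num_blocks=1_000, num_ops=200_000, write_fraction=0.3,
                     p_dropped_write=1e-3, detection_coverage=coverage, seed=42)
        print(f"coverage={coverage}: {r.undetected_corruptions} undetected "
              f"corruptions in {r.reads} reads (rate {r.undetected_rate:.2e})")
```

Raising `detection_coverage` from 0 toward 1 reproduces the qualitative trend the abstract reports (fewer undetected corruptions once a protection scheme is present), but the numbers this toy produces say nothing about real disk error rates or about the systems evaluated in the paper.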

Original language: English (US)
Title of host publication: Proceedings of the 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009
Pages: 83-92
Number of pages: 10
DOI: https://doi.org/10.1109/DSN.2009.5270353
State: Published - Nov 26 2009
Event: 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009 - Lisbon, Portugal
Duration: Jun 29 2009 to Jul 2 2009

Publication series

Name: Proceedings of the International Conference on Dependable Systems and Networks

Other

Other: 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009
Country: Portugal
City: Lisbon
Period: 6/29/09 to 7/2/09

Keywords

  • Modeling
  • Silent data corruption
  • Simulation
  • Undetected disk errors

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Software


Cite this

Rozier, E. W. D., Belluomini, W., Deenadhayalan, V., Hafner, J., Rao, K. K., & Zhou, P. (2009). Evaluating the impact of undetected disk errors in RAID systems. In Proceedings of the 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009 (pp. 83-92). [5270353] (Proceedings of the International Conference on Dependable Systems and Networks). https://doi.org/10.1109/DSN.2009.5270353