TY - GEN
T1 - Evaluating the impact of undetected disk errors in RAID systems
AU - Rozier, Eric W.D.
AU - Belluomini, Wendy
AU - Deenadhayalan, Veera
AU - Hafner, Jim
AU - Rao, K. K.
AU - Zhou, Pin
PY - 2009
Y1 - 2009
N2 - Despite the reliability of modern disks, recent studies have made it clear that a new class of faults, Undetected Disk Errors (UDEs) also known as silent data corruption events, become a real challenge as storage capacity scales. While RAID systems have proven effective in protecting data from traditional disk failures, silent data corruption events remain a significant problem unaddressed by RAID. We present a fault model for UDEs, and a hybrid framework for simulating UDEs in large-scale systems. The framework combines a multi-resolution discrete event simulator with numerical solvers. Our implementation enables us to model arbitrary storage systems and workloads and estimate the rate of undetected data corruptions. We present results for several systems and workloads, from gigascale to petascale. These results indicate that corruption from UDEs is a significant problem in the absence of protection schemes and that such schemes dramatically decrease the rate of undetected data corruption.
AB - Despite the reliability of modern disks, recent studies have made it clear that a new class of faults, Undetected Disk Errors (UDEs) also known as silent data corruption events, become a real challenge as storage capacity scales. While RAID systems have proven effective in protecting data from traditional disk failures, silent data corruption events remain a significant problem unaddressed by RAID. We present a fault model for UDEs, and a hybrid framework for simulating UDEs in large-scale systems. The framework combines a multi-resolution discrete event simulator with numerical solvers. Our implementation enables us to model arbitrary storage systems and workloads and estimate the rate of undetected data corruptions. We present results for several systems and workloads, from gigascale to petascale. These results indicate that corruption from UDEs is a significant problem in the absence of protection schemes and that such schemes dramatically decrease the rate of undetected data corruption.
KW - Modeling
KW - Silent data corruption
KW - Simulation
KW - Undetected disk errors
UR - http://www.scopus.com/inward/record.url?scp=70450198064&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70450198064&partnerID=8YFLogxK
U2 - 10.1109/DSN.2009.5270353
DO - 10.1109/DSN.2009.5270353
M3 - Conference contribution
AN - SCOPUS:70450198064
SN - 9781424444212
T3 - Proceedings of the International Conference on Dependable Systems and Networks
SP - 83
EP - 92
BT - Proceedings of the 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009
T2 - 2009 IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2009
Y2 - 29 June 2009 through 2 July 2009
ER -