Scaling file systems to support petascale clusters: A dependability analysis to support informed design choices

Shravan Gaonkar, Eric Rozier, Anthony Tong, William H. Sanders

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Petascale computing requires I/O subsystems that can keep up with the dramatic computing power demanded by such systems. TOP500.org ranks top computers based on their peak compute performance, but there has not been adequate investigation of the current state-of-the-art and future requirements of storage area networks that support petascale computers. Dependable scaling of an I/O subsystem to support petascale computing is not as simple as adding more storage servers. In this paper, we present a stochastic activity network model that uses failure rates computed from real logs to predict the reliability and availability of the storage architecture of the Abe cluster at the National Center for Supercomputing Applications (NCSA). We then use the model to evaluate the challenges encountered as one scales the number of storage servers to support petascale computing. The results present new insights regarding the dependability challenges that will be encountered when building next-generation petabyte storage. Furthermore, we provide insight into a new design approach that will enable system designers to integrate the trace-based analysis of parameter values from real system data into their stochastic models to allow informed design choices.
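The abstract's point that dependable scaling "is not as simple as adding more storage servers" can be illustrated with a back-of-the-envelope calculation. The sketch below is not the paper's stochastic activity network model. It assumes, purely for illustration, that servers fail independently, that the file system is available only when every server is up, and that each server has a hypothetical availability of 0.999. The function name and all numbers are assumptions, not values from the paper.

```python
def series_availability(per_server: float, n_servers: int) -> float:
    """Availability of a system that needs all n independent servers up.

    Illustrative assumption only: independent failures, no redundancy.
    """
    if not 0.0 <= per_server <= 1.0:
        raise ValueError("per_server must be a probability in [0, 1]")
    if n_servers < 1:
        raise ValueError("n_servers must be at least 1")
    # With independent servers in series, availabilities multiply.
    return per_server ** n_servers


if __name__ == "__main__":
    # Hypothetical per-server availability; not taken from the NCSA logs.
    for n in (10, 100, 1000):
        print(n, round(series_availability(0.999, n), 4))
```

Under these assumptions, system availability falls as the server count grows, which is why redundancy and measured failure rates matter when scaling out.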

Original language: English
Title of host publication: Proceedings of the International Conference on Dependable Systems and Networks
Pages: 386-391
Number of pages: 6
DOIs: https://doi.org/10.1109/DSN.2008.4630107
State: Published - Oct 13 2008
Event: 2008 International Conference on Dependable Systems and Networks, DSN-2008 - Anchorage, AK, United States
Duration: Jun 24 2008 to Jun 27 2008

Other

Other: 2008 International Conference on Dependable Systems and Networks, DSN-2008
Country: United States
City: Anchorage, AK
Period: 6/24/08 to 6/27/08

Fingerprint

Servers
Stochastic models
Availability

Keywords

  • Data analysis
  • Modeling techniques
  • Reliability and availability
  • Simulation
  • Storage systems

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Software

Cite this

Gaonkar, S., Rozier, E., Tong, A., & Sanders, W. H. (2008). Scaling file systems to support petascale clusters: A dependability analysis to support informed design choices. In Proceedings of the International Conference on Dependable Systems and Networks (pp. 386-391). [4630107] https://doi.org/10.1109/DSN.2008.4630107

Scaling file systems to support petascale clusters: A dependability analysis to support informed design choices. / Gaonkar, Shravan; Rozier, Eric; Tong, Anthony; Sanders, William H.

Proceedings of the International Conference on Dependable Systems and Networks. 2008. p. 386-391 4630107.

Gaonkar, S, Rozier, E, Tong, A & Sanders, WH 2008, Scaling file systems to support petascale clusters: A dependability analysis to support informed design choices. in Proceedings of the International Conference on Dependable Systems and Networks, 4630107, pp. 386-391, 2008 International Conference on Dependable Systems and Networks, DSN-2008, Anchorage, AK, United States, 6/24/08. https://doi.org/10.1109/DSN.2008.4630107
Gaonkar S, Rozier E, Tong A, Sanders WH. Scaling file systems to support petascale clusters: A dependability analysis to support informed design choices. In Proceedings of the International Conference on Dependable Systems and Networks. 2008. p. 386-391. 4630107 https://doi.org/10.1109/DSN.2008.4630107
Gaonkar, Shravan; Rozier, Eric; Tong, Anthony; Sanders, William H. / Scaling file systems to support petascale clusters: A dependability analysis to support informed design choices. Proceedings of the International Conference on Dependable Systems and Networks. 2008. pp. 386-391
@inproceedings{cfadcb30ef8d4260b08b2b4eb0fc545d,
title = "Scaling file systems to support petascale clusters: A dependability analysis to support informed design choices",
abstract = "Petascale computing requires I/O subsystems that can keep up with the dramatic computing power demanded by such systems. TOP500.org ranks top computers based on their peak compute performance, but there has not been adequate investigation of the current state-of-the-art and future requirements of storage area networks that support petascale computers. Dependable scaling of an I/O subsystem to support petascale computing is not as simple as adding more storage servers. In this paper, we present a stochastic activity network model that uses failure rates computed from real logs to predict the reliability and availability of the storage architecture of the Abe cluster at the National Center for Supercomputing Applications (NCSA). We then use the model to evaluate the challenges encountered as one scales the number of storage servers to support petascale computing. The results present new insights regarding the dependability challenges that will be encountered when building next-generation petabyte storage. Furthermore, we provide insight into a new design approach that will enable system designers to integrate the trace-based analysis of parameter values from real system data into their stochastic models to allow informed design choices.",
keywords = "Data analysis, Modeling techniques, Reliability and availability, Simulation, Storage systems",
author = "Gaonkar, Shravan and Rozier, Eric and Tong, Anthony and Sanders, {William H.}",
year = "2008",
month = "10",
day = "13",
doi = "10.1109/DSN.2008.4630107",
language = "English",
pages = "386--391",
booktitle = "Proceedings of the International Conference on Dependable Systems and Networks",
}

TY - GEN
T1 - Scaling file systems to support petascale clusters
T2 - A dependability analysis to support informed design choices
AU - Gaonkar, Shravan
AU - Rozier, Eric
AU - Tong, Anthony
AU - Sanders, William H.
PY - 2008/10/13
Y1 - 2008/10/13
AB - Petascale computing requires I/O subsystems that can keep up with the dramatic computing power demanded by such systems. TOP500.org ranks top computers based on their peak compute performance, but there has not been adequate investigation of the current state-of-the-art and future requirements of storage area networks that support petascale computers. Dependable scaling of an I/O subsystem to support petascale computing is not as simple as adding more storage servers. In this paper, we present a stochastic activity network model that uses failure rates computed from real logs to predict the reliability and availability of the storage architecture of the Abe cluster at the National Center for Supercomputing Applications (NCSA). We then use the model to evaluate the challenges encountered as one scales the number of storage servers to support petascale computing. The results present new insights regarding the dependability challenges that will be encountered when building next-generation petabyte storage. Furthermore, we provide insight into a new design approach that will enable system designers to integrate the trace-based analysis of parameter values from real system data into their stochastic models to allow informed design choices.

KW - Data analysis
KW - Modeling techniques
KW - Reliability and availability
KW - Simulation
KW - Storage systems
UR - http://www.scopus.com/inward/record.url?scp=53349175680&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=53349175680&partnerID=8YFLogxK
U2 - 10.1109/DSN.2008.4630107
DO - 10.1109/DSN.2008.4630107
M3 - Conference contribution
AN - SCOPUS:53349175680
SP - 386
EP - 391
BT - Proceedings of the International Conference on Dependable Systems and Networks
ER -