Can high bandwidth and latency justify large cache blocks in scalable multiprocessors?

Ricardo Bianchini, Thomas LeBlanc

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

An important architectural design decision affecting the performance of coherent caches is the choice of block size. There are two primary factors that influence this choice: the reference behavior of applications and the remote access bandwidth and latency of the machine. Given that we anticipate increases in both network bandwidth and latency (in processor cycles) in scalable shared-memory multiprocessors, the question arises as to what effect these increases will have on the choice of block size. We use analytical modeling and execution-driven simulation of parallel programs on a large-scale shared-memory machine to examine the relationship between cache block size and application performance as a function of remote access bandwidth and latency. We show that even under assumptions of high remote access bandwidth and latency, the best application performance usually results from using cache blocks between 32 and 128 bytes in size. We also show that modifying the program to remove the dominant source of misses may not increase the best performing block size. We conclude that large cache blocks cannot be justified in most realistic scenarios.

Original language: English (US)
Title of host publication: Proceedings of the International Conference on Parallel Processing
Publisher: Institute of Electrical and Electronics Engineers Inc.
Volume: 1
ISBN (Print): 0849324939, 9780849324932
DOI: https://doi.org/10.1109/ICPP.1994.66
State: Published - 1994
Externally published: Yes
Event: 23rd International Conference on Parallel Processing, ICPP 1994 - Raleigh, NC, United States
Duration: Aug 15 1994 → Aug 19 1994

Other

Other: 23rd International Conference on Parallel Processing, ICPP 1994
Country: United States
City: Raleigh, NC
Period: 8/15/94 → 8/19/94

Fingerprint

Multiprocessor
Justify
Cache
Latency
Bandwidth
Data storage equipment
Architectural design
Shared-memory multiprocessors
Analytical Modeling
Parallel Programs
Shared Memory
Cycle
Scenarios
Simulation

ASJC Scopus subject areas

  • Software
  • Mathematics(all)
  • Hardware and Architecture

Cite this

Bianchini, R., & LeBlanc, T. (1994). Can high bandwidth and latency justify large cache blocks in scalable multiprocessors? In Proceedings of the International Conference on Parallel Processing (Vol. 1). [4115727] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPP.1994.66

@inproceedings{ee37ca3bfa7e4425b7aa97aa1bffb50d,
title = "Can high bandwidth and latency justify large cache blocks in scalable multiprocessors?",
abstract = "An important architectural design decision affecting the performance of coherent caches is the choice of block size. There are two primary factors that influence this choice: the reference behavior of applications and the remote access bandwidth and latency of the machine. Given that we anticipate increases in both network bandwidth and latency (in processor cycles) in scalable shared-memory multiprocessors, the question arises as to what effect these increases will have on the choice of block size. We use analytical modeling and execution-driven simulation of parallel programs on a large-scale shared-memory machine to examine the relationship between cache block size and application performance as a function of remote access bandwidth and latency. We show that even under assumptions of high remote access bandwidth and latency, the best application performance usually results from using cache blocks between 32 and 128 bytes in size. We also show that modifying the program to remove the dominant source of misses may not increase the best performing block size. We conclude that large cache blocks cannot be justified in most realistic scenarios.",
author = "Ricardo Bianchini and Thomas LeBlanc",
year = "1994",
doi = "10.1109/ICPP.1994.66",
language = "English (US)",
isbn = "0849324939",
volume = "1",
booktitle = "Proceedings of the International Conference on Parallel Processing",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Can high bandwidth and latency justify large cache blocks in scalable multiprocessors?

AU - Bianchini, Ricardo

AU - LeBlanc, Thomas

PY - 1994

Y1 - 1994

N2 - An important architectural design decision affecting the performance of coherent caches is the choice of block size. There are two primary factors that influence this choice: the reference behavior of applications and the remote access bandwidth and latency of the machine. Given that we anticipate increases in both network bandwidth and latency (in processor cycles) in scalable shared-memory multiprocessors, the question arises as to what effect these increases will have on the choice of block size. We use analytical modeling and execution-driven simulation of parallel programs on a large-scale shared-memory machine to examine the relationship between cache block size and application performance as a function of remote access bandwidth and latency. We show that even under assumptions of high remote access bandwidth and latency, the best application performance usually results from using cache blocks between 32 and 128 bytes in size. We also show that modifying the program to remove the dominant source of misses may not increase the best performing block size. We conclude that large cache blocks cannot be justified in most realistic scenarios.

UR - http://www.scopus.com/inward/record.url?scp=84904337098&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84904337098&partnerID=8YFLogxK

U2 - 10.1109/ICPP.1994.66

DO - 10.1109/ICPP.1994.66

M3 - Conference contribution

SN - 0849324939

SN - 9780849324932

VL - 1

BT - Proceedings of the International Conference on Parallel Processing

PB - Institute of Electrical and Electronics Engineers Inc.

ER -