Adjustable block size coherent caches

Czarek Dubnicki, Thomas J. LeBlanc

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Several studies have shown that the performance of coherent caches depends on the relationship between the cache block size and the granularity of sharing and locality exhibited by the program. Large cache blocks exploit processor and spatial locality, but may cause unnecessary cache invalidations due to false sharing. Small cache blocks can reduce the number of cache invalidations, but increase the number of bus or network transactions required to load data into the cache. In this paper we describe a cache organization that dynamically adjusts the cache block size according to recently observed reference behavior. Cache blocks are split across cache lines when false sharing occurs, and merged back into a single cache line when false sharing ends. To evaluate this cache organization, we simulate a scalable multiprocessor with coherent caches, using a suite of memory reference traces to model program behavior. For each fixed block size, some program suffers a 33% increase in the average waiting time per reference, and a factor of 2 increase in the average number of words transferred per reference, when compared against the performance of an adjustable block size cache. In the few cases where adjusting the block size does not provide superior performance, it comes within 7% of the best fixed block size alternative. We conclude that an adjustable block size cache offers significantly better performance than every fixed block size cache, especially when there is variability in the granularity of sharing exhibited by an application.
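The trade-off the abstract describes can be made concrete with a toy write-invalidate coherence simulation. This is a minimal sketch, not the paper's simulator: the `simulate` function, the synthetic two-processor trace, and the cost model (one miss per block load, one invalidation per remote copy) are illustrative assumptions. Two processors each repeatedly write their own word; the two words share one block when the block size is 2, producing constant false-sharing invalidations, but occupy separate blocks when the block size is 1.

```python
# Toy write-invalidate coherence simulation illustrating false sharing.
# Assumed model: each write loads the block on a miss (one transaction)
# and invalidates every other cached copy of that block.

def simulate(refs, block_size):
    """refs: list of (processor, word_address) write references.
    Returns (misses, invalidations) under write-invalidate coherence."""
    cached = {}  # block number -> set of processors holding a copy
    misses = invalidations = 0
    for proc, addr in refs:
        block = addr // block_size
        holders = cached.setdefault(block, set())
        if proc not in holders:
            misses += 1  # bus/network transaction to load the block
        # the writer invalidates every other cached copy of the block
        invalidations += len(holders - {proc})
        cached[block] = {proc}  # writer now holds the block exclusively
    return misses, invalidations

# Processors 0 and 1 alternately write adjacent words 0 and 1.
refs = [(i % 2, i % 2) for i in range(100)]
big = simulate(refs, block_size=2)    # words share a block: false sharing
small = simulate(refs, block_size=1)  # separate blocks: two cold misses only
print(big, small)  # -> (100, 99) (2, 0)
```

With the 2-word block every write misses and invalidates the other processor's copy; with 1-word blocks all traffic after the two cold misses disappears. An adjustable-block-size cache in the paper's sense would split the shared block on detecting this pattern and merge blocks back when spatial locality returns.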

Original language: English (US)
Title of host publication: Proceedings of the 19th Annual International Symposium on Computer Architecture
Publisher: Publ by ACM
Pages: 170-180
Number of pages: 11
ISBN (Print): 0897915097
State: Published - 1993
Externally published: Yes
Event: Proceedings of the 19th Annual International Symposium on Computer Architecture - Gold Coast, Australia
Duration: May 19, 1992 – May 21, 1992

Other

Other: Proceedings of the 19th Annual International Symposium on Computer Architecture
City: Gold Coast, Australia
Period: 5/19/92 – 5/21/92

Fingerprint

Data storage equipment

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Dubnicki, C., & LeBlanc, T. J. (1993). Adjustable block size coherent caches. In Proceedings of the 19th Annual International Symposium on Computer Architecture (pp. 170-180). Publ by ACM.

Adjustable block size coherent caches. / Dubnicki, Czarek; LeBlanc, Thomas J.

Proceedings of the 19th Annual International Symposium on Computer Architecture. Publ by ACM, 1993. p. 170-180.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Dubnicki, C & LeBlanc, TJ 1993, Adjustable block size coherent caches. in Proceedings of the 19th Annual International Symposium on Computer Architecture. Publ by ACM, pp. 170-180, Proceedings of the 19th Annual International Symposium on Computer Architecture, Gold Coast, Australia, 5/19/92.
Dubnicki C, LeBlanc TJ. Adjustable block size coherent caches. In Proceedings of the 19th Annual International Symposium on Computer Architecture. Publ by ACM. 1993. p. 170-180
Dubnicki, Czarek ; LeBlanc, Thomas J. / Adjustable block size coherent caches. Proceedings of the 19th Annual International Symposium on Computer Architecture. Publ by ACM, 1993. pp. 170-180
@inproceedings{89f5553ba1014ea5a563a6faf868bec1,
title = "Adjustable block size coherent caches",
abstract = "Several studies have shown that the performance of coherent caches depends on the relationship between the cache block size and the granularity of sharing and locality exhibited by the program. Large cache blocks exploit processor and spatial locality, but may cause unnecessary cache invalidations due to false sharing. Small cache blocks can reduce the number of cache invalidations, but increase the number of bus or network transactions required to load data into the cache. In this paper we describe a cache organization that dynamically adjusts the cache block size according to recently observed reference behavior. Cache blocks are split across cache lines when false sharing occurs, and merged back into a single cache line when false sharing ends. To evaluate this cache organization, we simulate a scalable multiprocessor with coherent caches, using a suite of memory reference traces to model program behavior. For each fixed block size, some program suffers a 33{\%} increase in the average waiting time per reference, and a factor of 2 increase in the average number of words transferred per reference, when compared against the performance of an adjustable block size cache. In the few cases where adjusting the block size does not provide superior performance, it comes within 7{\%} of the best fixed block size alternative. We conclude that an adjustable block size cache offers significantly better performance than every fixed block size cache, especially when there is variability in the granularity of sharing exhibited by an application.",
author = "Czarek Dubnicki and LeBlanc, {Thomas J.}",
year = "1993",
language = "English (US)",
isbn = "0897915097",
pages = "170--180",
booktitle = "Proceedings of the 19th Annual International Symposium on Computer Architecture",
publisher = "Publ by ACM",

}

TY - GEN

T1 - Adjustable block size coherent caches

AU - Dubnicki, Czarek

AU - LeBlanc, Thomas J.

PY - 1993

Y1 - 1993

N2 - Several studies have shown that the performance of coherent caches depends on the relationship between the cache block size and the granularity of sharing and locality exhibited by the program. Large cache blocks exploit processor and spatial locality, but may cause unnecessary cache invalidations due to false sharing. Small cache blocks can reduce the number of cache invalidations, but increase the number of bus or network transactions required to load data into the cache. In this paper we describe a cache organization that dynamically adjusts the cache block size according to recently observed reference behavior. Cache blocks are split across cache lines when false sharing occurs, and merged back into a single cache line when false sharing ends. To evaluate this cache organization, we simulate a scalable multiprocessor with coherent caches, using a suite of memory reference traces to model program behavior. For each fixed block size, some program suffers a 33% increase in the average waiting time per reference, and a factor of 2 increase in the average number of words transferred per reference, when compared against the performance of an adjustable block size cache. In the few cases where adjusting the block size does not provide superior performance, it comes within 7% of the best fixed block size alternative. We conclude that an adjustable block size cache offers significantly better performance than every fixed block size cache, especially when there is variability in the granularity of sharing exhibited by an application.

AB - Several studies have shown that the performance of coherent caches depends on the relationship between the cache block size and the granularity of sharing and locality exhibited by the program. Large cache blocks exploit processor and spatial locality, but may cause unnecessary cache invalidations due to false sharing. Small cache blocks can reduce the number of cache invalidations, but increase the number of bus or network transactions required to load data into the cache. In this paper we describe a cache organization that dynamically adjusts the cache block size according to recently observed reference behavior. Cache blocks are split across cache lines when false sharing occurs, and merged back into a single cache line when false sharing ends. To evaluate this cache organization, we simulate a scalable multiprocessor with coherent caches, using a suite of memory reference traces to model program behavior. For each fixed block size, some program suffers a 33% increase in the average waiting time per reference, and a factor of 2 increase in the average number of words transferred per reference, when compared against the performance of an adjustable block size cache. In the few cases where adjusting the block size does not provide superior performance, it comes within 7% of the best fixed block size alternative. We conclude that an adjustable block size cache offers significantly better performance than every fixed block size cache, especially when there is variability in the granularity of sharing exhibited by an application.

UR - http://www.scopus.com/inward/record.url?scp=0027805837&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0027805837&partnerID=8YFLogxK

M3 - Conference contribution

SN - 0897915097

SP - 170

EP - 180

BT - Proceedings of the 19th Annual International Symposium on Computer Architecture

PB - Publ by ACM

ER -