Using processor affinity in loop scheduling on shared-memory multiprocessors

Evangelos P. Markatos, Thomas J. LeBlanc

Research output: Contribution to journal › Article

102 Citations (Scopus)

Abstract

Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempted to achieve the minimum completion time by distributing the workload as evenly as possible while minimizing the number of synchronization operations required. In this paper, we consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. We show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. We propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. We compare the performance of this new algorithm to other known algorithms by using five representative kernel programs on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new algorithm offers substantial performance improvements, up to a factor of 4 in some cases. We conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.
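The affinity scheduling idea described in the abstract — give each processor an initial "home" block of iterations, let it consume its own block in shrinking chunks, and only steal from the most loaded processor when the local block runs out — can be illustrated with a small sketch. This is a single-threaded simulation of the scheduling decisions only, written for this record; the function name, the round-robin turn order, and the exact chunk fractions are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of affinity scheduling: each processor owns a
# contiguous block of iterations, takes 1/P of its remaining block per
# turn (preserving locality), and steals from the most loaded processor
# only when its own block is empty. Single-threaded simulation.
import math

def affinity_schedule(n_iters, n_procs):
    """Return, per processor, the ordered list of (lo, hi) chunks it executes."""
    block = math.ceil(n_iters / n_procs)
    # remaining[p] = (lo, hi): half-open range of iterations still owned by p.
    remaining = [(p * block, min((p + 1) * block, n_iters))
                 for p in range(n_procs)]
    executed = [[] for _ in range(n_procs)]
    active = True
    while active:
        active = False
        for p in range(n_procs):
            lo, hi = remaining[p]
            if lo >= hi:
                # Local block exhausted: steal from the most loaded processor.
                victim = max(range(n_procs),
                             key=lambda q: remaining[q][1] - remaining[q][0])
                vlo, vhi = remaining[victim]
                if vlo >= vhi:
                    continue  # no work left anywhere this turn
                take = math.ceil((vhi - vlo) / n_procs)
                remaining[victim] = (vlo, vhi - take)
                executed[p].append((vhi - take, vhi))
                active = True
            else:
                # Take 1/P of the remaining local block (at least 1 iteration).
                take = max(1, (hi - lo) // n_procs)
                executed[p].append((lo, lo + take))
                remaining[p] = (lo + take, hi)
                active = True
    return executed

if __name__ == "__main__":
    for p, chunks in enumerate(affinity_schedule(32, 4)):
        print(f"P{p}: {chunks}")
```

Because each processor's early chunks come from its own block, iterations stay co-located with the data that block touched on previous executions of the loop; stealing happens only under imbalance, which is what lets the algorithm balance load without giving up locality.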

Original language: English (US)
Pages (from-to): 379-400
Number of pages: 22
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 5
Issue number: 4
DOI: 10.1109/71.273046
State: Published - Apr 1994


ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Electrical and Electronic Engineering
  • Theoretical Computer Science

Cite this

Using processor affinity in loop scheduling on shared-memory multiprocessors. / Markatos, Evangelos P.; LeBlanc, Thomas J.

In: IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 4, 04.1994, p. 379-400.

Research output: Contribution to journal › Article

Markatos, Evangelos P. ; LeBlanc, Thomas J. / Using processor affinity in loop scheduling on shared-memory multiprocessors. In: IEEE Transactions on Parallel and Distributed Systems. 1994 ; Vol. 5, No. 4. pp. 379-400.
@article{87176a5f96404591b801a315357fe322,
title = "Using processor affinity in loop scheduling on shared-memory multiprocessors",
abstract = "Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempted to achieve the minimum completion time by distributing the workload as evenly as possible while minimizing the number of synchronization operations required. In this paper, we consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. We show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. We propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. We compare the performance of this new algorithm to other known algorithms by using five representative kernel programs on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new algorithm offers substantial performance improvements, up to a factor of 4 in some cases. We conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.",
author = "Markatos, {Evangelos P.} and LeBlanc, {Thomas J.}",
year = "1994",
month = "4",
doi = "10.1109/71.273046",
language = "English (US)",
volume = "5",
pages = "379--400",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "4",

}

TY - JOUR

T1 - Using processor affinity in loop scheduling on shared-memory multiprocessors

AU - Markatos, Evangelos P.

AU - LeBlanc, Thomas J.

PY - 1994/4

Y1 - 1994/4

AB - Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempted to achieve the minimum completion time by distributing the workload as evenly as possible while minimizing the number of synchronization operations required. In this paper, we consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. We show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. We propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. We compare the performance of this new algorithm to other known algorithms by using five representative kernel programs on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new algorithm offers substantial performance improvements, up to a factor of 4 in some cases. We conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.

UR - http://www.scopus.com/inward/record.url?scp=0028419803&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0028419803&partnerID=8YFLogxK

U2 - 10.1109/71.273046

DO - 10.1109/71.273046

M3 - Article

AN - SCOPUS:0028419803

VL - 5

SP - 379

EP - 400

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

SN - 1045-9219

IS - 4

ER -