The Advantages of Multiple Parallelizations in Combinatorial Search

L. A. Crowl, M. E. Crovella, T. J. Leblanc, M. L. Scott

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Applications typically have several potential sources of parallelism, and in choosing a particular parallelization, the programmer must balance the benefits of each source of parallelism with the corresponding overhead. The trade-offs are often difficult to analyze, as they may depend on the hardware architecture, software environment, input data, and properties of the algorithm. An example of this dilemma occurs in a wide range of problems that involve processing trees, wherein processors can be assigned either to separate subtrees, or to parallelizing the work performed on individual tree nodes. We explore the complexity of the trade-offs involved in this decision by considering alternative parallelizations of combinatorial search, examining the factors that determine the best-performing implementation for this important class of problems. Using subgraph isomorphism as a representative search problem, we show how the density of the solution space, the number of solutions desired, the number of available processors, and the underlying architecture all affect the choice of an efficient parallelization. Our experiments, which span seven different shared-memory multiprocessors and a wide range of input graphs, indicate that relative performance depends on each of these factors. On some machines and for some inputs, a sequential depth-first search of the solution space, applying simple loop-level parallelism at each node in the search tree, performs best. On other machines or other inputs, parallel tree search performs best. In still other cases, a hybrid solution, containing both parallel tree search and loop parallelism, works best. We present a quantitative analysis that explains these results and present experimental data culled from thousands of program executions that validates the analysis. From these experiences we conclude that there is no one "best" parallelization that suffices over a range of machines, inputs, and precise problem specifications. As a corollary, we provide quantitative evidence that programming environments and languages should not focus exclusively on flat data parallelism, since nested parallelism or hybrid forms of parallelism may be required for an efficient implementation of some applications.
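The two parallelization choices the abstract contrasts can be sketched on a toy backtracking search. This is an illustrative sketch only, not code from the paper: the tree (depth 2, branching factor 3), the `consistent` feasibility test, and all function names are hypothetical stand-ins for the real subgraph-isomorphism search.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy search tree: nodes are partial solutions (lists of choices).
# Depth 2 and branching factor 3 are arbitrary illustrative values.
def candidates(node):
    return [node + [c] for c in range(3)] if len(node) < 2 else []

def consistent(node):
    # Stand-in for the per-candidate feasibility test whose iterations
    # loop-level parallelism would distribute across processors.
    return True

def is_solution(node):
    return len(node) == 2

def dfs_sequential(node):
    # Plain sequential depth-first search of one subtree.
    out = [node] if is_solution(node) else []
    for child in candidates(node):
        if consistent(child):
            out.extend(dfs_sequential(child))
    return out

def dfs_loop_parallel(node, pool):
    # Loop-level (node) parallelism: the descent is sequential, but the
    # per-node candidate checks run concurrently.
    if is_solution(node):
        return [node]
    kids = candidates(node)
    out = []
    for child, ok in zip(kids, pool.map(consistent, kids)):
        if ok:
            out.extend(dfs_loop_parallel(child, pool))
    return out

def tree_parallel(root, pool):
    # Tree-level parallelism: each top-level subtree is explored
    # sequentially, but the subtrees themselves run concurrently.
    return [s for sub in pool.map(dfs_sequential, candidates(root))
            for s in sub]

with ThreadPoolExecutor(max_workers=4) as pool:
    a = dfs_loop_parallel([], pool)
    b = tree_parallel([], pool)
```

A hybrid in the paper's sense would combine both: spawn subtree tasks near the root and switch to loop parallelism deeper in the tree, which is where the machine- and input-dependent trade-offs the abstract describes arise.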

Original language: English (US)
Pages (from-to): 110-123
Number of pages: 14
Journal: Journal of Parallel and Distributed Computing
Volume: 21
Issue number: 1
DOIs: 10.1006/jpdc.1994.1045
State: Published - Apr 1994
Externally published: Yes

Fingerprint

  • Parallelization
  • Parallelism
  • Search Trees
  • Trade-offs
  • Or-parallelism
  • Range of data
  • Data Parallelism
  • Software architecture
  • Shared-memory multiprocessors
  • Depth-first Search
  • Programming Environments
  • Hardware Architecture
  • Computer hardware
  • Search Problems
  • Dilemma
  • Number of Solutions
  • Vertex of a graph
  • Quantitative Analysis
  • Efficient Implementation
  • Programming Languages

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Hardware and Architecture
  • Computer Science Applications

Cite this

The Advantages of Multiple Parallelizations in Combinatorial Search. / Crowl, L. A.; Crovella, M. E.; Leblanc, T. J.; Scott, M. L.

In: Journal of Parallel and Distributed Computing, Vol. 21, No. 1, 04.1994, p. 110-123.

@article{66fa9b1421344500a9abd35593e6851b,
title = "The Advantages of Multiple Parallelizations in Combinatorial Search",
abstract = "Applications typically have several potential sources of parallelism, and in choosing a particular parallelization, the programmer must balance the benefits of each source of parallelism with the corresponding overhead. The trade-offs are often difficult to analyze, as they may depend on the hardware architecture, software environment, input data, and properties of the algorithm. An example of this dilemma occurs in a wide range of problems that involve processing trees, wherein processors can be assigned either to separate subtrees, or to parallelizing the work performed on individual tree nodes. We explore the complexity of the trade-offs involved in this decision by considering alternative parallelizations of combinatorial search, examining the factors that determine the best-performing implementation for this important class of problems. Using subgraph isomorphism as a representative search problem, we show how the density of the solution space, the number of solutions desired, the number of available processors, and the underlying architecture all affect the choice of an efficient parallelization. Our experiments, which span seven different shared-memory multiprocessors and a wide range of input graphs, indicate that relative performance depends on each of these factors. On some machines and for some inputs, a sequential depth-first search of the solution space, applying simple loop-level parallelism at each node in the search tree, performs best. On other machines or other inputs, parallel tree search performs best. In still other cases, a hybrid solution, containing both parallel tree search and loop parallelism, works best. We present a quantitative analysis that explains these results and present experimental data culled from thousands of program executions that validates the analysis. From these experiences we conclude that there is no one {"}best{"} parallelization that suffices over a range of machines, inputs, and precise problem specifications. As a corollary, we provide quantitative evidence that programming environments and languages should not focus exclusively on flat data parallelism, since nested parallelism or hybrid forms of parallelism may be required for an efficient implementation of some applications.",
author = "Crowl, {L. A.} and Crovella, {M. E.} and Leblanc, {T. J.} and Scott, {M. L.}",
year = "1994",
month = "4",
doi = "10.1006/jpdc.1994.1045",
language = "English (US)",
volume = "21",
pages = "110--123",
journal = "Journal of Parallel and Distributed Computing",
issn = "0743-7315",
publisher = "Academic Press Inc.",
number = "1",

}

TY - JOUR

T1 - The Advantages of Multiple Parallelizations in Combinatorial Search

AU - Crowl, L. A.

AU - Crovella, M. E.

AU - Leblanc, T. J.

AU - Scott, M. L.

PY - 1994/4

Y1 - 1994/4

AB - Applications typically have several potential sources of parallelism, and in choosing a particular parallelization, the programmer must balance the benefits of each source of parallelism with the corresponding overhead. The trade-offs are often difficult to analyze, as they may depend on the hardware architecture, software environment, input data, and properties of the algorithm. An example of this dilemma occurs in a wide range of problems that involve processing trees, wherein processors can be assigned either to separate subtrees, or to parallelizing the work performed on individual tree nodes. We explore the complexity of the trade-offs involved in this decision by considering alternative parallelizations of combinatorial search, examining the factors that determine the best-performing implementation for this important class of problems. Using subgraph isomorphism as a representative search problem, we show how the density of the solution space, the number of solutions desired, the number of available processors, and the underlying architecture all affect the choice of an efficient parallelization. Our experiments, which span seven different shared-memory multiprocessors and a wide range of input graphs, indicate that relative performance depends on each of these factors. On some machines and for some inputs, a sequential depth-first search of the solution space, applying simple loop-level parallelism at each node in the search tree, performs best. On other machines or other inputs, parallel tree search performs best. In still other cases, a hybrid solution, containing both parallel tree search and loop parallelism, works best. We present a quantitative analysis that explains these results and present experimental data culled from thousands of program executions that validates the analysis. From these experiences we conclude that there is no one "best" parallelization that suffices over a range of machines, inputs, and precise problem specifications. As a corollary, we provide quantitative evidence that programming environments and languages should not focus exclusively on flat data parallelism, since nested parallelism or hybrid forms of parallelism may be required for an efficient implementation of some applications.

UR - http://www.scopus.com/inward/record.url?scp=0347067572&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0347067572&partnerID=8YFLogxK

U2 - 10.1006/jpdc.1994.1045

DO - 10.1006/jpdc.1994.1045

M3 - Article

VL - 21

SP - 110

EP - 123

JO - Journal of Parallel and Distributed Computing

JF - Journal of Parallel and Distributed Computing

SN - 0743-7315

IS - 1

ER -