Sensitivity of Parallel Applications to Large Differences

Bibliographic Details
Main Authors: Aske Plaat, Henri E. Bal, Rutger F. H. Hofman, Thilo Kielmann
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Published: 1999
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.76.5880
http://www.cs.vu.nl/~kielmann/papers/fgcs00.pdf
Description
Summary: This paper studies application performance on systems with strongly non-uniform remote memory access. In current-generation NUMAs the speed difference between the slowest and fastest links in an interconnect (the “NUMA gap”) is typically less than an order of magnitude, and many conventional parallel programs achieve good performance. We study how different NUMA gaps influence application performance, up to and including typical wide-area latencies and bandwidths. We find that for gaps larger than those of current-generation NUMAs, performance suffers considerably (for applications that were designed for a uniform-access interconnect). For many applications, however, performance can be greatly improved with comparatively simple changes: traffic over slow links can be reduced by making communication patterns hierarchical, like the interconnect itself. We find that in four out of our six applications the size of the gap can be increased by an order of magnitude or more without severely impacting speedup. We analyze why the improvements are needed, why they work so well, and how much non-uniformity they can mask.
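To make the idea of "hierarchical communication patterns" concrete, here is a minimal toy sketch, not the authors' code. It assumes a hypothetical two-layer system of equal-sized clusters and uses a broadcast as the example pattern (the abstract does not say which patterns the six applications use). A flat broadcast sends one message per remote rank over the slow links; a hierarchical one sends a single message per remote cluster to a coordinator (here, hypothetically, the lowest rank in that cluster), which then forwards over fast local links. The cost model is deliberately crude: per-message latencies are simply summed, with the wide-area latency set to `gap` times the local latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoLayerSystem:
    """Hypothetical two-layer interconnect: `clusters` clusters of equal size."""
    clusters: int
    procs_per_cluster: int

    @property
    def size(self) -> int:
        return self.clusters * self.procs_per_cluster

    def cluster_of(self, rank: int) -> int:
        return rank // self.procs_per_cluster


def flat_broadcast(topo: TwoLayerSystem, root: int = 0) -> list[tuple[int, int]]:
    """Topology-unaware: the root sends directly to every other rank."""
    return [(root, dst) for dst in range(topo.size) if dst != root]


def hierarchical_broadcast(topo: TwoLayerSystem, root: int = 0) -> list[tuple[int, int]]:
    """Topology-aware: one wide-area message per remote cluster, then local fan-out."""
    msgs = []
    coordinators = {}
    for c in range(topo.clusters):
        if c == topo.cluster_of(root):
            coordinators[c] = root
        else:
            # Assumption: the lowest rank in each remote cluster acts as coordinator.
            coordinators[c] = c * topo.procs_per_cluster
            msgs.append((root, coordinators[c]))  # the only slow-link hop for cluster c
    for c, coord in coordinators.items():
        first = c * topo.procs_per_cluster
        for dst in range(first, first + topo.procs_per_cluster):
            if dst != coord:
                msgs.append((coord, dst))  # fast, cluster-local hop
    return msgs


def wide_area_messages(topo: TwoLayerSystem, msgs: list[tuple[int, int]]) -> int:
    """Count messages whose endpoints lie in different clusters."""
    return sum(1 for src, dst in msgs if topo.cluster_of(src) != topo.cluster_of(dst))


def serialized_cost(topo: TwoLayerSystem, msgs: list[tuple[int, int]],
                    local_latency: float, gap: float) -> float:
    """Toy cost: sum of per-message latencies; wide-area latency = gap * local latency."""
    total = 0.0
    for src, dst in msgs:
        crosses = topo.cluster_of(src) != topo.cluster_of(dst)
        total += local_latency * gap if crosses else local_latency
    return total


if __name__ == "__main__":
    topo = TwoLayerSystem(clusters=4, procs_per_cluster=16)
    for gap in (1.0, 10.0, 100.0):
        flat = serialized_cost(topo, flat_broadcast(topo), 1.0, gap)
        hier = serialized_cost(topo, hierarchical_broadcast(topo), 1.0, gap)
        print(f"gap={gap:>5}: flat={flat:8.1f}  hierarchical={hier:8.1f}")
```

With 4 clusters of 16 ranks, both variants send 63 messages, but the flat version puts 48 of them on wide-area links while the hierarchical version puts only 3 there. Under this toy model a gap of 10 gives a cost of 495 versus 90 units. The message count is unchanged; the topology-aware version simply moves traffic off the slow links, which is the kind of change the abstract describes. Real costs also depend on bandwidth, overlap, and contention, none of which this sketch models.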