Artifactual communication
- Communication caused by artifacts of the extended memory hierarchy
- Data accesses not satisfied in the cache or local memory cause communication
- Inherent communication is caused by data transfers determined by the program
- Artifactual communication is caused by: poor allocation of data across distributed memories; unnecessary data in a transfer; unnecessary transfers due to system-dependent transfer granularity; redundant communication of data; and finite replication capacity (in cache or memory)
- Inherent communication assumes infinite capacity and perfect knowledge of what should be transferred
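To illustrate how transfer granularity creates artifactual communication, the sketch below compares the bytes a program actually needs with the bytes moved when data travels in whole cache lines. The function name `traffic_bytes`, the 8-byte word, and the 64-byte line are assumptions chosen for illustration, and the model assumes each line is fetched only once (no capacity limit).

```python
def traffic_bytes(byte_addresses, word_size=8, line_size=64):
    """Return (useful_bytes, transferred_bytes) for a set of remote accesses.

    useful_bytes      : distinct words the program touches (inherent need)
    transferred_bytes : distinct lines fetched, each moved as a whole line
    Assumes every line is transferred exactly once (infinite cache).
    """
    words = {addr // word_size for addr in byte_addresses}
    lines = {addr // line_size for addr in byte_addresses}
    return len(words) * word_size, len(lines) * line_size


# Strided access: one 8-byte word from each of 16 different 64-byte lines.
strided = [i * 64 for i in range(16)]
# Contiguous access: 16 adjacent words, packed into two lines.
contiguous = [i * 8 for i in range(16)]

print(traffic_bytes(strided))     # (128, 1024)
print(traffic_bytes(contiguous))  # (128, 128)
```

Both access patterns need the same 128 bytes, but the strided one moves eight times as much data. The extra 896 bytes are artifactual communication under this simple model.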
Capacity problem
- The most common cause of artifactual communication
- Due to finite capacity of cache, local memory or remote memory
- May view a multiprocessor as a three-level memory hierarchy for this purpose: local cache, local memory, remote memory
- Communication due to cold or compulsory misses and inherent communication are independent of capacity
- Capacity and conflict misses generate communication resulting from finite capacity
- Generated traffic may be local or remote depending on the allocation of pages
- General technique: exploit spatial and temporal locality to use the cache properly
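The distinction between capacity-independent and capacity-dependent misses can be sketched with a small cache simulation. The code below assumes a fully associative cache with LRU replacement (so there are no conflict misses) and a single processor (so there are no coherence misses). The function name `classify_misses` and the reference trace are hypothetical.

```python
from collections import OrderedDict


def classify_misses(trace, capacity):
    """Simulate a fully associative LRU cache holding `capacity` blocks.

    Returns (cold, capacity_misses):
      cold            : first-ever reference to a block
      capacity_misses : reference to a block that was evicted earlier
    """
    cache = OrderedDict()  # keys are resident blocks, ordered by recency
    seen = set()           # every block referenced so far
    cold = cap = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)   # hit: mark as most recently used
            continue
        if block in seen:
            cap += 1                   # was resident once, then evicted
        else:
            cold += 1                  # never referenced before
            seen.add(block)
        cache[block] = True
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used block
    return cold, cap


trace = list(range(8)) * 3  # sweep over the same 8 blocks three times

print(classify_misses(trace, capacity=4))  # (8, 16)
print(classify_misses(trace, capacity=8))  # (8, 0)
```

The eight cold misses appear at either capacity, while the sixteen capacity misses disappear once the working set fits in the cache.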
Temporal locality
- Maximize reuse of data
- Schedule tasks that access same data in close succession
- Many linear algebra kernels use blocking of matrices to improve temporal (and spatial) locality
- Example: the transpose phase in the Fast Fourier Transform (FFT); to improve locality, the algorithm carries out a blocked transpose, i.e., it transposes one block of data at a time
Block transpose
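A minimal sketch of the loop structure of a blocked transpose follows. It is not the FFT code itself: the function name `blocked_transpose` and the block size of 2 are assumptions for illustration. In practice the block size would be chosen so that a source block and its destination block fit in the cache together. Pure Python does not expose the cache benefit, so the code only shows the access order.

```python
def blocked_transpose(a, block=2):
    """Transpose a rectangular matrix (list of rows) one block at a time."""
    n_rows, n_cols = len(a), len(a[0])
    out = [[None] * n_rows for _ in range(n_cols)]
    # Outer loops walk over block origins.
    for i0 in range(0, n_rows, block):
        for j0 in range(0, n_cols, block):
            # Inner loops stay inside one block, so the rows of `a` and
            # the rows of `out` touched here are reused before moving on.
            for i in range(i0, min(i0 + block, n_rows)):
                for j in range(j0, min(j0 + block, n_cols)):
                    out[j][i] = a[i][j]
    return out


matrix = [[r * 4 + c for c in range(4)] for r in range(3)]  # 3 x 4
transposed = blocked_transpose(matrix)                      # 4 x 3
```

The `min(...)` bounds handle matrices whose dimensions are not multiples of the block size.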