Module 3: Fundamentals of Parallel Computers: ILP vs TLP
  Lecture 5: Communication Architectures and Communication Costs
 


Ordering

  • How are the accesses to the same data ordered?
    • For the sequential model, it is the program order, i.e., the true dependence order
    • For shared memory, within a thread it is the program order; across threads it is some “valid interleaving” of accesses, as expected by the programmer and enforced by synchronization operations (locks, point-to-point synchronization through flags, global synchronization through barriers); a flag-based sketch follows this list
    • Ordering issues are very subtle and important in the shared memory model (some microprocessor re-ordering tricks may easily violate correctness when used in a shared memory context)
    • For message passing, ordering across threads is implied through point-to-point send/receive pairs (producer-consumer relationship) and mutual exclusion is inherent (there are no shared variables)
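A minimal sketch of point-to-point synchronization through a flag in shared memory (POSIX threads with C11 atomics; the variable and function names are illustrative, not from the lecture). The release/acquire pair on the flag is what makes the interleaving “valid”: without it, the compiler or processor may reorder the data write past the flag write and the consumer could read a stale value.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static int        shared_data = 0;   /* data communicated between the threads */
    static atomic_int ready = 0;         /* point-to-point synchronization flag   */

    static void *producer(void *arg) {
        shared_data = 42;                                        /* write data first */
        atomic_store_explicit(&ready, 1, memory_order_release);  /* then set flag    */
        return NULL;
    }

    static void *consumer(void *arg) {
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                                   /* spin until the producer sets flag */
        printf("consumer read %d\n", shared_data);  /* guaranteed to observe 42      */
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }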

Replication

  • How is the shared data locally replicated?
    • This is very important for reducing communication traffic
    • In microprocessors data is replicated in the cache to reduce memory accesses
    • In message passing, replication is explicit in the program and happens through receive (a private copy is created); see the sketch after this list
    • In shared memory, a load brings the data into the cache hierarchy so that subsequent accesses can be fast; this is totally hidden from the program, and therefore the hardware must provide a layer that keeps track of the most recent copies of the data (this layer is central to the performance of shared memory multiprocessors and is called the cache coherence protocol)
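A minimal message passing sketch of the same producer-consumer exchange (MPI, assuming a two-process run; the rank layout and variable names are illustrative). The receive copies the data into the receiver's own buffer, so replication is explicit and the consumer works on a private copy; ordering across the processes is implied by the matching send/receive pair.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* producer sends */
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);            /* receive creates a private copy */
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }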