Point-to-point synchronization
- Normally done in software with flags
P0: A = 1; flag = 1;
P1: while (!flag); print A;
- Some old machines supported full/empty bits in memory
- Each memory location is augmented with a full/empty bit
- Producer writes the location only if the bit is reset (empty), then sets it
- Consumer reads the location only if the bit is set (full), then resets it
- Much less flexible: supports only one-producer, one-consumer sharing (yet one-producer, many-consumers sharing is very common); every access to a memory location becomes synchronized (unless the compiler flags some accesses as special)
- Possible optimization for shared memory
- Allocate the flag and the data structures it guards (if small) in the same cache line, e.g., flag and A in the example above
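
The flag example and the cache-line optimization can be sketched in C11 as follows. This is only an illustration under stated assumptions: the 64-byte line size, the struct name mailbox, and the driver run_flag_demo are hypothetical and not from the notes, and the release/acquire orderings are added so that a compiler or a relaxed memory model cannot reorder the write of A past the write of flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stdatomic.h>

/* Assumption: 64-byte cache lines (common, but not universal). */
#define CACHE_LINE 64

/* The flag and the data it guards sit in one cache line, so the
   consumer's miss on the flag can also bring in A. */
struct mailbox {
    alignas(CACHE_LINE) atomic_int flag;
    int A;
};

static struct mailbox box;

/* P0: A = 1; flag = 1; */
static void *producer(void *arg) {
    (void)arg;
    box.A = 1;
    /* Release store: the write to A becomes visible before flag does. */
    atomic_store_explicit(&box.flag, 1, memory_order_release);
    return NULL;
}

/* P1: while (!flag); then read A. */
static void *consumer(void *arg) {
    int *out = arg;
    while (!atomic_load_explicit(&box.flag, memory_order_acquire))
        ;  /* busy-wait on the flag */
    *out = box.A;
    return NULL;
}

/* Hypothetical driver: runs P0 and P1 once and returns the value P1 read. */
int run_flag_demo(void) {
    pthread_t p0, p1;
    int seen = -1;
    box.A = 0;
    atomic_store(&box.flag, 0);
    int rc = pthread_create(&p1, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p0, NULL, producer, NULL);
    assert(rc == 0);
    pthread_join(p0, NULL);
    pthread_join(p1, NULL);
    return seen;
}
```

Calling run_flag_demo() from a main function should return the value written by P0. With a plain int flag and no ordering, the same code could spin forever or read a stale A, which is why the sketch uses atomics.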
Barrier
- High-level classification of barriers
- Hardware and software barriers
- Will focus on two types of software barriers
- Centralized barrier: every processor polls a single count
- Distributed tree barrier: shows much better scalability
- Performance goals of a barrier implementation
- Low latency: After all processors have arrived at the barrier, they should be able to leave quickly
- Low traffic: Minimize bus transactions and contention
- Scalability: Latency and traffic should scale slowly with the number of processors
- Low storage: Barrier state should not be big
- Fairness: Preserve some strict order of barrier exit (could be FIFO according to arrival order); a particular processor should not always be the last one to exit
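
A centralized barrier of the kind listed above might look like the sketch below. The notes only say that every processor polls a single count; the sense flag used here (sense reversal) is an added assumption that lets the same barrier be reused across episodes, and all names (central_barrier, barrier_wait, run_barrier_demo) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    atomic_int count;  /* arrivals in the current episode */
    atomic_int sense;  /* flips each time the barrier opens */
    int nprocs;
} central_barrier;

void barrier_init(central_barrier *b, int nprocs) {
    atomic_init(&b->count, 0);
    atomic_init(&b->sense, 0);
    b->nprocs = nprocs;
}

/* local_sense is per-thread state and must start at 0. */
void barrier_wait(central_barrier *b, int *local_sense) {
    *local_sense = !*local_sense;
    if (atomic_fetch_add(&b->count, 1) == b->nprocs - 1) {
        atomic_store(&b->count, 0);             /* reset for the next episode */
        atomic_store(&b->sense, *local_sense);  /* last arrival releases the rest */
    } else {
        while (atomic_load(&b->sense) != *local_sense)
            ;  /* every waiter polls the same shared location */
    }
}

/* Hypothetical test driver: counts barrier violations over several rounds. */
#define MAX_THREADS 8
#define MAX_ROUNDS 16
static central_barrier bar;
static atomic_int arrived[MAX_ROUNDS];
static atomic_int violations;
static int num_rounds;

static void *worker(void *arg) {
    (void)arg;
    int local_sense = 0;
    for (int r = 0; r < num_rounds; r++) {
        atomic_fetch_add(&arrived[r], 1);
        barrier_wait(&bar, &local_sense);
        /* After the barrier, every thread must have arrived in round r. */
        if (atomic_load(&arrived[r]) != bar.nprocs)
            atomic_fetch_add(&violations, 1);
    }
    return NULL;
}

int run_barrier_demo(int nthreads, int rounds) {
    pthread_t tid[MAX_THREADS];
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    assert(rounds >= 1 && rounds <= MAX_ROUNDS);
    barrier_init(&bar, nthreads);
    num_rounds = rounds;
    atomic_store(&violations, 0);
    for (int r = 0; r < rounds; r++)
        atomic_store(&arrived[r], 0);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&violations);
}
```

All waiters spin on one shared location, so each release invalidates every waiter's cached copy at once. That concentrated traffic is the scalability concern that motivates the distributed tree barrier.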