Hierarchical design
- Possible to combine bus-based SMP and DSM to build hierarchical shared memory
- Sun Wildfire connects four large SMPs (28 processors each) over a scalable interconnect to form a 112-processor multiprocessor
- IBM POWER4 has two processors on chip with private L1 caches but shared L2 and L3 caches (this is called a chip multiprocessor); these chips are connected over a network to form scalable multiprocessors
- Next few lectures will focus on bus-based SMPs only
Cache Coherence
- Intuitive memory model
- For sequential programs we expect a memory location to return the latest value written to that location
- For concurrent programs running as multiple threads or processes on a single processor, we expect the same model to hold because all threads see the same cache hierarchy (just as if they shared one L1 cache)
- For multiprocessors there is a danger of using a stale value: in an SMP or DSM the caches are not shared, and processors may replicate data independently in their own caches; hardware must ensure that cached values are coherent across the system and that they satisfy the programmer's intuitive memory model
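To make the intuitive model concrete, here is a minimal checker; the function `stale_reads` and its trace format are hypothetical, chosen only for illustration. It assumes every access can be placed in one global order, which is exactly the assumption that becomes hard to justify once each processor has its own cache. The sample trace is the write-through example that follows, in the order listed there.

```python
def stale_reads(trace, initial):
    """Return indices of reads that do not return the latest value written.

    trace: (proc, op, addr, value) tuples in ONE assumed global order,
    where op is "R" (value returned by the load) or "W" (value stored).
    """
    latest = dict(initial)                # addr -> last value written so far
    bad = []
    for i, (proc, op, addr, value) in enumerate(trace):
        if op == "W":
            latest[addr] = value
        elif value != latest[addr]:       # a read saw something other than the last write
            bad.append(i)
    return bad


trace = [
    (0, "R", "x", 5),
    (1, "R", "x", 5),
    (1, "W", "x", 7),
    (0, "R", "x", 5),                     # P0 still sees 5 after P1 wrote 7
    (2, "R", "x", 7),
    (2, "W", "x", 10),
]
print(stale_reads(trace, {"x": 5}))       # [3]
```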
Example
- Assume a write-through cache, i.e., every store updates the value in the cache as well as in memory
- P0: reads x from memory, puts it in its cache, and gets the value 5
- P1: reads x from memory, puts it in its cache, and gets the value 5
- P1: writes x=7, updates its cached value and memory value
- P0: reads x from its cache and gets the value 5
- P2: reads x from memory, puts it in its cache, and gets the value 7 (now the system is completely incoherent)
- P2: writes x=10, updates its cached value and memory value
- Consider the same example with a writeback cache, i.e., values are written back to memory only when the cache line is evicted
- P0 has a cached value 5, P1 has 7, P2 has 10, and memory has 5 (since the caches are not write-through)
- The line is in state M (modified) in P1 and P2, while the line in P0 is clean
- Evicting the line from P1 or P2 issues a writeback, while evicting it from P0 does not (clean lines need no writeback)
- Suppose P2 evicts the line first, and then P1
- Final memory value is 7: we lost the store x=10 from P2
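Both scenarios can be replayed with a small simulator. This is a hypothetical sketch (the `Cache` class and `run_example` function are illustrative names, not taken from any real simulator): each processor has a private cache with no coherence mechanism at all, main memory is a shared dict, and in the writeback case the evictions are triggered by hand in the order used above (P2, then P1, then P0).

```python
class Cache:
    """Hypothetical private cache with NO coherence mechanism."""

    def __init__(self, memory, write_through):
        self.memory = memory              # shared main memory: addr -> value
        self.write_through = write_through
        self.lines = {}                   # addr -> [value, dirty]

    def read(self, addr):
        if addr not in self.lines:        # miss: fetch the line from memory
            self.lines[addr] = [self.memory[addr], False]
        return self.lines[addr][0]        # hit: return the (possibly stale) copy

    def write(self, addr, value):
        # Every store in this example hits, because each writer read x first.
        self.lines[addr] = [value, not self.write_through]
        if self.write_through:
            self.memory[addr] = value     # write-through: update memory too

    def evict(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:                         # only modified lines are written back
            self.memory[addr] = value


def run_example(write_through):
    memory = {"x": 5}
    p0, p1, p2 = (Cache(memory, write_through) for _ in range(3))
    p0.read("x")
    p1.read("x")
    p1.write("x", 7)
    p0_sees = p0.read("x")                # hits on P0's stale copy
    p2_sees = p2.read("x")                # 7 with write-through, 5 with writeback
    p2.write("x", 10)
    if not write_through:
        p2.evict("x")                     # writes back 10
        p1.evict("x")                     # writes back 7, overwriting 10
        p0.evict("x")                     # clean: no writeback
    return p0_sees, p2_sees, memory["x"]


print(run_example(write_through=True))    # (5, 7, 10)
print(run_example(write_through=False))   # (5, 5, 7)
```

In the writeback run, P2 also reads a stale 5 from memory, because P1's store of 7 has not yet been written back.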
What went wrong?
- For a write-through cache
- The memory value may be correct if the writes are correctly ordered
- But the system allowed a store to proceed while other cached copies of the line already existed
- Lesson learned: must invalidate all cached copies before allowing a store to proceed
- For a writeback cache
- The problem is even more complicated: stores are no longer immediately visible to memory
- Writeback order is important
- Lesson learned: do not allow more than one copy of a cache line in M state
- Need to formalize the intuitive memory model
- In sequential programs the order of read/write is defined by the program order; the notion of “last write” is well-defined
- For multiprocessors how do you define “last write to a memory location” in presence of independent caches?
- Within a processor, program order still defines it; but how do you order reads and writes across processors?
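The two lessons can be combined into a write-invalidate scheme. Below is a hypothetical sketch in the style of an MSI snoopy invalidation protocol; the class `MSISystem`, its methods, and the M/S/I state names are assumptions for illustration, not a protocol these slides have defined yet. A store first invalidates every other cached copy, so only the writer holds the line in M; a read miss to a line held in M elsewhere forces the owner to flush its value. Because the replay runs one operation at a time, the sequence of calls stands in for a single global order of accesses.

```python
class MSISystem:
    """Hypothetical snoopy write-invalidate system with writeback caches."""

    def __init__(self, memory, nprocs):
        self.memory = memory
        # caches[p] maps addr -> [state, value]; a missing entry means state I
        self.caches = [{} for _ in range(nprocs)]

    def _snoop(self, requester, addr, invalidate):
        for p, cache in enumerate(self.caches):
            if p == requester or addr not in cache:
                continue
            state, value = cache[addr]
            if state == "M":              # the single dirty owner flushes to memory
                self.memory[addr] = value
                cache[addr] = ["S", value]
            if invalidate:                # a store needs every other copy gone
                del cache[addr]

    def read(self, p, addr):
        cache = self.caches[p]
        if addr not in cache:             # read miss
            self._snoop(p, addr, invalidate=False)
            cache[addr] = ["S", self.memory[addr]]
        return cache[addr][1]

    def write(self, p, addr, value):
        cache = self.caches[p]
        if cache.get(addr, ["I"])[0] != "M":
            self._snoop(p, addr, invalidate=True)  # invalidate before the store proceeds
        cache[addr] = ["M", value]        # memory is not updated (writeback cache)

    def evict(self, p, addr):
        entry = self.caches[p].pop(addr, None)
        if entry is not None and entry[0] == "M":
            self.memory[addr] = entry[1]

    def single_writer(self, addr):
        states = [c[addr][0] for c in self.caches if addr in c]
        # if any copy is in M, it must be the only copy of the line
        return "M" not in states or len(states) == 1


def replay():
    memory = {"x": 5}
    bus = MSISystem(memory, nprocs=3)
    bus.read(0, "x")
    bus.read(1, "x")
    bus.write(1, "x", 7)                  # P0's copy is invalidated first
    assert bus.single_writer("x")
    p0_sees = bus.read(0, "x")            # miss; P1 flushes 7 and drops to S
    p2_sees = bus.read(2, "x")
    bus.write(2, "x", 10)                 # P0 and P1 copies are invalidated
    assert bus.single_writer("x")
    bus.evict(2, "x")                     # the only dirty copy is written back
    bus.evict(1, "x")                     # P1 no longer holds the line: no-op
    return p0_sees, p2_sees, memory["x"]


print(replay())                           # (7, 7, 10)
```

With invalidation in place, P0 and P2 both observe 7, and the final memory value is 10, so the store x=10 is no longer lost.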