|
Shared vs. private in CMPs
- Shared caches in CMPs are often very large
- They are banked to avoid worst-case wire delay
- The banks are usually distributed across the floor of the chip on an interconnect
- In shared caches, getting a block from a remote bank takes time proportional to the physical distance between the requester and the bank
- Non-uniform cache architecture (NUCA)
- The same holds for private caches when the data resides in a remote cache
- A shared cache may have a higher average hit latency than a private cache
- Hopefully most hits in the latter will be local
- Shared caches are likely to have fewer misses than private caches
- The latter waste space due to replication
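The distance-dependent hit latency described above can be illustrated with a small Python sketch. Everything numeric here is an assumption chosen for illustration, not a figure from these notes: a hypothetical 4x4 mesh with one core and one cache bank per tile, a 10-cycle bank access, and 2 cycles per hop in each direction. The names `hit_latency` and `shared_avg_hit_latency` are likewise made up for this example.

```python
# Assumed parameters (illustrative only, not from the notes).
MESH = 4          # tiles per side; tile i holds core i and bank i
BANK_ACCESS = 10  # cycles to access a bank once the request arrives
HOP = 2           # cycles per mesh hop, in one direction


def hops(a, b):
    """Manhattan distance between tiles a and b on the mesh."""
    ax, ay = divmod(a, MESH)
    bx, by = divmod(b, MESH)
    return abs(ax - bx) + abs(ay - by)


def hit_latency(core, bank):
    """Round-trip hit latency: request out, bank access, data back."""
    return BANK_ACCESS + 2 * HOP * hops(core, bank)


def shared_avg_hit_latency(core):
    """Average hit latency when blocks are spread evenly over all banks."""
    n = MESH * MESH
    return sum(hit_latency(core, b) for b in range(n)) / n


if __name__ == "__main__":
    print("local bank hit:", hit_latency(0, 0))
    print("farthest bank hit:", hit_latency(0, 15))
    print("shared average, corner core:", shared_avg_hit_latency(0))
    print("shared average, inner core:", shared_avg_hit_latency(5))
```

Under these assumptions a hit in the local bank costs only the bank access, while the shared-cache average grows with the mean distance to all banks. This is the trade-off noted above: a private cache hopes to keep most hits local, at the cost of replicated data and more misses.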
Cache coherence
- Not unique to multiprocessors
- Even uniprocessor computers need to worry about cache coherence
- For sequential programs we expect a memory location to return the latest value written
- For concurrent programs running on multiple threads or processes on a single processor we expect the same model to hold because all threads see the same cache hierarchy (same as shared L1 cache)
- For multiprocessors there remains a danger of using a stale value: hardware must ensure that cached values are coherent across the system and they satisfy programmers’ intuitive memory model
Cache coherence: Example
- Assume a write-through cache
- P0: reads x from memory, puts it in its cache, and gets the value 5
- P1: reads x from memory, puts it in its cache, and gets the value 5
- P1: writes x=7, updates its cached value and memory value
- P0: reads x from its cache and gets the value 5
- P2: reads x from memory, puts it in its cache, and gets the value 7 (now the system is completely incoherent)
- P2: writes x=10, updates its cached value and memory value
- Consider the same example with a writeback cache
- P0 has a cached value 5, P1 has 7, P2 has 10, memory has 5 (since the caches are not write-through)
- The state of the line in P1 and P2 is M (modified) while the line in P0 is clean
- Eviction of the line from P1 and P2 will issue writebacks while eviction of the line from P0 will not issue a writeback (clean lines do not need writeback)
- Suppose P2 evicts the line first, and then P1
- Final memory value is 7: we lost the store x=10 from P2
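Both scenarios above can be replayed with a minimal Python sketch of three caches that have no coherence protocol at all. The `Cache` class and the `run` function are hypothetical names introduced for this illustration; each cache line holds a single value and writes are assumed to allocate in the cache.

```python
class Cache:
    """A private cache with no coherence support (illustrative only)."""

    def __init__(self, memory, write_through):
        self.memory = memory              # shared backing store (a dict)
        self.write_through = write_through
        self.lines = {}                   # addr -> cached value
        self.dirty = set()                # addrs in modified (M) state

    def read(self, addr):
        if addr not in self.lines:        # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]           # hit: may return a stale value

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value     # memory updated immediately
        else:
            self.dirty.add(addr)          # memory updated only on eviction

    def evict(self, addr):
        if addr in self.dirty:            # only dirty lines are written back
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


def run(write_through):
    """Replay the example; return the values read and the final memory value."""
    memory = {"x": 5}
    p0, p1, p2 = (Cache(memory, write_through) for _ in range(3))
    reads = [p0.read("x"), p1.read("x")]
    p1.write("x", 7)
    reads.append(p0.read("x"))            # P0 hits on its stale copy
    reads.append(p2.read("x"))            # P2 misses and reads memory
    p2.write("x", 10)
    p2.evict("x")                         # P2 evicts first, then P1
    p1.evict("x")
    return reads, memory["x"]


if __name__ == "__main__":
    print("write-through:", run(True))
    print("writeback:    ", run(False))
```

With write-through caches, P0 still reads the stale 5 after P1's store, although memory ends up holding 10. With writeback caches, P2 also reads 5 from memory, and the eviction order makes P1's writeback overwrite P2's, leaving 7 in memory.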
|