Shared vs. private in CMPs

Shared caches in CMPs are often very large
- They are banked to avoid worst-case wire delay
- The banks are usually distributed across the chip floorplan and connected by an on-chip interconnect
In shared caches, getting a block from a remote bank takes time proportional to the physical distance between the requester and the bank
- This organization is known as a non-uniform cache architecture (NUCA)
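
The distance-dependent hit latency can be illustrated with a minimal sketch. It assumes the banks sit on a 2D mesh, that distance is measured in hops (Manhattan distance), and that a fixed bank access time is added on top of the distance-proportional part; the names `hops` and `nuca_hit_latency` and both cycle counts are hypothetical and chosen only for illustration.

```python
# Hypothetical NUCA latency model; both constants are illustrative assumptions.
BANK_ACCESS_CYCLES = 4   # assumed time to read a bank once the request arrives
CYCLES_PER_HOP = 2       # assumed delay of one interconnect hop


def hops(src, dst):
    """Manhattan distance between two (row, col) tiles on a 2D mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def nuca_hit_latency(core, bank):
    """Round trip: the request travels to the bank and the data travels back."""
    return BANK_ACCESS_CYCLES + 2 * hops(core, bank) * CYCLES_PER_HOP


core = (0, 0)
local_latency = nuca_hit_latency(core, (0, 0))   # bank on the requester's tile
remote_latency = nuca_hit_latency(core, (3, 3))  # bank in the opposite corner
print(local_latency, remote_latency)
```

Under these assumed numbers a hit in the farthest bank of a 4x4 mesh costs several times as much as a hit in the local bank, which is the non-uniformity the name refers to.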
The same holds for private caches when the data resides in a remote cache
A shared cache may have a higher average hit latency than a private cache
- Ideally, most hits in a private cache are local
Shared caches are likely to incur fewer misses than private caches
- Private caches waste space by replicating the same blocks in several caches
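
The effect of replication on the miss count can be seen in a small simulation. This is only a sketch under stated assumptions: LRU replacement, fully associative caches, two cores, equal total capacity in both configurations, and a deliberately contrived trace in which both cores sweep the same six blocks. The class `LRUCache` and the function `count_misses` are hypothetical helpers, not part of any real simulator.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.misses = 0

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            return
        self.misses += 1
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used


def count_misses(trace, private, capacity_per_core=4, cores=2):
    """Total misses for a trace of (core, block) pairs."""
    if private:
        caches = [LRUCache(capacity_per_core) for _ in range(cores)]
        for core, block in trace:
            caches[core].access(block)
        return sum(c.misses for c in caches)
    shared = LRUCache(capacity_per_core * cores)  # same total capacity
    for _, block in trace:
        shared.access(block)
    return shared.misses


# Both cores sweep the same 6 blocks three times.
trace = [(core, block) for _ in range(3) for block in range(6) for core in range(2)]
private_misses = count_misses(trace, private=True)
shared_misses = count_misses(trace, private=False)
print(private_misses, shared_misses)
```

In the private configuration each core must hold its own copy of all six blocks in a four-block cache, so the sweep keeps evicting blocks before they are reused. The shared cache holds one copy of each block and misses only on the first touch. Real workloads share far less than this trace does, so the gap is usually smaller.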

Cache coherence

Coherence is not unique to multiprocessors
- Even uniprocessor computers need to worry about cache coherence
- For sequential programs, we expect a read of a memory location to return the latest value written to it
- For concurrent programs running as multiple threads or processes on a single processor, we expect the same model to hold because all threads see the same cache hierarchy (equivalent to sharing an L1 cache)
- For multiprocessors, there remains a danger of using a stale value: hardware must ensure that cached values are coherent across the system and that they satisfy programmers’ intuitive memory model
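
The stale-value danger can be reproduced with a toy model. The sketch below assumes two processors with private write-back caches and no coherence protocol at all; the class `PrivateCache` and the variable names are hypothetical and model only the bookkeeping, not real hardware.

```python
class PrivateCache:
    """Toy write-back cache with no coherence protocol (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory   # shared backing store, a plain dict
        self.lines = {}        # this processor's cached copies

    def read(self, addr):
        if addr not in self.lines:          # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]             # hit: return the local copy

    def write(self, addr, value):
        self.lines[addr] = value            # write-back: memory is not updated


memory = {"x": 5}
p0 = PrivateCache(memory)
p1 = PrivateCache(memory)

p0.read("x")             # P0 caches x = 5
p1.read("x")             # P1 caches x = 5
p0.write("x", 7)         # P0 updates only its own copy
stale = p1.read("x")     # P1 hits in its cache and still sees the old value
print(stale, p0.read("x"), memory["x"])
```

Nothing in this model tells P1 that its copy of `x` is out of date, and memory itself still holds the old value. A coherence protocol exists to close exactly this gap, for example by invalidating or updating P1's copy when P0 writes.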

Cache coherence: Example