Module 10: "Design of Shared Memory Multiprocessors"
  Lecture 18: "Introduction to Cache Coherence"
 

Hierarchical design

  • Possible to combine bus-based SMP and DSM to build hierarchical shared memory
    • Sun Wildfire connects four large SMPs (28 processors each) over a scalable interconnect to form a 112-processor multiprocessor
    • IBM POWER4 has two processors on-chip with private L1 caches, but shared L2 and L3 caches (this is called a chip multiprocessor); connect these chips over a network to form scalable multiprocessors
  • Next few lectures will focus on bus-based SMPs only

Cache Coherence

  • Intuitive memory model
    • For sequential programs we expect a memory location to return the latest value written to that location
    • For concurrent programs with multiple threads or processes running on a single processor we expect the same model to hold, because all threads or processes see the same cache hierarchy (they effectively share the same L1 cache)
    • For multiprocessors there is a danger of using a stale value: in an SMP or DSM the caches are not shared, and processors are allowed to replicate data independently in each cache; hardware must ensure that cached values are coherent across the system and that they satisfy the programmer’s intuitive memory model

Example

  • Assume a write-through cache, i.e., every store updates the value in the cache as well as in memory
    • P0: reads x from memory, puts it in its cache, and gets the value 5
    • P1: reads x from memory, puts it in its cache, and gets the value 5
    • P1: writes x=7, updates its cached value and memory value
    • P0: reads x from its cache and gets the value 5
    • P2: reads x from memory, puts it in its cache, and gets the value 7 (now the system is completely incoherent)
    • P2: writes x=10, updates its cached value and memory value
  • Consider the same example with a writeback cache, i.e., values are written back to memory only when the cache line is evicted
    • P0 has a cached value 5, P1 has 7, P2 has 10, memory has 5 (since caches are not write through)
    • The state of the line in P1 and P2 is M (modified/dirty) while the line in P0 is clean
    • Eviction of the line from P1 and P2 will issue writebacks while eviction of the line from P0 will not issue a writeback (clean lines do not need writeback)
    • Suppose P2 evicts the line first, and then P1
    • Final memory value is 7: we lost the store x=10 from P2 (both traces are replayed in the sketch after this example)
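
The two traces above can be replayed with a small C sketch. This is only an illustration of the problem, not any real cache or protocol; the three one-line private caches and the helper names (wt_load, wt_store, wb_evict) are assumptions made for this example.

    #include <stdio.h>
    #include <stdbool.h>

    /* Toy model: one memory word x and three private one-line caches.
     * Names and structure are illustrative only. */
    struct line { bool valid; bool dirty; int data; };

    static int memory_x = 5;              /* initial value of x          */
    static struct line cache[3];          /* caches of P0, P1, P2        */

    /* ---- write-through behavior ---- */
    static int wt_load(int p) {
        if (!cache[p].valid) {            /* miss: fetch from memory     */
            cache[p].valid = true;
            cache[p].data  = memory_x;
        }
        return cache[p].data;             /* a hit may return stale data */
    }
    static void wt_store(int p, int v) {  /* update cache and memory,    */
        cache[p].valid = true;            /* but NOT the other caches    */
        cache[p].data  = v;
        memory_x       = v;
    }

    /* ---- writeback eviction ---- */
    static void wb_evict(int p) {
        if (cache[p].valid && cache[p].dirty)
            memory_x = cache[p].data;     /* only dirty (M) lines write back */
        cache[p].valid = false;
    }

    int main(void) {
        /* write-through trace from the example */
        printf("P0 reads %d\n", wt_load(0));          /* 5                */
        printf("P1 reads %d\n", wt_load(1));          /* 5                */
        wt_store(1, 7);                               /* memory = 7       */
        printf("P0 reads %d (stale)\n", wt_load(0));  /* still 5          */
        printf("P2 reads %d\n", wt_load(2));          /* 7                */
        wt_store(2, 10);                              /* memory = 10      */

        /* writeback trace: P0 holds clean 5, P1 dirty 7, P2 dirty 10,
         * memory still holds 5 */
        memory_x = 5;
        cache[0] = (struct line){ true, false, 5  };
        cache[1] = (struct line){ true, true,  7  };
        cache[2] = (struct line){ true, true,  10 };
        wb_evict(2);                                  /* memory = 10      */
        wb_evict(1);                                  /* memory = 7       */
        printf("final memory value = %d\n", memory_x);/* store of 10 lost */
        return 0;
    }
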

What went wrong?

  • For a write-through cache
    • The memory value may be correct if the writes are correctly ordered
    • But the system allowed a store to proceed while another processor already held a cached copy
    • Lesson learned: must invalidate all cached copies before allowing a store to proceed
  • For a writeback cache
    • Problem is even more complicated: stores are no longer visible to memory immediately
    • Writeback order is important
    • Lesson learned: do not allow more than one copy of a cache line in M state (see the sketch after this list)
  • Need to formalize the intuitive memory model
    • In sequential programs the order of read/write is defined by the program order; the notion of “last write” is well-defined
    • For multiprocessors, how do you define the “last write to a memory location” in the presence of independent caches?
    • Within a processor program order still defines it, but how do you order reads and writes across processors?
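
The two lessons above can be sketched in C as a toy invalidation scheme: a store first invalidates every other cached copy, so at most one cache ever holds the line in M state, and a load that misses forces the dirty owner (if any) to write back first. This is only a sketch under simplifying assumptions (one memory word, three processors, no bus, only I/S/M states); the names load, store, and st are illustrative and this is not a full protocol specification.

    #include <stdio.h>

    enum state { I, S, M };                 /* Invalid, Shared, Modified  */

    static int memory_x = 5;                /* the single memory word x   */
    static enum state st[3] = { I, I, I };  /* line state in P0, P1, P2   */
    static int        val[3];               /* cached data in P0, P1, P2  */

    static int load(int p) {
        if (st[p] == I) {                   /* miss                       */
            for (int q = 0; q < 3; q++)
                if (st[q] == M) {           /* dirty owner supplies data  */
                    memory_x = val[q];
                    st[q] = S;              /* owner downgrades to S      */
                }
            val[p] = memory_x;
            st[p]  = S;
        }
        return val[p];
    }

    static void store(int p, int v) {
        for (int q = 0; q < 3; q++)         /* invalidate all other copies */
            if (q != p && st[q] != I) {
                if (st[q] == M)
                    memory_x = val[q];      /* flush the single dirty owner */
                st[q] = I;
            }
        st[p]  = M;                         /* now the only copy, in M     */
        val[p] = v;
    }

    int main(void) {
        printf("P0 reads %d\n", load(0));   /* 5                           */
        printf("P1 reads %d\n", load(1));   /* 5                           */
        store(1, 7);                        /* P0's copy is invalidated    */
        printf("P0 reads %d\n", load(0));   /* 7, not the stale 5          */
        store(2, 10);                       /* P1 invalidated; P2 is the   */
                                            /* single M copy               */
        printf("P0 reads %d\n", load(0));   /* 10                          */
        return 0;
    }

With these two rules the write-through trace no longer returns the stale value 5 to P0, and the writeback trace cannot lose the store x=10, because only one dirty owner of the line exists at any time.
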