
Memory consistency

A coherence protocol is not enough to completely specify the output(s) of a parallel program. Coherence only provides the foundation for reasoning about the legal outcomes of accesses to the same memory location. The consistency model tells us the possible outcomes arising from legal orderings of accesses to all memory locations.

A shared memory machine advertises the consistency model it supports; the model is a “contract” with the writers of parallel software and the writers of parallelizing compilers.

Implementing a memory consistency model is really a hardware-software tradeoff. A strict sequential model (SC) offers execution that is intuitive, but may suffer in terms of performance; relaxed models (RC) make program reasoning difficult, but may offer better performance.

SC

Recall that an execution is SC if the memory operations form a valid total order, i.e. an interleaving of the individual (partial) program orders.

The sufficient conditions require that a new memory operation cannot issue until the previous one has completed. This is too restrictive and essentially denies both the compiler and the hardware any re-ordering of instructions. No microprocessor that supports SC implements the sufficient conditions. Instead, full out-of-order execution is allowed, and a proper recovery mechanism is implemented in case of a memory order violation. Let’s discuss the MIPS R10000 implementation.

SC in MIPS R10000

The R10000 issues instructions out of program order, but commits them in order. The problem is with speculatively executed loads: a load may execute and use a value long before it finally commits. In the meantime, some other processor may modify that value through a store, and the store may commit (i.e. become globally visible) before the load commits. This may violate SC (why?).

How do you detect such a violation? How do you recover and guarantee an SC execution? Any special consideration for prefetches? (Consider both binding and non-binding prefetches.)

In the MIPS R10000, a store remains at the head of the active list until it is completed in cache. Can we just remove it as soon as it issues and let the other instructions commit (the store can complete from the store buffer at a later point)? How far can we go and still guarantee SC?

The Stanford DASH multiprocessor, on receiving a read reply that is already invalidated, forces the processor to retry that load. Why can’t it use the value in the cache line and then discard the line?

Does the cache controller need to take any special action when a line is replaced from the cache?
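
One way to approach the "(why?)" above is to enumerate outcomes on a tiny example. The sketch below is illustrative only and is not taken from the R10000 documentation: the two-processor program, the variable names A and flag, and the registers r1 and r2 are all assumptions made for this example. It lists every outcome allowed by SC (every interleaving that respects both program orders), then repeats the enumeration with the second processor's loads swapped, which models a speculative load that reads its value early and is never replayed.

```python
from itertools import combinations

# Hypothetical two-processor program (names are assumptions for this sketch).
# P0: A = 1; flag = 1          P1: r1 = flag; r2 = A
P0 = [("st", "A", 1), ("st", "flag", 1)]
P1 = [("ld", "flag", "r1"), ("ld", "A", "r2")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        chosen = set(slots)          # positions in the merge taken from a
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]

def run(order):
    """Execute one total order against a single shared memory."""
    mem = {"A": 0, "flag": 0}        # both locations start at 0
    regs = {}
    for kind, var, x in order:
        if kind == "st":
            mem[var] = x             # store: x is the value written
        else:
            regs[x] = mem[var]       # load: x is the destination register
    return (regs["r1"], regs["r2"])

def outcomes(p0, p1):
    """Collect the (r1, r2) results over all interleavings of p0 and p1."""
    return {run(order) for order in interleavings(p0, p1)}

# Outcomes permitted by SC: both program orders are respected.
sc = outcomes(P0, P1)
print("SC outcomes (r1, r2):", sorted(sc))

# Model P1's load of A taking its value before the load of flag,
# with no replay. This is a deliberate violation of program order.
reordered = outcomes(P0, list(reversed(P1)))
print("Extra outcomes with the early load:", sorted(reordered - sc))
```

The extra outcome is r1 = 1 with r2 = 0: P1 sees the new flag but a stale A, which no interleaving of the two program orders can produce. Under the assumptions of this sketch, that is the case the recovery mechanism has to detect and squash before the load commits.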