Module 12: "Multiprocessors on a Snoopy Bus"
  Lecture 26: "Case Studies"
 

Write atomicity and SC

  • Sequential consistency (SC) requires write atomicity, i.e., the total order of all writes seen by all processors must be identical
    • Since a BusRdX or BusUpgr does not wait until the invalidations are actually applied to the caches, you have to be careful

      P0: A=1; B=1;
      P1: print B; print A

    • Under SC (A, B) = (0, 1) is not allowed
    • Suppose that, to start with, P1 has the line containing A in its cache, but not the line containing B
    • P0’s store to A queues an invalidation of A in P1’s cache controller
    • P1 takes a read miss for B, but the response for B is re-ordered by P1’s cache controller so that it overtakes the invalidation (the controller may reason that it is better to prioritize reads); P1 then reads the new value of B but the stale value of A, producing (A, B) = (0, 1); a code sketch of this litmus test follows the list
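
The forbidden outcome above can be written as a small litmus test. Below is a minimal sketch in C11 (assuming an implementation that provides <threads.h>; the harness and variable names are illustrative, not from the lecture). The default sequentially consistent atomics model exactly the property the hardware must preserve: if P1 observes B = 1, it must also observe A = 1.

    /* Litmus test for the first example: under SC the output
     * "B=1 A=0" must never appear. */
    #include <stdatomic.h>
    #include <stdio.h>
    #include <threads.h>

    atomic_int A = 0, B = 0;

    int p0(void *arg) {                /* P0: A = 1; B = 1; */
        (void)arg;
        atomic_store(&A, 1);
        atomic_store(&B, 1);
        return 0;
    }

    int p1(void *arg) {                /* P1: print B; print A; */
        (void)arg;
        int b = atomic_load(&B);
        int a = atomic_load(&A);
        printf("B=%d A=%d\n", b, a);   /* "B=1 A=0" is forbidden under SC */
        return 0;
    }

    int main(void) {
        thrd_t t0, t1;
        thrd_create(&t0, p0, NULL);
        thrd_create(&t1, p1, NULL);
        thrd_join(t0, NULL);
        thrd_join(t1, NULL);
        return 0;
    }

The re-ordering described above, where a read response overtakes a queued invalidation, is precisely what would make the forbidden line appear on real hardware.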

Another example

P0: A=1; print B;

P1: B=1; print A;

  • Under SC (A, B) = (0, 0) is not allowed
  • The same problem arises if P0 executes both of its instructions first and then P1 executes the write of B; assume this write generates an upgrade, so it is marked complete as soon as the address arbitration phase finishes; the upgrade completion is then re-ordered with the pending invalidation of A, so P1 reads the stale value of A
  • So, the reason these two cases fail is that the new values are made visible before older invalidations are applied
  • One solution is to have a strict FIFO queue between the bus controller and the cache hierarchy
  • But a strict FIFO is not necessary: it is sufficient that replies do not overtake invalidations; beyond that, the bus responses can be re-ordered without violating write atomicity and hence SC (e.g., if there are only read and write responses in the queue, it sometimes may make sense to prioritize read responses); a sketch of such a queue follows this list
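
To make the relaxed ordering concrete, here is a minimal sketch of such an inbound queue (the array-based layout, type names, and selection policy are assumptions for illustration, not the lecture's actual design): a read reply may be promoted past other replies, but never past an invalidation that arrived before it.

    /* Inbound queue between the bus controller and the cache hierarchy.
     * Entries are stored in arrival order; dequeue/compaction is omitted. */
    #include <stddef.h>

    typedef enum { MSG_INVALIDATION, MSG_READ_REPLY, MSG_WRITE_REPLY } msg_kind_t;

    typedef struct {
        msg_kind_t    kind;
        unsigned long addr;
    } bus_msg_t;

    #define QMAX 16

    typedef struct {
        bus_msg_t slot[QMAX];   /* slot[0] is the oldest entry */
        size_t    count;
    } inbound_q_t;

    /* Pick the next message to hand to the cache hierarchy.  A read reply
     * may be preferred, but only if no older invalidation is ahead of it. */
    static size_t next_to_deliver(const inbound_q_t *q) {
        for (size_t i = 0; i < q->count; i++) {
            if (q->slot[i].kind == MSG_INVALIDATION)
                return 0;   /* an invalidation is ahead: deliver strictly in order */
            if (q->slot[i].kind == MSG_READ_REPLY)
                return i;   /* safe to promote: only other replies are ahead of it */
        }
        return 0;           /* only write replies (or empty): deliver in order */
    }

The invariant is that next_to_deliver never returns an index beyond a queued invalidation, so the stale copy of A in the examples above is always invalidated before any newer value becomes visible to the processor.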

In-order response

  • In-order response can simplify quite a few things in the design
    • The fully associative request table can be replaced by a FIFO queue
    • Conflicting requests where one is a write can actually be allowed now (multiple reads were allowed even before although only the first one actually appears on the bus)
    • Consider a BusRdX followed by a BusRd from two different processors
    • With in-order response it is guaranteed that the BusRdX response will be granted the data bus before the BusRd response (which may not be true for out-of-order responses, and hence such a conflict is disallowed in that case)
    • So when the cache controller that generated the BusRdX sees the BusRd, it only notes that it should source the line for this request after its own write has completed (sketched at the end of this section)
  • The performance penalty may be huge
    • Essentially because of the memory system, since responses from different banks become ready at very different times
    • Consider a situation where three requests are pending to cache lines A, B, C in that order
    • A and B map to the same memory bank while C is in a different bank
    • Although the response for C may be ready long before that of B, it cannot be granted the data bus until B’s response has been delivered (see the timing sketch below)
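
The bookkeeping for the conflicting BusRdX/BusRd case can be sketched as follows (type and function names are hypothetical, not from the lecture): the controller that issued the BusRdX merely records that it must source the line once its own write completes, which is safe only because in-order response guarantees that its data arrives before the reader's.

    #include <stdbool.h>

    typedef struct {
        unsigned long line_addr;
        bool          rdx_outstanding;    /* our BusRdX has not been answered yet */
        bool          must_source_later;  /* a conflicting BusRd was snooped */
    } pending_write_t;

    /* Called when a BusRd is snooped on the bus. */
    static void snoop_busrd(pending_write_t *w, unsigned long addr) {
        if (w->rdx_outstanding && addr == w->line_addr)
            w->must_source_later = true;  /* just note it; our response comes first */
    }

    /* Called when our BusRdX response arrives and the write completes. */
    static void write_complete(pending_write_t *w) {
        w->rdx_outstanding = false;
        if (w->must_source_later) {
            /* supply_line_on_bus(w->line_addr);  hypothetical cache-to-cache transfer */
            w->must_source_later = false;
        }
    }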
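
The cost of in-order response with bank conflicts can be seen with a toy timing model (all cycle numbers are made up and purely illustrative): C's data must wait for B's on the data bus even though it was ready much earlier.

    #include <stdio.h>

    typedef struct {
        const char *name;
        int         ready_cycle;   /* cycle at which the memory bank has the data */
    } req_t;

    int main(void) {
        /* A and B map to the same bank, so B is serialized behind A;
         * C sits in a different bank and is ready early. */
        req_t reqs[] = { {"A", 10}, {"B", 20}, {"C", 12} };
        int bus_free = 0;

        /* In-order response: grant the data bus strictly in request order. */
        for (int i = 0; i < 3; i++) {
            int grant = reqs[i].ready_cycle > bus_free ? reqs[i].ready_cycle : bus_free;
            printf("in-order: %s gets the data bus at cycle %d\n", reqs[i].name, grant);
            bus_free = grant + 4;   /* assume a 4-cycle data transfer */
        }
        return 0;
    }

With these numbers C is granted the data bus only at cycle 24, after B; an out-of-order response could have sent C at cycle 14, as soon as A's transfer finished.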