Module 12: "Multiprocessors on a Snoopy Bus"
  Lecture 25: "Protocols for Split-transaction Buses"
 

Split-transaction bus

  • Atomic bus leads to underutilization of bus resources
    • Between the time the address is taken off the bus and the time the snoop responses become available, the bus stays idle
    • Even after the snoop result is available, the bus may remain idle due to high memory access latency
  • Split-transaction bus divides each transaction into two parts: request and response
    • Between the request and response of a particular transaction there may be other requests and/or responses from different transactions
    • Outstanding transactions that have not yet started or have completed only one phase are buffered in the requesting cache controllers
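To make the buffering concrete, below is a minimal C sketch of the outstanding-transaction table kept by a requesting cache controller. The structure, field names, and table size are illustrative assumptions for this sketch rather than details of any particular machine; only the idea that the request phase allocates an entry and a later response phase retires it comes from the description above.

/* Sketch of the outstanding-transaction buffer in a requesting cache
 * controller on a split-transaction bus (illustrative names and sizes). */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_OUTSTANDING 8          /* assumed buffer size */

typedef struct {
    bool     valid;                /* a response is still pending for this entry */
    uint64_t addr;                 /* cache-line address of the request */
    int      cmd;                  /* e.g. BusRd, BusRdX, BusWB (encoded as int) */
} OutstandingEntry;

static OutstandingEntry table[MAX_OUTSTANDING];

/* Request phase: put the command/address on the bus and remember it.
 * Returns the table index used as a handle, or -1 if the buffer is full
 * (the request must then be retried later). */
int issue_request(uint64_t addr, int cmd)
{
    for (int i = 0; i < MAX_OUTSTANDING; i++) {
        if (!table[i].valid) {
            table[i] = (OutstandingEntry){ .valid = true, .addr = addr, .cmd = cmd };
            printf("request phase: cmd=%d addr=0x%llx (entry %d)\n",
                   cmd, (unsigned long long)addr, i);
            return i;
        }
    }
    return -1;                     /* no free entry: cannot launch the request now */
}

/* Response phase: the matching response arrives some cycles later and
 * frees the entry so the slot can be reused. */
void receive_response(int entry)
{
    printf("response phase: completing entry %d (addr 0x%llx)\n",
           entry, (unsigned long long)table[entry].addr);
    table[entry].valid = false;
}

int main(void)
{
    /* Two requests launched back to back; their responses return later and
     * out of order, while the bus stays free for other traffic in between. */
    int a = issue_request(0x1000, 0 /* BusRd  */);
    int b = issue_request(0x2000, 1 /* BusRdX */);
    receive_response(b);           /* second request's response arrives first */
    receive_response(a);
    return 0;
}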

New issues

  • Split-transaction bus introduces new protocol races
    • P0 and P1 have a line in S state and both issue BusUpgr, say, in consecutive cycles
    • The snoop responses arrive only after a delay, so each processor may issue its BusUpgr before observing the other's request
    • Now both P0 and P1 may think that they have ownership
  • Flow control is important since buffer space is finite
  • In-order or out-of-order response?
    • Out-of-order response may better tolerate variable memory latency by servicing other requests
    • Pentium Pro uses in-order response
    • SGI Challenge and Sun Enterprise use out-of-order responses, i.e., no ordering is enforced between responses
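The BusUpgr race above can be made concrete with a small sketch of the conflict check a controller performs while its own request is outstanding. Converting the losing processor's pending BusUpgr into a BusRdX is one common resolution; it is shown here as an illustrative assumption, not as the exact mechanism of the buses named above, and all names are invented for the sketch.

/* Sketch of the race check on a split-transaction bus: while its own
 * BusUpgr is outstanding, a controller keeps snooping the request bus.
 * If another processor's BusUpgr/BusRdX to the same line is ordered
 * first, this controller loses the race: its S copy is invalid, so a
 * plain upgrade is no longer enough and the pending request becomes a
 * read-exclusive. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { BUS_RD, BUS_RDX, BUS_UPGR } BusCmd;

typedef struct {
    bool     pending;              /* this controller has an outstanding request */
    BusCmd   cmd;
    uint64_t addr;
} PendingReq;

/* Called whenever a request from another processor is snooped. */
void snoop_request(PendingReq *mine, BusCmd other_cmd, uint64_t other_addr)
{
    if (!mine->pending || other_addr != mine->addr)
        return;                    /* no conflict with our outstanding request */

    if (mine->cmd == BUS_UPGR &&
        (other_cmd == BUS_UPGR || other_cmd == BUS_RDX)) {
        /* The other request was ordered first, so our shared copy is now
         * invalid and we must also fetch the data. */
        printf("race on 0x%llx: converting pending BusUpgr to BusRdX\n",
               (unsigned long long)mine->addr);
        mine->cmd = BUS_RDX;
    }
}

int main(void)
{
    /* P0 and P1 both hold the line in S and issue BusUpgr in consecutive
     * cycles; P1 snoops P0's earlier request and downgrades its own. */
    PendingReq p1 = { .pending = true, .cmd = BUS_UPGR, .addr = 0x1000 };
    snoop_request(&p1, BUS_UPGR, 0x1000);   /* P0's BusUpgr seen on the bus */
    return 0;
}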

SGI Powerpath-2 bus

  • Used in SGI Challenge
    • Conflicts are resolved by not allowing multiple outstanding bus transactions to the same cache line
    • Allows eight outstanding requests on the bus at any point in time
    • Flow control on buffers is provided by negative acknowledgments (NACKs): the bus has a dedicated NACK line which remains asserted if the buffer holding outstanding transactions is full; a NACKed transaction must be retried
    • The request order determines the total order of memory accesses, but the responses may be delivered in a different order depending on their completion times
    • In subsequent slides we refer to this design as Powerpath-2 since it is loosely based on that bus
  • Logically two separate buses
    • Request bus for launching the command type (BusRd, BusWB, etc.) and the involved address
    • Response bus for providing the data response, if any
    • Since responses may arrive in an order different from the request order, a 3-bit tag is assigned to each request
    • Responses launch this tag on the tag bus along with the data reply so that the address bus may be left free for other requests
  • The data bus is 256 bits wide while a cache line is 128 bytes
    • One data response phase needs four bus cycles plus one additional hardware turnaround cycle
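The following C sketch ties the Powerpath-2 description together: an eight-entry request table indexed by a 3-bit tag, a NACK when the table is full, rejection of a second transaction to a line that already has one outstanding, and a response phase that uses the tag and occupies 128/32 = 4 data cycles plus 1 turnaround cycle. Treating the same-line conflict as a NACK, as well as all structure and field names, are simplifying assumptions of this sketch and not details of the actual SGI hardware.

/* Powerpath-2-style request table, loosely following the description
 * above: up to eight outstanding requests identified by a 3-bit tag,
 * NACK-based flow control, and responses matched to requests by tag. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_TAGS        8          /* 3-bit tag -> eight outstanding requests */
#define LINE_BYTES      128        /* cache line size */
#define DATA_BUS_BYTES  32         /* 256-bit data bus */
#define TURNAROUND      1          /* hardware turnaround cycle */

typedef struct {
    bool     valid;
    uint64_t line_addr;            /* cache-line address of the request */
} ReqEntry;

static ReqEntry req_table[NUM_TAGS];

/* Request phase: allocate a tag.  Returns the 3-bit tag, or -1 (NACK)
 * if the table is full or the line already has an outstanding
 * transaction (simplification); a NACKed request must be retried. */
int launch_request(uint64_t line_addr)
{
    int free_tag = -1;
    for (int t = 0; t < NUM_TAGS; t++) {
        if (req_table[t].valid && req_table[t].line_addr == line_addr)
            return -1;             /* conflicting transaction to the same line */
        if (!req_table[t].valid && free_tag < 0)
            free_tag = t;
    }
    if (free_tag < 0)
        return -1;                 /* table full: NACK line would be asserted */
    req_table[free_tag] = (ReqEntry){ .valid = true, .line_addr = line_addr };
    return free_tag;
}

/* Response phase: the tag travels with the data reply, so the matching
 * entry is found and freed without using the address bus again. */
void deliver_response(int tag)
{
    int cycles = LINE_BYTES / DATA_BUS_BYTES + TURNAROUND;   /* 4 + 1 = 5 */
    printf("response tag %d for line 0x%llx: %d bus cycles\n",
           tag, (unsigned long long)req_table[tag].line_addr, cycles);
    req_table[tag].valid = false;
}

int main(void)
{
    int t0 = launch_request(0x4000);   /* gets tag 0 */
    int t1 = launch_request(0x8000);   /* gets tag 1 */
    int t2 = launch_request(0x4000);   /* same line outstanding: NACK (-1) */
    printf("tags: %d %d %d\n", t0, t1, t2);
    deliver_response(t1);              /* responses may return out of order */
    deliver_response(t0);
    return 0;
}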