Module 12: "Multiprocessors on a Snoopy Bus"
  Lecture 25: "Protocols for Split-transaction Buses"
 

Split-transaction bus

  • Atomic bus leads to underutilization of bus resources
    • Between the time the address is taken off the bus and the time the snoop responses become available, the bus stays idle
    • Even after the snoop result is available, the bus may remain idle due to high memory access latency
  • Split-transaction bus divides each transaction into two parts: request and response
    • Between the request and response of a particular transaction there may be other requests and/or responses from different transactions
    • Outstanding transactions that have not yet started or have completed only one phase are buffered in the requesting cache controllers
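To make the buffering concrete, below is a minimal C sketch of the outstanding-transaction table kept by a requesting cache controller. The structure, field names, and table size are illustrative assumptions for this sketch rather than details of any particular machine; only the idea that the request phase allocates an entry and a later response phase retires it comes from the description above.

/* Sketch of the outstanding-transaction buffer in a requesting cache
 * controller on a split-transaction bus (illustrative names and sizes). */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_OUTSTANDING 8          /* assumed buffer size */

typedef struct {
    bool     valid;                /* a response is still pending for this entry */
    uint64_t addr;                 /* cache-line address of the request */
    int      cmd;                  /* e.g. BusRd, BusRdX, BusWB (encoded as int) */
} OutstandingEntry;

static OutstandingEntry table[MAX_OUTSTANDING];

/* Request phase: put the command/address on the bus and remember it.
 * Returns the table index used as a handle, or -1 if the buffer is full
 * (the request must then be retried later). */
int issue_request(uint64_t addr, int cmd)
{
    for (int i = 0; i < MAX_OUTSTANDING; i++) {
        if (!table[i].valid) {
            table[i] = (OutstandingEntry){ .valid = true, .addr = addr, .cmd = cmd };
            printf("request phase: cmd=%d addr=0x%llx (entry %d)\n",
                   cmd, (unsigned long long)addr, i);
            return i;
        }
    }
    return -1;                     /* no free entry: cannot launch the request now */
}

/* Response phase: the matching response arrives some cycles later and
 * frees the entry so the slot can be reused. */
void receive_response(int entry)
{
    printf("response phase: completing entry %d (addr 0x%llx)\n",
           entry, (unsigned long long)table[entry].addr);
    table[entry].valid = false;
}

int main(void)
{
    /* Two requests launched back to back; their responses return later and
     * out of order, while the bus stays free for other traffic in between. */
    int a = issue_request(0x1000, 0 /* BusRd  */);
    int b = issue_request(0x2000, 1 /* BusRdX */);
    receive_response(b);           /* second request's response arrives first */
    receive_response(a);
    return 0;
}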

New issues

  • Split-transaction bus introduces new protocol races
    • P0 and P1 have a line in S state and both issue BusUpgr, say, in consecutive cycles
    • The snoop responses arrive only after a delay, so each processor may issue its BusUpgr before observing the other's request
    • Now both P0 and P1 may think that they have ownership
  • Flow control is important since buffer space is finite
  • In-order or out-of-order response?
    • Out-of-order response may better tolerate variable memory latency by servicing other requests
    • Pentium Pro uses in-order response
    • SGI Challenge and Sun Enterprise use out-of-order responses, i.e., no ordering is enforced between responses
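The BusUpgr race above can be made concrete with a small sketch of the conflict check a controller performs while its own request is outstanding. Converting the losing processor's pending BusUpgr into a BusRdX is one common resolution; it is shown here as an illustrative assumption, not as the exact mechanism of the buses named above, and all names are invented for the sketch.

/* Sketch of the race check on a split-transaction bus: while its own
 * BusUpgr is outstanding, a controller keeps snooping the request bus.
 * If another processor's BusUpgr/BusRdX to the same line is ordered
 * first, this controller loses the race: its S copy is invalid, so a
 * plain upgrade is no longer enough and the pending request becomes a
 * read-exclusive. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { BUS_RD, BUS_RDX, BUS_UPGR } BusCmd;

typedef struct {
    bool     pending;              /* this controller has an outstanding request */
    BusCmd   cmd;
    uint64_t addr;
} PendingReq;

/* Called whenever a request from another processor is snooped. */
void snoop_request(PendingReq *mine, BusCmd other_cmd, uint64_t other_addr)
{
    if (!mine->pending || other_addr != mine->addr)
        return;                    /* no conflict with our outstanding request */

    if (mine->cmd == BUS_UPGR &&
        (other_cmd == BUS_UPGR || other_cmd == BUS_RDX)) {
        /* The other request was ordered first, so our shared copy is now
         * invalid and we must also fetch the data. */
        printf("race on 0x%llx: converting pending BusUpgr to BusRdX\n",
               (unsigned long long)mine->addr);
        mine->cmd = BUS_RDX;
    }
}

int main(void)
{
    /* P0 and P1 both hold the line in S and issue BusUpgr in consecutive
     * cycles; P1 snoops P0's earlier request and downgrades its own. */
    PendingReq p1 = { .pending = true, .cmd = BUS_UPGR, .addr = 0x1000 };
    snoop_request(&p1, BUS_UPGR, 0x1000);   /* P0's BusUpgr seen on the bus */
    return 0;
}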

SGI Powerpath-2 bus

  • Used in SGI Challenge
    • Conflicts are resolved by not allowing multiple outstanding bus transactions to the same cache line
    • Allows eight outstanding requests on the bus at any point in time
    • Flow control on buffers is provided by negative acknowledgments (NACKs): the bus has a dedicated NACK line which remains asserted if the buffer holding outstanding transactions is full; a NACKed transaction must be retried
    • The request order determines the total order of memory accesses, but the responses may be delivered in a different order depending on their completion times
    • In subsequent slides we refer to this design as Powerpath-2 since it is loosely based on that bus
  • Logically two separate buses
    • Request bus for launching the command type (BusRd, BusWB, etc.) and the involved address
    • Response bus for providing the data response, if any
    • Since responses may arrive in an order different from the request order, a 3-bit tag is assigned to each request
    • Responses launch this tag on the tag bus along with the data reply so that the address bus may be left free for other requests
  • The data bus is 256 bits wide while a cache line is 128 bytes
    • One data response phase needs four bus cycles plus one additional hardware turnaround cycle
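The following C sketch ties the Powerpath-2 description together: an eight-entry request table indexed by a 3-bit tag, a NACK when the table is full, rejection of a second transaction to a line that already has one outstanding, and a response phase that uses the tag and occupies 128/32 = 4 data cycles plus 1 turnaround cycle. Treating the same-line conflict as a NACK, as well as all structure and field names, are simplifying assumptions of this sketch and not details of the actual SGI hardware.

/* Powerpath-2-style request table, loosely following the description
 * above: up to eight outstanding requests identified by a 3-bit tag,
 * NACK-based flow control, and responses matched to requests by tag. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_TAGS        8          /* 3-bit tag -> eight outstanding requests */
#define LINE_BYTES      128        /* cache line size */
#define DATA_BUS_BYTES  32         /* 256-bit data bus */
#define TURNAROUND      1          /* hardware turnaround cycle */

typedef struct {
    bool     valid;
    uint64_t line_addr;            /* cache-line address of the request */
} ReqEntry;

static ReqEntry req_table[NUM_TAGS];

/* Request phase: allocate a tag.  Returns the 3-bit tag, or -1 (NACK)
 * if the table is full or the line already has an outstanding
 * transaction (simplification); a NACKed request must be retried. */
int launch_request(uint64_t line_addr)
{
    int free_tag = -1;
    for (int t = 0; t < NUM_TAGS; t++) {
        if (req_table[t].valid && req_table[t].line_addr == line_addr)
            return -1;             /* conflicting transaction to the same line */
        if (!req_table[t].valid && free_tag < 0)
            free_tag = t;
    }
    if (free_tag < 0)
        return -1;                 /* table full: NACK line would be asserted */
    req_table[free_tag] = (ReqEntry){ .valid = true, .line_addr = line_addr };
    return free_tag;
}

/* Response phase: the tag travels with the data reply, so the matching
 * entry is found and freed without using the address bus again. */
void deliver_response(int tag)
{
    int cycles = LINE_BYTES / DATA_BUS_BYTES + TURNAROUND;   /* 4 + 1 = 5 */
    printf("response tag %d for line 0x%llx: %d bus cycles\n",
           tag, (unsigned long long)req_table[tag].line_addr, cycles);
    req_table[tag].valid = false;
}

int main(void)
{
    int t0 = launch_request(0x4000);   /* gets tag 0 */
    int t1 = launch_request(0x8000);   /* gets tag 1 */
    int t2 = launch_request(0x4000);   /* same line outstanding: NACK (-1) */
    printf("tags: %d %d %d\n", t0, t1, t2);
    deliver_response(t1);              /* responses may return out of order */
    deliver_response(t0);
    return 0;
}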