Module 15: "Memory Consistency Models"
  Lecture 35: "Release Consistency and Delayed Consistency"
 

Release consistency

  • Relaxes WO even further
    • Categorize synchronization operations into acquire and release
    • Acquire is a read (e.g., a normal load) or an atomic read-modify-write (e.g., an LL/SC pair) used to get access to a certain set of protected variables; popular examples of acquire include LOCK operations and waiting on a flag
    • Release is a write (e.g., a normal store) or an atomic read-modify-write (e.g., an LL/SC pair) used to grant others access to a certain set of protected variables; popular examples of release include UNLOCK operations and setting a flag
    • A barrier is a release (on arrival) as well as an acquire (on departure)
  • A couple of observations
    • There is no harm in issuing the acquire and the instructions after it, and even committing them, before all instructions preceding the acquire have committed
    • There is no harm in executing and committing the instructions after a release even before the release has committed
    • Of course, such re-ordering applies only to instructions that access different memory addresses
    • Must not issue any instruction following an acquire before the acquire commits (the semantics of acquire must be maintained)
    • Must commit all instructions ahead of a release before the release can commit (at this point all new values become visible to others)
  • Synchronization operations must be labeled properly
  • Otherwise the compiler must insert fence instructions at acquire and release boundaries (may lose some of the benefits of RC)
  • Note that for barrier synchronization, RC does not offer any extra advantage over WO
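
The flag example of acquire and release above can be sketched with C++ atomics. This is only an illustration under assumptions: the names `payload`, `ready`, `producer`, `consumer`, and `run_handoff` are hypothetical, and C++ `memory_order_acquire`/`memory_order_release` are language-level orderings that resemble, but are not defined as, the hardware RC model of this lecture.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names: `payload` stands for a protected variable and
// `ready` for the synchronization flag. Neither comes from the lecture.
int payload = 0;
std::atomic<bool> ready{false};

void producer(int value) {
    payload = value;  // ordinary write ahead of the release
    // Release ("setting a flag"): the write above must be visible
    // before another thread can observe ready == true.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // Acquire ("waiting on a flag"): nothing after this loop may be
    // performed before the acquire load has observed true.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // ordered after the acquire
}

// Runs one producer and one consumer and returns what the consumer read.
int run_handoff(int value) {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread t_consumer([&seen] { seen = consumer(); });
    std::thread t_producer([value] { producer(value); });
    t_producer.join();
    t_consumer.join();
    assert(seen == value);  // the guarantee that proper labeling buys
    return seen;
}
```

If both operations on `ready` were labeled `std::memory_order_relaxed` instead, the program would have a data race on `payload`, which mirrors the point that synchronization operations must be labeled properly.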

Hardware support

  • WO/RC on Alpha processors
    • The processor does not implement load-invalidate replay; not needed for WO or RC (but WO is not fully exploited)
    • Offers two fence instructions: memory barrier (MB) and write memory barrier (WMB)
    • MB is the same as sync in the R10000, i.e., it disables issue of any further memory operations until all memory operations before the MB have committed
    • WMB only enforces ordering among stores, like stbar in SPARC, i.e., a store after the WMB cannot bypass it, but a load can
    • MB can be used to easily implement WO: insert MB at every synchronization operation
  • Relaxed memory order (RMO) in SPARC v9
    • Offers four flavors of fence instructions (RR, RW, WR, WW): possible to synthesize an array of consistency models
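
The remark that a full fence at every synchronization operation yields WO can be sketched as follows. This is a sketch under assumptions, not Alpha code: `std::atomic_thread_fence(std::memory_order_seq_cst)` is used as a portable stand-in for MB (which instruction it compiles to depends on the compiler and target), and `lock_word`, `shared_counter`, `wo_lock`, `wo_unlock`, and `run_counter` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<bool> lock_word{false};  // hypothetical lock variable
long shared_counter = 0;             // data protected by the lock

// Stand-in for a full memory barrier such as MB (an assumption).
inline void full_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// The synchronization operations themselves are relaxed; all ordering
// comes from the fences placed before and after each one.
void wo_lock() {
    full_fence();  // earlier memory operations complete first
    while (lock_word.exchange(true, std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    full_fence();  // later memory operations wait for the lock
}

void wo_unlock() {
    assert(lock_word.load(std::memory_order_relaxed));  // lock must be held
    full_fence();  // critical-section accesses complete first
    lock_word.store(false, std::memory_order_relaxed);
    full_fence();
}

// Each thread increments the shared counter `iters` times under the lock.
long run_counter(int threads, int iters) {
    shared_counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([iters] {
            for (int i = 0; i < iters; ++i) {
                wo_lock();
                ++shared_counter;
                wo_unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return shared_counter;
}
```

Under RC the fence before the lock acquisition and the fence after the unlock store would be unnecessary, which is the extra freedom RC has over WO.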

Industry situation

  • A fairly debated issue
    • Chip designers naturally want to make hardware simple and that points to relaxed models
    • Processors designed by MIPS Technologies implement SC
    • Processors from Sun Microsystems support TSO and/or PSO
    • Intel processors come with PC
    • Alpha and IBM PowerPC processors support WO; Power4, Power5 do not guarantee write atomicity
  • Multiprocessors normally follow the model supported by the underlying microprocessor
    • Pentium Pro Quad SMP allows writes and subsequent interventions to complete even before all invalidation acknowledgments are collected (violates write atomicity)
  • Delayed-exclusive replies in Origin 2000
    • In Origin 2000 the memory controller at the requester node does not send the upgrade ack or PUTX reply to the local processor until all inval acks are collected
    • Note that the memory controller could actually fool the processor by forwarding the exclusive reply as soon as it arrives; this is called eager-exclusive reply and is exercised by most Alpha servers (they can do it because they are not SC)
    • Eager-exclusive reply complicates memory controller design: it must hold on to the OTT entry until all inval acks are collected, and it must block subsequent interventions and writebacks for that line until all inval acks are collected (relaxing these two would violate write atomicity)
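
The difference between delayed-exclusive and eager-exclusive replies can be illustrated with a toy model of a single OTT entry. Everything here is a simplifying assumption made for illustration: the `OttEntry` fields, the `Event` and `Policy` enums, and the `simulate` function are hypothetical and do not describe the actual Origin 2000 or Alpha controller logic.

```cpp
#include <cassert>
#include <vector>

enum class Event { ExclusiveReply, InvalAck };
enum class Policy { Delayed, Eager };

// Hypothetical, heavily simplified OTT entry for one outstanding write.
struct OttEntry {
    int pending_inval_acks = 0;
    bool reply_arrived = false;
    bool processor_notified = false;
    bool entry_freed = false;
};

// 0-based index of the event at which each action happens, -1 if never.
struct Outcome {
    int notified_at;
    int freed_at;
};

Outcome simulate(Policy policy, int sharers, const std::vector<Event>& events) {
    OttEntry e;
    e.pending_inval_acks = sharers;
    Outcome out{-1, -1};
    for (int i = 0; i < static_cast<int>(events.size()); ++i) {
        if (events[i] == Event::ExclusiveReply) {
            e.reply_arrived = true;
        } else {
            assert(e.pending_inval_acks > 0);  // no more acks than sharers
            --e.pending_inval_acks;
        }
        const bool all_acks = (e.pending_inval_acks == 0);
        // Delayed: reply the processor only after all inval acks.
        // Eager: reply the processor as soon as the exclusive reply arrives.
        const bool may_notify =
            e.reply_arrived && (policy == Policy::Eager || all_acks);
        if (may_notify && !e.processor_notified) {
            e.processor_notified = true;
            out.notified_at = i;
        }
        // Under either policy the entry is held until all acks are in.
        if (e.reply_arrived && all_acks && !e.entry_freed) {
            e.entry_freed = true;
            out.freed_at = i;
        }
    }
    return out;
}
```

In this model, whenever `notified_at` is smaller than `freed_at` the processor already owns the line while acks are outstanding; that window is where an eager-exclusive controller has to block interventions and writebacks for the line.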