Release consistency
- Relaxes WO even further
- Categorizes synchronization operations into acquire and release
- An acquire is a read (e.g., a normal load) or an atomic read-modify-write (e.g., an LL/SC pair) used to gain access to a set of protected variables; common examples of acquire are LOCK operations and waiting on a flag
- A release is a write (e.g., a normal store) or an atomic read-modify-write (e.g., an LL/SC pair) used to grant others access to a set of protected variables; common examples of release are UNLOCK operations and setting a flag
- Barrier is release (arrival) as well as acquire (departure)
- A couple of observations
- There is no harm in issuing an acquire and the instructions after it, and committing them, before all instructions preceding the acquire have committed
- There is no harm in executing and committing the instructions after release even before the release is committed
- Of course, such re-ordering applies to only those instructions that access different memory addresses
- Must not issue any instruction following an acquire before the acquire commits (semantics of acquire must be maintained)
- Must commit all instructions ahead of release before release can commit (at this point all new values become visible to others)
- Synchronization operations must be labeled properly
- Otherwise the compiler must insert fence instructions at acquire and release boundaries (and may lose some of the benefits of RC)
- Note that for barrier synchronization, RC does not offer any extra advantage over WO
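The acquire and release roles above can be sketched with C++ atomics, whose `memory_order_acquire` and `memory_order_release` labels play a similar role to properly labeled synchronization operations. This is only an illustrative sketch, not how any particular RC machine is programmed; the names `Channel`, `produce`, `consume`, and `run_handoff` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example type: payload is the protected variable,
// ready is the flag used for synchronization.
struct Channel {
    int payload = 0;                 // ordinary (non-atomic) protected data
    std::atomic<bool> ready{false};  // synchronization flag
};

// Release side ("setting a flag"): every write ahead of the release
// must be visible before the flag itself becomes visible.
void produce(Channel& ch, int value) {
    ch.payload = value;
    ch.ready.store(true, std::memory_order_release);
}

// Acquire side ("waiting on a flag"): nothing after the acquire may be
// performed before the acquire has observed the flag.
int consume(Channel& ch) {
    while (!ch.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return ch.payload;
}

// Runs one producer and one consumer and returns what the consumer read.
int run_handoff(int value) {
    Channel ch;
    int seen = -1;
    std::thread consumer([&] { seen = consume(ch); });
    std::thread producer([&] { produce(ch, value); });
    producer.join();
    consumer.join();
    return seen;
}
```

Calling `run_handoff(42)` hands the value from one thread to the other through the flag. Operations after the release store, or before the acquire load, remain free to be reordered, which is the extra freedom RC has over WO.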
Hardware support
- WO/RC on Alpha processors
- The processor does not implement load-invalidate replay; not needed for WO or RC (but WO is not fully exploited)
- Offers two fence instructions: memory barrier (MB) and write memory barrier (WMB)
- MB is the same as sync in the R10000, i.e., it disables issue of any further memory operations until all memory operations before the MB have committed
- WMB only enforces ordering among stores, like stbar in SPARC, i.e., a store after a WMB cannot bypass it, but a load can
- MB can be used to easily implement WO: insert MB at every synchronization operation
- Relaxed memory order (RMO) in SPARC v9
- Offers four flavors of fence (RR, RW, WR, WW) through the MEMBAR instruction: possible to synthesize an array of consistency models
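The idea of implementing WO by placing a full fence at every synchronization operation can be sketched in C++ as follows. The assumption here is that `std::atomic_thread_fence(std::memory_order_seq_cst)` serves as a portable stand-in for a full barrier such as MB; the sketch makes no claim about the instructions a given compiler emits. `full_fence`, `FencedSpinLock`, and `count_with_lock` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Assumption: a sequentially consistent fence stands in for a full
// memory barrier (the role MB plays in the text above).
inline void full_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Spin lock whose atomic operations carry no ordering of their own;
// all ordering comes from fences placed around each synchronization
// operation, as WO requires.
class FencedSpinLock {
public:
    void lock() {
        full_fence();  // earlier memory operations complete first
        while (flag_.exchange(true, std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        full_fence();  // critical section cannot move above the lock
    }
    void unlock() {
        full_fence();  // critical section cannot move below the unlock
        flag_.store(false, std::memory_order_relaxed);
        full_fence();  // later memory operations wait for the unlock
    }
private:
    std::atomic<bool> flag_{false};
};

// Increments a plain counter from several threads under the lock.
long count_with_lock(int threads, int iters) {
    FencedSpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Under RC, only the fence after the acquire and the fence before the release would be needed; the other two are what WO additionally demands.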
Industry situation
- A fairly debated issue
- Chip designers naturally want to make hardware simple and that points to relaxed models
- Processors designed by MIPS Technologies implement SC
- Processors from Sun Microsystems support TSO and/or PSO
- Intel processors implement PC (processor consistency)
- Alpha and IBM PowerPC processors support WO; Power4, Power5 do not guarantee write atomicity
- Multiprocessors normally follow the model supported by the underlying microprocessor
- Pentium Pro Quad SMP allows writes and subsequent interventions to complete even before all invalidation acknowledgments are collected (violates write atomicity)
- Delayed-exclusive replies in Origin 2000
- In Origin 2000 the memory controller at the requester node does not send the upgrade ack or PUTX reply to the local processor until all inval acks are collected
- Note that the memory controller could actually fool the processor by forwarding the exclusive reply as soon as it arrives, without waiting for the inval acks; this is called eager-exclusive reply and is exercised by most Alpha servers (they can do it because they are not SC)
- Eager-exclusive reply complicates memory controller design: must hold on to the OTT entry until all inval acks are collected; must block subsequent interventions from proceeding and must block writeback for that line until all inval acks are collected (relaxing these two would violate write atomicity)
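The difference between delayed-exclusive and eager-exclusive replies can be illustrated with a toy bookkeeping model of one outstanding write at the requester's memory controller. This is a simplified sketch under stated assumptions, not the Origin 2000 or Alpha protocol logic; `ReplyPolicy`, `PendingWrite`, and all member names are hypothetical.

```cpp
#include <cassert>

// Toy model: one outstanding write (or upgrade) to a line that had sharers.
enum class ReplyPolicy { Delayed, Eager };

class PendingWrite {
public:
    PendingWrite(ReplyPolicy policy, int sharers)
        : policy_(policy), acks_left_(sharers) {}

    // The exclusive reply (or upgrade ack) arrives from the home node.
    void on_exclusive_reply() { reply_arrived_ = true; }

    // One invalidation acknowledgment arrives from a former sharer.
    void on_inval_ack() {
        if (acks_left_ > 0) --acks_left_;
    }

    bool all_acks_collected() const { return acks_left_ == 0; }

    // Delayed: the processor sees the reply only after all acks are in.
    // Eager: the processor sees the reply as soon as it arrives.
    bool processor_may_proceed() const {
        if (!reply_arrived_) return false;
        return policy_ == ReplyPolicy::Eager || all_acks_collected();
    }

    // Interventions and writebacks for the line stay blocked until all
    // acks are collected.
    bool line_blocked() const { return !all_acks_collected(); }

    // The tracking entry (the OTT entry in the text) is held until the
    // reply has arrived and every ack has been collected.
    bool entry_can_retire() const {
        return reply_arrived_ && all_acks_collected();
    }

private:
    ReplyPolicy policy_;
    int acks_left_;
    bool reply_arrived_ = false;
};
```

With two sharers, a `Delayed` entry reports `processor_may_proceed()` as false after the reply arrives and true only after both acks. An `Eager` entry lets the processor proceed at once, but `line_blocked()` stays true and `entry_can_retire()` stays false until both acks arrive, which is the extra state the controller has to carry.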