Relaxed models
- Implementing SC requires complex hardware
- Is there an example that clearly shows the disaster of not implementing all of this hardware?
- Observe that the cache coherence protocol is orthogonal to this question
- But such violations are rare
- Does it make sense to invest so much time (for verification) and hardware (associative lookup logic in load queue)?
- Many processors today relax the consistency model to get rid of complex hardware and gain some extra performance, at the cost of making programs harder to reason about
- P0: A=1; B=1; flag=1; P1: while (!flag); print A; print B;
- SC is too restrictive; relaxing it does not always violate programmers’ intuition
- Three attributes
- System specification: which orders are preserved and which are not; if not all program orders are preserved, what support (software and hardware) is provided to enforce a particular order that the programmer wants
- Programmer’s interface: a set of rules that, if followed, leads to the execution the programmer expects; normally specified in terms of high-level language annotations and labels
- Translation mechanism: how to translate programmer’s annotations to hardware actions
- Let’s take a look at a few relaxed models: TSO, PSO, PC, WO/WC, RC, DC
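
The flag example above can be checked by brute force. The following is a minimal sketch, not a model of any real processor: it enumerates every SC interleaving of P0 and P1, treats P1's spin loop as a single read of flag, and keeps only the runs where that read returned 1. A relaxation that lets A=1 and B=1 pass each other is modeled here simply by swapping them in P0's program. The names `interleavings`, `printed_after_flag`, `in_order`, `swapped`, and `flag_early` are illustrative assumptions.

```python
def interleavings(p0, p1):
    """Yield every merge of two instruction lists that keeps each list's own order."""
    if not p0 or not p1:
        yield list(p0) + list(p1)
        return
    for rest in interleavings(p0[1:], p1):
        yield [p0[0]] + rest
    for rest in interleavings(p0, p1[1:]):
        yield [p1[0]] + rest


def printed_after_flag(p0, p1):
    """Run each interleaving on one shared memory and collect the (A, B) pairs
    that P1 reads in runs where its read of flag returned 1."""
    seen = set()
    for order in interleavings(p0, p1):
        mem = {"A": 0, "B": 0, "flag": 0}
        regs = {}
        for kind, var, arg in order:
            if kind == "st":
                mem[var] = arg          # arg is the value stored
            else:
                regs[arg] = mem[var]    # arg names the destination register
        if regs["f"] == 1:
            seen.add((regs["a"], regs["b"]))
    return seen


P1 = [("ld", "flag", "f"), ("ld", "A", "a"), ("ld", "B", "b")]
in_order = [("st", "A", 1), ("st", "B", 1), ("st", "flag", 1)]
swapped = [("st", "B", 1), ("st", "A", 1), ("st", "flag", 1)]     # harmless reordering
flag_early = [("st", "flag", 1), ("st", "A", 1), ("st", "B", 1)]  # harmful reordering

print(printed_after_flag(in_order, P1))
print(printed_after_flag(swapped, P1))
print(printed_after_flag(flag_early, P1))
```

In this sketch the first two calls give the same single result, (1, 1), so letting the two data writes pass each other is invisible to the programmer. Only moving the write to flag ahead of the data writes lets P1 see stale values.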
Total store ordering
- Allows a read to bypass (i.e. commit before) an earlier incomplete write
- This essentially means a blocked store at the head of the ROB can be removed from the ROB (it remains in the write buffer), and subsequent instructions are allowed to commit, bypassing the blocked store
- Can hide latency of write operations
- Note that this is the only allowed re-ordering
- Programmer’s intuition is preserved in most cases, but not always
- P0: A=1; flag=1; P1: while (!flag); print A; [same as SC]
- P0: A=1; B=1; P1: print B; print A; [same as SC]
- P0: A=1; print B; P1: B=1; print A; [can violate SC: both processors may print 0]
- Implemented in many Sun UltraSPARC microprocessors
- How do I enforce SC in the last example if I really care?
- May be needed when porting this program from R10000 to UltraSPARC
- Must ensure that a read cannot bypass earlier writes
- Microprocessors provide “fence” instructions for this purpose
- The SPARC v9 specification provides a MEMBAR (memory barrier) instruction in several flavors
- Here we need only one of these flavors, namely a write-to-read fence (MEMBAR #StoreLoad) just before the load instruction
- This fence does not allow the load to graduate until all stores before it have completed, i.e. have drained from the write buffer
- If a fence instruction is not available, replacing the read with a read-modify-write (e.g., ldstub in SPARC) also works
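
The effect of the write buffer and of the write-to-read fence can be illustrated with a small simulation. This is a hedged sketch under simplifying assumptions, not a description of UltraSPARC hardware: each processor has one FIFO write buffer, loads commit in program order and may forward from the processor's own buffer, and a `("fence",)` op stands in for the write-to-read MEMBAR by waiting until the buffer is empty. The function `outcomes` and the op encoding are hypothetical names chosen for this example.

```python
def outcomes(programs, tso=True):
    """Return the set of final register values reachable for a litmus test.

    Each program is a list of ops:
      ("st", var, value)  store; under TSO it enters the processor's FIFO write buffer
      ("ld", var, reg)    load into register `reg`
      ("fence",)          write-to-read fence: waits until the own write buffer is empty
    With tso=False every store updates memory at once, which models SC.
    """
    n = len(programs)
    results = set()

    def replace(tup, i, value):
        return tup[:i] + (value,) + tup[i + 1:]

    def explore(pcs, mem, bufs, regs):
        moved = False
        for p in range(n):
            if bufs[p]:  # choice 1: retire the oldest buffered store to memory
                moved = True
                var, val = bufs[p][0]
                explore(pcs, {**mem, var: val}, replace(bufs, p, bufs[p][1:]), regs)
            if pcs[p] == len(programs[p]):
                continue
            op = programs[p][pcs[p]]  # choice 2: commit the next instruction
            nxt = replace(pcs, p, pcs[p] + 1)
            if op[0] == "st":
                moved = True
                _, var, val = op
                if tso:
                    explore(nxt, mem, replace(bufs, p, bufs[p] + ((var, val),)), regs)
                else:
                    explore(nxt, {**mem, var: val}, bufs, regs)
            elif op[0] == "ld":
                moved = True
                _, var, reg = op
                val = mem.get(var, 0)
                for bvar, bval in bufs[p]:  # forward the youngest own store, if any
                    if bvar == var:
                        val = bval
                explore(nxt, mem, bufs, {**regs, reg: val})
            elif op[0] == "fence" and not bufs[p]:
                moved = True
                explore(nxt, mem, bufs, regs)
        if not moved:  # all programs finished and all buffers drained
            results.add(tuple(sorted(regs.items())))

    explore((0,) * n, {}, ((),) * n, {})
    return results


# P0: A=1; print B;   P1: B=1; print A;
sb = [[("st", "A", 1), ("ld", "B", "r0")],
      [("st", "B", 1), ("ld", "A", "r1")]]
sb_fenced = [[("st", "A", 1), ("fence",), ("ld", "B", "r0")],
             [("st", "B", 1), ("fence",), ("ld", "A", "r1")]]
# P0: A=1; flag=1;    P1: read flag; print A;
mp = [[("st", "A", 1), ("st", "flag", 1)],
      [("ld", "flag", "f"), ("ld", "A", "a")]]

both_zero = (("r0", 0), ("r1", 0))
print(both_zero in outcomes(sb, tso=False))        # False: impossible under SC
print(both_zero in outcomes(sb, tso=True))         # True: each read bypassed a buffered write
print(both_zero in outcomes(sb_fenced, tso=True))  # False: the fence restores SC here
print((("a", 0), ("f", 1)) in outcomes(mp))        # False: FIFO buffer keeps A before flag
```

The last check matches the first TSO example above: because writes leave the buffer in program order, a processor that has seen flag=1 also sees A=1, so only the read-after-write case needs a fence.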