Module 15: "Memory Consistency Models"
  Lecture 35: "Release Consistency and Delayed Consistency"
 

Eager-exclusive reply

  • Programmers’ interface
    • P0: A=1; flag=1;  P1: while (!flag); print A;
    • The programmer expects A to be printed as 1
    • A machine that implements eager-exclusive replies can print 0 for A (even if every processor is SC); how?
    • Fix: P0: A=1; WMB; flag=1; P1: while (!flag); print A;
    • Formally, a write memory barrier (WMB) must be inserted at every release boundary (e.g., before UNLOCK, BARRIER, or a flag set); the WMB waits until all pending invalidation acks have been collected before allowing any further store to issue from the store queue
    • Eager-exclusive replies essentially require the same treatment at the programmers’ interface as if RC allowed only write re-ordering (is it similar to PSO? No)
  • Write atomicity is broken anyway
    • P0: A=1; flag=1; P1: while(!flag); print A;
    • So why complicate the OTT (outstanding transaction table) by holding back writebacks and interventions while invalidation acks are pending?
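One way to see how eager-exclusive replies can print 0, and how the WMB restores the expected result, is a small event-driven simulation. It is a sketch under stated assumptions, not a model of any particular protocol: the latencies are hypothetical numbers chosen only to expose the race, P1 is assumed to hold a shared copy of A before P0's write, P0's store to flag is assumed to hit in P0's cache, and P1's spin loop is collapsed into a single "p1_sees_flag" event. The names `simulate`, `maybe_issue_flag` and the event names are illustrative.

```python
import heapq

# Hypothetical latencies in cycles, chosen only to expose the race.
REPLY_LATENCY = 2    # home -> P0: eager exclusive reply for A
INVAL_LATENCY = 20   # home -> P1: invalidation of A (slow network path)
ACK_LATENCY = 2      # P1 -> P0: invalidation ack
FLAG_LATENCY = 3     # P0's flag=1 -> P1 observes it and exits its spin loop


def simulate(use_wmb: bool) -> int:
    """Return the value of A printed by P1 in the flag program."""
    p1_cache = {"A": 0}   # P1 holds a shared copy of A that will go stale
    latest_a = 0          # value of A held by its current owner
    acks_pending = 1      # P1 is the only sharer, so one ack is owed to P0
    reply_seen = False
    flag_issued = False
    events = []           # min-heap of (time, seq, event name)
    seq = 0

    def post(t, name):
        nonlocal seq
        heapq.heappush(events, (t, seq, name))
        seq += 1

    def maybe_issue_flag(t):
        # Without a WMB, flag=1 may issue once A's exclusive reply is in;
        # with a WMB it must also wait for every pending invalidation ack.
        nonlocal flag_issued
        if flag_issued or not reply_seen or (use_wmb and acks_pending > 0):
            return
        flag_issued = True
        post(t + 1, "p0_store_flag")

    # t=0: P0 issues A=1; the home replies to P0 and invalidates P1 in parallel.
    post(REPLY_LATENCY, "p0_excl_reply_A")
    post(INVAL_LATENCY, "p1_inval_A")

    while events:
        t, _, name = heapq.heappop(events)
        if name == "p0_excl_reply_A":
            reply_seen = True
            latest_a = 1          # P0 now owns A and considers A=1 done
            maybe_issue_flag(t)
        elif name == "p1_inval_A":
            p1_cache.pop("A", None)
            post(t + ACK_LATENCY, "p0_ack_A")
        elif name == "p0_ack_A":
            acks_pending -= 1
            maybe_issue_flag(t)
        elif name == "p0_store_flag":
            post(t + FLAG_LATENCY, "p1_sees_flag")
        elif name == "p1_sees_flag":
            # A stale hit returns 0; after invalidation P1 misses and
            # fetches the owner's value.
            return p1_cache.get("A", latest_a)
    raise RuntimeError("P1 never observed flag=1")


print(simulate(use_wmb=False))  # 0
print(simulate(use_wmb=True))   # 1
```

Without the WMB, P0 issues flag=1 as soon as the exclusive reply for A arrives, so P1 can leave its loop and hit on its stale copy of A before the invalidation reaches it, even though each processor issues its own operations in program order.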

Compilers’ job

  • Must be careful while carrying out code motion
    • It is usually very hard to argue formally about the correctness of a re-ordering
    • The compiled code must not follow a model that is weaker than whatever is advertised to the programmer
    • The underlying microprocessor normally supports a model that is at least as strong as whatever the compiler offers; it could be stronger, but cannot be weaker (that would be unfaithful to the programmer)
    • A program compiled for RC will still work correctly on an SC microprocessor, but will fail to exploit some of the optimizations done by the compiler (here, working correctly means producing an RC-compliant output, since that is the programmers’ interface)
    • PC, PSO, TSO do not offer much opportunity for compiler optimization
  • Synchronized or properly labeled programs
    • A program that labels all synchronization operations with some directives is called a synchronized program
    • For locks and barriers this is easy; but how does one identify all event synchronizations done through flags?
      • Often the programmer can label these manually
    • Formally, all competing operations must be labeled; two conflicting operations (i.e., to the same location, with at least one of them a write) from two different processes are said to be competing if they appear back-to-back in at least one possible SC execution
    • A program is synchronized if all competing operations are labeled as synchronization operations
    • Once this (hard) part is done, the compiler inserts fence instructions at the proper places, depending on how relaxed the model is
  • The current state of automatically producing synchronized programs is quite poor
    • It requires enumerating all SC-compliant re-orderings (i.e., every possible SC execution)
    • The existing analysis is either conservative or very expensive
    • So, in practice, it becomes the programmer’s job
    • One extreme would be to label all operations as competing; this will lead to performance poorer than SC (why?)
    • Sometimes a programmer may choose to leave some competing operations unlabeled (i.e., intentionally introduce data races); this improves performance and is acceptable if either it does not compromise correctness (due to some peculiar program behavior) or a non-deterministic outcome is tolerable (e.g., red-black ordering)
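The definition of competing operations can be made concrete by brute force on the flag program: explore every SC execution and record each conflicting pair from different processes that appears back-to-back. The sketch below does this under simplifying assumptions: the program is hard-coded, the spin loop `while (!flag);` is modelled as a read of flag that repeats until it returns 1, and states are memoised so that the endless spinning executions still form a finite graph. `OPS`, `conflicting`, `next_op` and `find_competing` are illustrative names. Even for four static operations it must walk the whole state graph, which hints at why the exact analysis becomes very expensive for real programs.

```python
# The flag program, one entry per static operation:
# (process, location, is_write, text)
OPS = [
    (0, "A", True, "P0: A=1"),
    (0, "flag", True, "P0: flag=1"),
    (1, "flag", False, "P1: read flag (spin)"),
    (1, "A", False, "P1: print A"),
]


def conflicting(i, j):
    """Same location, different processes, at least one write."""
    pi, li, wi, _ = OPS[i]
    pj, lj, wj, _ = OPS[j]
    return pi != pj and li == lj and (wi or wj)


def next_op(proc, pc0, pc1):
    """Index into OPS of the next operation of proc, or None if it is done."""
    if proc == 0:
        return pc0 if pc0 < 2 else None
    return 2 + pc1 if pc1 < 2 else None


def find_competing():
    """Explore all SC executions; return (competing pairs, values printed)."""
    competing, printed, seen = set(), set(), set()
    stack = [(0, 0, 0, 0, None)]  # (pc0, pc1, A, flag, last op executed)
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pc0, pc1, a, flag, last = state
        for proc in (0, 1):
            op = next_op(proc, pc0, pc1)
            if op is None:
                continue
            # A conflicting pair that is adjacent in this execution competes.
            if last is not None and conflicting(last, op):
                competing.add((min(last, op), max(last, op)))
            n0, n1, na, nf = pc0, pc1, a, flag
            if op == 0:
                na, n0 = 1, 1
            elif op == 1:
                nf, n0 = 1, 2
            elif op == 2:
                n1 = 1 if flag else 0  # stay in the loop until flag reads 1
            else:
                printed.add(a)
                n1 = 2
            stack.append((n0, n1, na, nf, op))
    return competing, printed


competing, printed = find_competing()
for i, j in sorted(competing):
    print(OPS[i][3], "<->", OPS[j][3])  # P0: flag=1 <-> P1: read flag (spin)
print(printed)                          # {1}
```

Only the flag write and the flag read come out as competing; the two accesses to A never appear back-to-back because, in every SC execution, flag=1 and the read of flag separate them. Labeling just the flag operations as synchronization therefore makes this program synchronized, and every SC execution prints 1.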
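After the synchronization operations are labeled, fence placement can be sketched mechanically. The function below is a simplified, hypothetical mapping, not any particular compiler's: it assumes an SC processor needs no fences, weak ordering (WO) needs a fence on both sides of every synchronization operation, and RC needs a fence before each release and after each acquire. It ignores the ordering required among synchronization operations themselves; the labels "data", "acquire" and "release", the `FENCE` marker and `insert_fences` are illustrative.

```python
FENCE = "FENCE"  # stand-in for a hardware fence instruction


def insert_fences(ops, model):
    """Return op texts with FENCE markers inserted for the given model.

    ops   -- list of (text, label) in program order; label is "data",
             "acquire" or "release" (hypothetical labeling scheme)
    model -- "SC", "WO" (weak ordering) or "RC" (release consistency)
    """
    out = []

    def fence():
        if not out or out[-1] != FENCE:  # never emit two fences in a row
            out.append(FENCE)

    for text, label in ops:
        is_sync = label in ("acquire", "release")
        # Earlier accesses must complete before a sync op (WO) or a release (RC).
        if (model == "WO" and is_sync) or (model == "RC" and label == "release"):
            fence()
        out.append(text)
        # Later accesses must wait for a sync op (WO) or an acquire (RC).
        if (model == "WO" and is_sync) or (model == "RC" and label == "acquire"):
            fence()
    return out


p0 = [("A=1", "data"), ("flag=1", "release")]
p1 = [("while (!flag);", "acquire"), ("print A", "data")]
print(insert_fences(p0, "RC"))  # ['A=1', 'FENCE', 'flag=1']
print(insert_fences(p1, "RC"))  # ['while (!flag);', 'FENCE', 'print A']
print(insert_fences(p0, "SC"))  # ['A=1', 'flag=1']

# Labeling every operation as synchronization fences every access.
all_sync = [("A=1", "release"), ("flag=1", "release")]
print(insert_fences(all_sync, "WO"))
# ['FENCE', 'A=1', 'FENCE', 'flag=1', 'FENCE']
```

The RC output for P0 matches the WMB placement at the release boundary, while labeling everything as synchronization leaves a fence between every pair of accesses.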