Module 11: "Synchronization"
  Lecture 23: "Barriers and Speculative Synchronization"
 

Hardware support

  • Read broadcast
    • Can reduce the number of bus transactions at the release from P-1 to 1 in the best case
    • A processor that snoops a read miss to the flag location (possibly issued by a fellow processor) backs off and does not put its own read miss on the bus
    • Every processor picks up the read reply from the bus and the release completes with one bus transaction
    • Needs special hardware/compiler support to recognize these flag addresses and resort to read broadcast
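
The transaction counts above can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name `release_bus_transactions` is hypothetical, the release is assumed to invalidate the flag in all P-1 waiters, and the best case is assumed, in which every waiter takes its miss while the first read miss is still outstanding on the bus.

```python
def release_bus_transactions(num_processors: int, read_broadcast: bool) -> int:
    """Count bus read transactions needed for the waiters to observe a release.

    Toy model (assumption): one processor writes the flag, and the other
    num_processors - 1 processors each take a read miss on the flag.
    """
    waiters = max(num_processors - 1, 0)
    transactions = 0
    miss_on_bus = False  # has some waiter already put a read miss on the bus?
    for _ in range(waiters):
        if read_broadcast and miss_on_bus:
            # Back off: this waiter picks up the read reply from the bus
            # instead of issuing its own read miss.
            continue
        transactions += 1
        miss_on_bus = True
    return transactions


if __name__ == "__main__":
    P = 8
    print(release_bus_transactions(P, read_broadcast=False))  # 7, i.e. P-1
    print(release_bus_transactions(P, read_broadcast=True))   # 1
```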

Hardware barrier

  • Useful if frequency of barriers is high
    • Needs two wired-AND bus lines: one for odd barriers and one for even barriers
    • A processor arrives at the barrier and asserts its input line and waits for the wired-AND line output to go HIGH
    • Not very flexible: assumes that all processors will always participate in all barriers
    • Bigger problem: what if multiple processes belonging to the same parallel program are assigned to each processor?
    • No SMP supports it today
    • However, flexible hardware barrier support can be provided in the memory controller of DSM multiprocessors: the memory controller recognizes accesses to a special barrier counter or barrier flag, combines them in memory, and replies to the processors only when the barrier is complete (so there are no retries due to a failed lock acquire)
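
The memory-controller approach in the last bullet can be sketched as a small model. This is an illustrative sketch, not a description of any real controller: the class `BarrierMemoryController`, its method `access_barrier`, and the per-barrier participant count are all hypothetical names and assumptions introduced here.

```python
class BarrierMemoryController:
    """Toy model of a memory controller that combines barrier accesses.

    Assumption: the controller is told how many processors participate in
    this barrier, which is what makes the scheme more flexible than fixed
    wired-AND lines.
    """

    def __init__(self, num_participants: int):
        self.num_participants = num_participants
        self.pending = []  # processors whose barrier access is being held

    def access_barrier(self, processor_id: int) -> list:
        """Handle one access to the special barrier location.

        Returns the processors that receive a reply now: an empty list while
        the barrier is incomplete, and every held processor once the last
        participant arrives. No processor ever has to retry.
        """
        self.pending.append(processor_id)
        if len(self.pending) < self.num_participants:
            return []  # hold the request in memory, send no reply yet
        released, self.pending = self.pending, []  # reset for the next barrier
        return released


if __name__ == "__main__":
    ctrl = BarrierMemoryController(num_participants=3)
    print(ctrl.access_barrier(0))  # []
    print(ctrl.access_barrier(2))  # []
    print(ctrl.access_barrier(1))  # [0, 2, 1]
```

Each processor sends exactly one request and receives exactly one reply per barrier in this model, in contrast to a lock-protected counter where a failed lock acquire forces a retry.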

Speculative synchronization

  • Speculative synchronization
    • Basic idea is to introduce speculation in the execution of critical sections
    • Assume that no other processor will have conflicting data accesses in the critical section and hence don’t even try to acquire the lock
    • Just venture into the critical section and start executing
    • Note the difference between this and speculative execution of the critical section that results from speculating on the branch following the SC (store-conditional): there you still contend for the lock and generate network transactions
  • Martinez and Torrellas. In ASPLOS 2002.
  • Rajwar and Goodman. In ASPLOS 2002.
  • We will discuss Martinez and Torrellas

Why is it good?

  • In many cases the compiler/user inserts synchronization conservatively
    • Hard to know the exact access pattern
    • The addresses accessed may depend on input
  • Take a simple example of a hash table
    • When the hash table is updated by two processes you really do not know which bins they will insert into
    • So you conservatively make the hash table access a critical section
    • For certain input values the processes insert into different bins and could actually update the hash table concurrently
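
The hash table example can be made concrete with a short sketch. It assumes integer keys and a hypothetical hash function `bin_of` (key modulo the number of bins); `can_run_concurrently` is likewise an illustrative helper, not part of any cited scheme. It only checks whether two processes' insertions touch disjoint bins, which is the situation in which the conservative critical section was unnecessary.

```python
def bin_of(key: int, num_bins: int) -> int:
    """Hypothetical hash function: map an integer key to a bin index."""
    return key % num_bins


def can_run_concurrently(keys_a, keys_b, num_bins: int) -> bool:
    """Return True if the two processes insert into disjoint sets of bins.

    With disjoint bins the updates do not conflict, so serializing them
    behind a single lock is purely conservative.
    """
    bins_a = {bin_of(k, num_bins) for k in keys_a}
    bins_b = {bin_of(k, num_bins) for k in keys_b}
    return bins_a.isdisjoint(bins_b)


if __name__ == "__main__":
    # Process A touches bin 1 only, process B touches bin 2 only.
    print(can_run_concurrently([1, 9], [2, 10], num_bins=8))  # True
    # Key 17 also maps to bin 1, so the two processes conflict.
    print(can_run_concurrently([1, 9], [17, 4], num_bins=8))  # False
```

Whether the bins overlap depends on the input keys, which is why the conflict cannot be ruled out when the synchronization is inserted.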