Module 7: Synchronization
  Lecture 14: Scalable Locks and Barriers
 


Tree Barrier

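/* flag[P][MAX] is shared and initially all zero.
   As in the 8-processor example below, P is taken to be a power of two. */
void TreeBarrier (unsigned int pid, unsigned int P) {
    unsigned int i, mask;
    /* Arrival: climb one level for each trailing 1 bit of pid */
    for (i = 0, mask = 1; (mask & pid) != 0; ++i, mask <<= 1) {
        while (!flag[pid][i]);      /* spin until the level-i child arrives */
        flag[pid][i] = 0;           /* reset for the next barrier episode */
    }
    if (pid < (P - 1)) {
        flag[pid + mask][i] = 1;        /* signal the parent at level i */
        while (!flag[pid][MAX - 1]);    /* spin until released */
        flag[pid][MAX - 1] = 0;
    }
    /* Release: wake the children, highest level first */
    for (mask >>= 1; mask > 0; mask >>= 1) {
        flag[pid - mask][MAX - 1] = 1;
    }
}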
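
The signalling pattern of the listing above can be checked without running any threads. The following is a minimal sequential sketch, not part of the lecture code: the helper names `arrival_levels`, `arrival_target`, `wake_targets` and `print_trace` are hypothetical, and P is assumed to be a power of two with pid < P.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: value of i when the first for loop of TreeBarrier
   exits, i.e. the number of arrival flags pid waits on. */
unsigned int arrival_levels(unsigned int pid) {
    unsigned int i, mask;
    for (i = 0, mask = 1; (mask & pid) != 0; ++i, mask <<= 1)
        ;
    return i;
}

/* Hypothetical helper: the processor whose flag pid sets on arrival,
   or -1 for the last processor, which signals nobody. */
int arrival_target(unsigned int pid, unsigned int P) {
    if (pid >= P - 1)
        return -1;
    return (int)(pid + (1u << arrival_levels(pid)));
}

/* Hypothetical helper: stores in out[] the processors that pid wakes in
   the release loop, in order, and returns how many there are. */
unsigned int wake_targets(unsigned int pid, unsigned int out[]) {
    unsigned int n = 0;
    unsigned int mask = 1u << arrival_levels(pid);
    for (mask >>= 1; mask > 0; mask >>= 1)
        out[n++] = pid - mask;
    return n;
}

/* Prints one summary line per processor. */
void print_trace(unsigned int P) {
    assert(P != 0 && (P & (P - 1)) == 0);   /* power of two assumed */
    for (unsigned int pid = 0; pid < P; ++pid) {
        unsigned int out[32];
        unsigned int n = wake_targets(pid, out);
        printf("pid %u: waits on %u arrival flag(s), signals %d, wakes",
               pid, arrival_levels(pid), arrival_target(pid, P));
        for (unsigned int k = 0; k < n; ++k)
            printf(" %u", out[k]);
        printf("\n");
    }
}
```

Calling `print_trace(8)` lists, for each of the 8 processors, how many arrival flags it spins on, which processor it signals, and which processors it wakes during release.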

  • Convince yourself that this works
  • Take 8 processors and arrange them on leaves of a tree of depth 3
  • You will find that only odd-numbered nodes move up the tree during arrival (the acquire phase, implemented in the first for loop): a node climbs one level for each trailing 1 bit in its pid
  • The even nodes just set their parent's flag (the first statement in the body of the if): they bail out of the first loop immediately, with i=0 and mask=1
  • The release is initiated by the last processor in the last for loop; only odd nodes execute this loop (7 wakes up 3, 5, 6; 5 wakes up 4; 3 wakes up 1, 2; 1 wakes up 0)
  • Each processor will need at most log2(P) + 1 flags: up to log2(P) arrival flags plus one release flag, flag[pid][MAX-1]
  • Avoid false sharing: allocate each processor's flags on a separate chunk of cache lines
  • At the cost of some wasted memory (possibly worth it), allocate each processor's flags on a separate page and map that page locally in that processor's physical memory
    • Avoid remote misses in DSM multiprocessor
    • Does not matter in bus-based SMPs
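
The false-sharing advice above can be sketched as a padded flag layout. This is only an illustration under stated assumptions: the 64-byte line size, the bound `MAX_FLAGS`, the type `proc_flags_t` and the helper `alloc_flags` are all hypothetical and should be adapted to the target machine.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */
#define MAX_FLAGS  8    /* assumed bound: log2(P) + 1 for P up to 128 */

/* One processor's flags, aligned and padded to whole cache lines so that
   no two processors spin on the same line. */
typedef struct {
    alignas(CACHE_LINE) volatile int flag[MAX_FLAGS];
} proc_flags_t;

/* The struct size must be a multiple of the line size for the array
   elements to stay on separate lines. */
static_assert(sizeof(proc_flags_t) % CACHE_LINE == 0,
              "per-processor flags must fill whole cache lines");

/* Hypothetical helper: allocates zeroed, line-aligned flag blocks for P
   processors; returns NULL on failure. The caller frees with free(). */
proc_flags_t *alloc_flags(size_t P) {
    proc_flags_t *f = aligned_alloc(CACHE_LINE, P * sizeof(proc_flags_t));
    if (f != NULL) {
        for (size_t p = 0; p < P; ++p)
            for (size_t i = 0; i < MAX_FLAGS; ++i)
                f[p].flag[i] = 0;
    }
    return f;
}
```

With this layout, `f[pid].flag[i]` plays the role of `flag[pid][i]`. The page-per-processor variant would use the page size in place of `CACHE_LINE`; placing each page in the owning processor's local memory is operating-system specific and is not shown here.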