Module 7: Synchronization
  Lecture 14: Scalable Locks and Barriers
 


Array-based Lock

  • Solves the O(P²) traffic problem
  • The idea is to have a bit vector (essentially a character array if boolean type is not supported)
  • Each processor comes and takes the next free index into the array via fetch & inc
  • Then each processor loops on its index location until it becomes set
  • On unlock a processor is responsible for setting the next index location if someone is waiting
  • Initial fetch & inc still needs O(P) traffic, but the wait loop now needs O(1) traffic
  • Disadvantage: storage overhead is O(P)
  • Performance concerns
    • Avoid false sharing: allocate each array location on a different cache line
    • Assume a cache line size of 128 bytes and a character array: allocate an array of 128P bytes and use every 128th position in the array
    • For distributed shared memory, the location a processor loops on may not be in its local memory, so on acquire it must take a remote miss; allocating P pages and letting each processor loop on one bit in its own local page wastes too much storage; better solution: MCS lock (Mellor-Crummey & Scott)
  • Correctness concerns
    • Make sure to handle corner cases while unlocking, such as determining whether someone is waiting on the next location (this check-and-set must be an atomic operation)
    • Remember to reset your index location to zero while unlocking
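The scheme above can be sketched in C11 atomics. This is an illustrative implementation, not taken from the lecture: the names (alock_t, MAX_PROCS) and the fixed bound on processors are assumptions. Note that unconditionally setting the next slot on release sidesteps the "is someone waiting?" atomicity corner case: a later acquirer of that slot simply finds it already set.

```c
/* Hypothetical sketch of the array-based lock described above,
   using C11 atomics. alock_t, MAX_PROCS etc. are illustrative names. */
#include <stdatomic.h>
#include <stdbool.h>

#define CACHE_LINE 128   /* assumed cache-line size, as in the notes */
#define MAX_PROCS  64    /* assumed upper bound on waiting processors */

typedef struct {
    /* One flag per processor, each padded to its own cache line
       to avoid false sharing: storage overhead is O(P). */
    struct {
        atomic_bool set;
        char pad[CACHE_LINE - sizeof(atomic_bool)];
    } slot[MAX_PROCS];
    atomic_uint next_ticket;   /* handed out via fetch & inc */
} alock_t;

void alock_init(alock_t *l) {
    for (int i = 0; i < MAX_PROCS; i++)
        atomic_init(&l->slot[i].set, false);
    atomic_init(&l->slot[0].set, true);  /* first acquirer proceeds */
    atomic_init(&l->next_ticket, 0);
}

/* Returns the index the caller must pass back to alock_release. */
unsigned alock_acquire(alock_t *l) {
    /* fetch & inc takes the next free index: O(P) traffic once */
    unsigned me = atomic_fetch_add(&l->next_ticket, 1) % MAX_PROCS;
    /* Spin only on our own location: O(1) coherence traffic. */
    while (!atomic_load_explicit(&l->slot[me].set, memory_order_acquire))
        ;
    return me;
}

void alock_release(alock_t *l, unsigned me) {
    /* Reset our own index location to zero so it can be reused. */
    atomic_store_explicit(&l->slot[me].set, false, memory_order_relaxed);
    /* Pass the lock to the next index location. */
    unsigned next = (me + 1) % MAX_PROCS;
    atomic_store_explicit(&l->slot[next].set, true, memory_order_release);
}
```

Usage: each processor calls `alock_acquire`, keeps the returned index, and passes it to `alock_release` when done with the critical section.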