Goals of a lock algorithm
- Low latency: if there is no contender, the lock should be acquired quickly
- Low traffic: worst-case lock acquisition traffic should be low; otherwise it may affect unrelated transactions
- Scalability: Traffic and latency should scale slowly with the number of processors
- Low storage cost: Maintaining lock states should not impose unrealistic memory overhead
- Fairness: ideally processors should enter the critical section in the order of their lock requests (TS or TTS does not guarantee this)
Ticket lock
- Similar to the Bakery algorithm but simpler
- A nice application of fetch & inc
- The basic idea is to grab a unique ticket on arrival and wait until your turn comes
- The Bakery algorithm fails to offer this uniqueness, which increases its complexity
Shared: ticket = 0, release_count = 0;
Lock:   fetch & inc reg1, ticket_addr    /* reg1 = my unique ticket */
Wait:   lw reg2, release_count_addr      /* while (release_count != ticket); */
        sub reg3, reg2, reg1
        bnez reg3, Wait
Unlock: addi reg2, reg2, 0x1             /* release_count++; reg2 already holds
                                            release_count (== my ticket) when the
                                            wait loop exits, so no reload is needed */
        sw reg2, release_count_addr
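
The same algorithm expressed with C11 atomics, as a minimal sketch (the type and function names are illustrative, not from the original code; unsigned wraparound is harmless here because the wait tests equality only):

    #include <stdatomic.h>

    typedef struct {
        atomic_uint ticket;         /* next ticket to hand out */
        atomic_uint release_count;  /* ticket whose turn it is */
    } ticket_lock_t;                /* initialize both fields to 0 */

    void ticket_lock(ticket_lock_t *l) {
        unsigned my = atomic_fetch_add(&l->ticket, 1);  /* fetch & inc */
        while (atomic_load(&l->release_count) != my)
            ;                                           /* wait for my turn */
    }

    void ticket_unlock(ticket_lock_t *l) {
        /* only the holder writes release_count, so fetch & add is
           stronger than necessary, but it keeps the sketch simple */
        atomic_fetch_add(&l->release_count, 1);
    }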
- Initial fetch & inc generates O(P) traffic on bus-based machines (may be worse in DSM depending on implementation of fetch & inc)
- But the wait loop still generates 0.5P^2 messages asymptotically
- Researchers have proposed proportional backoff, i.e., in the wait loop insert a delay proportional to the difference between the ticket value and the last read release_count (a sketch follows this list)
- Latency and storage-wise better than Bakery
- Traffic-wise better than TTS and Bakery (I leave it to you to analyze the traffic of Bakery)
- Guaranteed fairness: the ticket value induces a FIFO queue
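
A sketch of the proportional-backoff wait loop, reusing the ticket_lock_t type from the sketch above; the delay loop and the BACKOFF_BASE constant are illustrative and would need platform-specific calibration:

    #define BACKOFF_BASE 100  /* illustrative; tune per platform */

    static void delay(unsigned units) {
        for (volatile unsigned i = 0; i < units; ++i)
            ;   /* crude busy-wait */
    }

    void ticket_lock_backoff(ticket_lock_t *l) {
        unsigned my = atomic_fetch_add(&l->ticket, 1);
        for (;;) {
            unsigned now = atomic_load(&l->release_count);
            if (now == my)
                return;
            /* my - now waiters are ahead of us, so wait
               roughly in proportion to that distance */
            delay(BACKOFF_BASE * (my - now));
        }
    }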
Array-based lock
- Solves the O(P^2) traffic problem
- The idea is to have a bit vector (essentially a character array if a boolean type is not supported)
- Each processor comes and takes the next free index into the array via fetch & inc
- Each processor then loops on its own index location until it becomes set
- On unlock, a processor is responsible for setting the next index location if someone is waiting
- Initial fetch & inc still needs O(P) traffic, but the wait loop now needs O(1) traffic
- Disadvantage: storage overhead is O(P)
- Performance concerns
- Avoid false sharing: allocate each array location on a different cache line
- Assume a cache line size of 128 bytes and a character array: allocate an array of size 128P bytes and use every 128th position in the array
- For distributed shared memory, the location a processor loops on may not be in its local memory: every acquire then takes a remote miss
- Allocate P pages and let each processor loop on one bit in its local page? Too much wastage; the better solution is the MCS lock (Mellor-Crummey & Scott)
- Correctness concerns (both are handled in the sketch after this list)
- Make sure to handle corner cases while unlocking, such as determining whether someone is waiting on the next location (this check must be atomic)
- Remember to reset your own index location to zero while unlocking
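
A C11 sketch of the array-based lock tying the pieces together, using the 128-byte line size from the example above (the slot count P and all names are illustrative; this version sidesteps the atomic "is someone waiting" check by unconditionally setting the next location, which the next arrival then finds already set):

    #include <stdatomic.h>

    #define LINE 128   /* cache line size from the example above */
    #define P    64    /* illustrative bound on concurrent contenders */

    typedef struct {
        _Alignas(LINE) atomic_char flag;   /* one flag per cache line,
                                              avoiding false sharing */
    } lock_slot_t;

    typedef struct {
        lock_slot_t slot[P];
        atomic_uint next;   /* next free index, taken via fetch & inc */
    } array_lock_t;

    void array_lock_init(array_lock_t *l) {
        for (int i = 0; i < P; ++i)
            atomic_store(&l->slot[i].flag, 0);
        atomic_store(&l->next, 0);
        atomic_store(&l->slot[0].flag, 1);   /* first comer enters directly */
    }

    unsigned array_lock(array_lock_t *l) {   /* returns my index */
        unsigned my = atomic_fetch_add(&l->next, 1) % P;
        while (!atomic_load(&l->slot[my].flag))
            ;   /* spin on my own line: O(1) traffic */
        return my;
    }

    void array_unlock(array_lock_t *l, unsigned my) {
        atomic_store(&l->slot[my].flag, 0);             /* reset my location */
        atomic_store(&l->slot[(my + 1) % P].flag, 1);   /* hand over to the next
                                                           waiter, or pre-set the
                                                           slot for the next arrival */
    }

Note that sizeof(array_lock_t) is roughly 128P bytes here, which is exactly the O(P) storage overhead mentioned above; the modulo wrap on the index is safe only while at most P processors contend at once.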