Module 11: "Synchronization"
  Lecture 22: "Scalable Locking Primitives"
 

Goals of a lock algorithm

  • Low latency: in the absence of contention, the lock should be acquired quickly
  • Low traffic: worst case lock acquire traffic should be low; otherwise it may affect unrelated transactions
  • Scalability: Traffic and latency should scale slowly with the number of processors
  • Low storage cost: Maintaining lock states should not impose unrealistic memory overhead
  • Fairness: Ideally, processors should enter the critical section in the order of their lock requests (TS or TTS, i.e. test & set or test & test & set, does not guarantee this)

Ticket lock

  • Similar to Bakery algorithm but simpler
  • A nice application of fetch & inc
  • Basic idea: on arrival, grab a unique ticket and wait until your turn comes
    • The Bakery algorithm cannot guarantee this uniqueness, which increases its complexity

      Shared:   ticket = 0, release_count = 0;

      Lock:     fetch & inc  reg1, ticket_addr         /* reg1 = my ticket */
      Wait:     lw           reg2, release_count_addr  /* while (release_count != my ticket); */
                sub          reg3, reg2, reg1
                bnez         reg3, Wait

      Unlock:   addi         reg2, reg2, 0x1           /* release_count++ (reg2 holds release_count from the last load in Wait) */
                sw           reg2, release_count_addr

  • Initial fetch & inc generates O(P) traffic on bus-based machines (may be worse in DSM depending on implementation of fetch & inc)
  • But the waiting algorithm still suffers from 0.5P² messages asymptotically
    • Researchers have proposed proportional backoff, i.e. in the wait loop insert a delay proportional to the difference between the ticket value and the last read release_count (a C sketch appears after this list)
  • Latency and storage-wise better than Bakery
  • Traffic-wise better than TTS and Bakery (I leave it to you to analyze the traffic of Bakery)
  • Guaranteed fairness: the ticket value induces a FIFO queue
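
  For concreteness, here is a minimal C11-atomics sketch of the ticket lock, including a proportional-backoff variant of the acquire routine; the type and function names, the BACKOFF_BASE constant, and the delay loop are illustrative assumptions, not part of the lecture.

      #include <stdatomic.h>

      #define BACKOFF_BASE 10   /* illustrative delay unit per waiter ahead of us */

      typedef struct {
          atomic_uint ticket;         /* next ticket to hand out           */
          atomic_uint release_count;  /* ticket currently allowed to enter */
      } ticket_lock_t;

      static void ticket_lock_init(ticket_lock_t *l) {
          atomic_init(&l->ticket, 0);
          atomic_init(&l->release_count, 0);
      }

      static void ticket_lock_acquire(ticket_lock_t *l) {
          unsigned my_ticket = atomic_fetch_add(&l->ticket, 1);   /* fetch & inc */
          while (atomic_load(&l->release_count) != my_ticket)
              ;   /* spin until it is our turn */
      }

      /* Acquire with proportional backoff: pause in proportion to the number of
         waiters ahead of us before re-reading release_count. */
      static void ticket_lock_acquire_backoff(ticket_lock_t *l) {
          unsigned my_ticket = atomic_fetch_add(&l->ticket, 1);
          for (;;) {
              unsigned now_serving = atomic_load(&l->release_count);
              if (now_serving == my_ticket)
                  return;
              for (volatile unsigned i = 0;
                   i < (my_ticket - now_serving) * BACKOFF_BASE; i++)
                  ;   /* crude delay loop; real code would use a pause hint */
          }
      }

      static void ticket_lock_release(ticket_lock_t *l) {
          /* release_count++ : only the lock holder writes it, so an atomic
             increment suffices */
          atomic_fetch_add(&l->release_count, 1);
      }

  The backoff variant simply re-reads release_count less often when many waiters are ahead, which is what reduces the waiting traffic.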

Array-based lock

  • Solves the O(P²) traffic problem
  • The idea is to have a bit vector (essentially a character array if boolean type is not supported)
  • Each processor comes and takes the next free index into the array via fetch & inc
  • Then each processor loops on its index location until it becomes set
  • On unlock, a processor is responsible for setting the next index location if someone is waiting
  • Initial fetch & inc still needs O(P) traffic, but the wait loop now needs O(1) traffic
  • Disadvantage: storage overhead is O(P)
  • Performance concerns
    • Avoid false sharing: allocate each array location on a different cache line
    • Assume a cache line size of 128 bytes and a character array: allocate an array of 128P bytes and use every 128th position in the array (see the padded-array sketch after this list)
    • For distributed shared memory, the location a processor spins on may not be in its local memory, so waiting involves remote misses; allocating P pages and letting each processor spin on one bit in its own local page wastes far too much memory; a better solution is the MCS lock (Mellor-Crummey & Scott)
  • Correctness concerns
    • While unlocking, make sure to handle corner cases such as determining whether someone is waiting on the next location (this must be an atomic operation)
    • Remember to reset your index location to zero while unlocking
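
  A minimal C11 sketch of the array-based lock with per-slot cache-line padding follows; MAX_PROCS, LINE_SIZE, and the slot layout are illustrative assumptions, and the sketch assumes no more than MAX_PROCS processors contend at once.

      #include <stdatomic.h>

      #define MAX_PROCS  64     /* illustrative upper bound on concurrent waiters */
      #define LINE_SIZE  128    /* assumed cache line size in bytes               */

      typedef struct {
          /* one flag per slot, padded to a full cache line to avoid false sharing */
          struct { atomic_char go; char pad[LINE_SIZE - 1]; } slot[MAX_PROCS];
          atomic_uint next_slot;   /* next free index, handed out via fetch & inc */
      } array_lock_t;

      static void array_lock_init(array_lock_t *l) {
          atomic_init(&l->next_slot, 0);
          for (int i = 0; i < MAX_PROCS; i++)
              atomic_init(&l->slot[i].go, 0);
          atomic_store(&l->slot[0].go, 1);   /* first arrival enters immediately */
      }

      /* Returns the slot index; the caller passes it back to release(). */
      static unsigned array_lock_acquire(array_lock_t *l) {
          unsigned my_slot = atomic_fetch_add(&l->next_slot, 1) % MAX_PROCS;
          while (!atomic_load(&l->slot[my_slot].go))
              ;   /* spin only on our own cache line: O(1) traffic while waiting */
          return my_slot;
      }

      static void array_lock_release(array_lock_t *l, unsigned my_slot) {
          atomic_store(&l->slot[my_slot].go, 0);                    /* reset our slot   */
          atomic_store(&l->slot[(my_slot + 1) % MAX_PROCS].go, 1);  /* hand off to next */
      }

  Note that this formulation always sets the next slot on release, whether or not anyone is waiting there yet; a later arrival then finds its slot already set and enters without spinning, which sidesteps the "is someone waiting?" check mentioned above.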