Module 7: Synchronization
  Lecture 14: Scalable Locks and Barriers
 


Centralized Barrier

  • How fast is it?
    • Assume that the program is perfectly balanced and hence all processors arrive at the barrier at the same time
    • Latency is proportional to P due to the critical section that updates the arrival counter (assuming the lock algorithm itself exhibits at most O(P) latency)
    • The traffic of the acquire section (the CS) depends on the lock algorithm; after everyone has settled into the waiting loop, the last arriving processor generates a BusRdX when it writes the release flag, and each waiting processor then generates a BusRd to re-read the flag before leaving the barrier: O(P) traffic for the release (see the sketch after this list)
    • Scalability turns out to be low, partly due to the serialized critical section and partly due to the O(P) traffic of the release
    • No fairness in terms of which processor exits the barrier first
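
  The following is a minimal sketch of a sense-reversing centralized barrier using C11 atomics. A single atomic fetch-and-add stands in for the lock-protected counter update described above; all names (central_barrier_t, central_barrier_wait, etc.) are illustrative, not from a particular library:

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct {
        atomic_int  count;      /* how many processors have arrived */
        atomic_bool release;    /* global sense flag, flipped every episode */
        int         num_procs;
    } central_barrier_t;

    void central_barrier_init(central_barrier_t *b, int num_procs) {
        atomic_init(&b->count, 0);
        atomic_init(&b->release, false);
        b->num_procs = num_procs;
    }

    /* local_sense is per-processor state, initialized to false and
       flipped on every barrier episode */
    void central_barrier_wait(central_barrier_t *b, bool *local_sense) {
        *local_sense = !*local_sense;
        /* the counter update is the serialized "critical section":
           all P arrivals serialize here, so arrival latency is O(P) */
        if (atomic_fetch_add(&b->count, 1) == b->num_procs - 1) {
            /* last arriver: reset the counter for the next episode,
               then write the release flag -- on a bus this one write
               is the BusRdX that invalidates every waiter's copy */
            atomic_store(&b->count, 0);
            atomic_store(&b->release, *local_sense);
        } else {
            /* everyone else spins; each re-read after the invalidation
               is a BusRd, giving the O(P) release traffic noted above */
            while (atomic_load(&b->release) != *local_sense)
                ;   /* spin */
        }
    }

  The sense reversal lets the same barrier object be reused across consecutive barrier episodes without re-initialization.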

Tree Barrier

  • Does not need a lock, only uses flags
    • Arrange the processors logically in a binary tree (higher degree also possible)
    • Two siblings communicate arrival via a simple flag (i.e., one waits on the flag while the other sets it on arrival)
    • One of them moves up the tree to participate in the next level of the barrier
    • Introduces concurrency in the barrier algorithm since independent subtrees can proceed in parallel
    • Takes log2(P) steps to complete the acquire (arrival) phase
    • A fixed processor (the root) starts a downward release pass, waking up other processors, which in turn set further flags down the tree (see the sketch after this list)
    • Shows much better scalability than the centralized barrier on DSM multiprocessors; the advantage on small bus-based systems is not much, since all transactions are serialized on the bus anyway; in fact, the additional log(P) delay may hurt performance on bus-based SMPs
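
  Below is a minimal sketch of a flag-only binary tree barrier in C11 atomics, under these assumptions: P is a power of two, processor ids run from 0 to P-1, and processor 0 acts as the fixed root that starts the release pass. The names (tree_barrier, arrive, wakeup) and array bounds are illustrative:

    #include <stdatomic.h>

    #define MAX_PROCS  64
    #define MAX_LEVELS 6            /* log2(MAX_PROCS) */

    /* arrive[l][i]: set by processor i's sibling at level l on arrival */
    static atomic_int arrive[MAX_LEVELS][MAX_PROCS];
    /* wakeup[i]: set during the downward release pass to wake processor i */
    static atomic_int wakeup[MAX_PROCS];

    void tree_barrier(int pid, int P) {
        int level = 0;

        /* Upward (arrival) pass: at each level one sibling signals and
           drops out; the other waits and climbs. Takes log2(P) steps. */
        while ((1 << level) < P) {
            if (pid & (1 << level)) {
                /* "Right" sibling: tell my partner I have arrived,
                   then wait to be woken by the downward pass */
                atomic_store(&arrive[level][pid & ~(1 << level)], 1);
                while (atomic_load(&wakeup[pid]) == 0)
                    ;                            /* spin on my flag */
                atomic_store(&wakeup[pid], 0);   /* reset for reuse */
                break;
            }
            /* "Left" sibling: wait for my partner, then climb a level */
            while (atomic_load(&arrive[level][pid]) == 0)
                ;                                /* spin */
            atomic_store(&arrive[level][pid], 0);  /* reset for reuse */
            level++;
        }

        /* Downward (release) pass: processor 0 falls out at the top;
           each awakened processor releases the partners it left
           waiting below, so independent subtrees wake up in parallel */
        for (int l = level - 1; l >= 0; l--)
            atomic_store(&wakeup[pid | (1 << l)], 1);
    }

  Each flag is reset by the processor that consumes it, so the same barrier can be reused across episodes; both the arrival and release passes take log2(P) steps, and no processor ever competes for a lock.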