
Module 16: "Software Distributed Shared Memory Multiprocessors"

Lecture 36: "Software Distributed Shared Memory Multiprocessors"

Performance factors

Where does SDSM stand?
- HLRC and multiple-writer protocols do improve performance dramatically
- But SDSM still lags behind its hardware counterpart by a considerable margin
- The main bottlenecks are false sharing, the cost of protocol processing, and the time spent taking page faults (i.e., the interrupt overhead)
- As a result, SDSM is best suited to applications with coarse-grain sharing
- Also, synchronization does not scale well on SDSM because all primitives must be implemented with explicit messages

Suggestions
- Hardware support for diff processing in the memory controller (e.g., a page copy engine)?
- A dedicated hardware thread for protocol processing, with the capability to deliver interrupts from the protocol thread to the kernel (partitioned contexts?)

Arbitrary grain
- Why not let the user specify which variables (formally called “objects”) should be kept coherent?
- To each synchronization point, attach the “objects” (nothing to do with OOP) for which write notices must be propagated; this leads to “shared object space” programs
- If nothing is attached to a synchronization point, just fall back to release consistency
- The big advantage is that false sharing may disappear completely
- Disadvantages: the program must be analyzed carefully, and an efficient run-time library must intercept all synchronization events and manage the attached objects
- This is known as entry consistency
- The same philosophy has been applied to page-based SVM as well, leading to scope consistency

Implementing ERC

Single writer
- Simple scheme: maintain the sharer list at the owner and transfer it with ownership to the next writer; at release, send write notices to all sharers for all pages the writer has written to since its last release
- Problem 1: multiple invalidations to the same node
- Solution 1: maintain a directory entry per page and store the sharer list there; the releaser first consults the directory and then sends invalidations
- Problem 2: invalidating copies more recent than the releaser’s copy (not a correctness issue, just a performance problem)
- Solution 2: attach a version number to each copy and increment it on every write; a receiver applies an invalidation only if its version number is lower than the releaser’s. Is this better with a directory?

Single writer (continued)
- When to collect the invalidation acknowledgments?
- Conservative: wait for all acknowledgments immediately at release
- Observation: following the same argument as LRC, we can postpone collecting the acknowledgments until the next incoming acquire (the acquire will come to the last releaser because it probably holds the dirty page containing the synchronization variable)
- This optimization lets the releaser proceed past the release while the acknowledgments are collected in the background; again, without hardware support, collecting each acknowledgment may need an interrupt
- Under heavy contention, however, the next acquire may immediately follow the release, leaving little time to hide the collection

Multiple writers
- It does not make sense to talk about a sharer list unless the list is kept coherent across all writers (this may require broadcasting read-access faults to all owners)
- Two ways to communicate write notices: broadcast them at release, or use a directory to find the sharers
- How does a faulting processor obtain the diffs? Two solutions:
  - Use a home node, to which the releaser sends its diffs
  - Visit all “causal” releasers and apply their diffs in the appropriate order
- The order of diffs is very hard to decide; therefore, multiple-writer ERC systems without a home use updates instead of invalidations (diffs are sent at release and are not demand-based). Is the order okay now? It is non-deterministic if the program is not race-free
- An update-based multiple-writer ERC protocol is used in Munin
- What about version numbers? Not helpful
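For the arbitrary-grain scheme (entry consistency), the decision a run-time library makes at each release can be sketched as follows. This is a minimal illustration under stated assumptions, not a real SDSM library: `SyncPoint` and `write_notices_at_release` are hypothetical names, shared objects are plain strings, and only the bookkeeping (which write notices leave the releaser) is modeled, not the messaging.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPoint:
    """Hypothetical lock or barrier with the set of shared objects attached to it."""
    name: str
    attached: frozenset = frozenset()


def write_notices_at_release(sync, dirty):
    """Return the objects whose write notices must be propagated when `sync` is released.

    Entry consistency: only objects attached to the synchronization point are made
    coherent, so an unrelated object that happens to share a page (false sharing)
    is never invalidated. With nothing attached, fall back to release consistency
    and propagate everything written since the last release.
    """
    if not sync.attached:
        return set(dirty)
    return set(dirty) & sync.attached


lock_x = SyncPoint("lock_x", frozenset({"x"}))  # protects x only
barrier = SyncPoint("barrier")                  # nothing attached
dirty = {"x", "y"}                              # objects written since the last release
print(sorted(write_notices_at_release(lock_x, dirty)))   # ['x']
print(sorted(write_notices_at_release(barrier, dirty)))  # ['x', 'y']
```

The cost noted in the slides shows up here as the `attached` sets: someone has to analyze the program to fill them in correctly.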
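The two single-writer fixes (a per-page directory and per-copy version numbers) can be combined in a small simulation. It is a sketch under assumptions: `Node` and `release` are hypothetical, messages are direct method calls, and batching all of a node's notices into one message is one plausible way the directory avoids repeated invalidations to the same node, not necessarily the exact scheme intended in the lecture.

```python
from collections import defaultdict


class Node:
    """Hypothetical node caching copies of shared pages, each tagged with a version."""

    def __init__(self, nid, copies=None):
        self.nid = nid
        self.copies = dict(copies or {})  # page -> version of the local copy

    def apply_invalidations(self, notices):
        """`notices` maps page -> releaser's version; returns the pages actually dropped."""
        dropped = []
        for page, rel_version in notices.items():
            local = self.copies.get(page)
            # Solution 2: invalidate only if our copy is older than the releaser's.
            if local is not None and local < rel_version:
                del self.copies[page]
                dropped.append(page)
        return dropped


def release(releaser, written, directory, nodes):
    """Send write notices for `written` (page -> version held by the releaser).

    Solution 1: sharers come from the per-page directory entry. Notices for the
    same node are batched (an assumption), so each node gets at most one message.
    Returns the number of invalidation messages sent.
    """
    per_node = defaultdict(dict)
    for page, version in written.items():
        for sharer in directory.get(page, set()):
            if sharer != releaser:
                per_node[sharer][page] = version
    for nid, notices in per_node.items():
        for page in nodes[nid].apply_invalidations(notices):
            directory[page].discard(nid)  # that node no longer shares the page
    return len(per_node)


# Node 0 releases after writing pages 10 and 11; node 2 already holds a newer page 10.
nodes = {0: Node(0, {10: 3, 11: 5}), 1: Node(1, {10: 2, 11: 4}), 2: Node(2, {10: 4})}
directory = {10: {0, 1, 2}, 11: {0, 1}}
messages = release(0, {10: 3, 11: 5}, directory, nodes)
print(messages, nodes[1].copies, nodes[2].copies)  # 2 {} {10: 4}
```

Node 1 shares both pages but receives a single message, and node 2 keeps its newer copy of page 10 instead of refetching it.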
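Deferring acknowledgment collection from the release to the next incoming acquire can be sketched with a Python condition variable. `LazyAckReleaser` and its methods are hypothetical; `on_ack` stands in for the (possibly interrupt-driven) handler that receives each acknowledgment, and sending the invalidations themselves is omitted.

```python
import threading
from collections import Counter


class LazyAckReleaser:
    """Hypothetical releaser that defers collecting invalidation acks to the next acquire."""

    def __init__(self):
        self._cv = threading.Condition()
        self._pending = Counter()  # node id -> acknowledgments still outstanding

    def release(self, sharers):
        """Record that invalidations went to `sharers`; return without waiting for acks."""
        with self._cv:
            self._pending.update(sharers)
        # (Sending the invalidation messages is omitted in this sketch.)

    def on_ack(self, node):
        """Handle one acknowledgment from `node`."""
        with self._cv:
            self._pending[node] -= 1
            if self._pending[node] <= 0:
                del self._pending[node]
            if not self._pending:
                self._cv.notify_all()

    def on_acquire(self, timeout=None):
        """Before granting an incoming acquire, wait until every pending ack has arrived.

        Returns True if the acquire can be granted, False if `timeout` expired first.
        """
        with self._cv:
            return self._cv.wait_for(lambda: not self._pending, timeout)


r = LazyAckReleaser()
r.release([1, 2])  # returns immediately; acks from nodes 1 and 2 are still in flight
timers = [threading.Timer(0.05, r.on_ack, args=(n,)) for n in (1, 2)]
for t in timers:
    t.start()
granted = r.on_acquire(timeout=2.0)  # blocks until both acks arrive
for t in timers:
    t.join()
print(granted)  # True

busy = LazyAckReleaser()
busy.release([3])  # node 3 never acknowledges in this example
print(busy.on_acquire(timeout=0.05))  # False: the acquire is still waiting
```

Because `release` returns at once, the releaser keeps computing; the waiting cost is paid only if an acquire arrives before all acknowledgments have come in.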
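For the home-based multiple-writer option, each writer ships only the words it changed. The sketch below assumes the usual twin-and-compare way of producing diffs (a pristine copy is saved at the first write fault and compared with the working copy at release); `make_diff` and `apply_diffs` are hypothetical helpers, and a page is modeled as a list of words. The last lines show why the order in which diffs are applied matters only when the program has a data race.

```python
def make_diff(twin, page):
    """Return {word_index: new_value} for every word that differs from the twin."""
    return {i: new for i, (old, new) in enumerate(zip(twin, page)) if old != new}


def apply_diffs(base, diffs):
    """Return a copy of `base` with the diffs applied in the given order."""
    page = list(base)
    for diff in diffs:
        for i, value in diff.items():
            page[i] = value
    return page


home = [0] * 8                          # master copy of one page at its home node
twin1, twin2 = list(home), list(home)   # twins saved at each writer's first write fault
w1, w2 = list(home), list(home)         # each writer's working copy
w1[0], w1[1] = 11, 12                   # writer 1 updates words 0 and 1
w2[6] = 66                              # writer 2 updates word 6 of the same page
d1, d2 = make_diff(twin1, w1), make_diff(twin2, w2)
home = apply_diffs(home, [d1, d2])      # at release, both diffs are sent to the home
print(home)  # [11, 12, 0, 0, 0, 0, 66, 0]

# With a data race on word 3, the result depends on the order of the diffs.
racy = [{3: 1}, {3: 2}]
print(apply_diffs([0] * 8, racy)[3], apply_diffs([0] * 8, racy[::-1])[3])  # 2 1
```

The two falsely shared writers merge cleanly at the home node; only the racy pair produces an order-dependent result.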