Module 16: "Software Distributed Shared Memory Multiprocessors"
  Lecture 36: "Software Distributed Shared Memory Multiprocessors"
 

Performance factors

  • Where does SDSM stand?
    • HLRC and multiple-writer protocols do improve performance dramatically
    • But SDSM is still lagging behind its hardware counterpart by a considerable margin
    • The main bottlenecks are false sharing, the cost of protocol processing, and the time spent taking page faults (i.e., interrupt overhead)
    • As a result, SDSM is best suited to applications with coarse-grain sharing
    • Also, synchronization does not scale well on SDSM because all primitives must be implemented with explicit messages
    • Suggestions: hardware support for diff processing in the memory controller (e.g., a page copy engine)? A dedicated hardware thread for protocol processing, with the capability to deliver interrupts from the protocol thread to the kernel (partitioned contexts?)?
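The diff processing that such hardware would offload can be illustrated with the twin/diff technique used by multiple-writer protocols: on the first write fault the protocol keeps an unmodified copy of the page (a twin), and at release it compares the page against the twin and ships only the changed runs. Below is a minimal Python sketch; the page size, the function names, and the (offset, bytes) run encoding are illustrative assumptions, not the format of any particular system. The byte-by-byte comparison loop is the kind of per-page work a page copy engine in the memory controller could take over.

```python
from typing import List, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes

Diff = List[Tuple[int, bytes]]  # hypothetical encoding: (offset, new bytes) runs


def make_twin(page: bytearray) -> bytes:
    # On the first write fault in an interval, keep an unmodified copy (the twin).
    return bytes(page)


def compute_diff(twin: bytes, page: bytearray) -> Diff:
    # At release, compare the page with its twin and record each changed run.
    diff: Diff = []
    i, n = 0, len(page)
    while i < n:
        if page[i] != twin[i]:
            start = i
            while i < n and page[i] != twin[i]:
                i += 1
            diff.append((start, bytes(page[start:i])))
        else:
            i += 1
    return diff


def apply_diff(page: bytearray, diff: Diff) -> None:
    # The receiver (e.g., a home node) patches only the changed bytes.
    for offset, data in diff:
        page[offset:offset + len(data)] = data


page = bytearray(PAGE_SIZE)
twin = make_twin(page)
page[10:13] = b"abc"
page[100] = 7
diff = compute_diff(twin, page)
print(diff)  # [(10, b'abc'), (100, b'\x07')]

home_copy = bytearray(PAGE_SIZE)
apply_diff(home_copy, diff)
assert home_copy == page
```

Only two short runs cross the network instead of the whole 4 KB page, but producing them still costs a full scan of the page in software.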

Arbitrary grain

  • Why not let the user specify which variables (formally called “objects”) should be kept coherent?
    • To each synchronization point attach the “objects” (nothing to do with OOP) for which write notices must be propagated (leads to “shared object space programs”)
    • If nothing is attached to a synchronization point just fall back to release consistency
    • The big advantage is that false sharing may disappear completely
    • Disadvantages: the program needs careful analysis, and an efficient run-time library must intercept all synchronization events and manage the attached objects
    • This is known as entry consistency
    • The same philosophy has also been applied to page-based SVM, leading to scope consistency
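The binding rule above (propagate write notices only for the objects attached to the synchronization variable being released, and fall back to release consistency when nothing is attached) can be sketched as a tiny run-time library in Python. The class and method names are hypothetical, locks and objects are plain strings, and notices are returned rather than sent; this is a sketch of the idea under those assumptions, not the interface of any real entry-consistency system.

```python
from typing import Dict, Set


class EntryConsistencyRuntime:
    """Hypothetical run-time library that intercepts every release."""

    def __init__(self) -> None:
        self.bindings: Dict[str, Set[str]] = {}  # lock -> objects attached to it
        self.dirty: Set[str] = set()             # objects written and not yet propagated

    def bind(self, lock: str, *objects: str) -> None:
        # The programmer attaches objects to a synchronization variable.
        self.bindings.setdefault(lock, set()).update(objects)

    def write(self, obj: str) -> None:
        self.dirty.add(obj)

    def release(self, lock: str) -> Set[str]:
        # Return the objects whose write notices are propagated at this release.
        bound = self.bindings.get(lock)
        if bound:
            notices = self.dirty & bound   # entry consistency: only the bound objects
        else:
            notices = set(self.dirty)      # nothing attached: fall back to release consistency
        self.dirty -= notices
        return notices


rt = EntryConsistencyRuntime()
rt.bind("L1", "x", "y")
rt.write("x")
rt.write("z")             # z may share a page with x, but it is not bound to L1
print(rt.release("L1"))   # {'x'}
print(rt.release("L2"))   # {'z'}: L2 has no binding, so all remaining dirty objects go
```

Because x and z are tracked as separate objects, releasing L1 says nothing about z even if both live on the same page, which is why false sharing can disappear under this model.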

Implementing ERC

  • Single writer
    • Simple scheme: maintain the sharer list at the owner and transfer it with ownership to the next writer; at release, send write notices to all sharers for every page the writer has written since its last release
    • Problem 1: multiple invalidations to the same node
    • Solution 1: maintain a directory entry per page and store the sharer list there; the releaser first consults the directory and then sends invalidations
    • Problem 2: invalidating copies more recent than the releaser’s copy (not a correctness issue, just a performance problem)
    • Solution 2: attach a version number to each copy and increment it on every write; a receiver applies an invalidation only if its version number is lower than the releaser’s. Is this better with a directory?
    • When to collect the invalidation acknowledgments?
    • Conservative: wait for all acknowledgments immediately at release
    • Observation: following the same argument as LRC, we can defer collecting the acknowledgments until the next incoming acquire (the acquire will come to the last releaser because it probably has the dirty page containing the synchronization variable)
    • This optimization lets the releaser proceed past the release while the acknowledgments are collected in the background; again, without hardware support, collecting each acknowledgment may require an interrupt
    • Under heavy contention, however, the next acquire may immediately follow the release, leaving little time to overlap acknowledgment collection
  • Multiple writers
    • It does not make sense to talk about a sharing list unless the list is kept coherent across all writers (this may require broadcasting read access faults to all owners)
    • Two ways to communicate write notices: broadcast write notices at release or use a directory to find sharers
    • How does a faulting processor obtain the diffs?
    • Two solutions: either each releaser sends its diffs to a home node, or the faulting processor visits all “causal” releasers and applies their diffs in the appropriate order. Without a home, that order is very hard to determine, so multiple-writer ERC systems use updates instead of invalidations (diffs are sent at release rather than fetched on demand). Is the order okay now? It is non-deterministic if the program is not race-free
    • An update-based multiple-writer ERC protocol is used in Munin
    • What about version numbers? Not helpful
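The single-writer mechanics above (a per-page directory, one batched message per destination node, version-number filtering at the receiver, and acknowledgments deferred until the next incoming acquire) can be sketched as a small single-process simulation in Python. Messages are modeled as plain function calls, ownership transfer is reduced to the next writer reading the releaser's copy, and the names `Directory`, `Node`, `deliver`, and `can_grant_acquire` are hypothetical; this illustrates the ideas under those assumptions and is not Munin's or any other system's implementation.

```python
from collections import defaultdict
from typing import Dict, Set

Notices = Dict[int, int]  # page -> releaser's version number


class Directory:
    """Hypothetical per-page directory: page -> ids of nodes holding a copy."""

    def __init__(self) -> None:
        self.sharers: Dict[int, Set[int]] = defaultdict(set)


class Node:
    def __init__(self, nid: int) -> None:
        self.nid = nid
        self.versions: Dict[int, int] = {}   # page -> version of the local copy
        self.written: Set[int] = set()       # pages written since the last release
        self.pending_acks: Set[int] = set()  # nodes whose acks are still outstanding

    def read(self, page: int, version: int, directory: Directory) -> None:
        # Fetch a copy at the given version and register as a sharer.
        self.versions[page] = version
        directory.sharers[page].add(self.nid)

    def write(self, page: int, directory: Directory) -> None:
        # Single writer: bump the version of the local copy on every write.
        self.versions[page] = self.versions.get(page, 0) + 1
        self.written.add(page)
        directory.sharers[page].add(self.nid)

    def release(self, directory: Directory) -> Dict[int, Notices]:
        # Consult the directory and batch all notices for a node into one message.
        batches: Dict[int, Notices] = defaultdict(dict)
        for page in self.written:
            for nid in directory.sharers[page]:
                if nid != self.nid:
                    batches[nid][page] = self.versions[page]
        self.written.clear()
        # Do not wait for acknowledgments here; they are collected in the background.
        self.pending_acks = set(batches)
        return dict(batches)

    def on_invalidate(self, notices: Notices, directory: Directory) -> None:
        # Drop only copies strictly older than the releaser's version.
        for page, version in notices.items():
            local = self.versions.get(page)
            if local is not None and local < version:
                del self.versions[page]
                directory.sharers[page].discard(self.nid)

    def can_grant_acquire(self) -> bool:
        # The next incoming acquire is granted only after every acknowledgment is in.
        return not self.pending_acks


def deliver(releaser: Node, batches: Dict[int, Notices],
            nodes: Dict[int, Node], directory: Directory) -> None:
    # Deliver each batched message and record the acknowledgment it triggers.
    for nid, notices in batches.items():
        nodes[nid].on_invalidate(notices, directory)
        releaser.pending_acks.discard(nid)


directory = Directory()
nodes = {i: Node(i) for i in range(3)}
n0, n1, n2 = nodes[0], nodes[1], nodes[2]
n1.read(5, 0, directory)
n1.read(6, 0, directory)
n2.read(5, 0, directory)
n0.write(5, directory)
n0.write(6, directory)
batches = n0.release(directory)          # node 1 gets one message for pages 5 and 6
n2.read(5, n0.versions[5], directory)    # node 2 becomes the next writer of page 5
n2.write(5, directory)                   # ... and its copy moves to version 2
print(n0.can_grant_acquire())            # False: acknowledgments still outstanding
deliver(n0, batches, nodes, directory)
print(n1.versions, n2.versions, n0.can_grant_acquire())  # {} {5: 2} True
```

In this run node 1 receives a single message covering both pages, and node 2, which became the next writer of page 5 before the notice arrived, keeps its newer copy because its version (2) is not lower than the releaser's (1).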