Module 16: "Software Distributed Shared Memory Multiprocessors"
  Lecture 36: "Software Distributed Shared Memory Multiprocessors"
 

Why SDSM?

  • Hardware DSM is hard to design
    • Requires a tightly integrated communication assist (CA) and network interface (NI)
    • The CA should probably be custom designed for performance
    • Expensive in terms of time to market and the amount of custom design in the memory system
    • But still want to retain shared memory programming
  • Software DSM
    • Provides shared virtual memory (SVM) on top of message passing
    • Take commodity nodes, connect them over a commodity high-speed network, augment the commodity OS with an SVM kernel, and port your shared memory programs to SVM
    • Coherence granularity is a page

SVM for dummies

  • Embed a coherence protocol in the page fault handler
    • On a page fault, figure out whether the page is mapped on some other node
    • If yes, fetch a copy of the page, map it into a free local page frame, and return from the interrupt
    • If no, swap it in from disk and map it as usual
    • If the page fault was generated by a load, set only read permission in the PTE; a subsequent write will generate another access fault, and the handler then invalidates all other copies in the system
    • Different nodes may map the same virtual page to different local physical frames; thus the sharing really happens in the virtual address space, and the physical address space of each node is private
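The fault-handling steps above can be sketched as a small simulation. This is only a toy model under stated assumptions, not the protocol of any real SVM system: the class `ToySVM`, its methods, and the log tuples are hypothetical names, a "page" holds a single value, pages are never evicted, and a global scan stands in for whatever directory mechanism locates remote copies.

```python
from enum import Enum


class Perm(Enum):
    READ = 1
    WRITE = 2


class ToySVM:
    """Toy model of page-based coherence driven by page faults (hypothetical)."""

    def __init__(self, num_nodes, disk=None):
        # One private page table per node: vpage -> [permission, data]
        self.page_table = [dict() for _ in range(num_nodes)]
        self.disk = dict(disk or {})  # backing store: vpage -> data
        self.log = []                 # protocol actions, for inspection

    def holders(self, vpage):
        """Nodes that currently have the virtual page mapped."""
        return [n for n, pt in enumerate(self.page_table) if vpage in pt]

    def fault(self, node, vpage, is_write):
        """The 'page fault handler' with the coherence protocol embedded."""
        others = [n for n in self.holders(vpage) if n != node]
        local = self.page_table[node]
        if vpage not in local:
            if others:
                # Page is mapped elsewhere: fetch a copy over the network.
                data = self.page_table[others[0]][vpage][1]
                self.log.append(("fetch_remote", node, vpage))
            else:
                # Not mapped anywhere: ordinary swap-in from disk.
                data = self.disk.get(vpage, 0)
                self.log.append(("swap_in", node, vpage))
            local[vpage] = [Perm.READ, data]
        if is_write:
            # Write fault: invalidate every other copy, then grant write access.
            for n in others:
                del self.page_table[n][vpage]
                self.log.append(("invalidate", n, vpage))
            local[vpage][0] = Perm.WRITE
        else:
            # Read fault: any previous writer is downgraded to read-only.
            for n in others:
                self.page_table[n][vpage][0] = Perm.READ

    def load(self, node, vpage):
        if vpage not in self.page_table[node]:
            self.fault(node, vpage, is_write=False)
        return self.page_table[node][vpage][1]

    def store(self, node, vpage, value):
        entry = self.page_table[node].get(vpage)
        if entry is None or entry[0] is not Perm.WRITE:
            self.fault(node, vpage, is_write=True)
        self.page_table[node][vpage][1] = value
```

For example, with `svm = ToySVM(2, disk={7: 42})`, a load by node 0 swaps the page in from disk, a load by node 1 fetches it remotely, and a later `svm.store(1, 7, 99)` takes a second fault that invalidates node 0's copy before the write proceeds.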

SVM overheads

  • Performance factors
    • Every protocol invocation requires an interrupt and context switch
    • Messages are sent through message passing libraries rather than through a specialized NI
    • The entire protocol runs in software; there is no hardware support
    • Requests from remote nodes also interrupt local processes, and the protocol processing they trigger pollutes local caches
    • The granularity of coherence (a page) is too coarse; it causes unnecessary communication and false sharing
    • This last point was the major problem when such systems took off; attempts to limit false sharing and communication volume led to numerous innovations in SDSM coherence protocols
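The effect of coherence granularity on false sharing can be illustrated with a short sketch. The function name `count_transfers`, the 4096-byte page size, the 64-byte cache block size, and the addresses are all assumptions chosen for illustration; the model simply counts how often write ownership of a coherence unit moves between nodes.

```python
def count_transfers(writes, granularity):
    """Count ownership transfers for a write trace.

    writes: list of (node, byte_address) pairs, in program order.
    granularity: size in bytes of the coherence unit (page or cache block).
    """
    owner = {}      # coherence unit -> node that wrote it last
    transfers = 0
    for node, addr in writes:
        unit = addr // granularity
        if unit in owner and owner[unit] != node:
            transfers += 1  # the unit must move: invalidation plus re-fetch
        owner[unit] = node
    return transfers


# Two nodes repeatedly write two *different* variables 128 bytes apart.
trace = [(0, 0), (1, 128)] * 4

page_transfers = count_transfers(trace, 4096)   # both variables share one page
block_transfers = count_transfers(trace, 64)    # variables in separate blocks
```

In this trace the two nodes never touch the same data, yet at page granularity every write after the first forces the page to change hands (7 transfers for 8 writes), while at 64-byte granularity there are none.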

Use of RC

  • A good place to make use of relaxed models
    • With SC there is no choice but to invalidate all sharers and wait for all acknowledgments on every write to a page; the invalidated readers may immediately bring the page back, and performance degrades sharply
    • SDSM systems invariably advertise RC or WO or some other relaxed model, but not SC
    • Under WO, since all accesses between synchronization points can be re-ordered arbitrarily, the writer can hold back all write notices (i.e. invalidations) until the next synchronization point
    • Under RC, the write notices need to be sent only at release boundaries
    • Note how different this use of RC is from hardware DSM: there, RC is used to hide write latency and invalidations are sent immediately; here, RC is used to limit communication (close to delayed consistency)
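The saving from holding back write notices until a release can be made concrete with a rough count of invalidation messages. This is a sketch under simplifying assumptions: the function names and trace format are hypothetical, every page has the same fixed number of sharers, and the SC case takes the worst case mentioned above, in which the readers re-fetch the page after every invalidation.

```python
def invalidations_sc(trace, num_sharers):
    """SC-style: every write invalidates all sharers (worst case: they re-fetch)."""
    return sum(num_sharers for op, _ in trace if op == "write")


def invalidations_rc(trace, num_sharers):
    """RC-style: buffer write notices and send them only at a release."""
    dirty = set()   # pages written since the last release
    sent = 0
    for op, page in trace:
        if op == "write":
            dirty.add(page)                 # just remember the page; send nothing
        elif op == "release":
            sent += len(dirty) * num_sharers  # one notice per dirty page per sharer
            dirty.clear()
    return sent


# 10 writes to page 0 and 5 writes to page 1 inside one critical section.
trace = [("write", 0)] * 10 + [("write", 1)] * 5 + [("release", None)]

sc_msgs = invalidations_sc(trace, num_sharers=3)
rc_msgs = invalidations_rc(trace, num_sharers=3)
```

With three sharers, the SC-style count is 45 invalidations (one round per write), while the RC-style count is 6 (one notice per dirty page per sharer at the release), which is the communication reduction the slide describes.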