Module 14: "Directory-based Cache Coherence"
  Lecture 29: "Basics of Directory"
 

Basics of directory

  • Theoretically speaking, each directory entry should have a dirty bit and a bitvector of length P (one bit per processor)
    • On a read from processor k: if the dirty bit is off, read the cache line from memory, send it to k, and set bit[k] in the vector; if the dirty bit is on, read the owner id from the vector (a different interpretation of the bitvector) and send a read intervention to the owner; the owner replies with the line directly to k (how?) and sends a copy to home, home updates memory, and the directory controller clears the dirty bit and sets the vector to hold only bit[k] and bit[owner]
    • On a write from processor k: if the dirty bit is off, send invalidations to all sharers marked in the vector, wait for acknowledgments, read the cache line from memory, send it to k, zero out the vector, write k in the vector, and set the dirty bit; if the dirty bit is on, proceed as for a read, except that the intervention is of readX type and memory does not write the line back; the dirty bit stays set and vector = k
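The read and write handling above can be sketched as a small Python model of a single home directory. This is a minimal, hypothetical sketch: the names `DirEntry`, `Directory`, and the message labels (`read_intervention`, `sharing_wb`, and so on) are illustrative assumptions, and acknowledgments are treated as arriving immediately rather than being timed. It keeps the dual interpretation of the vector: a sharer bitmask when the line is clean, the owner id when it is dirty.

```python
from dataclasses import dataclass


@dataclass
class DirEntry:
    # Hypothetical entry layout: a dirty bit plus a P-bit vector.
    # dirty == False: `vector` is a bitmask of sharers (bit p set => p has a copy).
    # dirty == True:  `vector` is reinterpreted as the owner's processor id.
    dirty: bool = False
    vector: int = 0


class Directory:
    """Toy home-node directory; messages are only logged, not timed."""

    def __init__(self, num_procs: int):
        self.P = num_procs
        self.entries = {}  # line address -> DirEntry
        self.log = []      # (message, source, destination) tuples

    def _entry(self, line: int) -> DirEntry:
        return self.entries.setdefault(line, DirEntry())

    def read(self, line: int, k: int) -> None:
        e = self._entry(line)
        if not e.dirty:
            # Memory is up to date: reply from home and record k as a sharer.
            self.log.append(("data", "home", k))
            e.vector |= 1 << k
        else:
            owner = e.vector
            self.log.append(("read_intervention", "home", owner))
            self.log.append(("data", owner, k))             # owner replies directly to k
            self.log.append(("sharing_wb", owner, "home"))  # copy to home; memory updated
            e.dirty = False
            e.vector = (1 << k) | (1 << owner)              # both now share a clean copy

    def write(self, line: int, k: int) -> None:
        e = self._entry(line)
        if not e.dirty:
            # Invalidate every marked sharer other than k and collect the acks
            # (assumed here to arrive before the directory continues).
            sharers = [p for p in range(self.P) if (e.vector >> p) & 1 and p != k]
            for p in sharers:
                self.log.append(("inval", "home", p))
            for p in sharers:
                self.log.append(("ack", p, "home"))
            self.log.append(("data", "home", k))
        else:
            owner = e.vector
            self.log.append(("readx_intervention", "home", owner))
            self.log.append(("data", owner, k))  # no writeback to memory
        # k becomes the exclusive, dirty owner.
        e.dirty = True
        e.vector = k


# Usage: two readers, then a writer, then a reader of the dirty line.
d = Directory(num_procs=4)
d.read(0x40, 1)
d.read(0x40, 2)
d.write(0x40, 3)   # invalidates processors 1 and 2
d.read(0x40, 0)    # intervention to owner 3
entry = d.entries[0x40]
print(entry.dirty, bin(entry.vector))  # prints: False 0b1001
```

Reusing the same field for the sharer mask and the owner id mirrors the "different interpretation of bitvector" in the notes; a real controller would also have to handle transient states and races, which this sketch ignores.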

Directory organization

  • Centralized vs. distributed
    • A centralized directory helps resolve many races, but it becomes a bandwidth bottleneck
    • One solution is a banked directory structure: associate a directory bank with each memory bank
    • But since memory is distributed, this essentially leads to a distributed directory structure, i.e., each node holds the directory entries for the memory lines that reside in its own memory
    • Why did we decide to have a distributed memory organization instead of a dance hall?
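With a distributed directory, every miss must first be routed to the home node of the line, whose directory bank holds the entry. The following is a minimal sketch assuming two common address-to-home mappings (cache-line interleaving and one contiguous chunk of physical memory per node); the constants `LINE_SIZE` and `NUM_NODES` and the functions `home_interleaved`, `home_contiguous`, and `lookup` are hypothetical, and a real machine fixes its own mapping.

```python
# Assumed parameters, chosen only for illustration.
LINE_SIZE = 64    # bytes per cache line
NUM_NODES = 16    # nodes in the machine


def home_interleaved(addr: int) -> int:
    """Consecutive cache lines are assigned to nodes round-robin."""
    return (addr // LINE_SIZE) % NUM_NODES


def home_contiguous(addr: int, bytes_per_node: int) -> int:
    """Each node holds one contiguous chunk of physical memory."""
    return addr // bytes_per_node


class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id
        # Directory bank: entries only for lines whose memory lives on this node.
        self.directory = {}  # line address -> [dirty, vector]


nodes = [Node(i) for i in range(NUM_NODES)]


def lookup(addr: int):
    """Route a miss to its home node and return (home id, directory entry)."""
    line = addr - addr % LINE_SIZE
    home = nodes[home_interleaved(addr)]
    return home.node_id, home.directory.setdefault(line, [False, 0])


print(home_interleaved(0x1000), home_interleaved(0x1040))  # consecutive lines: 0 1
print(home_contiguous(3 << 30, bytes_per_node=1 << 30))     # 3
```

Line interleaving spreads directory traffic evenly across nodes, while contiguous chunks make it easier for the OS to place a thread's data in its local memory; either way, each node's directory bank stays proportional to the memory it holds.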

Is a directory useful?

  • One drawback of a directory
    • Before looking up the directory, you cannot decide what to do (even if you start reading memory speculatively)
    • So a directory introduces one level of indirection into every request that misses in the processor’s cache hierarchy
    • Therefore, broadcast is definitely preferable to a directory if the system can offer enough memory controller and router bandwidth to handle broadcast messages (network link bandwidth is normally not the bottleneck, since most messages do not carry data; observe that you would never broadcast a reply); AMD Opteron adopted this scheme, but its target is small-scale systems
  • A directory is preferable
    • When the number of sharers is small, because in this case a broadcast would waste an enormous amount of memory controller bandwidth
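The bandwidth argument can be made concrete with a back-of-the-envelope count of how many memory controllers must process a single miss. This is a deliberately simplified, assumed cost model: it counts only controller lookups, and it ignores the extra indirection latency of the directory and the size of each message. The function names are hypothetical.

```python
def broadcast_lookups(num_nodes: int) -> int:
    # Every node's memory controller must examine each broadcast request.
    return num_nodes


def directory_lookups(num_sharers: int) -> int:
    # One lookup at the home node, plus one intervention or invalidation
    # handled by each sharer (or by the single owner).
    return 1 + num_sharers


for P in (4, 64):
    for sharers in (0, 1, 3):
        b = broadcast_lookups(P)
        d = directory_lookups(sharers)
        print(f"P={P:2d} sharers={sharers}: broadcast={b:2d} directory={d}")
```

Under this model, a 4-node system pays little extra for broadcast, which suits a small-scale design, whereas at 64 nodes every miss costs 64 controller lookups under broadcast but only 1 to 4 with a directory when there are at most three sharers.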