Module 8: Memory Consistency Models and Case Studies of Multi-core
  Lecture 15: Memory Consistency Models and Case Studies of Multi-core
 


Power Consumption?

  • Hey, didn't I just make my power consumption roughly N-fold by putting N cores on the die?
    • Yes, if you do not scale down voltage or frequency
    • Usually CMPs are clocked at a lower frequency
      • Oops! My games run slower!
    • Voltage scaling happens due to smaller process technology
    • Overall, roughly cubic dependence of power on voltage or frequency (dynamic power goes as C·V²·f, and attainable f scales roughly with V)
    • Need to talk about different metrics
      • Performance/Watt (same as reciprocal of energy per unit of work)
      • More general, Performance^(k+1)/Watt (k > 0)
    • Need smarter techniques to further improve these metrics
      • Online voltage/frequency scaling
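
The power arithmetic above can be sketched numerically. This is only an illustration under simplifying assumptions that are not in the slides: dynamic power is modeled as C·V²·f in normalized units, frequency is assumed to scale linearly with voltage, leakage is ignored, and the workload is assumed perfectly parallel. The function names `dynamic_power` and `cmp_metrics` are hypothetical.

```python
def dynamic_power(voltage, frequency, capacitance=1.0):
    """Dynamic power P = C * V^2 * f, in normalized units (leakage ignored)."""
    return capacitance * voltage ** 2 * frequency


def cmp_metrics(n_cores, scale, k=1):
    """Metrics for n_cores cores whose V and f are both multiplied by `scale`,
    relative to one baseline core running at V = f = 1.

    Assumption: the work is perfectly parallel, so throughput = n_cores * scale.
    """
    power = n_cores * dynamic_power(scale, scale)  # cubic in `scale`
    perf = n_cores * scale                         # aggregate throughput
    return {
        "power": power,
        "perf": perf,
        "single_thread_perf": scale,               # what a one-thread game sees
        "perf_per_watt": perf / power,
        "perf_k1_per_watt": perf ** (k + 1) / power,
    }


baseline = cmp_metrics(1, 1.0)   # one core, full speed
naive = cmp_metrics(4, 1.0)      # four cores, no scaling: 4x the power
scaled = cmp_metrics(4, 0.5)     # four cores at half voltage and frequency

for name, m in (("baseline", baseline), ("naive", naive), ("scaled", scaled)):
    print(name, m)
```

Under these assumptions the four-core chip at half voltage and frequency draws half the power of the single baseline core while delivering twice the aggregate throughput, but each individual thread runs at half speed, which is the "my games run slower" effect.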

ABCs of CMP

  • Where to put the interconnect?
    • Do not want to access the interconnect too frequently because these wires are slow
    • It probably does not make much sense to share the L1 cache among the cores: this requires very high bandwidth and may necessitate a redesign of the L1 cache and the surrounding load/store unit, which we do not want to do. So settle for private L1 caches, one per core
    • Makes more sense to share the L2 or L3 caches
    • Need a coherence protocol at the L2 interface to keep the private L1 caches coherent: may use a high-speed, custom-designed snoopy bus connecting the L1 controllers, or may use a simple directory protocol
    • An entirely different design choice is not to share the cache hierarchy at all (dual-core AMD and Intel): this rids you of the on-chip coherence protocol, but yields no gain in communication latency
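
To make the "private L1, shared L2, simple directory protocol" option concrete, here is a toy sketch. It is not the protocol of any real chip: it assumes write-through L1 caches, invalidation on every write, and a directory that only records which cores hold a line. The class name `SharedL2Directory` and its fields are hypothetical.

```python
class SharedL2Directory:
    """Toy model: private per-core L1s kept coherent by a directory at the shared L2.

    Simplifying assumptions: write-through L1s, invalidate-on-write,
    unbounded cache capacity, and no intermediate coherence states.
    """

    def __init__(self, n_cores):
        self.l1 = [dict() for _ in range(n_cores)]  # private L1s: addr -> value
        self.l2 = {}                                # shared L2: addr -> value
        self.sharers = {}                           # directory: addr -> set of core ids
        self.l2_accesses = 0                        # trips over the interconnect

    def read(self, core, addr):
        if addr in self.l1[core]:                   # L1 hit: no interconnect traffic
            return self.l1[core][addr]
        self.l2_accesses += 1                       # L1 miss: go to the shared L2
        value = self.l2.get(addr, 0)
        self.l1[core][addr] = value
        self.sharers.setdefault(addr, set()).add(core)
        return value

    def write(self, core, addr, value):
        self.l2_accesses += 1                       # write-through always reaches L2
        # Invalidate every other L1 copy recorded in the directory.
        for other in self.sharers.get(addr, set()) - {core}:
            self.l1[other].pop(addr, None)
        self.sharers[addr] = {core}
        self.l1[core][addr] = value
        self.l2[addr] = value


d = SharedL2Directory(n_cores=2)
d.write(0, 0x40, 7)
first = d.read(1, 0x40)    # core 1 misses in its L1 and fetches 7 from L2
d.write(0, 0x40, 9)        # directory invalidates core 1's stale copy
second = d.read(1, 0x40)   # core 1 misses again and sees the new value
third = d.read(0, 0x40)    # core 0 hits in its own L1
print(first, second, third, d.l2_accesses)
```

The counter `l2_accesses` stands in for interconnect traffic: L1 hits stay local, which is the reason for keeping the L1 caches private and placing the interconnect below them.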