Power Consumption?
- Hey, didn't I just make my power consumption roughly N-fold by putting N cores on the die?
- Yes, if you do not scale down voltage or frequency
- Usually CMPs are clocked at a lower frequency
- Oops! My games run slower!
- Voltage scaling comes from moving to a smaller process technology
- Overall, power depends roughly cubically on voltage or frequency: dynamic power P ∝ C·V²·f, and f scales roughly with V
- Need to talk about different metrics
- Performance/Watt (the reciprocal of energy per unit of work)
- More generally, Performance^(k+1)/Watt (k > 0)
- Need smarter techniques to further improve these metrics, e.g., online voltage/frequency scaling
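The power and metric arithmetic above can be made concrete with a small model. This is a minimal sketch under stated assumptions, not a measured characterization: it assumes only dynamic power matters (P ∝ C·V²·f), that voltage scales linearly with frequency (giving the cubic dependence), and a hypothetical workload that is either perfectly parallel or single-threaded. The names `ChipConfig`, `relative_power`, `relative_performance` and `perf_k_per_watt` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ChipConfig:
    cores: int
    freq_scale: float  # clock (and, by assumption, voltage) relative to the 1-core baseline


def relative_power(cfg: ChipConfig) -> float:
    # Dynamic power per core ~ C * V^2 * f. Assuming V scales linearly with f,
    # per-core power ~ f^3, and N identical cores draw N times that.
    return cfg.cores * cfg.freq_scale ** 3


def relative_performance(cfg: ChipConfig, parallel: bool = True) -> float:
    # Hypothetical workload model: a perfectly parallel program uses every core,
    # while a single-threaded one (e.g. a game) only sees one core's clock.
    return cfg.cores * cfg.freq_scale if parallel else cfg.freq_scale


def perf_k_per_watt(cfg: ChipConfig, k: int = 0, parallel: bool = True) -> float:
    # k = 0 gives Performance/Watt; larger k weights performance more heavily.
    perf = relative_performance(cfg, parallel)
    return perf ** (k + 1) / relative_power(cfg)


if __name__ == "__main__":
    for label, cfg in [
        ("1 core, full clock", ChipConfig(1, 1.0)),
        ("4 cores, full clock", ChipConfig(4, 1.0)),
        ("4 cores, half clock", ChipConfig(4, 0.5)),
    ]:
        print(
            f"{label}: power={relative_power(cfg):.3f} "
            f"perf/W={perf_k_per_watt(cfg):.3f} "
            f"perf^2/W={perf_k_per_watt(cfg, k=1):.3f} "
            f"single-thread perf={relative_performance(cfg, parallel=False):.3f}"
        )
```

Under these assumptions, four cores at full clock draw 4× the baseline power, as the first bullet warns. Halving the clock cuts the four-core chip to 4 × 0.5³ = 0.5 of the baseline power and raises Performance/Watt fourfold on a perfectly parallel workload, while a single-threaded program runs at half speed.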
ABCs of CMP
- Where to put the interconnect?
- Do not want to access the interconnect too frequently because these wires are slow
- It probably does not make much sense to share the L1 cache among the cores: it requires very high bandwidth and may necessitate a redesign of the L1 cache and the surrounding load/store unit, which we want to avoid; so settle for private L1 caches, one per core
- Makes more sense to share the L2 or L3 caches
- Need a coherence protocol at L2 interface to keep private L1 caches coherent: may use a high-speed custom designed snoopy bus connecting the L1 controllers or may use a simple directory protocol
- An entirely different design choice is not to share the cache hierarchy at all (as in the dual-core AMD and Intel processors): this removes the need for an on-chip coherence protocol, but gives no gain in inter-core communication latency
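The "simple directory protocol" option for keeping the private L1 caches coherent can be sketched as follows. This is a hypothetical, minimal illustration assuming an MSI protocol whose directory sits alongside the shared L2: it tracks only which L1s hold each block and in which state, and ignores data, timing and races. A snoopy bus would instead broadcast each miss to all L1 controllers. The class and method names (`Directory`, `read`, `write`) are illustrative, not taken from any real design.

```python
from enum import Enum, auto


class L1State(Enum):
    I = auto()  # invalid: no copy in this L1
    S = auto()  # shared: clean, read-only copy
    M = auto()  # modified: exclusive, possibly dirty copy


class Directory:
    """Hypothetical MSI directory kept alongside the shared L2."""

    def __init__(self, num_cores: int) -> None:
        self.sharers: dict[int, set[int]] = {}  # block -> cores holding a copy
        self.owner: dict[int, int] = {}         # block -> core holding it in M
        # Per-core L1 state only; a real L1 would also hold tags and data.
        self.l1: list[dict[int, L1State]] = [dict() for _ in range(num_cores)]
        self.messages: list[str] = []           # log of coherence traffic

    def state(self, core: int, block: int) -> L1State:
        return self.l1[core].get(block, L1State.I)

    def read(self, core: int, block: int) -> None:
        if self.state(core, block) is not L1State.I:
            return  # L1 hit: no interconnect traffic
        owner = self.owner.pop(block, None)
        if owner is not None:
            # Dirty copy elsewhere: owner writes back to L2 and downgrades to S.
            self.messages.append(f"downgrade core{owner} block{block}")
            self.l1[owner][block] = L1State.S
        self.sharers.setdefault(block, set()).add(core)
        self.l1[core][block] = L1State.S

    def write(self, core: int, block: int) -> None:
        if self.state(core, block) is L1State.M:
            return  # already exclusive: no traffic
        for other in self.sharers.get(block, set()) - {core}:
            # Invalidate every other private copy (a dirty owner also writes back).
            self.messages.append(f"invalidate core{other} block{block}")
            self.l1[other][block] = L1State.I
        self.sharers[block] = {core}
        self.owner[block] = core
        self.l1[core][block] = L1State.M


if __name__ == "__main__":
    d = Directory(num_cores=3)
    d.read(0, 7)
    d.read(1, 7)   # cores 0 and 1 share block 7
    d.write(2, 7)  # core 2 writes: both sharers are invalidated
    d.read(0, 7)   # core 0 re-reads: core 2 is downgraded to S
    print(d.messages)
```

Only misses and upgrades generate interconnect messages here, which matches the goal of touching the slow on-chip wires as rarely as possible; L1 hits are resolved locally.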