|
Moore’s law
- The number of transistors on a die doubles every 18-24 months
- Exponential growth in available transistor count
- If transistor utilization is constant, this would lead to exponential performance growth; but life is slightly more complicated
- Wires don’t scale with transistor technology: wire delay becomes the bottleneck
- Short wires are good: dictates localized logic design
- But superscalar processors exercise a “centralized” control requiring long wires (or pipelined long wires)
- However, to utilize the transistors well, we need to overcome the memory wall problem
- To hide memory latency we need to extract more independent instructions, i.e., more ILP
- Extracting more ILP directly requires more available in-flight instructions
- But for that we need a bigger reorder buffer (ROB), which in turn requires a bigger register file
- Also we need to have bigger issue queues to be able to find more parallelism
- None of these structures scale well: main problem is wiring
- So the best low-cost way to use these transistors effectively must avoid long wires and must leverage existing technology. A chip multiprocessor (CMP) satisfies both goals: reuse existing processor cores and spend the transistors on putting more of them on-chip, instead of trying to scale a single processor for more ILP
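To get a feel for the exponential growth in transistor count, here is a minimal sketch. It assumes a fixed doubling period and a normalized starting count of 1; the function name `transistor_count` and the 10-year horizon are illustrative choices, not taken from the slides.

```python
def transistor_count(initial: float, months: float, doubling_period_months: float) -> float:
    """Projected transistor count after `months`, assuming a constant doubling period."""
    return initial * 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    # Compare the two ends of the 18-24 month range over 10 years (120 months).
    for period in (18, 24):
        growth = transistor_count(1.0, 120, period)
        print(f"doubling every {period} months: about {growth:.0f}x in 10 years")
```

The gap between the two doubling periods compounds quickly, which is why the exact period matters less than the fact that the growth is exponential.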
|
Power consumption?
- Hey, didn’t I just make my power consumption roughly N-fold by putting N cores on the die?
- Yes, if you do not scale down voltage or frequency
- Usually CMPs are clocked at a lower frequency
- Oops! My games run slower!
- Voltage scaling happens due to smaller process technology
- Overall, roughly cubic dependence of power on voltage or frequency (dynamic power is proportional to C·V²·f, and the achievable f scales roughly with V)
- Need to talk about different metrics
- Performance/Watt (same as the reciprocal of energy for a fixed task)
- More general, Performance^(k+1)/Watt (k > 0)
- Need smarter techniques to further improve these metrics
- Online voltage/frequency scaling
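The power argument above can be sketched numerically. This is a rough model under stated assumptions, not a measurement: it uses the standard dynamic power relation P ~ C·V²·f, scales voltage and frequency by the same factor, ignores static (leakage) power, and assumes a perfectly parallel workload for the multi-core throughput. All function names and the 0.8 scaling factor are hypothetical.

```python
def dynamic_power(cores: int, voltage: float, frequency: float, capacitance: float = 1.0) -> float:
    """Relative dynamic power of `cores` identical cores: N * C * V^2 * f."""
    return cores * capacitance * voltage ** 2 * frequency


def scaled_cmp(cores: int, scale: float) -> tuple[float, float, float]:
    """Return (power, parallel throughput, single-thread speed), all relative to
    one core at nominal voltage/frequency, when V and f are both multiplied by `scale`."""
    power = dynamic_power(cores, scale, scale)  # cubic in `scale`
    throughput = cores * scale                  # assumes a perfectly parallel workload
    single_thread = scale                       # one thread just sees the slower clock
    return power, throughput, single_thread


def metric(performance: float, watts: float, k: int = 0) -> float:
    """Performance^(k+1)/Watt; k = 0 reduces to plain Performance/Watt."""
    return performance ** (k + 1) / watts


if __name__ == "__main__":
    for cores, scale in ((1, 1.0), (2, 1.0), (2, 0.8)):
        power, throughput, single = scaled_cmp(cores, scale)
        print(
            f"{cores} core(s) at {scale:.1f}x V/f: power={power:.3f}, "
            f"throughput={throughput:.2f}, single-thread={single:.2f}, "
            f"perf/W={metric(throughput, power):.3f}, "
            f"perf^2/W={metric(throughput, power, k=1):.3f}"
        )
```

In this model, two cores at unchanged voltage and frequency double the power, while two cores at 0.8x voltage and frequency stay close to the single-core power budget and still deliver more parallel throughput. The single-thread column is the "my games run slower" effect. Larger k weights performance more heavily relative to power.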
|