On-chip TLP
- Current trend:
- Tight integration
- Minimize communication latency (data communication is the bottleneck)
- Since transistors are abundant:
- Put multiple cores on the chip (chip multiprocessing, CMP)
- The cores can communicate via either a shared bus or a switch-based on-chip fabric (which can be custom-designed and clocked faster)
- Or add support for multiple threads without replicating cores (simultaneous multi-threading, SMT)
- Both choices provide a good cost/performance trade-off
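Both CMP and SMT expose multiple hardware thread contexts to software, so the program's job is the same either way: decompose work into threads. A minimal sketch in Python (function names are illustrative, not from the slides):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each of these calls can run on a separate hardware context:
    # a different core on a CMP, or a different SMT thread on one core.
    return sum(chunk)

def parallel_sum(data, n_threads=4):
    # Decompose the data into one chunk per software thread.
    size = (len(data) + n_threads - 1) // n_threads
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Combine the per-thread partial results.
        return sum(pool.map(partial_sum, chunks))

# Caveat: CPython's GIL serializes CPU-bound threads, so real speedup on a
# CMP needs processes or GIL-releasing code -- but the decomposition into
# thread-level parallelism is the same idea the hardware exploits.
```

The same decomposition maps onto either design point; the hardware differs only in how much of the core is replicated per thread context.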
Economics
- Ultimately, who controls what gets built?
- It is a cost vs. performance trade-off
- Given a time budget (time to market) and a revenue projection, how much performance can be afforded?
- The normal trend is to use commodity microprocessors as building blocks unless there is a very good reason not to
- Reuse existing technology as much as possible
- Large-scale scientific computing mostly exploits message-passing machines (easy to build, less costly); even Google uses the same kind of architecture [built from commodity parts]
- Small to medium-scale shared-memory multiprocessors are needed in the commercial market (e.g., databases)
- Although large-scale DSMs (256 or 512 nodes) are built by SGI, demand for them is lower
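The message-passing style used by those large-scale machines can be sketched as explicit send/receive between nodes that share no memory. Here threads and queues stand in for cluster nodes and the interconnect (in practice this would be an API such as MPI; all names below are illustrative):

```python
import queue
import threading

def node(inbox, outbox):
    # Each "node" owns its data privately; all coordination happens
    # through explicit messages, as on a message-passing cluster.
    chunk = inbox.get()       # receive work over the "interconnect"
    outbox.put(sum(chunk))    # send the partial result back

def message_passing_sum(data, n_nodes=4):
    # Partition the data, one private chunk per node.
    size = (len(data) + n_nodes - 1) // n_nodes
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    outbox = queue.Queue()
    for chunk in chunks:
        inbox = queue.Queue()
        threading.Thread(target=node, args=(inbox, outbox)).start()
        inbox.put(chunk)      # explicit send; no shared variables
    # Gather one result message per node and combine.
    return sum(outbox.get() for _ in chunks)
```

A shared-memory multiprocessor would instead let all threads read and write the same data structure directly, trading the explicit messages for cache-coherence and synchronization hardware.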
Summary
- Parallel architectures will soon be ubiquitous
- Even on the desktop (we already have SMT/HT and multi-core)
- Economically attractive: can be built from COTS (commodity off-the-shelf) parts
- Enormous application demand (scientific as well as commercial)
- More attractive today, with positive technology and architecture trends
- Wide range of parallel architectures: SMP servers, DSMs, large clusters, CMP, SMT, CMT, …
- Today’s microprocessors are, in fact, complex parallel machines trying to extract ILP as well as TLP