On-chip TLP
- Current trend:
- Tight integration
- Minimize communication latency (data communication is the bottleneck)
- Since transistors are abundant:
- Put multiple cores on the chip (chip multiprocessing, CMP)
- The cores can communicate via either a shared bus or a switch-based on-chip fabric (which can be custom-designed and clocked faster)
- Or add support for multiple threads without replicating cores (simultaneous multi-threading, SMT)
- Both choices provide a good cost/performance trade-off
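Both CMP and SMT expose multiple hardware thread contexts to software, so the program's job is the same either way: decompose work into threads. A minimal sketch in Python (function names are illustrative, not from the slides):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each of these calls can run on a separate hardware context:
    # a different core on a CMP, or a different SMT thread on one core.
    return sum(chunk)

def parallel_sum(data, n_threads=4):
    # Decompose the data into one chunk per software thread.
    size = (len(data) + n_threads - 1) // n_threads
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Combine the per-thread partial results.
        return sum(pool.map(partial_sum, chunks))

# Caveat: CPython's GIL serializes CPU-bound threads, so real speedup on a
# CMP needs processes or GIL-releasing code -- but the decomposition into
# thread-level parallelism is the same idea the hardware exploits.
```

The same decomposition maps onto either design point; the hardware differs only in how much of the core is replicated per thread context.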
Economics
- Ultimately, who controls what gets built?
- It is a cost vs. performance trade-off
- Given a time budget (time to market) and a revenue projection, how much performance can be afforded?
- The normal trend is to use commodity microprocessors as building blocks unless there is a very good reason not to
- Reuse existing technology as much as possible
- Large-scale scientific computing mostly exploits message-passing machines (easy to build, less costly); even Google uses the same kind of architecture [built from commodity parts]
- Small to medium-scale shared-memory multiprocessors are needed in the commercial market (e.g., databases)
- Although large-scale DSMs (256 or 512 nodes) are built by SGI, demand for them is lower
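The message-passing style used by those large-scale machines can be sketched as explicit send/receive between nodes that share no memory. Here threads and queues stand in for cluster nodes and the interconnect (in practice this would be an API such as MPI; all names below are illustrative):

```python
import queue
import threading

def node(inbox, outbox):
    # Each "node" owns its data privately; all coordination happens
    # through explicit messages, as on a message-passing cluster.
    chunk = inbox.get()       # receive work over the "interconnect"
    outbox.put(sum(chunk))    # send the partial result back

def message_passing_sum(data, n_nodes=4):
    # Partition the data, one private chunk per node.
    size = (len(data) + n_nodes - 1) // n_nodes
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    outbox = queue.Queue()
    for chunk in chunks:
        inbox = queue.Queue()
        threading.Thread(target=node, args=(inbox, outbox)).start()
        inbox.put(chunk)      # explicit send; no shared variables
    # Gather one result message per node and combine.
    return sum(outbox.get() for _ in chunks)
```

A shared-memory multiprocessor would instead let all threads read and write the same data structure directly, trading the explicit messages for cache-coherence and synchronization hardware.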
Summary
- Parallel architectures will soon be ubiquitous
- Even on the desktop (we already have SMT/HT and multi-core)
- Economically attractive: can be built from COTS (commodity off-the-shelf) parts
- Enormous application demand (scientific as well as commercial)
- More attractive today, with positive technology and architecture trends
- Wide range of parallel architectures: SMP servers, DSMs, large clusters, CMP, SMT, CMT, …
- Today’s microprocessors are, in fact, complex parallel machines trying to extract ILP as well as TLP