Module 2: "Parallel Computer Architecture: Today and Tomorrow"
  Lecture 4: "Shared Memory Multiprocessors"
 

Exploiting TLP: NOW

  • Simplest solution: take commodity boxes, connect them over Gigabit Ethernet, and let them talk via messages
    • The simplest possible message-passing machine
    • Also known as Network of Workstations (NOW)
    • Normally PVM (Parallel Virtual Machine) or MPI (Message Passing Interface) is used for programming
    • Each processor sees only local memory
    • Any remote data access must happen through explicit messages (send/recv calls trapping into the kernel), as sketched after this list
  • Optimizations in the messaging layer are possible (user-level messages, active messages)
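
  A minimal sketch of this explicit-message style, assuming an MPI installation and a run with at least two processes (e.g., mpirun -np 2); the ranks, tag, and variable names are illustrative only:

    /* NOW-style message passing: each process sees only its local memory,
     * so remote data moves only through explicit MPI_Send/MPI_Recv calls. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, data = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            data = 42;                        /* value in process 0's local memory */
            /* Remote access is explicit: ship the value to process 1 */
            MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Process 1 cannot load process 0's memory directly;
             * it must post a matching receive */
            MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Process 1 received %d\n", data);
        }

        MPI_Finalize();
        return 0;
    }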

Supercomputers

  • Historically used for scientific computing
  • Initially used vector processors
  • But the uniprocessor performance gap between vector processors and microprocessors is narrowing
    • Microprocessors now have heavily pipelined floating-point units, large on-chip caches, modern techniques to extract ILP
  • Microprocessor-based supercomputers come at large scale: 100 to 1000 processors (called massively parallel processors or MPPs)
  • However, vector-processor-based supercomputers are of much smaller scale due to their cost disadvantage
    • Cray finally decided to use the Alpha µP in the T3D

Exploiting TLP: Shared memory

  • Harder to build, but offers better programmability than message-passing clusters
  • The “conventional” load/store architecture continues to work
  • Communication takes place through load/store instructions
  • Central to the design: a cache coherence protocol
    • Keeps data coherent across the different caches
  • Special care needed for synchronization (see the sketch after this list)
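
  A minimal sketch of shared-memory communication, assuming POSIX threads; the producer/consumer structure and all variable names are illustrative, not part of the lecture. Ordinary loads and stores on a shared variable do the communication, while a mutex and condition variable provide the synchronization that needs special care:

    #include <pthread.h>
    #include <stdio.h>

    int shared_value;                 /* lives in memory visible to all threads */
    int ready = 0;
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

    void *producer(void *arg)
    {
        pthread_mutex_lock(&lock);
        shared_value = 42;            /* a plain store is the communication */
        ready = 1;
        pthread_cond_signal(&cond);   /* synchronization is explicit */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    void *consumer(void *arg)
    {
        pthread_mutex_lock(&lock);
        while (!ready)
            pthread_cond_wait(&cond, &lock);
        printf("Consumer loaded %d\n", shared_value);  /* a plain load */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }

  Note that the hardware's cache coherence protocol is what makes the consumer's load observe the producer's store even when the two threads run on different processors with private caches.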