Module 3: "Recap: Single-threaded Execution"
  Lecture 5: "Pipelining and Hazards"
 

 

Multi-cycle execution

  • Simplest implementation
    • Assume each of five stages takes a cycle
    • Five cycles to execute an instruction
    • Fetching instruction i+1 starts only after instruction i finishes
    • Ignoring “long latency” instructions (mult/div, memory operations), CPI is 5
  • Alternative implementation
    • You could use a clock that is five times slower, so that all the logic fits within one cycle
    • Then CPI is 1, excluding mult/div and memory operations
    • But overall execution time does not change: five times fewer cycles, each five times longer
  • What can you do to lower the CPI?
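
The comparison above can be checked with the usual relation: execution time = instruction count × CPI × clock cycle time. The sketch below is only illustrative; the instruction count and the 1 ns stage time are made-up numbers, and `execution_time` is a hypothetical helper name.

```python
def execution_time(num_instructions, cpi, cycle_time_ns):
    """Execution time = instruction count x CPI x clock cycle time."""
    return num_instructions * cpi * cycle_time_ns

N = 1_000_000        # assumed instruction count (illustrative only)
STAGE_TIME_NS = 1.0  # assumed time needed by one stage (illustrative only)

# Simplest implementation: five short cycles per instruction.
multi_cycle = execution_time(N, cpi=5, cycle_time_ns=STAGE_TIME_NS)

# Alternative implementation: one cycle per instruction, but the clock is
# five times slower so that all five stages fit within that cycle.
slow_clock = execution_time(N, cpi=1, cycle_time_ns=5 * STAGE_TIME_NS)

print(multi_cycle, slow_clock)
```

Both values come out equal, which is the point of the slide: lowering CPI by stretching the clock does not buy any execution time.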

Pipelining

  • Simple observation
    • In the multi-cycle implementation, while the ALU is executing, say, an add instruction, the decoder is idle
    • Exactly one stage is active at any point in time
    • The hardware of the other four stages sits unused
  • Solution: pipelining
    • Process up to five instructions in parallel
    • Each instruction is in a different stage of processing
    • Each stage is called a pipeline stage
    • Need registers (called pipeline latches) between pipeline stages to hold partially processed instructions: why?
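
To visualize "each instruction is in a different stage", here is a minimal sketch that tabulates which instruction occupies which stage in every cycle of an ideal pipeline with no stalls. The stage names IF, ID, EX, MEM, WB are an assumption (the classic five-stage naming; the slides only say "five stages"), and `pipeline_occupancy` and `format_occupancy` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # assumed names for the five stages

def pipeline_occupancy(num_instructions, stages=STAGES):
    """Return one row per cycle; row[s] is the index of the instruction
    in stage s during that cycle, or None if the stage is empty."""
    depth = len(stages)
    total_cycles = num_instructions + depth - 1  # fill time + one finish per cycle
    table = []
    for cycle in range(total_cycles):
        row = []
        for s in range(depth):
            i = cycle - s  # instruction i enters stage s in cycle i + s
            row.append(i if 0 <= i < num_instructions else None)
        table.append(row)
    return table

def format_occupancy(table, stages=STAGES):
    """Render the occupancy table as plain text, one line per cycle."""
    lines = ["cycle " + " ".join(f"{name:>4}" for name in stages)]
    for cycle, row in enumerate(table, start=1):
        labels = ["-" if i is None else f"i{i}" for i in row]
        lines.append(f"{cycle:>5} " + " ".join(f"{label:>4}" for label in labels))
    return "\n".join(lines)

print(format_occupancy(pipeline_occupancy(5)))
```

In the fifth cycle all five stages are busy with five different instructions. Each stage must therefore pass its result to the next stage at the cycle boundary and immediately accept new input, which hints at the answer to the "why?" above.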

More on pipelining

  • What do you gain?
    • Parallelism: called instruction-level parallelism (ILP)
    • Ideal CPI of 1 at the same clock speed as the multi-cycle implementation: ideally a five-fold reduction in execution time
  • What are the problems?
    • Slightly more complex implementation
    • Control and data hazards
    • These hazards put a limit on available ILP
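
A rough way to see how hazards eat into the ideal gain: if hazards force the pipeline to stall, each stall cycle adds to the ideal CPI of 1. The sketch below assumes the simple model CPI = 1 + average stall cycles per instruction and compares it against the multi-cycle CPI of 5 at the same clock speed; the stall values are invented for illustration, and both function names are hypothetical.

```python
def pipelined_cpi(stall_cycles_per_instruction):
    """Assumed model: ideal CPI of 1 plus the average stall cycles per instruction."""
    return 1.0 + stall_cycles_per_instruction

def speedup_over_multicycle(stall_cycles_per_instruction, multicycle_cpi=5):
    """Speedup at equal clock speed and instruction count is the ratio of CPIs."""
    return multicycle_cpi / pipelined_cpi(stall_cycles_per_instruction)

# Illustrative stall rates only; real values depend on the program and hardware.
for stalls in (0.0, 0.25, 1.0):
    print(stalls, speedup_over_multicycle(stalls))
```

With no stalls the speedup is the ideal factor of 5; under this model, an average of one stall cycle per instruction already halves it.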