## Lecture 17: Pipelining

- Today's topics:
  - 5-stage pipeline
  - Hazards
  - Data dependence handling with bypassing
  - Data dependence examples

# A 5-Stage Pipeline



Source: H&P textbook

### Example A

Unpipelined processor: 5ns + 0.2ns latch overhead

```
Cycle Time:
Clock Speed:
CPI: IPC:
Throughput:
```

• Pipelined processor: 5 stages, longest stage 1.2ns + 0.2ns latch

```
Cycle Time:
Clock Speed:
CPI: IPC:
Throughput:
```

#### **Quantitative Effects**

- As a result of pipelining:
  - Time in ns per instruction goes up
  - Each instruction takes more cycles to execute
  - But... average CPI remains roughly the same
  - Clock speed goes up
  - ➤ Total execution time goes down, resulting in lower average time per instruction
  - Under ideal conditions, speedup
    - = ratio of elapsed times between successive instruction completions
    - = number of pipeline stages = increase in clock speed

#### Hazards

- Structural hazards: different instructions in different stages (or the same stage) conflicting for the same resource
- Data hazards: an instruction cannot continue because it needs a value that has not yet been generated by an earlier instruction
- Control hazard: fetch cannot continue because it does not know the outcome of an earlier branch – special case of a data hazard – separate category because they are treated in different ways

## Conflicts/Problems

- I-cache and D-cache are accessed in the same cycle it helps to implement them separately
- Registers are read and written in the same cycle easy to deal with if register read/write time equals cycle time/2
- Instructions can't skip the DM stage, else conflict for RW
- Consuming instruction may have to wait for producer
- Branch target changes only at the end of the second stage
   -- what do you do in the meantime?

#### Structural Hazards

- Example: a unified instruction and data cache 

   stage 4 (MEM) and stage 1 (IF) can never coincide
- The later instruction and all its successors are delayed until a cycle is found when the resource is free → these are pipeline bubbles
- Structural hazards are easy to eliminate increase the number of resources (for example, implement a separate instruction and data cache, add more register ports)

#### Data Hazards

- An instruction produces a value in a given pipeline stage
- A subsequent instruction consumes that value in a pipeline stage
- The consumer may have to be delayed so that the time of consumption is later than the time of production

## Example 1 – No Bypassing

• Show the instruction occupying each stage in each cycle (no bypassing) if I1 is R1+R2 $\rightarrow$ R3 and I2 is R3+R4 $\rightarrow$ R5 and I3 is R7+R8 $\rightarrow$ R9

| CYC-1 | CYC-2 | CYC-3 | CYC-4 | CYC-5 | CYC-6 | CYC-7 | CYC-8 |
|-------|-------|-------|-------|-------|-------|-------|-------|
| IF    |
| D/R   |
| ALU   |
| DM    |
| RW    |

## Example 1 – No Bypassing

• Show the instruction occupying each stage in each cycle (no bypassing) if I1 is R1+R2→R3 and I2 is R3+R4→R5 and I3 is R7+R8→R9

| CYC-1    | CYC-2     | CYC-3     | CYC-4     | CYC-5     | CYC-6     | CYC-7     | CYC-8    |    |
|----------|-----------|-----------|-----------|-----------|-----------|-----------|----------|----|
| IF<br>I1 | IF<br>I2  | IF<br>I3  | IF<br>I3  | IF<br>I3  | IF<br>I4  | IF<br>15  | IF       |    |
| D/R      | D/R<br>I1 | D/R<br>I2 | D/R<br>I2 | D/R<br>I2 | D/R<br>I3 | D/R<br>14 | D/R      |    |
| ALU      | ALU       | ALU<br>I1 | ALU       | ALU       | ALU<br>12 | ALU<br>13 | ALU      |    |
| DM       | DM        | DM        | DM<br>I1  | DM        | DM        | DM<br>I2  | DM<br>I3 |    |
| RW       | RW        | RW        | RW        | RW<br>I1  | RW        | RW        | RW<br>I2 | 10 |

### Example 2 – Bypassing

Show the instruction occupying each stage in each cycle (with bypassing) if I1 is R1+R2→R3 and I2 is R3+R4→R5 and I3 is R3+R8→R9.
 Identify the input latch for each input operand.

| CYC-1 | CYC-2 | CYC-3 | CYC-4 | CYC-5 | CYC-6 | CYC-7 | CYC-8 |    |
|-------|-------|-------|-------|-------|-------|-------|-------|----|
| IF    |    |
| D/R   |    |
| ALU   |    |
| DM    |    |
| RW    | 11 |

#### Example 2 – Bypassing

Show the instruction occupying each stage in each cycle (with bypassing) if I1 is R1+R2→R3 and I2 is R3+R4→R5 and I3 is R3+R8→R9.
 Identify the input latch for each input operand.

| CYC-1    | CYC-2     | CYC-3              | CYC-4              | CYC-5     | CYC-6    | CYC-7    | CYC-8 |
|----------|-----------|--------------------|--------------------|-----------|----------|----------|-------|
| IF<br>11 | IF        | IF<br>12           | IF<br>L4           | IF        | IF       | IF       | IF    |
| l1       | 12        | 13                 | 14                 | 15        |          |          |       |
| D/R      | D/R<br>I1 | D/R<br>I2          | D/R<br>I3          | D/R<br>I4 | D/R      | D/R      | D/R   |
| ALU      | ALU       | L3 L3<br>ALU<br>I1 | L4 L3<br>ALU<br>I2 | ALU<br>I3 | ALU      | ALU      | ALU   |
| DM       | DM        | DM                 | DM<br>I1           | DM<br>I2  | DM<br>I3 | DM       | DM    |
| RW       | RW        | RW                 | RW                 | RW<br>I1  | RW<br>I2 | RW<br>I3 | RW    |

#### Problem 0

add \$1, \$2, \$3 add \$5, \$1, \$4

- Point of Production
- Point of Consumption

#### Without bypassing:

add \$1, \$2, \$3: IF DR AL DM RW

add \$5, \$1, \$4: IF DR DR DR AL DM RW

#### With bypassing:

add \$1, \$2, \$3: IF DR AL DM RW

add \$5, \$1, \$4: IF DR AL DM RW