Note: Make reasonable assumptions where necessary and clearly state them. Feel free to discuss problems with classmates, but the only written materials that you may consult while writing your solutions are the textbook and the lecture slides/videos. Solutions should be uploaded as a single PDF file on Canvas. Show your solution steps so that you can receive partial credit for incorrect answers and so that we know you have understood the material. Don't just show us the final answer.
Every homework has an automatic, penalty-free 1.5-day extension to accommodate any COVID- or family-related disruptions. In other words, try to finish your homework by Wednesday 1:25pm to keep up with the lecture content, but if necessary, you may take until Thursday 11:59pm.
Consider a 32-bit in-order pipeline that has the following stages. Note the many differences from the examples in class: a stage that converts CISC instructions to micro-ops, one stage to do register reads, one stage to do register writes, four stages to access the data memory, and three stages for the FP-ALU. For the questions below, assume that each CISC instruction is simple and is converted to a single micro-op.
Common front end: Fetch | uOp Convert | Decode | Regread
Int-add path: IntALU | Regwrite
Load/store path: IntALU | Datamem1 | Datamem2 | Datamem3 | Datamem4 | Regwrite
FP-add path: FPALU1 | FPALU2 | FPALU3 | Regwrite
After instruction fetch, the instruction goes through the micro-op conversion stage, a Decode stage where dependences are analyzed, and a Regread stage where input operands are read from the register file. After this, an instruction takes one of three possible paths. Int-adds go through the stages labeled "IntALU" and "Regwrite". Loads/stores go through the stages labeled "IntALU", "Datamem1", "Datamem2", "Datamem3", "Datamem4", and "Regwrite". FP-adds go through the stages labeled "FPALU1", "FPALU2", "FPALU3", and "Regwrite". Assume that the register file has an infinite number of write ports so stalls are never introduced because of structural hazards. How many stall cycles are introduced between the following pairs of successive instructions (i) for a processor with no register bypassing and (ii) for a processor with full bypassing?
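The pipeline described above can be turned into a small model for counting stalls between a producer and the instruction that immediately follows it. The sketch below is only one possible reading, under assumptions that are not stated in the problem: without bypassing, a value becomes visible at the end of the producer's Regwrite stage and is consumed at the start of the consumer's Regread stage (no same-cycle write-then-read); with full bypassing, a value becomes visible at the end of the last stage before Regwrite and is consumed at the start of the consumer's first execute stage. The names `PATHS` and `stall_cycles` are hypothetical, and the instruction pairs in the loop are illustrative, not the pairs asked about in the assignment.

```python
# Stages shared by every instruction, followed by the three possible paths.
COMMON = ["Fetch", "uOp Convert", "Decode", "Regread"]
PATHS = {
    "int_add": COMMON + ["IntALU", "Regwrite"],
    "load": COMMON + ["IntALU", "Datamem1", "Datamem2", "Datamem3",
                      "Datamem4", "Regwrite"],
    "fp_add": COMMON + ["FPALU1", "FPALU2", "FPALU3", "Regwrite"],
}


def stall_cycles(producer, consumer, bypassing, consumer_use_stage=None):
    """Stalls between a producer and the dependent instruction right behind it.

    Assumption: stages are numbered from 1. The point of production (PoP)
    is the stage at whose END the result is available; the point of
    consumption (PoC) is the stage at whose START the operand is needed.
    Since the consumer trails the producer by one cycle, the stall count
    is max(0, PoP - PoC).
    """
    p_stages = PATHS[producer]
    c_stages = PATHS[consumer]
    if bypassing:
        # Result is forwarded from the last stage before Regwrite.
        pop = len(p_stages) - 1
        # By default the operand is needed at the first execute stage.
        use_stage = consumer_use_stage or c_stages[len(COMMON)]
        poc = c_stages.index(use_stage) + 1
    else:
        # Result must be written to the register file, then read from it.
        pop = len(p_stages)
        poc = c_stages.index("Regread") + 1
    return max(0, pop - poc)


# Illustrative pairs only (hypothetical, not taken from the assignment).
for prod, cons in [("int_add", "int_add"), ("load", "int_add"),
                   ("fp_add", "fp_add")]:
    print(prod, "->", cons,
          "| no bypassing:", stall_cycles(prod, cons, bypassing=False),
          "| full bypassing:", stall_cycles(prod, cons, bypassing=True))
```

The optional `consumer_use_stage` argument is there because some consumers need an operand later than their first execute stage (for example, a store's data operand may not be needed until a Datamem stage); whether that applies is itself an assumption to state in a solution.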
Consider the following skeletal code segment, where the branch is taken 90% of the time and not-taken 10% of the time.
Consider a 10-stage in-order processor, where the instruction is fetched in the first stage, and the branch outcome is known after three stages. Estimate the average CPI of the processor under the following scenarios (assume that all stalls in the processor are branch-related and branches account for 15% of all executed instructions):
Consider an unpipelined processor where it takes 8ns to go through the circuits and 0.2ns for the latch overhead. Assume that the Point of Production and Point of Consumption in the unpipelined processor are separated by 4ns. Assume that one-third of all instructions do not introduce a data hazard and two-thirds of all instructions depend on their preceding instruction. What is the throughput (in BIPS) of (i) the unpipelined processor, (ii) a 10-stage pipeline, and (iii) a 20-stage pipeline?
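The arithmetic behind this kind of question can be sketched in a few lines. The model below rests on assumptions that should be stated in any written solution: the 8ns of circuit delay is divided evenly across the stages, each stage adds one 0.2ns latch overhead, the 4ns gap between the Point of Production and the Point of Consumption spans a whole number of stages, and a dependent instruction stalls for one cycle fewer than that number of stages. The function name `throughput_bips` and its parameter names are hypothetical.

```python
import math


def throughput_bips(circuit_ns, latch_ns, pop_poc_ns, stages, dependent_fraction):
    """Estimate throughput in billions of instructions per second (BIPS).

    Assumptions: circuit delay splits evenly over `stages`; every stage
    pays `latch_ns`; a dependent instruction stalls (gap_stages - 1)
    cycles, where gap_stages is the number of stages covering the
    PoP-to-PoC distance.
    """
    stage_ns = circuit_ns / stages
    cycle_ns = stage_ns + latch_ns
    # Round before taking the ceiling to absorb floating-point noise.
    gap_stages = math.ceil(round(pop_poc_ns / stage_ns, 9))
    stalls = max(gap_stages - 1, 0)
    cpi = 1 + dependent_fraction * stalls
    # One instruction per (cpi * cycle_ns) nanoseconds equals that many BIPS.
    return 1.0 / (cpi * cycle_ns)


for n in (1, 10, 20):
    bips = throughput_bips(circuit_ns=8.0, latch_ns=0.2, pop_poc_ns=4.0,
                           stages=n, dependent_fraction=2 / 3)
    print(f"{n:2d} stage(s): {bips:.3f} BIPS")
```

Under this model, deeper pipelining shortens the cycle time but lengthens the stall seen by each dependent instruction, so throughput does not grow monotonically with the number of stages.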