Note: Make reasonable assumptions where necessary and clearly state them. Feel free to discuss problems with classmates, but the only written material that you may consult while writing your solutions are the textbook and lecture slides/videos. Solutions should be uploaded as a single pdf file on Canvas. Show your solution steps so you receive partial credit for incorrect answers and we know you have understood the material. Don't just show us the final answer.
Every homework has an automatic penalty-free 1.5 day extension to accommodate any covid/family-related disruptions. In other words, try to finish your homework by Wednesday 1:25pm to keep up with the lecture content, but if necessary, you may take until Thursday 11:59pm.
Consider a basic in-order pipeline with bypassing (one instruction in each pipeline stage in any cycle). The pipeline has been extended to handle FP add. Assume the following delays between dependent instructions (note: these assumptions are different from the examples discussed in class):
Below is the source code and default assembly code for a loop.
Source Code:
for (i=1000; i>0; i--) {
w[i] = x[i] * w[i];
}
Assembly Code:
Loop:
L.D F2, 0(R1) // Get w[i]
L.D F4, 0(R2) // Get x[i]
MUL.D F6, F2, F4 // Multiply two numbers
S.D F6, 0(R1) // Store the result into w[i]
DADDUI R1, R1, #-8 // Decrement R1
DADDUI R2, R2, #-8 // Decrement R2
BNE R1, R3, Loop // Check if we've reached the end of the loop
NOP
Consider the following tournament branch predictor that employs a selector with 64K entries (2-bit saturating counters). The selector picks a prediction out of either a global predictor (16-bit global history is XOR-ed with 16 bits of branch PC to index into a table of 3-bit saturating counters) or a local predictor (8 bits of branch PC index into level-1, 11 bits of local history from level-1 are concatenated with 4 other bits of branch PC to generate the index into level-2 that has 3-bit saturating counters). What is the total capacity of the entire branch prediction system?