dce
2013
COMPUTER ARCHITECTURE
CE2013
BK
TP.HCM
Faculty of Computer Science and
Engineering
Department of Computer Engineering
Vo Tan Phuong
Chapter 4
Single-cycle & Pipeline
Processor
CuuDuongThanCong.com
Computer Architecture – Chapter 4.2
©2013, CE
Single-Cycle Processor Overview
[Figure: single-cycle datapath with Next PC logic (Imm26, Imm16, PCSrc, jump or branch target address), 30-bit PC (+1), Instruction Memory, Registers (RA, RB, RW, BusA, BusB, BusW), extender, ALU (ALU result, zero), and Data Memory (Address, Data_in, Data_out). Control: Main Control (input Op) and ALU Ctrl (inputs func, ALUop); signals RegDst, RegWrite, ExtOp, ALUSrc, MemRead, MemWrite, MemtoReg, J, Beq, Bne.]
Exercise 1
Fill in the values of the control signals for the following instructions:
a. slt $t0,$s0,$zero
RegDst  RegWrite  ExtOp  ALUSrc  Beq  Bne  J  MemRead  MemWrite  MemtoReg
1       1         x      0       0    0    0  0        0         0
b. bne $t0,$zero,exit_label
RegDst  RegWrite  ExtOp  ALUSrc  Beq  Bne  J  MemRead  MemWrite  MemtoReg
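A minimal lookup sketch for Exercise 1. The slt row is the one given in part (a). The bne row is an assumption derived from the usual conventions for this datapath (RegDst = 1 selects Rd, ALUSrc = 0 selects BusB, 'x' marks a don't-care) and is not taken from the slides. The names SIGNALS, CONTROL, control_word and changes_state are illustrative only.

```python
# Control-signal lookup sketch; signal encodings are assumptions (see lead-in).
SIGNALS = ["RegDst", "RegWrite", "ExtOp", "ALUSrc", "Beq", "Bne", "J",
           "MemRead", "MemWrite", "MemtoReg"]

CONTROL = {
    # R-type row exactly as listed for slt in part (a).
    "slt": ["1", "1", "x", "0", "0", "0", "0", "0", "0", "0"],
    # Assumed row for bne: no register write, ALU compares BusA with BusB,
    # only the Bne signal is asserted, unused mux selects are don't-cares.
    "bne": ["x", "0", "x", "0", "0", "1", "0", "0", "0", "x"],
}

def control_word(mnemonic):
    """Return a {signal: value} dict for the given mnemonic."""
    return dict(zip(SIGNALS, CONTROL[mnemonic]))

def changes_state(word):
    """True if the instruction writes a register or the data memory."""
    return word["RegWrite"] == "1" or word["MemWrite"] == "1"
```

For example, control_word("bne")["Bne"] is "1", and changes_state(control_word("bne")) is False because a branch writes neither the register file nor memory.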
Exercise 2
• We wish to add the instruction jalr (jump and link
register) to the single-cycle datapath. Add any necessary
datapath components and control signals, and draw the
resulting datapath. Show the values of the control signals
that control the execution of the jalr instruction.
• The jump and link register instruction is described
below:
Exercise 2
• One solution:
(Comment: JReg means Jump Register; RA means Return Address)
Exercise 2
• The main control signals for the JALR instruction are the
same as for other R-type instructions, such as ADD and SUB.
These control signals are shown in the table below:
• The ALU Control signals for the JALR instruction are shown
below. JReg = 1 and RA = 1. ALUCtrl is a don't care.
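A behavioral sketch of the two multiplexers that JReg and RA would control. It assumes that jalr writes the return address (the address of the next instruction) to the destination register and jumps to the address held in Rs, and that the PC is the 30-bit word-addressed PC of the overview datapath. The function name and parameters are hypothetical.

```python
PC_MASK = 0x3FFFFFFF  # the PC holds a 30-bit word address

def jalr_step(pc30, rs_value, jreg=1, ra=1, alu_result=0):
    """Model one cycle of the assumed jalr datapath.

    jreg=1 selects the register value (BusA) as the next PC.
    ra=1   selects the return address as the register write data (BusW).
    """
    incremented = (pc30 + 1) & PC_MASK       # PC + 1, counted in words
    return_address = incremented << 2        # append "00" to get a byte address
    # Next-PC mux: register target when JReg = 1, else the incremented PC.
    new_pc30 = (rs_value >> 2) & PC_MASK if jreg else incremented
    # Write-back mux: return address when RA = 1, else the ALU result.
    bus_w = return_address if ra else alu_result
    return new_pc30, bus_w
```

With pc30 = 0x100 (byte address 0x400) and Rs holding 0x00400020, the sketch returns the word address 0x100008 as the next PC and 0x404 as the value written back.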
Exercise 3
We want to compare the performance of a single-cycle CPU design
with a multi-cycle CPU. Suppose we add the multiply and divide
instructions. The operation times are as follows:
o Instruction memory access time = 190 ps, Data memory access time = 190 ps
o Register file read access time = 150 ps, Register file write access = 150 ps
o ALU delay for basic instructions = 190 ps, ALU delay for multiply or divide = 550 ps
Ignore the other delays in the multiplexers, control unit, sign-extension, etc.
Assume the following instruction mix: 30% ALU, 15% multiply & divide, 15%
load, 15% store, 15% branch, and 10% jump.
a. What is the total delay for each instruction class, and what is the clock
cycle for the single-cycle CPU design?
b. Assume we fix the clock cycle at 200 ps for a multi-cycle CPU. What is the
CPI for each instruction class, and what is the speedup over the single-cycle
design with its fixed-length clock cycle?
Exercise 3
a. Total delay for each instruction:
Clock cycle = max delay = 1040 ps (set by multiply & divide)
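The per-class delays can be reproduced by summing the components each class uses. The component times come from the exercise statement. Which components each class passes through is the usual single-cycle assumption and is not stated on the slide; in particular, jump is assumed to need only the instruction fetch.

```python
# Component delays in picoseconds, from the exercise statement.
IM, DM = 190, 190              # instruction / data memory access
RF_READ, RF_WRITE = 150, 150   # register file read / write
ALU, MULDIV = 190, 550         # basic ALU operation / multiply or divide

# Assumed components on the path of each instruction class.
DELAY = {
    "alu":    IM + RF_READ + ALU + RF_WRITE,
    "muldiv": IM + RF_READ + MULDIV + RF_WRITE,
    "load":   IM + RF_READ + ALU + DM + RF_WRITE,
    "store":  IM + RF_READ + ALU + DM,
    "branch": IM + RF_READ + ALU,
    "jump":   IM,
}

# The single-cycle clock must accommodate the slowest class.
clock_cycle = max(DELAY.values())
```

Under these assumptions the slowest class is multiply & divide at 1040 ps, which matches the clock cycle stated above.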
Exercise 3
b. CPI for each instruction:
CPI for Basic ALU = 4 cycles
CPI for Multiply & Divide = 6 cycles (ALU takes 3 cycles)
CPI for Load = 5 cycles
CPI for Store = 4 cycles
CPI for Branch = 3 cycles
CPI for Jump = 2 cycles
Average CPI = 0.3 * 4 + 0.15 * 6 + 0.15 * 5 + 0.15 * 4 + 0.15 * 3 + 0.1 * 2 = 4.1
Speedup of multi-cycle over single-cycle = (1040 * 1) / (200 * 4.1) = 1.27
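The arithmetic above can be checked with a short script. The mix, the per-class CPI values and the two clock periods are taken from the exercise; the variable names are illustrative.

```python
import math

MIX = {"alu": 0.30, "muldiv": 0.15, "load": 0.15,
       "store": 0.15, "branch": 0.15, "jump": 0.10}
CPI = {"alu": 4, "muldiv": 6, "load": 5, "store": 4, "branch": 3, "jump": 2}

SINGLE_CYCLE_PS = 1040   # single-cycle clock period, CPI = 1
MULTI_CYCLE_PS = 200     # multi-cycle clock period

# A 550 ps multiply or divide needs ceil(550 / 200) cycles in the ALU step.
muldiv_alu_cycles = math.ceil(550 / MULTI_CYCLE_PS)

# Weighted average CPI over the instruction mix.
average_cpi = sum(MIX[c] * CPI[c] for c in MIX)

# Speedup = time per instruction (single-cycle) / time per instruction (multi-cycle).
speedup = (SINGLE_CYCLE_PS * 1) / (MULTI_CYCLE_PS * average_cpi)
```

Every other step fits within one 200 ps cycle, which is why only the multiply & divide class gains two extra cycles relative to basic ALU instructions.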
Exercise 4
• Identify all the RAW data dependencies in the following
code. Which dependencies are data hazards that will be
resolved by forwarding? Which dependencies are data
hazards that will cause a stall? Using a graphical
representation of the pipeline, show the forwarding paths
and any stall cycles.
add $3, $4, $2
sub $5, $3, $1
lw $6, 200($3)
add $7, $3, $6
Exercise 4
• RAW dependencies:
add $3, $4, $2 and sub $5, $3, $1 (resolved by forwarding)
add $3, $4, $2 and lw $6, 200($3) (resolved by forwarding)
lw $6, 200($3) and add $7, $3, $6 (load-use hazard: stall 1 cycle, then forward)
add $3, $4, $2 and add $7, $3, $6 (no hazard: $3 is read from the register file)
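A small sketch that finds the same RAW dependencies mechanically. It assumes the classic 5-stage pipeline with full forwarding and a register file that is written in the first half of a cycle and read in the second half, so a consumer three or more instructions after its producer reads the register file directly. The instruction tuples and the function name are illustrative.

```python
# Each instruction: (text, kind, destination register, source registers).
PROGRAM = [
    ("add $3, $4, $2", "alu",  3, (4, 2)),
    ("sub $5, $3, $1", "alu",  5, (3, 1)),
    ("lw $6, 200($3)", "load", 6, (3,)),
    ("add $7, $3, $6", "alu",  7, (3, 6)),
]

def raw_dependencies(program):
    """Return (producer index, consumer index, register, resolution) tuples."""
    found = []
    for i, (_, kind, dest, _) in enumerate(program):
        for j in range(i + 1, len(program)):
            _, _, later_dest, sources = program[j]
            if dest in sources:
                distance = j - i
                if kind == "load" and distance == 1:
                    how = "stall 1, forward"   # load-use hazard
                elif distance <= 2:
                    how = "forwarding"         # value forwarded to the ALU input
                else:
                    how = "from register"      # already written back
                found.append((i, j, dest, how))
            if later_dest == dest:
                break  # register overwritten; later reads depend on the new writer
    return found
```

Calling raw_dependencies(PROGRAM) yields four dependencies, three on $3 produced by the first add and one on $6 produced by the lw.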
Exercise 5
• We have a program of 10^6 instructions in the format of “lw, add,
lw, add,…”. The add instruction depends only on the lw instruction
right before it. The lw instruction also depends only on the add
instruction right before it. If this program is executed on the 5-stage
MIPS pipeline:
a. Without forwarding, what would be the actual CPI?
It takes 6 cycles on average to complete one LW and one ADD.
1 cycle (to complete LW) + 2 cycles (bubbles) + 1 cycle (to complete ADD) + 2 cycles (bubbles) = 6 cycles
So, it takes 6 cycles to complete 2 instructions
Average CPI = 6/2 = 3
b. With forwarding, what would be the actual CPI?
It takes only 3 cycles on average to complete one LW and one ADD.
1 cycle (to complete LW) + 1 cycle (bubble) + 1 cycle (to complete ADD) = 3 cycles
So, it takes 3 cycles to complete 2 instructions
Average CPI = 3/2 = 1.5
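The two answers can be expressed as one steady-state calculation. This sketch assumes that, without forwarding, a dependent instruction waits two cycles (the register file is written in the first half of a cycle and read in the second half), and that with forwarding only the lw followed by add pair needs one bubble. Pipeline fill time is ignored, which is negligible for 10^6 instructions. The function name is illustrative.

```python
def steady_state_cpi(forwarding):
    """CPI of a long lw, add, lw, add, ... chain on a 5-stage pipeline."""
    if forwarding:
        # Only the load-use pair (lw -> add) needs a bubble.
        bubbles_after_lw, bubbles_after_add = 1, 0
    else:
        # Every dependent instruction waits for the producer's write-back.
        bubbles_after_lw, bubbles_after_add = 2, 2
    # One issue cycle per instruction plus the bubbles that follow it.
    cycles_per_pair = 1 + bubbles_after_lw + 1 + bubbles_after_add
    return cycles_per_pair / 2
```

steady_state_cpi(False) gives 3.0 and steady_state_cpi(True) gives 1.5, matching parts a and b.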