Part IV
Data Path and Control
Slide 1
About This Presentation
This presentation is intended to support the use of the textbook
Computer Architecture: From Microprocessors to Supercomputers,
Oxford University Press, 2005, ISBN 0-19-515455-X. It is updated
regularly by the author as part of his teaching of the upper-division course ECE 154, Introduction to Computer Architecture,
at the University of California, Santa Barbara. Instructors can use
these slides freely in classroom teaching and for other
educational purposes. Any other use is strictly prohibited.
© Behrooz Parhami
Edition   Released    Revised     Revised     Revised     Revised
First     July 2003   July 2004   July 2005   Mar. 2006   Feb. 2007
Slide 2
A Few Words About Where We Are Headed
Performance = 1 / Execution time, simplified to 1 / CPU execution time
CPU execution time = Instructions × CPI / (Clock rate)
Performance = Clock rate / (Instructions × CPI)
• Define an instruction set; make it simple enough to require a small number of cycles and allow high clock rate, but not so simple that we need many instructions, even for very simple tasks (Chap 5-8)
• Design ALU for arithmetic & logic ops (Chap 9-12)
• Design hardware for CPI = 1; seek improvements with CPI > 1 (Chap 13-14)
• Try to achieve CPI = 1 with clock that is as high as that for CPI > 1 designs; is CPI < 1 feasible? (Chap 15-16)
• Design memory & I/O structures to support ultrahigh-speed CPUs (Chap 17-24)
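The equations above can be exercised numerically. The sketch below is a minimal illustration that assumes two hypothetical designs (CPI = 1 at 125 MHz and CPI = 4 at 500 MHz; neither comes from the slides) running the same program. It shows that quadrupling the clock rate gains nothing if CPI grows by the same factor.

```python
def cpu_time_s(instructions: int, cpi: float, clock_hz: float) -> float:
    """CPU execution time = Instructions x CPI / Clock rate, in seconds."""
    return instructions * cpi / clock_hz


def performance(instructions: int, cpi: float, clock_hz: float) -> float:
    """Performance = 1 / CPU execution time = Clock rate / (Instructions x CPI)."""
    return 1.0 / cpu_time_s(instructions, cpi, clock_hz)


# Hypothetical designs running the same 1,000,000-instruction program.
n = 1_000_000
perf_a = performance(n, cpi=1.0, clock_hz=125e6)   # CPI = 1 at 125 MHz
perf_b = performance(n, cpi=4.0, clock_hz=500e6)   # CPI = 4 at 500 MHz
print(cpu_time_s(n, 1.0, 125e6), perf_a / perf_b)
```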
Slide 3
IV Data Path and Control
Design a simple computer (MicroMIPS) to learn about:
• Data path – part of the CPU where data signals flow
• Control unit – guides data signals through data path
• Pipelining – a way of achieving greater performance
Topics in This Part
Chapter 13 Instruction Execution Steps
Chapter 14 Control Unit Synthesis
Chapter 15 Pipelined Data Paths
Chapter 16 Pipeline Performance Limits
Slide 4
13 Instruction Execution Steps
A simple computer executes instructions one at a time
• Fetches an instruction from the location pointed to by PC
• Interprets and executes the instruction, then repeats
Topics in This Chapter
13.1 A Small Set of Instructions
13.2 The Instruction Execution Unit
13.3 A Single-Cycle Data Path
13.4 Branching and Jumping
13.5 Deriving the Control Signals
13.6 Performance of the Single-Cycle Design
Slide 5
13.1 A Small Set of Instructions
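Field layout (bit positions 31 down to 0):
R   op (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | sh (10-6) | fn (5-0)
I   op (31-26) | rs (25-21) | rt (20-16) | imm (15-0)
J   op (31-26) | jta (25-0)
Field widths and roles:
op     6 bits   Opcode
rs     5 bits   Source 1 or base
rt     5 bits   Source 2 or dest’n
rd     5 bits   Destination
sh     5 bits   Unused
fn     6 bits   Opcode ext
imm   16 bits   Operand / Offset
jta   26 bits   Jump target address
inst  32 bits   Instruction (the whole word)
Fig. 13.1
MicroMIPS instruction formats and naming of the various fields.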
We will refer to this diagram later
Seven R-format ALU instructions (add, sub, slt, and, or, xor, nor)
Six I-format ALU instructions (lui, addi, slti, andi, ori, xori)
Two I-format memory access instructions (lw, sw)
Three I-format conditional branch instructions (bltz, beq, bne)
Four unconditional jump instructions (j, jr, jal, syscall)
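Extracting the fields of Fig. 13.1 from a 32-bit word is just shifting and masking. The following is a minimal sketch (the function name decode_fields is hypothetical); it assumes every field is extracted unconditionally and that the control logic decides which fields are meaningful for a given format. The sample word is the standard MIPS encoding of add $3,$1,$2, which uses the same op and fn values as MicroMIPS.

```python
def decode_fields(inst: int) -> dict:
    """Split a 32-bit instruction word into the fields of Fig. 13.1."""
    inst &= 0xFFFF_FFFF                # keep it to 32 bits
    return {
        "op":  (inst >> 26) & 0x3F,    # bits 31-26
        "rs":  (inst >> 21) & 0x1F,    # bits 25-21
        "rt":  (inst >> 16) & 0x1F,    # bits 20-16
        "rd":  (inst >> 11) & 0x1F,    # bits 15-11 (R format)
        "sh":  (inst >> 6) & 0x1F,     # bits 10-6 (unused in MicroMIPS)
        "fn":  inst & 0x3F,            # bits 5-0 (R format)
        "imm": inst & 0xFFFF,          # bits 15-0 (I format)
        "jta": inst & 0x3FF_FFFF,      # bits 25-0 (J format)
    }


f = decode_fields(0x00221820)          # add $3,$1,$2
print(f["op"], f["rs"], f["rt"], f["rd"], f["fn"])   # 0 1 2 3 32
```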
Slide 6
The MicroMIPS Instruction Set
Table 13.1
Class             Instruction               Usage             op   fn
Copy              Load upper immediate      lui  rt,imm       15
Arithmetic        Add                       add  rd,rs,rt     0    32
                  Subtract                  sub  rd,rs,rt     0    34
                  Set less than             slt  rd,rs,rt     0    42
                  Add immediate             addi rt,rs,imm    8
                  Set less than immediate   slti rt,rs,imm    10
Logic             AND                       and  rd,rs,rt     0    36
                  OR                        or   rd,rs,rt     0    37
                  XOR                       xor  rd,rs,rt     0    38
                  NOR                       nor  rd,rs,rt     0    39
                  AND immediate             andi rt,rs,imm    12
                  OR immediate              ori  rt,rs,imm    13
                  XOR immediate             xori rt,rs,imm    14
Memory access     Load word                 lw   rt,imm(rs)   35
                  Store word                sw   rt,imm(rs)   43
Control transfer  Jump                      j    L            2
                  Jump register             jr   rs           0    8
                  Branch less than 0        bltz rs,L         1
                  Branch equal              beq  rs,rt,L      4
                  Branch not equal          bne  rs,rt,L      5
                  Jump and link             jal  L            3
                  System call               syscall           0    12
Slide 7
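Going the other way, the op and fn columns of Table 13.1 are enough to assemble instruction words. This is a minimal sketch covering only the R-format and I-format ALU and memory instructions; the helper names encode_r and encode_i are hypothetical. The argument order follows the Usage column: rd,rs,rt for R format and rt,rs,imm for I format, so lw rt,imm(rs) is passed as rt, rs, imm.

```python
# fn codes (R format, op = 0) and op codes (I format) transcribed from Table 13.1
R_FN = {"add": 32, "sub": 34, "slt": 42, "and": 36, "or": 37, "xor": 38, "nor": 39}
I_OP = {"lui": 15, "addi": 8, "slti": 10, "andi": 12, "ori": 13, "xori": 14,
        "lw": 35, "sw": 43}


def encode_r(mnemonic: str, rd: int, rs: int, rt: int) -> int:
    """R format: op = 0, sh = 0 (unused), fn selects the operation."""
    return (rs << 21) | (rt << 16) | (rd << 11) | R_FN[mnemonic]


def encode_i(mnemonic: str, rt: int, rs: int, imm: int) -> int:
    """I format: a negative imm is stored as a 16-bit two's-complement value."""
    return (I_OP[mnemonic] << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


print(hex(encode_r("add", 3, 1, 2)))   # add $3,$1,$2  -> 0x221820
print(hex(encode_i("lw", 5, 4, 8)))    # lw $5,8($4)   -> 0x8c850008
```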
13.2 The Instruction Execution Unit
[Figure: the PC addresses the instruction cache; the instruction fields rs, rt, rd select register-file operands (rs) and (rt); the ALU, fed with (rs) and either (rt) or imm, serves the 12 arithmetic/logic instructions plus lui, lw, and sw; the data cache serves lw and sw; the next-address block uses jta (j, jal), (rs) (bltz, jr), (rs) and (rt) (beq, bne), and handles syscall; op and fn feed the control unit. 22 instructions in all.]
Fig. 13.2
Abstract view of the instruction execution unit for MicroMIPS.
For naming of instruction fields, see Fig. 13.1.
Slide 8
13.3 A Single-Cycle Data Path
[Figure: the single-cycle data path in five stages. Instruction fetch: PC, instruction cache, and next-address logic (Incr PC, jta, Br&Jump control). Register access / decode: register file read at rs and rt, written at rt, rd, or 31 as selected by RegDst, under RegWrite. ALU operation: second operand (rt) or the sign-extended 16-bit imm selected by ALUSrc, operation set by ALUFunc, Ovfl output. Data access: data cache under DataRead and DataWrite. Register writeback: Data out, ALU out, or IncrPC selected by RegInSrc.]
Fig. 13.3 Key elements of the single-cycle MicroMIPS data path.
Slide 9
An ALU for MicroMIPS
[Figure: a multifunction ALU. The 2-bit function class selects 00 Shift, 01 Set less, 10 Arithmetic, or 11 Logic. The shifter uses a constant amount or a variable amount (5 LSBs), chosen by Const′Var, and performs 00 No shift, 01 Logical left, 10 Logical right, or 11 Arith right; lui uses it. The adder computes x ± y under Add′Sub, and Set less yields 0 or 1 from the MSB of the adder result. The logic unit performs 00 AND, 01 OR, 10 XOR, or 11 NOR. A 32-input NOR produces Zero; the adder produces Ovfl.]
Fig. 10.19 A multifunction ALU with 8 control signals (2 for function class, 1 arithmetic, 3 shift, 2 logic) specifying the operation.
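The behavior selected by the ALU control signals can be modeled in software. This is a behavioral sketch, not a gate-level model, and it makes some assumptions: for function class 00 only the constant 16-bit left shift that lui needs is modeled (the figure's shifter also supports variable and right shifts), set-less is computed as a true signed comparison, and Zero is taken as an all-zero test of the selected result. The names alu and to_signed are hypothetical.

```python
MASK = 0xFFFF_FFFF  # 32-bit words


def to_signed(v: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return v - (1 << 32) if v & 0x8000_0000 else v


def alu(x: int, y: int, add_sub: int, logic_fn: int, fn_class: int):
    """Return (result, zero, ovfl) for 32-bit operands x and y.

    Control encodings as in Table 13.2:
      fn_class: 0b00 lui, 0b01 set less, 0b10 arithmetic, 0b11 logic
      add_sub:  0 add, 1 subtract
      logic_fn: 0b00 AND, 0b01 OR, 0b10 XOR, 0b11 NOR
    """
    x &= MASK
    y &= MASK
    s = ((x - y) if add_sub else (x + y)) & MASK    # adder output x +/- y
    sx, sy, ss = x >> 31, y >> 31, s >> 31
    # Signed overflow: the operands seen by the adder (y complemented when
    # subtracting) have equal signs, but the sum's sign differs from them.
    ovfl = int(sx == (sy ^ add_sub) and ss != sx)
    if fn_class == 0b00:        # lui: shift the immediate left by 16
        result = (y << 16) & MASK
    elif fn_class == 0b01:      # set less: 1 if x < y as signed numbers
        result = int(to_signed(x) < to_signed(y))
    elif fn_class == 0b10:      # add or subtract
        result = s
    else:                       # logic unit
        result = [x & y, x | y, x ^ y, ~(x | y) & MASK][logic_fn]
    # Zero flags an all-zero result; Ovfl is reported only for arithmetic.
    return result, int(result == 0), ovfl if fn_class == 0b10 else 0


print(alu(0x7FFF_FFFF, 1, 0, 0, 0b10))   # (2147483648, 0, 1): signed overflow
print(alu(-1, 1, 1, 0, 0b01))            # (1, 0, 0): -1 < 1
```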
Slide 10
13.4 Branching and Jumping
Update options for PC:
(PC)[31:2] + 1          Default option
(PC)[31:2] + 1 + imm    When instruction is branch and condition is met
(PC)[31:28] | jta       When instruction is j or jal
(rs)[31:2]              When the instruction is jr
SysCallAddr             Start address of an operating system routine
Lowest 2 bits of PC always 00
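[Figure: next-address logic. An adder with carry-in 1 forms IncrPC from (PC)[31:2], adding the sign-extended imm (30 bits) when the branch condition checker, driven by BrType, (rs), and (rt), asserts BrTrue. A 4-input multiplexer controlled by PCSrc selects NextPC from IncrPC (0), the 4 MSBs of PC concatenated with the 26-bit jta (1), the 30 MSBs of (rs) (2), or SysCallAddr (3).]
Fig. 13.4 Next-address logic for MicroMIPS (see top part of Fig. 13.3).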
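The PC update options and the next-address logic fit in a few lines of Python. This is a behavioral sketch under stated assumptions: the function and parameter names are hypothetical, BrType and PCSrc use the encodings of Table 13.2, and, following the update-options list, the 4 MSBs used for j and jal come from the current PC.

```python
MASK30 = 0x3FFF_FFFF  # 30-bit word addresses


def next_pc(pc, imm, jta, rs_val, rt_val, br_type, pc_src, syscall_addr):
    """Return the next PC (a byte address) using the options of Fig. 13.4.

    Selection is done on 30-bit word addresses, since the lowest two bits
    of PC are always 00. rs_val and rt_val are 32-bit register patterns.
    """
    word = (pc >> 2) & MASK30                               # (PC)[31:2]
    offset = imm - (1 << 16) if imm & 0x8000 else imm       # sign-extend imm
    # Branch condition checker, driven by BrType
    br_true = {0b00: False,                                 # no branch
               0b01: rs_val == rt_val,                      # beq
               0b10: rs_val != rt_val,                      # bne
               0b11: bool(rs_val & 0x8000_0000)}[br_type]   # bltz: sign bit set
    incr_pc = (word + 1 + (offset if br_true else 0)) & MASK30
    choices = {0b00: incr_pc,                               # IncrPC (+ imm if taken)
               0b01: (word & 0x3C00_0000) | jta,            # (PC)[31:28] | jta
               0b10: (rs_val >> 2) & MASK30,                # (rs)[31:2]
               0b11: (syscall_addr >> 2) & MASK30}          # SysCallAddr
    return choices[pc_src] << 2


print(hex(next_pc(0x0040_0000, 0, 0, 0, 0, 0b00, 0b00, 0)))       # 0x400004
print(hex(next_pc(0x0040_0000, 0xFFFF, 0, 7, 7, 0b01, 0b00, 0)))  # 0x400000 (beq taken, imm = -1)
```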
Slide 11
13.5 Deriving the Control Signals
Table 13.2 Control signals for the single-cycle MicroMIPS implementation.
Block       Control signal         0             1             2           3
Reg file    RegWrite               Don’t write   Write
            RegDst1, RegDst0       rt            rd            $31
            RegInSrc1, RegInSrc0   Data out      ALU out       IncrPC
ALU         ALUSrc                 (rt)          imm
            Add′Sub                Add           Subtract
            LogicFn1, LogicFn0     AND           OR            XOR         NOR
            FnClass1, FnClass0     lui           Set less      Arithmetic  Logic
Data cache  DataRead               Don’t read    Read
            DataWrite              Don’t write   Write
Next addr   BrType1, BrType0       No branch     beq           bne         bltz
            PCSrc1, PCSrc0         IncrPC        jta           (rs)        SysCallAddr
Slide 12
Table 13.3 Control signal settings (x = don’t care)
Instruction              op      fn      RegWrite RegDst RegInSrc ALUSrc Add′Sub LogicFn FnClass DataRead DataWrite BrType PCSrc
Load upper immediate     001111          1        00     01       1      x       xx      00      0        0         00     00
Add                      000000  100000  1        01     01       0      0       xx      10      0        0         00     00
Subtract                 000000  100010  1        01     01       0      1       xx      10      0        0         00     00
Set less than            000000  101010  1        01     01       0      1       xx      01      0        0         00     00
Add immediate            001000          1        00     01       1      0       xx      10      0        0         00     00
Set less than immediate  001010          1        00     01       1      1       xx      01      0        0         00     00
AND                      000000  100100  1        01     01       0      x       00      11      0        0         00     00
OR                       000000  100101  1        01     01       0      x       01      11      0        0         00     00
XOR                      000000  100110  1        01     01       0      x       10      11      0        0         00     00
NOR                      000000  100111  1        01     01       0      x       11      11      0        0         00     00
AND immediate            001100          1        00     01       1      x       00      11      0        0         00     00
OR immediate             001101          1        00     01       1      x       01      11      0        0         00     00
XOR immediate            001110          1        00     01       1      x       10      11      0        0         00     00
Load word                100011          1        00     00       1      0       xx      10      1        0         00     00
Store word               101011          0        xx     xx       1      0       xx      10      0        1         00     00
Jump                     000010          0        xx     xx       x      x       xx      xx      0        0         xx     01
Jump register            000000  001000  0        xx     xx       x      x       xx      xx      0        0         xx     10
Branch on less than 0    000001          0        xx     xx       x      x       xx      xx      0        0         11     00
Branch on equal          000100          0        xx     xx       x      x       xx      xx      0        0         01     00
Branch on not equal      000101          0        xx     xx       x      x       xx      xx      0        0         10     00
Jump and link            000011          1        10     10       x      x       xx      xx      0        0         xx     01
System call              000000  001100  0        xx     xx       x      x       xx      xx      0        0         xx     11
Slide 13
Control Signals in the Single-Cycle Data Path
[Figure: the single-cycle data path of Fig. 13.3 annotated with the control settings for two instructions. lui (op = 001111): RegDst = 00, ALUSrc = 1, ALUFunc (Add′Sub, LogicFn, FnClass) = x xx 00. slt (op = 000000, fn = 101010): RegDst = 01, ALUSrc = 0, ALUFunc = 1 xx 01. Both: RegWrite = 1, RegInSrc = 01, DataRead = DataWrite = 0, BrType = 00, PCSrc = 00.]
Fig. 13.3 Key elements of the single-cycle MicroMIPS data path.
Slide 14
Instruction Decoding
[Figure: a 6-to-64 op decoder, driven by op, produces RtypeInst (0), bltzInst (1), jInst (2), jalInst (3), beqInst (4), bneInst (5), addiInst (8), sltiInst (10), andiInst (12), oriInst (13), xoriInst (14), luiInst (15), lwInst (35), and swInst (43). A 6-to-64 fn decoder, driven by fn and enabled by RtypeInst, produces jrInst (8), syscallInst (12), addInst (32), subInst (34), andInst (36), orInst (37), xorInst (38), norInst (39), and sltInst (42).]
Fig. 13.5 Instruction decoder for MicroMIPS built of two 6-to-64 decoders.
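The two-decoder arrangement can be sketched as one-hot lists. This minimal model assumes, as the figure suggests, that the fn decoder is enabled only by RtypeInst; the names decoder_6_to_64 and decode are hypothetical.

```python
# op and fn codes of the named decoder outputs in Fig. 13.5
OP_OUTPUTS = {0: "Rtype", 1: "bltz", 2: "j", 3: "jal", 4: "beq", 5: "bne",
              8: "addi", 10: "slti", 12: "andi", 13: "ori", 14: "xori",
              15: "lui", 35: "lw", 43: "sw"}
FN_OUTPUTS = {8: "jr", 12: "syscall", 32: "add", 34: "sub", 36: "and",
              37: "or", 38: "xor", 39: "nor", 42: "slt"}


def decoder_6_to_64(value: int, enable: bool = True) -> list:
    """One-hot 6-to-64 decoder: at most one of the 64 outputs is True."""
    outputs = [False] * 64
    if enable:
        outputs[value & 0x3F] = True
    return outputs


def decode(op: int, fn: int) -> dict:
    """Return the named decoder outputs (RtypeInst, addInst, ...) as booleans."""
    op_out = decoder_6_to_64(op)
    fn_out = decoder_6_to_64(fn, enable=op_out[0])   # assumed: RtypeInst enables fn decoder
    signals = {name + "Inst": op_out[code] for code, name in OP_OUTPUTS.items()}
    signals.update({name + "Inst": fn_out[code] for code, name in FN_OUTPUTS.items()})
    return signals


active = [name for name, on in decode(0b000000, 0b101010).items() if on]
print(active)   # ['RtypeInst', 'sltInst']
```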
Slide 15
Control Signal Generation
Auxiliary signals identifying instruction classes
arithInst = addInst ∨ subInst ∨ sltInst ∨ addiInst ∨ sltiInst
logicInst = andInst ∨ orInst ∨ xorInst ∨ norInst ∨ andiInst ∨ oriInst ∨ xoriInst
immInst = luiInst ∨ addiInst ∨ sltiInst ∨ andiInst ∨ oriInst ∨ xoriInst
Example logic expressions for control signals
RegWrite = luiInst ∨ arithInst ∨ logicInst ∨ lwInst ∨ jalInst
ALUSrc = immInst ∨ lwInst ∨ swInst
Add′Sub = subInst ∨ sltInst ∨ sltiInst
DataRead = lwInst
PCSrc0 = jInst ∨ jalInst ∨ syscallInst
[Figure: decoder outputs (addInst, subInst, …, jInst, …, sltInst) feeding the Control block.]
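The class signals and control expressions can be checked against Table 13.3 with a small sketch. It assumes exactly one decoder output is active and represents it by a mnemonic string; the name control_signals is hypothetical, AddSub stands for Add′Sub, and only the five example signals are covered.

```python
ARITH = {"add", "sub", "slt", "addi", "slti"}
LOGIC = {"and", "or", "xor", "nor", "andi", "ori", "xori"}
IMM = {"lui", "addi", "slti", "andi", "ori", "xori"}


def control_signals(inst: str) -> dict:
    """Evaluate the example control expressions for one decoded instruction.

    `inst` is the mnemonic whose decoder output is active; for example,
    "add" stands for addInst = 1 with every other decoder output 0.
    """
    arith_inst = inst in ARITH
    logic_inst = inst in LOGIC
    imm_inst = inst in IMM
    return {
        "RegWrite": inst in {"lui", "lw", "jal"} or arith_inst or logic_inst,
        "ALUSrc": imm_inst or inst in {"lw", "sw"},
        "AddSub": inst in {"sub", "slt", "slti"},
        "DataRead": inst == "lw",
        "PCSrc0": inst in {"j", "jal", "syscall"},
    }


print(control_signals("sw"))
```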
Slide 16
Putting It All Together
[Figure: the complete single-cycle MicroMIPS, assembled from the next-address logic (Fig. 13.4), the multifunction ALU (Fig. 10.19), and the data path (Fig. 13.3). The control unit, driven by the decoded instruction signals (addInst, subInst, jInst, …, sltInst), supplies Br&Jump, RegDst, RegWrite, ALUSrc, ALUFunc, DataRead, RegInSrc, and DataWrite.]
Slide 17
13.6 Performance of the Single-Cycle Design
An example combinational-logic data path to compute z := (u + v)(w − x) / y
[Figure: two add/subtract units compute u + v and w − x in parallel (latency 2 ns each); a multiplier (latency 6 ns) forms their product; a divider (latency 15 ns) divides it by y to give z. Total latency 23 ns.]
Note that the divider gets its correct inputs after ≅ 9 ns (1 ns register readout + 2 ns add/subtract + 6 ns multiply), but this won’t cause a problem if we allow enough total time.
Beginning with inputs u, v, w, x, and y stored in registers, the entire computation can be completed in ≅ 25 ns, allowing 1 ns each for register readout and write.
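The timing argument is a longest-path (critical-path) calculation over the blocks of the data path. In this minimal sketch the block names are hypothetical labels for the figure's units; each block's output settles at its own latency plus the latest arrival among its inputs. It reproduces the 9 ns at which the divider's inputs become correct, the 23 ns logic latency, and the ≅ 25 ns total.

```python
# Each block: (latency in ns, blocks whose outputs it needs)
BLOCKS = {
    "read": (1, []),                # register readout of u, v, w, x, y
    "add": (2, ["read"]),           # u + v
    "sub": (2, ["read"]),           # w - x, in parallel with u + v
    "mul": (6, ["add", "sub"]),     # (u + v)(w - x)
    "div": (15, ["mul", "read"]),   # divide by y, which comes straight from a register
    "write": (1, ["div"]),          # register write of z
}


def ready_time(block: str) -> int:
    """Time (ns) at which a block's output settles: its own latency
    plus the latest arrival among its inputs (longest-path rule)."""
    latency, inputs = BLOCKS[block]
    return latency + max((ready_time(b) for b in inputs), default=0)


print(ready_time("mul"), ready_time("write"))   # 9 25
```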
Slide 18
Performance Estimation for Single-Cycle MicroMIPS
Instruction access    2 ns
Register read         1 ns
ALU operation         2 ns
Data cache access     2 ns
Register write        1 ns
Total                 8 ns
Single-cycle clock = 125 MHz
Instruction class   Frequency   Latency
R-type              44%         6 ns
Load                24%         8 ns
Store               12%         7 ns
Branch              18%         5 ns
Jump                2%          3 ns
Weighted mean ≅ 6.36 ns
[Figure: the five instruction classes traced through the unfolded data path (instruction access, register read, ALU, data cache, register write): ALU-type, Load, Store, Branch (and jr), and Jump (except jr & jal); stages that a class does not use are marked "Not used".]
Fig. 13.6 The MicroMIPS data path unfolded (by depicting the register write step as a separate block) so as to better visualize the critical-path latencies.
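The per-class latencies and the weighted mean follow from summing the step latencies each class uses. In this sketch the step lists are an assumption read off Fig. 13.6 (branches are charged an ALU-length step for the condition check, and jumps only the fetch and the 1 ns register access/decode step); they reproduce the 6, 8, 7, 5, and 3 ns figures and the ≅ 6.36 ns mean. The single-cycle clock must accommodate the slowest class, which gives the 8 ns, 125 MHz clock.

```python
STEP_NS = {"fetch": 2, "reg": 1, "alu": 2, "data": 2, "write": 1}
CLASS_STEPS = {
    "R-type": ["fetch", "reg", "alu", "write"],
    "Load":   ["fetch", "reg", "alu", "data", "write"],
    "Store":  ["fetch", "reg", "alu", "data"],
    "Branch": ["fetch", "reg", "alu"],   # condition check charged as an ALU step
    "Jump":   ["fetch", "reg"],          # assumed: fetch plus register access/decode
}
MIX = {"R-type": 0.44, "Load": 0.24, "Store": 0.12, "Branch": 0.18, "Jump": 0.02}

latency = {c: sum(STEP_NS[s] for s in steps) for c, steps in CLASS_STEPS.items()}
clock_ns = max(latency.values())                  # single cycle must fit the slowest class
mean_ns = sum(MIX[c] * latency[c] for c in MIX)   # average latency over the instruction mix
print(latency)
print(f"clock = {clock_ns} ns ({1000 / clock_ns:.0f} MHz), weighted mean = {mean_ns:.2f} ns")
```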
Slide 19
How Good is Our Single-Cycle Design?
Clock rate of 125 MHz not impressive
How does this compare with
current processors on the market?
Not bad, where latency is concerned
A 2.5 GHz processor with 20 or so pipeline stages has a latency of about
0.4 ns/cycle × 20 cycles = 8 ns
Throughput, however, is much better for the pipelined processor:
Up to 20 times better with single issue
Perhaps up to 100 times better with multiple issue
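The latency and throughput comparison is simple arithmetic. The sketch below restates it, assuming an ideal single-issue pipeline that completes one instruction every cycle in steady state and ignoring stalls.

```python
clock_ghz = 2.5
stages = 20
cycle_ns = 1 / clock_ghz                  # 0.4 ns per cycle
pipelined_latency_ns = cycle_ns * stages  # one instruction still takes about 8 ns
single_cycle_ns = 8.0                     # MicroMIPS: one instruction per 8 ns cycle
# With single issue, the pipeline finishes one instruction every cycle.
throughput_ratio = single_cycle_ns / cycle_ns
print(f"latency = {pipelined_latency_ns:.1f} ns, throughput ratio = {throughput_ratio:.0f}")
```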
Slide 20
14 Control Unit Synthesis
The control unit for the single-cycle design is memoryless
• Problematic when instructions vary greatly in complexity
• Multiple cycles needed when resources must be reused
Topics in This Chapter
14.1 A Multicycle Implementation
14.2 Choosing the Clock Cycle
14.3 The Control State Machine
14.4 Performance of the Multicycle Design
14.5 Microprogramming
14.6 Exception Handling
Slide 21
14.1 A Multicycle Implementation
[Figure: top, single-cycle execution of Instr 1 through Instr 4, each allotted one long clock cycle whether or not it needs all of that time; bottom, multicycle execution of the same instructions in 3, 5, 3, and 4 short clock cycles, with the time saved marked.]
Fig. 14.1 Single-cycle versus multicycle instruction execution.
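The comparison in Fig. 14.1 can be put in numbers. This sketch assumes, as the figure suggests, that the single-cycle clock period equals the time of the slowest instruction, here 5 of the shorter multicycle clock periods; the cycle counts 3, 5, 3, and 4 are those shown for Instr 1 through Instr 4.

```python
cycles = [3, 5, 3, 4]                  # multicycle clock periods for Instr 1..4
single_cycle_period = max(cycles)      # assumed: one long cycle fits the slowest instruction
single_total = single_cycle_period * len(cycles)
multi_total = sum(cycles)
print(single_total, multi_total, single_total - multi_total)   # 20 15 5
```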
Slide 22
A Multicycle Data Path
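[Figure: the PC addresses a single combined cache (Address, Data); the fetched word goes to the Inst Reg, whose op and fn fields feed Control and whose rs, rt, rd, imm, and jta fields feed the register file and next-address path. The register file's (rs) and (rt) are held in the x Reg and y Reg, the ALU result in the z Reg, and data read from the cache in the Data Reg.]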
Fig. 14.2
Abstract view of a multicycle instruction execution unit for
MicroMIPS. For naming of instruction fields, see Fig. 13.1.
Slide 23
Multicycle Data Path with Control Signals Shown
Three major changes relative to the single-cycle data path:
1. Instruction & data caches combined
2. ALU performs double duty for address calculation
3. Registers added for intercycle data
[Figure: the multicycle data path with control signals PCWrite, Inst′Data, MemRead, MemWrite, IRWrite, RegDst, RegWrite, RegInSrc, ALUSrcX (x Mux), ALUSrcY (y Mux with inputs 4, y Reg, imm, and 4 × imm), ALUFunc, JumpAddr (jta or SysCallAddr), and PCSrc; the ALU reports ALUZero and ALUOvfl.]
Fig. 14.3 Key elements of the multicycle MicroMIPS data path.
Slide 24
14.2 Clock Cycle and Control Signals
Table 14.1
Block            Control signal         0             1             2           3
Program counter  JumpAddr               jta           SysCallAddr
                 PCSrc1, PCSrc0         Jump addr     x reg         z reg       ALU out
                 PCWrite                Don’t write   Write
Cache            Inst′Data              PC            z reg
                 MemRead                Don’t read    Read
                 MemWrite               Don’t write   Write
                 IRWrite                Don’t write   Write
Register file    RegWrite               Don’t write   Write
                 RegDst1, RegDst0       rt            rd            $31
                 RegInSrc1, RegInSrc0   Data reg      z reg         PC
ALU              ALUSrcX                PC            x reg
                 ALUSrcY1, ALUSrcY0     4             y reg         imm         4 × imm
                 Add′Sub                Add           Subtract
                 LogicFn1, LogicFn0     AND           OR            XOR         NOR
                 FnClass1, FnClass0     lui           Set less      Arithmetic  Logic
Slide 25