Data Path and Control (Digital Engineering slides)

Part IV
Data Path and Control

Slide 1


About This Presentation
This presentation is intended to support the use of the textbook
Computer Architecture: From Microprocessors to Supercomputers,
Oxford University Press, 2005, ISBN 0-19-515455-X. It is updated
regularly by the author as part of his teaching of the upper-division course ECE 154, Introduction to Computer Architecture,
at the University of California, Santa Barbara. Instructors can use
these slides freely in classroom teaching and for other
educational purposes. Any other use is strictly prohibited. ©
Behrooz Parhami
Edition   Released    Revised     Revised     Revised     Revised
First     July 2003   July 2004   July 2005   Mar. 2006   Feb. 2007

Slide 2


A Few Words About Where We Are Headed
Performance = 1 / Execution time, simplified to 1 / CPU execution time
CPU execution time = Instructions × CPI / (Clock rate)
Performance = Clock rate / (Instructions × CPI)

• Define an instruction set; make it simple enough to require a small number of cycles and allow high clock rate, but not so simple that we need many instructions, even for very simple tasks (Chap 5-8)
• Design ALU for arithmetic & logic ops (Chap 9-12)
• Design hardware for CPI = 1; seek improvements with CPI > 1 (Chap 13-14)
• Try to achieve CPI = 1 with clock that is as high as that for CPI > 1 designs; is CPI < 1 feasible? (Chap 15-16)
• Design memory & I/O structures to support ultrahigh-speed CPUs (Chap 17-24)
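The performance relations on this slide can be checked numerically. The sketch below is only an illustration: the function names and the sample workload (one million instructions, CPI of 2, 100 MHz clock) are made-up assumptions, not figures from the textbook.

```python
def cpu_execution_time(instructions, cpi, clock_rate_hz):
    """CPU execution time = Instructions x CPI / Clock rate (in seconds)."""
    return instructions * cpi / clock_rate_hz


def performance(instructions, cpi, clock_rate_hz):
    """Performance = 1 / CPU execution time = Clock rate / (Instructions x CPI)."""
    return clock_rate_hz / (instructions * cpi)


if __name__ == "__main__":
    # Hypothetical workload, chosen only to exercise the formulas.
    t = cpu_execution_time(1_000_000, 2, 100e6)
    print(f"execution time = {t} s, performance = {performance(1_000_000, 2, 100e6)} runs/s")
```

Halving the CPI or doubling the clock rate each double the performance, which is why the slide treats instruction count, CPI, and clock rate as the three design levers.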
Slide 3


IV Data Path and Control
Design a simple computer (MicroMIPS) to learn about:
• Data path – part of the CPU where data signals flow
• Control unit – guides data signals through data path
• Pipelining – a way of achieving greater performance
Topics in This Part
Chapter 13 Instruction Execution Steps
Chapter 14 Control Unit Synthesis
Chapter 15 Pipelined Data Paths

Chapter 16 Pipeline Performance Limits
Slide 4


13 Instruction Execution Steps
A simple computer executes instructions one at a time
• Fetches an instruction from the location pointed to by the PC
• Interprets and executes the instruction, then repeats
Topics in This Chapter
13.1 A Small Set of Instructions
13.2 The Instruction Execution Unit
13.3 A Single-Cycle Data Path
13.4 Branching and Jumping
13.5 Deriving the Control Signals
13.6 Performance of the Single-Cycle Design
Slide 5


13.1 A Small Set of Instructions
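Format  Fields (bit positions)
R       op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | sh [10:6] | fn [5:0]
I       op [31:26] | rs [25:21] | rt [20:16] | imm [15:0]
J       op [31:26] | jta [25:0]

Field  Width    Meaning
op     6 bits   Opcode
rs     5 bits   Source 1 or base
rt     5 bits   Source 2 or dest'n
rd     5 bits   Destination
sh     5 bits   Unused
fn     6 bits   Opcode ext
imm    16 bits  Operand / Offset
jta    26 bits  Jump target address
inst   32 bits  Instruction

Fig. 13.1 MicroMIPS instruction formats and naming of the various fields.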

We will refer to this diagram later
Seven R-format ALU instructions (add, sub, slt, and, or, xor, nor)
Six I-format ALU instructions (lui, addi, slti, andi, ori, xori)

Two I-format memory access instructions (lw, sw)
Three I-format conditional branch instructions (bltz, beq, bne)
Four unconditional jump instructions (j, jr, jal, syscall)
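As a rough illustration of the field naming in Fig. 13.1, the following sketch splits a 32-bit instruction word into its fields using the bit boundaries given in the figure. The function name `decode_fields` and the sample instruction word are illustrative assumptions; they do not come from the slides.

```python
def decode_fields(inst):
    """Split a 32-bit MicroMIPS instruction word into the fields of Fig. 13.1."""
    inst &= 0xFFFFFFFF
    return {
        "op": (inst >> 26) & 0x3F,    # bits 31-26, opcode
        "rs": (inst >> 21) & 0x1F,    # bits 25-21, source 1 or base
        "rt": (inst >> 16) & 0x1F,    # bits 20-16, source 2 or destination
        "rd": (inst >> 11) & 0x1F,    # bits 15-11, destination (R format)
        "sh": (inst >> 6) & 0x1F,     # bits 10-6, unused in MicroMIPS
        "fn": inst & 0x3F,            # bits 5-0, opcode extension (R format)
        "imm": inst & 0xFFFF,         # bits 15-0, operand / offset (I format)
        "jta": inst & 0x3FFFFFF,      # bits 25-0, jump target address (J format)
    }


if __name__ == "__main__":
    # Sample word assembled by hand: op = 0, rs = 1, rt = 2, rd = 3, sh = 0, fn = 32
    print(decode_fields(0x00221820))
```

All fields are extracted regardless of format; which ones are meaningful depends on whether the opcode selects an R-, I-, or J-format instruction.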
Slide 6


The MicroMIPS Instruction Set

Table 13.1

Class             Instruction                Usage             op   fn
Copy              Load upper immediate       lui  rt,imm       15
Arithmetic        Add                        add  rd,rs,rt      0   32
                  Subtract                   sub  rd,rs,rt      0   34
                  Set less than              slt  rd,rs,rt      0   42
                  Add immediate              addi rt,rs,imm     8
                  Set less than immediate    slti rt,rs,imm    10
Logic             AND                        and  rd,rs,rt      0   36
                  OR                         or   rd,rs,rt      0   37
                  XOR                        xor  rd,rs,rt      0   38
                  NOR                        nor  rd,rs,rt      0   39
                  AND immediate              andi rt,rs,imm    12
                  OR immediate               ori  rt,rs,imm    13
                  XOR immediate              xori rt,rs,imm    14
Memory access     Load word                  lw   rt,imm(rs)   35
                  Store word                 sw   rt,imm(rs)   43
Control transfer  Jump                       j    L             2
                  Jump register              jr   rs            0    8
                  Branch less than 0         bltz rs,L          1
                  Branch equal               beq  rs,rt,L       4
                  Branch not equal           bne  rs,rt,L       5
                  Jump and link              jal  L             3
                  System call                syscall            0   12
Slide 7


13.2 The Instruction Execution Unit
[Figure: blocks PC, Instr cache, Reg file, ALU, Data cache, Next addr, and Control. The instruction (inst) supplies rs, rt, rd to the register file, imm to the ALU input, op and fn to Control, and jta to the next-address logic. The register file supplies (rs) and (rt); the ALU output serves as the data cache Address. Annotations: 12 A/L, lui, lw, sw use the ALU; beq, bne, bltz, jr, j, jal, syscall use the next-address logic; 22 instructions in all.]

Fig. 13.2
Abstract view of the instruction execution unit for MicroMIPS.
For naming of instruction fields, see Fig. 13.1.
Slide 8


13.3 A Single-Cycle Data Path
[Figure: data path organized in five stages: Instruction fetch (PC, Instr cache, Next addr), Reg access / decode (Reg file), ALU operation (ALU), Data access (Data cache), and Register writeback. Control signals shown: Br&Jump, RegDst, RegWrite, ALUSrc, ALUFunc, DataRead, DataWrite, RegInSrc. Status output: ALUOvfl. The 16-bit imm field is sign-extended (SE) to 32 bits before reaching the ALU input multiplexer.]

Fig. 13.3 Key elements of the single-cycle MicroMIPS data path.
Slide 9


An ALU for MicroMIPS

[Figure: Shifter, Adder, and Logic unit operating on 32-bit inputs x and y, with a 4-way output multiplexer producing s, plus Zero (32-input NOR) and Ovfl outputs.]

Function class   00 Shift (lui)   01 Set less       10 Arithmetic      11 Logic
Shift function   00 No shift      01 Logical left   10 Logical right   11 Arith right
Logic function   00 AND           01 OR             10 XOR             11 NOR
Const′Var        0 Constant amount                  1 Variable amount (5 LSBs)
Add′Sub          0 Add                              1 Subtract

Fig. 10.19 A multifunction ALU with 8 control signals (2 for function class,
1 arithmetic, 3 shift, 2 logic) specifying the operation.
Slide 10


13.4 Branching and Jumping
Update options for PC

(PC)31:2 + 1          Default option
(PC)31:2 + 1 + imm    When instruction is branch and condition is met
(PC)31:28 | jta       When instruction is j or jal
(rs)31:2              When the instruction is jr
SysCallAddr           Start address of an operating system routine

Lowest 2 bits of PC always 00

[Figure: a 30-bit adder computes IncrPC from (PC)31:2, with the sign-extended (SE) imm added when BrTrue is asserted; the branch condition checker derives BrTrue from (rs), (rt), and BrType; a 4-way multiplexer controlled by PCSrc selects NextPC from IncrPC, the 4 MSBs of the PC joined with the 26-bit jta, the 30 MSBs of (rs), and SysCallAddr.]

Fig. 13.4 Next-address logic for MicroMIPS (see top part of Fig. 13.3).
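The PC update options listed for Fig. 13.4 can be sketched in software as follows. This is a behavioral illustration only, working on the 30-bit word address (PC)31:2; the function names and the `syscall_addr_word` parameter are hypothetical, and the PCSrc encoding (0 = IncrPC, 1 = jta, 2 = (rs), 3 = SysCallAddr) is taken from Table 13.2.

```python
MASK30 = (1 << 30) - 1


def sign_extend16(imm):
    """Interpret a 16-bit field as a signed integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc_word(pc_word, pc_src, br_true=False, imm=0, jta=0, rs=0,
                 syscall_addr_word=0):
    """Return the next value of (PC)31:2 for the four PCSrc settings."""
    if pc_src == 0:
        # IncrPC: (PC)31:2 + 1, plus imm when a branch is taken
        offset = sign_extend16(imm) if br_true else 0
        return (pc_word + 1 + offset) & MASK30
    if pc_src == 1:
        # j or jal: 4 MSBs of the PC joined with the 26-bit jta
        return ((pc_word >> 26) << 26) | (jta & 0x3FFFFFF)
    if pc_src == 2:
        # jr: (rs)31:2
        return (rs >> 2) & MASK30
    # syscall: start address of an operating system routine (hypothetical value)
    return syscall_addr_word & MASK30


if __name__ == "__main__":
    print(next_pc_word(100, 0))                           # sequential execution
    print(next_pc_word(100, 0, br_true=True, imm=0xFFFF)) # branch back by one word
```

Because the two lowest PC bits are always 00, the full byte address is obtained by multiplying the returned word address by 4.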
Slide 11


13.5 Deriving the Control Signals
Table 13.2 Control signals for the single-cycle MicroMIPS implementation.

Block       Control signal        0            1          2           3
Reg file    RegWrite              Don't write  Write
            RegDst1, RegDst0      rt           rd         $31
            RegInSrc1, RegInSrc0  Data out     ALU out    IncrPC
ALU         ALUSrc                (rt)         imm
            Add′Sub               Add          Subtract
            LogicFn1, LogicFn0    AND          OR         XOR         NOR
            FnClass1, FnClass0    lui          Set less   Arithmetic  Logic
Data cache  DataRead              Don't read   Read
            DataWrite             Don't write  Write
Next addr   BrType1, BrType0      No branch    beq        bne         bltz
            PCSrc1, PCSrc0        IncrPC       jta        (rs)        SysCallAddr
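The ALU encodings in Table 13.2 can be exercised with a small behavioral model. The sketch below is an assumption-laden illustration rather than the hardware of Fig. 10.19: the function names are hypothetical, the lui class is modeled as a left shift of y by 16 bits, and set-less is modeled as a direct signed comparison.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu(x, y, add_sub=0, logic_fn=0, fn_class=0):
    """Behavioral model using the FnClass, Add'Sub, and LogicFn codes of Table 13.2."""
    x &= MASK32
    y &= MASK32
    if fn_class == 0b00:                    # lui: place y in the upper half
        return (y << 16) & MASK32
    if fn_class == 0b01:                    # set less: 1 if x < y (signed)
        return 1 if to_signed(x) < to_signed(y) else 0
    if fn_class == 0b10:                    # arithmetic: add (0) or subtract (1)
        return (x - y if add_sub else x + y) & MASK32
    results = [x & y, x | y, x ^ y, ~(x | y)]   # logic: AND, OR, XOR, NOR
    return results[logic_fn] & MASK32


if __name__ == "__main__":
    print(hex(alu(0, 0x1234, fn_class=0b00)))           # lui
    print(alu(5, 3, add_sub=1, fn_class=0b10))          # 5 - 3
    print(hex(alu(0xF0, 0x0F, logic_fn=3, fn_class=0b11)))  # NOR
```

The y operand is whatever ALUSrc selects, (rt) or imm, so the same model covers both the R-format and the immediate instructions.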
Slide 12


Table 13.3 Control Signal Settings (x = don't care)

Instruction               op      fn      RegWrite  RegDst  RegInSrc  ALUSrc  Add′Sub  LogicFn  FnClass  DataRead  DataWrite  BrType  PCSrc
Load upper immediate      001111          1         00      01        1       x        xx       00       0         0          00      00
Add                       000000  100000  1         01      01        0       0        xx       10       0         0          00      00
Subtract                  000000  100010  1         01      01        0       1        xx       10       0         0          00      00
Set less than             000000  101010  1         01      01        0       1        xx       01       0         0          00      00
Add immediate             001000          1         00      01        1       0        xx       10       0         0          00      00
Set less than immediate   001010          1         00      01        1       1        xx       01       0         0          00      00
AND                       000000  100100  1         01      01        0       x        00       11       0         0          00      00
OR                        000000  100101  1         01      01        0       x        01       11       0         0          00      00
XOR                       000000  100110  1         01      01        0       x        10       11       0         0          00      00
NOR                       000000  100111  1         01      01        0       x        11       11       0         0          00      00
AND immediate             001100          1         00      01        1       x        00       11       0         0          00      00
OR immediate              001101          1         00      01        1       x        01       11       0         0          00      00
XOR immediate             001110          1         00      01        1       x        10       11       0         0          00      00
Load word                 100011          1         00      00        1       0        xx       10       1         0          00      00
Store word                101011          0         xx      xx        1       0        xx       10       0         1          00      00
Jump                      000010          0         xx      xx        x       x        xx       xx       0         0          00      01
Jump register             000000  001000  0         xx      xx        x       x        xx       xx       0         0          00      10
Branch on less than 0     000001          0         xx      xx        x       x        xx       xx       0         0          11      00
Branch on equal           000100          0         xx      xx        x       x        xx       xx       0         0          01      00
Branch on not equal       000101          0         xx      xx        x       x        xx       xx       0         0          10      00
Jump and link             000011          1         10      10        x       x        xx       xx       0         0          00      01
System call               000000  001100  0         xx      xx        x       x        xx       xx       0         0          00      11

Slide 13



Control Signals in the Single-Cycle Data Path
[Figure: the data path of Fig. 13.3, annotated with the control signal values for two example instructions, lui and slt (x = don't care).]

Signal      lui        slt
op          001111     000000
fn                     101010
BrType      00         00
PCSrc       00         00
RegDst      00         01
RegWrite    1          1
ALUSrc      1          0
ALUFunc     x xx 00    1 xx 01    (Add′Sub LogicFn FnClass)
DataRead    0          0
DataWrite   0          0
RegInSrc    01         01

Fig. 13.3 Key elements of the single-cycle MicroMIPS data path.
Slide 14


Instruction Decoding

op Decoder (6-bit op input, outputs 0 to 63)
 0  RtypeInst
 1  bltzInst
 2  jInst
 3  jalInst
 4  beqInst
 5  bneInst
 8  addiInst
10  sltiInst
12  andiInst
13  oriInst
14  xoriInst
15  luiInst
35  lwInst
43  swInst

fn Decoder (6-bit fn input, outputs 0 to 63)
 8  jrInst
12  syscallInst
32  addInst
34  subInst
36  andInst
37  orInst
38  xorInst
39  norInst
42  sltInst

Fig. 13.5 Instruction decoder for MicroMIPS built of two 6-to-64 decoders.
Slide 15


Control Signal Generation
Auxiliary signals identifying instruction classes
arithInst = addInst ∨ subInst ∨ sltInst ∨ addiInst ∨ sltiInst
logicInst = andInst ∨ orInst ∨ xorInst ∨ norInst ∨ andiInst ∨ oriInst ∨ xoriInst
immInst = luiInst ∨ addiInst ∨ sltiInst ∨ andiInst ∨ oriInst ∨ xoriInst

Example logic expressions for control signals
RegWrite = luiInst ∨ arithInst ∨ logicInst ∨ lwInst ∨ jalInst
ALUSrc = immInst ∨ lwInst ∨ swInst
Add′Sub = subInst ∨ sltInst ∨ sltiInst
DataRead = lwInst
PCSrc0 = jInst ∨ jalInst ∨ syscallInst

[Figure: the Control block takes the decoded instruction signals (addInst, subInst, jInst, ..., sltInst) as inputs and produces the control signals.]
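The auxiliary signals and example expressions above translate directly into Boolean code. In this sketch, the function name `control_signals` and the use of mnemonic strings in place of one-hot decoder outputs are illustrative assumptions; only the five control signals whose expressions appear on the slide are derived.

```python
def control_signals(mnemonic):
    """Evaluate the slide's example logic expressions for one instruction."""
    def inst(*names):
        # Stands in for the OR of the corresponding decoder outputs (xxxInst).
        return mnemonic in names

    arith_inst = inst("add", "sub", "slt", "addi", "slti")
    logic_inst = inst("and", "or", "xor", "nor", "andi", "ori", "xori")
    imm_inst = inst("lui", "addi", "slti", "andi", "ori", "xori")

    return {
        "RegWrite": inst("lui") or arith_inst or logic_inst or inst("lw", "jal"),
        "ALUSrc": imm_inst or inst("lw", "sw"),
        "AddSub": inst("sub", "slt", "slti"),     # Add'Sub: 1 means subtract
        "DataRead": inst("lw"),
        "PCSrc0": inst("j", "jal", "syscall"),
    }


if __name__ == "__main__":
    for name in ("lui", "slt", "lw", "sw", "jal"):
        print(name, control_signals(name))
```

The results can be compared row by row against Table 13.3; for example, sw writes no register but still selects imm as the second ALU operand.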
Slide 16


Putting It All Together
[Figure: the complete single-cycle design, combining the next-address logic (Fig. 13.4), the ALU (Fig. 10.19), the data path (Fig. 13.3), and the Control block that derives the control signals from the decoded instruction signals.]
Slide 17


13.6 Performance of the Single-Cycle Design
An example combinational-logic data path to compute z := (u + v)(w – x) / y

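Add/Sub latency    2 ns
Multiply latency   6 ns
Divide latency    15 ns
Total latency     23 ns

[Figure: u and v feed an adder while w and x feed a subtractor; the two results feed the multiplier (×), whose product, together with y, feeds the divider (/) that produces z.]

Note that the divider gets its correct inputs after ≅ 9 ns,
but this won't cause a problem if we allow enough total time
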

Beginning with inputs u, v, w, x, and y
stored in registers, the entire computation
can be completed in ≅ 25 ns, allowing 1
ns each for register readout and write
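The latency figures in this example follow from adding up the delays along the longest path. The sketch below reproduces that arithmetic; the function and constant names are illustrative assumptions, while the delay values are the ones given on the slide.

```python
ADD_SUB_NS = 2    # add/subtract latency
MULTIPLY_NS = 6   # multiply latency
DIVIDE_NS = 15    # divide latency
REGISTER_NS = 1   # register readout or register write


def logic_latency_ns():
    """Critical path: (u + v) and (w - x) run in parallel, then multiply, then divide."""
    return ADD_SUB_NS + MULTIPLY_NS + DIVIDE_NS


def divider_inputs_ready_ns():
    """Time at which the divider sees correct inputs, counting register readout."""
    return REGISTER_NS + ADD_SUB_NS + MULTIPLY_NS


def total_time_ns():
    """Register readout, combinational logic, then register write."""
    return REGISTER_NS + logic_latency_ns() + REGISTER_NS


if __name__ == "__main__":
    print(logic_latency_ns(), divider_inputs_ready_ns(), total_time_ns())
```

The adder and subtractor contribute only one 2 ns term because they operate in parallel; only the longest chain of dependent operations sets the clock period.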
Slide 18



Performance Estimation for Single-Cycle MicroMIPS
Instruction access   2 ns
Register read        1 ns
ALU operation        2 ns
Data cache access    2 ns
Register write       1 ns
Total                8 ns
Single-cycle clock = 125 MHz

R-type   44%   6 ns
Load     24%   8 ns
Store    12%   7 ns
Branch   18%   5 ns
Jump      2%   3 ns
Weighted mean ≅ 6.36 ns
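The weighted mean and the clock rate quoted above can be reproduced with a few lines of arithmetic. The names `INSTRUCTION_MIX`, `weighted_mean_latency_ns`, and `single_cycle_clock_mhz` are illustrative assumptions; the percentages and latencies are those listed on the slide.

```python
# (fraction of executed instructions, latency in ns) per instruction class
INSTRUCTION_MIX = {
    "R-type": (0.44, 6),
    "Load": (0.24, 8),
    "Store": (0.12, 7),
    "Branch": (0.18, 5),
    "Jump": (0.02, 3),
}


def weighted_mean_latency_ns(mix=INSTRUCTION_MIX):
    """Average latency if every instruction took only as long as it needs."""
    return sum(fraction * latency for fraction, latency in mix.values())


def single_cycle_clock_mhz(mix=INSTRUCTION_MIX):
    """The single-cycle clock must accommodate the slowest instruction class."""
    worst_case_ns = max(latency for _, latency in mix.values())
    return 1000.0 / worst_case_ns


if __name__ == "__main__":
    print(f"weighted mean = {weighted_mean_latency_ns():.2f} ns")
    print(f"single-cycle clock = {single_cycle_clock_mhz():.0f} MHz")
```

The gap between the 8 ns worst case and the roughly 6.36 ns average is the time a single-cycle design wastes, which motivates the multicycle implementation of Chapter 14.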

[Figure: one row per instruction class, each starting at the PC: ALU-type, Load, Store, Branch (and jr), and Jump (except jr & jal). Blocks that a class does not need are marked "Not used".]

Fig. 13.6 The MicroMIPS data path unfolded (by depicting the register write
step as a separate block) so as to better visualize the critical-path latencies.
Slide 19



How Good is Our Single-Cycle Design?
Clock rate of 125 MHz not impressive
How does this compare with
current processors on the market?
Not bad, where latency is concerned

Instruction access   2 ns
Register read        1 ns
ALU operation        2 ns
Data cache access    2 ns
Register write       1 ns
Total                8 ns
Single-cycle clock = 125 MHz

A 2.5 GHz processor with 20 or so pipeline stages has a latency of about
0.4 ns/cycle × 20 cycles = 8 ns
Throughput, however, is much better for the pipelined processor:
Up to 20 times better with single issue
Perhaps up to 100 times better with multiple issue
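The latency and throughput comparison can be worked through numerically. In the sketch below the function names are hypothetical, and the issue width of 5 used to reach the "up to 100 times" figure is an assumption made for illustration; the slide itself gives only the 2.5 GHz clock, the 20 stages, and the resulting ratios.

```python
def pipeline_latency_ns(clock_rate_ghz, stages):
    """Latency of one instruction: cycle time (ns) times the number of stages."""
    cycle_time_ns = 1.0 / clock_rate_ghz
    return cycle_time_ns * stages


def peak_throughput_ratio(pipelined_clock_mhz, single_cycle_clock_mhz, issue_width=1):
    """Peak instructions per second of the pipelined design relative to single-cycle."""
    return issue_width * pipelined_clock_mhz / single_cycle_clock_mhz


if __name__ == "__main__":
    print(pipeline_latency_ns(2.5, 20))              # about 8 ns, same as single-cycle
    print(peak_throughput_ratio(2500, 125))          # single issue
    print(peak_throughput_ratio(2500, 125, 5))       # assumed issue width of 5
```

These are peak ratios; Chapter 16 covers the pipeline performance limits that keep real designs below them.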
Slide 20


14 Control Unit Synthesis

The control unit for the single-cycle design is memoryless
• Problematic when instructions vary greatly in complexity
• Multiple cycles needed when resources must be reused
Topics in This Chapter
14.1 A Multicycle Implementation
14.2 Choosing the Clock Cycle
14.3 The Control State Machine
14.4 Performance of the Multicycle Design
14.5 Microprogramming
14.6 Exception Handling
Slide 21


14.1 A Multicycle Implementation
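[Figure: two timing diagrams for Instr 1 to Instr 4, each showing Clock, Time needed, and Time allotted. In the single-cycle design every instruction is allotted the same full clock period. In the multicycle design the instructions take 3, 5, 3, and 4 shorter cycles, respectively, and the difference appears as Time saved.]

Fig. 14.1 Single-cycle versus multicycle instruction execution.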
Slide 22


A Multicycle Data Path

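[Figure: blocks PC, Cache, Inst Reg, Data Reg, Reg file, x Reg, y Reg, ALU, z Reg, and Control. The cache is addressed by Address and returns Data; the instruction register supplies jta, rs, rt, rd, and imm to the data path and op and fn to Control; the register file outputs (rs) and (rt) are held in x Reg and y Reg, and the ALU result in z Reg.]

Fig. 14.2
Abstract view of a multicycle instruction execution unit for
MicroMIPS. For naming of instruction fields, see Fig. 13.1.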
Slide 23


Multicycle Data Path with Control Signals Shown
Three major changes relative to the single-cycle data path:
1. Instruction & data caches combined
2. ALU performs double duty for address calculation
3. Registers added for intercycle data

Corrections are shown in red

[Figure: PC, Cache, Inst Reg, Data Reg, Reg file, x Reg, y Reg, x Mux, y Mux, ALU, and z Reg. Control signals shown: PCWrite, Inst′Data, MemRead, MemWrite, IRWrite, RegWrite, RegDst, RegInSrc, ALUSrcX, ALUSrcY, ALUFunc, PCSrc, JumpAddr. Status outputs: ALUZero, ALUOvfl.]

Fig. 14.3 Key elements of the multicycle MicroMIPS data path.
Slide 24


14.2 Clock Cycle and Control Signals
Table 14.1

Block            Control signal        0            1            2           3
Program counter  JumpAddr              jta          SysCallAddr
                 PCSrc1, PCSrc0        Jump addr    x reg        z reg       ALU out
                 PCWrite               Don't write  Write
Cache            Inst′Data             PC           z reg
                 MemRead               Don't read   Read
                 MemWrite              Don't write  Write
                 IRWrite               Don't write  Write
Register file    RegWrite              Don't write  Write
                 RegDst1, RegDst0      rt           rd           $31
                 RegInSrc1, RegInSrc0  Data reg     z reg        PC
ALU              ALUSrcX               PC           x reg
                 ALUSrcY1, ALUSrcY0    4            y reg        imm         4 × imm
                 Add′Sub               Add          Subtract
                 LogicFn1, LogicFn0    AND          OR           XOR         NOR
                 FnClass1, FnClass0    lui          Set less     Arithmetic  Logic

Slide 25

