Advanced Computer Architecture - Lecture 11: Computer hardware design


CS 704
Advanced Computer Architecture

Lecture 11
Computer Hardware Design
(Pipeline and Instruction Level Parallelism)

Prof. Dr. M. Ashraf Chughtai

Today’s Topics
Recap Lecture 10
Structural Hazards
Data Hazards
Control Hazards

MAC/VU-Advanced
Computer Architecture

Lecture 11 –Computer Hardware
Design (5)

2

Recap: Lecture 10
Multi-cycle datapath versus pipelined datapath
Key components of the pipelined datapath
Performance enhancement due to pipelining
Introduction to hazards in the pipelined datapath


Structural Hazards
An attempt to use the same resource in two
different ways at the same time, e.g.,
a single memory port accessed for both
instruction fetch and data read in the same
clock cycle is a structural hazard
Example: next slide


Single Memory is a Structural Hazard
[Figure: pipeline diagram. Horizontal axis: time (clock cycles). Vertical axis: instruction order, Instr 1 (Load) through Instr 5. Each instruction passes through the stages Mem, Reg, ALU, Mem, Reg.]

Two memory accesses occur in the 4th cycle:
the Load instruction reads its data from memory while the
4th instruction is being fetched from the same memory
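
The collision described above can be checked with a small script. This is only a sketch under stated assumptions: a classic 5-stage pipeline, one instruction issued per cycle, and a hypothetical helper name `memory_port_conflicts` that is not part of the lecture.

```python
# Stage offsets in a classic 5-stage pipeline (0 = instruction fetch).
IF, ID, EX, MEM, WB = range(5)

def memory_port_conflicts(is_load):
    """Return the 1-based cycles in which a single memory port is requested twice.

    is_load[i] is True when instruction i (fetched in cycle i + 1) is a load.
    Every instruction uses memory in IF; a load uses it again in MEM.
    """
    uses = {}  # cycle -> number of memory accesses requested in that cycle
    for i, load in enumerate(is_load):
        start = i + 1  # cycle of this instruction's IF stage
        uses[start + IF] = uses.get(start + IF, 0) + 1
        if load:
            uses[start + MEM] = uses.get(start + MEM, 0) + 1
    return sorted(c for c, n in uses.items() if n > 1)

# Instr 1 is a Load, Instr 2 to 5 are not (assumed instruction mix).
print(memory_port_conflicts([True, False, False, False, False]))  # [4]
```

The only clash is in cycle 4, where the Load is in its Mem stage and Instr 4 is being fetched.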

Single Memory is a Structural Hazard
[Figure: pipeline diagram. Horizontal axis: time (clock cycles). Vertical axis: instruction order, Instr 1 (Load), Instr 2, Instr 3, a Stall row of bubbles, and Instr 4. Stages: Mem, Reg, ALU, Mem, Reg.]

Insert stall (bubble) to avoid memory
structural hazard
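
A minimal sketch of the stall policy, assuming a single memory port, a 5-stage pipeline in which a load fetched in cycle f accesses data memory in cycle f + 3, and priority for the data access over the fetch. The function name `fetch_cycles_with_stalls` is hypothetical.

```python
def fetch_cycles_with_stalls(is_load):
    """Return the 1-based fetch cycle of each instruction on a single-port memory.

    A load fetched in cycle f reads data memory in cycle f + 3 (its Mem stage).
    An instruction fetch is postponed (a bubble is inserted) while that cycle
    is already claimed by an earlier load.
    """
    busy = set()  # cycles in which the memory port serves a load's data access
    fetch = []
    cycle = 1
    for load in is_load:
        while cycle in busy:  # port taken: stall the fetch for one cycle
            cycle += 1
        fetch.append(cycle)
        if load:
            busy.add(cycle + 3)
        cycle += 1
    return fetch

# Instr 1 is a Load; the fetch of Instr 4 slips from cycle 4 to cycle 5.
print(fetch_cycles_with_stalls([True, False, False, False, False]))  # [1, 2, 3, 5, 6]
```

Each load in this model costs one lost fetch slot, which is the stall penalty used in the performance example later in the lecture.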


Structural Hazards
A structural hazard also exists when
a register file with a single write port is accessed for two
WB operations in the same clock cycle
This situation does not arise in a uniform 5-stage pipeline,
but it may arise in a pipeline that mixes 4-stage and 5-stage
(multi-cycle) instructions
Explanation next

Pipelining the Load Instruction
[Figure: pipeline timing diagram over Cycle 1 to Cycle 7. Three consecutive lw instructions (1st, 2nd, 3rd), each passing through Ifetch, Reg/Dec, Exec, Mem, Wr.]

The five independent functional units in the pipelined
datapath are: instruction fetch, decode/register read, the ALU for Exec, data
memory, and the register file’s write port for the Wr stage
Here the register file has separate read and write ports, so
a register read and a register write are allowed at the same time
Each functional unit is used once per instruction

The Four Stages of R-type

[Figure: R-type timing. Cycle 1: Ifetch, Cycle 2: Reg/Dec, Cycle 3: Exec, Cycle 4: Wr.]

An R-type instruction does not access data memory,
so it takes only 4 clock cycles (4 stages) to
complete
Here, the ALU operates on the register
operands
The result is written into the register file during the Wr (write-back)
stage

Pipelining the R-type and Load Instruction
[Figure: pipeline timing diagram over Cycle 1 to Cycle 9 for the sequence R-type, R-type, Load, R-type, R-type. R-type instructions use Ifetch, Reg/Dec, Exec, Wr; the Load uses Ifetch, Reg/Dec, Exec, Mem, Wr. Annotation: “Oops! We have a problem!”]

We have a pipeline conflict, i.e. a structural hazard:
– Two instructions try to write to the register file at the
same time!
– There is only one write port
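
The double write can be located with a short sketch. It assumes one instruction is fetched per cycle and that each instruction writes the register file in its last stage; `STAGES` and `write_port_conflicts` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Number of stages per instruction class, as on the slides:
# R-type: Ifetch, Reg/Dec, Exec, Wr; Load: Ifetch, Reg/Dec, Exec, Mem, Wr.
STAGES = {"rtype": 4, "load": 5}

def write_port_conflicts(program, stages=STAGES):
    """Map each cycle with more than one register write to the writers' indices.

    Instruction i is fetched in cycle i + 1 and writes in its last stage.
    """
    writers = defaultdict(list)
    for i, kind in enumerate(program):
        wr_cycle = (i + 1) + stages[kind] - 1
        writers[wr_cycle].append(i)
    return {c: w for c, w in writers.items() if len(w) > 1}

program = ["rtype", "rtype", "load", "rtype", "rtype"]
print(write_port_conflicts(program))  # {7: [2, 3]}
```

In this model the Load (index 2) and the R-type that follows it (index 3) both reach Wr in cycle 7.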

Important Observation
Each functional unit can only be used once per
instruction
Each functional unit must be used at the same
stage for all instructions:
– Load uses Register File’s Write Port during its
5th stage
– R-type uses Register File’s Write Port during its
4th stage
Two possible solutions: next

Solution 1: Insert “Bubble” into the Pipeline
[Figure: pipeline timing diagram over Cycle 1 to Cycle 9 for a Load followed by R-type instructions, with a pipeline bubble inserted so that no two instructions reach Wr in the same cycle.]

Insert a “bubble” into the pipeline to prevent two writes in the
same cycle
– The control logic can be complex.
– An instruction fetch and issue opportunity is lost:
no instruction is started in Cycle 6!

Solution 2: Delay R-type’s Write by One Cycle
Delay the R-type’s register write by one cycle:
– Now R-type instructions also use the register file’s write port at stage 5
– For R-type, the Mem stage is a NO-OP stage: nothing is done.
[Figure: R-type stages after the change: 1 Ifetch, 2 Reg/Dec, 3 Exec, 4 Mem, 5 Wr. Pipeline timing diagram over Cycle 1 to Cycle 9 for R-type, R-type, Load, R-type, R-type, each passing through Ifetch, Reg/Dec, Exec, Mem, Wr.]
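
The effect of padding R-type to five stages can be seen by comparing write cycles before and after the change. This is a sketch that assumes one instruction fetched per cycle and a write in the last stage; `write_cycles` is a hypothetical helper.

```python
def write_cycles(program, stages):
    """Return the 1-based Wr cycle of each instruction (fetched in cycle i + 1)."""
    return [(i + 1) + stages[kind] - 1 for i, kind in enumerate(program)]

program = ["rtype", "rtype", "load", "rtype", "rtype"]

mixed = write_cycles(program, {"rtype": 4, "load": 5})
uniform = write_cycles(program, {"rtype": 5, "load": 5})  # R-type gets a no-op Mem stage

print(mixed)    # [4, 5, 7, 7, 8]
print(uniform)  # [5, 6, 7, 8, 9]
print(len(set(uniform)) == len(uniform))  # True: at most one write per cycle
```

With uniform stage counts every instruction writes exactly four cycles after its fetch, so two writes can never coincide.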

Eliminating Structural Hazards?
Structural hazards can be eliminated or
minimized either by stalling
or by adding functional units
[Figure: pipeline diagram. Horizontal axis: time. Vertical axis: program flow, Load, 2nd Inst., 3rd Inst., 4th Inst. (with a stall), 5th Inst. Stages: IFetch, Dcd, Exec, Mem, WB.]

Example: Dual-port vs. Single-port

Machine A: Dual-ported memory
Machine B: Single-ported memory, but its
pipelined implementation has a 1.05 times
faster clock rate
Ideal CPI = 1 for both
Loads are 40% of instructions executed

$$\text{SpeedUp}_A = \frac{\text{Pipeline Depth}}{1 + 0} \times \frac{\text{clock}_{unpipe}}{\text{clock}_{pipe}} = \text{Pipeline Depth}$$

$$\text{SpeedUp}_B = \frac{\text{Pipeline Depth}}{1 + 0.4 \times 1} \times \frac{\text{clock}_{unpipe}}{\text{clock}_{unpipe} / 1.05} = \frac{\text{Pipeline Depth}}{1.4} \times 1.05$$

Stalls degrade performance
Here is an example:
Suppose data reference instructions constitute 40% of the mix,
and the processor with the structural hazard has a clock rate 1.05
times higher than the processor without the hazard
Average instruction time = CPI x clock cycle time
= (1 + 0.4 x 1) x (ideal clock cycle time / 1.05)
= (1.4 / 1.05) x ideal clock cycle time
≈ 1.33 x ideal clock cycle time

The processor without the structural hazard is about
1.3 times faster
than the one with the structural hazard
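
The arithmetic above can be reproduced in a few lines. The sketch normalises the hazard-free cycle time to 1 and charges one stall cycle per data reference, as in the example; `average_instruction_time` is a hypothetical helper name.

```python
def average_instruction_time(ideal_cpi, stall_fraction, stall_cycles, cycle_time):
    """CPI including stalls, multiplied by the clock cycle time."""
    return (ideal_cpi + stall_fraction * stall_cycles) * cycle_time

ideal_cycle = 1.0  # normalise the hazard-free machine's cycle time to 1

without_hazard = average_instruction_time(1.0, 0.0, 1, ideal_cycle)
with_hazard = average_instruction_time(1.0, 0.4, 1, ideal_cycle / 1.05)

print(round(with_hazard / without_hazard, 2))  # 1.33
```

The ratio shows that the 5% faster clock does not make up for a stall on 40% of the instructions.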

Additional Functional Units increase cost
The memory structural hazard is removed by
using two cache memory units:
– Instruction memory
– Data memory
Two write ports in the register file allow a mix of 4-stage
and 5-stage instructions in the pipeline


Data Hazards
An attempt to use an item before it is ready; e.g.,
one sock of a pair is in the dryer and the other is in the
washer; you can’t fold the pair until the second sock gets from the
washer through the dryer
An instruction depends on the result of a prior
instruction that is still in the pipeline

Data Hazards
Pipelining changes the relative timing of
instructions by overlapping their execution
This overlap introduces data and control
hazards
A data hazard occurs when the pipeline changes the order of operand
reads and writes vis-à-vis sequential access
to the operands; it arises from data
dependences between instructions
Let us consider an example

Example Data Hazard on R1
Add  R1, R2, R3
Sub  R4, R1, R3
And  R6, R1, R7
Or   R8, R1, R9
Xor  R10, R1, R11
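
The dependences on R1 in this sequence can be listed mechanically. The sketch below is an illustration only: it encodes each instruction as a tuple (opcode, destination, sources...) and uses a hypothetical helper `raw_dependences` that reports read-after-write pairs.

```python
def raw_dependences(program):
    """Return (producer, consumer, register) triples for read-after-write pairs.

    Each instruction is (opcode, dest, src1, src2). Only the most recent
    earlier writer of a register counts as the producer.
    """
    last_writer = {}  # register name -> index of the latest instruction writing it
    found = []
    for i, (_, dest, *sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                found.append((last_writer[reg], i, reg))
        last_writer[dest] = i
    return found

program = [
    ("add", "R1", "R2", "R3"),
    ("sub", "R4", "R1", "R3"),
    ("and", "R6", "R1", "R7"),
    ("or", "R8", "R1", "R9"),
    ("xor", "R10", "R1", "R11"),
]
print(raw_dependences(program))
# [(0, 1, 'R1'), (0, 2, 'R1'), (0, 3, 'R1'), (0, 4, 'R1')]
```

All four later instructions depend on the Add. Whether a given dependence becomes a hazard depends on how close the consumer is to the producer in the pipeline.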

Data Hazard: Dependencies Backwards in Time Are Hazards
[Figure: pipeline diagram. Horizontal axis: time (clock cycles), with stages IF, ID/RF, EX, MEM, WB. Vertical axis: instruction order, Add R1,R2,R3; Sub R4,R1,R3; And R6,R1,R7; Or R8,R1,R9; Xor R10,R1,R11. Each instruction passes through Im, Reg, ALU, Dm, Reg.]

The Add instruction provides its result to Sub after 3 cycles, to
And after 2, and to Or after 1 clock cycle


Data Hazard Solution #1 - Stall
Insert stall cycles after the next instruction’s IF and
decode, before its register
read
[Figure: pipeline diagram. Horizontal axis: time (clock cycles), with stages IF, ID/RF, EX, MEM, WB. Vertical axis: instruction order, add r1,r2,r3; three Stall slots; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11.]
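
How many stall cycles a dependent instruction needs can be worked out from stage positions. The sketch below assumes a 5-stage pipeline without forwarding, in which the producer writes in WB and the consumer reads in ID/RF. The function name and the `split_cycle_regfile` flag are hypothetical. Each consumer is treated in isolation; in a real pipeline, stalling the first consumer also delays the instructions behind it.

```python
def stalls_without_forwarding(distance, split_cycle_regfile):
    """Stall cycles a consumer needs when it is `distance` instructions after the producer.

    The producer writes in WB (stage 5); the consumer reads in ID/RF (stage 2).
    With a split-cycle register file the read may share the cycle of the write;
    otherwise it must come one cycle later.
    """
    # The consumer's ID/RF stage is (3 - distance) cycles before the producer's WB.
    gap = 3 - distance
    if not split_cycle_regfile:
        gap += 1
    return max(0, gap)

for name, d in [("sub", 1), ("and", 2), ("or", 3), ("xor", 4)]:
    print(name, stalls_without_forwarding(d, False), stalls_without_forwarding(d, True))
# sub 3 2
# and 2 1
# or 1 0
# xor 0 0
```

The first column is the case where a register cannot be read in the cycle in which it is written; the second column assumes a write in the first half of the cycle and a read in the second half.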

XOR: No Data Hazard here, as register is read after
being written
[Figure: pipeline diagram. Horizontal axis: time (clock cycles), with stages IF, ID/RF, EX, MEM, WB. Vertical axis: instruction order, add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11.]

Data Hazard Solution - Forwarding
“Forward” the result from one stage to another:
from the EX/MEM pipeline register to the ALU stage of Sub, and from the
MEM/WB pipeline register to the ALU stage of And
[Figure: pipeline diagram. Horizontal axis: time (clock cycles), with stages IF, ID/RF, EX, MEM, WB. Vertical axis: instruction order, add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11.]
No forwarding is needed when
the register is written in
the first half of the cycle and read in
the second half
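
The choice of forwarding path depends only on how far the consumer is behind the producer. The sketch below assumes an ALU producer and ALU consumers in a 5-stage pipeline with a split-cycle register file; `forwarding_source` is a hypothetical helper, not hardware from the lecture.

```python
def forwarding_source(distance):
    """Where an ALU consumer gets a result produced by an ALU instruction
    `distance` instructions earlier in a 5-stage pipeline with forwarding."""
    if distance == 1:
        return "EX/MEM pipeline register"
    if distance == 2:
        return "MEM/WB pipeline register"
    if distance == 3:
        return "register file (written in first half cycle, read in second)"
    return "register file"

for name, d in [("sub", 1), ("and", 2), ("or", 3), ("xor", 4)]:
    print(name, "<-", forwarding_source(d))
```

In hardware this selection is made by multiplexers at the ALU inputs, controlled by comparing the destination register held in each pipeline register with the source registers of the instruction in EX.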

Forwarding (or Bypassing):
What about Loads?

[Figure: pipeline diagram. Horizontal axis: time (clock cycles), with stages IF, ID/RF, EX, MEM, WB. Vertical axis: instruction order, lw r1,0(r2) followed by sub r4,r1,r3.]

Dependencies backwards in time are hazards

In this case, forwarding alone cannot solve the hazard:
we must delay (stall) an instruction that depends on a
load
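
A load-use check can be sketched as follows. It assumes the classic 5-stage pipeline with full forwarding, where loaded data is available only at the end of MEM, so an instruction that uses it immediately afterwards costs one stall cycle. The tuple encoding and the name `load_use_stalls` are hypothetical.

```python
def load_use_stalls(program):
    """Count stall cycles caused by load-use hazards, assuming full forwarding.

    Each instruction is (opcode, dest, sources). One stall is charged when an
    instruction reads the register loaded by the immediately preceding lw.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls

program = [
    ("lw", "r1", ("r2",)),
    ("sub", "r4", ("r1", "r3")),
]
print(load_use_stalls(program))  # 1
```

After the one-cycle bubble, the loaded value can be forwarded from the MEM/WB pipeline register to the ALU input of the dependent instruction.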

