
Advanced Computer Architecture - Lecture 12: Instruction level parallelism


CS 704
Advanced Computer Architecture

Lecture 12
Instruction Level Parallelism
(Introduction to multi-cycle pipelined datapath)

Prof. Dr. M. Ashraf Chughtai


Today’s Topics
Recap: Pipelining Basics
Longer Pipelines – FP Instructions
Loop Level Parallelism
FP Loop Hazards
Summary

MAC/VU-Advanced
Computer Architecture

Lecture 12 –Instruction Level
Parallelism (1)

2


Recap: Pipelined datapath and control
In the previous lecture we reviewed the
pipelined datapath to understand the basics
of ILP: overlapping the execution of
instructions to enhance performance


Key components of the pipelined datapath
Performance enhancement due to pipelining:

– Pipelining improves instruction throughput (bandwidth)
but not the latency of an individual instruction
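
The throughput-versus-latency point can be checked with a little arithmetic. The sketch below assumes an ideal pipeline (equal stage times, no stalls); the instruction count, stage count and stage time are hypothetical values chosen only for illustration.

```python
def unpipelined_time(n_instructions, n_stages, stage_time):
    """Total time when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages * stage_time


def pipelined_time(n_instructions, n_stages, stage_time):
    """Total time for an ideal pipeline: fill it once, then one completion per cycle."""
    return (n_stages + n_instructions - 1) * stage_time


# Hypothetical workload: 1000 instructions, 5 stages, 1 time unit per stage.
n, k, t = 1000, 5, 1
single_latency = k * t  # one instruction still passes through all k stages
speedup = unpipelined_time(n, k, t) / pipelined_time(n, k, t)

print(f"latency of one instruction: {single_latency} time units")
print(f"speedup over {n} instructions: {speedup:.2f}")
```

Under these assumptions the speedup approaches the number of stages as the instruction count grows, while the time a single instruction spends in the machine is unchanged.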


Recap: Pipeline Hazards
Structural hazards



Recap: Pipeline Hazards

….. Cont’d


Data Hazards



Recap: Three Generic Data Hazards
Read After Write (RAW): true (data) dependence
– instr j tries to read an operand before instr i writes it

i: add r1,r2,r3
j: sub r4,r1,r3



Recap: Three Generic Data Hazards
Write After Read (WAR): anti-dependence
– instr j tries to write an operand before instr i reads it

i: sub r4,r1,r3
j: add r1,r2,r3

- This is a name dependence; it can be removed by register renaming



Recap: Three Generic Data Hazards
Write After Write (WAW): output dependence
– instr j tries to write an operand before instr i writes it

i: sub r1,r4,r3
j: add r1,r2,r3
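
The three hazard classes from the recap can be told apart mechanically by comparing register names. The following is a minimal sketch, not part of the lecture: the function name `classify_hazards` and the `(dest, src1, src2)` tuple encoding are assumptions made for illustration, and the function reports potential hazards only (whether a stall actually occurs depends on the pipeline).

```python
def classify_hazards(first, second):
    """Return the potential data hazards of `second` on the earlier `first`.

    Each instruction is a (dest, src1, src2) tuple of register names
    (a hypothetical encoding used only for this sketch).
    """
    dest_i, *srcs_i = first
    dest_j, *srcs_j = second
    hazards = []
    if dest_i in srcs_j:
        hazards.append("RAW")  # j reads a register that i writes
    if dest_j in srcs_i:
        hazards.append("WAR")  # j writes a register that i reads
    if dest_i == dest_j:
        hazards.append("WAW")  # i and j write the same register
    return hazards


# The three instruction pairs from the recap slides, as (dest, src1, src2).
print(classify_hazards(("r1", "r2", "r3"), ("r4", "r1", "r3")))  # add, then sub
print(classify_hazards(("r4", "r1", "r3"), ("r1", "r2", "r3")))  # sub, then add
print(classify_hazards(("r1", "r4", "r3"), ("r1", "r2", "r3")))  # sub, then add
```

The three calls report RAW, WAR and WAW respectively, matching the slide each pair comes from.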



Recap: Pipeline Hazards


….. Cont’d

Control hazards
How to overcome Hazards?

Stall



Recap: How to remove Hazards?
Structural hazards:
Multiple functional units
Data hazards:
Forwarding (bypassing)
Control hazards:
Branch prediction, delayed branch



Instruction Level Parallelism
Performance can be enhanced by raising:
– the clock speed
– the number of instructions that can
execute in parallel, i.e., the ILP



How to achieve Instruction Level Parallelism?

A superscalar processor:
- Pre-fetch and decode
- Start several branch instruction streams
- Finally, discard all but the correct stream



Superscalar Design



MIPS Longer Pipelines – FP Instructions



MIPS Longer Pipelines – FP Instructions
For example, adding two FP numbers takes a
minimum of four steps, performed in the
following sequence:



Flow diagram of MIPS FP Adder

[Figure placeholder: flow diagram of the FP adder, p. 284]



Steps for FP Addition
Step 1: Compare the exponents of the two numbers;
shift the smaller number to the right until its
exponent matches the larger exponent
Step 2: Add the significands
Step 3: Normalize the sum: either shift right and
increment the exponent, or shift left and decrement it
Step 4: If there is no overflow or underflow, round the
significand to the appropriate number of bits
Stop if further normalization is not required;
otherwise go back to Step 3
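
The four steps above can be sketched in software. This is a simplified illustration under stated assumptions, not the MIPS hardware algorithm: operands are positive values of the form sig * 2**exp, the significand width `P` and the number of guard bits `GUARD` are hypothetical, rounding is round-half-up rather than IEEE 754 round-to-nearest-even, and the overflow/underflow check of Step 4 is omitted because the exponent is an unbounded Python integer.

```python
P = 4      # significand width in bits (hypothetical, kept tiny for readability)
GUARD = 3  # extra low-order bits kept during the addition (assumption)


def fp_add(sig_a, exp_a, sig_b, exp_b):
    """Add two positive numbers of the form sig * 2**exp with P-bit significands."""
    # Step 1: compare exponents; shift the smaller number right until they match.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    big = sig_a << GUARD
    small = (sig_b << GUARD) >> (exp_a - exp_b)
    exp = exp_a - GUARD  # value = total * 2**exp while guard bits are attached

    # Step 2: add the significands.
    total = big + small

    while True:
        # Step 3: normalize (shift right and increment, or left and decrement).
        while total >= 1 << (P + GUARD):
            total >>= 1
            exp += 1
        while 0 < total < 1 << (P + GUARD - 1):
            total <<= 1
            exp -= 1
        # Step 4: round to P bits (round half up in this sketch).
        rounded = (total + (1 << (GUARD - 1))) >> GUARD
        if rounded < 1 << P:
            return rounded, exp + GUARD
        # Rounding carried out of the top bit: go back to Step 3.
        total = rounded << GUARD


sig, exp = fp_add(12, -3, 12, -4)  # 1.5 + 0.75
print(sig, exp, sig * 2.0 ** exp)  # value printed last is 2.25
```

The outer loop mirrors the "otherwise go back to Step 3" rule: rounding can carry out of the top significand bit, in which case the result is normalized once more.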


MIPS Longer Pipelines

…… Cont’d

- The latency of a functional unit is defined as:
the number of intervening cycles between an instruction
that produces a result and an instruction that uses
that result
- The initiation (repeat) interval is defined as:
the number of cycles that must elapse between
issuing two operations of the same type



MIPS Longer Pipelines

…… Cont’d

Functional unit               Latency   Initiation (repeat) interval
Integer ALU                   0         1
Data Memory (Int / FP Load)   1         1
FP ADD                        3         1
FP / Integer Multiply         6         1
FP / Integer Divide           24        25
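
The latency figures translate directly into stall cycles for a dependent instruction. The sketch below assumes full forwarding and no other hazards, and it ignores special cases; the dictionary keys and the function name `stall_cycles` are hypothetical labels, while the numbers are the latencies listed in the lecture's table.

```python
# Latencies from the lecture's table (intervening cycles between the
# producing instruction and the instruction that uses its result).
LATENCY = {
    "int_alu": 0,
    "load": 1,
    "fp_add": 3,
    "fp_mul": 6,
    "fp_div": 24,
}


def stall_cycles(producer, independent_between=0):
    """Stalls before a dependent instruction can use the producer's result.

    `independent_between` is the number of unrelated instructions scheduled
    between the producer and the consumer; each one hides a cycle of latency.
    """
    return max(0, LATENCY[producer] - independent_between)


for unit in LATENCY:
    print(f"{unit:8s} back-to-back stall cycles: {stall_cycles(unit)}")
print("fp_add with 2 independent instructions between:", stall_cycles("fp_add", 2))
```

The second argument shows why scheduling independent instructions between a producer and its consumer reduces stalls.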


Typical MIPS FP Pipeline
Let us consider a typical MIPS FP pipeline
with three un-pipelined FP functional units
[Figure placeholder: Fig. A.29 (page A-48)]

Explanation next please


Typical MIPS FP Pipeline



MIPS FP Pipeline with Pipelined FUs
The previous FP pipeline can be extended
by adding additional pipeline stages in the
functional units
[Figure placeholder: Fig. A.31 (page A-50)]

Explanation next please


Working of extended FP Pipeline




Working of extended FP Pipeline
Note that additional pipeline registers have
been inserted between the intervening stages,
e.g., A1/A2, A2/A3, ...
Furthermore, the ID/EX register must be
expanded to connect ID to the A1, M1, EX and
DIV functional units
Here, the FP divide unit is not pipelined;
it requires 24 clock cycles to complete
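
The effect of an unpipelined unit shows up in the initiation interval. As a rough sketch (the function name `issue_cycles` is hypothetical, and the only constraint modeled is the unit's own initiation interval), the earliest issue cycles of back-to-back operations on one functional unit are:

```python
def issue_cycles(n_ops, initiation_interval, first_issue=1):
    """Earliest issue cycle of each of n_ops back-to-back operations on one unit."""
    return [first_issue + k * initiation_interval for k in range(n_ops)]


# Initiation intervals from the lecture's table: 1 for the pipelined FP adder,
# 25 for the unpipelined FP divider.
print("FP add   :", issue_cycles(3, 1))
print("FP divide:", issue_cycles(3, 25))
```

With an initiation interval of 1 a new FP add can start every cycle, whereas a second divide must wait until the divider is free again.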



FP Pipeline Timing: Example



