
Advanced Computer Architecture - Lecture 17: Instruction level parallelism


CS 704
Advanced Computer Architecture

Lecture 17
Instruction Level Parallelism
(High-Performance Instruction Delivery - Multiple Issue)

Prof. Dr. M. Ashraf Chughtai


Recap:

MAC/VU-Advanced
Computer Architecture

Lecture 17 – Instruction Level
Parallelism -Dynamic (6)

2


High-Performance Processors




Reducing branch penalties for High-Performance Processors
Branch Target Buffer
Integrated Instruction Fetch Units
Return Address Predictors



1: Branch Target Buffer



2: Integrated Instruction Fetch Units
Integrated Branch Prediction
Instruction Prefetch
Instruction memory access and buffering




2: Integrated Instruction Fetch Units

….. Cont’d

Integrated Branch Prediction

The branch predictor is part of the
instruction fetch unit, so its predictions
directly drive the fetch pipeline




2: Integrated Instruction Fetch Units

….. Cont’d

Instruction Prefetch
An instruction prefetch queue is
part of the IIFU
The queue holds multiple
instructions and can deliver more than
one instruction per cycle
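
The behaviour of such a queue can be sketched in a few lines of Python. This is only a toy model: the class name PrefetchQueue, the default capacity of 8 and the delivery width of 2 are assumptions made for illustration and are not taken from the lecture.

```python
from collections import deque


class PrefetchQueue:
    """Toy instruction prefetch queue (hypothetical model, not a real IIFU)."""

    def __init__(self, capacity=8):
        self.capacity = capacity  # assumed queue size
        self.queue = deque()

    def fill(self, instructions):
        """Append fetched instructions until the queue is full.

        Returns the number of instructions that were accepted.
        """
        accepted = 0
        for instr in instructions:
            if len(self.queue) >= self.capacity:
                break
            self.queue.append(instr)
            accepted += 1
        return accepted

    def deliver(self, width=2):
        """Remove and return up to `width` instructions in program order."""
        count = min(width, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]
```

Because the queue decouples fetch from issue, the issue stage can ask for two instructions in a cycle even if the cache supplied them in earlier cycles.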


2: Integrated Instruction Fetch Units

….. Cont’d

Instruction-Memory access and buffering
Fetching multiple instructions per clock
cycle may require accessing multiple
cache lines, which is a complex
operation
The IIFU handles this complexity and
hides the cost of crossing cache blocks
The IIFU also provides instruction buffering
and on-demand issue
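
To see when a fetch group crosses a cache block, the following sketch lists the block indices that a group of instructions covers. The function name blocks_touched, the 4-byte instruction size and the 32-byte block size are assumptions chosen for illustration.

```python
def blocks_touched(pc, n_instructions, instr_bytes=4, block_bytes=32):
    """Return the cache-block indices covered by a fetch group.

    pc is the byte address of the first instruction in the group.
    The instruction and block sizes are assumed values.
    """
    first = pc // block_bytes
    last = (pc + n_instructions * instr_bytes - 1) // block_bytes
    return list(range(first, last + 1))
```

With these assumed sizes, a two-instruction fetch starting at byte 24 stays inside block 0, while the same fetch starting at byte 28 needs blocks 0 and 1, which is the case the IIFU has to hide.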


3: Return Address Predictors
The return-address predictor predicts
indirect jumps, i.e., jumps whose target
address varies at run time
High-level language programs generate
such jumps for procedure returns, indirect
procedure calls and select or case statements
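
The slide does not say how the predictor is built. A common approach for procedure returns is a small return-address stack, sketched below under that assumption. The class name, the depth of 8 and the overflow policy (discard the oldest entry) are illustrative choices, not details from the lecture.

```python
class ReturnAddressStack:
    """Toy return-address stack (assumed mechanism, fixed depth)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, return_address):
        """Record the return address when a call is fetched."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: discard the oldest entry
        self.stack.append(return_address)

    def predict_return(self):
        """Predict the target of a return, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None
```

Each call pushes its return address and each return pops the most recent one, so predictions are correct as long as the call depth does not exceed the stack depth.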



Summary: Minimizing Control Hazard Penalties




Multiple Instruction-Issue Processors
All of the schemes described so far can at
best achieve 1 instruction/cycle
Two kinds of processor issue multiple
instructions per cycle to go beyond this limit:
- Superscalar processors
- Very Long Instruction Word (VLIW)
processors


1: Superscalar



- Static Scheduling
- Dynamic Scheduling



Superscalar
- Statically scheduled processors use in-order
execution
- Dynamically scheduled processors use out-of-order
execution
- The superscalar concept has been used in:
– IBM Power2
– Sun UltraSPARC
– Pentium III/4
– DEC Alpha
– HP PA-8000



2: Very Long Instruction Word (VLIW) Processors



2: VLIW / EPIC Machines



2: VLIW Processors
VLIW/EPIC designs include new features for:
- predication,
- rotating registers and
- speculation, etc.
Typical implementations are:
– i860, Trimedia, Itanium

We will talk about statically scheduled
superscalar processors today and about compiling
for VLIW/EPIC later


Statically Scheduled Superscalar Processor



Statically Scheduled Superscalar Processor
Instruction Issue Process:

- Multiple instruction issue is a complex
process
- During instruction fetch, the pipeline
receives all the instructions that could
potentially issue, called the issue packet (it may
contain, say, 1 to 4 instructions)


Example: Statically Scheduled Superscalar MIPS Processor
As an example, let us consider a superscalar MIPS
processor that has:

Number of instructions issued per clock:
2 instructions - 1 FP operation and 1 integer operation
(The integer operations include loads/stores to integer
or FP registers, branches and integer ALU operations)

Issuing two 32-bit instructions per cycle
requires fetching and decoding 64 bits per clock
cycle


Example:

… Cont’d

Fetching two instructions needs careful handling
of the cache,
as the pair may straddle a cache-block boundary:
the first instruction may be at the end of one
cache block
and the second at the beginning
of the next cache block
Hazard detection
The restriction to one FP and one integer instruction
per issue packet makes hazard checking simple.


Example:

… Cont’d

We simply have to check for
hazards between the two instructions in an issue
packet
If a hazard exists, the simple solution is to
treat it as a structural hazard (issue only 1 of
them)
However, the only difficulty arises when the integer
instruction is an FP load/store/move instruction:
it may create contention for the FP register port and
create a RAW hazard when the second instruction of
the pair depends
on the first


Example
Issuing: If placement is not a problem, then
fetch and issue are completed in three
steps:
1. Fetch two instructions from the cache
2. Determine whether 0, 1 or 2 instructions
can issue
3. Issue them to the correct functional unit
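
The second step can be sketched as a small decision function. This is a simplified model under stated assumptions: the Instr record, the "int"/"fp" unit labels and the `pending` set of registers still awaiting a result are hypothetical, and FP register port contention and WAW hazards are not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    unit: str                      # "int" (ALU, load/store, branch) or "fp"
    dest: Optional[str] = None     # register written, if any
    srcs: Tuple[str, ...] = ()     # registers read


def issue_count(first, second, pending=frozenset()):
    """Return how many instructions of the pair (0, 1 or 2) can issue."""
    # In-order issue: if the first instruction must wait, nothing issues.
    if any(reg in pending for reg in first.srcs):
        return 0
    # The packet must hold one integer-slot and one FP-slot instruction.
    if first.unit == second.unit:
        return 1
    # The second instruction must not wait on an earlier in-flight result.
    if any(reg in pending for reg in second.srcs):
        return 1
    # RAW hazard inside the packet: the second reads what the first writes.
    if first.dest is not None and first.dest in second.srcs:
        return 1
    return 2
```

For example, an FP load into F0 paired with an FP add that reads F0 issues only the load, which matches the RAW case described for FP loads in the integer slot.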


Example superscalar pipeline in operation
Let us see what the instructions look like when they go
in pairs through the pipeline

[Figure: pipeline diagram of instruction pairs; only the EX and WB stage labels survive in this extract]


Dynamic Scheduling in Superscalar Processors

1. Extending Tomasulo's algorithm to support
a two-issue superscalar pipeline
- Here, we do not want to issue instructions to
reservation stations out of order, as this may
lead to a violation of program semantics.
- Further, to gain the full advantage of dynamic
scheduling, remove the constraint of
issuing one FP and one integer instruction per
clock.
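
A minimal sketch of the in-order issue rule follows. It assumes a simple model in which each instruction only needs a free reservation station of the right kind. The function name issue_in_order, the unit labels and the station counts are hypothetical, and operand tracking (the rest of Tomasulo's algorithm) is left out.

```python
def issue_in_order(instr_queue, free_stations, width=2):
    """Issue up to `width` instructions from the head of the queue.

    instr_queue: list of (name, unit) tuples, oldest first (mutated).
    free_stations: dict mapping unit -> free reservation stations (mutated).
    Returns the names of the instructions issued this cycle.
    """
    issued = []
    while instr_queue and len(issued) < width:
        name, unit = instr_queue[0]
        if free_stations.get(unit, 0) == 0:
            break  # the head cannot issue, so younger ones must not pass it
        free_stations[unit] -= 1
        issued.append(name)
        instr_queue.pop(0)
    return issued
```

Issue stops at the first instruction that cannot get a station, which keeps issue in program order. There is no one-FP-plus-one-integer rule here, so two FP instructions may issue in the same cycle if two FP stations are free.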


