CS 704
Advanced Computer Architecture
Lecture 18
Instruction Level Parallelism
(Hardware-based speculations and exceptions)
Prof. Dr. M. Ashraf Chughtai
Today's Topics
Recap
Hardware-based Speculations
- Speculating on the outcome of branches
- Extensions to Tomasulo’s hardware
- Handling Exceptions
Summary
MAC/VU-Advanced
Computer Architecture
Lecture 18 – Instruction Level
Parallelism -Dynamic (7)
Recap: Lecture 17
Last time we discussed three basic
concepts that support multiple
instruction issue:
Branch Target Buffer
Integrated Instruction Fetch Units
Return Address Predictors
Recap: Lecture 17
The branch-target buffer provides the
branch target address at the IF stage
Its variation, branch folding, buffers the
actual target instruction instead of, or along
with, the target address
Both help to minimize branch-hazard
stalls, allowing multiple instructions to issue in
one clock cycle
Recap: Lecture 17… Cont’d
The Integrated Instruction Fetch Unit (IIFU)
integrates the following three functions into
a single step:
Branch Prediction
Instruction Prefetch
Instruction memory access and buffering
Recap: Lecture 17… Cont’d
The return-address predictor
predicts procedure returns, which are
indirect jumps whose target varies at run time
Indirect jumps also arise from indirect
procedure calls and select or case statements
Recap: Lecture 17… Cont’d
Then we discussed the features of:
Superscalar processors
VLIW processors
In superscalar processors, the
multiple instructions issued in one clock
cycle can be scheduled either statically
or dynamically
Recap: Lecture 17… Cont’d
VLIW processors, in contrast,
schedule multiple instruction issues in one
clock cycle using only static scheduling
approaches
Then we discussed the performance
enhancement and the factors limiting the
performance of superscalar pipelines, both
statically scheduled and
dynamically scheduled
Today’s Focus
Last time, in the loop-based example, we
observed that
control hazards prevent us from
starting the next iteration before we know
whether the branch was correctly predicted,
which causes a one-cycle penalty on every
loop iteration
Today we will focus on hardware-based
speculation to address this limitation
Hardware-based Speculation: Introduction
Hardware-based speculation offers many
advantages
– Can incorporate hardware-based
branch prediction
– Does not require additional
bookkeeping code
– Does not depend on a compiler
Hardware-based Speculation
This approach has been implemented in
processors such as:
- PowerPC 620
- MIPS R10000
- Intel P6, and
- AMD K5
Hardware Based Speculation: Basics
We have observed that
exploiting more instruction-level
parallelism increases the
burden of maintaining control
dependences
Hardware Based Speculation: Basics
Although branch prediction
reduces the direct stalls
attributable to branches, a
multiple-issue processor may
need to execute a branch every
clock cycle to maintain
maximum performance
Hardware Based Speculation: Basics
Hence, exploiting more parallelism
requires that we overcome the
limitations of control dependence
These limitations are overcome by
speculating on the outcome of
branches and executing the program
as if the speculation were correct
Hardware Based Speculation: Basics
Here, we:
Fetch, Issue and
Execute instructions
as if our branch predictions were always
correct.
Dynamic scheduling without speculation,
in contrast, fetches and issues such
instructions but does not execute them until
the prediction is checked and found correct
Hardware Support: Speculative Execution
Main idea:
allow execution of an instruction that depends
on a predicted-taken branch such that there
are no consequences (including exceptions
such as a memory violation) if the branch is not
actually taken
In particular, we do not want a speculative
instruction to cause exceptions that stop
the program (e.g., a memory violation)
Hardware Support: Speculative Execution
This can be achieved
if the hardware support for speculation
buffers the results and exceptions
from instructions
until it is known that the instruction
would actually execute
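The buffering idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names SpecEntry, speculative_load and resolve are hypothetical, memory is modelled as a dictionary, and a missing address stands in for a memory violation. The point it shows is that a speculative instruction records its result or its exception, and neither takes effect until the branch outcome is known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecEntry:
    """Buffered outcome of one speculative instruction (hypothetical structure)."""
    dest: str
    value: Optional[int] = None
    exception: Optional[str] = None


def speculative_load(dest, memory, address):
    """Execute a load speculatively: record the outcome, never raise here."""
    if address not in memory:
        # The exception is buffered, not delivered to the program.
        return SpecEntry(dest, exception="memory violation")
    return SpecEntry(dest, value=memory[address])


def resolve(entry, registers, prediction_correct):
    """Apply or discard a buffered outcome once the branch is resolved."""
    if not prediction_correct:
        return "squashed"              # result and exception both vanish
    if entry.exception is not None:
        raise RuntimeError(entry.exception)
    registers[entry.dest] = entry.value
    return "committed"


memory = {100: 7}
registers = {"R1": 0}

# A load on the wrong path faults, but the fault never reaches the program.
wrong_path = speculative_load("R1", memory, 999)
print(resolve(wrong_path, registers, prediction_correct=False))

# A load on the correct path updates the register only at resolution time.
right_path = speculative_load("R1", memory, 100)
print(resolve(right_path, registers, prediction_correct=True), registers)
```

If the faulting load had been on the correctly predicted path, resolve would raise the buffered exception at that point, which is the behaviour the slide asks for.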
Hardware Based Speculation: Basics
This shows that:
Hardware based speculation combines
three key ideas:
Dynamic Branch Prediction
Speculation
Dynamic scheduling
Hardware Based Speculation: Basics
1.
Dynamic branch prediction to
choose which instruction to execute,
i.e., the next in sequence or the branch target
2.
Speculation to allow the execution of
instructions before the control
dependence is resolved
Here, the hardware must be able to
undo the effects of incorrectly speculated
instructions, which is hard to do if there
are exceptions
Hardware Based Speculation: Basics
3.
Dynamic scheduling to deal with the
scheduling of different combinations of
basic blocks
Thus, hardware-based speculation
follows the predicted flow of data
values to choose when to execute
instructions: an operation executes as soon
as its operands are available
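The following Python sketch illustrates what following the flow of data values means. It is a toy model, not the lecture's hardware: the function name dataflow_order and the tuple layout (name, destination, sources) are hypothetical, every operation is assumed to take one step, and name dependences (WAR and WAW) are assumed to have been removed already by renaming.

```python
def dataflow_order(instructions, ready):
    """Group instructions into steps; each fires once all its sources are ready.

    instructions: list of (name, dest, sources) tuples in program order.
    ready: register names whose values are available at the start.
    """
    ready = set(ready)
    pending = list(instructions)
    steps = []
    while pending:
        # Everything whose operands are available fires in this step.
        fired = [ins for ins in pending if all(s in ready for s in ins[2])]
        if not fired:
            raise ValueError("some operands never become available")
        steps.append([ins[0] for ins in fired])
        # Results become visible to dependents only after the step ends.
        for ins in fired:
            ready.add(ins[1])
        pending = [ins for ins in pending if ins not in fired]
    return steps


program = [
    ("LD",  "F0", ["R1"]),
    ("MUL", "F4", ["F0", "F2"]),   # waits for LD
    ("ADD", "F6", ["F8", "F2"]),   # independent of LD and MUL
    ("SUB", "F10", ["F4", "F6"]),  # waits for MUL and ADD
]
print(dataflow_order(program, ready={"R1", "F2", "F8"}))
# prints [['LD', 'ADD'], ['MUL'], ['SUB']]
```

ADD runs alongside LD even though it comes later in program order, because its operands are ready first.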
Hardware Based Speculation: Basics
To do so,
we must separate the
bypassing of results among
instructions, which is
needed to execute an instruction
speculatively,
from the actual completion of an
instruction
Hardware Based Speculation: Basics
By making this separation we can
allow an instruction:
- to execute and
- to bypass its result to other
instructions
without allowing the instruction to
perform any update that cannot be
undone,
until we know that the instruction
is no longer speculative
Hardware Based Speculation: Basics
When the instruction is no longer
speculative, we allow it to update the
register file or memory
This additional step in the instruction
execution sequence is called
instruction commit
Hardware Based Speculation: Basics
This shows that
the basic idea behind implementing
speculation is
to allow instructions to
execute out of order
but force them to
commit in order
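A minimal Python sketch of out-of-order execution with in-order commit follows. The class name CommitQueue and its methods are hypothetical, chosen only for illustration; in the literature a structure playing this role is usually called a reorder buffer. The sketch assumes each instruction writes a single register and ignores branches and exceptions.

```python
from collections import deque


class CommitQueue:
    """Holds issued instructions in program order until they may commit."""

    def __init__(self):
        self.entries = deque()   # oldest instruction on the left
        self.registers = {}      # architectural state, updated only at commit

    def issue(self, tag, dest):
        """Allocate an entry in program order."""
        self.entries.append({"tag": tag, "dest": dest, "value": None, "done": False})

    def finish(self, tag, value):
        """Record a result; instructions may finish in any order."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"] = value
                entry["done"] = True
                return
        raise KeyError(tag)

    def commit(self):
        """Retire finished instructions from the head only, in program order."""
        committed = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.registers[entry["dest"]] = entry["value"]
            committed.append(entry["tag"])
        return committed


q = CommitQueue()
q.issue("I1", "F0")
q.issue("I2", "F4")
q.issue("I3", "F6")

q.finish("I3", 30)        # youngest instruction finishes first
print(q.commit())         # prints [] because I1 has not finished
q.finish("I1", 10)
print(q.commit())         # prints ['I1']
q.finish("I2", 20)
print(q.commit())         # prints ['I2', 'I3']
```

I3 finishes first but its result reaches the register file last, so the architectural state always reflects a prefix of the program.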