CS 704
Advanced Computer Architecture
Lecture 23
Instruction Level Parallelism
(Hardware Support at Compile Time)
Prof. Dr. M. Ashraf Chughtai
Today’s Topics
Recap
H/W Support at Compile Time
– Conditional/Predicated Instructions
– H/W based Compiler Speculation
Summary
MAC/VU-Advanced
Computer Architecture
Lecture 23 – Instruction Level
Parallelism-Static (4)
2
Recap: H/W and S/W Exploitation
We have studied both dynamic and static scheduling techniques that exploit ILP, with single or multiple instruction issue per clock cycle, to enhance processor performance
The dynamic approaches rely on hardware modifications, as in dynamically scheduled superscalar processors, whereas VLIW processors rely on the compiler to schedule instructions statically
Recap …… Cont’d
Furthermore, we studied pipeline structure enhancements such as:
– Tomasulo’s algorithm, which helps overcome structural and data hazards; and
– Branch predictors, which minimize stalls due to control hazards
Recap …… Cont’d
The static scheduling approaches
include
– Loop unrolling
– Software Pipelining
– Trace Scheduling
– Superblock Scheduling
Recap …… Cont’d
These techniques aim to increase ILP so that the processor can issue more than one instruction every cycle
These techniques give better performance when the behavior of branches can be predicted correctly at compile time
Recap …… Cont’d
Otherwise, the parallelism cannot be fully exposed at compile time
This is due to the following two reasons:
Recap …… Cont’d
1. Control dependences limit the amount of parallelism that can be exploited; and
2. Dependences between memory-reference instructions may prevent the code movement necessary to increase parallelism
Hardware Support for VLIW
These limitations, particularly for VLIW processors, can be overcome by providing hardware support that the compiler can exploit at compile time
Today, we will introduce some hardware-support-based techniques that help:
Hardware Support for VLIW
– overcome these limitations; and
– expose more parallelism at compile time
The most commonly used such techniques are:
Hardware Support for VLIW
1. Extension of the instruction set to include conditional or predicated instructions (instructions whose execution depends on a condition)
2. Hardware speculation, to enhance the ability of the compiler to:
Hardware Support for VLIW
– move code across branches while preserving exception behavior; and
– reorder loads and stores when no conflict is expected but one cannot be ruled out
1: Instruction Set Extension
The extended instruction set, including conditional or predicated instructions, allows the compiler to:
– group instructions across branches
– eliminate branches
– convert control dependence into data dependence
Extension of Instruction Set
These approaches are equally useful for hardware-intensive and software-intensive schemes, i.e., for dynamic as well as static scheduling
As discussed earlier, predicate registers are included in the structure of the IA-64 processor to implement predicated instructions and improve performance
Conditional Instructions
Now let us discuss the concept behind introducing conditional instructions in the instruction set:
– The conditional instructions have an extra operand: a one-bit predicate register
– A condition is evaluated as part of instruction execution to set the value of the predicate register
Conditional or predicated Instructions
– In HPL-PD from HP Labs, the value of a predicate register is typically set by a compare-to-predicate operation:
  p1 = CMPP <= r1, r2
  Here, the predicate register p1 is set if r2 <= r1
Conditional or predicated Instructions
– If the condition is true (p1 = 1), the instruction is executed normally
– If the condition is false (p1 = 0), execution continues as if the instruction were a no-operation (NOP)
Conditional Instructions
Typical conditional instructions for pipelined processors are:
Conditional Move – CMOVZ R1, R2, R3
moves the value of R2 into R1 if the condition is true, i.e., if the third operand R3, which acts as the condition, is zero
Such instructions are used to eliminate short branch code sequences
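As a rough illustration, the C sketch below models the effect of CMOVZ R1, R2, R3 on a register file. The `RegFile` type, its 32-register size and the `cmovz` function are hypothetical names for this sketch, and R0 is assumed to be hard-wired to zero as in MIPS-style ISAs. In hardware the move is a single instruction with no branch; the C `if` only models its outcome.

```c
#include <stdint.h>

/* Hypothetical register file: 32 general-purpose registers,
   with R0 assumed hard-wired to zero (MIPS-style). */
typedef struct {
    int32_t r[32];
} RegFile;

/* Models CMOVZ rd, rs, rt: if the third operand (rt) is zero,
   copy rs into rd; otherwise rd keeps its old value.
   The C `if` only models the result; the hardware instruction
   itself involves no branch. */
static void cmovz(RegFile *rf, int rd, int rs, int rt)
{
    if (rf->r[rt] == 0 && rd != 0) {   /* writes to R0 are discarded */
        rf->r[rd] = rf->r[rs];
    }
}
```

For example, with R3 = 0 a call `cmovz(&rf, 1, 2, 3)` copies R2 into R1; with any non-zero R3, R1 is left unchanged.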
Conditional Instructions
Conditional ADD – (R8) ADD R1, R2, R3
performs R1 = R2 + R3 only if the predicate register R8 is 1
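Predicated execution, as described in the preceding slides, works in two steps. A compare-to-predicate operation sets a one-bit predicate, and a guarded instruction then either executes normally or behaves as a no-op. It can be modeled in C roughly as follows. This is a minimal sketch, not the exact HPL-PD or IA-64 semantics: `PredMachine`, `cmpp_le`, `pred_add` and the register-file sizes are hypothetical. `cmpp_le` simply follows the comparison stated for CMPP earlier in this lecture (p1 is set if r2 <= r1).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical machine state: 32 integer registers and
   64 one-bit predicate registers (sizes are assumptions). */
typedef struct {
    int32_t r[32];
    bool    p[64];
} PredMachine;

/* p[pd] = CMPP <= r[ra], r[rb]
   Follows the comparison stated on the slide: the predicate
   is set if r[rb] <= r[ra]. */
static void cmpp_le(PredMachine *m, int pd, int ra, int rb)
{
    m->p[pd] = (m->r[rb] <= m->r[ra]);
}

/* (p[pg]) ADD rd, rs, rt
   If the guarding predicate is 1, rd = rs + rt as usual.
   If it is 0, the instruction is nullified (acts as a NOP)
   and rd keeps its previous value. */
static void pred_add(PredMachine *m, int pg, int rd, int rs, int rt)
{
    if (m->p[pg]) {
        m->r[rd] = m->r[rs] + m->r[rt];
    }
}
```

For the slide's (R8) ADD R1, R2, R3, one would set `m.p[8]` and call `pred_add(&m, 8, 1, 2, 3)`. Keeping predicate 8 in a separate predicate file is itself an assumption of this sketch.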
Conditional Instructions
Conditional Load – LWC R1, 0(R2), R3
performs the load unless the third operand R3 is zero
The LW instruction, or a short block of code, following the branch can be converted to LWC and moved up into the second issue slot, improving the execution time by several cycles
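The C sketch below models LWC R1, 0(R2), R3. It also shows why moving a load above a branch needs care: a conditional load that is nullified must not touch memory or raise an exception. `LoadMachine`, `MEM_WORDS` and `lwc` are hypothetical names. Memory is assumed to be word-addressed for simplicity, whereas real LW/LWC use byte addresses. A fault is modeled by returning `false` instead of trapping.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEM_WORDS 1024   /* hypothetical memory size, in words */

/* Hypothetical machine state with word-addressed memory
   (an assumption: real LW/LWC use byte addresses). */
typedef struct {
    int32_t r[32];
    int32_t mem[MEM_WORDS];
} LoadMachine;

/* Models LWC rd, offset(rb), rc: load mem[r[rb] + offset] into rd
   unless the third operand r[rc] is zero.
   Returns false to model a memory fault on an out-of-range address. */
static bool lwc(LoadMachine *m, int rd, int offset, int rb, int rc)
{
    if (m->r[rc] == 0) {
        return true;        /* nullified: no memory access, so no fault */
    }
    int64_t addr = (int64_t)m->r[rb] + offset;
    if (addr < 0 || addr >= MEM_WORDS) {
        return false;       /* an executed load here would raise an exception */
    }
    m->r[rd] = m->mem[addr];
    return true;
}
```

The early return is the key design point. With R3 = 0, even an invalid address in R2 is harmless, which is what lets the compiler hoist the load into an earlier issue slot.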
Conditional or predicated Instructions:
Example 1
Let us consider the conditional statement:
if (A == 0) { S = T; }
i.e., S is to be replaced by T if A is zero
Assume that registers R1, R2 and R3 hold the values of A, S and T, respectively.
Example 1
The code to implement this conditional statement can be written as:
      BNEZ   R1, L         ; skip the assignment if A (R1) != 0
      ADDU   R2, R3, R0    ; otherwise, replace S (R2) by T (R3)
L:
The IF statement can also be implemented with a conditional move:
      CMOVZ  R2, R3, R1    ; move R3 to R2 if the third operand R1 = 0
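To make the if-conversion in Example 1 concrete, the C sketch below writes the same assignment twice. The first version uses a branch, like BNEZ/ADDU. The second is branch-free, like CMOVZ: the condition A == 0 is turned into an all-ones or all-zeros mask that selects T or S. The function names are hypothetical. A C compiler is free to emit either a branch or a conditional move for either version, so this illustrates the data-dependence idea rather than guaranteed machine code.

```c
#include <stdint.h>

/* Branching form, mirroring BNEZ/ADDU: S is overwritten only if A == 0. */
static int32_t assign_with_branch(int32_t a, int32_t s, int32_t t)
{
    if (a == 0) {
        s = t;
    }
    return s;
}

/* Branch-free form, mirroring CMOVZ R2, R3, R1: the condition becomes a
   data value (a mask) that selects between T and S, so the result now
   depends on A as data instead of through a control dependence. */
static int32_t assign_if_converted(int32_t a, int32_t s, int32_t t)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(a == 0); /* all ones if a == 0, else 0 */
    uint32_t us = (uint32_t)s;
    uint32_t ut = (uint32_t)t;
    return (int32_t)((ut & mask) | (us & ~mask));
}
```

Both functions return the same value for every input. The difference is only in how the dependence on A is expressed.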
Conditional or predicated Instructions
Here, notice that with the conditional instruction CMOVZ, whether the move takes effect is determined by the contents of the third register rather than by a branch
i.e., the control dependence has been converted into a data dependence
Conditional or predicated Instructions
This transformation moves the point at which the dependence is resolved in a pipelined processor
In a pipelined processor, the dependence for a branch is resolved near the front of the pipeline,
whereas a conditional instruction resolves the dependence where the register write occurs
Conditional or predicated Instructions
This transformation is also used in vector computers, where it is called if-conversion
If-conversion replaces conditional branches with predicated operations
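The sketch below applies Example 1 element-wise, which is roughly how if-conversion looks on a vector machine. A compare produces a mask vector, and a masked move updates only the selected elements, so every element goes through the same straight-line code. The loop, the array names and the separate `mask` buffer are illustrative assumptions: scalar C stands in for vector instructions, and this is not the code generated in the examples that follow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop: for each i, if (a[i] == 0) s[i] = t[i];
   i.e., Example 1 applied element-wise to arrays. */
static void loop_with_branch(const int32_t *a, int32_t *s,
                             const int32_t *t, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] == 0) {
            s[i] = t[i];
        }
    }
}

/* If-converted form: first compute a mask vector from the condition,
   then perform a masked (predicated) update of s. */
static void loop_if_converted(const int32_t *a, int32_t *s,
                              const int32_t *t, uint8_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        mask[i] = (uint8_t)(a[i] == 0);    /* models a vector compare setting the mask */
    }
    for (size_t i = 0; i < n; i++) {
        s[i] = mask[i] ? t[i] : s[i];      /* models a masked move: keep s[i] where mask is 0 */
    }
}
```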
For example, let us see the code generated for the following two if-then-else statements: