CS 704
Advanced Computer Architecture
Lecture 13
Instruction Level Parallelism
(Dynamic Scheduling - Scoreboard Approach)
Prof. Dr. M. Ashraf Chughtai
Today's Topics
Recap - Lecture 11-12
Out-of-Order Execution
Dynamic Scheduling
Scoreboard Technique
Summary
MAC/VU-Advanced
Computer Architecture
Lecture 13 – Instruction Level
Parallelism -Dynamic (2)
2
Recap: Lecture 12
- FP and Integer Multiplier
- FP and Integer Divider
Here, we observed that:
- Only one instruction is issued on every clock cycle
- Integer ADD instructions pass through the FP pipeline just as
they do through the standard pipeline, because integer
ALU operations have zero latency
- FP add and FP/integer multiply and divide
instructions loop in the EX stage because of the
longer latencies of these operations, which
increases the number of stalls before a following instruction can be
issued to the EX stage
Recap: Lecture 12
RAW and WAW hazards may occur because the
instructions have varying latencies and may reach
WB out of order
There are different ways to handle a RAW hazard:
Recap: Lecture 12
WAW hazard
(The jth instruction writes before the ith
instruction, which precedes it in program order;
the ith instruction then overwrites
the result of the jth instruction)
Recap: Lecture 12
Two ways to resolve a WAW hazard:
- Delay the issue of the jth instruction until the
ith instruction enters the MEM stage
- Stamp out the result of the ith instruction by detecting
the hazard and changing the control
(WB) so that the ith instruction does not
write.
In this case, the jth instruction can be issued
right away.
Today's Topics
Out-of-Order Execution
Problems of Out-of-order
execution
Dynamic Scheduling
Scoreboard Technique
Summary
In-Order Execution
A simple pipelined datapath supports
only in-order instruction
execution, i.e.,
- instructions are fetched, decoded and issued in
program order, and
- no later instruction can proceed while an earlier
instruction is stalled by a hazard
(a structural hazard or a data dependence)
In-order Execution … Cont’d
For example: in the code
DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F12, F8, F14
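The dependences in this three-instruction example can be checked with a short sketch. The tuple encoding and the helper name `raw_producers` are assumptions made for illustration; they are not part of the lecture. The sketch lists, for each instruction, the earlier instructions whose result it reads.

```python
# Each instruction is encoded as (opcode, destination, source1, source2).
# This encoding is an illustrative assumption, not part of the lecture.
program = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F10", "F0", "F8"),
    ("SUB.D", "F12", "F8", "F14"),
]


def raw_producers(program, index):
    """Return indices of earlier instructions whose result is read by program[index]."""
    _, _, *sources = program[index]
    return [i for i in range(index) if program[i][1] in sources]


for idx, instr in enumerate(program):
    producers = [program[i][0] for i in raw_producers(program, idx)]
    print(instr[0], "reads results of", producers)
```

ADD.D reads F0, which DIV.D produces, so it must wait. SUB.D reads no result of either earlier instruction, yet an in-order pipeline still holds it behind the stalled ADD.D.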
In-order Execution … Cont’d
Conclusion
In-order Execution: MIPS 5-stage Pipeline
In the MIPS 5-stage pipeline, both structural and data
hazards are checked during the Instruction Decode (ID)
stage, and
the instruction is issued from the ID stage if it can
execute properly
Here, the issue process at the ID stage is separated into
two parts:
- Checking for a structural hazard
- Waiting for the absence of a data hazard
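The two-part check can be sketched as a small function. This is a minimal illustration under stated assumptions: the opcode-to-unit mapping `UNIT_FOR_OP`, the function name `check_issue`, and the set-based bookkeeping are all hypothetical and only loosely follow the unit names used in this lecture.

```python
# Hypothetical mapping from opcode to the functional unit it needs (assumption).
UNIT_FOR_OP = {
    "DIV.D": "FP Divide",
    "ADD.D": "FP Adder",
    "SUB.D": "FP Adder",
    "MUL.D": "FP Mul",
}


def check_issue(instr, busy_units, pending_writes):
    """Decide whether an instruction can leave ID.

    instr          -- (opcode, destination, (source1, source2))
    busy_units     -- set of functional units currently occupied
    pending_writes -- set of registers still waiting to be written
    """
    op, _dest, sources = instr
    # Part 1: structural hazard check (is the required unit free?).
    if UNIT_FOR_OP[op] in busy_units:
        return "stall: structural hazard"
    # Part 2: data hazard check (is any source still being produced?).
    if any(src in pending_writes for src in sources):
        return "stall: data hazard"
    return "issue"


div = ("DIV.D", "F0", ("F2", "F4"))
add = ("ADD.D", "F10", ("F0", "F8"))
sub = ("SUB.D", "F12", ("F8", "F14"))

print(check_issue(div, busy_units=set(), pending_writes=set()))
print(check_issue(add, busy_units={"FP Divide"}, pending_writes={"F0"}))
print(check_issue(sub, busy_units={"FP Adder"}, pending_writes={"F0", "F10"}))
```

Keeping the two checks as separate steps makes it easy to see which kind of hazard is responsible for a stall.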
Out-of-order Execution: MIPS 5-stage pipeline
DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F12, F8, F14
Basic Problems of Out-of-order Execution
Consider the example FP code
DIV.D F0, F2, F4
ADD.D F6, F0, F8
SUB.D F8, F10, F14
MUL.D F6, F10, F8
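The hazards hidden in this code can be enumerated with a short sketch. The tuple encoding and the function name `find_hazards` are assumptions made for illustration. The check is a simple pairwise comparison of register names and ignores intervening writes, which is enough for a four-instruction example.

```python
# Each instruction is encoded as (opcode, destination, (source1, source2)).
# This encoding is an illustrative assumption, not part of the lecture.
program = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]


def find_hazards(program):
    """Return (kind, earlier_op, later_op, register) for every dependent pair."""
    hazards = []
    for i, (op_i, dst_i, src_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            op_j, dst_j, src_j = program[j]
            if dst_i in src_j:   # later instruction reads what the earlier one writes
                hazards.append(("RAW", op_i, op_j, dst_i))
            if dst_j in src_i:   # later instruction writes what the earlier one reads
                hazards.append(("WAR", op_i, op_j, dst_j))
            if dst_i == dst_j:   # both instructions write the same register
                hazards.append(("WAW", op_i, op_j, dst_i))
    return hazards


for hazard in find_hazards(program):
    print(hazard)
```

With this encoding the sketch reports a RAW dependence on F0 (DIV.D to ADD.D), a WAR dependence on F8 (ADD.D to SUB.D), a WAW dependence on F6 (ADD.D to MUL.D), and a RAW dependence on F8 (SUB.D to MUL.D).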
Example Explained: RAW Hazard
Example Explained: WAW hazard
Example Explained: WAW hazard
Scheduling for out-of-order execution
Static Scheduling:
Rearrangement of the instruction
execution by the compiler
Dynamic Scheduling:
Rearrangement of the instruction
execution by the hardware
Dynamic Scheduling
- Issue:
- Read Operand:
Dynamic Scheduling
Dynamic Scheduling: Scoreboarding Technique
CDC 6600 contains:
- 4 FP units
- 5 Memory Reference Units
- 7 integer operation units
MIPS Processor with Scoreboard
[Figure: Registers connected by data buses to the functional units (FP Mul, FP Mul, FP Divide, FP Adder, Integer Unit); Scoreboard with control/status lines]
Features of Scoreboard
The Scoreboard :
Components of Scoreboard
Instruction Status
Functional Unit Status
Register Result Status
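The three components can be pictured as three small tables. The sketch below is one possible layout, not the definitive structure: the class names, field names, unit names, the four progress flags in `InstructionStatus`, and the helper `record_issue` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InstructionStatus:
    """Progress of one instruction (the four flags are an assumed breakdown)."""
    issued: bool = False
    operands_read: bool = False
    execution_complete: bool = False
    result_written: bool = False


@dataclass
class FunctionalUnitStatus:
    """State of one functional unit."""
    busy: bool = False
    op: Optional[str] = None
    dest: Optional[str] = None
    sources: List[str] = field(default_factory=list)


@dataclass
class Scoreboard:
    instruction_status: List[InstructionStatus]
    unit_status: Dict[str, FunctionalUnitStatus]
    # Maps each register to the unit that will write it, or None if no write is pending.
    register_result: Dict[str, Optional[str]]


def make_scoreboard(unit_names, registers, num_instructions):
    """Build an empty scoreboard with all units idle and no pending writes."""
    return Scoreboard(
        instruction_status=[InstructionStatus() for _ in range(num_instructions)],
        unit_status={name: FunctionalUnitStatus() for name in unit_names},
        register_result={reg: None for reg in registers},
    )


def record_issue(sb, index, unit, op, dest, sources):
    """Update all three tables when instruction `index` is issued to `unit`."""
    sb.instruction_status[index].issued = True
    sb.unit_status[unit] = FunctionalUnitStatus(True, op, dest, list(sources))
    sb.register_result[dest] = unit


# Hypothetical unit names loosely matching the units named in this lecture.
units = ["Integer", "Mult1", "Mult2", "Add", "Divide"]
registers = [f"F{n}" for n in range(0, 32, 2)]
sb = make_scoreboard(units, registers, num_instructions=3)
record_issue(sb, 0, "Divide", "DIV.D", "F0", ("F2", "F4"))
print(sb.register_result["F0"])
```

After the DIV.D is recorded, the register result table shows that F0 is owed by the divide unit, which is the information a later instruction reading F0 would need in order to wait.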