Advanced Computer Architecture - Lecture 22: Instruction level parallelism

Part b) The unrolled and scheduled code for the transformed code. The loop body takes 10 cycles.
Label   Integer inst          FP inst            Clock cycle
foo:    L.D    F0,0(R1)                          1
        L.D    F6,-8(R1)                         2
        L.D    F4,0(R2)                          3
        L.D    F8,-8(R2)                         4
        DADDUI R1,R1,#-16     MUL.D F0,F0,F4     5
        DADDUI R2,R2,#-16     MUL.D F6,F6,F8     6
        stall                                    7
        stall                                    8
        BNEZ   R1,foo         ADD.D F2,F0,F2     9
                              ADD.D F2,F0,F2     10
        ..                    ..
Bar:                          ADD.D F2,F0,F2     14

MAC/VU-Advanced Computer Architecture
Lecture 22 – Instruction Level Parallelism-Static (3)


Problem # 3
Consider the code
for (i=2; i<=100; i+=2)
    a[i] = a[50*i+1];
Normalize the loop so that the index starts at 1 and is incremented by 1 on every iteration.
Write the normalized version of the loop, then use the GCD test to see if there is a dependence.
Numerical problems
Solution
Normalizing the loop leads to the modified C code shown below:
for (i=1; i<=50; i++)                /* i is divided by 2 */
{
    a[2*i] = a[(100*i)+1];           /* constants are multiplied by 2 */
}
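The normalization can be checked mechanically. The sketch below walks the original loop and the normalized loop side by side and compares the store and load indices of each iteration. The function name `normalization_preserves_accesses` is hypothetical, introduced only for this illustration.

```c
/*
 * Returns 1 if the normalized loop (index 1..50, step 1) touches the same
 * (store, load) index pairs, in the same order, as the original loop
 * (index 2..100, step 2); returns 0 otherwise.
 * The function name is an assumption made for this sketch.
 */
int normalization_preserves_accesses(void)
{
    int n = 1;                                  /* normalized index */
    for (int i = 2; i <= 100; i += 2, n++) {
        int orig_store = i;                     /* a[i]        */
        int orig_load  = 50 * i + 1;            /* a[50*i+1]   */
        int norm_store = 2 * n;                 /* a[2*n]      */
        int norm_load  = 100 * n + 1;           /* a[100*n+1]  */
        if (orig_store != norm_store || orig_load != norm_load)
            return 0;
    }
    return n - 1 == 50;                         /* 50 iterations in total */
}
```

Both loops run 50 times and produce identical index pairs, so the function returns 1.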
The GCD test indicates a potential dependence between two references to the same array whose indices are the functions
a*i + b and
c*j + d
if the condition (d - b) mod gcd(c, a) = 0 is satisfied.
If the condition is not satisfied, no dependence is possible.
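The condition can be written as a small C routine. This is a minimal sketch, not part of the lecture: the function names `gcd` and `gcd_test_may_depend` are assumptions, and the parameters a, b, c, d are the index coefficients from the condition (not the array a).

```c
#include <assert.h>
#include <stdlib.h>

/* Greatest common divisor of |x| and |y|, by Euclid's algorithm. */
int gcd(int x, int y)
{
    x = abs(x);
    y = abs(y);
    while (y != 0) {
        int r = x % y;
        x = y;
        y = r;
    }
    return x;
}

/*
 * GCD test for a store to index a*i + b and a load from index c*j + d.
 * Returns 1 if a dependence is possible, 0 if the test rules it out.
 * Hypothetical helper written for this sketch.
 */
int gcd_test_may_depend(int a, int b, int c, int d)
{
    int g = gcd(c, a);
    assert(g != 0);               /* at least one coefficient must be nonzero */
    return (d - b) % g == 0;
}
```

For the normalized loop a[2*i] = a[100*i+1], the call gcd_test_may_depend(2, 0, 100, 1) returns 0, because 1 % 2 is 1. A return value of 1 would only mean that a dependence is not ruled out, since the test ignores the loop bounds.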

Applying the GCD test to this loop, the store index 2*i and the load index 100*i + 1 give
a = 2, b = 0, c = 100, d = 1.
Thus gcd(c, a) = gcd(100, 2) = 2
and
d - b = 1.
Here 1 mod 2 = 1, which is not 0, because 2 does not divide 1.
Thus, the GCD test indicates that there is no dependence in the code.
This agrees with the actual behavior of the loop, since it loads its values from
a[101], a[201], ..., a[5001]
and assigns these values to
a[2], a[4], ..., a[100].
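The actual behavior of the loop can also be confirmed by enumeration. The sketch below marks every element the normalized loop stores to and then counts how many loaded elements are among them. The name `count_shared_elements` and the constant `MAX_INDEX` are hypothetical, chosen only for this illustration.

```c
#include <string.h>

#define MAX_INDEX 5001   /* largest index the loop reads: 100*50 + 1 */

/*
 * Counts elements that are both stored to and loaded from by the
 * normalized loop  a[2*i] = a[100*i + 1],  i = 1..50.
 * A count of 0 means no iteration reads an element that any iteration writes.
 */
int count_shared_elements(void)
{
    static unsigned char written[MAX_INDEX + 1];
    int shared = 0;

    memset(written, 0, sizeof written);
    for (int i = 1; i <= 50; i++)
        written[2 * i] = 1;                 /* stores: a[2], a[4], ..., a[100] */
    for (int j = 1; j <= 50; j++)
        if (written[100 * j + 1])           /* loads: a[101], a[201], ..., a[5001] */
            shared++;
    return shared;
}
```

The function returns 0: every store index is even and at most 100, while every load index is odd and at least 101.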
Assalam-u-Alaikum
and
Allah Hafiz