
Advanced Computer Architecture - Lecture 29: Memory hierarchy design


CS 704
Advanced Computer Architecture

Lecture 29
Memory Hierarchy Design
Cache Performance Enhancement by:
Reducing Cache Miss Penalty

Prof. Dr. M. Ashraf Chughtai


Today’s Topics
Recap: Cache Design
Cache Performance
Reducing Miss Penalty
Summary

MAC/VU-Advanced
Computer Architecture

Lecture 29 Memory Hierarchy (5)

2


Recap: Memory Hierarchy Designer’s Concerns
Block placement: Where can a block be placed
in the upper level?

Block identification: How is a block found if it is
in the upper level?



Block replacement: Which block should be
replaced on a miss?

Write strategy: What happens on a write?



Recap: Write Buffer for Write Through
Cache write strategies:
– write back
– write through
– use of a write buffer



Recap: Write Buffer for Write Through
A level-2 cache is introduced between the
level-1 cache and the DRAM main memory
– Write Allocate and
– No-Write Allocate



Recap: Write Miss Policies
Write Allocate:
– A block is allocated in the cache on a
write miss, i.e., the block to be written is
brought into the cache and written there
No-Write Allocate:
– The block stays out of the cache until the
program tries to read it; i.e., on a write miss
the block is modified only in the lower-level
memory
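
The two write-miss policies can be compared with a small sketch. It is only an illustration: the cache is modelled as an unbounded set of resident block addresses (so nothing is ever evicted), and the function name `count_hits_misses` and the access sequence are assumptions, not part of the lecture.

```python
def count_hits_misses(ops, write_allocate):
    """Count hits and misses for a list of (kind, block_address) accesses.

    Assumption: the cache is an unbounded set of resident blocks,
    so there are no evictions.
    """
    resident = set()
    hits = misses = 0
    for kind, block in ops:
        if block in resident:
            hits += 1
            continue
        misses += 1
        # A read miss always brings the block into the cache; a write miss
        # does so only under the write-allocate policy.
        if kind == "read" or write_allocate:
            resident.add(block)
    return hits, misses


# Hypothetical access sequence on two blocks, 100 and 200.
ops = [("write", 100), ("write", 100), ("read", 200),
       ("write", 200), ("write", 100)]

print(count_hits_misses(ops, write_allocate=True))   # (3, 2)
print(count_hits_misses(ops, write_allocate=False))  # (1, 4)
```

With write allocate, the first write to block 100 brings it in, so the later writes hit. With no-write allocate, block 100 never enters the cache because it is never read.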


Impact of Caches on CPU Performance
CPU Execution Time equation:

CPU (ex-Time) =
(CPU Execution clock cycles + Memory Stall cycles) x Clock Cycle Time


Impact of Caches on CPU Performance:
Example
Assumptions:
– the cache miss penalty is 100 clock cycles
– all instructions normally take 1 clock cycle
– average miss rate is 2%
– average memory references per instruction = 1.5
– average number of cache misses per 1000 instructions = 30

Find the impact of the cache on CPU performance,
considering both the misses per instruction and the
miss rate


Impact of Caches on CPU Performance:
Example

CPU Time =
(CPU Execution clock cycles + Memory Stall cycles) x Clock Cycle Time

CPU Time with cache (using misses per instruction)
= IC x (1.0 + (30/1000 x 100)) x clock cycle time
= IC x 4.00 x clock cycle time

CPU Time with cache (using miss rate)
= IC x (1.0 + (1.5 x 2% x 100)) x clock cycle time
= IC x 4.00 x clock cycle time

Both methods agree: memory stalls raise the effective CPI from 1.0 to 4.0
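
The same arithmetic can be checked with a short script. This is a sketch using only the assumptions listed in the example; the variable names are illustrative.

```python
# Assumptions taken from the example above.
MISS_PENALTY = 100           # clock cycles
BASE_CPI = 1.0               # every instruction normally takes 1 cycle
MISS_RATE = 0.02             # 2% of memory references miss
MEM_REFS_PER_INST = 1.5
MISSES_PER_1000_INST = 30

# Method 1: memory stall cycles per instruction from misses per instruction.
stalls_from_misses_per_inst = (MISSES_PER_1000_INST / 1000) * MISS_PENALTY

# Method 2: memory stall cycles per instruction from the miss rate.
stalls_from_miss_rate = MEM_REFS_PER_INST * MISS_RATE * MISS_PENALTY

cpi_method_1 = BASE_CPI + stalls_from_misses_per_inst
cpi_method_2 = BASE_CPI + stalls_from_miss_rate

# CPU time = IC x CPI x clock cycle time, so the CPI is the multiplier on IC.
print(f"CPI (misses per instruction): {cpi_method_1:.2f}")
print(f"CPI (miss rate):              {cpi_method_2:.2f}")
```

The two results match because 1.5 references per instruction at a 2% miss rate is the same as 30 misses per 1000 instructions.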


Cache Performance (Review)
­ Numbe r o f Mis s e s  o r mis s  rate
­ Co s t pe r Mis s  o r mis s  pe nalty
 Me mo ry s tall c lo c k c yc le s  e qual to  the  s um o f 
 IC x  Re ads  pe r ins t.   x Re ad mis s  rate  x  Re ad Mis s  

Pe nalty ; and 
 IC x  write s  pe r ins t.  x  Write  Mis s  Rate  x  Write  
Mis s  Pe nalty


Cache Performance (Review)
Number of reads x read miss rate x read miss penalty +
Number of writes x write miss rate x write miss penalty

Averaging the read and write miss rates:
Memory stall clock cycles =
Number of memory accesses x Miss rate x Miss penalty

Average Memory Access Time =
Hit Time + Miss rate x Miss penalty
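
As a minimal sketch, the review formulas can be written as three small functions. The function names and the sample numbers (instruction count, reads and writes per instruction) are hypothetical, chosen so that reads and writes share one miss rate and one miss penalty; under that assumption the separate and the averaged forms give the same stall count.

```python
def memory_stall_cycles(ic, reads_per_inst, read_miss_rate, read_penalty,
                        writes_per_inst, write_miss_rate, write_penalty):
    """Stall cycles with reads and writes accounted for separately."""
    read_stalls = ic * reads_per_inst * read_miss_rate * read_penalty
    write_stalls = ic * writes_per_inst * write_miss_rate * write_penalty
    return read_stalls + write_stalls


def memory_stall_cycles_avg(ic, accesses_per_inst, miss_rate, miss_penalty):
    """Stall cycles using one averaged miss rate and miss penalty."""
    return ic * accesses_per_inst * miss_rate * miss_penalty


def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical workload: 1,000,000 instructions, 1.2 reads and 0.3 writes
# per instruction, 2% miss rate and 100-cycle penalty for both.
separate = memory_stall_cycles(1_000_000, 1.2, 0.02, 100, 0.3, 0.02, 100)
averaged = memory_stall_cycles_avg(1_000_000, 1.5, 0.02, 100)
print(round(separate), round(averaged))
print(round(amat(1, 0.02, 100), 2))
```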


Cache Performance (Review)
Note that the average memory access time
is an indirect measure of CPU
performance and is not a substitute for
execution time
However, this formula can help decide between
split caches (i.e., instruction cache and data
cache) and a unified cache
E.g., to find out which of these two cache
organizations has the lower miss rate, we
can use this formula as follows:



Cache Performance: Example
Statement: Consider a 32KB unified
cache with 43.3 misses per 1000 instructions,
and split instruction/data caches
of 16KB each, with 3.82 instruction cache misses
per 1000 instructions and 40.9 data cache misses
per 1000 instructions.
Assume that
– 36% of the instructions are data transfer
instructions;
– 74% of memory references are instruction
references; and


Cache Performance: Example
– a hit takes 1 clock cycle and the miss
penalty is 100 cycles; and
– a load or store takes one extra cycle on the
unified cache
Assume write-through caches with a write buffer,
and ignore stalls due to the write buffer.
Find the average memory access time in
each case
Note: to solve this problem we first find the miss
rate and then the average memory access time



Cache Performance: Solution
1: Miss Rate
= (Misses per 1000 instructions / 1000) / (Memory accesses per instruction)
Miss Rate 16KB inst = (3.82/1000) / 1.0 = 0.0038

Miss Rate 16KB data = (40.9/1000) / 0.36 = 0.114

As about 74% of the memory accesses are
instruction references, the overall miss rate for the
split caches = (74% x 0.0038) + (26% x 0.114)
= 0.0324
Miss Rate 32KB unified = (43.3/1000) / (1 + 0.36) = 0.0318
i.e., the unified cache has a slightly lower miss rate



Cache Performance: Solution
2: Average Memory Access Time
= %inst x (Hit time + inst. miss rate x miss penalty)
+ %data x (Hit time + data miss rate x miss penalty)
Average Memory Access Time split =
74% x (1 + 0.0038 x 100) + 26% x (1 + 0.114 x 100) = 4.24

Average Memory Access Time unified =
74% x (1 + 0.0318 x 100) + 26% x (1 + 1 + 0.0318 x 100) = 4.44

i.e., the split caches have a slightly better
average access time and also avoid
structural hazards
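
Both steps of the solution can be reproduced with a short script. This is a sketch using the numbers from the problem statement; the helper `miss_rate` and the variable names are illustrative. Miss rates are kept unrounded here, so intermediate values can differ from the slide in the last digit.

```python
HIT_TIME = 1                 # clock cycles
MISS_PENALTY = 100           # clock cycles
DATA_REFS_PER_INST = 0.36    # 36% of instructions are loads or stores
INST_FRACTION = 0.74         # share of memory references that are fetches
DATA_FRACTION = 1 - INST_FRACTION


def miss_rate(misses_per_1000_inst, accesses_per_inst):
    """Convert misses per 1000 instructions into misses per access."""
    return (misses_per_1000_inst / 1000) / accesses_per_inst


# Step 1: miss rates.
inst_mr = miss_rate(3.82, 1.0)                        # 16KB instruction cache
data_mr = miss_rate(40.9, DATA_REFS_PER_INST)         # 16KB data cache
unified_mr = miss_rate(43.3, 1.0 + DATA_REFS_PER_INST)  # 32KB unified cache
split_mr = INST_FRACTION * inst_mr + DATA_FRACTION * data_mr

# Step 2: average memory access time. A load or store pays one extra
# cycle on the unified cache.
amat_split = (INST_FRACTION * (HIT_TIME + inst_mr * MISS_PENALTY)
              + DATA_FRACTION * (HIT_TIME + data_mr * MISS_PENALTY))
amat_unified = (INST_FRACTION * (HIT_TIME + unified_mr * MISS_PENALTY)
                + DATA_FRACTION * (HIT_TIME + 1 + unified_mr * MISS_PENALTY))

print(f"miss rate: split {split_mr:.4f}, unified {unified_mr:.4f}")
print(f"AMAT:      split {amat_split:.2f}, unified {amat_unified:.2f}")
```

The unified cache has the lower miss rate, yet the split caches have the lower average access time because data accesses do not pay the extra cycle.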


Improving Cache Performance
The average memory access time gives a
framework for optimizing cache
performance
The average memory access time formula:
Average Memory Access Time =
Hit Time + Miss Rate x Miss Penalty




Four General Options
1. Reduce the miss penalty,

2. Reduce the miss rate,
3. Reduce miss Penalty or miss rate
via Parallelism
4. Reduce the time to hit in the cache



Reducing Miss Penalty
1. Multilevel Caches
2. Critical Word first and Early Restart
3. Priority to Read Misses Over write
Misses
4. Merging Write Buffers

5. Victim Caches


1: Multilevel Caches (to reduce Miss Penalty)
This technique ignores the CPU and
concentrates on the interface between the
cache and main memory
Multiple levels of caches are used
Tradeoff between cache size (cache
effectiveness) and cost (access time): a
small, fast memory is used as the level-1
cache


1: Multilevel Caches (Performance Analysis)
Average access Time is:
Access Time average
= Hit Time L1 + Miss Rate L1 x Miss Penalty L1

Where, Miss Penalty L1

= Hit Time L2 + Miss Rate L2 x Miss Penalty L2
Therefore,
The Average memory access time
= Hit Time L1 + Miss Rate L1 x (Hit Time L2 +
Miss Rate L2 x Miss Penalty L2 )
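
The two-level formula can be sketched as a function. The function name and the sample numbers are hypothetical and are not taken from the lecture. Note that Miss Rate L2 in this formula is the local miss rate of the L2 cache, i.e., measured against the accesses that reach L2.

```python
def amat_two_level(hit_time_l1, miss_rate_l1,
                   hit_time_l2, local_miss_rate_l2, miss_penalty_l2):
    """Average memory access time for a two-level cache hierarchy."""
    # The L1 miss penalty is the average time to service a miss from L2.
    miss_penalty_l1 = hit_time_l2 + local_miss_rate_l2 * miss_penalty_l2
    return hit_time_l1 + miss_rate_l1 * miss_penalty_l1


# Hypothetical numbers: L1 hit = 1 cycle, L1 miss rate = 4%,
# L2 hit = 10 cycles, L2 local miss rate = 50%, L2 miss penalty = 100 cycles.
two_level = amat_two_level(1, 0.04, 10, 0.5, 100)

# Same L1 with no L2: every L1 miss pays the full 100-cycle penalty.
one_level = 1 + 0.04 * 100

print(f"{two_level:.2f} cycles with L2, {one_level:.2f} cycles without")
```

With these assumed numbers the L2 cache lowers the L1 miss penalty from 100 to 60 cycles, which is the effect the technique aims for.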


1: Multilevel Caches (Performance Analysis)
Average memory stalls per instruction =
Misses per instruction L1 x Hit Time L2 +
Misses per instruction L2 x Miss Penalty L2
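
A minimal sketch of the stalls-per-instruction formula follows. The sample numbers are hypothetical: 1.5 memory references per instruction, a 4% L1 miss rate, a 50% L2 local miss rate, a 10-cycle L2 hit time and a 100-cycle L2 miss penalty.

```python
def stalls_per_instruction(misses_per_inst_l1, hit_time_l2,
                           misses_per_inst_l2, miss_penalty_l2):
    """Average memory stall cycles per instruction for two cache levels."""
    return (misses_per_inst_l1 * hit_time_l2
            + misses_per_inst_l2 * miss_penalty_l2)


MEM_REFS_PER_INST = 1.5      # hypothetical
MISS_RATE_L1 = 0.04          # hypothetical
LOCAL_MISS_RATE_L2 = 0.5     # hypothetical

# Every L1 miss goes to L2; only the L2 misses go on to main memory.
misses_per_inst_l1 = MEM_REFS_PER_INST * MISS_RATE_L1
misses_per_inst_l2 = misses_per_inst_l1 * LOCAL_MISS_RATE_L2

stalls = stalls_per_instruction(misses_per_inst_l1, 10,
                                misses_per_inst_l2, 100)
print(f"{stalls:.2f} stall cycles per instruction")
```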



1: Multilevel Caches (to reduce Miss Penalty)


Local miss rate
Global miss rate



1: Multilevel Caches (to reduce Miss Penalty)
Local Miss Rate:
The number of misses in a cache divided by the
total number of memory accesses to this cache.
Global Miss Rate:
The number of misses in the
cache divided by the total number of
memory accesses generated by the CPU



1: Multilevel Caches (to reduce Miss Penalty)
Global miss rate

– 1st level cache = Miss Rate L1
– 2nd level cache = Miss Rate L1 x Miss Rate L2
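
The relation between local and global miss rates can be illustrated from raw counts. The counts below are hypothetical: out of 1000 CPU memory references, 40 miss in L1 and 20 of those also miss in L2.

```python
# Hypothetical access counts.
CPU_REFERENCES = 1000    # memory accesses generated by the CPU
L1_MISSES = 40           # these become the accesses seen by L2
L2_MISSES = 20

# Local miss rate: misses in a cache / accesses to that cache.
local_l1 = L1_MISSES / CPU_REFERENCES
local_l2 = L2_MISSES / L1_MISSES

# Global miss rate: misses in a cache / accesses generated by the CPU.
global_l1 = L1_MISSES / CPU_REFERENCES
global_l2 = L2_MISSES / CPU_REFERENCES

print(f"L1: local {local_l1:.2f}, global {global_l1:.2f}")
print(f"L2: local {local_l2:.2f}, global {global_l2:.2f}")
```

For L1 the local and global miss rates are the same. For L2 the global miss rate equals Miss Rate L1 x Miss Rate L2, which in this example is much smaller than the local L2 miss rate.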
