CS 704
Advanced Computer Architecture
Lecture 29
Memory Hierarchy Design
Cache Performance Enhancement by:
Reducing Cache Miss Penalty
Prof. Dr. M. Ashraf Chughtai
Today’s Topics
Recap: Cache Design
Cache Performance
Reducing Miss Penalty
Recap: Memory Hierarchy Designer’s Concerns
Block placement: Where can a block be placed
in the upper level?
Block identification: How is a block found if it is
in the upper level?
Block replacement: Which block should be
replaced on a miss?
Write strategy: What happens on a write?
Recap: Write Buffer for Write Through
cache write strategies
– write back
– write through
use of write-buffer
Recap: Write Buffer for Write Through
A level-2 cache is introduced between the
level-1 cache and the DRAM main memory
Write Allocate and
No-Write Allocate
Recap: Write Miss Policies
Write Allocate:
– A block is allocated in the cache on a
write miss, i.e., the block to be written is
available in the cache
No-Write Allocate:
– The blocks stay out of the cache until the
program tries to read the blocks; i.e., the
block is modified only in the lower level
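The difference between the two write-miss policies can be illustrated with a minimal Python sketch. It assumes an idealized cache modeled as an unbounded set of block addresses (no replacement); the access sequence and the name count_misses are illustrative and not taken from the lecture.

```python
def count_misses(ops, write_allocate):
    """Count misses for a list of (op, block) pairs.

    The cache is modeled as an unbounded set of block addresses
    (an assumption made to keep the sketch short).
    """
    cached = set()
    misses = 0
    for op, block in ops:
        if block in cached:
            continue  # hit: nothing to allocate
        misses += 1
        # A read miss always brings the block in; a write miss does so
        # only under the write-allocate policy.
        if op == "R" or write_allocate:
            cached.add(block)
    return misses


# Illustrative sequence: W = write, R = read, numbers are block addresses.
ops = [("W", 100), ("W", 100), ("R", 200), ("W", 200), ("W", 100)]
misses_write_allocate = count_misses(ops, write_allocate=True)
misses_no_write_allocate = count_misses(ops, write_allocate=False)
print(misses_write_allocate, misses_no_write_allocate)
```

With no-write allocate, repeated writes to block 100 keep missing because the block is never brought into the cache by a write.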
Impact of Caches on CPU Performance
CPU Execution Time equation
CPU Execution Time =
(CPU execution clock cycles + Memory stall cycles) x Clock cycle time
Impact of Caches on CPU Performance:
Assumptions
The cache miss penalty is 100 clock cycles
All instructions normally take 1 clock cycle
Average miss rate is 2%
Average memory references per instruction = 1.5
Average number of cache misses per 1000 instructions = 30
Find the impact of the cache on CPU performance,
considering both the misses per instruction and
the miss rate
Impact of Caches on CPU Performance:
CPU Time =
(CPU execution clock cycles + Memory stall cycles) x Clock cycle time
CPU Time with cache (using misses per instruction)
= IC x (1.0 + (30/1000 x 100)) x clock cycle time
= IC x 4.00 x clock cycle time
CPU Time with cache (using miss rate)
= IC x (1.0 + (1.5 x 2% x 100)) x clock cycle time
= IC x 4.00 x clock cycle time
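As a quick check of the arithmetic above, the following Python sketch computes the effective cycles per instruction both ways, using the assumptions stated on the slide. The function name effective_cpi is a hypothetical helper introduced for illustration.

```python
def effective_cpi(base_cpi, misses_per_instruction, miss_penalty):
    """Base CPI plus memory stall cycles per instruction."""
    return base_cpi + misses_per_instruction * miss_penalty


# Route 1: misses per instruction given directly (30 per 1000 instructions).
cpi_from_misses = effective_cpi(1.0, 30 / 1000, 100)

# Route 2: memory references per instruction x miss rate (1.5 x 2%).
cpi_from_miss_rate = effective_cpi(1.0, 1.5 * 0.02, 100)

# CPU time is then IC x effective CPI x clock cycle time.
print(f"{cpi_from_misses:.2f} {cpi_from_miss_rate:.2f}")
```

Both routes agree because 1.5 references per instruction at a 2% miss rate is the same as 30 misses per 1000 instructions.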
Cache Performance (Review)
Number of misses or miss rate
Cost per miss or miss penalty
Memory stall clock cycles equal the sum of
IC x Reads per inst. x Read miss rate x Read miss
penalty; and
IC x Writes per inst. x Write miss rate x Write
miss penalty
Cache Performance (Review)
Number of reads x read miss rate x read miss
penalty +
Number of writes x write miss rate x write miss
penalty
Averaging the read and write miss rates:
Memory stall clock cycles =
Number of memory accesses x Miss rate x Miss
penalty
Average Memory Access Time =
Hit Time + Miss rate x Miss
penalty
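The relationship between the separate read/write form and the averaged form of the stall-cycle equation can be sketched in Python as follows. All numeric values here (instruction count, reads and writes per instruction, miss rate, penalty) are assumed for illustration only and do not come from the lecture.

```python
def stall_cycles(ic, accesses_per_inst, miss_rate, miss_penalty):
    """Memory stall clock cycles = IC x accesses/inst x miss rate x miss penalty."""
    return ic * accesses_per_inst * miss_rate * miss_penalty


IC = 1_000_000          # assumed instruction count
MISS_RATE = 0.02        # assumed: same rate for reads and writes
MISS_PENALTY = 100      # assumed: same penalty for reads and writes

# Separate read and write terms (assumed 1.2 reads and 0.3 writes per instruction).
separate = (stall_cycles(IC, 1.2, MISS_RATE, MISS_PENALTY)
            + stall_cycles(IC, 0.3, MISS_RATE, MISS_PENALTY))

# Averaged form: all memory accesses lumped together.
combined = stall_cycles(IC, 1.2 + 0.3, MISS_RATE, MISS_PENALTY)

print(round(separate), round(combined))
```

The two forms coincide only when reads and writes share the same miss rate and penalty; otherwise the averaged form is an approximation.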
Cache Performance (Review)
Note that the average memory access time
is an indirect measure of the CPU
performance and is not a substitute for the
execution time
However, this formula can help decide between
split caches (i.e., instruction cache and data
cache) and a unified cache
E.g., to find out which of these two cache
organizations has the lower miss rate, we
can use this formula as follows:
Cache Performance: Example
Statement: Consider a 32KB unified cache
with 43.3 misses per 1000 instructions, and
split instruction/data caches of 16KB each,
with 3.82 instruction cache misses per 1000
instructions and 40.9 data cache misses per
1000 instructions;
Assume that
– 36% of the instructions are data transfer
– 74% of memory references are instruction
references; and
Cache Performance: Example
– a hit takes 1 clock cycle and the miss
penalty is 100 cycles; and
– a load or store takes one extra cycle on
the unified cache
Assume write-through caches with a write buffer,
and ignore stalls due to the write buffer
Find the average memory access time in
each case
Note: to solve this problem we first find the miss
rate and then the average memory access time
Cache Performance: Solution
1: Miss Rate
= (Misses/1000) / (Accesses/ inst.)
Miss Rate 16KB Inst = (3.82/1000) /1.0
= 0.0038
Miss Rate 16KB data = (40.9/1000) /0.36 = 0.114
Since about 74% of the memory accesses are
instruction references, the overall miss rate for
split caches = (74% x 0.0038) + (26% x 0.114)
= 0.0324
Miss Rate 32KB unified = (43.3/1000) /(1+0.36) = 0.0318
i.e., the unified cache
has a slightly lower miss rate
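The miss-rate step of the solution can be reproduced with a short Python sketch. The function name miss_rate is a hypothetical helper; the inputs are the figures from the problem statement, and the 74%/26% weights are the slide's rounded instruction/data reference mix.

```python
def miss_rate(misses_per_1000_inst, accesses_per_inst):
    """Miss rate = (misses per instruction) / (memory accesses per instruction)."""
    return (misses_per_1000_inst / 1000) / accesses_per_inst


inst_rate = miss_rate(3.82, 1.0)          # every instruction is fetched once
data_rate = miss_rate(40.9, 0.36)         # 36% of instructions access data
unified_rate = miss_rate(43.3, 1 + 0.36)  # instruction fetch plus data access

# Weight the split caches by the instruction/data reference mix.
split_rate = 0.74 * inst_rate + 0.26 * data_rate

print(f"split={split_rate:.4f} unified={unified_rate:.4f}")
```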
Cache Performance: solution
2: Average Memory Access Time
= %inst x (Hit time + Inst. Miss rate x miss penalty)
+ %data x (Hit time + data Miss rate x miss penalty)
Average Memory Access Time split =
74% x (1 + 0.0038 x 100) + 26% x (1 + 0.114 x 100) = 4.24
Average Memory Access Time unified =
74% x (1 + 0.0318 x 100) + 26% x (1+1+0.0318 x 100) = 4.44
i.e., the split caches have a slightly better
average access time and also avoid
structural hazards
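The average memory access time comparison can be checked with the Python sketch below. It is an illustration under the problem's stated assumptions; the helper name amat is hypothetical, and the miss rates are computed from the unrounded misses-per-1000 figures rather than from the rounded values shown on the slide.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty


MISS_PENALTY = 100
INST_FRACTION, DATA_FRACTION = 0.74, 0.26

# Miss rates derived from misses per 1000 instructions (problem statement).
inst_rate = 3.82 / 1000 / 1.0
data_rate = 40.9 / 1000 / 0.36
unified_rate = 43.3 / 1000 / 1.36

# Split caches: instruction and data accesses each see a 1-cycle hit.
amat_split = (INST_FRACTION * amat(1, inst_rate, MISS_PENALTY)
              + DATA_FRACTION * amat(1, data_rate, MISS_PENALTY))

# Unified cache: data accesses pay one extra cycle (hit time of 2).
amat_unified = (INST_FRACTION * amat(1, unified_rate, MISS_PENALTY)
                + DATA_FRACTION * amat(2, unified_rate, MISS_PENALTY))

print(f"split={amat_split:.2f} unified={amat_unified:.2f}")
```

Even though the unified cache has the lower miss rate, the extra cycle on loads and stores makes its average access time slightly worse.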
Improving Cache Performance
Average memory access time gives a
framework to optimize the cache
The Average memory access time formula:
Average Memory Access time =
Hit Time + Miss Rate x Miss Penalty
Four General Options
1. Reduce the miss penalty,
2. Reduce the miss rate,
3. Reduce the miss penalty or miss rate
via parallelism
4. Reduce the time to hit in the cache
Reducing Miss Penalty
1. Multilevel Caches
2. Critical Word first and Early Restart
3. Priority to Read Misses over Writes
4. Merging Write Buffers
5. Victim Caches
1: Multilevel Caches (to reduce Miss Penalty)
This technique ignores the CPU and
concentrates on the interface between the
cache and main memory
Multiple levels of caches
Tradeoff between cache size (cache
effectiveness) and cost (access time): a
small, fast memory is used as the level-1 cache
1: Multilevel Caches (Performance Analysis)
Average access Time is:
Access Time average
= Hit Time L1 + Miss Rate L1 x Miss Penalty L1
Where, Miss Penalty L1
= Hit Time L2 + Miss Rate L2 x Miss Penalty L2
The Average memory access time
= Hit Time L1 + Miss Rate L1 x (Hit Time L2 +
Miss Rate L2 x Miss Penalty L2 )
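The two-level formula above can be sketched in Python as follows. The numeric values (hit times, miss rates, L2 miss penalty) are assumed for illustration and are not taken from the lecture; Miss Rate L2 is interpreted here as the local miss rate of the second-level cache.

```python
def amat_two_level(hit_l1, miss_rate_l1, hit_l2, local_miss_rate_l2, penalty_l2):
    """AMAT = Hit Time L1 + Miss Rate L1 x (Hit Time L2 + Miss Rate L2 x Miss Penalty L2)."""
    miss_penalty_l1 = hit_l2 + local_miss_rate_l2 * penalty_l2
    return hit_l1 + miss_rate_l1 * miss_penalty_l1


# Assumed values: 1-cycle L1 hit, 4% L1 miss rate, 10-cycle L2 hit,
# 50% local L2 miss rate, 100-cycle penalty to main memory.
amat = amat_two_level(1, 0.04, 10, 0.5, 100)
print(f"{amat:.2f}")
```

Under these assumed numbers the L1 miss penalty drops from 100 cycles (no L2) to 60 cycles, which is the effect the multilevel technique aims for.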
1: Multilevel Caches (Performance Analysis)
Average stalls per instruction =
Misses per instruction L1 x Hit Time L2 +
Misses per instruction L2 x Miss Penalty L2
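A small Python sketch of the stalls-per-instruction formula follows. The miss counts, memory references per instruction, and latencies are assumed values chosen for illustration; stalls_per_instruction is a hypothetical helper name.

```python
def stalls_per_instruction(misses_per_inst_l1, hit_l2, misses_per_inst_l2, penalty_l2):
    """Average memory stall cycles per instruction for a two-level cache."""
    return misses_per_inst_l1 * hit_l2 + misses_per_inst_l2 * penalty_l2


REFS_PER_INST = 1.5  # assumed memory references per instruction

# Assumed: 40 L1 misses and 20 L2 misses per 1000 memory references.
misses_per_inst_l1 = 40 / 1000 * REFS_PER_INST
misses_per_inst_l2 = 20 / 1000 * REFS_PER_INST

# Assumed latencies: 10-cycle L2 hit, 100-cycle L2 miss penalty.
stalls = stalls_per_instruction(misses_per_inst_l1, 10, misses_per_inst_l2, 100)
print(f"{stalls:.2f}")
```

Every L1 miss pays the L2 hit time, and only the accesses that also miss in L2 pay the additional main-memory penalty.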
1: Multilevel Caches (to reduce Miss Penalty)
Local miss rate
Global miss rate
1: Multilevel Caches (to reduce Miss Penalty)
Local Miss Rate:
Number of misses in a cache divided by the
total number of memory accesses to this cache.
Global Miss Rate:
Number of misses in the
cache divided by the total number of
memory accesses generated by the CPU
1: Multilevel Caches (to reduce Miss Penalty)
Global miss rate
– 1st level cache = Miss Rate L1
– 2nd level cache = Miss Rate L1 x Miss Rate L2
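The local and global miss rates can be computed from raw miss counts as in the Python sketch below. The counts (1000 CPU accesses, 40 L1 misses, 20 L2 misses) are assumed for illustration; the function name miss_rates is hypothetical.

```python
def miss_rates(cpu_accesses, l1_misses, l2_misses):
    """Return (local L1, local L2, global L2) miss rates.

    L2 only sees the accesses that missed in L1, so its local miss rate
    uses l1_misses as the denominator.
    """
    local_l1 = l1_misses / cpu_accesses   # equals the global L1 miss rate
    local_l2 = l2_misses / l1_misses
    global_l2 = l2_misses / cpu_accesses
    return local_l1, local_l2, global_l2


local_l1, local_l2, global_l2 = miss_rates(1000, 40, 20)
print(f"{local_l1:.2f} {local_l2:.2f} {global_l2:.2f}")
```

The global L2 miss rate equals Miss Rate L1 x Miss Rate L2, matching the relation on the slide.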
