Advanced Computer Architecture - Lecture 27: Memory hierarchy design

CS 704
Advanced Computer Architecture

Lecture 27
Memory Hierarchy Design
(Cache Design Techniques)

Prof. Dr. M. Ashraf Chughtai


Today’s Topics
Recap: Caching and Locality
Cache Performance Metrics
Cache Designs
Addressing Techniques
Summary

MAC/VU-Advanced
Computer Architecture

Lecture 27 Memory Hierarchy (3)

2


Recap: Memory Hierarchy Principles
The goal is high-speed storage at the
cheapest cost per byte
Different types of memory modules are
organized into a hierarchy, based on the:


Concept of Caching
Principle of Locality


Recap: Concept of Caching
A small, fast, and relatively expensive
storage is used as a staging area, or
temporary place, to:
– store a frequently used subset of the data
or instructions from the relatively
cheaper, larger, and slower memory; and
– avoid having to go to the main
memory every time this information is
needed



Recap: Principle of Locality
To obtain the data or instructions of a
program, the processor accesses a
relatively small portion of the address
space at any instant of time



Recap: Types of Locality
There are two different types of locality:
Temporal locality – a recently accessed item is likely to be accessed again soon
Spatial locality – items adjacent to a recently accessed item are likely to be accessed soon



Recap: Working of Memory Hierarchy
― the memory hierarchy keeps the
more recently accessed data items
closer to the processor, because
chances are the processor will
access them again soon



Recap: Working of Memory Hierarchy .. Cont’d



NOT ONLY do we move the item
that has just been accessed
closer to the processor, but we
ALSO move the data items that
are adjacent to it



Recap: Cache Devices
A cache is a small SRAM that is made
directly accessible to the processor
The cache sits between main memory and the
CPU, as data and instruction caches, and
may be located on the CPU chip or as a
separate module
Data transfer between the cache and the CPU,
and between the cache and main memory, is
performed by the cache controller
The cache and main memory are organized in
equal-sized blocks


Recap: Cache/Main Memory Data Transfer
An address tag is associated with each
cache block; it defines the relationship of
the cache block with the higher-level
memory (say, main memory)
Data transfer between the CPU and the cache
takes place as word transfers
Data transfer between the cache and main
memory takes place as block transfers
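As an illustration of how the address tag relates a cache block to main memory, the split below assumes a direct-mapped cache with hypothetical sizes (64 KiB cache, 16-byte blocks, 32-bit physical addresses); the lecture does not fix these numbers.

```python
# Hypothetical parameters, for illustration only.
BLOCK_SIZE = 16                          # bytes per block
NUM_BLOCKS = (64 * 1024) // BLOCK_SIZE   # 64 KiB cache -> 4096 blocks
OFFSET_BITS = 4                          # log2(BLOCK_SIZE)
INDEX_BITS = 12                          # log2(NUM_BLOCKS)

def split_address(addr):
    """Split a 32-bit physical address into (tag, index, offset)."""
    offset = addr & (BLOCK_SIZE - 1)                   # byte within the block
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)   # which cache block
    tag = addr >> (OFFSET_BITS + INDEX_BITS)           # identifies the memory block
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
# tag = 0x1234, index = 0x567, offset = 0x8
```

The tag is the part of the address the cache must store alongside each block, since many memory blocks map to the same cache block.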


Recap: Cache operation
The CPU requests the contents of a main
memory location
The controller checks the cache blocks for this data
If present, i.e., a HIT, it gets the data or instruction
from the cache – fast
If not present, i.e., a MISS, it reads the required
block from main memory into the cache, then
delivers it from the cache to the CPU
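The hit/miss flow above can be sketched as a toy model, with a dictionary standing in for the tag-matching hardware and a one-word block for simplicity (all names and contents here are made up):

```python
cache = {}                                              # block address -> data
main_memory = {addr: addr * 10 for addr in range(100)}  # fake memory contents

def read(addr):
    """Serve a CPU read request through the controller's hit/miss flow."""
    if addr in cache:            # HIT: get the data from the cache (fast)
        return cache[addr], "hit"
    data = main_memory[addr]     # MISS: read the block from main memory,
    cache[addr] = data           #       place it in the cache,
    return data, "miss"          #       then deliver it to the CPU

print(read(7))   # first access: miss
print(read(7))   # repeat access: hit (temporal locality)
```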


Cache Memory Performance
Miss rate, miss penalty, and average access time
are the major trade-offs of cache memory
performance
Miss Rate is the fraction of memory accesses that
are not found in the level-k memory (say, the
cache):

Miss Rate = number of misses / total memory accesses

As Hit Rate is defined as the fraction of memory
accesses that are found in the level-k memory (say,
the cache), it follows that Miss Rate = 1 – Hit Rate
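With made-up counts, the definitions above amount to:

```python
misses = 40            # accesses not found in the cache (assumed count)
total_accesses = 2000  # all memory accesses (assumed count)

miss_rate = misses / total_accesses   # fraction missing in the cache
hit_rate = 1 - miss_rate              # fraction found in the cache

print(miss_rate)  # 0.02
print(hit_rate)   # 0.98
```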


Cache Memory Performance
Miss Penalty is the memory stall cycles –
i.e., the number of cycles the CPU is stalled
for a memory access; it is determined by the
sum of:
(i) the cycles (time) to replace a block in the
upper level, and
(ii) the cycles (time) to deliver the block to
the processor

Average Access Time
= Hit Time x Hit Rate + Miss Penalty x Miss Rate
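Plugging assumed numbers into the average access time formula above (a hit time of 1 cycle, miss penalty of 25 cycles, and miss rate of 2% – illustrative values only):

```python
hit_time = 1       # cycles (assumed)
miss_penalty = 25  # cycles (assumed)
miss_rate = 0.02   # assumed
hit_rate = 1 - miss_rate

# Average Access Time = Hit Time x Hit Rate + Miss Penalty x Miss Rate
avg_access_time = hit_time * hit_rate + miss_penalty * miss_rate
print(avg_access_time)  # 1.48 cycles
```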


Cache Memory Performance
The performance of a CPU is the product of the
clock cycle time and the sum of the CPU clock
cycles and the memory stall cycles:

CPU Execution Time =
(CPU Clock Cycles + Memory Stall Cycles) x Clock Cycle Time

where
Memory Stall Cycles
= Number of Misses x Miss Penalty
= IC x (Misses / Instruction) x Miss Penalty
= IC x (Memory Accesses / Instruction) x Miss Rate x Miss Penalty


Memory Stall Cycles … cont’d
– The number of cycles for a memory read and
for a memory write may be different
– The miss penalty for a read may be different
from that for a write

– Memory Stall Clock Cycles =
Memory Read Stall Cycles +
Memory Write Stall Cycles




Cache Performance Example
Assume a computer has CPI = 1.0 when all memory
accesses are hits; the only data accesses are
load/store accesses, and these are 50% of the total
instructions.
If the miss rate is 2% and the miss penalty is 25 clock
cycles, how much faster would the computer be if all
accesses were hits?
Execution time with all hits = IC x 1.0 x Cycle Time
CPU execution time with a real cache =
CPU execution time + Memory stall time



Cache Performance Example
Memory Stall Cycles
= IC x (instruction accesses + data accesses) per instruction
x Miss Rate x Miss Penalty
= IC x (1 + 0.5) x 0.02 x 25
= IC x 0.75
CPU Execution Time (with cache)
= (IC x 1.0 + IC x 0.75) x Cycle Time
= 1.75 x IC x Cycle Time
The computer with no cache misses is therefore 1.75 times faster
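The example's arithmetic can be checked directly (the 1 + 0.5 term counts one instruction fetch plus 0.5 data accesses per instruction):

```python
accesses_per_instr = 1 + 0.5   # instruction fetch + load/store fraction
miss_rate = 0.02
miss_penalty = 25              # cycles

# Memory stall cycles per instruction, as in the derivation above
stall_cycles_per_instr = accesses_per_instr * miss_rate * miss_penalty  # 0.75
cpi_with_cache = 1.0 + stall_cycles_per_instr                           # 1.75

# Ideal (all-hit) machine has CPI = 1.0, so the speedup of the ideal machine is:
speedup_of_ideal = cpi_with_cache / 1.0
print(speedup_of_ideal)  # 1.75
```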


Block Size Tradeoff: Miss Rate
[Figure: miss rate vs. block size – increasing the block size first lowers the miss rate by exploiting spatial locality; past a certain point the miss rate rises again, because a cache with fewer blocks compromises temporal locality]


Block Size Tradeoff: Miss Rate

• If the block size grows too large relative to the cache size, the miss rate shoots back up. It is true that if an item is accessed, it is likely to be accessed again soon – but with only a few large blocks in the cache, that item may already have been displaced by the time it is needed again.



Block Size Tradeoff: Miss Rate
This is called the ping-pong effect:
the data is acting like a ping-pong ball, bouncing
in and out of the cache.
Miss rate is not the only cache performance
metric; we also have to worry about the miss penalty




Block Size Tradeoff: Miss Penalty

[Figure: miss penalty vs. block size – the miss penalty grows with block size, since a larger block takes longer to transfer from main memory]


Block Size Tradeoff: Average Access Time
As the block size passes a certain point, the miss
rate actually goes up.

Average access time is a better performance
metric than the miss rate or the miss penalty
alone.



Block Size Tradeoff: Average Access Time

[Figure: miss rate, miss penalty, and average access time vs. block size – beyond a certain block size both the miss penalty and the miss rate increase, so the average-access-time curve has a minimum at an intermediate block size]
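The shape of that trade-off can be reproduced with a small model. The miss-rate values and memory timing below are invented for illustration (they are not from the lecture): the miss rate falls as larger blocks exploit spatial locality, then rises once too few blocks remain, while the miss penalty grows with block size.

```python
# Assumed miss rates per block size in bytes -- illustrative only.
miss_rate = {16: 0.060, 32: 0.040, 64: 0.030, 128: 0.035, 256: 0.055}

HIT_TIME = 1  # cycles (assumed)

def miss_penalty(block_size):
    # Assumed memory model: fixed latency plus per-word transfer time.
    return 40 + block_size // 4

def avg_access_time(block_size):
    # Average Access Time = Hit Time x Hit Rate + Miss Penalty x Miss Rate
    mr = miss_rate[block_size]
    return HIT_TIME * (1 - mr) + miss_penalty(block_size) * mr

best = min(miss_rate, key=avg_access_time)
print(best)  # the minimum lies at an intermediate block size, 64 here
```

With these assumed numbers the extremes lose: small blocks waste spatial locality, large blocks pay both a higher penalty and a higher miss rate.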


Block Size Tradeoff: Average Access Time
Not only is the miss penalty increasing,
the miss rate is increasing as well



How Do you Design a Cache?
Viewed from the outside, the cache is a memory
"black box": the CPU presents a physical address
and a read/write signal, and data flows out or in:

– read: Data <= Mem [Physical Address]
– write: Mem [Physical Address] <= Data

Inside it has:
Tag-Data Storage,
Muxes,
Comparators, . . .
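A minimal sketch of what might sit inside the black box: a direct-mapped design with tag storage and data storage, the comparator modeled as an equality test and the mux as indexing. All sizes below are assumed for illustration.

```python
NUM_LINES, BLOCK_WORDS = 8, 4   # assumed cache geometry (word-addressed)

tags = [None] * NUM_LINES                              # tag storage
lines = [[0] * BLOCK_WORDS for _ in range(NUM_LINES)]  # data storage

def read(addr, memory):
    """read: Data <= Mem[Physical Address], served through the cache."""
    offset = addr % BLOCK_WORDS
    index = (addr // BLOCK_WORDS) % NUM_LINES
    tag = addr // (BLOCK_WORDS * NUM_LINES)
    if tags[index] != tag:              # comparator reports a MISS
        base = addr - offset            # fetch the whole block
        lines[index] = [memory[base + i] for i in range(BLOCK_WORDS)]
        tags[index] = tag
    return lines[index][offset]         # mux selects the requested word

memory = list(range(256))               # fake main memory
print(read(13, memory))  # 13 (miss fills the block)
print(read(14, memory))  # 14 (hit in the same block)
```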
