CS 704
Advanced Computer Architecture
Lecture 32
Memory Hierarchy Design
(Main and Virtual Memories)
Prof. Dr. M. Ashraf Chughtai
Today’s Topics
Recap: Memory Hierarchy and Cache performance
Main Memory Performance
Virtual Memory Performance
Summary
MAC/VU-Advanced
Computer Architecture
Lec. 32 Memory Hierarchy Design (8)
2
Recap: Memory Hierarchy
Design goal of the memory system:
Cost as low as that of the cheapest memory
Speed as fast as that of the fastest memory
Recap: Memory Hierarchy
The fastest, smallest and most costly memories
The slowest, biggest and cheapest memories
Recap: Memory Hierarchy
– Average access speed
– Cost
– Cheapest technology
Semiconductor memories
Static and Dynamic RAMs
Upper levels in the memory hierarchy
Recap: Cache Design
Caches use Static Random Access Memory (SRAM)
Main Memory is Dynamic Random Access Memory (DRAM)
DRAM must be refreshed periodically (~8 ms, <5% time)
Magnetic, optical or other media
Virtual memory
Recap: Cache Design
Cache and main memory are organized in equal-sized blocks
Word transfer
Block transfer
The CPU requests contents of main memory
Word transfer is fast
Recap: Cache Performance
If misses
Miss penalty
Cache design and the performance
Techniques
– Miss rate
– Miss penalty
– Hit time
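The three factors listed above combine in the standard average memory access time relation, AMAT = hit time + miss rate x miss penalty. The following is a minimal sketch of that relation; the function name and the sample numbers are illustrative assumptions, not values from the lecture.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (times in clock cycles)."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical values, chosen only to show the effect of each factor.
baseline = average_memory_access_time(hit_time=1, miss_rate=0.05, miss_penalty=64)
lower_penalty = average_memory_access_time(hit_time=1, miss_rate=0.05, miss_penalty=32)
print(baseline, lower_penalty)
```

Lowering any one of the three inputs lowers the result, which is why the techniques are grouped by miss rate, miss penalty and hit time.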
Main Memory Organization
Organizations of main memory
Source for Caches
Destination for virtual memory
DRAM logical organization (4 M Bit)
[Figure: DRAM logical organization. Components shown: Address Buffer (A0…A10, 11 bits), Row Decoder, Column Decoder, Memory Array (2,048 x 2,048) of storage cells with word lines and bit lines, Sense Amps & I/O, Data In (D), Data Out (Q).]
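The 2,048 x 2,048 array and the 11 address lines A0…A10 in the figure account for the 4 Mbit capacity: 11 bits select a row and 11 bits select a column. The sketch below assumes the same 11 pins carry the row and then the column, and that the upper half of a 22-bit cell number is the row; both the bit ordering and the function name are assumptions made for illustration.

```python
ROW_BITS = 11            # A0..A10 select one of 2,048 rows (word lines)
COL_BITS = 11            # assumed: the same 11 pins then select the column
ROWS = 1 << ROW_BITS     # 2,048
COLS = 1 << COL_BITS     # 2,048


def split_cell_address(cell):
    """Split a 22-bit cell number into (row, column) for a 2,048 x 2,048 array."""
    if not 0 <= cell < ROWS * COLS:
        raise ValueError("cell address out of range for a 4 Mbit array")
    row = cell >> COL_BITS       # upper 11 bits go to the row decoder
    col = cell & (COLS - 1)      # lower 11 bits go to the column decoder
    return row, col


print(ROWS * COLS)               # number of storage cells (4 Mbit)
print(split_cell_address(5000))
```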
Main Memory Performance
Performance of DRAM
1: Fast page mode DRAM
2: Synchronous DRAM
3: Double Data Rate DRAM
Main Memory Performance
Fast page mode
Optimizes sequential access
Synchronous DRAM (SDRAM)
Avoid handshaking
Double Data Rate (DDR) DRAM
Transmits data on both the rising and falling edges of the clock
Main Memory Performance
Latency
Average memory access time
Bandwidth
Number of bytes read or written per unit time
Access Time
Cycle Time
Main Memory Performance
Inputs/outputs and multiprocessors
Low-latency memory
Multiprocessors demand higher bandwidth
2nd level caches with larger block size
Improving Main Memory Performance
The most commonly used techniques are
– Wider Main Memory
– Simple Interleaved Memory
– Independent Memory Banks
1: Wider Main Memory
L1
cache
Wider L2 Cache
Main
Memory
1: Wider Main Memory: Example
4-word (i.e. 32-byte) block
– Time to send address = 4 clock cycles
– Time to send the data word = 4 clock cycles
– Access time per word = 56 clock cycles
Miss Penalty =
No. of words x [time to: send address + send data word + access word]
1: Wider Main Memory
1: For the 1-word organization
Miss Penalty = 4 x (4 + 4 + 56) = 4 x 64 = 256 clock cycles
Memory bandwidth = bytes per clock cycle = 32/256 = 1/8 byte/cycle
2: For the 4-word organization
Miss Penalty = 1 x (4 + 4 + 56) = 64 clock cycles
Memory bandwidth = 32/64 = 1/2 byte/cycle
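The two cases above follow from one formula: the number of transfers needed to fill a block times the cost of one transfer. The sketch below reproduces the arithmetic with the timing values from the example; the function and constant names are illustrative assumptions.

```python
SEND_ADDRESS = 4     # clock cycles to send the address
SEND_DATA = 4        # clock cycles to send one data word
ACCESS = 56          # clock cycles of access time per word
BLOCK_WORDS = 4      # words per cache block
BLOCK_BYTES = 32     # bytes per cache block


def miss_penalty(memory_width_words):
    """Cycles to fetch one block when memory and bus are this many words wide."""
    transfers = BLOCK_WORDS // memory_width_words
    return transfers * (SEND_ADDRESS + SEND_DATA + ACCESS)


def bandwidth(memory_width_words):
    """Bytes delivered per clock cycle while servicing a miss."""
    return BLOCK_BYTES / miss_penalty(memory_width_words)


for width in (1, 4):
    print(width, miss_penalty(width), bandwidth(width))
```

Widening the memory from 1 word to 4 words cuts the miss penalty from 256 to 64 cycles and raises the bandwidth from 1/8 to 1/2 byte per cycle.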
1: Wider Main Memory: Demerits
L1
cache
Wider L2 Cache
Main
Memory
2: Interleaved Memory
– Bank 0 has all words whose address MOD 4 = 0
– Bank 1 has all words whose address MOD 4 = 1
– Bank 2 has all words whose address MOD 4 = 2
– Bank 3 has all words whose address MOD 4 = 3
Word addresses held by each bank:
Bank 0    Bank 1    Bank 2    Bank 3
0         1         2         3
4         5         6         7
8         9         10        11
12        13        14        15
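The MOD 4 rule for assigning words to banks can be checked with a few lines of code. This is a minimal sketch; the helper names are illustrative, and placing a word inside its bank at address DIV 4 is an assumption that the slide does not state.

```python
NUM_BANKS = 4


def bank_of(word_address):
    """Bank that holds the word: address MOD number of banks."""
    return word_address % NUM_BANKS


def index_in_bank(word_address):
    """Position inside the bank (assumed here to be address DIV number of banks)."""
    return word_address // NUM_BANKS


# Group word addresses 0..15 by the bank that holds them.
banks = {b: [a for a in range(16) if bank_of(a) == b] for b in range(NUM_BANKS)}
for b, words in banks.items():
    print(f"Bank {b}: {words}")
```

Consecutive word addresses fall in different banks, so the four words of a block can be accessed in parallel.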
2: Interleaved Memory: Example
Bandwidth calculation:
Bandwidth of a 4-word interleaved memory, using the same time model as for the wider memory.
The miss penalty for the 4-word interleaved memory is:
= time to send address + time to access + number of banks x time to send data
= 4 + 56 + 4 x 4 = 76 clock cycles
Bandwidth = 32/76 = 0.42 byte per clock
For comparison, the 1-word organization without interleaving gives:
Bandwidth = 32/256 = 1/8 = 0.125 byte per clock
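The interleaved case differs from the 1-word case only in that the address and access times are paid once, because the banks work in parallel, while the data words still cross the bus one at a time. The sketch below reproduces both figures with the timing values from the example; the function names are illustrative assumptions.

```python
SEND_ADDRESS = 4     # clock cycles to send the address
ACCESS = 56          # clock cycles of access time (overlapped across banks)
SEND_DATA = 4        # clock cycles to send one data word
BLOCK_BYTES = 32     # bytes per 4-word block


def interleaved_miss_penalty(num_banks):
    """One address and one overlapped access, then one bus transfer per bank."""
    return SEND_ADDRESS + ACCESS + num_banks * SEND_DATA


def one_word_miss_penalty(num_words):
    """Non-interleaved, 1-word-wide memory: every word pays the full cost."""
    return num_words * (SEND_ADDRESS + ACCESS + SEND_DATA)


interleaved = interleaved_miss_penalty(4)
plain = one_word_miss_penalty(4)
print(interleaved, round(BLOCK_BYTES / interleaved, 2))
print(plain, BLOCK_BYTES / plain)
```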
3: Independent Memory Banks
Memory banks offer independent accesses
Multiprocessors
I/O
CPU with Hit under n Misses
Non-blocking Caches