Advanced Computer Architecture - Lecture 32: Memory hierarchy design


CS 704
Advanced Computer Architecture

Lecture 32
Memory Hierarchy Design
(Main and Virtual Memories)

Prof. Dr. M. Ashraf Chughtai

Today’s Topics
Recap: Memory Hierarchy and Cache performance
Main Memory Performance
Virtual Memory Performance
Summary

MAC/VU-Advanced
Computer Architecture

Lec. 32 Memory Hierarchy Design (8)


Recap: Memory Hierarchy

Design goal of a memory system
Cost as low as that of the cheapest memory
Speed as fast as that of the fastest memory


Recap: Memory Hierarchy
The fastest, smallest and most costly memories sit at the upper levels
The slowest, biggest and cheapest memories sit at the lower levels


Recap: Memory Hierarchy
– Average access speed
– Cost
– Cheapest technology
Semiconductor memories
Static and Dynamic RAMs
Upper levels in the memory hierarchy


Recap: Caches Design
Caches use Static Random Access Memory (SRAM)
Main memory is Dynamic Random Access Memory (DRAM)
DRAM must be refreshed periodically (~8 ms, <5% of the time)
Magnetic, optical or other media back the virtual memory


Recap: Cache Design
Cache and main memory are organized in
equal sized blocks
Word transfer
Block transfer
The CPU requests contents of main
memory

Word transfer is fast

Recap: Cache Performance
If misses
Miss penalty

Cache design and the performance
Techniques
– Miss rate
– Miss penalty
– Hit time
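
The three quantities listed above combine in the standard average memory access time (AMAT) relation: AMAT = hit time + miss rate x miss penalty. A minimal sketch follows; the function name and the sample numbers are illustrative assumptions, not values from the lecture.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """Return AMAT in clock cycles: hit time plus the expected cost of a miss."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be a fraction between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers: 1-cycle hit, 5% miss rate, 64-cycle miss penalty.
amat = average_memory_access_time(hit_time=1, miss_rate=0.05, miss_penalty=64)
print(f"AMAT = {amat:.2f} clock cycles")  # prints "AMAT = 4.20 clock cycles"
```

Lowering any one of the three terms lowers AMAT, which is why the techniques are grouped by miss rate, miss penalty and hit time.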


Main Memory Organization
Organizations of main memory

Source for Caches
Destination virtual memory


DRAM logical organization (4 Mbit)

[Figure: DRAM block diagram. Labels: 11 address lines A0…A10, Address Buffer, Row Decoder, Column Decoder, Memory Array (2,048 x 2,048), Word Line, Bit Line, Storage Cell, Sense Amps & I/O, Data In (D), Data Out (Q)]

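
A 2,048 x 2,048 array holds 4 Mbit and needs 22 address bits, yet the figure shows only 11 address lines, so the row and column halves of the address share the same pins. The sketch below splits a 22-bit cell address into the two halves. Treating the upper 11 bits as the row is an assumption made for illustration, as is the function name.

```python
ROW_BITS = 11  # A0…A10 select one of 2,048 rows (word lines)
COL_BITS = 11  # the same 11 lines then select one of 2,048 columns (bit lines)


def split_dram_address(cell_address):
    """Split a 22-bit cell address into (row, column).

    Assumption: the upper 11 bits form the row and the lower 11 the column.
    """
    if not 0 <= cell_address < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range for a 4 Mbit array")
    row = cell_address >> COL_BITS
    column = cell_address & ((1 << COL_BITS) - 1)
    return row, column


print(split_dram_address((5 << 11) | 7))  # prints (5, 7)
```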

Main Memory Performance
Performance of DRAM
1: Fast page mode DRAM
2: Synchronous DRAM
3: Double Data Rate DRAM


Main Memory Performance
Fast page mode
Optimizes sequential access
Synchronous DRAM (SDRAM)
Avoids handshaking
Double Data Rate (DDR) DRAM
Transmits data on both edges of the clock


Main Memory Performance
Latency
Average memory access time
Bandwidth
Number of bytes read/write per unit
time
Access Time
Cycle Time

Main Memory Performance
Inputs/outputs and multiprocessors
Low-latency memory
Multiprocessors demand higher bandwidth
2nd level caches with larger block size


Improving Main Memory Performance
The most commonly used techniques are
– Wider Main Memory
– Simple Interleaved Memory
– Independent Memory Banks



1: Wider Main Memory

[Figure: L1 cache, wider L2 cache, main memory]


1: Wider Main Memory: Example
Block of 4 words (i.e. 32 bytes)
– Time to send address       = 4 clock cycles
– Time to send the data word = 4 clock cycles
– Access time per word       = 56 clock cycles

Miss Penalty =
No. of words x [time to: send address + send data word + access word]


1: Wider Main Memory
1: For 1-word organization
Miss Penalty = 4 x (4 + 4 + 56) = 4 x 64 = 256 clock cycles
Memory bandwidth = bytes per clock cycle = 32/256 = 1/8 byte/cycle

2: For 4-word organization
Miss Penalty = 1 x (4 + 4 + 56) = 64 clock cycles
Memory bandwidth = 32/64 = 1/2 byte/cycle
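
The two cases above differ only in how many words one memory access delivers. The sketch below reproduces the arithmetic under the example's timing model; the constant and function names are illustrative, and the 8-byte word size is inferred from "4 words = 32 bytes".

```python
SEND_ADDRESS = 4    # clock cycles to send the address
ACCESS_WORD = 56    # clock cycles of access time per access
SEND_WORD = 4       # clock cycles to send one memory-width transfer of data
BLOCK_WORDS = 4     # words per cache block
BYTES_PER_WORD = 8  # inferred: 32 bytes / 4 words


def miss_penalty(memory_width_words):
    """Clock cycles to fetch one block from a memory that is this many words wide."""
    # Number of full address/access/transfer rounds needed (ceiling division).
    rounds = -(-BLOCK_WORDS // memory_width_words)
    return rounds * (SEND_ADDRESS + ACCESS_WORD + SEND_WORD)


def bandwidth(memory_width_words):
    """Bytes delivered per clock cycle during a miss."""
    return BLOCK_WORDS * BYTES_PER_WORD / miss_penalty(memory_width_words)


for width in (1, 4):
    print(width, miss_penalty(width), bandwidth(width))
# prints:
# 1 256 0.125
# 4 64 0.5
```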


1: Wider Main Memory: Demerits

[Figure: L1 cache, wider L2 cache, main memory]



2: Interleaved Memory
– Bank 0 has all words whose address MOD 4 = 0
– Bank 1 has all words whose address MOD 4 = 1
– Bank 2 has all words whose address MOD 4 = 2
– Bank 3 has all words whose address MOD 4 = 3

Word addresses held in each bank:

Bank 0   Bank 1   Bank 2   Bank 3
   0        1        2        3
   4        5        6        7
   8        9       10       11
  12       13       14       15

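
The MOD 4 rule above can be sketched in a few lines. The helper names are illustrative. The position of a word inside its bank (address divided by the number of banks) is the usual companion to the modulo rule, but it is an assumption here, since the slide only states the bank selection.

```python
NUM_BANKS = 4


def bank_of(word_address):
    """Bank that holds the word: address MOD number of banks."""
    return word_address % NUM_BANKS


def index_within_bank(word_address):
    """Position of the word inside its bank (assumed: integer division)."""
    return word_address // NUM_BANKS


# Rebuild the bank table for word addresses 0..15.
banks = {b: [a for a in range(16) if bank_of(a) == b] for b in range(NUM_BANKS)}
for b, words in banks.items():
    print(f"Bank {b}: {words}")
# prints:
# Bank 0: [0, 4, 8, 12]
# Bank 1: [1, 5, 9, 13]
# Bank 2: [2, 6, 10, 14]
# Bank 3: [3, 7, 11, 15]
```

Consecutive word addresses fall in different banks, so a sequential block fetch can keep all four banks busy at once.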

2: Interleaved Memory: Example
Bandwidth calculation:
Bandwidth of a 4-word interleaved memory, using the same timing model as for the wider memory
The miss penalty for 4-word interleaved memory is:
= time to send address + time to access + number of banks x time to send data
= 4 + 56 + 4 x 4 = 76 clock cycles
Bandwidth = 32/76 = 0.42 byte per clock
For comparison, the 1-word organization gives 32/256 = 1/8 = 0.125 byte per clock
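
With interleaving, the banks are accessed in parallel, so the address and access times are paid once and only the data transfers are serialized. A minimal sketch of that calculation, with illustrative names and the timing values from the example as defaults:

```python
def interleaved_miss_penalty(num_banks=4, send_address=4, access=56, send_word=4):
    """Clock cycles to fetch one word from each bank of an interleaved memory."""
    # Address and access overlap across banks; words return one after another.
    return send_address + access + num_banks * send_word


BLOCK_BYTES = 32  # 4 words of 8 bytes, as in the example

penalty = interleaved_miss_penalty()
print(penalty)                          # prints 76
print(round(BLOCK_BYTES / penalty, 2))  # prints 0.42
```

At 76 cycles the interleaved organization comes close to the 64 cycles of the 4-word-wide memory while keeping a 1-word bus.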

3: Independent Memory Banks
Memory banks offer independent accesses
Multiprocessors
I/O
CPU with Hit under n Misses
Non-blocking Caches

