Chapter 8
CPU and Memory:
Design, Implementation, and Enhancement
The Architecture of Computer
Hardware and Systems Software:
An Information Technology Approach
3rd Edition, Irv Englander
John Wiley and Sons
2003
CPU Architecture Overview
CISC – Complex Instruction Set Computer
RISC – Reduced Instruction Set Computer
CISC vs. RISC Comparisons
VLIW – Very Long Instruction Word
EPIC – Explicitly Parallel Instruction Computing
Chapter 8: CPU and Memory:
8-2
CISC Architecture
Examples
Intel x86, IBM Z-Series Mainframes, older
CPU architectures
Characteristics
Few general purpose registers
Many addressing modes
Large number of specialized, complex
instructions
Instructions are of varying sizes
Limitations of CISC Architecture
Complex instructions are infrequently
used by programmers and compilers
Memory references, loads and stores,
are slow and account for a significant
fraction of all instructions
Procedure and function calls are a
major bottleneck
Passing arguments
Storing and retrieving values in registers
RISC Features
Examples
PowerPC, Sun SPARC
Limited and simple instruction set
Fixed length, fixed format instruction words
Enable pipelining, parallel fetches and executions
Limited addressing modes
Reduce complicated hardware
Register-oriented instruction set
Reduce memory accesses
Large bank of registers
Reduce memory accesses
Efficient procedure calls
CISC vs. RISC Processing
Circular Register Buffer
Circular Register Buffer
- After Procedure Call
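A circular register buffer lets a procedure call switch to a new register window instead of saving registers to memory. The following is a minimal Python sketch, assuming a SPARC-style layout in which the caller's output registers overlap the callee's input registers. The class name, the window sizes, and the omission of overflow handling are illustrative assumptions, not details taken from the slides.

```python
class RegisterWindows:
    """Toy circular register buffer (hypothetical sizes; overflow spill not modeled)."""

    def __init__(self, n_windows=8, n_shared=8, n_local=8):
        self.n_windows = n_windows
        self.n_shared = n_shared              # registers shared by caller and callee
        self.stride = n_shared + n_local      # physical registers advanced per call
        self.regs = [0] * (n_windows * self.stride)
        self.cwp = 0                          # current window pointer

    def _phys(self, kind, i):
        # "out" of this window starts where "in" of the next window starts,
        # and the modulo makes the last window wrap around to the first.
        offset = {"in": 0, "local": self.n_shared, "out": self.stride}[kind]
        return (self.cwp * self.stride + offset + i) % len(self.regs)

    def write(self, kind, i, value):
        self.regs[self._phys(kind, i)] = value

    def read(self, kind, i):
        return self.regs[self._phys(kind, i)]

    def call(self):
        # A procedure call only moves the window pointer; nothing is copied.
        self.cwp = (self.cwp + 1) % self.n_windows

    def ret(self):
        self.cwp = (self.cwp - 1) % self.n_windows


w = RegisterWindows()
w.write("out", 0, 42)        # caller places an argument in an output register
w.call()
arg = w.read("in", 0)        # callee sees the same physical register as an input
w.write("in", 0, arg + 1)    # callee leaves a result in the shared register
w.ret()
result = w.read("out", 0)    # caller reads the result without any memory access
```

In this sketch, passing an argument and returning a result touch only registers, which is the efficiency the RISC feature list attributes to a large register bank.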
CISC vs. RISC Performance
Comparison
RISC: simpler instructions → more instructions → more memory accesses
RISC: more bus traffic and increased cache memory misses
More registers would improve CISC
performance but no space available for them
Modern CISC and RISC architectures are
becoming similar
VLIW Architecture
Transmeta Crusoe CPU
128-bit instruction bundle = molecule
4 32-bit atoms (atom = instruction)
Parallel processing of 4 instructions
64 general purpose registers
Code morphing layer
Translates instructions written for other CPUs into
molecules
Instructions are not written directly for the Crusoe
CPU
EPIC Architecture
Intel Itanium CPU
128-bit instruction bundle
3 41-bit instructions
5 bits to identify type of instructions in bundle
128 64-bit general purpose registers
128 82-bit floating point registers
Intel X86 instruction set included
Programmers and compilers follow guidelines
to ensure parallel execution of instructions
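The bundle arithmetic above (3 × 41 + 5 = 128 bits) can be checked with a short Python sketch. The function names are hypothetical, and placing the 5 type bits in the low-order end of the bundle is an assumption made for illustration; the slides only give the field widths.

```python
TEMPLATE_BITS = 5    # bits identifying the instruction types in the bundle
SLOT_BITS = 41       # bits per instruction
SLOTS_PER_BUNDLE = 3


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit instructions into one integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly 3 instructions")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each instruction must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a bundle back into its template and three instruction fields."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
        for i in range(SLOTS_PER_BUNDLE)
    ]
    return template, slots


total_bits = TEMPLATE_BITS + SLOTS_PER_BUNDLE * SLOT_BITS
largest = pack_bundle(31, [(1 << 41) - 1] * 3)   # every field at its maximum
```

Here `total_bits` is 128 and `largest` fills the bundle exactly, so the three fields and the type bits account for every bit.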
Paging
Managed by the operating system
Built into the hardware
Independent of application
Logical vs. Physical Addresses
Logical addresses are relative locations
of data, instructions, and branch targets,
and are separate from physical
addresses
Logical addresses mapped to physical
addresses
Physical addresses do not need to be
consecutive
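The mapping from logical to physical addresses can be sketched in a few lines of Python. The 4096-byte page size, the function name `translate`, and the example page table are assumptions chosen for illustration; real systems perform this lookup in hardware.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(logical_address, page_table):
    """Map a logical address to a physical address through a page table.

    page_table maps a logical page number to a physical frame number.
    """
    page, offset = divmod(logical_address, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"page fault: page {page} is not in memory")
    return page_table[page] * PAGE_SIZE + offset


# Consecutive logical pages 0, 1, 2 live in non-consecutive frames 5, 2, 9.
page_table = {0: 5, 1: 2, 2: 9}
physical = translate(4100, page_table)   # page 1, offset 4 -> frame 2
```

The offset within the page is unchanged by translation; only the page number is replaced by a frame number, which is why the physical frames do not need to be adjacent.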
Logical vs. Physical Address
Page Address Layout
Page Translation Process
Memory Enhancements
Memory is slow compared to CPU processing
speeds!
2 GHz CPU = 1 cycle in ½ of a billionth of a second (0.5 ns)
70 ns DRAM = 1 access in 70 billionths of a second
Methods to improve memory access
Wide Path Memory Access
Retrieve multiple bytes instead of 1 byte at a time
Memory Interleaving
Partition memory into subsections, each with its own
address register and data register
Cache Memory
Memory Interleaving
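One common way to partition memory for interleaving is to use the low-order part of the address to select the subsection (bank). The Python sketch below assumes that low-order scheme and 4 banks; the function name and the bank count are illustrative choices, not values from the slides.

```python
def bank_and_offset(address, n_banks=4):
    """Low-order interleaving: return (bank number, location within that bank)."""
    return address % n_banks, address // n_banks


# Consecutive addresses fall in different banks, so their accesses can overlap.
banks = [bank_and_offset(a)[0] for a in range(8)]
```

With this mapping, addresses 0 through 7 visit banks 0, 1, 2, 3, 0, 1, 2, 3, so a sequential read keeps all four banks busy instead of waiting on one.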
Why Cache?
Even the fastest hard disk has an
access time of about 10 milliseconds
2 GHz CPU waiting 10 milliseconds
wastes 20 million clock cycles!
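The cycle count follows directly from multiplying the clock rate by the waiting time. A small Python sketch of that arithmetic, using integer nanoseconds to avoid rounding error (the function name is hypothetical):

```python
def cycles_waited(clock_hz, latency_ns):
    """Clock cycles that elapse while the CPU waits latency_ns nanoseconds."""
    return clock_hz * latency_ns // 1_000_000_000


CLOCK_HZ = 2_000_000_000                        # 2 GHz
disk_cycles = cycles_waited(CLOCK_HZ, 10_000_000)   # 10 ms disk access
dram_cycles = cycles_waited(CLOCK_HZ, 70)           # 70 ns DRAM access
```

For the 10 ms disk access this gives 20,000,000 cycles; the same calculation for a 70 ns DRAM access gives 140 cycles.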
Cache Memory
Blocks: 8 or 16 bytes
Tags: location in main memory
Cache controller
hardware that checks tags
Cache Line
Unit of transfer between storage and cache memory
Hit Ratio: ratio of hits out of total requests
Synchronizing cache and memory
Write through
Write back
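The roles of blocks, tags, and the hit ratio can be illustrated with a toy simulator. The Python sketch below assumes a direct-mapped organization with 4 lines of 16 bytes each; the slides do not specify a mapping scheme, so the class name, the sizes, and the mapping are assumptions. It tracks only tags, not data, and does not model write through or write back.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that records tags and counts hits."""

    def __init__(self, n_lines=4, block_size=16):
        self.n_lines = n_lines
        self.block_size = block_size
        self.tags = [None] * n_lines   # tag stored for each cache line
        self.hits = 0
        self.requests = 0

    def access(self, address):
        """Return True on a hit; on a miss, load the block and return False."""
        block = address // self.block_size   # which memory block holds the byte
        line = block % self.n_lines          # the one line this block may use
        tag = block // self.n_lines          # identifies the block within that line
        self.requests += 1
        if self.tags[line] == tag:           # the controller's tag check
            self.hits += 1
            return True
        self.tags[line] = tag                # replace whatever was in the line
        return False

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0


cache = DirectMappedCache()
for address in range(64):       # sequential scan over 4 blocks
    cache.access(address)
ratio = cache.hit_ratio()       # 60 hits out of 64 requests
```

Each 16-byte block misses once when it is first touched and then hits 15 times, which is how a sequential scan reaches a hit ratio above 90% in this sketch.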
Step-by-Step Use of Cache
Step-by-Step Use of Cache
Performance Advantages
Hit ratios of 90% common
50%+ improved execution speed
Locality of reference is why caching works
Most memory references confined to small region of
memory at any given time
Well-written program in small loop, procedure or
function
Data likely in array
Variables stored together
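The effect of a 90% hit ratio on average access time can be estimated with a simple weighted average. The Python sketch below assumes a hypothetical 10 ns cache access time, reuses the 70 ns DRAM figure from the Memory Enhancements slide, and treats a miss as costing one main-memory access with no extra penalty; all three are simplifying assumptions.

```python
def effective_access_time(hit_ratio, cache_ns, memory_ns):
    """Average access time when hits cost cache_ns and misses cost memory_ns."""
    return hit_ratio * cache_ns + (1 - hit_ratio) * memory_ns


# Assumed values: 90% hit ratio, 10 ns cache (hypothetical), 70 ns DRAM.
average_ns = effective_access_time(0.9, 10, 70)
```

Under these assumptions the average is about 16 ns, compared with 70 ns when every reference goes to main memory.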
Two-level Caches
Why do the sizes of the caches have to be
different?
Cache vs. Virtual Memory
Cache speeds up memory access
Virtual memory increases amount of
perceived storage
independence from the configuration and
capacity of the memory system
low cost per bit