University of Washington
Section 7: Memory and Caches
• Cache basics
• Principle of locality
• Memory hierarchies
• Cache organization
• Program optimizations that consider caches

Cache Organization

General Cache Organization (S, E, B)
• S = 2^s sets
• E = 2^e lines per set
• B = 2^b bytes of data per cache line (the data block)
• Each line holds a valid bit (v), a tag, and data bytes 0, 1, 2, ..., B-1
• Cache size: S x E x B data bytes

Cache Read
• Locate set
• Check if any line in set has matching tag
• Yes + line valid: hit
• Locate data starting at offset

Address of byte in memory:
  | tag (t bits) | set index (s bits) | block offset (b bits) |

[Figure: S = 2^s sets, E = 2^e lines per set. Each line has a valid bit (v), a tag, and B = 2^b bytes of data (the data block, bytes 0 to B-1). The data begins at the block offset.]
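
The address breakdown above can be sketched in a few lines of Python. This is only an illustration: the function name `split_address` and the example geometry (s = 2, b = 3) are assumptions chosen here, not values from the slides.

```python
def split_address(addr, s, b):
    """Split an address into (tag, set index, block offset).

    s = number of set-index bits (S = 2**s sets)
    b = number of block-offset bits (B = 2**b bytes per block)
    """
    offset = addr & ((1 << b) - 1)            # low b bits
    set_index = (addr >> b) & ((1 << s) - 1)  # next s bits
    tag = addr >> (s + b)                     # all remaining high bits
    return tag, set_index, offset


# Assumed geometry: S = 4 sets (s = 2), B = 8 bytes (b = 3).
# Address 0b1011_01_100 has tag 0b1011, set index 0b01, block offset 0b100.
print(split_address(0b101101100, s=2, b=3))  # (11, 1, 4)
```

Shifting and masking is all the hardware needs as well, because S and B are powers of two.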

Example: Direct-Mapped Cache (E = 1)
Direct-mapped: One line per set
Assume: cache block size 8 bytes

Address of int:
  | t bits | 0…01 | 100 |
• find set: the set index bits (0…01) select the set

[Figure: S = 2^s sets, each holding one line with a valid bit (v), a tag, and data bytes 0 to 7.]
• valid? + match?: yes = hit (the tag stored in the selected line is compared with the t tag bits of the address)
• block offset (100) locates the data within the block
• int (4 Bytes) is here: at block offset 100 (binary), that is, bytes 4 to 7 of the block
• No match: old line is evicted and replaced
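
The direct-mapped read steps above (find set, check valid bit and tag, evict on a mismatch) can be sketched as a toy simulator. The class name `DirectMappedCache`, the geometry (4 sets, 8-byte blocks), and the address trace are assumptions made for illustration. The sketch tracks only valid bits and tags, not the data bytes.

```python
class DirectMappedCache:
    """Toy direct-mapped cache (E = 1): tracks valid bits and tags only."""

    def __init__(self, s, b):
        self.s, self.b = s, b
        self.valid = [False] * (1 << s)   # one valid bit per set
        self.tags = [0] * (1 << s)        # one tag per set

    def read(self, addr):
        set_index = (addr >> self.b) & ((1 << self.s) - 1)  # find set
        tag = addr >> (self.s + self.b)
        if self.valid[set_index] and self.tags[set_index] == tag:
            return "hit"                  # valid + matching tag
        # No match: the old line (if any) is evicted and replaced.
        self.valid[set_index] = True
        self.tags[set_index] = tag
        return "miss"


cache = DirectMappedCache(s=2, b=3)       # assumed: 4 sets, 8-byte blocks
trace = [0x0C, 0x08, 0x2C, 0x0C]          # all four addresses map to set 1
results = [cache.read(a) for a in trace]
print(results)  # ['miss', 'hit', 'miss', 'miss']
```

In this trace, 0x08 hits because it lies in the same block as 0x0C. 0x2C has a different tag but the same set, so it evicts that block, and the final read of 0x0C misses again.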

E-way Set-Associative Cache (Here: E = 2)
E = 2: Two lines per set
Assume: cache block size 8 bytes

Address of short int:
  | t bits | 0…01 | 100 |
• find set: the set index bits (0…01) select the set

[Figure: four sets, each holding two lines; every line has a valid bit (v), a tag, and data bytes 0 to 7.]
• compare both: the tag bits of the address are compared with the tags of both lines in the set
• valid? + match: yes = hit
• block offset (100) locates the data within the matching line
• short int (2 Bytes) is here: at block offset 100 (binary), that is, bytes 4 and 5 of the block
• No match:
  • One line in set is selected for eviction and replacement
  • Replacement policies: random, least recently used (LRU), …
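
A minimal sketch of an E-way set-associative read with LRU replacement follows. The class name `SetAssociativeCache`, the geometry (4 sets, 2 lines per set, 8-byte blocks), and the trace are assumptions for illustration. Each set is modeled as a short list of tags ordered from least to most recently used, and a tag's presence in the list stands in for the valid bit.

```python
class SetAssociativeCache:
    """Toy E-way set-associative cache with LRU replacement (tags only)."""

    def __init__(self, s, b, E):
        self.s, self.b, self.E = s, b, E
        # Each set: list of tags, least recently used first.
        self.sets = [[] for _ in range(1 << s)]

    def read(self, addr):
        set_index = (addr >> self.b) & ((1 << self.s) - 1)  # find set
        tag = addr >> (self.s + self.b)
        lines = self.sets[set_index]
        if tag in lines:              # compare against every line in the set
            lines.remove(tag)
            lines.append(tag)         # mark as most recently used
            return "hit"
        if len(lines) == self.E:      # set is full: evict the LRU line
            lines.pop(0)
        lines.append(tag)
        return "miss"


cache = SetAssociativeCache(s=2, b=3, E=2)   # assumed geometry
trace = [0x0C, 0x2C, 0x0C, 0x4C, 0x0C]       # three blocks, all in set 1
results = [cache.read(a) for a in trace]
print(results)  # ['miss', 'miss', 'hit', 'miss', 'hit']
```

With two lines per set, the blocks at 0x0C and 0x2C coexist. When 0x4C arrives, LRU evicts the block at 0x2C, which was used least recently, so the last read of 0x0C still hits.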

Types of Cache Misses
• Cold (compulsory) miss
  • Occurs on first access to a block
• Conflict miss
  • Most hardware caches limit blocks to a small subset (sometimes just one) of the available cache slots
    • if one (e.g., block i must be placed in slot (i mod size)), direct-mapped
    • if more than one, n-way set-associative (where n is a power of 2)
  • Conflict misses occur when the cache is large enough, but multiple data objects all map to the same slot
    • e.g., in a direct-mapped cache with 8 slots, referencing blocks 0, 8, 0, 8, ... would miss every time
• Capacity miss
  • Occurs when the set of active cache blocks (the working set) is larger than the cache (just won't fit)
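
The conflict and capacity cases above can be checked with a small miss counter over block numbers. The function name `count_misses`, the use of LRU replacement, and the cache sizes are assumptions for this sketch.

```python
def count_misses(block_trace, num_sets, ways):
    """Count misses for a trace of block numbers, using LRU within each set."""
    sets = [[] for _ in range(num_sets)]
    misses = 0
    for block in block_trace:
        lines = sets[block % num_sets]   # block i maps to set (i mod num_sets)
        if block in lines:
            lines.remove(block)          # hit: re-appended below as most recent
        else:
            misses += 1
            if len(lines) == ways:       # set full: evict least recently used
                lines.pop(0)
        lines.append(block)
    return misses


conflict_trace = [0, 8, 0, 8, 0, 8]
# Direct-mapped, 8 slots: blocks 0 and 8 share slot 0, so every access misses.
print(count_misses(conflict_trace, num_sets=8, ways=1))  # 6
# 2-way set-associative, same 8 slots: only the two cold misses remain.
print(count_misses(conflict_trace, num_sets=4, ways=2))  # 2

# Capacity: a 16-block working set cycled through an 8-line cache that has
# no placement restriction (one set, 8 ways) still misses on every access.
print(count_misses(list(range(16)) * 2, num_sets=1, ways=8))  # 32
```

Both configurations in the conflict example hold 8 blocks in total, so the difference in miss count comes from placement restrictions alone.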

What about writes?
• Multiple copies of data exist:
  • L1, L2, possibly L3, main memory
• What is the main problem with that?
• What to do on a write-hit?
  • Write-through (write immediately to memory)
  • Write-back (defer write to memory until line is evicted)
    • Need a dirty bit to indicate if line is different from memory or not
• What to do on a write-miss?
  • Write-allocate (load into cache, update line in cache)
    • Good if more writes to the location follow
  • No-write-allocate (just write immediately to memory)
• Typical caches:
  • Write-back + Write-allocate, usually
  • Write-through + No-write-allocate, occasionally
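
To see why write-back + write-allocate is the usual pairing, the sketch below counts how many writes reach memory under each of the two typical policies. It assumes a hypothetical cache with a single line and a single dirty bit. The function name `count_memory_writes` and the final flush of the dirty line are choices made here, not part of the slides.

```python
def count_memory_writes(blocks_written, write_back):
    """Count writes that reach memory for a sequence of writes to block numbers.

    write_back=True  models write-back + write-allocate
    write_back=False models write-through + no-write-allocate
    """
    cached = None        # block held by the single cache line (assumed size)
    dirty = False        # dirty bit: line differs from memory
    mem_writes = 0
    for block in blocks_written:
        if write_back:
            if cached != block:          # write-miss
                if dirty:
                    mem_writes += 1      # evicted dirty line is written back
                cached = block           # write-allocate: load block into cache
            dirty = True                 # update the line in the cache only
        else:
            mem_writes += 1              # every write goes straight to memory
    if write_back and dirty:
        mem_writes += 1                  # final flush, so the counts are comparable
    return mem_writes


trace = [5, 5, 5, 5, 9]                  # four writes to block 5, then one to block 9
print(count_memory_writes(trace, write_back=True))   # 2
print(count_memory_writes(trace, write_back=False))  # 5
```

Repeated writes to the same location are absorbed by the cache under write-back, which is the case the "good if more writes to the location follow" bullet describes.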

Intel Core i7 Cache Hierarchy

Processor package
  Core 0 … Core 3, each with:
    Regs
    L1 d-cache and L1 i-cache
    L2 unified cache
  L3 unified cache (shared by all cores)
Main memory

• L1 i-cache and d-cache: 32 KB, 8-way, Access: 4 cycles
• L2 unified cache: 256 KB, 8-way, Access: 11 cycles
• L3 unified cache: 8 MB, 16-way, Access: 30-40 cycles
• Block size: 64 bytes for all caches.
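
As a worked example, the cache size formula S x E x B can be inverted to recover the number of sets for each level listed above. The helper name `cache_geometry` is an assumption, and the sketch takes 1 KB = 1024 bytes.

```python
def cache_geometry(size_bytes, ways, block_bytes):
    """Return (S, s, b): number of sets, set-index bits, block-offset bits."""
    num_sets = size_bytes // (ways * block_bytes)  # S = size / (E * B)
    s = num_sets.bit_length() - 1                  # log2(S) for a power of two
    b = block_bytes.bit_length() - 1               # log2(B) for a power of two
    return num_sets, s, b


KB, MB = 1024, 1024 * 1024
levels = [("L1", 32 * KB, 8), ("L2", 256 * KB, 8), ("L3", 8 * MB, 16)]
for name, size, ways in levels:
    print(name, cache_geometry(size, ways, 64))
# L1 (64, 6, 6)
# L2 (512, 9, 6)
# L3 (8192, 13, 6)
```

So an L1 lookup uses 6 block-offset bits and 6 set-index bits, and the remaining high address bits form the tag.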