
Advanced Computer Architecture - Lecture 36: Multiprocessors


CS 704
Advanced Computer Architecture

Lecture 36
Multiprocessors
(Cache Coherence Problem … Cont’d )

Prof. Dr. M. Ashraf Chughtai


Today’s Topics
Recap
Example of Invalidation Scheme
Coherence in Distributed Memory Architecture
Performance of Cache Coherence Schemes
Summary
MAC/VU-Advanced
Computer Architecture

Lec. 36 Multiprocessor (3)



Recap: Cache Coherence Problem
Last time we discussed the sharing of
caches for multiprocessing in the
symmetric shared-memory architecture,
wherein each processor has the same
relationship to the single memory.
Here, we distinguished between private
data and shared data, i.e.,
– the data used by a single processor, and
– the data replicated in the caches of multiple
processors for their simultaneous use


Recap: Cache Coherence Problem
Then we discussed the cache coherence
problem in symmetric shared-memory
multiprocessors, which results from
inconsistency among the cached copies of
shared data that multiple processors read
simultaneously.
We studied the cache coherence problem
with the help of a typical shared-memory
architecture in which each processor
has a write-back cache


Recap: Cache Coherency Problem
In write-back caches, the value written back to
memory depends on which cache flushes or
writes back the value, and when.
We noticed that the cache coherency
problem exists even on uniprocessors, due to
the interaction between caches and I/O devices.
However, in multiprocessors the problem is
performance-critical, and the order among
multiple processes is crucial, i.e.,


Recap: Order among multiple processes
For a single shared memory with no caches,
a serial or total order is imposed on
operations to a location; and for a
single shared memory with caches, the
serial order must be consistent, i.e., all
processors must see writes to the location
in the same order.
Considering this, we can say that in a
coherent system:


Recap: Order among multiple processes
– the operations issued by any particular
process occur in the order issued by that
process, and
– the value returned by a read is the value
written by the last write to that location in
the serial order
Then we talked about write propagation
and write serialization as the two
features of a coherent system
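The write serialization property can be illustrated with a small check. This is a minimal sketch and not part of the lecture: the function name `sees_writes_in_order` is hypothetical, and it assumes that every write to the location stores a distinct value, so that each value a processor reads can be matched to exactly one write.

```python
def sees_writes_in_order(write_order, observed):
    """Return True if `observed` never steps backwards in `write_order`.

    write_order: values written to one location, in the serial order.
    observed:    values one processor read from that location, in
                 program order (assumed to be values from write_order).
    """
    position = {value: i for i, value in enumerate(write_order)}
    last = -1
    for value in observed:
        if position[value] < last:
            return False  # an older write was seen after a newer one
        last = position[value]
    return True


if __name__ == "__main__":
    serial = [10, 20, 40]
    print(sees_writes_in_order(serial, [10, 20, 20, 40]))
    print(sees_writes_in_order(serial, [20, 10]))
```

A processor may skip a value or read the same value several times; what the check rejects is a processor that observes two writes in the opposite order from the serial order.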


Recap: Multiprocessor cache Coherence
We also noticed that, to implement cache
coherence, multiprocessors extend both
the bus transactions and the state transitions.
The cache controller snoops on bus events
(write transactions) and invalidates or updates
its cache.
Then we discussed the cache coherence
protocols, which use different techniques to
track the sharing status and maintain
coherence in multiprocessors


Recap: Coherency Solutions
The two fundamental classes of Coherence
protocols are:
– Snooping Protocols
All cache controllers monitor or snoop (spy) on
the bus to determine whether or not they have a
copy of the block that is requested on the bus

– Directory-Based Protocols
The sharing status of a block of physical
memory is kept in one location, called the directory


Recap: Basic Snooping Protocols
The snooping protocols are implemented
using two techniques: write invalidate and
write broadcast.
The write invalidate method ensures that a
processor has exclusive access to a data
item before it writes that item; all other
cached copies are invalidated on the
write.
The write broadcast approach, on the other
hand, updates all the cached copies of a
data item when that item is written
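The difference between the two techniques can be sketched in a few lines. This is an illustration under stated assumptions, not the lecture's implementation: each cache is modelled as a plain dictionary from address to value, and the names `write_invalidate` and `write_update` are hypothetical.

```python
def write_invalidate(caches, writer, addr, value):
    """Writer obtains the only valid copy; other cached copies are dropped."""
    for name, cache in caches.items():
        if name != writer:
            cache.pop(addr, None)  # invalidate the remote copy, if any
    caches[writer][addr] = value


def write_update(caches, writer, addr, value):
    """Writer broadcasts the new value; every existing copy is updated."""
    caches[writer][addr] = value
    for name, cache in caches.items():
        if name != writer and addr in cache:
            cache[addr] = value


if __name__ == "__main__":
    a = {"P1": {"X": 0}, "P2": {"X": 0}}
    write_invalidate(a, "P1", "X", 1)
    print(a)  # P2 no longer holds X

    b = {"P1": {"X": 0}, "P2": {"X": 0}}
    write_update(b, "P1", "X", 1)
    print(b)  # P2 holds the new value of X
```

After an invalidating write, P2 must miss and re-fetch X before it can read it; after an update, P2 can read the new value directly from its own cache.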


Recap: Write Invalidate versus Broadcast
We noticed that
– Invalidate requires one transaction for
multiple writes to the same word, and it
exploits spatial locality, i.e., one transaction
for writes to different words in the same
block; and
– Broadcast has lower latency between a write
and a subsequent read
Then we discussed the finite state machine
controller that implements the snooping protocols
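The transaction counts compared above can be made concrete with a deliberately simplified model. The assumptions are mine, not the lecture's: one processor performs all the writes, no other processor touches the data in between, every block starts out shared, and addresses are word indices. The name `count_transactions` is hypothetical.

```python
def count_transactions(writes, words_per_block, scheme):
    """Count bus transactions for a run of writes by one processor."""
    if scheme == "broadcast":
        # Every write to shared data is sent on the bus.
        return len(writes)
    if scheme == "invalidate":
        # The first write to a block invalidates the other copies;
        # later writes to any word of that block stay local.
        blocks = {addr // words_per_block for addr in writes}
        return len(blocks)
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    # Three writes to word 0, then one write each to words 1, 2 and 3,
    # all of which lie in the same four-word block.
    writes = [0, 0, 0, 1, 2, 3]
    print(count_transactions(writes, 4, "invalidate"))
    print(count_transactions(writes, 4, "broadcast"))
```

Under these assumptions invalidate needs a single transaction for the whole run, whereas broadcast needs one per write.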


Recap: An Example Snooping Protocol
This controller responds to requests from
the processor and from the bus based on:
– the type of the request;
– its hit or miss status in the cache; and
– the state of the cache block specified in the request
Furthermore, each block of memory is in
one of three states: Shared, Exclusive, or
Invalid (not in any cache), and each cache
block tracks these three states
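One way to picture such a controller is as a lookup table from (state, event) to (next state, bus action). The table below is a sketch of a three-state write-invalidate protocol for a write-back cache. It covers only requests that refer to the block currently held (or, in the Invalid state, the block being fetched); misses that replace a different block are left out. The event names are hypothetical labels chosen for this illustration.

```python
from enum import Enum


class State(Enum):
    INVALID = "Invalid"
    SHARED = "Shared"
    EXCLUSIVE = "Exclusive"


# (current state, event) -> (next state, bus action or None)
TRANSITIONS = {
    # Requests from the processor
    (State.INVALID, "cpu_read"): (State.SHARED, "read miss"),
    (State.INVALID, "cpu_write"): (State.EXCLUSIVE, "write miss"),
    (State.SHARED, "cpu_read_hit"): (State.SHARED, None),
    (State.SHARED, "cpu_write"): (State.EXCLUSIVE, "write miss"),
    (State.EXCLUSIVE, "cpu_read_hit"): (State.EXCLUSIVE, None),
    (State.EXCLUSIVE, "cpu_write_hit"): (State.EXCLUSIVE, None),
    # Requests snooped from the bus for the block this cache holds
    (State.SHARED, "bus_read_miss"): (State.SHARED, None),
    (State.SHARED, "bus_write_miss"): (State.INVALID, None),
    (State.EXCLUSIVE, "bus_read_miss"): (State.SHARED, "write back"),
    (State.EXCLUSIVE, "bus_write_miss"): (State.INVALID, "write back"),
}


def step(state, event):
    """Return (next_state, bus_action) for one request."""
    return TRANSITIONS[(state, event)]


if __name__ == "__main__":
    state = State.INVALID
    for event in ["cpu_write", "cpu_read_hit", "bus_read_miss"]:
        state, action = step(state, event)
        print(event, "->", state.value, "| bus:", action)
```

The three inputs named on the slide appear directly: the event encodes the request type and its hit or miss status, and the current state is the state of the addressed block.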


Example: Working of Finite State Machine Controller
Today we will continue our discussion of
the finite state machine controller for the
implementation of the snooping protocol,
and will try to understand its working with
the help of an example.
Here, we assume that two processors, P1
and P2, each having its own cache, share
the main memory connected on a bus


Example: Working of Finite State Machine Controller
The status of the processors, the bus
transactions, and the memory is depicted in a
table for each step of the state machine.
For each step of operation, the table shows
each processor's state together with the cached
address and value, the bus action, and the
shared-memory status.
Initially the cache state is Invalid (i.e., the
block of memory is not in the cache); and …


Example: Working of Finite State Machine Controller
memory blocks A1 and A2 map to the same
cache block, where the address A1 is not
equal to A2
At Step 1: P1 writes 10 to A1
A write miss is placed on the bus, and P1's
state changes from Invalid to Exclusive



Example: Working of Finite State Machine Controller
At Step 2: P1 reads A1
A CPU read hit occurs, hence the FSM stays
in the Exclusive state



Example: Working of Finite State Machine Controller
At Step 3: P2 reads A1
i) As P2 is initially in the Invalid state, a
read miss is placed on the bus; the controller
state changes from Invalid to Shared



Example: Working of Finite State Machine Controller
ii) P1, being in the Exclusive state, sees the remote
read and asserts a write-back; its state
changes from Exclusive to Shared; and
iii) the value 10 is written back to the shared
memory at address A1 and read into P2's cache,
so both caches hold A1 with value 10, and both
the P1 and P2 controllers are in the Shared state




Example: Working of Finite State Machine Controller
At Step 4: P2 writes 20 to A1
i) P1 sees a remote write, so the state of its
controller changes from Shared to Invalid
ii) P2 sees a CPU write, so it places a write miss on
the bus, changes its state from Shared
to Exclusive, and writes the value 20 to A1
iii) The memory at address A1 still holds the value 10



Example: Working of Finite State Machine Controller

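At Step 5: P2 writes 40 to A2
i) P2 holds A1 in the Exclusive state and A2 maps to
the same cache block, so a CPU write miss occurs;
P2 places a write miss for A2 on the bus and writes
the replaced block A1 (value 20) back to memory
ii) P2 remains in the Exclusive state, now with address
A2 and value 40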
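The five steps can be replayed with a small simulation. This is a minimal sketch under stated assumptions rather than the lecture's own model: each cache holds a single block (so A1 and A2 necessarily map to the same cache block), memory initially contains 0 at both addresses (the slides do not give the initial contents), and the names `Cache`, `access` and `replay` are hypothetical.

```python
class Cache:
    """A one-block write-back cache: state is Invalid, Shared or Exclusive."""

    def __init__(self):
        self.state, self.addr, self.value = "Invalid", None, None


def access(caches, memory, who, op, addr, value=None):
    """Apply one CPU read or write; return the bus actions it causes."""
    me, bus = caches[who], []
    hit = me.state != "Invalid" and me.addr == addr
    if op == "read" and hit:
        return bus                      # read hit: no bus traffic
    if op == "write" and hit and me.state == "Exclusive":
        me.value = value                # write hit on an exclusive block
        return bus
    # A miss that replaces an exclusive (dirty) block writes it back first.
    if not hit and me.state == "Exclusive":
        memory[me.addr] = me.value
        bus.append(f"write back {me.addr}")
    bus.append(f"{op} miss {addr}")
    # The other caches snoop the miss placed on the bus.
    for name, other in caches.items():
        if name == who or other.state == "Invalid" or other.addr != addr:
            continue
        if other.state == "Exclusive":
            memory[addr] = other.value  # supply the latest value
            bus.append(f"write back {addr}")
        other.state = "Shared" if op == "read" else "Invalid"
    if op == "read":
        me.state, me.addr, me.value = "Shared", addr, memory[addr]
    else:
        me.state, me.addr, me.value = "Exclusive", addr, value
    return bus


def replay():
    """Run the five steps of the example and print a trace."""
    caches = {"P1": Cache(), "P2": Cache()}
    memory = {"A1": 0, "A2": 0}         # initial contents are an assumption
    steps = [("P1", "write", "A1", 10), ("P1", "read", "A1", None),
             ("P2", "read", "A1", None), ("P2", "write", "A1", 20),
             ("P2", "write", "A2", 40)]
    for n, (who, op, addr, value) in enumerate(steps, start=1):
        bus = access(caches, memory, who, op, addr, value)
        print(f"Step {n}: P1={caches['P1'].state} P2={caches['P2'].state} "
              f"bus={bus} memory={memory}")
    return caches, memory


if __name__ == "__main__":
    replay()
```

In this model the write-back of A1 with value 20 happens only at Step 5, when A2 displaces the exclusive block in P2's cache; until then memory still holds 10 at A1.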



Implementation Complications
With this example, we have observed that
the finite state machine implementation of
the snooping protocols works well.
However, the following implementation
complications arise:
– Write races
– Interventions and invalidations