Advanced Computer Architecture - Lecture 35: Multiprocessors

CS 704
Advanced Computer Architecture

Lecture 35
Multiprocessors
(Cache Coherence Problem)

Prof. Dr. M. Ashraf Chughtai


Today’s Topics
Recap:
Multiprocessor Cache Coherence
Enforcing Coherence in:
– Symmetric Shared Memory Architecture
– Distributed Memory Architecture
Performance of Cache Coherence Schemes
Summary
MAC/VU-Advanced
Computer Architecture

Lec. 35 Multiprocessor (2)

2




Recap: Parallel Processing Architecture
Last time we introduced the concept of
parallel processing to improve
computer performance
A parallel architecture is a collection of
processing elements that cooperate
and communicate to solve large
problems fast
We discussed Flynn’s four categories
of computers, which form the basis ...


Recap: Parallel Computer Categories
... to implement the programming and
communication models for parallel computing
These categories are:
– SISD (Single Instruction Single Data)
– SIMD (Single Instruction Multiple Data)
– MISD (Multiple Instruction Single Data)
– MIMD (Multiple Instruction Multiple Data)
The MIMD machines implement Parallel
processing architecture



Recap: MIMD Classification
We noted that, based on the memory
organization and interconnect strategy,
MIMD machines are classified as:
– Centralized Shared Memory Architecture
Here, the subsystems share the same
physical centralized memory, connected by
a bus
The key architectural property of this design is
Uniform Memory Access (UMA); i.e., the
access time to all memory from all the
processors is the same



Recap: MIMD Classification

– Distributed Memory Architecture

It consists of a number of individual nodes,
each containing a processor, some memory,
some I/O, and an interface to an
interconnection network that connects
all the nodes
Distributing the memory provides more
memory bandwidth and lower latency
for accesses to local memory


Recap: Framework for Parallel processing
Last time we also studied a framework for
parallel architecture
The framework defines the programming
and communication Models for centralized
shared-memory and distributed memory
parallel processing architectures
These models present address space
sharing and message passing in parallel
architecture


Recap: Framework for Parallel processing
Here, we noted that the shared-memory
communication model is compatible
with SMP hardware, and
offers ease of programming when
communication patterns are complex or
vary dynamically during execution
The message-passing communication
model, in contrast, makes communication
explicit, which is simpler to understand, and
makes sender-initiated communication
easier to use


Multiprocessor Cache Sharing

Today, we will look into the sharing of
caches for multi-processing in the
symmetric shared-memory architecture

The symmetric shared memory architecture
is one where each processor has the same
relationship to the single memory
Small-scale shared-memory machines
usually support caching of both the private
data as well as the shared data


Multiprocessor Cache Sharing
Private data is used by a single
processor, while shared data is
replicated in the caches of multiple
processors for their simultaneous use
The program behavior for
caching of private data is identical to
that of a uniprocessor, as no other
processor uses the same data,
i.e., no other processor's cache has a copy of
the same data


Multiprocessor Cache Coherence
When shared data are cached, however, the
shared value may be replicated in multiple
caches
This replication reduces access latency
and helps meet the bandwidth requirements,
but, because loads and stores are served by
each processor's own cache, and because of
the write strategy used by the caches, the
values in different caches may not
be consistent, i.e.,



Multiprocessor Cache Coherence
There may be a conflict (or inconsistency) in
the shared data being read by multiple
processors simultaneously
This inconsistency in the caching of
shared data is referred to as the cache
coherence problem
Informally, we can say that a memory system
is coherent if any read of a data item
returns the most recently written value of
that data item


Multiprocessor Cache Coherence
This definition contains two aspects of
memory behavior:
– Coherence, which defines what values can be
returned by a read
– Consistency, which determines when a written
value will be returned by a read
Let us explain the cache coherence
problem with the help of a typical shared
memory architecture, shown here



Multiprocessor Cache Coherence



Cache Coherency Problem?
Note that here the processors P1, P2, P3
may see old values in their caches, as there
are several alternative ways to write to caches
For example, with write-back caches, the value
written back to memory depends on which
cache flushes or writes back its value (and
when);
i.e., the value returned depends on the program
order, program issue order, order of
completion, etc.
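
The stale-value situation described above can be sketched with a toy model. This is only an illustration under stated assumptions: the class names `Memory` and `WriteBackCache`, the location name `u`, and the values 5 and 7 are all hypothetical, and the caches deliberately have no coherence mechanism.

```python
class Memory:
    """Shared main memory: a plain address -> value map."""
    def __init__(self, contents):
        self.data = dict(contents)


class WriteBackCache:
    """Private write-back cache with NO coherence mechanism (toy model)."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}      # address -> cached value
        self.dirty = set()   # addresses modified but not yet written back

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from memory
            self.lines[addr] = self.memory.data[addr]
        return self.lines[addr]               # hit: the copy may be stale

    def write(self, addr, value):
        self.lines[addr] = value              # update the local copy only
        self.dirty.add(addr)

    def flush(self, addr):
        if addr in self.dirty:                # write the dirty value back
            self.memory.data[addr] = self.lines[addr]
            self.dirty.discard(addr)


mem = Memory({"u": 5})
p1, p2, p3 = (WriteBackCache(mem) for _ in range(3))

p1.read("u")                   # P1 caches u = 5
p3.read("u")                   # P3 caches u = 5
p3.write("u", 7)               # P3 writes 7; memory still holds 5

stale_p1 = p1.read("u")        # hit in P1's cache: old value
stale_p2 = p2.read("u")        # miss, but memory has not been updated yet
p3.flush("u")                  # only now does memory see 7
after_flush_p1 = p1.read("u")  # P1 still hits its old copy
```

In this sketch both P1 and P2 read the old value after P3's write, and P1 keeps reading it even after the flush, because nothing tells P1's cache that its copy is out of date.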



Cache Coherency Problem?
The cache coherency problem exists even
on uniprocessors, owing to the interaction
between caches and I/O devices; there it is
infrequent, so software solutions work well
However, the problem is performance-critical
in multiprocessors, where the order
among multiple processes is crucial, and
needs to be treated as a basic hardware
design issue


Order among multiple processes?
Now let us discuss what order among
multiple processes means
First, let us consider a single shared
memory, with no caches
– Here, every read/write to a location
accesses the same physical location, and
the operation completes at the time when
it does so


Order among multiple processes?
This means that a single shared memory,
with no caches, imposes a serial or total
order on operations to the location, i.e.,
– the operations to the location from a given
processor are in program order; and
– the order of operations to the location from
different processors is some interleaving
that preserves the individual program
orders
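
The idea of "some interleaving that preserves the individual program orders" can be made concrete by enumerating the possible serial orders for a tiny case. The sketch below is illustrative only: the function names and the two example programs (P1 writes 1 then reads, P2 writes 2) are assumptions, not taken from the lecture.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):   # next operation comes from a
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):   # next operation comes from b
        yield [b[0]] + rest


def final_value(order, initial=0):
    """Value held by the location after applying the writes in the given order."""
    value = initial
    for _proc, kind, operand in order:
        if kind == "write":
            value = operand
    return value


# Hypothetical program orders for a single location x.
p1_ops = [("P1", "write", 1), ("P1", "read", None)]
p2_ops = [("P2", "write", 2)]

orders = list(interleavings(p1_ops, p2_ops))
finals = sorted({final_value(o) for o in orders})
```

Every order produced keeps P1's write ahead of P1's read, yet different interleavings leave different final values in the location, which is why the serial order matters.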


Order among multiple processes?
Now, let us discuss the case of a single
shared memory, with caches
Here, the "latest" value means the most recent in a
hypothetical serial order in which operations to a
location from a given processor are in program order

Note that for the serial order to be
consistent, all processors must see writes
to the location in the same order


Formal Definition of Coherence!
With this much discussion on the cache
coherence problem, we can say that
A memory system is coherent
if the results of any execution of a program
are such that for each location,
it is possible to construct a hypothetical
serial order of all operations to the location
that is consistent with the results of the
execution


Formal Definition of Coherence!

In a coherent system
– the operations issued by any particular
process occur in the order issued by that
process, and
– the value returned by a read is the value
written by the last write to that location in
the serial order
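
The definition above can be turned into a brute-force check for one location: try every serial order that respects each process's program order, and accept the execution if some order explains every value that was read. This is a minimal sketch for tiny examples, not a practical verifier; the function names and the encoding of operations as `("W", value)` / `("R", value)` pairs are assumptions made for illustration.

```python
def serial_orders(programs):
    """Yield every interleaving of the per-process op lists that keeps program order."""
    if all(len(p) == 0 for p in programs):
        yield []
        return
    for i, prog in enumerate(programs):
        if prog:
            remaining = programs[:i] + [prog[1:]] + programs[i + 1:]
            for tail in serial_orders(remaining):
                yield [prog[0]] + tail


def explains(order, initial):
    """True if every read in the order returns the value of the last preceding write."""
    current = initial
    for kind, value in order:
        if kind == "W":
            current = value
        elif value != current:      # a read that observed some other value
            return False
    return True


def is_coherent(programs, initial=0):
    """Coherent (for this one location) if some valid serial order explains all reads."""
    return any(explains(order, initial) for order in serial_orders(programs))


# One list per process; ops are ("W", value written) or ("R", value observed).
ok_execution = [[("W", 1), ("R", 2)], [("W", 2)]]
bad_execution = [[("W", 1)], [("R", 1), ("R", 0)]]   # P2 sees the new value, then the old one
```

Here `is_coherent(ok_execution)` holds because the order W1, W2, R2 explains the read, while `is_coherent(bad_execution)` fails because no order lets a process read 1 and then go back to the initial value 0. The search is exponential in the number of operations, so it is only suitable for small illustrations.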



Features of Coherent System
Two features of a coherent system are:
– write propagation: a value written must
become visible to others, i.e.,
any write must eventually be seen by a
read
– write serialization: writes to a location are
seen in the same order by all processors
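
Write serialization can be illustrated with a small check: given one global order of writes to a location, each processor's sequence of observed values must follow that order, although a processor may skip values it never read. This sketch assumes every write stores a distinct value; the function names and the sample data are hypothetical. It does not model write propagation, which is an "eventually" property.

```python
def is_subsequence(seen, order):
    """True if `seen` can be obtained from `order` by deleting elements."""
    it = iter(order)
    return all(value in it for value in seen)   # each lookup consumes the iterator


def consistent_with(write_order, observations):
    """Every observer's sequence of values must respect the single write order."""
    return all(is_subsequence(seen, write_order) for seen in observations.values())


write_order = [1, 2, 3]                                  # assumed global order of writes to x
good = {"P1": [1, 2, 3], "P2": [1, 3], "P3": [2, 3]}     # skipping a value is allowed
bad = {"P1": [1, 2], "P2": [2, 1]}                       # P2 sees the writes in the opposite order
```

With this data, `consistent_with(write_order, good)` holds and `consistent_with(write_order, bad)` does not, since P1 and P2 disagree about which of the writes 1 and 2 came first.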



Cache Coherence on buses
Bus transactions and cache state transitions are
fundamentals of uniprocessor systems
A bus transaction passes through three phases:
arbitration, command/address, data transfer
Cache state transitions treat every block as a
finite state machine
– Write-through, write no-allocate caches
have two states: valid, invalid
– Write-back caches have one more state:
modified (“dirty”)
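
The per-block finite state machines can be written down as transition tables. The slide names only the states; the transitions, event names (`read`, `write`, `evict`), and bus transaction names below are assumptions based on the usual textbook behavior of such caches (including write-allocate for the write-back case), so treat this as a sketch rather than the lecture's exact protocol.

```python
# (state, processor event) -> (next state, bus transaction or None)
WRITE_THROUGH_NO_ALLOCATE = {
    ("invalid", "read"):  ("valid",   "bus_read"),    # read miss allocates the block
    ("invalid", "write"): ("invalid", "bus_write"),   # write miss: no allocation
    ("valid",   "read"):  ("valid",   None),          # read hit
    ("valid",   "write"): ("valid",   "bus_write"),   # write hit goes through to memory
}

WRITE_BACK = {
    ("invalid",  "read"):  ("valid",    "bus_read"),
    ("invalid",  "write"): ("modified", "bus_read"),  # write-allocate assumed
    ("valid",    "read"):  ("valid",    None),
    ("valid",    "write"): ("modified", None),        # block becomes dirty
    ("modified", "read"):  ("modified", None),
    ("modified", "write"): ("modified", None),
    ("valid",    "evict"): ("invalid",  None),
    ("modified", "evict"): ("invalid",  "bus_write_back"),
}


def run(table, events, state="invalid"):
    """Feed events to one block's state machine; return the final state and bus traffic."""
    traffic = []
    for event in events:
        state, bus = table[(state, event)]
        if bus is not None:
            traffic.append(bus)
    return state, traffic
```

For example, `run(WRITE_BACK, ["read", "write", "write", "evict"])` generates bus traffic only for the initial fetch and the final write-back, whereas the write-through table puts every write on the bus.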


Multiprocessor cache Coherence
Multiprocessors extend both the bus transaction
and state transition to implement cache coherence



Coherence with write-through caches!
Here, the cache controller snoops on bus events
(write transactions) and invalidates or updates
its cached copy
With write-through, memory is always
up to date, so an invalidation causes the next read
to miss and fetch the new value from memory; the
bus transaction thus provides write propagation

Bus transactions impose write
serialization, as all controllers see the writes
in the same bus order
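
A toy simulation can show how snooping with invalidation works for write-through caches. It is a minimal sketch under stated assumptions: the class names `Bus` and `SnoopingCache`, the location `u`, and the values 5 and 7 are hypothetical, the invalidate variant is chosen over update, and bus arbitration is reduced to the order of Python method calls.

```python
class Bus:
    """Shared bus: serializes writes and lets every cache snoop them."""
    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.log = []                       # order in which writes appeared on the bus

    def write(self, source, addr, value):
        self.log.append((addr, value))
        self.memory[addr] = value           # write-through: memory is always up to date
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr)


class SnoopingCache:
    """Write-through, write no-allocate cache that invalidates on a snooped write."""
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                     # presence of addr means the block is valid
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:          # miss (possibly caused by an invalidation)
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        if addr in self.lines:              # update own copy on a write hit
            self.lines[addr] = value
        self.bus.write(self, addr, value)

    def snoop_write(self, addr):
        self.lines.pop(addr, None)          # invalidate our copy, if any


memory = {"u": 5}
bus = Bus(memory)
p1, p2, p3 = (SnoopingCache(bus) for _ in range(3))

p1.read("u")                # P1 caches u = 5
p3.read("u")                # P3 caches u = 5
p3.write("u", 7)            # the bus write invalidates P1's copy
```

After P3's write, P1's block is invalid, so its next read misses and fetches 7 from memory; `bus.log` records the single order in which all caches observed the writes.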

