

Distributed Computing
Principles, Algorithms, and Systems

Distributed computing deals with all forms of computing, information access,
and information exchange across multiple processing platforms connected
by computer networks. Design of distributed computing systems is a complex task. It requires a solid understanding of the design issues and an
in-depth understanding of the theoretical and practical aspects of their solutions. This comprehensive textbook covers the fundamental principles and
models underlying the theory, algorithms, and systems aspects of distributed
computing.
Broad and detailed coverage of the theory is balanced with practical
systems-related problems such as mutual exclusion, deadlock detection,
authentication, and failure recovery. Algorithms are carefully selected, lucidly
presented, and described without complex proofs. Simple explanations and
illustrations are used to elucidate the algorithms. Emerging topics of significant impact, such as peer-to-peer networks and network security, are also
covered.
With state-of-the-art algorithms, numerous illustrations, examples, and
homework problems, this textbook is invaluable for advanced undergraduate
and graduate students of electrical and computer engineering and computer
science. Practitioners in data networking and sensor networks will also find
this a valuable resource.
Ajay D. Kshemkalyani is an Associate Professor in the Department of Computer Science at the University of Illinois at Chicago. He was awarded his Ph.D. in Computer and Information Science in 1991 from The Ohio State University. Before moving to academia, he spent several years working on computer networks at IBM Research Triangle Park. In 1999, he received the National Science Foundation’s CAREER Award. He is a Senior Member of the IEEE, and his principal areas of research include distributed computing, algorithms, computer networks, and concurrent systems. He currently serves on the editorial board of Computer Networks.
Mukesh Singhal is a Full Professor and Gartner Group Endowed Chair in Network Engineering in the Department of Computer Science at the University of Kentucky. He was awarded his Ph.D. in Computer Science in 1986 from the University of Maryland, College Park. In 2003, he received the IEEE Technical Achievement Award, and he currently serves on the editorial boards of the IEEE Transactions on Parallel and Distributed Systems and the IEEE Transactions on Computers. He is a Fellow of the IEEE, and his principal areas of research include distributed systems, computer networks, wireless and mobile computing systems, performance evaluation, and computer security.


Distributed Computing
Principles, Algorithms, and Systems

Ajay D. Kshemkalyani
University of Illinois at Chicago, Chicago

and

Mukesh Singhal
University of Kentucky, Lexington


CAMBRIDGE UNIVERSITY PRESS

Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press

The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521876346
© Cambridge University Press 2008
This publication is in copyright. Subject to statutory exception and to the provisions of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.
First published in print format 2008

ISBN-13 978-0-511-39341-9 eBook (EBL)
ISBN-13 978-0-521-87634-6 hardback

Cambridge University Press has no responsibility for the persistence or accuracy of urls
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.


To my father Shri Digambar and
my mother Shrimati Vimala.
Ajay D. Kshemkalyani
To my mother Chandra Prabha Singhal,
my father Brij Mohan Singhal, and my
daughters Meenakshi, Malvika, and Priyanka.
Mukesh Singhal



Contents

Preface

1 Introduction
1.1 Definition
1.2 Relation to computer system components
1.3 Motivation
1.4 Relation to parallel multiprocessor/multicomputer systems
1.5 Message-passing systems versus shared memory systems
1.6 Primitives for distributed communication
1.7 Synchronous versus asynchronous executions
1.8 Design issues and challenges
1.9 Selection and coverage of topics
1.10 Chapter summary
1.11 Exercises
1.12 Notes on references
References

2 A model of distributed computations
2.1 A distributed program
2.2 A model of distributed executions
2.3 Models of communication networks
2.4 Global state of a distributed system
2.5 Cuts of a distributed computation
2.6 Past and future cones of an event
2.7 Models of process communications
2.8 Chapter summary
2.9 Exercises
2.10 Notes on references
References

3 Logical time
3.1 Introduction
3.2 A framework for a system of logical clocks
3.3 Scalar time
3.4 Vector time
3.5 Efficient implementations of vector clocks
3.6 Jard–Jourdan’s adaptive technique
3.7 Matrix time
3.8 Virtual time
3.9 Physical clock synchronization: NTP
3.10 Chapter summary
3.11 Exercises
3.12 Notes on references
References

4 Global state and snapshot recording algorithms
4.1 Introduction
4.2 System model and definitions
4.3 Snapshot algorithms for FIFO channels
4.4 Variations of the Chandy–Lamport algorithm
4.5 Snapshot algorithms for non-FIFO channels
4.6 Snapshots in a causal delivery system
4.7 Monitoring global state
4.8 Necessary and sufficient conditions for consistent global snapshots
4.9 Finding consistent global snapshots in a distributed computation
4.10 Chapter summary
4.11 Exercises
4.12 Notes on references
References

5 Terminology and basic algorithms
5.1 Topology abstraction and overlays
5.2 Classifications and basic concepts
5.3 Complexity measures and metrics
5.4 Program structure
5.5 Elementary graph algorithms
5.6 Synchronizers
5.7 Maximal independent set (MIS)
5.8 Connected dominating set
5.9 Compact routing tables
5.10 Leader election
5.11 Challenges in designing distributed graph algorithms
5.12 Object replication problems
5.13 Chapter summary
5.14 Exercises
5.15 Notes on references
References

6 Message ordering and group communication
6.1 Message ordering paradigms
6.2 Asynchronous execution with synchronous communication
6.3 Synchronous program order on an asynchronous system
6.4 Group communication
6.5 Causal order (CO)
6.6 Total order
6.7 A nomenclature for multicast
6.8 Propagation trees for multicast
6.9 Classification of application-level multicast algorithms
6.10 Semantics of fault-tolerant group communication
6.11 Distributed multicast algorithms at the network layer
6.12 Chapter summary
6.13 Exercises
6.14 Notes on references
References

7 Termination detection
7.1 Introduction
7.2 System model of a distributed computation
7.3 Termination detection using distributed snapshots
7.4 Termination detection by weight throwing
7.5 A spanning-tree-based termination detection algorithm
7.6 Message-optimal termination detection
7.7 Termination detection in a very general distributed computing model
7.8 Termination detection in the atomic computation model
7.9 Termination detection in a faulty distributed system
7.10 Chapter summary
7.11 Exercises
7.12 Notes on references
References

8 Reasoning with knowledge
8.1 The muddy children puzzle
8.2 Logic of knowledge
8.3 Knowledge in synchronous systems
8.4 Knowledge in asynchronous systems
8.5 Knowledge transfer
8.6 Knowledge and clocks
8.7 Chapter summary
8.8 Exercises
8.9 Notes on references
References

9 Distributed mutual exclusion algorithms
9.1 Introduction
9.2 Preliminaries
9.3 Lamport’s algorithm
9.4 Ricart–Agrawala algorithm
9.5 Singhal’s dynamic information-structure algorithm
9.6 Lodha and Kshemkalyani’s fair mutual exclusion algorithm
9.7 Quorum-based mutual exclusion algorithms
9.8 Maekawa’s algorithm
9.9 Agarwal–El Abbadi quorum-based algorithm
9.10 Token-based algorithms
9.11 Suzuki–Kasami’s broadcast algorithm
9.12 Raymond’s tree-based algorithm
9.13 Chapter summary
9.14 Exercises
9.15 Notes on references
References

10 Deadlock detection in distributed systems
10.1 Introduction
10.2 System model
10.3 Preliminaries
10.4 Models of deadlocks
10.5 Knapp’s classification of distributed deadlock detection algorithms
10.6 Mitchell and Merritt’s algorithm for the single-resource model
10.7 Chandy–Misra–Haas algorithm for the AND model
10.8 Chandy–Misra–Haas algorithm for the OR model
10.9 Kshemkalyani–Singhal algorithm for the P-out-of-Q model
10.10 Chapter summary
10.11 Exercises
10.12 Notes on references
References

11 Global predicate detection
11.1 Stable and unstable predicates
11.2 Modalities on predicates
11.3 Centralized algorithm for relational predicates
11.4 Conjunctive predicates
11.5 Distributed algorithms for conjunctive predicates
11.6 Further classification of predicates
11.7 Chapter summary
11.8 Exercises
11.9 Notes on references
References

12 Distributed shared memory
12.1 Abstraction and advantages
12.2 Memory consistency models
12.3 Shared memory mutual exclusion
12.4 Wait-freedom
12.5 Register hierarchy and wait-free simulations
12.6 Wait-free atomic snapshots of shared objects
12.7 Chapter summary
12.8 Exercises
12.9 Notes on references
References

13 Checkpointing and rollback recovery
13.1 Introduction
13.2 Background and definitions
13.3 Issues in failure recovery
13.4 Checkpoint-based recovery
13.5 Log-based rollback recovery
13.6 Koo–Toueg coordinated checkpointing algorithm
13.7 Juang–Venkatesan algorithm for asynchronous checkpointing and recovery
13.8 Manivannan–Singhal quasi-synchronous checkpointing algorithm
13.9 Peterson–Kearns algorithm based on vector time
13.10 Helary–Mostefaoui–Netzer–Raynal communication-induced protocol
13.11 Chapter summary
13.12 Exercises
13.13 Notes on references
References

14 Consensus and agreement algorithms
14.1 Problem definition
14.2 Overview of results
14.3 Agreement in a failure-free system (synchronous or asynchronous)
14.4 Agreement in (message-passing) synchronous systems with failures
14.5 Agreement in asynchronous message-passing systems with failures
14.6 Wait-free shared memory consensus in asynchronous systems
14.7 Chapter summary
14.8 Exercises
14.9 Notes on references
References

15 Failure detectors
15.1 Introduction
15.2 Unreliable failure detectors
15.3 The consensus problem
15.4 Atomic broadcast
15.5 A solution to atomic broadcast
15.6 The weakest failure detectors to solve fundamental agreement problems
15.7 An implementation of a failure detector
15.8 An adaptive failure detection protocol
15.9 Exercises
15.10 Notes on references
References

16 Authentication in distributed systems
16.1 Introduction
16.2 Background and definitions
16.3 Protocols based on symmetric cryptosystems
16.4 Protocols based on asymmetric cryptosystems
16.5 Password-based authentication
16.6 Authentication protocol failures
16.7 Chapter summary
16.8 Exercises
16.9 Notes on references
References

17 Self-stabilization
17.1 Introduction
17.2 System model
17.3 Definition of self-stabilization
17.4 Issues in the design of self-stabilization algorithms
17.5 Methodologies for designing self-stabilizing systems
17.6 Communication protocols
17.7 Self-stabilizing distributed spanning trees
17.8 Self-stabilizing algorithms for spanning-tree construction
17.9 An anonymous self-stabilizing algorithm for 1-maximal independent set in trees
17.10 A probabilistic self-stabilizing leader election algorithm
17.11 The role of compilers in self-stabilization
17.12 Self-stabilization as a solution to fault tolerance
17.13 Factors preventing self-stabilization
17.14 Limitations of self-stabilization
17.15 Chapter summary
17.16 Exercises
17.17 Notes on references
References

18 Peer-to-peer computing and overlay graphs
18.1 Introduction
18.2 Data indexing and overlays
18.3 Unstructured overlays
18.4 Chord distributed hash table
18.5 Content addressable networks (CAN)
18.6 Tapestry
18.7 Some other challenges in P2P system design
18.8 Tradeoffs between table storage and route lengths
18.9 Graph structures of complex networks
18.10 Internet graphs
18.11 Generalized random graph networks
18.12 Small-world networks
18.13 Scale-free networks
18.14 Evolving networks
18.15 Chapter summary
18.16 Exercises
18.17 Notes on references
References

Index



Preface

Background
The field of distributed computing covers all aspects of computing and information access across multiple processing elements connected by any form of communication network, whether local or wide-area in coverage. Since
the advent of the Internet in the 1970s, there has been a steady growth of
new applications requiring distributed processing. This has been enabled by

advances in networking and hardware technology, the falling cost of hardware, and greater end-user awareness. These factors have contributed to making distributed computing a cost-effective, high-performance, and fault-tolerant reality. Around the turn of the millennium, there was an explosive
growth in the expansion and efficiency of the Internet, which was matched
by increased access to networked resources through the World Wide Web,
all across the world. Coupled with an equally dramatic growth in the wireless
and mobile networking areas, and the plummeting prices of bandwidth and
storage devices, we are witnessing a rapid spurt in distributed applications and
an accompanying interest in the field of distributed computing in universities,
government organizations, and private institutions.
Advances in hardware technology have suddenly made sensor networking
a reality, and embedded and sensor networks are rapidly becoming an integral
part of everyone’s life – from the home network with the interconnected
gadgets to the automobile communicating by GPS (global positioning system),
to the fully networked office with RFID monitoring. In the emerging global
village, distributed computing will be the centerpiece of all computing and
information access sub-disciplines within computer science. Clearly, this is
a very important field. Moreover, this evolving field is characterized by a
diverse range of challenges for which the solutions need to have foundations
on solid principles.
The field of distributed computing is very important, and there is a huge
demand for a good comprehensive book. This book comprehensively covers
all important topics in great depth, combining this with a clarity of explanation



and ease of understanding. The book will be particularly valuable to the
academic community and the computer industry at large. Writing such a comprehensive book has been a Herculean task, and there is a deep sense of satisfaction in knowing that we were able to complete it and perform this service
to the community.

Description, approach, and features
The book will focus on the fundamental principles and models underlying all
aspects of distributed computing. It will address the principles underlying the
theory, algorithms, and systems aspects of distributed computing. The manner
of presentation of the algorithms is very clear, explaining the main ideas and
the intuition with figures and simple explanations rather than getting entangled
in intimidating notations and lengthy and hard-to-follow rigorous proofs of
the algorithms. The selection of chapter themes is broad and comprehensive,
and the book covers all important topics in depth. The selection of algorithms
within each chapter has been done carefully to elucidate new and important
techniques of algorithm design. Although the book focuses on foundational
aspects and algorithms for distributed computing, it thoroughly addresses all
practical systems-like problems (e.g., mutual exclusion, deadlock detection,
termination detection, failure recovery, authentication, global state and time,
etc.) by presenting the theory behind and algorithms for such problems. The
book is written keeping in mind the impact of emerging topics such as
peer-to-peer computing and network security on the foundational aspects of
distributed computing.
Each chapter contains figures, examples, exercises, a summary, and
references.

Readership
This book is aimed as a textbook for the following:
• Graduate students and senior-level undergraduate students in computer science and computer engineering.
• Graduate students in electrical engineering and mathematics. As wireless

networks, peer-to-peer networks, and mobile computing continue to grow
in importance, an increasing number of students from electrical engineering
departments will also find this book necessary.
• Practitioners, systems designers/programmers, and consultants in industry
and research laboratories will find the book a very useful reference because
it contains state-of-the-art algorithms and principles to address various
design issues in distributed systems, as well as the latest references.



Hard and soft prerequisites for the use of this book include the following:
• An undergraduate course in algorithms is required.
• Undergraduate courses in operating systems and computer networks would
be useful.
• A reasonable familiarity with programming.
We have aimed for a very comprehensive book that will act as a single
source for distributed computing models and algorithms. The book has both
depth and breadth of coverage of topics, and is characterized by clear and
easy explanations. None of the existing textbooks on distributed computing
provides all of these features.

Acknowledgements
This book grew from the notes used in the graduate courses on distributed
computing at the Ohio State University, the University of Illinois at Chicago,
and at the University of Kentucky. We would like to thank the graduate
students at these schools for their contributions to the book in many ways.
The book is based on the published research results of numerous researchers

in the field. We have made all efforts to present the material in our own
words and have given credit to the original sources of information. We would
like to thank all the researchers whose work has been reported in this book.
Finally, we would like to thank the staff of Cambridge University Press for
providing us with excellent support in the publication of this book.

Access to resources
The following websites will be maintained for the book. Any errors and comments should be sent to the authors; further information about the book can be obtained from the authors’ web pages:
• www.cs.uic.edu/∼ajayk/DCS-Book
• www.cs.uky.edu/∼singhal/DCS-Book.



CHAPTER 1

Introduction

1.1 Definition
A distributed system is a collection of independent entities that cooperate to
solve a problem that cannot be individually solved. Distributed systems have
been in existence since the start of the universe. From a school of fish to a flock
of birds and entire ecosystems of microorganisms, there is communication
among mobile intelligent agents in nature. With the widespread proliferation
of the Internet and the emerging global village, the notion of distributed
computing systems as a useful and widely deployed tool is becoming a reality.
For computing systems, a distributed system has been characterized in one of several ways:
• You know you are using one when the crash of a computer you have never
heard of prevents you from doing work [23].
• A collection of computers that do not share common memory or a common physical clock, that communicate by message passing over a communication network, and where each computer has its own memory and runs its own operating system. Typically the computers are semi-autonomous and are loosely coupled while they cooperate to address a problem collectively [29].
• A collection of independent computers that appears to the users of the
system as a single coherent computer [33].
• A term that describes a wide range of computers, from weakly coupled
systems such as wide-area networks, to strongly coupled systems such as
local area networks, to very strongly coupled systems such as multiprocessor systems [19].
A distributed system can be characterized as a collection of mostly
autonomous processors communicating over a communication network and
having the following features:
• No common physical clock This is an important assumption because
it introduces the element of “distribution” in the system and gives rise to
the inherent asynchrony amongst the processors.

• No shared memory This is a key feature that requires message-passing for communication (a minimal sketch of this communication style follows this list). This feature implies the absence of the common physical clock.
It may be noted that a distributed system may still provide the abstraction
of a common address space via the distributed shared memory abstraction.
Several aspects of shared memory multiprocessor systems have also been

studied in the distributed computing literature.
• Geographical separation The more geographically separated the processors are, the more representative the system is of a distributed system.
However, it is not necessary for the processors to be on a wide-area network (WAN). Recently, the network/cluster of workstations (NOW/COW)
configuration connecting processors on a LAN is also being increasingly
regarded as a small distributed system. This NOW configuration is becoming popular because of the low-cost high-speed off-the-shelf processors
now available. The Google search engine is based on the NOW architecture.
• Autonomy and heterogeneity The processors are “loosely coupled”
in that they have different speeds and each can be running a different
operating system. They are usually not part of a dedicated system, but
cooperate with one another by offering services or solving a problem
jointly.
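
To make the message-passing feature concrete, here is a minimal sketch (our illustration, not from the text) of two processes that share no memory and interact only by sending and receiving messages over a channel. It uses only the Python standard library; the names worker and task-1 are arbitrary.

# Two processes with no shared address space communicate solely by messages.
from multiprocessing import Process, Pipe

def worker(conn):
    request = conn.recv()                     # block until a message arrives
    conn.send("processed(" + request + ")")   # reply by message, not shared state
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()            # the only link between the processes
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send("task-1")                 # all communication goes over the channel
    print(parent_end.recv())                  # prints: processed(task-1)
    p.join()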

1.2 Relation to computer system components
A typical distributed system is shown in Figure 1.1. Each computer has a
memory-processing unit and the computers are connected by a communication
network. Figure 1.2 shows the relationships of the software components that
run on each of the computers and use the local operating system and network
protocol stack for functioning. The distributed software is also termed as
middleware. A distributed execution is the execution of processes across the
distributed system to collaboratively achieve a common goal. An execution
is also sometimes termed a computation or a run.
The distributed system uses a layered architecture to break down the complexity of system design. The middleware is the distributed software that drives the distributed system, while providing transparency of heterogeneity at the platform level [24].

[Figure 1.1 A distributed system connects processors by a communication network. Each node has processor(s) P and memory bank(s) M, and the nodes are connected by a communication network (WAN/LAN).]

[Figure 1.2 Interaction of the software components at each processor. The distributed application runs over the distributed software (middleware libraries), which in turn uses the local operating system and the network protocol stack (application, transport, network, and data link layers); the distributed protocols extend through the middleware and the upper layers of the stack.]

Figure 1.2 schematically shows the interaction of this
software with these system components at each processor. Here we assume
that the middleware layer does not contain the traditional application layer
functions of the network protocol stack, such as http, mail, ftp, and telnet.
Various primitives and calls to functions defined in various libraries of the
middleware layer are embedded in the user program code. There exist several
libraries to choose from to invoke primitives for the more common functions – such as reliable and ordered multicasting – of the middleware layer.
There are several standards such as Object Management Group’s (OMG)
common object request broker architecture (CORBA) [36], and the remote
procedure call (RPC) mechanism [1, 11]. The RPC mechanism conceptually
works like a local procedure call, with the difference that the procedure code
may reside on a remote machine, and the RPC software sends a message
across the network to invoke the remote procedure. It then awaits a reply,
after which the procedure call completes from the perspective of the program
that invoked it. Currently deployed commercial versions of middleware often
use CORBA, DCOM (distributed component object model), Java, and RMI
(remote method invocation) [7] technologies. The message-passing interface

(MPI) [20, 30] developed in the research community is an example of an
interface for various communication functions.
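
As a concrete illustration of the RPC idea described above, the following minimal sketch (our own, using Python's standard xmlrpc library rather than any particular commercial middleware) registers a procedure add on a server and invokes it through a proxy as if it were local; the procedure name, host, and port are illustrative choices.

# The caller invokes what looks like a local procedure; the RPC layer sends a
# message to the server, which runs the procedure and returns the reply.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(x, y):
    return x + y                              # body of the "remote" procedure

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add)                 # expose the procedure to remote callers
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = ServerProxy("http://localhost:8000")
print(proxy.add(2, 3))                        # looks local; is a network round trip -> 5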

1.3 Motivation
The motivation for using a distributed system is some or all of the following
requirements:
1. Inherently distributed computations In many applications such as
money transfer in banking, or reaching consensus among parties that are
geographically distant, the computation is inherently distributed.
2. Resource sharing Resources such as peripherals, complete data sets in databases, special libraries, as well as data (variables/files) cannot be



fully replicated at all the sites because it is often neither practical nor
cost-effective. Further, they cannot be placed at a single site because access
to that site might prove to be a bottleneck. Therefore, such resources are
typically distributed across the system. For example, distributed databases
such as DB2 partition the data sets across several servers, in addition to
replicating them at a few sites for rapid access as well as reliability.
3. Access to geographically remote data and resources In many scenarios, the data cannot be replicated at every site participating in the
distributed execution because it may be too large or too sensitive to be
replicated. For example, payroll data within a multinational corporation is
both too large and too sensitive to be replicated at every branch office/site.
It is therefore stored at a central server which can be queried by branch
offices. Similarly, special resources such as supercomputers exist only in
certain locations, and to access such supercomputers, users need to log in

remotely.
Advances in the design of resource-constrained mobile devices as well
as in the wireless technology with which these devices communicate
have given further impetus to the importance of distributed protocols and
middleware.
4. Enhanced reliability A distributed system has the inherent potential
to provide increased reliability because of the possibility of replicating
resources and executions, as well as the reality that geographically distributed resources are not likely to crash/malfunction at the same time
under normal circumstances. Reliability entails several aspects:
• availability, i.e., the resource should be accessible at all times;
• integrity, i.e., the value/state of the resource should be correct, in the
face of concurrent access from multiple processors, as per the semantics
expected by the application;
• fault-tolerance, i.e., the ability to recover from system failures, where
such failures may be defined to occur in one of many failure models,
which we will study in Chapters 5 and 14.
5. Increased performance/cost ratio By resource sharing and accessing
geographically remote data and resources, the performance/cost ratio is
increased. Although higher throughput has not necessarily been the main
objective behind using a distributed system, nevertheless, any task can be
partitioned across the various computers in the distributed system. Such a
configuration provides a better performance/cost ratio than using special
parallel machines. This is particularly true of the NOW configuration.
In addition to meeting the above requirements, a distributed system also offers
the following advantages:
6. Scalability As the processors are usually connected by a wide-area network, adding more processors does not pose a direct bottleneck for the
communication network.



7. Modularity and incremental expandability Heterogeneous processors
may be easily added into the system without affecting the performance,
as long as those processors are running the same middleware algorithms. Similarly, existing processors may be easily replaced by other
processors.

1.4 Relation to parallel multiprocessor/multicomputer systems
The characteristics of a distributed system were identified above. A typical
distributed system would look as shown in Figure 1.1. However, how does
one classify a system that meets some but not all of the characteristics? Is the
system still a distributed system, or does it become a parallel multiprocessor
system? To better answer these questions, we first examine the architecture of parallel systems, and then examine some well-known taxonomies for
multiprocessor/multicomputer systems.

1.4.1 Characteristics of parallel systems
A parallel system may be broadly classified as belonging to one of three
types:
1. A multiprocessor system is a parallel system in which the multiple processors have direct access to shared memory which forms a common address
space. The architecture is shown in Figure 1.3(a). Such processors usually
do not have a common clock.
A multiprocessor system usually corresponds to a uniform memory
access (UMA) architecture in which the access latency, i.e., waiting time, to
complete an access to any memory location from any processor is the same.
The processors are in very close physical proximity and are connected by
an interconnection network. Interprocess communication across processors
is traditionally through read and write operations on the shared memory,
although the use of message-passing primitives such as those provided by


[Figure 1.3 Two standard architectures for parallel systems. (a) Uniform memory access (UMA) multiprocessor system. (b) Non-uniform memory access (NUMA) multiprocessor. In both architectures, the processors (P) may locally cache data from memory (M); in (a) the processors access a bank of shared memories through an interconnection network, while in (b) each processor has a local memory and the processor–memory pairs communicate through an interconnection network.]
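
To contrast the shared memory communication described above with message passing, here is a minimal sketch (our illustration, not the book's) in which threads on one machine interact purely by reading and writing a location in the common address space, with a lock serializing the accesses; the variable and function names are arbitrary.

# Threads communicate through reads/writes on shared memory, not messages.
import threading

shared_counter = 0                            # lives in the common address space
lock = threading.Lock()

def increment(times):
    global shared_counter
    for _ in range(times):
        with lock:                            # serialize access to the shared location
            shared_counter += 1

threads = [threading.Thread(target=increment, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_counter)                         # prints 40000: all interaction was via shared memory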