
Probability and Computing
Randomized Algorithms and Probabilistic Analysis

Michael Mitzenmacher
Eli Upfal


Probability and Computing

Randomization and probabilistic techniques play an important role in modern computer science, with applications ranging from combinatorial optimization and machine learning to communication networks and secure protocols.

This textbook is designed to accompany a one- or two-semester course for advanced undergraduates or beginning graduate students in computer science and applied mathematics. It gives an excellent introduction to the probabilistic techniques and paradigms used in the development of probabilistic algorithms and analyses. It assumes only an elementary background in discrete mathematics and gives a rigorous yet accessible treatment of the material, with numerous examples and applications.

The first half of the book covers core material, including random sampling, expectations, Markov's inequality, Chebyshev's inequality, Chernoff bounds, balls-and-bins models, the probabilistic method, and Markov chains. In the second half, the authors delve into more advanced topics such as continuous probability, applications of limited independence, entropy, Markov chain Monte Carlo methods, coupling, martingales, and balanced allocations. With its comprehensive selection of topics, along with many examples and exercises, this book is an indispensable teaching tool.

Michael Mitzenmacher is John L. Loeb Associate Professor in Computer Science at Harvard University. He received his Ph.D. from the University of California, Berkeley, in 1996. Prior to joining Harvard in 1999, he was a research staff member at Digital Systems Research Laboratory in Palo Alto. He has received an NSF CAREER Award and an Alfred P. Sloan Research Fellowship. In 2002, he shared the IEEE Information Theory Society "Best Paper" Award for his work on error-correcting codes.

Eli Upfal is Professor and Chair of Computer Science at Brown University. He received his Ph.D. from the Hebrew University, Jerusalem, Israel. Prior to joining Brown in 1997, he was a research staff member at the IBM research division and a professor at the Weizmann Institute of Science in Israel. His main research interests are randomized computation and probabilistic analysis of algorithms, with applications to optimization algorithms, communication networks, parallel and distributed computing, and computational biology.


Probability and Computing
Randomized Algorithms and
Probabilistic Analysis

Michael Mitzenmacher
Harvard University

Eli Upfal
Brown University

CAMBRIDGE
UNIVERSITY PRESS

PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Pitt Building, Trumpington Street, Cambridge, United Kingdom

CAMBRIDGE UNIVERSITY PRESS
The Edinburgh Building, Cambridge CB2 2RU, UK
40 West 20th Street, New York, NY 10011-4211, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
Ruiz de Alarcón 13, 28014 Madrid, Spain
Dock House, The Waterfront, Cape Town 8001, South Africa
www.cambridge.org

© Michael Mitzenmacher and Eli Upfal 2005

This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2005
Printed in the United States of America

Typeface Times 10.5/13 pt.    System AMS-TEX [FH]

A catalog record for this book is available from the British Library.

Library of Congress Cataloging in Publication data
Mitzenmacher, Michael, 1969-
Probability and computing : randomized algorithms and probabilistic analysis / Michael Mitzenmacher, Eli Upfal.
p. cm.
Includes index.
ISBN 0-521-83540-2 (alk. paper)
1. Algorithms. 2. Probabilities. 3. Stochastic analysis. I. Upfal, Eli, 1954-. II. Title.
QA274.M574    2005
518'.1 - dc22    2004054540

ISBN 0 521 83540 2 hardback


Contents

Preface

1 Events and Probability
  1.1 Application: Verifying Polynomial Identities
  1.2 Axioms of Probability
  1.3 Application: Verifying Matrix Multiplication
  1.4 Application: A Randomized Min-Cut Algorithm
  1.5 Exercises

2 Discrete Random Variables and Expectation
  2.1 Random Variables and Expectation
    2.1.1 Linearity of Expectations
    2.1.2 Jensen's Inequality
  2.2 The Bernoulli and Binomial Random Variables
  2.3 Conditional Expectation
  2.4 The Geometric Distribution
    2.4.1 Example: Coupon Collector's Problem
  2.5 Application: The Expected Run-Time of Quicksort
  2.6 Exercises

3 Moments and Deviations
  3.1 Markov's Inequality
  3.2 Variance and Moments of a Random Variable
    3.2.1 Example: Variance of a Binomial Random Variable
  3.3 Chebyshev's Inequality
    3.3.1 Example: Coupon Collector's Problem
  3.4 Application: A Randomized Algorithm for Computing the Median
    3.4.1 The Algorithm
    3.4.2 Analysis of the Algorithm
  3.5 Exercises

4 Chernoff Bounds
  4.1 Moment Generating Functions
  4.2 Deriving and Applying Chernoff Bounds
    4.2.1 Chernoff Bounds for the Sum of Poisson Trials
    4.2.2 Example: Coin Flips
    4.2.3 Application: Estimating a Parameter
  4.3 Better Bounds for Some Special Cases
  4.4 Application: Set Balancing
  4.5* Application: Packet Routing in Sparse Networks
    4.5.1 Permutation Routing on the Hypercube
    4.5.2 Permutation Routing on the Butterfly
  4.6 Exercises

5 Balls, Bins, and Random Graphs
  5.1 Example: The Birthday Paradox
  5.2 Balls into Bins
    5.2.1 The Balls-and-Bins Model
    5.2.2 Application: Bucket Sort
  5.3 The Poisson Distribution
    5.3.1 Limit of the Binomial Distribution
  5.4 The Poisson Approximation
    5.4.1* Example: Coupon Collector's Problem, Revisited
  5.5 Application: Hashing
    5.5.1 Chain Hashing
    5.5.2 Hashing: Bit Strings
    5.5.3 Bloom Filters
    5.5.4 Breaking Symmetry
  5.6 Random Graphs
    5.6.1 Random Graph Models
    5.6.2 Application: Hamiltonian Cycles in Random Graphs
  5.7 Exercises
  5.8 An Exploratory Assignment

6 The Probabilistic Method
  6.1 The Basic Counting Argument
  6.2 The Expectation Argument
    6.2.1 Application: Finding a Large Cut
    6.2.2 Application: Maximum Satisfiability
  6.3 Derandomization Using Conditional Expectations
  6.4 Sample and Modify
    6.4.1 Application: Independent Sets
    6.4.2 Application: Graphs with Large Girth
  6.5 The Second Moment Method
    6.5.1 Application: Threshold Behavior in Random Graphs
  6.6 The Conditional Expectation Inequality
  6.7 The Lovasz Local Lemma
    6.7.1 Application: Edge-Disjoint Paths
    6.7.2 Application: Satisfiability
  6.8* Explicit Constructions Using the Local Lemma
    6.8.1 Application: A Satisfiability Algorithm
  6.9 Lovasz Local Lemma: The General Case
  6.10 Exercises

7 Markov Chains and Random Walks
  7.1 Markov Chains: Definitions and Representations
    7.1.1 Application: A Randomized Algorithm for 2-Satisfiability
    7.1.2 Application: A Randomized Algorithm for 3-Satisfiability
  7.2 Classification of States
    7.2.1 Example: The Gambler's Ruin
  7.3 Stationary Distributions
    7.3.1 Example: A Simple Queue
  7.4 Random Walks on Undirected Graphs
    7.4.1 Application: An s-t Connectivity Algorithm
  7.5 Parrondo's Paradox
  7.6 Exercises

8 Continuous Distributions and the Poisson Process
  8.1 Continuous Random Variables
    8.1.1 Probability Distributions in R
    8.1.2 Joint Distributions and Conditional Probability
  8.2 The Uniform Distribution
    8.2.1 Additional Properties of the Uniform Distribution
  8.3 The Exponential Distribution
    8.3.1 Additional Properties of the Exponential Distribution
    8.3.2* Example: Balls and Bins with Feedback
  8.4 The Poisson Process
    8.4.1 Interarrival Distribution
    8.4.2 Combining and Splitting Poisson Processes
    8.4.3 Conditional Arrival Time Distribution
  8.5 Continuous Time Markov Processes
  8.6 Example: Markovian Queues
    8.6.1 M/M/1 Queue in Equilibrium
    8.6.2 M/M/1/K Queue in Equilibrium
    8.6.3 The Number of Customers in an M/M/∞ Queue
  8.7 Exercises

9 Entropy, Randomness, and Information
  9.1 The Entropy Function
  9.2 Entropy and Binomial Coefficients
  9.3 Entropy: A Measure of Randomness
  9.4 Compression
  9.5* Coding: Shannon's Theorem
  9.6 Exercises

10 The Monte Carlo Method
  10.1 The Monte Carlo Method
  10.2 Application: The DNF Counting Problem
    10.2.1 The Naïve Approach
    10.2.2 A Fully Polynomial Randomized Scheme for DNF Counting
  10.3 From Approximate Sampling to Approximate Counting
  10.4 The Markov Chain Monte Carlo Method
    10.4.1 The Metropolis Algorithm
  10.5 Exercises
  10.6 An Exploratory Assignment on Minimum Spanning Trees

11* Coupling of Markov Chains
  11.1 Variation Distance and Mixing Time
  11.2 Coupling
    11.2.1 Example: Shuffling Cards
    11.2.2 Example: Random Walks on the Hypercube
    11.2.3 Example: Independent Sets of Fixed Size
  11.3 Application: Variation Distance Is Nonincreasing
  11.4 Geometric Convergence
  11.5 Application: Approximately Sampling Proper Colorings
  11.6 Path Coupling
  11.7 Exercises

12 Martingales
  12.1 Martingales
  12.2 Stopping Times
    12.2.1 Example: A Ballot Theorem
  12.3 Wald's Equation
  12.4 Tail Inequalities for Martingales
  12.5 Applications of the Azuma-Hoeffding Inequality
    12.5.1 General Formalization
    12.5.2 Application: Pattern Matching
    12.5.3 Application: Balls and Bins
    12.5.4 Application: Chromatic Number
  12.6 Exercises

13 Pairwise Independence and Universal Hash Functions
  13.1 Pairwise Independence
    13.1.1 Example: A Construction of Pairwise Independent Bits
    13.1.2 Application: Derandomizing an Algorithm for Large Cuts
    13.1.3 Example: Constructing Pairwise Independent Values Modulo a Prime
  13.2 Chebyshev's Inequality for Pairwise Independent Variables
    13.2.1 Application: Sampling Using Fewer Random Bits
  13.3 Families of Universal Hash Functions
    13.3.1 Example: A 2-Universal Family of Hash Functions
    13.3.2 Example: A Strongly 2-Universal Family of Hash Functions
    13.3.3 Application: Perfect Hashing
  13.4 Application: Finding Heavy Hitters in Data Streams
  13.5 Exercises

14* Balanced Allocations
  14.1 The Power of Two Choices
    14.1.1 The Upper Bound
  14.2 Two Choices: The Lower Bound
  14.3 Applications of the Power of Two Choices
    14.3.1 Hashing
    14.3.2 Dynamic Resource Allocation
  14.4 Exercises

Further Reading

Index

Note: Asterisks indicate advanced material.


Preface

Why Randomness?

Why should computer scientists study and use randomness? Computers appear to behave far too unpredictably as it is! Adding randomness would seemingly be a disadvantage, adding further complications to the already challenging task of efficiently utilizing computers.

Science has learned in the last century to accept randomness as an essential component in modeling and analyzing nature. In physics, for example, Newton's laws led people to believe that the universe was a deterministic place; given a big enough calculator and the appropriate initial conditions, one could determine the location of planets years from now. The development of quantum theory suggests a rather different view; the universe still behaves according to laws, but the backbone of these laws is probabilistic. "God does not play dice with the universe" was Einstein's anecdotal objection to modern quantum mechanics. Nevertheless, the prevailing theory today for subparticle physics is based on random behavior and statistical laws, and randomness plays a significant role in almost every other field of science ranging from genetics and evolution in biology to modeling price fluctuations in a free-market economy.

Computer science is no exception. From the highly theoretical notion of probabilistic theorem proving to the very practical design of PC Ethernet cards, randomness and probabilistic methods play a key role in modern computer science. The last two decades have witnessed a tremendous growth in the use of probability theory in computing. Increasingly more advanced and sophisticated probabilistic techniques have been developed for use within broader and more challenging computer science applications. In this book, we study the fundamental ways in which randomness comes to bear on computer science: randomized algorithms and the probabilistic analysis of algorithms.

Randomized algorithms: Randomized algorithms are algorithms that make random choices during their execution. In practice, a randomized program would use values generated by a random number generator to decide the next step at several branches of its execution. For example, the protocol implemented in an Ethernet card uses random numbers to decide when it next tries to access the shared Ethernet communication medium. The randomness is useful for breaking symmetry, preventing different cards from repeatedly accessing the medium at the same time. Other commonly used applications of randomized algorithms include Monte Carlo simulations and primality testing in cryptography. In these and many other important applications, randomized algorithms are significantly more efficient than the best known deterministic solutions. Furthermore, in most cases the randomized algorithms are also simpler and easier to program.

These gains come at a price; the answer may have some probability of being incorrect, or the efficiency is guaranteed only with some probability. Although it may seem unusual to design an algorithm that may be incorrect, if the probability of error is sufficiently small then the improvement in speed or memory requirements may well be worthwhile.

Probabilistic analysis of algorithms: Complexity theory tries to classify computation problems according to their computational complexity, in particular distinguishing between easy and hard problems. For example, complexity theory shows that the Traveling Salesmen problem is NP-hard. It is therefore very unlikely that there is an algorithm that can solve any instance of the Traveling Salesmen problem in time that is subexponential in the number of cities. An embarrassing phenomenon for the classical worst-case complexity theory is that the problems it classifies as hard to compute are often easy to solve in practice. Probabilistic analysis gives a theoretical explanation for this phenomenon. Although these problems may be hard to solve on some set of pathological inputs, on most inputs (in particular, those that occur in real-life applications) the problem is actually easy to solve. More precisely, if we think of the input as being randomly selected according to some probability distribution on the collection of all possible inputs, we are very likely to obtain a problem instance that is easy to solve, and instances that are hard to solve appear with relatively small probability. Probabilistic analysis of algorithms is the method of studying how algorithms perform when the input is taken from a well-defined probabilistic space. As we will see, even NP-hard problems might have algorithms that are extremely efficient on almost all inputs.

The Book

This textbook is designed to accompany one- or two-semester courses for advanced undergraduate or beginning graduate students in computer science and applied mathematics. The study of randomized and probabilistic techniques in most leading universities has moved from being the subject of an advanced graduate seminar meant for theoreticians to being a regular course geared generally to advanced undergraduate and beginning graduate students. There are a number of excellent advanced, research-oriented books on this subject, but there is a clear need for an introductory textbook. We hope that our book satisfies this need.

The textbook has developed from courses on probabilistic methods in computer science taught at Brown (CS 155) and Harvard (CS 223) in recent years. The emphasis in these courses and in this textbook is on the probabilistic techniques and paradigms, not on particular applications. Each chapter of the book is devoted to one such method or technique. Techniques are clarified through examples based on analyzing randomized algorithms or developing probabilistic analysis of algorithms on random inputs. Many of these examples are derived from problems in networking, reflecting a prominent trend in the networking field (and the taste of the authors).

The book contains fourteen chapters. We may view the book as being divided into two parts, where the first part (Chapters 1-7) comprises what we believe is core material. The book assumes only a basic familiarity with probability theory, equivalent to what is covered in a standard course on discrete mathematics for computer scientists. Chapters 1-3 review this elementary probability theory while introducing some interesting applications. Topics covered include random sampling, expectation, Markov's inequality, variance, and Chebyshev's inequality. If the class has sufficient background in probability, then these chapters can be taught quickly. We do not suggest skipping them, however, because they introduce the concepts of randomized algorithms and probabilistic analysis of algorithms and also contain several examples that are used throughout the text.

Chapters 4-7 cover more advanced topics, including Chernoff bounds, balls-and-bins models, the probabilistic method, and Markov chains. The material in these chapters is more challenging than in the initial chapters. Sections that are particularly challenging (and hence that the instructor may want to consider skipping) are marked with an asterisk. The core material in the first seven chapters may constitute the bulk of a quarter- or semester-long course, depending on the pace.

The second part of the book (Chapters 8-14) covers additional advanced material that can be used either to fill out the basic course as necessary or for a more advanced second course. These chapters are largely self-contained, so the instructor can choose the topics best suited to the class. The chapters on continuous probability and entropy are perhaps the most appropriate for incorporating into the basic course. Our introduction to continuous probability (Chapter 8) focuses on uniform and exponential distributions, including examples from queueing theory. Our examination of entropy (Chapter 9) shows how randomness can be measured and how entropy arises naturally in the context of randomness extraction, compression, and coding.

Chapters 10 and 11 cover the Monte Carlo method and coupling, respectively; these chapters are closely related and are best taught together. Chapter 12, on martingales, covers important issues on dealing with dependent random variables, a theme that continues in a different vein in Chapter 13's development of pairwise independence and derandomization. Finally, the chapter on balanced allocations (Chapter 14) covers a topic close to the authors' hearts and ties in nicely with Chapter 5's analysis of balls-and-bins problems.

The order of the subjects, especially in the first part of the book, corresponds to their relative importance in the algorithmic literature. Thus, for example, the study of Chernoff bounds precedes more fundamental probability concepts such as Markov chains. However, instructors may choose to teach the chapters in a different order. A course with more emphasis on general stochastic processes, for example, may teach Markov chains (Chapter 7) immediately after Chapters 1-3, following with the chapter on balls, bins, and random graphs (Chapter 5, omitting the Hamiltonian cycle example). Chapter 6 on the probabilistic method could then be skipped, following instead with continuous probability and the Poisson process (Chapter 8). The material from Chapter 4 on Chernoff bounds, however, is needed for most of the remaining material.

Most of the exercises in the book are theoretical, but we have included some programming exercises - including two more extensive exploratory assignments that require some programming. We have found that occasional programming exercises are often helpful in reinforcing the book's ideas and in adding some variety to the course.

We have decided to restrict the material in this book to methods and techniques based on rigorous mathematical analysis; with few exceptions, all claims in this book are followed by full proofs. Obviously, many extremely useful probabilistic methods do not fall within this strict category. For example, in the important area of Monte Carlo methods, most practical solutions are heuristics that have been demonstrated to be effective and efficient by experimental evaluation rather than by rigorous mathematical analysis. We have taken the view that, in order to best apply and understand the strengths and weaknesses of heuristic methods, a firm grasp of underlying probability theory and rigorous techniques - as we present in this book - is necessary. We hope that students will appreciate this point of view by the end of the course.

Acknowledgments

Our first thanks go to the many probabilists and computer scientists who developed the beautiful material covered in this book. We chose not to overload the textbook with numerous references to the original papers. Instead, we provide a reference list that includes a number of excellent books giving background material as well as more advanced discussion of the topics covered here.

The book owes a great deal to the comments and feedback of students and teaching assistants who took the courses CS 155 at Brown and CS 223 at Harvard. In particular we wish to thank Aris Anagnostopoulos, Eden Hochbaum, Rob Hunter, and Adam Kirsch, all of whom read and commented on early drafts of the book.

Special thanks to Dick Karp, who used a draft of the book in teaching CS 174 at Berkeley during fall 2003. His early comments and corrections were most valuable in improving the manuscript. Peter Bartlett taught CS 174 at Berkeley in spring 2004, also providing many corrections and useful comments.

We thank our colleagues who carefully read parts of the manuscript, pointed out many errors, and suggested important improvements in content and presentation: Artur Czumaj, Alan Frieze, Claire Kenyon, Joe Marks, Salil Vadhan, Eric Vigoda, and the anonymous reviewers who read the manuscript for the publisher.

We also thank Rajeev Motwani and Prabhakar Raghavan for allowing us to use some of the exercises in their excellent book Randomized Algorithms.

We are grateful to Lauren Cowles of Cambridge University Press for her editorial help and advice in preparing and organizing the manuscript.

Writing of this book was supported in part by NSF ITR Grant no. CCR-0121154.


CHAPTER ONE

Events and Probability


This chapter introduces the notion of randomized algorithms and reviews some basic concepts of probability theory in the context of analyzing the performance of simple randomized algorithms for verifying algebraic identities and finding a minimum cut-set in a graph.

1.1. Application: Verifying Polynomial Identities

Computers can sometimes make mistakes, due for example to incorrect programming or hardware failure. It would be useful to have simple ways to double-check the results of computations. For some problems, we can use randomness to efficiently verify the correctness of an output.

Suppose we have a program that multiplies together monomials. Consider the problem of verifying the following identity, which might be output by our program:

$$(x + 1)(x - 2)(x + 3)(x - 4)(x + 5)(x - 6) \equiv x^6 - 7x^3 + 25.$$

There is an easy way to verify whether the identity is correct: multiply together the terms on the left-hand side and see if the resulting polynomial matches the right-hand side. In this example, when we multiply all the constant terms on the left, the result does not match the constant term on the right, so the identity cannot be valid. More generally, given two polynomials F(x) and G(x), we can verify the identity

$$F(x) \equiv G(x)$$

by converting the two polynomials to their canonical forms ($\sum_{i=0}^{d} c_i x^i$); two polynomials are equivalent if and only if all the coefficients in their canonical forms are equal. From this point on let us assume that, as in our example, F(x) is given as a product $F(x) = \prod_{i=1}^{d}(x - a_i)$ and G(x) is given in its canonical form. Transforming F(x) to its canonical form by consecutively multiplying the ith monomial with the product of the first i - 1 monomials requires $\Theta(d^2)$ multiplications of coefficients. We assume in what follows that each multiplication can be performed in constant time, although if the products of the coefficients grow large then it could conceivably require more than constant time to add and multiply numbers together.

So far, we have not said anything particularly interesting. To check whether the computer program has multiplied monomials together correctly, we have suggested multiplying the monomials together again to check the result. Our approach for checking the program is to write another program that does essentially the same thing we expect the first program to do. This is certainly one way to double-check a program: write a second program that does the same thing, and make sure they agree. There are at least two problems with this approach, both stemming from the idea that there should be a difference between checking a given answer and recomputing it. First, if there is a bug in the program that multiplies monomials, the same bug may occur in the checking program. (Suppose that the checking program was written by the same person who wrote the original program!) Second, it stands to reason that we would like to check the answer in less time than it takes to try to solve the original problem all over again.

Let us instead utilize randomness to obtain a faster method to verify the identity. We informally explain the algorithm and then set up the formal mathematical framework for analyzing the algorithm.

Assume that the maximum degree, or the largest exponent of x, in F(x) and G(x) is d. The algorithm chooses an integer r uniformly at random in the range {1, ..., 100d}, where by "uniformly at random" we mean that all integers are equally likely to be chosen. The algorithm then computes the values F(r) and G(r). If F(r) ≠ G(r) the algorithm decides that the two polynomials are not equivalent, and if F(r) = G(r) the algorithm decides that the two polynomials are equivalent.

Suppose that in one computation step the algorithm can generate an integer chosen uniformly at random in the range {1, ..., 100d}. Computing the values of F(r) and G(r) can be done in O(d) time, which is faster than computing the canonical form of F(x). The randomized algorithm, however, may give a wrong answer.

How can the algorithm give the wrong answer?

If F(x) ≡ G(x), then the algorithm gives the correct answer, since it will find that F(r) = G(r) for any value of r.

If F(x) ≢ G(x) and F(r) ≠ G(r), then the algorithm gives the correct answer since it has found a case where F(x) and G(x) disagree. Thus, when the algorithm decides that the two polynomials are not the same, the answer is always correct.

If F(x) ≢ G(x) and F(r) = G(r), the algorithm gives the wrong answer. In other words, it is possible that the algorithm decides that the two polynomials are the same when they are not. For this error to occur, r must be a root of the equation F(x) - G(x) = 0. The degree of the polynomial F(x) - G(x) is no larger than d and, by the fundamental theorem of algebra, a polynomial of degree up to d has no more than d roots. Thus, if F(x) ≢ G(x), then there are no more than d values in the range {1, ..., 100d} for which F(r) = G(r). Since there are 100d values in the range {1, ..., 100d}, the chance that the algorithm chooses such a value and returns a wrong answer is no more than 1/100.
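For concreteness, the sketch below shows one way this test could be coded; the function names, the list-based polynomial representations, and the example values are our own illustrative choices, not part of the book's text. The example uses the identity discussed above, with F given by its roots and G by its canonical coefficients.

```python
import random

def eval_product_form(roots, x):
    """Evaluate F(x) = prod_i (x - a_i) in O(d) time."""
    result = 1
    for a in roots:
        result *= (x - a)
    return result

def eval_canonical_form(coeffs, x):
    """Evaluate G(x) = sum_i c_i x^i via Horner's rule in O(d) time."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def identity_test(roots, coeffs):
    """One trial of the randomized test.

    When F and G are not equivalent, the trial wrongly reports equivalence
    with probability at most d/(100d) = 1/100.
    """
    d = len(roots)
    r = random.randint(1, 100 * d)           # r uniform over {1, ..., 100d}
    return eval_product_form(roots, r) == eval_canonical_form(coeffs, r)

# The example from the text: (x+1)(x-2)(x+3)(x-4)(x+5)(x-6) vs. x^6 - 7x^3 + 25.
roots = [-1, 2, -3, 4, -5, 6]
coeffs = [25, 0, 0, -7, 0, 0, 1]             # c_0, c_1, ..., c_6
print(identity_test(roots, coeffs))          # almost always False: the identity is invalid
```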




1.2. Axioms of Probability

We turn now to a formal mathematical setting for analyzing the randomized algorithm. Any probabilistic statement must refer to the underlying probability space.

Definition 1.1: A probability space has three components:
1. a sample space Ω, which is the set of all possible outcomes of the random process modeled by the probability space;
2. a family of sets F representing the allowable events, where each set in F is a subset of the sample space Ω; and
3. a probability function $\Pr : F \to \mathbb{R}$ satisfying Definition 1.2.

An element of Ω is called a simple or elementary event.

In the randomized algorithm for verifying polynomial identities, the sample space is the set of integers {1, ..., 100d}. Each choice of an integer r in this range is a simple event.

Definition 1.2: A probability function is any function $\Pr : F \to \mathbb{R}$ that satisfies the following conditions:
1. for any event E, $0 \le \Pr(E) \le 1$;
2. $\Pr(\Omega) = 1$; and
3. for any finite or countably infinite sequence of pairwise mutually disjoint events $E_1, E_2, E_3, \ldots$,

$$\Pr\left(\bigcup_{i \ge 1} E_i\right) = \sum_{i \ge 1} \Pr(E_i).$$

In most of this book we will use discrete probability spaces. In a discrete probability space the sample space Ω is finite or countably infinite, and the family F of allowable events consists of all subsets of Ω. In a discrete probability space, the probability function is uniquely defined by the probabilities of the simple events.

Again, in the randomized algorithm for verifying polynomial identities, each choice of an integer r is a simple event. Since the algorithm chooses the integer uniformly at random, all simple events have equal probability. The sample space has 100d simple events, and the sum of the probabilities of all simple events must be 1. Therefore each simple event has probability 1/100d.
Because events are sets, we use standard set theory notation to express combinations of events. We write $E_1 \cap E_2$ for the occurrence of both $E_1$ and $E_2$ and write $E_1 \cup E_2$ for the occurrence of either $E_1$ or $E_2$ (or both). For example, suppose we roll two dice. If $E_1$ is the event that the first die is a 1 and $E_2$ is the event that the second die is a 1, then $E_1 \cap E_2$ denotes the event that both dice are 1 while $E_1 \cup E_2$ denotes the event that at least one of the two dice lands on 1. Similarly, we write $E_1 - E_2$ for the occurrence of an event that is in $E_1$ but not in $E_2$. With the same dice example, $E_1 - E_2$ consists of the event where the first die is a 1 and the second die is not. We use the notation $\bar{E}$ as shorthand for $\Omega - E$; for example, if E is the event that we obtain an even number when rolling a die, then $\bar{E}$ is the event that we obtain an odd number.

Definition 1.2 yields the following obvious lemma.

Lemma 1.1: For any two events $E_1$ and $E_2$,
$$\Pr(E_1 \cup E_2) = \Pr(E_1) + \Pr(E_2) - \Pr(E_1 \cap E_2).$$

Proof: From the definition,
$$\Pr(E_1) = \Pr(E_1 - (E_1 \cap E_2)) + \Pr(E_1 \cap E_2),$$
$$\Pr(E_2) = \Pr(E_2 - (E_1 \cap E_2)) + \Pr(E_1 \cap E_2),$$
$$\Pr(E_1 \cup E_2) = \Pr(E_1 - (E_1 \cap E_2)) + \Pr(E_2 - (E_1 \cap E_2)) + \Pr(E_1 \cap E_2).$$
The lemma easily follows. □

A consequence of Definition 1.2 is known as the union bound. Although it is very simple, it is tremendously useful.

Lemma 1.2: For any finite or countably infinite sequence of events $E_1, E_2, \ldots$,
$$\Pr\left(\bigcup_{i \ge 1} E_i\right) \le \sum_{i \ge 1} \Pr(E_i).$$

Notice that Lemma 1.2 differs from the third part of Definition 1.2 in that Definition 1.2 is an equality and requires the events to be pairwise mutually disjoint.

Lemma 1.1 can be generalized to the following equality, often referred to as the inclusion-exclusion principle.

Lemma 1.3: Let $E_1, \ldots, E_n$ be any n events. Then
$$\Pr\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} \Pr(E_i) - \sum_{i<j} \Pr(E_i \cap E_j) + \sum_{i<j<k} \Pr(E_i \cap E_j \cap E_k) - \cdots + (-1)^{\ell+1} \sum_{i_1 < i_2 < \cdots < i_\ell} \Pr\left(\bigcap_{r=1}^{\ell} E_{i_r}\right) + \cdots.$$

The proof of the inclusion-exclusion principle is left as Exercise 1.7.

We showed before that the only case in which the algorithm may fail to give the correct answer is when the two input polynomials F(x) and G(x) are not equivalent; the algorithm then gives an incorrect answer if the random number it chooses is a root of the polynomial F(x) - G(x). Let E represent the event that the algorithm failed to give the correct answer. The elements of the set corresponding to E are the roots of the polynomial F(x) - G(x) that are in the set of integers {1, ..., 100d}. Since the polynomial has no more than d roots it follows that the event E includes no more than d simple events, and therefore
$$\Pr(\text{algorithm fails}) = \Pr(E) \le \frac{d}{100d} = \frac{1}{100}.$$
It may seem unusual to have an algorithm that can return the wrong answer. It may help to think of the correctness of an algorithm as a goal that we seek to optimize in conjunction with other goals. In designing an algorithm, we generally seek to minimize the number of computational steps and the memory required. Sometimes there is a trade-off: there may be a faster algorithm that uses more memory or a slower algorithm that uses less memory. The randomized algorithm we have presented gives a trade-off between correctness and speed. Allowing algorithms that may give an incorrect answer (but in a systematic way) expands the trade-off space available in designing algorithms. Rest assured, however, that not all randomized algorithms give incorrect answers, as we shall see.

For the algorithm just described, the algorithm gives the correct answer 99% of the time even when the polynomials are not equivalent. Can we improve this probability? One way is to choose the random number r from a larger range of integers. If our sample space is the set of integers {1, ..., 1000d}, then the probability of a wrong answer is at most 1/1000. At some point, however, the range of values we can use is limited by the precision available on the machine on which we run the algorithm.

Another approach is to repeat the algorithm multiple times, using different random values to test the identity. The property we use here is that the algorithm has a one-sided error. The algorithm may be wrong only when it outputs that the two polynomials are equivalent. If any run yields a number r such that F(r) ≠ G(r), then the polynomials are not equivalent. Thus, if we repeat the algorithm a number of times and find F(r) ≠ G(r) in at least one round of the algorithm, we know that F(x) and G(x) are not equivalent. The algorithm outputs that the two polynomials are equivalent only if there is equality for all runs.

In repeating the algorithm we repeatedly choose a random number in the range {1, ..., 100d}. Repeatedly choosing random numbers according to a given distribution is generally referred to as sampling. In this case, we can repeatedly choose random numbers in the range {1, ..., 100d} in two ways: we can sample either with replacement or without replacement. Sampling with replacement means that we do not remember which numbers we have already tested; each time we run the algorithm, we choose a number uniformly at random from the range {1, ..., 100d} regardless of previous choices, so there is some chance we will choose an r that we have chosen on a previous run. Sampling without replacement means that, once we have chosen a number r, we do not allow the number to be chosen on subsequent runs; the number chosen at a given iteration is uniform over all previously unselected numbers.

Let us first consider the case where sampling is done with replacement. Assume that we repeat the algorithm k times, and that the input polynomials are not equivalent. What is the probability that in all k iterations our random sampling from the set {1, ..., 100d} yields roots of the polynomial F(x) - G(x), resulting in a wrong output

by the algorithm? If k = 1, we know that this probability is at most d/100d = 1/100. If k = 2, it seems that the probability that the first iteration finds a root is 1/100 and the probability that the second iteration finds a root is 1/100, so the probability that both iterations find a root is at most $(1/100)^2$. Generalizing, for any k, the probability of choosing roots for k iterations would be at most $(1/100)^k$. To formalize this, we introduce the notion of independence.

Definition 1.3: Two events E and F are independent if and only if
$$\Pr(E \cap F) = \Pr(E) \cdot \Pr(F).$$
More generally, events $E_1, E_2, \ldots, E_k$ are mutually independent if and only if, for any subset $I \subseteq [1, k]$,
$$\Pr\left(\bigcap_{i \in I} E_i\right) = \prod_{i \in I} \Pr(E_i).$$

If our algorithm samples with replacement then in each iteration the algorithm chooses a random number uniformly at random from the set {1, ..., 100d}, and thus the choice in one iteration is independent of the choices in previous iterations. For the case where the polynomials are not equivalent, let $E_i$ be the event that, on the ith run of the algorithm, we choose a root $r_i$ such that $F(r_i) - G(r_i) = 0$. The probability that the algorithm returns the wrong answer is given by
$$\Pr(E_1 \cap E_2 \cap \cdots \cap E_k).$$
Since $\Pr(E_i)$ is at most d/100d and since the events $E_1, E_2, \ldots, E_k$ are independent, the probability that the algorithm gives the wrong answer after k iterations is
$$\Pr(E_1 \cap E_2 \cap \cdots \cap E_k) = \prod_{i=1}^{k} \Pr(E_i) \le \prod_{i=1}^{k} \frac{d}{100d} = \left(\frac{1}{100}\right)^k.$$
The probability of making an error is therefore at most exponentially small in the number of trials.
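A minimal sketch of this amplification step, building on the identity_test function from the earlier sketch (again, the names are ours and only illustrate the idea):

```python
def repeated_test(single_trial, k=10):
    """Amplify a one-sided test by running it k times with independent random choices.

    single_trial() returns False only when it has found a witness of inequivalence,
    so a False result is always correct. If a single trial wrongly reports True with
    probability at most p, the repeated test does so with probability at most p**k.
    """
    return all(single_trial() for _ in range(k))

# e.g. repeated_test(lambda: identity_test(roots, coeffs), k=10) errs with
# probability at most (1/100)**10 when the polynomials are not equivalent.
```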
Now let us consider the case where sampling is done without replacement. In this case the probability of choosing a given number is conditioned on the events of the previous iterations.

Definition 1.4: The conditional probability that event E occurs given that event F occurs is
$$\Pr(E \mid F) = \frac{\Pr(E \cap F)}{\Pr(F)}.$$
The conditional probability is well-defined only if $\Pr(F) > 0$.

Intuitively, we are looking for the probability of $E \cap F$ within the set of events defined by F. Because F defines our restricted sample space, we normalize the probabilities by dividing by $\Pr(F)$, so that the sum of the probabilities of all events is 1. When $\Pr(F) > 0$, the definition can also be written in the useful form
$$\Pr(E \mid F)\Pr(F) = \Pr(E \cap F).$$

Notice that, when E and F are independent and $\Pr(F) \ne 0$, we have
$$\Pr(E \mid F) = \frac{\Pr(E \cap F)}{\Pr(F)} = \frac{\Pr(E)\Pr(F)}{\Pr(F)} = \Pr(E).$$
This is a property that conditional probability should have; intuitively, if two events are independent, then information about one event should not affect the probability of the second event.

Again assume that we repeat the algorithm k times and that the input polynomials are not equivalent. What is the probability that in all the k iterations our random sampling from the set {1, ..., 100d} yields roots of the polynomial F(x) - G(x), resulting in a wrong output by the algorithm?

As in the analysis with replacement, we let $E_i$ be the event that the random number $r_i$ chosen in the ith iteration of the algorithm is a root of F(x) - G(x); again, the probability that the algorithm returns the wrong answer is given by
$$\Pr(E_1 \cap E_2 \cap \cdots \cap E_k).$$
Applying the definition of conditional probability, we obtain
$$\Pr(E_1 \cap E_2 \cap \cdots \cap E_k) = \Pr(E_k \mid E_1 \cap E_2 \cap \cdots \cap E_{k-1}) \cdot \Pr(E_1 \cap E_2 \cap \cdots \cap E_{k-1}),$$
and repeating this argument gives
$$\Pr(E_1 \cap E_2 \cap \cdots \cap E_k) = \Pr(E_1) \cdot \Pr(E_2 \mid E_1) \cdot \Pr(E_3 \mid E_1 \cap E_2) \cdots \Pr(E_k \mid E_1 \cap E_2 \cap \cdots \cap E_{k-1}).$$
Can we bound $\Pr(E_j \mid E_1 \cap E_2 \cap \cdots \cap E_{j-1})$? Recall that there are at most d values for which F(r) - G(r) = 0; if trials 1 through j - 1 < d have found j - 1 of them, then when sampling without replacement there are only d - (j - 1) values out of the 100d - (j - 1) remaining choices for which F(r) - G(r) = 0. Hence
$$\Pr(E_j \mid E_1 \cap E_2 \cap \cdots \cap E_{j-1}) \le \frac{d - (j-1)}{100d - (j-1)},$$
and the probability that the algorithm gives the wrong answer after $k \le d$ iterations is bounded by
$$\Pr(E_1 \cap E_2 \cap \cdots \cap E_k) \le \prod_{j=1}^{k} \frac{d - (j-1)}{100d - (j-1)} \le \left(\frac{1}{100}\right)^k.$$
Because (d - (j - 1))/(100d - (j - 1)) < d/100d when j > 1, our bounds on the probability of making an error are actually slightly better without replacement. You may also notice that, if we take d + 1 samples without replacement and the two polynomials are not equivalent, then we are guaranteed to find an r such that F(r) - G(r) ≠ 0. Thus, in d + 1 iterations we are guaranteed to output the correct answer. However, computing the value of the polynomial at d + 1 points takes $\Theta(d^2)$ time using the standard approach, which is no faster than finding the canonical form deterministically.

Since sampling without replacement appears to give better bounds on the probability of error, why would we ever want to consider sampling with replacement? In some cases, sampling with replacement is significantly easier to analyze, so it may be worth considering for theoretical reasons. In practice, sampling with replacement is often simpler to code and the effect on the probability of making an error is almost negligible, making it a desirable alternative.
1.3. Application: Verifying Matrix Multiplication

We now consider another example where randomness can be used to verify an equality more quickly than the known deterministic algorithms. Suppose we are given three n × n matrices A, B, and C. For convenience, assume we are working over the integers modulo 2. We want to verify whether
$$AB = C.$$
One way to accomplish this is to multiply A and B and compare the result to C. The simple matrix multiplication algorithm takes $\Theta(n^3)$ operations. There exist more sophisticated algorithms that are known to take roughly $\Theta(n^{2.37})$ operations.

Once again, we use a randomized algorithm that allows for faster verification - at the expense of possibly returning a wrong answer with small probability. The algorithm is similar in spirit to our randomized algorithm for checking polynomial identities. The algorithm chooses a random vector $r = (r_1, r_2, \ldots, r_n) \in \{0,1\}^n$. It then computes ABr by first computing Br and then A(Br), and it also computes Cr. If A(Br) ≠ Cr, then AB ≠ C. Otherwise, it returns that AB = C.

The algorithm requires three matrix-vector multiplications, which can be done in time $\Theta(n^2)$ in the obvious way. The probability that the algorithm returns that AB = C when they are actually not equal is bounded by the following theorem.

Theorem 1.4: If AB ≠ C and if r is chosen uniformly at random from $\{0,1\}^n$, then
$$\Pr(ABr = Cr) \le \frac{1}{2}.$$

Proof: Before beginning, we point out that the sample space for the vector r is the set $\{0,1\}^n$ and that the event under consideration is ABr = Cr. We also make note of the following simple but useful lemma.

Lemma 1.5: Choosing $r = (r_1, r_2, \ldots, r_n) \in \{0,1\}^n$ uniformly at random is equivalent to choosing each $r_i$ independently and uniformly from $\{0,1\}$.

Proof: If each $r_i$ is chosen independently and uniformly at random, then each of the $2^n$ possible vectors r is chosen with probability $2^{-n}$, giving the lemma. □

Let $D = AB - C \ne 0$. Then ABr = Cr implies that Dr = 0. Since D ≠ 0 it must have some nonzero entry; without loss of generality, let that entry be $d_{11}$.


For Dr = 0, it must be the case that
$$\sum_{j=1}^{n} d_{1j} r_j = 0$$
or, equivalently,
$$r_1 = -\frac{\sum_{j=2}^{n} d_{1j} r_j}{d_{11}}. \tag{1.1}$$
Now we introduce a helpful idea. Instead of reasoning about the vector r, suppose that we choose the $r_k$ independently and uniformly at random from {0, 1} in order, from $r_n$ down to $r_1$. Lemma 1.5 says that choosing the $r_k$ in this way is equivalent to choosing a vector r uniformly at random. Now consider the situation just before $r_1$ is chosen. At this point, the right-hand side of Eqn. (1.1) is determined, and there is at most one choice for $r_1$ that will make that equality hold. Since there are two choices for $r_1$, the equality holds with probability at most 1/2, and hence the probability that ABr = Cr is at most 1/2. By considering all variables besides $r_1$ as having been set, we have reduced the sample space to the set of two values {0, 1} for $r_1$ and have changed the event being considered to whether Eqn. (1.1) holds.

This idea is called the principle of deferred decisions. When there are several random variables, such as the $r_i$ of the vector r, it often helps to think of some of them as being set at one point in the algorithm with the rest of them being left random - or deferred - until some further point in the analysis. Formally, this corresponds to conditioning on the revealed values; when some of the random variables are revealed, we must condition on the revealed values for the rest of the analysis. We will see further examples of the principle of deferred decisions later in the book.

To formalize this argument, we first introduce a simple fact, known as the law of total probability.

Theorem 1.6 [Law of Total Probability]: Let $E_1, E_2, \ldots, E_n$ be mutually disjoint events in the sample space Ω, and let $\bigcup_{i=1}^{n} E_i = \Omega$. Then
$$\Pr(B) = \sum_{i=1}^{n} \Pr(B \cap E_i) = \sum_{i=1}^{n} \Pr(B \mid E_i)\Pr(E_i).$$

Proof: Since the events $B \cap E_i$ (i = 1, ..., n) are disjoint and cover the entire sample space Ω, it follows that
$$\Pr(B) = \sum_{i=1}^{n} \Pr(B \cap E_i).$$
Further,
$$\sum_{i=1}^{n} \Pr(B \cap E_i) = \sum_{i=1}^{n} \Pr(B \mid E_i)\Pr(E_i)$$
by the definition of conditional probability. □


Now, using this law and summing over all collections of values $(x_2, x_3, x_4, \ldots, x_n) \in \{0,1\}^{n-1}$ yields
$$\Pr(ABr = Cr) = \sum_{(x_2,\ldots,x_n) \in \{0,1\}^{n-1}} \Pr\big((ABr = Cr) \cap ((r_2,\ldots,r_n) = (x_2,\ldots,x_n))\big)$$
$$\le \sum_{(x_2,\ldots,x_n) \in \{0,1\}^{n-1}} \Pr\left(\left(r_1 = -\frac{\sum_{j=2}^{n} d_{1j} x_j}{d_{11}}\right) \cap ((r_2,\ldots,r_n) = (x_2,\ldots,x_n))\right)$$
$$= \sum_{(x_2,\ldots,x_n) \in \{0,1\}^{n-1}} \Pr\left(r_1 = -\frac{\sum_{j=2}^{n} d_{1j} x_j}{d_{11}} \;\Big|\; (r_2,\ldots,r_n) = (x_2,\ldots,x_n)\right) \cdot \Pr((r_2,\ldots,r_n) = (x_2,\ldots,x_n))$$
$$= \sum_{(x_2,\ldots,x_n) \in \{0,1\}^{n-1}} \Pr\left(r_1 = -\frac{\sum_{j=2}^{n} d_{1j} x_j}{d_{11}}\right) \cdot \Pr((r_2,\ldots,r_n) = (x_2,\ldots,x_n))$$
$$\le \sum_{(x_2,\ldots,x_n) \in \{0,1\}^{n-1}} \frac{1}{2}\Pr((r_2,\ldots,r_n) = (x_2,\ldots,x_n)) = \frac{1}{2}.$$
Here we have used the independence of $r_1$ and $(r_2, \ldots, r_n)$ in the fourth line. □

To improve on the error probability of Theorem 1.4, we can again use the fact that the algorithm has a one-sided error and run the algorithm multiple times. If we ever find an r such that ABr ≠ Cr, then the algorithm will correctly return that AB ≠ C. If we always find ABr = Cr, then the algorithm returns that AB = C and there is some probability of a mistake. Choosing r with replacement from $\{0,1\}^n$ for each trial, we obtain that, after k trials, the probability of error is at most $2^{-k}$. Repeated trials increase the running time to $\Theta(kn^2)$.

Suppose we attempt this verification 100 times. The running time of the randomized checking algorithm is still $\Theta(n^2)$, which is faster than the known deterministic algorithms for matrix multiplication for sufficiently large n. The probability that an incorrect algorithm passes the verification test 100 times is $2^{-100}$, an astronomically small number. In practice, the computer is much more likely to crash during the execution of the algorithm than to return a wrong answer.
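This verification procedure is commonly known as Freivalds' algorithm. The sketch below shows one way it could be coded for the mod-2 setting assumed in the text; the function names and the list-of-lists matrix representation are our own illustrative choices.

```python
import random

def verify_trial(A, B, C, n):
    """One trial of the randomized check of AB = C over the integers mod 2.

    A, B, C are n x n matrices given as lists of lists of 0/1 entries.
    A False result certifies AB != C; a True result is wrong with probability
    at most 1/2 when AB != C.
    """
    r = [random.randint(0, 1) for _ in range(n)]                   # r uniform over {0,1}^n
    Br = [sum(B[i][j] * r[j] for j in range(n)) % 2 for i in range(n)]
    ABr = [sum(A[i][j] * Br[j] for j in range(n)) % 2 for i in range(n)]
    Cr = [sum(C[i][j] * r[j] for j in range(n)) % 2 for i in range(n)]
    return ABr == Cr                                                # three O(n^2) products

def verify_matrix_product(A, B, C, n, trials=100):
    """Repeat the one-sided test; the error probability is at most 2**(-trials)."""
    return all(verify_trial(A, B, C, n) for _ in range(trials))
```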
An interesting related problem is to evaluate the gradual change in our confidence in the correctness of the matrix multiplication as we repeat the randomized test. Toward that end we introduce Bayes' law.

Theorem 1.7 [Bayes' Law]: Assume that $E_1, E_2, \ldots, E_n$ are mutually disjoint sets such that $\bigcup_{i=1}^{n} E_i = E$. Then
$$\Pr(E_j \mid B) = \frac{\Pr(E_j \cap B)}{\Pr(B)} = \frac{\Pr(B \mid E_j)\Pr(E_j)}{\sum_{i=1}^{n} \Pr(B \mid E_i)\Pr(E_i)}.$$

As a simple application of Bayes' law, consider the following problem. We are given three coins and are told that two of the coins are fair and the third coin is biased, landing heads with probability 2/3. We are not told which of the three coins is biased. We

permute the coins randomly, and then flip each of the coins. The first and second coins come up heads, and the third comes up tails. What is the probability that the first coin is the biased one?

The coins are in a random order and so, before our observing the outcomes of the coin flips, each of the three coins is equally likely to be the biased one. Let $E_i$ be the event that the ith coin flipped is the biased one, and let B be the event that the three coin flips came up heads, heads, and tails.

Before we flip the coins we have $\Pr(E_i) = 1/3$ for all i. We can also compute the probability of the event B conditioned on $E_i$:
$$\Pr(B \mid E_1) = \Pr(B \mid E_2) = \frac{2}{3} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{6},$$
and
$$\Pr(B \mid E_3) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{12}.$$
Applying Bayes' law, we have
$$\Pr(E_1 \mid B) = \frac{\Pr(B \mid E_1)\Pr(E_1)}{\sum_{i=1}^{3}\Pr(B \mid E_i)\Pr(E_i)} = \frac{2}{5}.$$
Thus, the outcome of the three coin flips increases the likelihood that the first coin is the biased one from 1/3 to 2/5.

Returning now to our randomized matrix multiplication test, we want to evaluate the increase in confidence in the matrix identity obtained through repeated tests. In the Bayesian approach one starts with a prior model, giving some initial value to the model parameters. This model is then modified, by incorporating new observations, to obtain a posterior model that captures the new information.

In the matrix multiplication case, if we have no information about the process that generated the identity then a reasonable prior assumption is that the identity is correct with probability 1/2. If we run the randomized test once and it returns that the matrix identity is correct, how does this change our confidence in the identity?

Let E be the event that the identity is correct, and let B be the event that the test returns that the identity is correct. We start with $\Pr(E) = \Pr(\bar{E}) = 1/2$, and since the test has a one-sided error bounded by 1/2, we have $\Pr(B \mid E) = 1$ and $\Pr(B \mid \bar{E}) \le 1/2$. Applying Bayes' law yields
$$\Pr(E \mid B) = \frac{\Pr(B \mid E)\Pr(E)}{\Pr(B \mid E)\Pr(E) + \Pr(B \mid \bar{E})\Pr(\bar{E})} \ge \frac{1/2}{1/2 + 1/2 \cdot 1/2} = \frac{2}{3}.$$
Assume now that we run the randomized test again and it again returns that the identity is correct. After the first test, I may naturally have revised my prior model, so that I believe $\Pr(E) \ge 2/3$ and $\Pr(\bar{E}) \le 1/3$. Now let B be the event that the new test returns that the identity is correct; since the tests are independent, as before we have $\Pr(B \mid E) = 1$ and $\Pr(B \mid \bar{E}) \le 1/2$. Applying Bayes' law then yields
$$\Pr(E \mid B) \ge \frac{2/3}{2/3 + 1/3 \cdot 1/2} = \frac{4}{5}.$$

In general: If our prior model (before running the test) is that $\Pr(E) \ge 2^i/(2^i + 1)$ and if the test returns that the identity is correct (event B), then
$$\Pr(E \mid B) \ge \frac{2^i/(2^i+1)}{2^i/(2^i+1) + \frac{1}{2} \cdot \frac{1}{2^i+1}} = \frac{2^{i+1}}{2^{i+1}+1} = 1 - \frac{1}{2^{i+1}+1}.$$
Thus, if all 100 calls to the matrix identity test return that the identity is correct, our confidence in the correctness of this identity is at least $1 - 1/(2^{100} + 1)$.
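The running confidence bound can be checked numerically. The small sketch below (our own helper, not part of the book's text) applies the Bayes update from the text once per passing test, starting from the prior of 1/2, and compares the result against the closed form $1 - 1/(2^i + 1)$ after i passing tests.

```python
def confidence_after_tests(num_tests):
    """Lower bound on Pr(identity is correct) after num_tests passing tests."""
    p = 0.5                                  # prior lower bound on Pr(E)
    for _ in range(num_tests):
        p = p / (p + 0.5 * (1.0 - p))        # Pr(E | B) >= p / (p + (1/2)(1 - p))
    return p

# Sanity check against the closed form 1 - 1/(2^i + 1) after i passing tests.
for i in (1, 2, 10):
    assert abs(confidence_after_tests(i) - (1 - 1 / (2 ** i + 1))) < 1e-12
```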
1.4. Application: A Randomized Min-Cut Algorithm

A cut-set in a graph is a set of edges whose removal breaks the graph into two or more connected components. Given a graph G = (V, E) with n vertices, the minimum cut - or min-cut - problem is to find a minimum cardinality cut-set in G. Minimum cut problems arise in many contexts, including the study of network reliability. In the case where nodes correspond to machines in the network and edges correspond to connections between machines, the min-cut is the smallest number of edges that can fail before some pair of machines cannot communicate. Minimum cuts also arise in clustering problems. For example, if nodes represent Web pages (or any documents in a hypertext-based system) and two nodes have an edge between them if the corresponding nodes have a hyperlink between them, then small cuts divide the graph into clusters of documents with few links between clusters. Documents in different clusters are likely to be unrelated.

We shall proceed by making use of the definitions and techniques presented so far in order to analyze a simple randomized algorithm for the min-cut problem. The main operation in the algorithm is edge contraction. In contracting an edge {u, v} we merge the two vertices u and v into one vertex, eliminate all edges connecting u and v, and retain all other edges in the graph. The new graph may have parallel edges but no self-loops. Examples appear in Figure 1.1, where in each step the dark edge is being contracted.

The algorithm consists of n - 2 iterations. In each iteration, the algorithm picks an edge from the existing edges in the graph and contracts that edge. There are many possible ways one could choose the edge at each step. Our randomized algorithm chooses the edge uniformly at random from the remaining edges.

Each iteration reduces the number of vertices in the graph by one. After n - 2 iterations, the graph consists of two vertices. The algorithm outputs the set of edges connecting the two remaining vertices.

It is easy to verify that any cut-set of a graph in an intermediate iteration of the algorithm is also a cut-set of the original graph. On the other hand, not every cut-set of the original graph is a cut-set of a graph in an intermediate iteration, since some edges of the cut-set may have been contracted in previous iterations. As a result, the output of the algorithm is always a cut-set of the original graph but not necessarily the minimum cardinality cut-set (see Figure 1.1).
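A compact sketch of one run of the contraction algorithm appears below, assuming a connected multigraph given as an edge list over vertices labeled 0 to n - 1; the representation and names are our own, not the book's. Repeating independent runs and keeping the smallest cut found raises the chance that a minimum cut is actually produced.

```python
import random

def contract_cut(n, edges):
    """One run of the random contraction algorithm; returns the size of the cut found.

    The result is always the size of some cut-set of the original graph, but not
    necessarily a minimum one.
    """
    supervertex = list(range(n))                 # which merged vertex each original vertex is in
    edges = list(edges)                          # edge list; parallel edges allowed, no self-loops
    for _ in range(n - 2):                       # n - 2 contractions leave two vertices
        u, v = random.choice(edges)              # edge chosen uniformly among remaining edges
        su, sv = supervertex[u], supervertex[v]
        supervertex = [su if s == sv else s for s in supervertex]   # merge sv into su
        # self-loops created by the contraction are eliminated; parallel edges are retained
        edges = [(a, b) for (a, b) in edges if supervertex[a] != supervertex[b]]
    return len(edges)                            # edges between the two remaining vertices

# Example: a 4-cycle; every run of the contraction algorithm returns its minimum cut, 2.
print(contract_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```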

