



PROBABILITY AND RANDOM PROCESSES FOR
ELECTRICAL AND COMPUTER ENGINEERS

The theory of probability is a powerful tool that helps electrical and computer
engineers explain, model, analyze, and design the technology they develop. The
text begins at the advanced undergraduate level, assuming only a modest knowledge
of probability, and progresses through more complex topics mastered at the graduate
level. The first five chapters cover the basics of probability and both discrete and
continuous random variables. The later chapters have a more specialized coverage,
including random vectors, Gaussian random vectors, random processes, Markov
chains, and convergence. Describing tools and results that are used extensively in
the field, this is more than a textbook: it is also a reference for researchers working
in communications, signal processing, and computer network traffic analysis. With
over 300 worked examples, some 800 homework problems, and sections for exam
preparation, this is an essential companion for advanced undergraduate and graduate
students.
Further resources for this title, including solutions, are available online at
www.cambridge.org/9780521864701.
John A. Gubner has been on the Faculty of Electrical and Computer
Engineering at the University of Wisconsin-Madison since receiving his Ph.D.
in 1988 from the University of Maryland at College Park. His research interests
include ultra-wideband communications; point processes and shot noise; subspace
methods in statistical processing; and information theory. A member of the IEEE,
he has authored or co-authored many papers in the IEEE Transactions, including
those on Information Theory, Signal Processing, and Communications.




PROBABILITY AND RANDOM
PROCESSES FOR ELECTRICAL AND
COMPUTER ENGINEERS
JOHN A. GUBNER
University of Wisconsin-Madison


Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press
The Edinburgh Building, Cambridge CB2 2RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521864701
© Cambridge University Press 2006
This publication is in copyright. Subject to statutory exception and to the provision of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.
First published in print format 2006
ISBN-13  978-0-511-22023-4  eBook (EBL)
ISBN-10  0-511-22023-5  eBook (EBL)
ISBN-13  978-0-521-86470-1  hardback
ISBN-10  0-521-86470-4  hardback

Cambridge University Press has no responsibility for the persistence or accuracy of urls
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.


To Sue and Joe



Contents

Chapter dependencies
Preface
1 Introduction to probability
   1.1 Sample spaces, outcomes, and events
   1.2 Review of set notation
   1.3 Probability models
   1.4 Axioms and properties of probability
   1.5 Conditional probability
   1.6 Independence
   1.7 Combinatorics and probability
   Notes
   Problems
   Exam preparation
2 Introduction to discrete random variables
   2.1 Probabilities involving random variables
   2.2 Discrete random variables
   2.3 Multiple random variables
   2.4 Expectation
   Notes
   Problems
   Exam preparation
3 More about discrete random variables
   3.1 Probability generating functions
   3.2 The binomial random variable
   3.3 The weak law of large numbers
   3.4 Conditional probability
   3.5 Conditional expectation
   Notes
   Problems
   Exam preparation
4 Continuous random variables
   4.1 Densities and probabilities
   4.2 Expectation of a single random variable
   4.3 Transform methods
   4.4 Expectation of multiple random variables
   4.5 Probability bounds
   Notes
   Problems
   Exam preparation
5 Cumulative distribution functions and their applications
   5.1 Continuous random variables
   5.2 Discrete random variables
   5.3 Mixed random variables
   5.4 Functions of random variables and their cdfs
   5.5 Properties of cdfs
   5.6 The central limit theorem
   5.7 Reliability
   Notes
   Problems
   Exam preparation
6 Statistics
   6.1 Parameter estimators and their properties
   6.2 Histograms
   6.3 Confidence intervals for the mean – known variance
   6.4 Confidence intervals for the mean – unknown variance
   6.5 Confidence intervals for Gaussian data
   6.6 Hypothesis tests for the mean
   6.7 Regression and curve fitting
   6.8 Monte Carlo estimation
   Notes
   Problems
   Exam preparation
7 Bivariate random variables
   7.1 Joint and marginal probabilities
   7.2 Jointly continuous random variables
   7.3 Conditional probability and expectation
   7.4 The bivariate normal
   7.5 Extension to three or more random variables
   Notes
   Problems
   Exam preparation
8 Introduction to random vectors
   8.1 Review of matrix operations
   8.2 Random vectors and random matrices
   8.3 Transformations of random vectors
   8.4 Linear estimation of random vectors (Wiener filters)
   8.5 Estimation of covariance matrices
   8.6 Nonlinear estimation of random vectors
   Notes
   Problems
   Exam preparation
9 Gaussian random vectors
   9.1 Introduction
   9.2 Definition of the multivariate Gaussian
   9.3 Characteristic function
   9.4 Density function
   9.5 Conditional expectation and conditional probability
   9.6 Complex random variables and vectors
   Notes
   Problems
   Exam preparation
10 Introduction to random processes
   10.1 Definition and examples
   10.2 Characterization of random processes
   10.3 Strict-sense and wide-sense stationary processes
   10.4 WSS processes through LTI systems
   10.5 Power spectral densities for WSS processes
   10.6 Characterization of correlation functions
   10.7 The matched filter
   10.8 The Wiener filter
   10.9 The Wiener–Khinchin theorem
   10.10 Mean-square ergodic theorem for WSS processes
   10.11 Power spectral densities for non-WSS processes
   Notes
   Problems
   Exam preparation
11 Advanced concepts in random processes
   11.1 The Poisson process
   11.2 Renewal processes
   11.3 The Wiener process
   11.4 Specification of random processes
   Notes
   Problems
   Exam preparation
12 Introduction to Markov chains
   12.1 Preliminary results
   12.2 Discrete-time Markov chains
   12.3 Recurrent and transient states
   12.4 Limiting n-step transition probabilities
   12.5 Continuous-time Markov chains
   Notes
   Problems
   Exam preparation
13 Mean convergence and applications
   13.1 Convergence in mean of order p
   13.2 Normed vector spaces of random variables
   13.3 The Karhunen–Loève expansion
   13.4 The Wiener integral (again)
   13.5 Projections, orthogonality principle, projection theorem
   13.6 Conditional expectation and probability
   13.7 The spectral representation
   Notes
   Problems
   Exam preparation
14 Other modes of convergence
   14.1 Convergence in probability
   14.2 Convergence in distribution
   14.3 Almost-sure convergence
   Notes
   Problems
   Exam preparation
15 Self similarity and long-range dependence
   15.1 Self similarity in continuous time
   15.2 Self similarity in discrete time
   15.3 Asymptotic second-order self similarity
   15.4 Long-range dependence
   15.5 ARMA processes
   15.6 ARIMA processes
   Problems
   Exam preparation
Bibliography
Index


Chapter dependencies

[The printed book presents the following as a dependency graph; only the node labels are recoverable here.]

1 Introduction to probability
2 Introduction to discrete random variables
3 More about discrete random variables
12.1–12.4 Discrete-time Markov chains
4 Continuous random variables
5 Cumulative distribution functions and their applications
6 Statistics
7 Bivariate random variables
8 Introduction to random vectors
9 Gaussian random vectors
10 Introduction to random processes
11.1 The Poisson process
11.2–11.4 Advanced concepts in random processes
12.5 Continuous-time Markov chains
13 Mean convergence and applications
14 Other modes of convergence
15 Self similarity and long-range dependence


Preface
Intended audience
This book is a primary text for graduate-level courses in probability and random processes that are typically offered in electrical and computer engineering departments. The
text starts from first principles and contains more than enough material for a two-semester
sequence. The level of the text varies from advanced undergraduate to graduate as the
material progresses. The principal prerequisite is the usual undergraduate electrical and
computer engineering course on signals and systems, e.g., Haykin and Van Veen [25] or
Oppenheim and Willsky [39] (see the Bibliography at the end of the book). However, later
chapters that deal with random vectors assume some familiarity with linear algebra; e.g.,
determinants and matrix inverses.
How to use the book
A first course. In a course that assumes at most a modest background in probability, the
core of the offering would include Chapters 1–5 and 7. These cover the basics of probability
and discrete and continuous random variables. As the chapter dependencies graph on the
preceding page indicates, there is considerable flexibility in the selection and ordering of
additional material as the instructor sees fit.
A second course. In a course that assumes a solid background in the basics of probability and discrete and continuous random variables, the material in Chapters 1–5 and 7
can be reviewed quickly. In such a review, the instructor may want to include sections and problems marked with a ⋆, as these indicate more challenging material that might not
be appropriate in a first course. Following the review, the core of the offering would
include Chapters 8, 9, 10 (Sections 10.1–10.6), and Chapter 11. Additional material from

Chapters 12–15 can be included to meet course goals and objectives.
Level of course offerings. In any course offering, the level can be adapted to the
background of the class by omitting or including the more advanced sections, remarks,
and problems that are marked with a ⋆. In addition, discussions of a highly technical
nature are placed in a Notes section at the end of the chapter in which they occur. Pointers
to these discussions are indicated by boldface numerical superscripts in the text. These
notes can be omitted or included as the instructor sees fit.

Chapter features
• Key equations are boxed:
    P(A|B) := P(A ∩ B) / P(B).

• Important text passages are highlighted:
Two events A and B are said to be independent if P(A ∩ B) = P(A) P(B).


• Tables of discrete random variables and of Fourier transform pairs are found inside
the front cover. A table of continuous random variables is found inside the back cover.
• The index was compiled as the book was written. Hence, there are many cross-references to related information. For example, see “chi-squared random variable.”
• When cumulative distribution functions or other functions are encountered that do not have a closed form, MATLAB commands are given for computing them; see “Matlab commands” in the index for a list. The use of many commands is illustrated in the examples and the problems throughout most of the text. Although some commands require the MATLAB Statistics Toolbox, alternative methods are also suggested; e.g., the use of erf and erfinv for normcdf and norminv (a brief sketch of this substitution follows this list).
• Each chapter contains a Notes section. Throughout each chapter, numerical superscripts refer to discussions in the Notes section. These notes are usually rather technical and address subtleties of the theory.
• Each chapter contains a Problems section. There are more than 800 problems throughout the book. Problems are grouped according to the section they are based on, and
this is clearly indicated. This enables the student to refer to the appropriate part of
the text for background relating to particular problems, and it enables the instructor
to make up assignments more quickly. In chapters intended for a first course, the
more challenging problems are marked with a ⋆. Problems requiring MATLAB are
indicated by the label MATLAB.
• Each chapter contains an Exam preparation section. This serves as a chapter summary, drawing attention to key concepts and formulas.
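
The erf/erfinv substitution mentioned above can be sketched as follows; this is a minimal example using only base MATLAB, and the test values 1.96 and 0.975 are arbitrary choices, not taken from the text.

    % Standard normal cdf and its inverse built from erf and erfinv (no toolbox needed).
    Phi    = @(x) 0.5*(1 + erf(x/sqrt(2)));   % same values as normcdf(x)
    Phiinv = @(p) sqrt(2)*erfinv(2*p - 1);    % same values as norminv(p)
    Phi(1.96)       % approximately 0.9750
    Phiinv(0.975)   % approximately 1.96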

Acknowledgements
The writing of this book has been greatly improved by the suggestions of many people.
At the University of Wisconsin–Madison, the sharp eyes of the students in my classes
on probability and random processes, my research students, and my postdocs have helped
me fix countless typos and improve explanations of several topics. My colleagues here have
been generous with their comments and suggestions. Professor Rajeev Agrawal, now with
Motorola, convinced me to treat discrete random variables before continuous random variables. Discussions with Professor Bob Barmish on robustness of rational transfer functions
led to Problems 38–40 in Chapter 5. I am especially grateful to Professors Jim Bucklew, Yu
Hen Hu, and Akbar Sayeed, who taught from early, unpolished versions of the manuscript.
Colleagues at other universities and students in their classes have also been generous
with their support. I thank Professors Toby Berger, Edwin Chong, and Dave Neuhoff, who
have used recent manuscripts in teaching classes on probability and random processes and
have provided me with detailed reviews. Special thanks go to Professor Tom Denney for his
multiple careful reviews of each chapter.
Since writing is a solitary process, I am grateful to be surrounded by many supportive
family members. I especially thank my wife and son for their endless patience and faith
in me and this book, and I thank my parents for their encouragement and help when I was
preoccupied with writing.



1 Introduction to probability
Why do electrical and computer engineers need to study probability?
Probability theory provides powerful tools to explain, model, analyze, and design technology developed by electrical and computer engineers. Here are a few applications.
Signal processing. My own interest in the subject arose when I was an undergraduate
taking the required course in probability for electrical engineers. We considered the situation shown in Figure 1.1. To determine the presence of an aircraft, a known radar pulse v(t) is sent out.

[Figure 1.1. Block diagram of radar detection system: the radar transmits v(t); the received waveform v(t) + Xt is passed through a linear system and then a detector.]

If there are no objects in range of the radar, the radar's amplifiers produce only a
noise waveform, denoted by Xt . If there is an object in range, the reflected radar pulse plus
noise is produced. The overall goal is to decide whether the received waveform is noise
only or signal plus noise. To get an idea of how difficult this can be, consider the signal
plus noise waveform shown at the top in Figure 1.2. Our class addressed the subproblem
of designing an optimal linear system to process the received waveform so as to make the
presence of the signal more obvious. We learned that the optimal transfer function is given

by the matched filter. If the signal at the top in Figure 1.2 is processed by the appropriate
matched filter, we get the output shown at the bottom in Figure 1.2. You will study the
matched filter in Chapter 10.

[Figure 1.2. Matched filter input (top) in which the signal is hidden by noise. Matched filter output (bottom) in which the signal presence is obvious.]
Computer memories. Suppose you are designing a computer memory to hold k-bit
words. To increase system reliability, you employ an error-correcting-code system. With
this system, instead of storing just the k data bits, you store an additional l bits (which are
functions of the data bits). When reading back the (k + l)-bit word, if at least m bits are read
out correctly, then all k data bits can be recovered (the value of m depends on the code). To
characterize the quality of the computer memory, we compute the probability that at least m
bits are correctly read back. You will be able to do this after you study the binomial random
variable in Chapter 3.
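
As a preview of that computation, here is a minimal sketch. The values of n, m, and the per-bit probability p below are purely illustrative assumptions (the text does not specify them), and each bit is assumed to be read correctly independently of the others.

    % P(at least m of the n = k + l stored bits are read correctly),
    % assuming independent bits, each read correctly with probability p.
    n = 32; m = 28; p = 0.999;        % hypothetical values
    P = 0;
    for j = m:n
        P = P + nchoosek(n, j) * p^j * (1 - p)^(n - j);
    end
    P                                  % probability the k data bits can be recovered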


Optical communication systems. Optical communication systems use photodetectors
(see Figure 1.3) to interface between optical and electronic subsystems.

[Figure 1.3. Block diagram of a photodetector: light in, photoelectrons out. The rate at which photoelectrons are produced is proportional to the intensity of the light.]

When these systems are at the limits of their operating capabilities, the number of photoelectrons produced
by the photodetector is well-modeled by the Poisson^a random variable you will study in
Chapter 2 (see also the Poisson process in Chapter 11). In deciding whether a transmitted
bit is a zero or a one, the receiver counts the number of photoelectrons and compares it
to a threshold. System performance is determined by computing the probability that the
threshold is exceeded.
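
A minimal sketch of this computation, assuming the photoelectron count is Poisson with a hypothetical mean lambda and a hypothetical threshold T (neither value comes from the text):

    % P(count > T) for a Poisson(lambda) photoelectron count, base MATLAB only.
    lambda = 20; T = 10;                                   % hypothetical values
    k = 0:T;
    P_exceed = 1 - sum(exp(-lambda) * lambda.^k ./ factorial(k))
    % With the Statistics Toolbox, the same quantity is 1 - poisscdf(T, lambda).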
Wireless communication systems. In order to enhance weak signals and maximize the
range of communication systems, it is necessary to use amplifiers. Unfortunately, amplifiers
always generate thermal noise, which is added to the desired signal. As a consequence of the
underlying physics, the noise is Gaussian. Hence, the Gaussian density function, which you
will meet in Chapter 4, plays a prominent role in the analysis and design of communication
systems. When noncoherent receivers are used, e.g., noncoherent frequency shift keying,
^a Many important quantities in probability and statistics are named after famous mathematicians and statisticians. You can use an Internet search engine to find pictures and biographies of them on the web; at the time of this writing, numerous biographies, as well as pictures on stamps and currency, could be found online.


this naturally leads to the Rayleigh, chi-squared, noncentral chi-squared, and Rice density
functions that you will meet in the problems in Chapters 4, 5, 7, and 9.
Variability in electronic circuits. Although circuit manufacturing processes attempt to
ensure that all items have nominal parameter values, there is always some variation among
items. How can we estimate the average values in a batch of items without testing all of
them? How good is our estimate? You will learn how to do this in Chapter 6 when you
study parameter estimation and confidence intervals. Incidentally, the same concepts apply
to the prediction of presidential elections by surveying only a few voters.
Computer network traffic. Prior to the 1990s, network analysis and design was carried
out using long-established Markovian models [41, p. 1]. You will study Markov chains
in Chapter 12. As self similarity was observed in the traffic of local-area networks [35],
wide-area networks [43], and in World Wide Web traffic [13], a great research effort began
to examine the impact of self similarity on network analysis and design. This research has
yielded some surprising insights into questions about buffer size vs. bandwidth, multiple-time-scale congestion control, connection duration prediction, and other issues [41, pp. 9–
11]. In Chapter 15 you will be introduced to self similarity and related concepts.
In spite of the foregoing applications, probability was not originally developed to handle
problems in electrical and computer engineering. The first applications of probability were
to questions about gambling posed to Pascal in 1654 by the Chevalier de Mere. Later,
probability theory was applied to the determination of life expectancies and life-insurance
premiums, the theory of measurement errors, and to statistical mechanics. Today, the theory
of probability and statistics is used in many other fields, such as economics, finance, medical
treatment and drug studies, manufacturing quality control, public opinion surveys, etc.


Relative frequency
Consider an experiment that can result in M possible outcomes, O1 , . . . , OM . For example, in tossing a die, one of the six sides will land facing up. We could let Oi denote
the outcome that the ith side faces up, i = 1, . . . , 6. Alternatively, we might have a computer
with six processors, and Oi could denote the outcome that a program or thread is assigned to
the ith processor. As another example, there are M = 52 possible outcomes if we draw one
card from a deck of playing cards. Similarly, there are M = 52 outcomes if we ask which
week during the next year the stock market will go up the most. The simplest example we
consider is the flipping of a coin. In this case there are two possible outcomes, “heads” and
“tails.” Similarly, there are two outcomes when we ask whether or not a bit was correctly
received over a digital communication system. No matter what the experiment, suppose
we perform it n times and make a note of how many times each outcome occurred. Each
performance of the experiment is called a trial.^b Let Nn(Oi) denote the number of times Oi occurred in n trials. The relative frequency of outcome Oi,

    Nn(Oi)/n,

is the fraction of times Oi occurred.

^b When there are only two outcomes, the repeated experiments are called Bernoulli trials.


Here are some simple computations using relative frequency. First,
    Nn(O1) + ··· + Nn(OM) = n,

and so

    Nn(O1)/n + ··· + Nn(OM)/n = 1.                          (1.1)
Second, we can group outcomes together. For example, if the experiment is tossing a die,
let E denote the event that the outcome of a toss is a face with an even number of dots; i.e.,
E is the event that the outcome is O2 , O4 , or O6 . If we let Nn (E) denote the number of times
E occurred in n tosses, it is easy to see that
Nn (E) = Nn (O2 ) + Nn (O4 ) + Nn (O6 ),
and so the relative frequency of E is
    Nn(E)/n = Nn(O2)/n + Nn(O4)/n + Nn(O6)/n.               (1.2)

Practical experience has shown us that as the number of trials n becomes large, the relative frequencies settle down and appear to converge to some limiting value. This behavior
is known as statistical regularity.
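
A small simulation makes this concrete for the die-tossing example above. The sketch below (the sample sizes are arbitrary) estimates the relative frequency of the event E, a face with an even number of dots, and shows it settling near 1/2 as n grows.

    % Relative frequency of E = {even face} in n tosses of a fair die.
    for n = [100 1000 10000 100000]
        tosses  = randi(6, 1, n);                    % n independent fair die tosses
        relfreq = sum(mod(tosses, 2) == 0) / n;      % fraction of even faces
        fprintf('n = %6d   relative frequency of E = %.4f\n', n, relfreq);
    end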
Example 1.1. Suppose we toss a fair coin 100 times and note the relative frequency of
heads. Experience tells us that the relative frequency should be about 1/2. When we did this,^c we got 0.47 and were not disappointed.
The tossing of a coin 100 times and recording the relative frequency of heads out of 100
tosses can be considered an experiment in itself. Since the number of heads can range from
0 to 100, there are 101 possible outcomes, which we denote by S0 , . . . , S100 . In the preceding
example, this experiment yielded S47 .
Example 1.2. We performed the experiment with outcomes S0 , . . . , S100 1000 times and
counted the number of occurrences of each outcome. All trials produced between 33 and 68
heads. Rather than list N1000 (Sk ) for the remaining values of k, we summarize as follows:
N1000 (S33 ) + N1000 (S34 ) + N1000 (S35 ) = 4
N1000 (S36 ) + N1000 (S37 ) + N1000 (S38 ) = 6
N1000 (S39 ) + N1000 (S40 ) + N1000 (S41 ) = 32
N1000 (S42 ) + N1000 (S43 ) + N1000 (S44 ) = 98
N1000 (S45 ) + N1000 (S46 ) + N1000 (S47 ) = 165
N1000 (S48 ) + N1000 (S49 ) + N1000 (S50 ) = 230
N1000 (S51 ) + N1000 (S52 ) + N1000 (S53 ) = 214
N1000 (S54 ) + N1000 (S55 ) + N1000 (S56 ) = 144
^c We did not actually toss a coin. We used a random number generator to simulate the toss of a fair coin.
Simulation is discussed in Chapters 5 and 6.



N1000 (S57 ) + N1000 (S58 ) + N1000 (S59 ) = 76
N1000 (S60 ) + N1000 (S61 ) + N1000 (S62 ) = 21
N1000 (S63 ) + N1000 (S64 ) + N1000 (S65 ) = 9
N1000 (S66 ) + N1000 (S67 ) + N1000 (S68 ) = 1.
This summary is illustrated in the histogram shown in Figure 1.4. (The bars are centered
over values of the form k/100; e.g., the bar of height 230 is centered over 0.49.)

[Figure 1.4. Histogram of Example 1.2 with overlay of a Gaussian density.]
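
A sketch of how such a histogram can be produced by simulation (as in Example 1.1, a random number generator stands in for the coin; the counts will differ from run to run):

    % Repeat 1000 times: toss a fair coin 100 times and record the number of heads.
    nheads = zeros(1, 1000);
    for i = 1:1000
        nheads(i) = sum(rand(1, 100) < 0.5);         % heads with probability 1/2
    end
    hist(nheads, 0:100)                              % histogram of outcomes S_0, ..., S_100
    xlabel('number of heads in 100 tosses'), ylabel('count in 1000 repetitions')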

Below we give an indication of why most of the time the relative frequency of heads is
close to one half and why the bell-shaped curve fits so well over the histogram. For now
we point out that the foregoing methods allow us to determine the bit-error rate of a digital
communication system, whether it is a wireless phone or a cable modem connection. In
principle, we simply send a large number of bits over the channel and find out what fraction
were received incorrectly. This gives an estimate of the bit-error rate. To see how good an
estimate it is, we repeat the procedure many times and make a histogram of our estimates.

What is probability theory?
Axiomatic probability theory, which is the subject of this book, was developed by A.
N. Kolmogorov^d in 1933. This theory specifies a set of axioms for a well-defined mathematical model of physical experiments whose outcomes exhibit random variability each
time they are performed. The advantage of using a model rather than performing an experiment itself is that it is usually much more efficient in terms of time and money to analyze

a mathematical model. This is a sensible approach only if the model correctly predicts the
behavior of actual experiments. This is indeed the case for Kolmogorov’s theory.
A simple prediction of Kolmogorov’s theory arises in the mathematical model for the
relative frequency of heads in n tosses of a fair coin that we considered in Example 1.1. In
the model of this experiment, the relative frequency converges to 1/2 as n tends to infinity;
^d At the time of this writing, a website devoted to Kolmogorov could be found on the web.



this is a special case of the strong law of large numbers, which is derived in Chapter 14.
(A related result, known as the weak law of large numbers, is derived in Chapter 3.)
Another prediction of Kolmogorov’s theory arises in modeling the situation in Example 1.2. The theory explains why the histogram in Figure 1.4 agrees with the bell-shaped
curve overlaying it. In the model, the strong law tells us that for each k, the relative frequency of having exactly k heads in 100 tosses should be close to
    100!/(k!(100 − k)!) · (1/2)^100.

Then, by the central limit theorem, which is derived in Chapter 5, the above expression is approximately equal to (see Example 5.19)

    (1/(5√(2π))) exp(−((k − 50)/5)²/2).

(You should convince yourself that the graph of e^(−x²) is indeed a bell-shaped curve.)
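
A quick numerical check of these two expressions is sketched below (the detailed derivation of the approximation is the subject of Example 5.19).

    % Compare the binomial probabilities with the Gaussian approximation above.
    k = 0:100;
    % 100!/(k!(100-k)!) / 2^100, computed via gammaln to avoid overflow
    p_exact  = exp(gammaln(101) - gammaln(k+1) - gammaln(101-k) - 100*log(2));
    p_approx = exp(-((k - 50)/5).^2 / 2) / (5*sqrt(2*pi));
    max(abs(p_exact - p_approx))   % small compared with the peak value of about 0.08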
Because Kolmogorov’s theory makes predictions that agree with physical experiments,
it has enjoyed great success in the analysis and design of real-world systems.

1.1 Sample spaces, outcomes, and events
Sample spaces
To model systems that yield uncertain or random measurements, we let Ω denote the
set of all possible distinct, indecomposable measurements that could be observed. The set
Ω is called the sample space. Here are some examples corresponding to the applications
discussed at the beginning of the chapter.
Signal processing. In a radar system, the voltage of a noise waveform at time t can be
viewed as possibly being any real number. The first step in modeling such a noise voltage
is to consider the sample space consisting of all real numbers, i.e., Ω = (−∞, ∞).
Computer memories. Suppose we store an n-bit word consisting of all 0s at a particular
location. When we read it back, we may not get all 0s. In fact, any n-bit word may be read
out if the memory location is faulty. The set of all possible n-bit words can be modeled by
the sample space
Ω = {(b1 , . . . , bn ) : bi = 0 or 1}.
Optical communication systems. Since the output of a photodetector is a random number of photoelectrons, the logical sample space here is the nonnegative integers,

Ω = {0, 1, 2, . . .}.
Notice that we include 0 to account for the possibility that no photoelectrons are observed.

Wireless communication systems. Noncoherent receivers measure the energy of the
incoming waveform. Since energy is a nonnegative quantity, we model it with the sample
space consisting of the nonnegative real numbers, Ω = [0, ∞).
Variability in electronic circuits. Consider the lowpass RC filter shown in Figure 1.5(a).
Suppose that the exact values of R and C are not perfectly controlled by the manufacturing
process, but are known to satisfy
    95 ohms ≤ R ≤ 105 ohms   and   300 µF ≤ C ≤ 340 µF.


[Figure 1.5. (a) Lowpass RC filter. (b) Sample space for possible values of R and C.]

This suggests that we use the sample space of ordered pairs of real numbers, (r, c), where
95 ≤ r ≤ 105 and 300 ≤ c ≤ 340. Symbolically, we write
Ω = {(r, c) : 95 ≤ r ≤ 105 and 300 ≤ c ≤ 340},
which is the rectangular region in Figure 1.5(b).
Computer network traffic. If a router has a buffer that can store up to 70 packets, and
we want to model the actual number of packets waiting for transmission, we use the sample
space
Ω = {0, 1, 2, . . . , 70}.
Notice that we include 0 to account for the possibility that there are no packets waiting to
be sent.
Outcomes and events
Elements or points in the sample space Ω are called outcomes. Collections of outcomes
are called events. In other words, an event is a subset of the sample space. Here are some
examples.
If the sample space is the real line, as in modeling a noise voltage, the individual numbers such as 1.5, −8, and π are outcomes. Subsets such as the interval
[0, 5] = {v : 0 ≤ v ≤ 5}
are events. Another event would be {2, 4, 7.13}. Notice that singleton sets, that is sets
consisting of a single point, are also events; e.g., {1.5}, {−8}, {π }. Be sure you understand
the difference between the outcome −8 and the event {−8}, which is the set consisting of

the single outcome −8.
If the sample space is the set of all triples (b1 , b2 , b3 ), where the bi are 0 or 1, then any
particular triple, say (0, 0, 0) or (1, 0, 1) would be an outcome. An event would be a subset
such as the set of all triples with exactly one 1; i.e.,
{(0, 0, 1), (0, 1, 0), (1, 0, 0)}.
An example of a singleton event would be {(1, 0, 1)}.


8

Introduction to probability

In modeling the resistance and capacitance of the RC filter above, we suggested the
sample space
Ω = {(r, c) : 95 ≤ r ≤ 105 and 300 ≤ c ≤ 340},
which was shown in Figure 1.5(b). If a particular circuit has R = 101 ohms and C = 327 µF,
this would correspond to the outcome (101, 327), which is indicated by the dot in Figure 1.6.
If we observed a particular circuit with R ≤ 97 ohms and C ≥ 313 µ F, this would correspond
to the event
{(r, c) : 95 ≤ r ≤ 97 and 313 ≤ c ≤ 340},
which is the shaded region in Figure 1.6.
[Figure 1.6. The dot is the outcome (101, 327). The shaded region is the event {(r, c) : 95 ≤ r ≤ 97 and 313 ≤ c ≤ 340}.]

1.2 Review of set notation
Since sample spaces and events use the language of sets, we recall in this section some
basic definitions, notation, and properties of sets.
Let Ω be a set of points. If ω is a point in Ω, we write ω ∈ Ω. Let A and B be two
collections of points in Ω. If every point in A also belongs to B, we say that A is a subset of
B, and we denote this by writing A ⊂ B. If A ⊂ B and B ⊂ A, then we write A = B; i.e., two
sets are equal if they contain exactly the same points. If A ⊂ B but A ≠ B, we say that A is a
proper subset of B.
Set relationships can be represented graphically in Venn diagrams. In these pictures,
the whole space Ω is represented by a rectangular region, and subsets of Ω are represented
by disks or oval-shaped regions. For example, in Figure 1.7(a), the disk A is completely
contained in the oval-shaped region B, thus depicting the relation A ⊂ B.
Set operations
If A ⊂ Ω, and ω ∈ Ω does not belong to A, we write ω ∉ A. The set of all such ω is called the complement of A in Ω; i.e.,

    A^c := {ω ∈ Ω : ω ∉ A}.
This is illustrated in Figure 1.7(b), in which the shaded region is the complement of the disk
A.
The empty set or null set is denoted by ∅; it contains no points of Ω. Note that for any
A ⊂ Ω, ∅ ⊂ A. Also, Ω^c = ∅.


[Figure 1.7. (a) Venn diagram of A ⊂ B. (b) The complement of the disk A, denoted by A^c, is the shaded part of the diagram.]

The union of two subsets A and B is
A ∪ B := {ω ∈ Ω : ω ∈ A or ω ∈ B}.
Here “or” is inclusive; i.e., if ω ∈ A ∪ B, we permit ω to belong either to A or to B or to
both. This is illustrated in Figure 1.8(a), in which the shaded region is the union of the disk
A and the oval-shaped region B.

[Figure 1.8. (a) The shaded region is A ∪ B. (b) The shaded region is A ∩ B.]

The intersection of two subsets A and B is
A ∩ B := {ω ∈ Ω : ω ∈ A and ω ∈ B};
hence, ω ∈ A∩B if and only if ω belongs to both A and B. This is illustrated in Figure 1.8(b),
in which the shaded area is the intersection of the disk A and the oval-shaped region B. The
reader should also note the following special case. If A ⊂ B (recall Figure 1.7(a)), then
A ∩ B = A. In particular, we always have A ∩ Ω = A and ∅ ∩ B = ∅.
The set difference operation is defined by
B \ A := B ∩ A^c,
i.e., B \ A is the set of ω ∈ B that do not belong to A. In Figure 1.9(a), B \ A is the shaded
part of the oval-shaped region B. Thus, B \ A is found by starting with all the points in B and
then removing those that belong to A.
Two subsets A and B are disjoint or mutually exclusive if A ∩ B = ∅; i.e., there is no
point in Ω that belongs to both A and B. This condition is depicted in Figure 1.9(b).


[Figure 1.9. (a) The shaded region is B \ A. (b) Venn diagram of disjoint sets A and B.]

Example 1.3. Let Ω := {0, 1, 2, 3, 4, 5, 6, 7}, and put
A := {1, 2, 3, 4},

B := {3, 4, 5, 6},

and C := {5, 6}.

Evaluate A ∪ B, A ∩ B, A ∩ C, A^c, and B \ A.

Solution. It is easy to see that A ∪ B = {1, 2, 3, 4, 5, 6}, A ∩ B = {3, 4}, and A ∩ C = ∅. Since A^c = {0, 5, 6, 7},

    B \ A = B ∩ A^c = {5, 6} = C.
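
These evaluations can also be checked with MATLAB's built-in set functions; this is only a sketch, and representing the sets as row vectors is our own choice, not part of the example.

    % Verify Example 1.3 with union, intersect, and setdiff.
    Omega = 0:7;
    A = [1 2 3 4];  B = [3 4 5 6];  C = [5 6];
    union(A, B)               % 1 2 3 4 5 6
    intersect(A, B)           % 3 4
    intersect(A, C)           % empty, so A and C are disjoint
    Ac = setdiff(Omega, A)    % complement of A in Omega: 0 5 6 7
    setdiff(B, A)             % B \ A = 5 6, which equals C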

Set identities
Set operations are easily seen to obey the following relations. Some of these relations
are analogous to the familiar ones that apply to ordinary numbers if we think of union as
the set analog of addition and intersection as the set analog of multiplication. Let A, B, and
C be subsets of Ω. The commutative laws are
    A ∪ B = B ∪ A   and   A ∩ B = B ∩ A.                                  (1.3)

The associative laws are

    A ∪ (B ∪ C) = (A ∪ B) ∪ C   and   A ∩ (B ∩ C) = (A ∩ B) ∩ C.          (1.4)

The distributive laws are

    A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)                                        (1.5)

and

    A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).                                       (1.6)

De Morgan's laws are

    (A ∩ B)^c = A^c ∪ B^c   and   (A ∪ B)^c = A^c ∩ B^c.                   (1.7)
Formulas (1.3)–(1.5) are exactly analogous to their numerical counterparts. Formulas (1.6)
and (1.7) do not have numerical counterparts. We also recall that A ∩ Ω = A and ∅ ∩ B = ∅;

hence, we can think of Ω as the analog of the number one and ∅ as the analog of the number
zero. Another analog is the formula A ∪ ∅ = A.
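
As a numerical sanity check (not a proof), the identities (1.5)–(1.7) can be verified for the particular sets of Example 1.3:

    % Check the distributive laws and one De Morgan law for Example 1.3's sets.
    Omega = 0:7;  A = [1 2 3 4];  B = [3 4 5 6];  C = [5 6];
    compl = @(S) setdiff(Omega, S);                                              % complement in Omega
    isequal(intersect(A, union(B, C)), union(intersect(A, B), intersect(A, C)))  % (1.5)
    isequal(union(A, intersect(B, C)), intersect(union(A, B), union(A, C)))      % (1.6)
    isequal(compl(intersect(A, B)), union(compl(A), compl(B)))                   % first law in (1.7)
    % Each line returns 1 (true).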



We next consider infinite collections of subsets of Ω. It is important to understand how
to work with unions and intersections of infinitely many subsets. Infinite unions allow us
to formulate questions about some event ever happening if we wait long enough. Infinite
intersections allow us to formulate questions about some event never happening no matter
how long we wait.
Suppose An ⊂ Ω, n = 1, 2, . . . . Then

    ⋃_{n=1}^∞ An := {ω ∈ Ω : ω ∈ An for some 1 ≤ n < ∞}.

In other words, ω ∈ ⋃_{n=1}^∞ An if and only if for at least one integer n satisfying 1 ≤ n < ∞, ω ∈ An. This definition admits the possibility that ω ∈ An for more than one value of n. Next, we define

    ⋂_{n=1}^∞ An := {ω ∈ Ω : ω ∈ An for all 1 ≤ n < ∞}.

In other words, ω ∈ ⋂_{n=1}^∞ An if and only if ω ∈ An for every positive integer n.
Many examples of infinite unions and intersections can be given using intervals of real
numbers such as (a, b), (a, b], [a, b), and [a, b]. (This notation is reviewed in Problem 5.)
Example 1.4. Let Ω denote the real numbers, Ω = IR := (−∞, ∞). Then the following
infinite intersections and unions can be simplified. Consider the intersection

    ⋂_{n=1}^∞ (−∞, 1/n) = {ω : ω < 1/n for all 1 ≤ n < ∞}.

Now, if ω < 1/n for all 1 ≤ n < ∞, then ω cannot be positive; i.e., we must have ω ≤ 0. Conversely, if ω ≤ 0, then for all 1 ≤ n < ∞, ω ≤ 0 < 1/n. It follows that

    ⋂_{n=1}^∞ (−∞, 1/n) = (−∞, 0].

Consider the infinite union,

    ⋃_{n=1}^∞ (−∞, −1/n] = {ω : ω ≤ −1/n for some 1 ≤ n < ∞}.

Now, if ω ≤ −1/n for some n with 1 ≤ n < ∞, then we must have ω < 0. Conversely, if ω < 0, then for large enough n, ω ≤ −1/n. Thus,

    ⋃_{n=1}^∞ (−∞, −1/n] = (−∞, 0).

In a similar way, one can show that

    ⋂_{n=1}^∞ [0, 1/n) = {0},

as well as

    ⋃_{n=1}^∞ (−∞, n] = (−∞, ∞)   and   ⋂_{n=1}^∞ (−∞, −n] = ∅.


