
SIMULATION AND THE MONTE CARLO METHOD

Second Edition

Reuven Y. Rubinstein
Technion

Dirk P. Kroese
University of Queensland

WILEY-INTERSCIENCE
A John Wiley & Sons, Inc., Publication
THE WILEY BICENTENNIAL: KNOWLEDGE FOR GENERATIONS

Each generation has its unique needs and aspirations. When Charles Wiley first opened his small printing shop in lower Manhattan in 1807, it was a generation of boundless potential searching for an identity. And we were there, helping to define a new American literary tradition. Over half a century later, in the midst of the Second Industrial Revolution, it was a generation focused on building the future. Once again, we were there, supplying the critical scientific, technical, and engineering knowledge that helped frame the world. Throughout the 20th Century, and into the new millennium, nations began to reach out beyond their own borders and a new international community was born. Wiley was there, expanding its operations around the world to enable a global exchange of ideas, opinions, and know-how.

For 200 years, Wiley has been an integral part of each generation's journey, enabling the flow of information and understanding necessary to meet their needs and fulfill their aspirations. Today, bold new technologies are changing the way we live and learn. Wiley will be there, providing you the must-have knowledge you need to imagine new worlds, new possibilities, and new opportunities.

Generations come and go, but you can always count on Wiley to provide you the knowledge you need, when and where you need it!

WILLIAM J. PESCE
PRESIDENT AND CHIEF EXECUTIVE OFFICER

PETER BOOTH WILEY
CHAIRMAN OF THE BOARD
Copyright © 2008 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

Wiley Bicentennial Logo: Richard J. Pacifico

Library of Congress Cataloging-in-Publication Data:

Rubinstein, Reuven Y.
  Simulation and the Monte Carlo method. -- 2nd ed. / Reuven Y. Rubinstein, Dirk P. Kroese.
    p. cm. -- (Wiley series in probability and statistics)
  Includes index.
  ISBN 978-0-470-17794-5 (cloth : acid-free paper)
  1. Monte Carlo method.  2. Digital computer simulation.  I. Kroese, Dirk P.  II. Title.
  QA298.R8 2008
  518'.282--dc22        2007029068

Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1
To my friends and colleagues Søren Asmussen and Peter Glynn
-- RYR

In memory of my parents Albert and Anna Kroese
-- DPK
CONTENTS

Preface
Acknowledgments

1  Preliminaries
   1.1   Random Experiments
   1.2   Conditional Probability and Independence
   1.3   Random Variables and Probability Distributions
   1.4   Some Important Distributions
   1.5   Expectation
   1.6   Joint Distributions
   1.7   Functions of Random Variables
         1.7.1   Linear Transformations
         1.7.2   General Transformations
   1.8   Transforms
   1.9   Jointly Normal Random Variables
   1.10  Limit Theorems
   1.11  Poisson Processes
   1.12  Markov Processes
         1.12.1  Markov Chains
         1.12.2  Classification of States
         1.12.3  Limiting Behavior
         1.12.4  Reversibility
         1.12.5  Markov Jump Processes
   1.13  Efficiency of Estimators
         1.13.1  Complexity
   1.14  Information
         1.14.1  Shannon Entropy
         1.14.2  Kullback-Leibler Cross-Entropy
         1.14.3  The Maximum Likelihood Estimator and the Score Function
         1.14.4  Fisher Information
   1.15  Convex Optimization and Duality
         1.15.1  Lagrangian Method
         1.15.2  Duality
   Problems
   References

2  Random Number, Random Variable, and Stochastic Process Generation
   2.1  Introduction
   2.2  Random Number Generation
   2.3  Random Variable Generation
        2.3.1  Inverse-Transform Method
        2.3.2  Alias Method
        2.3.3  Composition Method
        2.3.4  Acceptance-Rejection Method
   2.4  Generating From Commonly Used Distributions
        2.4.1  Generating Continuous Random Variables
        2.4.2  Generating Discrete Random Variables
   2.5  Random Vector Generation
        2.5.1  Vector Acceptance-Rejection Method
        2.5.2  Generating Variables from a Multinormal Distribution
        2.5.3  Generating Uniform Random Vectors Over a Simplex
        2.5.4  Generating Random Vectors Uniformly Distributed Over a Unit Hyperball and Hypersphere
        2.5.5  Generating Random Vectors Uniformly Distributed Over a Hyperellipsoid
   2.6  Generating Poisson Processes
   2.7  Generating Markov Chains and Markov Jump Processes
        2.7.1  Random Walk on a Graph
        2.7.2  Generating Markov Jump Processes
   2.8  Generating Random Permutations
   Problems
   References

3  Simulation of Discrete-Event Systems
   3.1  Simulation Models
        3.1.1  Classification of Simulation Models
   3.2  Simulation Clock and Event List for DEDS
   3.3  Discrete-Event Simulation
        3.3.1  Tandem Queue
        3.3.2  Repairman Problem
   Problems
   References

4  Statistical Analysis of Discrete-Event Systems
   4.1  Introduction
   4.2  Static Simulation Models
        4.2.1  Confidence Interval
   4.3  Dynamic Simulation Models
        4.3.1  Finite-Horizon Simulation
        4.3.2  Steady-State Simulation
   4.4  The Bootstrap Method
   Problems
   References

5  Controlling the Variance
   5.1  Introduction
   5.2  Common and Antithetic Random Variables
   5.3  Control Variables
   5.4  Conditional Monte Carlo
        5.4.1  Variance Reduction for Reliability Models
   5.5  Stratified Sampling
   5.6  Importance Sampling
        5.6.1  Weighted Samples
        5.6.2  The Variance Minimization Method
        5.6.3  The Cross-Entropy Method
   5.7  Sequential Importance Sampling
        5.7.1  Nonlinear Filtering for Hidden Markov Models
   5.8  The Transform Likelihood Ratio Method
   5.9  Preventing the Degeneracy of Importance Sampling
        5.9.1  The Two-Stage Screening Algorithm
        5.9.2  Case Study
   Problems
   References

6  Markov Chain Monte Carlo
   6.1  Introduction
   6.2  The Metropolis-Hastings Algorithm
   6.3  The Hit-and-Run Sampler
   6.4  The Gibbs Sampler
   6.5  Ising and Potts Models
        6.5.1  Ising Model
        6.5.2  Potts Model
   6.6  Bayesian Statistics
   6.7  Other Markov Samplers *
        6.7.1  Slice Sampler
        6.7.2  Reversible Jump Sampler
   6.8  Simulated Annealing
   6.9  Perfect Sampling
   Problems
   References

7  Sensitivity Analysis and Monte Carlo Optimization
   7.1  Introduction
   7.2  The Score Function Method for Sensitivity Analysis of DESS
   7.3  Simulation-Based Optimization of DESS
        7.3.1  Stochastic Approximation
        7.3.2  The Stochastic Counterpart Method
   7.4  Sensitivity Analysis of DEDS
   Problems
   References

8  The Cross-Entropy Method
   8.1  Introduction
   8.2  Estimation of Rare-Event Probabilities
        8.2.1  The Root-Finding Problem
        8.2.2  The Screening Method for Rare Events
   8.3  The CE Method for Optimization
   8.4  The Max-cut Problem
   8.5  The Partition Problem
        8.5.1  Empirical Computational Complexity
   8.6  The Traveling Salesman Problem
        8.6.1  Incomplete Graphs
        8.6.2  Node Placement
        8.6.3  Case Studies
   8.7  Continuous Optimization
   8.8  Noisy Optimization
   Problems
   References

9  Counting via Monte Carlo
   9.1  Counting Problems
   9.2  Satisfiability Problem
        9.2.1  Random K-SAT (K-RSAT)
   9.3  The Rare-Event Framework for Counting
        9.3.1  Rare Events for the Satisfiability Problem
   9.4  Other Randomized Algorithms for Counting
        9.4.1  X* is a Union of Some Sets
        9.4.2  Complexity of Randomized Algorithms: FPRAS and FPAUS
        9.4.3  FPRAS for SATs in CNF
   9.5  MinxEnt and Parametric MinxEnt
        9.5.1  The MinxEnt Method
        9.5.2  Rare-Event Probability Estimation Using PME
   9.6  PME for Combinatorial Optimization Problems and Decision Making
   9.7  Numerical Results
   Problems
   References

Appendix
   A.1  Cholesky Square Root Method
   A.2  Exact Sampling from a Conditional Bernoulli Distribution
   A.3  Exponential Families
   A.4  Sensitivity Analysis
        A.4.1  Convexity Results
        A.4.2  Monotonicity Results
   A.5  A Simple CE Algorithm for Optimizing the Peaks Function
   A.6  Discrete-time Kalman Filter
   A.7  Bernoulli Disruption Problem
   A.8  Complexity of Stochastic Programming Problems
   Problems
   References

Abbreviations and Acronyms
List of Symbols
Index
PREFACE
Since the publication in 1981 of Simulation and the Monte Carlo Method, dramatic changes
dramatic changes
have taken place in the entire field of Monte Carlo simulation. This long-awaited second
edition gives a fully updated and comprehensive account of the major topics in Monte Carlo
simulation.
The book is based on an undergraduate course on Monte Carlo methods given at the
Israel Institute of Technology (Technion) and the University of Queensland for the past five

years. It is aimed at a broad audience of students in engineering, physical and life sciences, statistics, computer science and mathematics, as well as anyone interested in using Monte Carlo simulation in his or her study or work. Our aim is to provide an accessible introduction to modern Monte Carlo methods, focusing on the main concepts while providing a sound
foundation for problem solving. For this reason, most ideas are introduced and explained
via concrete examples, algorithms, and experiments.
Although we assume that the reader has some basic mathematical knowledge, such as
gained from an elementary course in probability and statistics, we nevertheless review the
basic concepts of probability, Markov processes, and convex optimization in Chapter 1.
In a typical stochastic simulation, randomness is introduced into simulation models via
independent uniformly distributed random variables. These random variables are then used
as building blocks to simulate more general stochastic systems. Chapter 2 deals with the
generation of such random numbers, random variables, and stochastic processes.
Many real-world complex systems can be modeled as discrete-event systems. Examples
of discrete-event systems include traffic systems, flexible manufacturing systems, computer-
communications systems, inventory systems, production lines, coherent lifetime systems,
PERT networks, and flow networks. The behavior of such systems is identified via a
xiii
xiv
PREFACE
sequence of discrete events, which causes the system to change from one state to another.
We discuss how to model such systems on a computer in Chapter 3.
Chapter 4 treats the statistical analysis of the output data from static and dynamic models. The main difference is that the former do not evolve in time, while the latter do. For the latter, we distinguish between finite-horizon and steady-state simulation. Two popular methods for estimating steady-state performance measures - the batch means and regenerative methods - are discussed as well.

Chapter 5 deals with variance reduction techniques in Monte Carlo simulation, such
as antithetic and common random numbers, control random variables, conditional Monte
Carlo, stratified sampling, and importance sampling. The last is the most widely used vari-
ance reduction technique. Using importance sampling, one can often achieve substantial
(sometimes dramatic) variance reduction, in particular when estimating rare-event proba-
bilities. While dealing with importance sampling we present two alternative approaches,
called the variance minimization and cross-entropy methods. In addition, this chapter con-
tains two new importance sampling-based methods, called the transform likelihood ratio
method and the screening method for variance reduction. The former presents a simple,
convenient, and unifying way of constructing efficient IS estimators, while the latter ensures
lowering of the dimensionality of the importance sampling density. This is accomplished
by identifying (screening out) the most important (bottleneck) parameters to be used in the
importance sampling distribution. As a result, the accuracy of the importance sampling estimator increases substantially.
We present a case study for a high-dimensional complex electric power system and show
that without screening the importance sampling estimator, containing hundreds of likelihood
ratio terms, would be quite unstable and thus would fail to work. In contrast, when using
screening, one obtains an accurate low-dimensional importance sampling estimator.
Chapter 6 gives a concise treatment of the generic Markov chain Monte Carlo (MCMC)
method for approximately generating samples from an arbitrary distribution. We discuss the
classic Metropolis-Hastings algorithm and the Gibbs sampler. In the former, one simulates
a Markov chain such that its stationary distribution coincides with the target distribution, while in the latter, the underlying Markov chain is constructed on the basis of a sequence of
conditional distributions. We also deal with applications of MCMC in Bayesian statistics
and explain how MCMC is used to sample from the Boltzmann distribution for the Ising
and Potts models, which are extensively used in statistical mechanics. Moreover, we show
how MCMC is used in the simulated annealing method to find the global minimum of a
multiextremal function. Finally, we show that both the Metropolis-Hastings and Gibbs
samplers can be viewed as special cases of a general MCMC algorithm and then present
two more modifications, namely, the slice and reversible jump samplers.
Chapter 7 focuses on sensitivity analysis and Monte Carlo optimization of simulated systems. Because of their complexity, the performance evaluation of discrete-event systems is usually studied by simulation, and it is often associated with the estimation of the
performance function with respect to some controllable parameters. Sensitivity analysis
is concerned with evaluating sensitivities (gradients, Hessians, etc.) of the performance
function with respect to system parameters. It provides guidance to operational decisions
and plays an important role in selecting system parameters that optimize the performance
measures. Monte Carlo optimization deals with solving stochastic programs, that is, opti-

mization problems where the objective function and some of the constraints are unknown
and need to be obtained via simulation. We deal with sensitivity analysis and optimization
of both static and dynamic models. We introduce the celebrated score function method for sensitivity analysis, and two alternative methods for Monte Carlo optimization, the so-called stochastic approximation and stochastic counterpart methods. In particular, in the
latter method, we show how, using a single simulation experiment, one can approximate
quite accurately the true unknown optimal solution of the original deterministic program.
Chapter 8 deals with the cross-entropy (CE) method, which was introduced by the first author in 1997 as an adaptive algorithm for rare-event estimation using a CE minimization
technique. It was soon realized that the underlying ideas had a much wider range of application than just in rare event simulation; they could be readily adapted to tackle quite general combinatorial and multiextremal optimization problems, including many problems associated with learning algorithms and neural computation. We provide a gradual introduction to the CE method and show its elegance and versatility. In particular, we present a general CE algorithm for the estimation of rare-event probabilities and then slightly modify it for solving combinatorial optimization problems. We discuss applications of the CE method to several combinatorial optimization problems, such as the max-cut problem and the traveling salesman problem, and provide supportive numerical results on its effectiveness. Due to its versatility, tractability, and simplicity, the CE method has great potential
for a diverse range of new applications, for example in the fields of computational biology, DNA sequence alignment, graph theory, and scheduling. During the past five to six years at least 100 papers have been written on the theory and applications of CE. For more details, see the Web site www.cemethod.org; the book by R. Y. Rubinstein and D. P. Kroese, The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning (Springer, 2004); or Wikipedia under the name cross-entropy method.
Finally, Chapter 9 deals with difficult counting problems, which occur frequently in many important problems in science, engineering, and mathematics. We show how these problems can be viewed as particular instances of estimation problems and thus can be solved efficiently via Monte Carlo techniques, such as importance sampling and MCMC. We also show how to resolve the "degeneracy" in the likelihood ratio, which typically occurs in high-dimensional counting problems, by introducing a particular modification of the classic MinxEnt method called parametric MinxEnt.

A wide range of problems is provided at the end of each chapter. More difficult sections and problems are marked with an asterisk (*).
Additional material, including a brief introduction to exponential families, a discussion on the computational complexity of stochastic programming problems, and sample Matlab programs, is given in the Appendix. This book is accompanied by a detailed solutions manual.

REUVEN RUBINSTEIN AND DIRK KROESE

Haifa and Brisbane
July, 2007

ACKNOWLEDGMENTS
We thank all who contributed to this book.
Robert Smith and Zelda Zabinski read and
provided useful suggestions on Chapter 6.
Alex Shapiro kindly provided a detailed account
of the complexity of stochastic programming problems (Section A.8).
We are grateful
to the many undergraduate and graduate students at the Technion and the University of
Queensland who helped make this book possible and whose valuable ideas and experi-
ments were extremely encouraging and motivating: Yohai Gat, Uri Dubin, Rostislav Man,
Leonid Margolin, Levon Kikinian, Ido Leichter, Andrey Dolgin, Dmitry Lifshitz, Sho Nariai, Ben Roberts, Asrul Sani, Gareth Evans, Grethe Casson, Leesa Wockner, Nick Miller,
and Chung Chan. We are especially indebted to Thomas Taimre and Zdravko Botev, who
conscientiously worked through the whole manuscript, tried and solved all the exercises
and provided exceptional feedback. This book was supported by the Australian Research
Council under Grants DP056631 and DP055895.
RYR, DPK
xvii
CHAPTER 1

PRELIMINARIES

1.1 RANDOM EXPERIMENTS
The basic notion in probability theory is that of a random experiment: an experiment whose outcome cannot be determined in advance. The most fundamental example is the experiment where a fair coin is tossed a number of times. For simplicity suppose that the coin is tossed three times. The sample space, denoted Ω, is the set of all possible outcomes of the experiment. In this case Ω has eight possible outcomes:

    Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} ,

where, for example, HTH means that the first toss is heads, the second tails, and the third heads.

Subsets of the sample space are called events. For example, the event A that the third toss is heads is

    A = {HHH, HTH, THH, TTH} .

We say that event A occurs if the outcome of the experiment is one of the elements in A.
Since events are sets, we can apply the usual set operations to them. For example, the event A ∪ B, called the union of A and B, is the event that A or B or both occur, and the event A ∩ B, called the intersection of A and B, is the event that A and B both occur. Similar notation holds for unions and intersections of more than two events. The event Aᶜ, called the complement of A, is the event that A does not occur. Two events A and B that have no outcomes in common, that is, their intersection is empty, are called disjoint events. The main step is to specify the probability of each event.
Definition 1.1.1 (Probability)  A probability P is a rule that assigns a number 0 ≤ P(A) ≤ 1 to each event A, such that P(Ω) = 1, and such that for any sequence A₁, A₂, ... of disjoint events

    P( ⋃ᵢ Aᵢ ) = Σᵢ P(Aᵢ) .        (1.1)

Equation (1.1) is referred to as the sum rule of probability. It states that if an event can happen in a number of different ways, but not simultaneously, the probability of that event is simply the sum of the probabilities of the comprising events.

For the fair coin toss experiment the probability of any event is easily given. Namely, because the coin is fair, each of the eight possible outcomes is equally likely, so that P({HHH}) = ··· = P({TTT}) = 1/8. Since any event A is the union of the "elementary" events {HHH}, ..., {TTT}, the sum rule implies that

    P(A) = |A| / |Ω| ,        (1.2)

where |A| denotes the number of outcomes in A and |Ω| = 8. More generally, if a random experiment has finitely many and equally likely outcomes, the probability is always of the form (1.2). In that case the calculation of probabilities reduces to counting.
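The counting rule (1.2) is easy to verify by brute-force enumeration of the three-toss sample space. A minimal sketch in Python (our illustration; the book's own sample programs, in the Appendix, are in Matlab):

```python
from itertools import product

# Sample space for three coin tosses: all 2^3 = 8 outcomes.
omega = ["".join(t) for t in product("HT", repeat=3)]

# Event A: the third toss is heads.
A = [w for w in omega if w[2] == "H"]

# For equally likely outcomes, P(A) = |A| / |Omega|  (Equation (1.2)).
p_A = len(A) / len(omega)
print(len(omega), len(A), p_A)  # 8 4 0.5
```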
1.2 CONDITIONAL PROBABILITY AND INDEPENDENCE

How do probabilities change when we know that some event B ⊂ Ω has occurred? Given that the outcome lies in B, the event A will occur if and only if A ∩ B occurs, and the relative chance of A occurring is therefore P(A ∩ B)/P(B). This leads to the definition of the conditional probability of A given B:

    P(A | B) = P(A ∩ B) / P(B) .        (1.3)

For example, suppose we toss a fair coin three times. Let B be the event that the total number of heads is two. The conditional probability of the event A that the first toss is heads, given that B occurs, is (2/8)/(3/8) = 2/3.
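This conditional probability can be checked by enumeration as well; a short Python sketch (our illustration, not from the book):

```python
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]

B = [w for w in omega if w.count("H") == 2]  # exactly two heads
A = [w for w in omega if w[0] == "H"]        # first toss heads
A_and_B = [w for w in omega if w in A and w in B]

# P(A | B) = P(A ∩ B) / P(B); with equally likely outcomes, |A ∩ B| / |B|.
p_A_given_B = len(A_and_B) / len(B)
print(p_A_given_B)  # 2/3
```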

Rewriting (1.3) and interchanging the roles of A and B gives the relation P(A ∩ B) = P(A) P(B | A). This can be generalized easily to the product rule of probability, which states that for any sequence of events A₁, A₂, ..., Aₙ,

    P(A₁ ∩ A₂ ∩ ··· ∩ Aₙ) = P(A₁) P(A₂ | A₁) P(A₃ | A₁A₂) ··· P(Aₙ | A₁ ··· Aₙ₋₁) ,        (1.4)

using the abbreviation A₁A₂ ··· Aₖ = A₁ ∩ A₂ ∩ ··· ∩ Aₖ.
Suppose B₁, B₂, ..., Bₙ is a partition of Ω. That is, B₁, B₂, ..., Bₙ are disjoint and their union is Ω. Then, by the sum rule, P(A) = Σᵢ₌₁ⁿ P(A ∩ Bᵢ), and hence, by the definition of conditional probability, we have the law of total probability:

    P(A) = Σᵢ₌₁ⁿ P(A | Bᵢ) P(Bᵢ) .

Combining this with the definition of conditional probability gives Bayes' rule:

    P(Bⱼ | A) = P(A | Bⱼ) P(Bⱼ) / ( Σᵢ₌₁ⁿ P(A | Bᵢ) P(Bᵢ) ) .
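Both the law of total probability and Bayes' rule can be verified on the three-toss experiment by partitioning Ω according to the total number of heads; a Python sketch (our illustration, with our own variable names):

```python
from itertools import product

# Fair-coin model for three tosses: each outcome has probability 1/8.
omega = ["".join(t) for t in product("HT", repeat=3)]

def prob(event):
    return len(event) / len(omega)

A = {w for w in omega if w[0] == "H"}                               # first toss heads
B = {i: {w for w in omega if w.count("H") == i} for i in range(4)}  # partition by #heads

# Law of total probability: P(A) = sum_i P(A | B_i) P(B_i).
p_A = sum((prob(A & B[i]) / prob(B[i])) * prob(B[i]) for i in range(4))

# Bayes' rule: P(B_2 | A) = P(A | B_2) P(B_2) / P(A).
p_B2_given_A = (prob(A & B[2]) / prob(B[2])) * prob(B[2]) / p_A
print(p_A, p_B2_given_A)
```

Both values come out to 1/2: the first toss is heads half the time, and given a first-toss head, exactly two heads in total occur in half of the cases.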
Independence is of crucial importance in probability and statistics. Loosely speaking, it models the lack of information between events. Two events A and B are said to be independent if the knowledge that B has occurred does not change the probability that A occurs. That is,

    A, B independent  ⟺  P(A | B) = P(A) .

Since P(A | B) = P(A ∩ B)/P(B), an alternative definition of independence is

    A, B independent  ⟺  P(A ∩ B) = P(A) P(B) .

This definition covers the case where B = ∅ (empty set). We can extend this definition to arbitrarily many events.
Definition 1.2.1 (Independence)  The events A₁, A₂, ... are said to be independent if for any k and any choice of distinct indices i₁, ..., iₖ,

    P(Aᵢ₁ ∩ Aᵢ₂ ∩ ··· ∩ Aᵢₖ) = P(Aᵢ₁) P(Aᵢ₂) ··· P(Aᵢₖ) .

Remark 1.2.1  In most cases, independence of events is a model assumption. That is, we assume that there exists a P such that certain events are independent.
EXAMPLE 1.1

We toss a biased coin n times. Let p be the probability of heads (for a fair coin p = 1/2). Let Aᵢ denote the event that the i-th toss yields heads, i = 1, ..., n. Then P should be such that the events A₁, ..., Aₙ are independent, and P(Aᵢ) = p for all i. These two rules completely specify P. For example, the probability that the first k throws are heads and the last n - k are tails is

    P(A₁ ··· Aₖ Aᶜₖ₊₁ ··· Aᶜₙ) = P(A₁) ··· P(Aₖ) P(Aᶜₖ₊₁) ··· P(Aᶜₙ) = pᵏ (1 - p)ⁿ⁻ᵏ .
1.3 RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

Specifying a model for a random experiment via a complete description of Ω and P may not always be convenient or necessary. In practice we are only interested in various observations (that is, numerical measurements) in the experiment. We incorporate these into our modeling process via the introduction of random variables, usually denoted by capital letters from the last part of the alphabet, e.g., X, X₁, X₂, ..., Y, Z.
EXAMPLE 1.2

We toss a biased coin n times, with p the probability of heads. Suppose we are interested only in the number of heads, say X. Note that X can take any of the values in {0, 1, ..., n}. The probability distribution of X is given by the binomial formula

    P(X = k) = (n choose k) pᵏ (1 - p)ⁿ⁻ᵏ ,   k = 0, 1, ..., n .

Namely, by Example 1.1, each elementary event {HTH ··· T} with exactly k heads and n - k tails has probability pᵏ (1 - p)ⁿ⁻ᵏ, and there are (n choose k) such events.
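The binomial formula can be verified by summing the product measure over all outcomes with exactly k heads; a Python sketch (our illustration):

```python
from itertools import product
from math import comb, isclose, prod

# X = number of heads in n independent tosses with P(heads) = p.
n, p = 5, 0.3
P = {"".join(w): prod(p if c == "H" else 1 - p for c in w)
     for w in product("HT", repeat=n)}

# P(X = k) by direct summation over all outcomes with exactly k heads,
# checked against the formula C(n, k) p^k (1 - p)^(n - k).
pmf = {k: sum(q for w, q in P.items() if w.count("H") == k) for k in range(n + 1)}
ok = all(isclose(pmf[k], comb(n, k) * p**k * (1 - p)**(n - k)) for k in range(n + 1))
print(ok)  # True
```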
The probability distribution of a general random variable X - identifying such probabilities as P(X = x), P(a ≤ X ≤ b), and so on - is completely specified by the cumulative distribution function (cdf), defined by

    F(x) = P(X ≤ x) ,   x ∈ ℝ .
A random variable X is said to have a discrete distribution if, for some finite or countable set of values x₁, x₂, ..., we have P(X = xᵢ) > 0, i = 1, 2, ... and Σᵢ P(X = xᵢ) = 1. The function f(x) = P(X = x) is called the probability mass function (pmf) of X - but see Remark 1.3.1.
EXAMPLE 1.3

Toss two fair dice and let M be the largest face value showing. The pmf of M is given by

    m     |   1      2      3      4      5      6
    f(m)  |  1/36   3/36   5/36   7/36   9/36   11/36

For example, to get M = 3, either (1,3), (2,3), (3,3), (3,2), or (3,1) has to be thrown, each of which happens with probability 1/36.
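The pmf of M can be reproduced by enumerating all 36 equally likely throws; a Python sketch (our illustration):

```python
from fractions import Fraction
from itertools import product

# pmf of M = largest face value of two fair dice, from all 36 equally likely throws.
pmf = {m: Fraction(0) for m in range(1, 7)}
for d1, d2 in product(range(1, 7), repeat=2):
    pmf[max(d1, d2)] += Fraction(1, 36)

# Matches f(m) = (2m - 1)/36; e.g. f(3) = 5/36 from the five throws listed above.
print(pmf[3])  # 5/36
```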
A random variable X is said to have a continuous distribution if there exists a positive function f with total integral 1, such that for all a, b

    P(a ≤ X ≤ b) = ∫ₐᵇ f(u) du .

The function f is called the probability density function (pdf) of X. Note that in the continuous case the cdf is given by

    F(x) = P(X ≤ x) = ∫₋∞ˣ f(u) du ,

and f is the derivative of F. We can interpret f(x) as the probability "density" at X = x, in the sense that

    P(x < X ≤ x + h) = ∫ₓˣ⁺ʰ f(u) du ≈ h f(x) .
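The density interpretation P(x < X ≤ x + h) ≈ h f(x) is easy to check numerically. The sketch below (our illustration) uses the exponential pdf f(x) = λ exp(-λx), whose cdf F(x) = 1 - exp(-λx) gives the left-hand side exactly:

```python
from math import exp, isclose

# Exponential density with rate lam: pdf f and cdf F.
lam = 2.0

def f(x):
    return lam * exp(-lam * x)

def F(x):
    return 1.0 - exp(-lam * x)

x, h = 0.5, 1e-4
exact = F(x + h) - F(x)  # P(x < X <= x + h)
approx = h * f(x)        # density approximation
print(exact, approx)
```

For small h the two numbers agree to roughly a relative error of order λh, illustrating why f(x) can be read as a probability per unit length.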
Remark 1.3.1 (Probability Density)  Note that we have deliberately used the same symbol, f, for both pmf and pdf. This is because the pmf and pdf play very similar roles and can, in more advanced probability theory, both be viewed as particular instances of the general notion of probability density. To stress this viewpoint, we will call f in both the discrete and continuous case the pdf or (probability) density (function).
