probability theory with economic applications

Chapter B
Probability Theory via the Lebesgue Integral
The primary objective of this chapter is to introduce the basic probability model from the measure theoretic point of view. Consequently, we first start with discussing the idea behind the formal notion of a probability space, and provide a fairly introductory discussion of finite measure theory. A good part of this discussion is likely to be new for the economics student, so our pace is quite leisurely. In particular, we discuss algebras and σ-algebras in detail, pay due attention to Borel σ-algebras, and prove several elementary properties of probability measures. Moreover, we outline the constructions of some useful probability spaces, including those that are induced by distribution functions. As usual, these constructions are achieved by invoking the fundamental extension theorem of Carathéodory. We omit the proof of the existence part of this theorem, but prove its uniqueness part as an application of the Sierpinski Class Lemma. We then introduce the notion of a random variable, and discuss the notion of Borel measurability at some length.
The high point of the chapter is the introduction of the Lebesgue integration theory within the context of finite measure spaces. In fact, we almost exclusively work with probability measures, so the Lebesgue integral for the present exposition is none other than the so-called expectation functional. Our treatment is again leisurely. In particular, we introduce the fundamental convergence theorems for the Lebesgue integral by means of a step-by-step approach. For instance, the Monotone Convergence Theorem is given in four different formulations. First, we prove it for a sequence of nonnegative random variables the pointwise limit of which is real-valued. Then we drop the nonnegativity assumption from the statement of the theorem, and then reintroduce it but this time work with sequences that converge almost surely to an extended real-valued function. Our fourth formulation states the result in full generality. We also study other important properties of the expectation functional, such as its linearity, the change of variables formula, and Jensen’s Inequality. The chapter concludes with a brief introduction to the normed linear space of integrable random variables, and other related spaces.


There is, of course, no shortage of truly excellent textbooks on probability theory. In particular, the classic treatments of Billingsley (1986), Durrett (1991), Shiryaev (1996) and Chung (2001) have a scope far more comprehensive than ours. The proofs that we omit here and in the following chapters can be recovered from any one of these books. A more recent reference, which the present author finds most commendable, is Fristedt and Gray (1997).
1 Event Spaces
The most fundamental notion of probability theory is that of a probability measure. Roughly speaking, a probability measure tells us the likelihood of observing any conceivable event in an experiment the outcome of which is uncertain. To formally introduce this concept, however, we need to model the elusive term “conceivable event” in this description; hence the next subsection.
1.1 σ-Algebras
Definition. Given any nonempty set $X$, let $\mathcal{A}$ and $\Sigma$ be nonempty subsets of $2^X$. The class $\mathcal{A}$ is called an algebra on $X$ if
(i) $X \setminus A \in \mathcal{A}$ for all $A \in \mathcal{A}$; and
(ii) $A \cup B \in \mathcal{A}$ for all $A, B \in \mathcal{A}$.
The collection $\Sigma$ is called a σ-algebra on $X$ if it satisfies (i) and
(iii) $\bigcup_{i=1}^{\infty} A_i \in \Sigma$ whenever $A_i \in \Sigma$ for each $i = 1, 2, \ldots$
Any element of $\Sigma$ is called a Σ-measurable set in $X$. If $\Sigma$ is a σ-algebra on $X$, we refer to the pair $(X, \Sigma)$ as a measurable space.
In words, an algebra on $X$ is a nonempty collection of subsets of $X$ that is closed under complementation and taking pairwise (and thus finite) unions. It is readily verified that both $\emptyset$ and $X$ belong to any algebra $\mathcal{A}$ on $X$, and that an algebra is closed under taking pairwise (and thus finite) intersections. (To prove the first claim, observe that, since $\mathcal{A}$ is nonempty, there exists an $A \subseteq X$ in $\mathcal{A}$, and hence $X \setminus A$ belongs to $\mathcal{A}$. Thus $X = A \cup (X \setminus A) \in \mathcal{A}$, and therefore $\emptyset = X \setminus X \in \mathcal{A}$.) Moreover, a collection $\Sigma$ of subsets of $X$ is a σ-algebra if it is an algebra and is closed under taking countable unions. By the de Morgan Law, this also implies that $\Sigma$ is closed under taking countable (finite or infinite) intersections: If $\mathcal{C}$ is a nonempty countable subset of $\Sigma$, then $\bigcap \mathcal{C} \in \Sigma$. It is useful to note that there is no difference between an algebra and a σ-algebra when the ground set $X$ under consideration is finite.¹
¹ These are relatively easy claims, but it is probably a good idea to warm up by proving them. In particular, how do you know that a σ-algebra is actually an algebra?
Before considering some examples, let us provide a quick interpretation of the formal model at hand. Given a nonempty set $X$ and a σ-algebra $\Sigma$ on $X$, we think of $X$ as the set of all possible outcomes that may result in an experiment, the so-called sample space, and view any member of $\Sigma$ (and only such a subset of $X$) as an “event” that may take place in the experiment. To illustrate, consider the experiment of rolling an ordinary die once. It is natural to take $X := \{1, \ldots, 6\}$ as the sample space of this experiment. But what is an “event” here? The answer depends on the actual scenario that one wishes to model. If it is possible to discern the differences between all subsets of $X$, then we would take $2^X$ as the σ-algebra of the model, thereby deeming any subset of $X$ as a conceivable event (e.g. $\{1, 2, 3\}$ would be the event that “a number strictly less than 4 comes up”). On the other hand, the situation we wish to model may call for a different type of event space. For example, if we want to model the beliefs of a person who will be told after the experiment only whether or not 1 has come up, $\{1, 2, 3\}$ would not really be deemed as a conceivable event. (If the outcome is 2, one would like to say that $\{1, 2, 3\}$ has occurred, but given her informational limitation, our individual has no way of concluding this.) Indeed, this person may have an assessment only of the likelihood of 1 coming up in the experiment, so a nontrivial “event” for her is either “1 comes up” or “1 doesn’t come up.” Consequently, to model the beliefs of this individual, it makes more sense to choose a σ-algebra like $\{\emptyset, X, \{1\}, \{2, \ldots, 6\}\}$. An “event” in this model would then be one of the four members of this particular collection of sets.
In practice, then, there is some latitude in choosing a particular class $\Sigma$ of events to endow a sample space $X$ with. However, we cannot do this in a completely arbitrary way. If $A$ is an event, then we need to be able to talk about this event not occurring, that is, to deem the set $X \setminus A$ also as an event. This is guaranteed by condition (i) above. Similarly, we wish to be able to talk about at least one of countably many events occurring, and this is the rationale behind condition (iii) above. In addition, conditions (i) and (iii) force us to view “countably many events occurring simultaneously” as an event as well. To give an example, consider the experiment of rolling an ordinary die arbitrarily many times. Clearly, we would take $X = \mathbb{N}^{\infty}$ as the sample space of this experiment. Suppose next that we would like to be able to talk about the situation that in the $i$th roll of the die, number 2 comes up. Then we would choose a σ-algebra that would certainly contain all sets of the form $A_i := \{(\omega_m) \in \mathbb{N}^{\infty} : \omega_i = 2\}$. This σ-algebra must contain many other types of subsets of $X$. For instance, the situation that “in neither the first nor the second roll 2 turns up” must formally be an event, because $\{(\omega_m) \in \mathbb{N}^{\infty} : \omega_1, \omega_2 \neq 2\}$ equals $(X \setminus A_1) \cap (X \setminus A_2)$. Similarly, since each $A_i$ is deemed as an “event,” a σ-algebra maintains that $\bigcup_{i=1}^{\infty} A_i$ (“2 comes up at least once through the rolls”) and $\bigcap_{i=1}^{\infty} A_i$ (“each roll results in 2 coming up”) are considered as “events” in our model.
In short, given a σ-algebra $\Sigma$ on $X$, the intuitive concept of an “event” is formalized as any Σ-measurable set. That is, and mark this, we say that $A$ is an event if and only if $A \in \Sigma$, and for this reason a σ-algebra on $X$ is often referred to as an event space on $X$. One may define many different event spaces on a given sample space, so what an “event” really is depends on the model one chooses to work with.
Example 1. [1] $2^X$ and $\{\emptyset, X\}$ are σ-algebras on any nonempty set $X$. The collection $2^X$ corresponds to the finest event space, allowing each subset of $X$ to be deemed as an “event.”² By contrast, $\{\emptyset, X\}$ is the coarsest possible event space that allows one to perceive of only two types of events, “nothing happens” and “something happens.”
² I have already told you that certain subsets of $X$ may not be deemed as “events” for an observer
[2] Let $X := \{a, b, c, d\}$. None of the collections $\{\emptyset\}$, $\{X\}$, $\{\emptyset, X, \{a\}\}$ and $\{\emptyset, X, \{a\}, \{b, c, d\}, \{b\}, \{a, c, d\}\}$ qualifies as an algebra on $X$. On the other hand, each of the collections $\{\emptyset, X\}$, $\{\emptyset, X, \{a\}, \{b, c, d\}\}$ and $\{\emptyset, X, \{a\}, \{b, c, d\}, \{b\}, \{a, c, d\}, \{a, b\}, \{c, d\}\}$ is an algebra on $X$.
[3] If $X$ is finite and $\mathcal{A}$ is an algebra on $X$, then $\mathcal{A}$ is a σ-algebra. So, as noted earlier, the distinction between the notions of an algebra and a σ-algebra disappears in the case of finite sample spaces.
[4] Let us agree to call an interval right-semiclosed if it has the form $(a, b]$ with $-\infty \le a \le b < \infty$, or the form $(a, \infty)$ with $-\infty \le a$. The class of all right-semiclosed intervals is obviously not an algebra on $\mathbb{R}$. But the set $\mathcal{A}$ of all finite unions of right-semiclosed intervals, called the algebra induced by right-semiclosed intervals, is an algebra on $\mathbb{R}$. In fact, $\mathcal{A}$ is the smallest algebra that contains all right-semiclosed intervals. It is not a σ-algebra. (Proofs?)
[5] $\mathcal{A} := \{S \subseteq \mathbb{N} : \min\{|S|, |\mathbb{N} \setminus S|\} < \infty\}$ is an algebra on $\mathbb{N}$ but it is not a σ-algebra. Indeed, $\{i\} \in \mathcal{A}$ for each odd $i \in \mathbb{N}$, but $\{1, 3, \ldots\} \notin \mathcal{A}$.
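The claims in Example 1.[2] can be checked mechanically on a finite ground set. The following is only a minimal sketch: the helper name `is_algebra` is ours, not the text's, and it simply tests conditions (i) and (ii) of the definition by brute force.

```python
from itertools import combinations


def is_algebra(X, collection):
    """Return True if `collection` is an algebra on the finite set X.

    Hypothetical helper: it checks closure under complementation (i)
    and under pairwise unions (ii), exactly as in the definition.
    """
    X = frozenset(X)
    sets = {frozenset(s) for s in collection}
    if not sets:
        return False  # an algebra is nonempty by definition
    # (i) the complement of every member must be a member
    if any(X - s not in sets for s in sets):
        return False
    # (ii) the union of any two members must be a member
    return all(a | b in sets for a, b in combinations(sets, 2))


X = {"a", "b", "c", "d"}

# The four collections that Example 1.[2] says are NOT algebras on X.
not_algebras = [
    [set()],
    [X],
    [set(), X, {"a"}],
    [set(), X, {"a"}, {"b", "c", "d"}, {"b"}, {"a", "c", "d"}],
]

# The three collections that Example 1.[2] says ARE algebras on X.
algebras = [
    [set(), X],
    [set(), X, {"a"}, {"b", "c", "d"}],
    [set(), X, {"a"}, {"b", "c", "d"}, {"b"}, {"a", "c", "d"},
     {"a", "b"}, {"c", "d"}],
]

results_bad = [is_algebra(X, c) for c in not_algebras]
results_good = [is_algebra(X, c) for c in algebras]
```

For instance, the last collection in `not_algebras` fails because it contains $\{a\}$ and $\{b\}$ but not their union $\{a, b\}$.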
Exercise 1. Let $X$ be a metric space, and let $\mathcal{A}_1$ be the class that consists of all open subsets of $X$, $\mathcal{A}_2$ the class of all closed subsets of $X$, and $\mathcal{A}_3 := \mathcal{A}_1 \cup \mathcal{A}_2$. Determine if any of these classes is an algebra or a σ-algebra.
Exercise 2. Let $X$ be any nonempty set, and $\Omega$ a class of σ-algebras on $X$.
(a) Show that $\bigcap \Omega$ is a σ-algebra on $X$.
(b) Give an example to show that $\bigcup \Omega$ need not be an algebra even if $\Omega$ is finite.
Exercise 3. Define
$$\mathcal{A} := \left\{ A \subseteq \mathbb{N} : \left( \tfrac{1}{n} |A \cap \{1, \ldots, n\}| \right) \text{ is convergent} \right\}.$$
(Note. For any $A \in \mathcal{A}$, the number $\lim \tfrac{1}{n} |A \cap \{1, \ldots, n\}|$ is called the asymptotic density of $A$.) True or false: $\mathcal{A}$ is an algebra but not a σ-algebra.
Exercise 4. Show that a σ-algebra cannot be countably infinite.
² (continued) with limited information, so $2^X$ may not always be the relevant event space to endow $X$ with. (I will talk about this issue at greater length when studying the notion of conditional probability in Chapter F.) Apart from this, there are also technical reasons for why one cannot always view $2^X$ as a useful event space. Roughly speaking, when $X$ is an infinite set, $2^X$ may be “too large” of a set for one to be able to assign probability numbers to each element of $2^X$ in a nontrivial way. (More on this in Section 3.5.)
In practice it is not uncommon that we have a pretty good idea about the kinds of sets we wish to consider as events, but we have difficulty in terms of finding a “good” σ-algebra for the problem because the collection of sets we have at hand does not constitute a σ-algebra. The resolution is usually to extend the collection of sets which we are interested in to a σ-algebra in a minimal way. (We consider a minimal extension because we wish to depart from our “interesting” sets as little as possible. Otherwise taking $2^X$ as the event space would trivially solve the problem of extension.) This idea leads us to the following fundamental concept.
Definition. Let $X$ be a nonempty set and $\mathcal{A}$ a nonempty subclass of $2^X$. The smallest σ-algebra on $X$ that contains $\mathcal{A}$ (in the sense that this σ-algebra is included in any other σ-algebra that contains $\mathcal{A}$) is called the σ-algebra generated by $\mathcal{A}$, and is denoted as $\sigma(\mathcal{A})$.
For example, if $X := \{a, b, c\}$, then $\sigma(\{\emptyset\}) = \sigma(\{X\}) = \{\emptyset, X\}$, $\sigma(\{\emptyset, X, \{a\}\}) = \{\emptyset, X, \{a\}, \{b, c\}\}$, and $\sigma(\{\emptyset, X, \{a\}, \{b\}\}) = 2^X$. Of course, we have $\Sigma = \sigma(\Sigma)$ for any σ-algebra $\Sigma$ on any nonempty set.
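On a finite ground set the generated σ-algebra can be computed by brute force, since there an algebra and a σ-algebra coincide. The sketch below is one possible way to do this, not a construction taken from the text: the function name `generate_sigma_algebra` is hypothetical, and the method (repeatedly adding complements and pairwise unions until nothing new appears) only works when $X$ is finite.

```python
def generate_sigma_algebra(X, generators):
    """Smallest sigma-algebra on the finite set X containing `generators`.

    Assumption: X is finite, so closing under complements and pairwise
    unions is enough. Only sets forced by the closure rules are added,
    which is why the result is the smallest such collection.
    """
    X = frozenset(X)
    # Every sigma-algebra on X contains the empty set and X itself.
    sigma = {frozenset(), X} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(sigma)  # snapshot, so we can add while looping
        for a in current:
            candidates = [X - a] + [a | b for b in current]
            for candidate in candidates:
                if candidate not in sigma:
                    sigma.add(candidate)
                    changed = True
    return sigma


X = {"a", "b", "c"}
trivial = generate_sigma_algebra(X, [set()])            # sigma({emptyset})
four_sets = generate_sigma_algebra(X, [{"a"}])          # adds {a} and {b, c}
power_set = generate_sigma_algebra(X, [{"a"}, {"b"}])   # all 8 subsets of X

# The die example from Section 1.1: an observer who only learns whether 1 came up.
die = generate_sigma_algebra(range(1, 7), [{1}])
```

The loop terminates because $2^X$ is finite and the collection only grows. For infinite ground sets no such enumeration is available, which is part of why explicit descriptions of generated σ-algebras are hard to come by.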
Does any nonempty class of sets generate a σ-algebra? The answer does not follow readily from the definition above, because it is not self-evident if we can always find a smallest σ-algebra that extends any given nonempty class of sets. Our first proposition, however, shows that we can actually do this, so there is really no existence problem regarding generated σ-algebras.³
Proposition 1. Let $X$ be a nonempty set and $\mathcal{A}$ a nonempty subclass of $2^X$. There exists a unique smallest σ-algebra that includes $\mathcal{A}$, so $\sigma(\mathcal{A})$ is well-defined. We have
$$\sigma(\mathcal{A}) = \bigcap \{\Sigma : \Sigma \text{ is a } \sigma\text{-algebra and } \mathcal{A} \subseteq \Sigma\}.$$
Exercise 5.^H Prove Proposition 1.
Exercise 6. Does the σ-algebra generated by the algebra of Example 1.[4] include all open sets in $\mathbb{R}$?
Exercise 7.^H Compute $\sigma(\mathcal{A})$, where $\mathcal{A} := \{S \subseteq \mathbb{R} : \min\{|S|, |\mathbb{R} \setminus S|\} < \infty\}$.
1.2 Borel σ-Algebras
Let $X$ be any metric space, and let $\mathcal{O}_X$ stand for the set of all open sets in $X$. The members of $\mathcal{O}_X$ are of obvious importance, but unfortunately $\mathcal{O}_X$ need not even be an algebra. In metric spaces, then, it is natural to consider the σ-algebra generated by $\mathcal{O}_X$. This σ-algebra is called the Borel σ-algebra on $X$, and its members are referred to as Borel sets (or in probabilistic jargon, Borel events). Throughout this text, we denote the Borel σ-algebra on a metric space $X$ by $\mathcal{B}(X)$. By definition, therefore, we have $\mathcal{B}(X) = \sigma(\mathcal{O}_X)$.
³ As you will soon painfully find out, however, the explicit characterization of a generated σ-algebra can be a seriously elusive problem. Just to get a feeling for the difficulties that one may encounter in this regard, try to “compute” the σ-algebra $\sigma(\{\{a\} : a \in \mathbb{Q}\})$ on $\mathbb{R}$.
Notation. We write $\mathcal{B}[a, b]$ for $\mathcal{B}([a, b])$, and $\mathcal{B}(a, b]$ for $\mathcal{B}((a, b])$, where $-\infty < a < b < \infty$.
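Example 2. By definition, $\mathcal{B}(\mathbb{R}) = \sigma(\mathcal{O}_{\mathbb{R}})$, but one does not actually need all open sets in $\mathbb{R}$ for generating $\mathcal{B}(\mathbb{R})$. For instance, what if we used instead the class of all open intervals, call it $\mathcal{A}_1$, as a primitive collection and attempted to find $\sigma(\mathcal{A}_1)$? This would lead us exactly to the σ-algebra $\sigma(\mathcal{O}_{\mathbb{R}})$! To see this, observe first that $\sigma(\mathcal{O}_{\mathbb{R}})$ is obviously a σ-algebra that contains $\mathcal{A}_1$ so that we clearly have $\sigma(\mathcal{A}_1) \subseteq \sigma(\mathcal{O}_{\mathbb{R}})$. (Recall the definition of $\sigma(\mathcal{A}_1)$!) To establish the converse containment, remember that every open set in $\mathbb{R}$ can be written as the union of countably many open intervals. (Right?) Thus, we have $\mathcal{O}_{\mathbb{R}} \subseteq \sigma(\mathcal{A}_1)$. (Why exactly?) But then, since $\sigma(\mathcal{O}_{\mathbb{R}})$ is the smallest σ-algebra that contains $\mathcal{O}_{\mathbb{R}}$, and $\sigma(\mathcal{A}_1)$ is of course a σ-algebra, we must have $\sigma(\mathcal{O}_{\mathbb{R}}) \subseteq \sigma(\mathcal{A}_1)$. So, we conclude: $\sigma(\mathcal{O}_{\mathbb{R}}) = \sigma(\mathcal{A}_1)$.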
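In fact, there are all sorts of other ways of generating the Borel σ-algebra on $\mathbb{R}$. For instance, consider the following classes:
$\mathcal{A}_2$ := the set of all closed intervals
$\mathcal{A}_3$ := the set of all closed sets in $\mathbb{R}$
$\mathcal{A}_4$ := the set of all intervals of the form $(a, b]$
$\mathcal{A}_5$ := the set of all intervals of the form $(-\infty, a]$
$\mathcal{A}_6$ := the set of all intervals of the form $(-\infty, a)$.
It is easy to show that all of these collections generate the same σ-algebra:
$$\mathcal{B}(\mathbb{R}) := \sigma(\mathcal{O}_{\mathbb{R}}) = \sigma(\mathcal{A}_1) = \cdots = \sigma(\mathcal{A}_6). \tag{1}$$
We have already shown that $\sigma(\mathcal{O}_{\mathbb{R}}) = \sigma(\mathcal{A}_1)$. On the other hand, for any closed interval $[a, b]$, we have $[a, b] = \bigcap_{i=1}^{\infty} \left(a - \tfrac{1}{i}, b + \tfrac{1}{i}\right) \in \sigma(\mathcal{O}_{\mathbb{R}})$, so we have $\mathcal{A}_2 \subseteq \sigma(\mathcal{O}_{\mathbb{R}})$ so that $\sigma(\mathcal{A}_2) \subseteq \sigma(\mathcal{O}_{\mathbb{R}})$. Conversely, for any open interval $(a, b)$, we have $(a, b) = \bigcup_{i=1}^{\infty} \left[a + \tfrac{1}{i}, b - \tfrac{1}{i}\right] \in \sigma(\mathcal{A}_2)$. So $\mathcal{A}_1 \subseteq \sigma(\mathcal{A}_2)$, and it follows that $\sigma(\mathcal{A}_1) \subseteq \sigma(\mathcal{A}_2)$. Since $\sigma(\mathcal{O}_{\mathbb{R}}) = \sigma(\mathcal{A}_1)$, the two containments give $\sigma(\mathcal{A}_2) = \sigma(\mathcal{O}_{\mathbb{R}})$. The rest of the claims in (1) can be proved similarly.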
This example shows that different collections of sets might well generate the same σ-algebra. In fact, it is generally true that the Borel σ-algebra on a metric space is also generated by the class of all closed subsets of this space. That is, for any metric space $X$,
$$\mathcal{B}(X) := \sigma(\{O \subseteq X : O \text{ is open}\}) = \sigma(\{S \subseteq X : S \text{ is closed}\}).$$
(Verify!) The following exercises play on this theme a bit more.
Exercise 8. Show that there is a countable subset $\mathcal{A}$ of $2^{\mathbb{R}}$ such that $\sigma(\mathcal{A}) = \mathcal{B}(\mathbb{R})$.
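Exercise 9. For any $n \in \mathbb{N}$, let
$\mathcal{A}_1 := \{J_1 \times \cdots \times J_n : J_i \text{ is a bounded open interval}, i = 1, \ldots, n\}$,
$\mathcal{A}_2 := \{J_1 \times \cdots \times J_n : J_i \text{ is a bounded right-closed interval}, i = 1, \ldots, n\}$,
$\mathcal{A}_3 := \{J_1 \times \cdots \times J_n : J_i \text{ is a bounded closed interval}, i = 1, \ldots, n\}$.
Show that we have $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{A}_1) = \sigma(\mathcal{A}_2) = \sigma(\mathcal{A}_3)$.
Exercise 10. Prove: If $X$ is a separable metric space, then $\mathcal{B}(X) = \sigma(\{N_{\varepsilon, X}(x) : x \in X \text{ and } \varepsilon > 0\})$.
Exercise 11. For any $m \in \mathbb{N}$, and $(t_i, B_i) \in [0, 1] \times \mathcal{B}[0, 1]$, $i = 1, \ldots, m$, define
$$A(t_1, \ldots, t_m, B_1, \ldots, B_m) := \{f \in C[0, 1] : f(t_i) \in B_i, \ i = 1, \ldots, m\},$$
and
$$\mathcal{A} := \{A(t_1, \ldots, t_m, B_1, \ldots, B_m) : m \in \mathbb{N} \text{ and } (t_i, B_i) \in [0, 1] \times \mathcal{B}[0, 1], \ i = 1, \ldots, m\}.$$
Prove that $\sigma(\mathcal{A}) = \mathcal{B}(C[0, 1])$.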
The fact that there is often no way of giving an explicit description of a generated σ-algebra is a source of discomfort. Nevertheless, one can usually say quite a bit about $\sigma(\mathcal{A})$ even without having a specific formula that tells us how its members are derived from those of $\mathcal{A}$. Indeed, in all of the examples (exercises) considered above, we (you) have “computed” $\sigma(\mathcal{A})$ by using the definition of the “generated σ-algebra” directly. The following exercise provides another illustration of this.
Exercise 12.^H Let $X$ be a metric space, and $Y$ a metric subspace of $X$. Prove that
$$\mathcal{B}(Y) = \{B \cap Y : B \in \mathcal{B}(X)\}.$$
The observation noted in the previous exercise is quite useful. For instance, it implies that the knowledge of $\mathcal{B}(\mathbb{R})$ is sufficient to describe the class of all Borel subsets of $[0, 1]$; we have $\mathcal{B}[0, 1] = \{B \cap [0, 1] : B \in \mathcal{B}(\mathbb{R})\}$. Similarly, $\mathcal{B}(\mathbb{R}^n_+) = \{B \cap \mathbb{R}^n_+ : B \in \mathcal{B}(\mathbb{R}^n)\}$. We conclude with a less immediate corollary.
Exercise 13.^H For any $S \in \mathcal{B}[0, 1]$ and $\alpha \in \mathbb{R}$, show that $(S + \alpha) \cap [0, 1] \in \mathcal{B}[0, 1]$.
2 Probability Spaces

We are now ready to introduce the concept of probability measure.⁴
⁴ The origins of probability theory go back to the famous exchange between Blaise Pascal and Pierre Fermat that started in 1654. While Pascal and Fermat were mostly concerned with gambling
Definition. Let $(X, \Sigma)$ be a measurable space. A function $p : \Sigma \to \overline{\mathbb{R}}$ is said to be σ-additive if
$$p\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} p(A_i)$$
for any $(A_m) \in \Sigma^{\infty}$ with $A_i \cap A_j = \emptyset$ for each $i \neq j$. Any σ-additive function $p : \Sigma \to \overline{\mathbb{R}}_+$ with $p(\emptyset) = 0$ is called a measure on $\Sigma$ (or on $X$ if $\Sigma$ is clear from the context), and we refer to the list $(X, \Sigma, p)$ as a measure space. If $p(X) < \infty$, then $p$ is called a finite measure, and the list $(X, \Sigma, p)$ is referred to as a finite measure space. In particular, if $p(X) = 1$ holds, then $p$ is said to be a probability measure, and in this case, $(X, \Sigma, p)$ is called a probability space.
Definition. Given a metric space $X$, any measure $p$ on $\mathcal{B}(X)$ is called a Borel measure on $X$, and in this case $(X, \mathcal{B}(X), p)$ is referred to as a Borel space. If, in addition, $p$ is a probability measure, then $(X, \mathcal{B}(X), p)$ is called a Borel probability space.
Notation. Throughout this text, the set of all Borel probability measures on a metric space $X$ is denoted as $\mathcal{P}(X)$.
We think of a probability measure $p$ as a function that assigns to each event (that is, to each member of the σ-algebra that $p$ is defined on) a number between 0 and 1. This number corresponds to the likelihood of the occurrence of that event. The map $p$ is σ-additive in the sense that it is additive with respect to countably many pairwise disjoint events. This additivity property, which is the heart and soul of measure theory, entails several other useful properties for probability measures. For instance, it implies that any probability measure is finitely additive, that is,
$$p\left(\bigcup_{i=1}^{m} A_i\right) = \sum_{i=1}^{m} p(A_i)$$
for any finite class $\{A_1, \ldots, A_m\}$ of pairwise disjoint events. (To verify this, use σ-additivity and the fact that $\{A_1, \ldots, A_m, \emptyset, \emptyset, \ldots\}$ is a collection of pairwise disjoint events.) In turn, this implies that, for any events $A$ and $B$ with $A \subseteq B$, we have $p(B \setminus A) = p(B) - p(A)$, because $p(B) = p(A \cup (B \setminus A)) = p(A) + p(B \setminus A)$. Some other useful properties of probability measures are given next.
⁴ (continued) type problems, the importance and applicability of the general topic were soon understood, and the subject was developed by many mathematicians, including Jakob Bernoulli, Abraham de Moivre, and Pierre Laplace. Despite the host of work that took place in the 18th and 19th centuries, however, a universally agreed definition of “probability” did not appear until 1933. At this date Andrei Kolmogorov introduced the (axiomatic) definition that we are about to present, and set the theory on rigorous grounds, much the same way Euclid gave an axiomatic basis for planar geometry.
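Exercise 14. Let $(X, \Sigma, p)$ be a probability space, $m \in \mathbb{N}$, and let $A, B, A_i \in \Sigma$, $i = 1, \ldots, m$. Prove:
(a) If $A \subseteq B$, then $p(A) \le p(B)$,
(b) $p(X \setminus A) = 1 - p(A)$,
(c) $p\left(\bigcup_{i=1}^{m} A_i\right) \le \sum_{i=1}^{m} p(A_i)$,
(d) (Bonferroni’s Inequality) $p\left(\bigcap_{i=1}^{m} A_i\right) \ge \sum_{i=1}^{m} p(A_i) - (m - 1)$.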
Warning. One is often tempted to conclude from Exercise 14.(a) that any subset of an event of probability zero occurs with probability zero. There is a catch here. How do you know that this subset is assigned a probability at all? For instance, let $X := \{a, b, c\}$, $\Sigma := \{\emptyset, X, \{a, b\}, \{c\}\}$ and let $p$ be the probability measure on $\Sigma$ that satisfies $p(\{c\}) = 1$. Here, while $p(\{a, b\}) = 0$, it is not true that $p(\{a\}) = 0$ since $p$ is not even defined at $\{a\}$. This probability space maintains that $\{a\}$ is not an event.
Note. Those probability spaces for which any subset of an event of probability zero is an event (and hence occurs with probability zero) are called complete. With the exception of a few (optional) remarks, however, this notion will not play an important role in the present exposition.
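The three-point space in the Warning above is small enough to write down in full. The sketch below is only an illustration under our own conventions: the measure is stored as a dictionary whose keys are exactly the members of $\Sigma$, and the names `prob` and `is_finitely_additive` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction

X = frozenset("abc")

# p is defined ONLY on the four events of Sigma = {∅, X, {a,b}, {c}}.
p = {
    frozenset(): Fraction(0),
    frozenset("ab"): Fraction(0),
    frozenset("c"): Fraction(1),
    X: Fraction(1),
}
sigma = set(p)


def prob(event):
    """Return p(event), refusing to answer for sets that are not events."""
    event = frozenset(event)
    if event not in sigma:
        raise KeyError("not a member of Sigma, so p is undefined here")
    return p[event]


def is_finitely_additive(measure):
    """Check p(A ∪ B) = p(A) + p(B) for all disjoint events A, B."""
    events = list(measure)
    return all(
        measure[a | b] == measure[a] + measure[b]
        for a in events
        for b in events
        if not (a & b) and (a | b) in measure
    )


null_event_prob = prob("ab")                  # {a, b} is an event of probability 0
a_is_event = frozenset("a") in sigma          # {a} is a subset of it, but not an event
additive = is_finitely_additive(p)
```

Asking for `prob("a")` raises an error rather than returning 0, which mirrors the point of the Warning: the question "what is the probability of $\{a\}$?" has no answer in this space.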
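Exercise 15. (The Exclusion-Inclusion Formula) Let $(X, \Sigma, p)$ be a probability space, $m \in \mathbb{N}$, and $A_1, \ldots, A_m \in \Sigma$. Where $N_t := \{(i_1, \ldots, i_t) \in \{1, \ldots, m\}^t : i_1 < \cdots < i_t\}$, $t = 1, \ldots, m$, show that
$$p\left(\bigcup_{i=1}^{m} A_i\right) = \sum_{i \in N_1} p(A_i) - \sum_{(i,j) \in N_2} p(A_i \cap A_j) + \sum_{(i,j,k) \in N_3} p(A_i \cap A_j \cap A_k) - \cdots + (-1)^{m-1} p\left(\bigcap_{i=1}^{m} A_i\right).$$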
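Before attempting a proof, it can help to see the formula of Exercise 15 hold numerically. The following sketch checks it on a hypothetical example of our own choosing: the uniform probability measure on $X = \{1, \ldots, 12\}$ and three overlapping events. The index sets $N_t$ are enumerated with `itertools.combinations`, which yields exactly the increasing tuples $i_1 < \cdots < i_t$.

```python
from fractions import Fraction
from itertools import combinations


def prob(event, sample_space):
    """Uniform probability on a finite sample space (an assumption of this sketch)."""
    return Fraction(len(event), len(sample_space))


def inclusion_exclusion(events, sample_space):
    """Right-hand side of the formula: alternating sum over N_1, ..., N_m."""
    m = len(events)
    total = Fraction(0)
    for t in range(1, m + 1):
        sign = (-1) ** (t - 1)
        for idx in combinations(range(m), t):  # increasing index tuples
            inter = frozenset.intersection(*(events[i] for i in idx))
            total += sign * prob(inter, sample_space)
    return total


X = frozenset(range(1, 13))
events = [
    frozenset(x for x in X if x % 2 == 0),  # even outcomes
    frozenset(x for x in X if x % 3 == 0),  # multiples of 3
    frozenset(x for x in X if x <= 4),      # small outcomes
]

lhs = prob(frozenset.union(*events), X)  # p(A_1 ∪ A_2 ∪ A_3)
rhs = inclusion_exclusion(events, X)
```

Here the union has 9 elements, and the alternating sum is $(6 + 4 + 4) - (2 + 2 + 1) + 0 = 9$ twelfths as well. A numerical check of this kind is of course no substitute for the proof the exercise asks for.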
The following are simple but surprisingly useful observations.
Proposition 2. Let $(X, \Sigma, p)$ be a probability space, and let $(A_m) \in \Sigma^{\infty}$. If $A_1 \subseteq A_2 \subseteq \cdots$ (in which case we say that $(A_m)$ is an increasing sequence), then
$$\lim p(A_m) = p\left(\bigcup_{i=1}^{\infty} A_i\right).$$
On the other hand, if $A_1 \supseteq A_2 \supseteq \cdots$ (in which case we say that $(A_m)$ is a decreasing sequence), then
$$\lim p(A_m) = p\left(\bigcap_{i=1}^{\infty} A_i\right).$$
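Proof. Let $(A_m) \in \Sigma^{\infty}$ be an increasing sequence. Set $B_1 := A_1$ and $B_i := A_i \setminus A_{i-1}$, $i = 2, 3, \ldots$, and note that $B_i \in \Sigma$ for each $i$ and $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i$. But $B_i \cap B_j = \emptyset$ for any $i \neq j$, so, by σ-additivity,
$$p\left(\bigcup_{i=1}^{\infty} A_i\right) = p\left(\bigcup_{i=1}^{\infty} B_i\right) = \sum_{i=1}^{\infty} p(B_i) = \lim_{m \to \infty} \sum_{i=1}^{m} p(B_i) = \lim_{m \to \infty} p\left(\bigcup_{i=1}^{m} B_i\right) = \lim_{m \to \infty} p(A_m).$$
The proof of the second claim is left as an exercise. □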
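To see continuity from below at work, consider a concrete measure that is not taken from the text: on $\mathbb{N}$ with event space $2^{\mathbb{N}}$, put $p(\{k\}) := 2^{-k}$, and let $A_m := \{1, \ldots, m\}$. This is an increasing sequence whose union is $\mathbb{N}$, so Proposition 2 predicts $p(A_m) \to p(\mathbb{N}) = 1$. The sketch below, with the hypothetical helper `p_initial_segment`, computes the gap $1 - p(A_m)$ exactly.

```python
from fractions import Fraction


def p_initial_segment(m):
    """p(A_m) for A_m = {1, ..., m}, under the assumed weights p({k}) = 2**(-k)."""
    return sum(Fraction(1, 2**k) for k in range(1, m + 1))


# The disjoint pieces of the proof are B_1 = A_1 = {1} and B_i = A_i \ A_{i-1} = {i},
# so p(A_m) is the m-th partial sum of the series sum_i p(B_i).
checkpoints = (1, 5, 10, 20)
gaps = [1 - p_initial_segment(m) for m in checkpoints]
```

Each gap equals $2^{-m}$, so $p(A_m)$ increases to 1 as the proposition asserts.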
As an immediate application of this result and Exercise 14.(c), we obtain a basic inequality of probability theory:
Boole’s Inequality. For any probability space $(X, \Sigma, p)$,
$$p\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} p(A_i) \quad \text{for any } (A_m) \in \Sigma^{\infty}.$$
Proof. Use Proposition 2 and Exercise 14.(c). □
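Exercise 16.^H Let $(X, \Sigma, p)$ be a probability space. Show that if $(A_m) \in \Sigma^{\infty}$ satisfies $p(A_i \cap A_j) = 0$ for every $i \neq j$, then
$$p\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} p(A_i).$$
Exercise 17. Given any probability space $(X, \Sigma, p)$, show that
$$p\left(\bigcup_{i=1}^{\infty} A_i\right) - p\left(\bigcup_{i=1}^{\infty} B_i\right) \le \sum_{i=1}^{\infty} \left(p(A_i) - p(B_i)\right)$$
for all $(A_m), (B_m) \in \Sigma^{\infty}$ with $B_m \subseteq A_m$, $m = 1, 2, \ldots$
Exercise 18.^H Let $(X, \mathcal{B}(X), p)$ be a Borel probability space, and let $\mathcal{O}_X$ and $\mathcal{C}_X$ denote the class of all open and closed subsets of $X$, respectively.
(a) Prove that
$$\sup\{p(T) : T \in \mathcal{C}_X \text{ and } T \subseteq S\} = p(S) = \inf\{p(O) : O \in \mathcal{O}_X \text{ and } S \subseteq O\}$$
for any $S \in \mathcal{B}(X)$.
(b) Show that, if $X$ is σ-compact, that is, it can be written as a union of countably many compact subsets of itself, then
$$p(S) = \sup\{p(K) : K \text{ is a compact subset of } X \text{ with } K \subseteq S\}$$
for any $S \in \mathcal{B}(X)$. (Note. Such a Borel probability measure is said to be regular.)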
The observations noted in Proposition 2 are often referred to as the continuity (from below and above, respectively) properties of a probability measure.⁵ As the proof of this result makes transparent, these properties are derived directly from the σ-additivity of a probability measure. Indeed, any finite measure on $\Sigma$ satisfies these properties.
Warning. The first claim of Proposition 2 is valid even when $p$ is an infinite measure. (The proof goes through verbatim.) In this case, the validity of the second claim, however, requires the additional assumption that $p(A_k) < \infty$ for some $k$. To see the need for this additional hypothesis, consider the measure space $(\mathbb{N}, 2^{\mathbb{N}}, q)$, where $q(S) := |S|$. ($q$ is called the counting measure.) Here, if $A_m := \{m, m + 1, \ldots\}$ for each $m \in \mathbb{N}$, then $\bigcap_{i=1}^{\infty} A_i = \emptyset$ and yet $q(A_m) = \infty$ for each $m$.
It is useful to observe that a partial converse of this observation is also true. To make this precise, let us agree to refer to a function defined on an algebra $\mathcal{A}$ as σ-additive on $\mathcal{A}$ if it is additive with respect to any countably many pairwise disjoint events the union of which belongs to $\mathcal{A}$. (Notice that this definition conforms with the way we used the term “σ-additive” for a measure so far.) It turns out that finite additivity and continuity of a set function imply its σ-additivity.
Proposition 3. Let $\mathcal{A}$ be an algebra (on some nonempty set), and $q : \mathcal{A} \to \mathbb{R}_+$ a finitely additive function such that, for any decreasing sequence $(C_m) \in \mathcal{A}^{\infty}$ with $\bigcap_{i=1}^{\infty} C_i = \emptyset$, we have $\lim q(C_m) = 0$. Then, $q$ is σ-additive on $\mathcal{A}$.
Proof. Take any class $\{A_m \in \mathcal{A} : m = 1, 2, \ldots\}$ such that $A_i \cap A_j = \emptyset$ for each $i \neq j$, and $A := \bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$. Let $B_m := \bigcup_{i=1}^{m} A_i$, and observe that $q(A) = q(A \setminus B_m) + q(B_m)$ for each $m$, by finite additivity of $q$. But $(A \setminus B_m)$ is a decreasing sequence in $\mathcal{A}$ with $\bigcap_{i=1}^{\infty} (A \setminus B_i) = \emptyset$ so that, by hypothesis, $\lim q(A \setminus B_m) = 0$. Therefore, letting $m \to \infty$ in the equation $q(A) = q(A \setminus B_m) + q(B_m)$ and using the finite additivity of $q$ again, we get
$$q(A) = \lim q(B_m) = \lim q\left(\bigcup_{i=1}^{m} A_i\right) = \lim \sum_{i=1}^{m} q(A_i) = \sum_{i=1}^{\infty} q(A_i)$$
and hence the proposition. □
In words, a finitely additive set function which is continuous from above at the empty set is σ-additive. As you can now show easily, an analogous result can also be proved in terms of continuity from below.
⁵ It is customary to write $A_m \nearrow \bigcup_{i=1}^{\infty} A_i$ if $(A_m)$ is an increasing sequence of sets, and $A_m \searrow \bigcap_{i=1}^{\infty} A_i$ if it is a decreasing one. Thus Proposition 2 states that $A_m \nearrow \bigcup_{i=1}^{\infty} A_i$ implies $p(A_m) \nearrow p\left(\bigcup_{i=1}^{\infty} A_i\right)$, and similarly for decreasing sequences of events. This is the motivation behind the term “continuity of a probability measure.”
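Exercise 19. Let $\mathcal{A} := \{S \subseteq \mathbb{N} : \min\{|S|, |\mathbb{N} \setminus S|\} < \infty\}$, and define $p : \mathcal{A} \to [0, 1]$ as
$$p(S) := \begin{cases} 1, & |\mathbb{N} \setminus S| < \infty \\ 0, & |S| < \infty. \end{cases}$$
Show that $p$ is finitely additive, but not σ-additive.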
Let us conclude this section with a brief summary. At this point, you should be somewhat comfortable with the notion of probability space $(X, \Sigma, p)$. In such a space, $X$ stands for the sample space of the experiment being modeled, the set of all outcomes, so to speak. The σ-algebra $\Sigma$, on the other hand, tells us which subsets of $X$ can be discerned in the experiment, that is, of which subsets of $X$ we can talk about the likelihood of occurring or not occurring. (But recall that things are not completely arbitrary; by definition of a σ-algebra, there is quite a bit of consistency in what is and is not deemed as an event.) Finally, the probability measure $p$ quantifies the likelihood of the members of $\Sigma$. Things hang tight together by the property of σ-additivity; the likelihood of the union of countably many pairwise disjoint events is simply the sum of the individual probabilities of each of these events.
And now, it’s time to make things a bit more concrete.
3 Constructing Probability Spaces
3.1 Motivating Examples
Our first example provides the formal description of the canonical probability space whose sample space is finite. This space corresponds to a special case of the more general formulation of what is said to be a simple probability space.
Example 3. Let $X$ be any metric space. The support of any function $f \in \mathbb{R}^X$ is defined as the set
$$\operatorname{supp}(f) := \operatorname{cl}\{\omega \in X : f(\omega) > 0\}.$$
Since every finite set is closed in a metric space, we have
$$\operatorname{supp}(f) = \{\omega \in X : f(\omega) > 0\}$$
for all $f$ with finite support. On the other hand, we say that $f$ is a simple density function if $f \ge 0$, $\operatorname{supp}(f)$ is finite and $\sum_{\omega \in \operatorname{supp}(f)} f(\omega) = 1$. A simple density function $f$ induces a probability measure $p_f$ on a σ-algebra $\Sigma$ on $X$ as follows:
$$p_f(S) := \sum_{\omega \in \operatorname{supp}(f) \cap S} f(\omega) \quad \text{for any } S \in \Sigma.$$
(Note. If $\operatorname{supp}(f) \cap S = \emptyset$, then $p_f(S) = 0$.) For obvious reasons, such a probability measure is called simple. We denote the set of all simple probability measures by $\mathcal{P}_s(X)$, and say that $(X, \Sigma, p_f)$ is a simple probability space. It is easily seen that any probability space $(X, 2^X, p)$ with $|X| < \infty$ is a simple probability space. (For, $f \in [0, 1]^X$ defined by $f(\omega) := p(\{\omega\})$ is a simple density function, and $p = p_f$.⁶) Since singleton sets are closed in any metric space, $(X, 2^X, p)$ is a Borel probability space, provided that $X$ is a finite set. More generally, we define the support of any Borel probability measure $p \in \mathcal{P}(X)$, denoted $\operatorname{supp}(p)$, as the smallest closed set $S$ such that $p(S) = 1$. Given this definition, a Borel probability measure is simple iff it has finite support. (Note. Such measures are referred to as simple lotteries in decision theory.)
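The construction of $p_f$ in Example 3 translates almost literally into code. The sketch below is an illustration under assumptions of our own: a density is represented as a dictionary from outcomes to weights (outcomes not listed are taken to have weight 0), and the names `is_simple_density`, `support` and `p_f` are hypothetical helpers mirroring the notation of the example.

```python
from fractions import Fraction


def support(f):
    """supp(f) for a finitely supported f: the outcomes with strictly positive weight."""
    return {omega for omega, weight in f.items() if weight > 0}


def is_simple_density(f):
    """f >= 0, finite support (automatic for a dict), and the weights sum to 1."""
    return all(w >= 0 for w in f.values()) and sum(f.values()) == 1


def p_f(f, S):
    """p_f(S): sum of f over supp(f) ∩ S; an empty sum gives 0, as in the Note."""
    return sum((f[omega] for omega in support(f) & set(S)), Fraction(0))


# A hypothetical simple lottery over four outcomes, one of which has weight 0.
f = {"x": Fraction(1, 2), "y": Fraction(1, 3), "z": Fraction(1, 6), "u": Fraction(0)}

valid = is_simple_density(f)
prob_xz = p_f(f, {"x", "z", "q"})   # "q" lies outside the support and contributes nothing
prob_outside = p_f(f, {"q", "u"})   # the event misses the support entirely
```

Exact rational weights are used so that the normalisation check `sum(f.values()) == 1` is not disturbed by floating-point rounding.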
Exercise 20. Let $X$ be a nonempty countable set and $p : 2^X \to \mathbb{R}$. Prove:
(a) $(X, 2^X, p)$ is a probability space iff there exists an $f \in [0, 1]^X$ such that $p(S) = \sum_{\omega \in S} f(\omega)$.
(b) If $(X, 2^X, p)$ is a probability space such that there exist an $\varepsilon > 0$ and $f \in [\varepsilon, 1]^X$ with $p(S) = \sum_{\omega \in S} f(\omega)$ for each $S \in 2^X$, then $|X| < \infty$.
Exercise 21.^H Let $X$ be a countably infinite set, and take any $f \in B(X)$ with $f \ge 0$. Show that there exists a $g \in \mathbb{R}^X_+$ such that $(X, 2^X, p)$ is a probability space where $p : 2^X \to \mathbb{R}$ is given by $p(S) := \sum_{\omega \in S} g(\omega) f(\omega)$.
The following example is very important. It shows that constructing non-simple probability measures on an infinite sample space is in general not a trivial matter. We will revisit this example several times throughout the sequel.
Example 4. Consider the experiment of tossing successively $k$ many (fair) coins. Denoting ‘heads’ by 1 and ‘tails’ by 0 (for convenience), the sample space of this experiment can be written as $X := \{0, 1\}^k$. Given that $X$ is finite, there is no problem with taking $2^X$ as the relevant event space. After all, thanks to the finiteness of $X$, there is a natural way of assigning probabilities to events by using the notion of relative frequency. Hence we define $p(S) := |S| / 2^k$ for any event $S \in 2^X$. Of course, $p$ is none other than the simple probability measure induced by the simple density function $f(\omega) := 1 / |X|$ for each $\omega \in X$. If we drop the finiteness assumption, however, things get slightly icy, and it is actually at this point that the use of the formalism of the general probability model kicks in.
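The finite case described above is easy to enumerate. The following sketch, with hypothetical helper names `coin_space` and `p`, builds $X = \{0, 1\}^k$ for a small $k$ and checks on one event that the relative-frequency definition $p(S) = |S| / 2^k$ agrees with summing the uniform density $f(\omega) = 1 / |X|$ over $S$.

```python
from fractions import Fraction
from itertools import product


def coin_space(k):
    """All outcomes of k successive tosses, with 1 = heads and 0 = tails."""
    return list(product((0, 1), repeat=k))


def p(S, k):
    """Relative-frequency probability p(S) = |S| / 2**k of an event S ⊆ {0,1}^k."""
    return Fraction(len(set(S)), 2**k)


k = 3
X = coin_space(k)

# Event: at least two of the three tosses come up heads.
at_least_two_heads = [omega for omega in X if sum(omega) >= 2]

by_frequency = p(at_least_two_heads, k)
by_density = sum(Fraction(1, len(X)) for _ in at_least_two_heads)
```

With $k = 3$ there are four such outcomes out of eight. Enumeration of this kind is exactly what stops working once the number of tosses is infinite, which is the difficulty the rest of the example addresses.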
Consider the experiment of tossing a (fair) coin infinitely many times. The sam ple
space o f this experiment is the seque n ce space {0, 1}

. How do w e define events and
probab ilities here? The prob lem is that infin ite car din ality of our sample spac e makes
it impossible to use the idea of relative frequ en c y to assign proba b ilities to all the
ev ents that we are interested in. Yet, intuitively, we still want to use the relative
frequenc y interpret ation of probabilit y her e. For instance, w e really want to be able
6 That is to say, if $X$ is finite, specifying $p$ on singleton events defines $p$ on the entire $2^X$: $p(S) = \sum_{\omega \in S} p(\{\omega\})$ for any $S \subseteq X$.
to say that the probability of observing infinitely many tails is 1. Or, what is a bit more problematic, we want to be able to say that after sufficiently many tosses, the probability that the relative frequency of heads tends to $\frac{1}{2}$ is large (because the coin is fair).
So, how should we define our event space? Here is the idea. Let us first deal with the “easy” events. For example, consider the set
$$\{(\omega_m) \in \{0,1\}^\infty : \omega_1 = a_1, \ldots, \omega_k = a_k\},$$
where $k \in \mathbb{N}$ and $a_1, \ldots, a_k \in \{0,1\}$. This set is said to be a cylinder set, for it is completely determined by a finite number of its initial elements. This property makes our relative frequency intuition operational. Clearly, we wish to assign probability $1/2^k$ to this event. More generally, we want to consider as an event any cylinder set, that is, any set of the form $\{(\omega_m) \in \{0,1\}^\infty : (\omega_1, \ldots, \omega_k) \in S\}$ where $k \in \mathbb{N}$ and $S \subseteq \{0,1\}^k$. Since this is the event that “the outcome of the first $k$ tosses belongs to $S$,” it is natural to assign to it the probability $|S|/2^k$. What next? Well, it turns out that this is all we need to do. So far we know that we wish to include the set of all cylinder sets
$$\mathcal{A} := \bigcup_{k=1}^{\infty} \big\{ \{(\omega_m) : (\omega_1, \ldots, \omega_k) \in S\} : S \subseteq \{0,1\}^k \big\}$$
in our event space. (Check that this class is an algebra on $\{0,1\}^\infty$.) So why don’t we consider $\mathcal{A}$ as the nucleus of our event space, and take the σ-algebra that it generates, $\sigma(\mathcal{A})$, as the event space for the problem? After all, not only is $\sigma(\mathcal{A})$ a σ-algebra that differs from $\mathcal{A}$ in a minimal way, it also contains all sorts of interesting events that are not contained in $\mathcal{A}$. For instance, in contrast to the collection $\mathcal{A}$, $\sigma(\mathcal{A})$ maintains that “all tosses after the fifth toss come up heads” is an event, for $\{(\omega_m) : \omega_k = 1\}$ is a cylinder set for each $k$, and hence
$$\{(\omega_m) : \omega_k = 1 \text{ for all } k \geq 6\} = \bigcap_{k=6}^{\infty} \{(\omega_m) : \omega_k = 1\} \in \sigma(\mathcal{A}). \tag{2}$$

Similarly, the situation “infinitely many heads come up throughout the experiment” is captured by $\sigma(\mathcal{A})$ (but not by $\mathcal{A}$). For,
$$\{(\omega_m) : \omega_k = 1 \text{ for infinitely many } k\} = \bigcap_{k=1}^{\infty} \bigcup_{i=k}^{\infty} \{(\omega_m) : \omega_i = 1\} \in \sigma(\mathcal{A}). \tag{3}$$
(This is not entirely obvious; make sure you verify both of the claims made in (3).) So, you see, there are many non-cylinder events that we can deduce from cylinder sets by taking unions, intersections and complements, and by taking unions, intersections and complements of the resulting sets, and so on. The end point of this process, the explicit description of which cannot really be given, is none other than $\sigma(\mathcal{A})$.$^{7}$
7 Quiz. True or false: $\big\{(\omega_m) : \lim \frac{1}{k} \sum_{i=1}^{k} \omega_i = \frac{1}{2}\big\} \in \sigma(\mathcal{A})$.
All this is good; $\sigma(\mathcal{A})$ certainly looks like a good event space to endow our sample space $\{0,1\}^\infty$ with. But it is worrisome that we only know what probabilities to assign to the members of $\mathcal{A}$ so far. What do we do about the members of $\sigma(\mathcal{A}) \setminus \mathcal{A}$? (And you are quite right if you suspect that there are very many such sets.) The good news is that we don’t have to do anything about them, because the probabilities of these sets are already determined! Read on.
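One point worth checking by hand is that the assignment $|S|/2^k$ does not depend on how a cylinder set is described: the same event can be written using the first $k$ tosses or the first $n \geq k$ tosses. The sketch below assumes a cylinder is represented by a pair $(k, S)$ with $S$ a set of 0/1 tuples of length $k$; the helper names `lift`, `q`, `intersect` and `complement` are hypothetical and serve only this illustration.

```python
from fractions import Fraction
from itertools import product

def lift(k, S, n):
    """Rewrite the cylinder with base S in {0,1}^k as a cylinder over the first n >= k tosses."""
    extensions = list(product((0, 1), repeat=n - k))
    return {w + e for w in S for e in extensions}

def q(k, S):
    """Probability |S| / 2^k assigned to the cylinder {(w_m) : (w_1, ..., w_k) in S}."""
    return Fraction(len(S), 2 ** k)

def intersect(k1, S1, k2, S2):
    """Intersection of two cylinders, expressed over max(k1, k2) tosses."""
    n = max(k1, k2)
    return n, lift(k1, S1, n) & lift(k2, S2, n)

def complement(k, S):
    """Complement of a cylinder, expressed over the same k tosses."""
    return k, set(product((0, 1), repeat=k)) - set(S)

first_heads = {(1,)}  # "the first toss is heads"
third_tails = {w for w in product((0, 1), repeat=3) if w[2] == 0}

# The same event described with 1 toss and with 4 tosses.
print(q(1, first_heads), q(4, lift(1, first_heads, 4)))
n, both = intersect(1, first_heads, 3, third_tails)
print(q(n, both))
```

The `intersect` and `complement` helpers also indicate why the class of cylinder sets is closed under the finite operations that define an algebra: any two cylinders can be lifted to a common number of tosses first.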
3.2 Carathéodory’s Extension Theorem
The stage is now set for the following fundamental theorem of measure theory, which we state here without proof.
Carathéodory’s Extension Theorem. Let $\mathcal{A}$ be an algebra on a nonempty set $X$ and $q : \mathcal{A} \to \mathbb{R}_+$. If $q$ is σ-additive on $\mathcal{A}$, then there exists a measure $p$ on $\sigma(\mathcal{A})$ such that $p(A) = q(A)$ for each $A \in \mathcal{A}$. Moreover, if $q(X) < \infty$, then $p$ is unique.
This is a powerful theorem that allows us to construct a probability measure (uniquely) on a σ-algebra by specifying the behavior of the measure only on the algebra that generates this σ-algebra. Since algebras are often much easier than σ-algebras to work with, Carathéodory’s Extension Theorem turns out to be extremely useful in constructing probability measures. For instance, we may apply this theorem to Example 4 where, as discussed above, we know how to assign probabilities to the cylinder subsets of $\{0,1\}^\infty$.
More on Example 4. Consider the framework of Example 4, and define $q \in [0,1]^{\mathcal{A}}$ by $q(\{0,1\}^\infty) := 1$ and
$$q(\{(\omega_m) : (\omega_1, \ldots, \omega_k) \in S\}) := \frac{|S|}{2^k}$$
for each $k \in \mathbb{N}$ and $S \subseteq \{0,1\}^k$. (Is $q$ well-defined?) Now $q$ is easily checked to be finitely additive. Moreover, as we show next, $q$ is continuous from above at the empty set, so we have the following fact:
Claim. $q$ is a σ-additive function on $\mathcal{A}$.
Proof of Claim. Take any decreasing sequence $(A_m)$ of cylinder sets in $\{0,1\}^\infty$ with $\bigcap_{i=1}^{\infty} A_i = \emptyset$. Since $\{0,1\}^\infty$ is compact (why?), and each $A_i$ is closed in $\{0,1\}^\infty$, $\bigcap_{i=1}^{\infty} A_i = \emptyset$ implies that the class $\{A_1, A_2, \ldots\}$ cannot have the Finite Intersection Property. Thus $\bigcap_{i=1}^{M} A_i = \emptyset$ for some $M \in \mathbb{N}$. It follows that, for any $m \geq M$, we have $A_m \subseteq A_M = \bigcap_{i=1}^{M} A_i = \emptyset$, so that $\lim q(A_m) = q(\emptyset) = 0$. Applying Proposition 3 completes the argument.
The stage is now set for Carathéodory’s Extension Theorem. Applying this theorem, we actually find a unique probability measure $p$ on $\sigma(\mathcal{A})$ which agrees with $q$ on each cylinder set. In turn, this solves nicely the problem of finding the “right” probability space for the experiment of Example 4.$^{8}$
But how does $p$ attach probabilities to every event in $\sigma(\mathcal{A})$? Unfortunately, a complete answer would require us to go through the proof of Carathéodory’s Extension Theorem in this particular context, and we wish to avoid this at this stage. However, it is not difficult to find the probability of at least some non-cylinder events in our experiment. For instance, let us compute the probability of the event that “all tosses after the fifth toss come up heads” (recall (2)). By using Proposition 2, this is done easily. Defining the cylinder sets $A_k := \{(\omega_m) : \omega_6 = \cdots = \omega_k = 1\}$ for each $k \geq 6$, and using that proposition, we get
$$p\Big(\bigcap_{k=6}^{\infty} \{(\omega_m) : \omega_k = 1\}\Big) = \lim_{k \to \infty} p(A_k) = \lim_{k \to \infty} q(A_k) = \lim_{k \to \infty} \frac{2^5}{2^k} = 0,$$
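a very agreeable finding. We can use a similar technique to compute the probability of many other interesting events. For instance, in the case of the event given in (3), we have
$$\begin{aligned} p\Big(\bigcap_{k=1}^{\infty} \bigcup_{i=k}^{\infty} \{(\omega_m) : \omega_i = 1\}\Big) &= \lim_{k \to \infty} p\Big(\bigcup_{i=k}^{\infty} \{(\omega_m) : \omega_i = 1\}\Big) \\ &= 1 - \lim_{k \to \infty} p(\{(\omega_m) : \omega_i = 0 \text{ for all } i \geq k\}) \\ &= 1, \end{aligned}$$
once again quite an intuitive observation.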
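The two limits used above can be tabulated exactly. This is a small sketch under the counting conventions of Example 4; the function names `q_heads_6_to` and `q_tails_k_to` are illustrative. The first function evaluates $q(A_k) = 2^5/2^k$, and the second evaluates $q$ on the cylinder $\{\omega_k = \cdots = \omega_n = 0\}$, whose limit as $n \to \infty$ gives the probability that only tails occur from toss $k$ onward.

```python
from fractions import Fraction

def q_heads_6_to(k):
    """q(A_k) for A_k = {w_6 = ... = w_k = 1}: 2^5 admissible prefixes of length k out of 2^k."""
    return Fraction(2 ** 5, 2 ** k)

def q_tails_k_to(k, n):
    """q of the cylinder {w_k = ... = w_n = 0}: 2^(k-1) admissible prefixes of length n out of 2^n."""
    return Fraction(2 ** (k - 1), 2 ** n)

# q(A_k) shrinks geometrically as k grows.
for k in (6, 10, 20, 40):
    print(k, q_heads_6_to(k))

# For a fixed starting toss (here k = 3), the cylinders shrink to probability 0 as n grows.
for n in (3, 10, 20, 40):
    print(n, q_tails_k_to(3, n))
```

Since each value halves when one more toss is constrained, both sequences decrease to 0, which is what the continuity property invoked in the text converts into statements about the limiting events.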
Exercise 22. Consider the probability space $(\{0,1\}^\infty, \sigma(\mathcal{A}), p)$ we have constructed above for the experiment of tossing infinitely many fair coins. Show that the statements “at least one head comes up after the tenth toss,” “only heads come up after finitely many tosses,” and “a tail comes up at every even toss,” are formally captured as events in the model at hand. Compute the probability of each of these events.
Exercise 23. Let $(X, \Sigma, q)$ be a probability space, and $S$ a subset of $X$ with $S \notin \Sigma$.
(a) Show that
$$\sigma(\Sigma \cup \{S\}) = \{(S \cap A) \cup ((X \setminus S) \cap B) : A, B \in \Sigma\}.$$
(b) By using Carathéodory’s Extension Theorem, show that there is a probability measure $p$ on $\sigma(\Sigma \cup \{S\})$ such that $p(A) = q(A)$ for each $A \in \Sigma$.
(c) Do part (b) without using Carathéodory’s Extension Theorem.
8 This example attests to the usefulness of the notion of σ-algebra. Suppose you instead designated $2^{\{0,1\}^\infty}$ as the event space of this experiment. How would you define the probability of an arbitrary set in $\{0,1\}^\infty$?
Quiz. Show that $2^{\{0,1\}^\infty} \neq \sigma(\mathcal{A})$ in the example at hand.
3.3 The Lebesgue-Stieltjes Probability Measure
We next move to another example in which we again use Carathéodory’s Extension Theorem to construct the “right” probability space. This example will play a fundamental role in much of what follows.
Let us first recall the notion of distribution function.
Definition. A map $F : \mathbb{R} \to [0,1]$ is said to be a distribution function if it is increasing, right-continuous and we have $F(-\infty) = 0 = 1 - F(\infty)$.$^{9}$
Exercise 24. Show that a distribution function can have at most countably many discontinuity points. Also show that if a distribution function is continuous, then it must be uniformly continuous.
Example 5. Let $\mathcal{A}$ be the algebra induced by the right-semiclosed intervals (Example 1.[4]). Let $F$ be a distribution function. Define the map $q \in [0,1]^{\mathcal{A}}$ as follows:
(i) If $-\infty \leq a \leq b < \infty$, then $q((a,b]) := F(b) - F(a)$;
(ii) If $-\infty \leq a$, then $q((a,\infty)) := 1 - F(a)$; and
(iii) If $A_1, \ldots, A_m$ are finitely many disjoint intervals in $\mathcal{A}$, then
$$q\Big(\bigcup_{i=1}^{m} A_i\Big) := \sum_{i=1}^{m} q(A_i).^{10}$$
So far so good. Once again our problem is to find a probability measure on $\sigma(\mathcal{A})$ that accords with $q$, which is a potentially difficult problem. Moreover, what if there are two such measures; which one should we choose? Carathéodory’s Extension Theorem deals with these issues at one stroke. If we can show that $q$ is σ-additive on $\mathcal{A}$, we can conclude that $q$ actually “defines” the probability measure that is induced by $F$ on the Borel σ-algebra $\mathcal{B}(\mathbb{R})$. After all, Carathéodory’s Extension Theorem would then say that there is a unique extension of $q$ to a probability measure on $\sigma(\mathcal{A}) = \mathcal{B}(\mathbb{R})$. This probability measure, denoted $p_F$, is called the Lebesgue-Stieltjes probability measure induced by $F$ on $\mathbb{R}$.
We are not done though; we still have to establish the σ-additivity of $q$. The strategy of attack is identical to that in the case of Example 4. Since $q$ is obviously finitely additive, it is enough to establish the continuity of $q$ from above at the empty
9 Notation. $F(-\infty) := \lim_{t \to -\infty} F(t)$ and $F(\infty) := \lim_{t \to \infty} F(t)$. Moreover, for any real number $a$, $F(a-)$ denotes the left-limit of $F$ at $a$, that is, $F(a-) := \lim_{m \to \infty} F(a - \frac{1}{m})$. The expression $F(a+)$ is understood similarly.
10 How do we know that $q$ is well-defined? Since a right-semiclosed interval can be written as a finite union of other right-semiclosed intervals, we have at the moment two different ways of computing the probability of such intervals, which may be, in principle, distinct from each other. But things work out fine here. For instance, it is immediately verified that $q((-\infty, b]) = q((-\infty, b-1] \cup (b-1, b])$ for any $b \in \mathbb{R}$. One may easily generalize this example to verify that $q$ is well-defined.
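set (Proposition 3). To this end, take a decreasing sequence $(A_m)$ in $\mathcal{A}$ such that $\bigcap_{i=1}^{\infty} A_i = \emptyset$. Take an arbitrary $\varepsilon > 0$, and fix some index $i \in \mathbb{N}$. Since $A_i$ equals the union of finitely many right-closed intervals, and $F$ is right-continuous, one can show that we can find a bounded set $B_i$ in $\mathcal{A}$ such that $\mathrm{cl}_{\mathbb{R}}(B_i) \subseteq A_i$ and $q(A_i) - q(B_i) < \varepsilon/2^i$. (Proof. Exercise.) The boundedness of $B_i$ implies that $\mathrm{cl}_{\mathbb{R}}(B_i)$ is closed in $\overline{\mathbb{R}}$ (the extended real line). But $\overline{\mathbb{R}}$ is a compact metric space (yes?), and $\bigcap_{i=1}^{\infty} \mathrm{cl}_{\mathbb{R}}(B_i) = \emptyset$. It follows that $\bigcap_{i=1}^{M} \mathrm{cl}_{\mathbb{R}}(B_i) = \emptyset$ for some sufficiently large positive integer $M$. (Why?) Consequently, for each $m \geq M$, we have $\bigcap_{i=1}^{m} B_i = \emptyset$, and therefore,
$$q(A_m) = q\Big(A_m \setminus \bigcap_{i=1}^{m} B_i\Big) \leq q\Big(\bigcup_{i=1}^{m} (A_i \setminus B_i)\Big)$$
since $A_1 \supseteq \cdots \supseteq A_m$. We are almost done. All we need to observe now is that $q\big(\bigcup_{i=1}^{m} (A_i \setminus B_i)\big) \leq \sum_{i=1}^{m} q(A_i \setminus B_i)$.$^{11}$ From this, it follows that
$$q(A_m) \leq \sum_{i=1}^{m} (q(A_i) - q(B_i)) < \sum_{i=1}^{m} \frac{\varepsilon}{2^i} < \varepsilon, \quad m \geq M.$$
Conclusion: For any $\varepsilon > 0$, there exists an $M \in \mathbb{N}$ such that $q(A_m) < \varepsilon$ for all $m \geq M$, that is, $\lim q(A_m) = 0$.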
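To make the definition of $q$ in Example 5 concrete, here is a minimal numerical sketch. The logistic function is used as $F$ purely as an assumed example of a distribution function, and the names `F`, `q_interval` and `q` are illustrative. The sketch evaluates $q$ on right-semiclosed intervals and on finite disjoint unions, and checks the kind of consistency discussed in the footnote on well-definedness.

```python
import math

def F(t):
    """Logistic distribution function (an assumed example), written via tanh
    so that F(-inf) = 0 and F(inf) = 1 are returned without overflow."""
    return 0.5 * (1.0 + math.tanh(t / 2.0))

def q_interval(a, b):
    """q((a, b]) = F(b) - F(a); with b = inf this is q((a, inf)) = 1 - F(a)."""
    return F(b) - F(a)

def q(intervals):
    """q of a finite union of pairwise disjoint intervals, given as (a, b) pairs."""
    return sum(q_interval(a, b) for a, b in intervals)

b = 0.7
whole = q_interval(-math.inf, b)
split = q([(-math.inf, b - 1.0), (b - 1.0, b)])
print(whole, split)                      # two ways of computing q((-inf, b])
print(q_interval(-math.inf, math.inf))   # total mass of the real line
```

Floating-point arithmetic is used here, so the two computations agree only up to rounding; the point is the telescoping of $F(b) - F(a)$ across adjacent intervals, which is what makes $q$ finitely additive.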
A major upshot of Example 5 is the following: One can always define a probability measure on the reals by means of a distribution function. Interestingly, the converse of this is also true. That is to say, any Borel probability measure $p$ on $\mathbb{R}$ arises this way. Indeed, for any such $p$, the map $t \mapsto p((-\infty, t])$ on $\mathbb{R}$ is a distribution function. This means that, on $\mathbb{R}$, a Borel probability measure can actually be identified with a distribution function.

Exercise 25.$^{H}$ Show that, for any $p \in \mathcal{P}(\mathbb{R})$, the map $F_p : t \mapsto p((-\infty, t])$ is a distribution function. Moreover, prove that
$$p(\{t\}) = F_p(t) - F_p(t-) \quad \text{for any } t \in \mathbb{R},$$
so $F_p$ is continuous at $t$ iff $p(\{t\}) = 0$.
11 Where does this come from? Well, from Exercise 14.(c), or better, from
Boole’s Inequality for Finitely Additive Set Functions: If $\mathcal{C}$ is an algebra and $r : \mathcal{C} \to [0,1]$ is finitely additive, then $r\big(\bigcup_{i=1}^{m} C_i\big) \leq \sum_{i=1}^{m} r(C_i)$ for any $m \in \mathbb{N}$ and $C_1, \ldots, C_m \in \mathcal{C}$.
The proof is easy. For $m = 2$, let $D = C_1 \cap C_2$ and use finite additivity to get
$$\begin{aligned} r(C_1 \cup C_2) &= r((C_1 \setminus D) \cup D \cup (C_2 \setminus D)) \\ &= r(C_1 \setminus D) + r(D) + r(C_2 \setminus D) \\ &= r(C_1) - r(D) + r(D) + r(C_2) - r(D) \\ &\leq r(C_1) + r(C_2). \end{aligned}$$
The rest follows by induction.
We have constructed in Example 5 the Lebesgue-Stieltjes measure induced by a distribution function $F$ on the entire $\mathbb{R}$. The analogous construction works for any interval in $\mathbb{R}$. For instance, if $X := (a,b]$ with $-\infty < a < b < \infty$, and $F \in [0,1]^X$ is an increasing and right-continuous function with $F(a+) = 0$ and $F(b) = 1$, then we can define the Lebesgue-Stieltjes probability measure induced by $F$ on $(a,b]$ by using precisely the approach developed in Example 5. It is not difficult to show that this measure is the restriction of the Lebesgue-Stieltjes probability measure induced by the distribution function $G$ on $\mathbb{R}$ to $\mathcal{B}(a,b]$, where $G$ is the (unique) distribution function with $G|_X = F$.
We should also note that $F(b) = 1$ amounts only to a normalization here. Indeed, the argument outlined in Example 5 works for any right-continuous and increasing $F \in \mathbb{R}^{(a,b]}$ such that $F(b) > F(a)$. (The only modification needed in the argument given in Example 5 is that we now consider only the right-closed intervals that are in $(a,b]$ when defining $q$ via $F$ and set $q(X) = F(b) - F(a)$.) Of course, the resulting (unique) measure $p$, now called the Lebesgue-Stieltjes measure induced by $F$ on $(a,b]$, is not a probability measure (unless $F(b) - F(a) = 1$). This measure rather assesses the “measure” of the space $X$ as $F(b) - F(a)$.
3.4 The Lebesgue Measure
Even though we focus on probability measures throughout this text, we need to consider at least one infinite measure, which is, geometrically speaking, the natural measure of the real line. To introduce this measure, take any $i \in \mathbb{Z}$, let $X_i := (i, i+1]$, and define $F_i : X_i \to [0,1]$ by $F_i(t) := t - i$. Let $\ell_i$ denote the Lebesgue-Stieltjes measure induced by $F_i$ on $X_i$ for each $i \in \mathbb{Z}$. We define the Lebesgue measure $\ell$ on $\mathcal{B}(\mathbb{R})$ by
$$\ell(S) := \sum_{i \in \mathbb{Z}} \ell_i(S \cap X_i).$$
Clearly, this measure agrees with each $\ell_i$ on $X_i$ in the sense that $\ell(S) = \ell_i(S)$ for any $S \in \mathcal{B}(X_i)$, $i \in \mathbb{Z}$. Moreover, it assigns to any interval in $\mathbb{R}$ its length as its measure.$^{12}$
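For a bounded interval $(a,b]$, the defining sum can be evaluated piece by piece. The following sketch assumes rational endpoints and uses the illustrative function name `lebesgue_of_interval`; it intersects $(a,b]$ with each unit interval $X_i = (i, i+1]$, measures the piece with $F_i(t) = t - i$, and adds the results.

```python
import math
from fractions import Fraction

def lebesgue_of_interval(a, b):
    """Measure of (a, b], computed as the sum over i of the measure that F_i(t) = t - i
    assigns to (a, b] intersected with X_i = (i, i+1]."""
    a, b = Fraction(a), Fraction(b)
    total = Fraction(0)
    for i in range(math.floor(a), math.ceil(b)):
        lo = max(a, Fraction(i))        # (a, b] meets (i, i+1] in (lo, hi]
        hi = min(b, Fraction(i + 1))
        if hi > lo:
            total += (hi - i) - (lo - i)    # F_i(hi) - F_i(lo)
    return total

print(lebesgue_of_interval(Fraction(-3, 2), Fraction(9, 4)))
print(Fraction(9, 4) - Fraction(-3, 2))
```

Only finitely many indices $i$ contribute for a bounded interval, which is why the loop can range over the integers between the floor of $a$ and the ceiling of $b$.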
Exercise 26.$^{H}$ Prove that $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \ell)$ is a measure space such that
$$\ell((a,b]) = b - a, \quad -\infty \leq a < b < \infty.$$
The restriction of $\ell$ to any Borel subset $X$ of $\mathbb{R}$ is a Borel measure on $X$, that is, $(X, \mathcal{B}(X), \ell|_{\mathcal{B}(X)})$ is a Borel measure space for any $X \in \mathcal{B}(\mathbb{R})$. For brevity, we denote this measure space simply as $(X, \mathcal{B}(X), \ell)$ in what follows. For instance, $([0,1], \mathcal{B}[0,1], \ell)$ is a Borel probability space.
12 Quiz. Why is $\ell$ well-defined?
Let us establish a few elementary facts about the Lebesgue measure. First of all, what is the Lebesgue measure of a singleton set? The answer is:
$$\ell(\{a\}) = \ell\Big(\bigcap_{m=1}^{\infty} \big(a - \tfrac{1}{m}, a\big]\Big) = \lim_{m \to \infty} \ell\big(\big(a - \tfrac{1}{m}, a\big]\big) = \lim_{m \to \infty} \frac{1}{m} = 0$$
for any real number $a$. (Why the second equality?) Consequently, any singleton set in $\mathbb{R}$ has Lebesgue measure zero. In fact, any countable set has this property, since by σ-additivity of $\ell$, we have
$$\ell(\{a_1, a_2, \ldots\}) = \ell\Big(\bigcup_{i=1}^{\infty} \{a_i\}\Big) \leq \sum_{i=1}^{\infty} \ell(\{a_i\}) = 0$$
for any $(a_m) \in [a,b]^\infty$. For instance, we have $\ell(\mathbb{Q}) = 0$.$^{13}$
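A complementary way to see why a countable set is negligible is to cover its $i$-th point by an interval of length $\varepsilon/2^i$, so that the total length of the cover stays below $\varepsilon$. This covering argument is a standard alternative to the σ-additivity computation above, not the argument of the text, and the enumeration and function names below are assumptions made for the illustration.

```python
from fractions import Fraction

def rationals_in_unit_interval(n):
    """First n rationals in [0, 1] in a simple enumeration by increasing denominator."""
    seen, out = set(), []
    d = 1
    while len(out) < n:
        for num in range(d + 1):
            r = Fraction(num, d)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == n:
                    break
        d += 1
    return out

def cover_length(points, eps):
    """Total length of the intervals (a_i - eps/2^i, a_i], i = 1, 2, ..., one per point."""
    return sum((Fraction(eps) / 2 ** i for i, _ in enumerate(points, start=1)), Fraction(0))

points = rationals_in_unit_interval(50)
eps = Fraction(1, 100)
print(points[:5])
print(cover_length(points, eps) < eps)
```

However many points are enumerated, the total length is $\varepsilon(1 - 2^{-n}) < \varepsilon$, and $\varepsilon$ can be taken as small as desired.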
3.5 More on the Lebesgue Measure
The previous subsection contains just about all you need to know about the Lebesgue measure to follow the subsequent development. So, if you wish to get to the core of probability theory right away, you may proceed at this point directly to Section 5. The present subsection aims at completing the above discussion by going over a few highlights of the Lebesgue measure theory. The presentation takes place mostly by means of exercises.
Exercise 27.$^{H}$ (Translation Invariance of $\ell$) For any $(S, \alpha) \in \mathcal{B}(\mathbb{R}) \times \mathbb{R}$, show that $\ell(S + \alpha) = \ell(S)$.
Exercise 28.$^{H}$ (a) (Non-atomicity of $\ell$) Show that, for any $A \in \mathcal{B}(\mathbb{R})$ with $\ell(A) > 0$, there exists a $B \in \mathcal{B}(\mathbb{R})$ with $B \subseteq A$ and $0 < \ell(B) < \ell(A)$.
(b) Give an example of a measure on $\mathcal{B}(\mathbb{R})$ which does not possess either of the properties mentioned in part (a) and Exercise 27.
13 While various attempts at formulating (what we now call) the Lebesgue measure were made prior to the contributions of Emile Borel and Henri Lebesgue (in their respective doctoral theses of 1894 and 1902), these attempts were not brought to fruition precisely because they too assigned measure zero to countably infinite sets, an implication that was deemed “absurd” by the mathematical community of the day. (Even the otherwise revolutionary Cantor was no exception to this.) In succession, Borel and Lebesgue set the theory on a completely rigorous foundation, and as the structure of countable sets was better understood in time, it was eventually accepted that an infinite set can be deemed “very small,” in fact “negligible,” from the measure-theoretic perspective. (See Hawkins (1980), especially pp. 172-180, for a beautiful survey on the origins of the theory of the Lebesgue measure and integral.)
The situation may at first seem somewhat reminiscent of Cantor’s countability theory viewing countably infinite sets as “smaller” than uncountable sets, but this is misleading. For, there are in fact uncountable sets in $[a,b]$ which have Lebesgue measure zero. While such sets are somewhat esoteric, and will not concern us here, you should make note of the fact that the “relative size” of a subset of the real line from the “countability” and “measure” perspectives may well be radically different.
It is worth noting that the probability space $([0,1], \mathcal{B}[0,1], \ell)$ is not complete, that is, there are $\ell$-null sets $A$ in $\mathcal{B}[0,1]$ such that $B \notin \mathcal{B}[0,1]$ for some $B \subset A$. (Here by an $\ell$-null set $A$, we mean any Borel subset $A$ of $[0,1]$ with $\ell(A) = 0$.) However, we can “complete” this space in a straightforward manner. Define
$$\mathcal{L}[0,1] := \{S \cup B : S \in \mathcal{B}[0,1] \text{ and } B \text{ is a subset of an } \ell\text{-null event}\}.$$
(Any member of $\mathcal{L}[0,1]$ is said to be a Lebesgue measurable set.) Now define $\bar{\ell}(S \cup B) := \ell(S)$ for any $S \in \mathcal{B}[0,1]$ and any subset $B$ of an $\ell$-null event. Then $([0,1], \mathcal{L}[0,1], \bar{\ell})$ is a complete probability space.$^{14}$ This space, called the Lebesgue probability space, extends $([0,1], \mathcal{B}[0,1], \ell)$ in the sense that $\mathcal{B}[0,1] \subseteq \mathcal{L}[0,1]$ and $\bar{\ell}|_{\mathcal{B}[0,1]} = \ell$. Moreover, it is the smallest such extension in the sense that if $([0,1], \Sigma, \mu)$ is any complete probability space with $\mathcal{B}[0,1] \subseteq \Sigma$ and $\mu|_{\mathcal{B}[0,1]} = \ell$, then $\mathcal{L}[0,1] \subseteq \Sigma$.
Curiously, $\mathcal{L}[0,1]$ is much larger than $\mathcal{B}[0,1]$.$^{15}$ And yet, there are still sets in $[0,1]$ which do not belong to $\mathcal{L}[0,1]$, that is, there are sets that are not Lebesgue measurable. The following exercise walks you through a proof of this fact.
Exercise 29.$^{H}$ For any set $S$ in $[0,1]$ and any $\alpha \in [0,1]$, let us agree to write $S \oplus \alpha$ for the set $\{t \in [0,1] : t = s + \alpha \pmod 1 \text{ and } s \in S\}$.$^{16}$
(a) Show that $S \oplus \alpha$ is Lebesgue measurable if $S$ is Lebesgue measurable, and in this case
$$\bar{\ell}(S \oplus \alpha) = \bar{\ell}(S).$$
Now define the equivalence relation $\approx$ on $[0,1]$ by $\alpha \approx \beta$ iff $\alpha - \beta \in \mathbb{Q}$. Use the Axiom of Choice to select exactly one element from each of the induced equivalence classes, and denote the resulting collection by $S$. Enumerate next the rationals in $[0,1]$ as $\{r_1, r_2, \ldots\}$, and define $S_m := S \oplus r_m$ for each $m$.
(b) Show that $\{S_1, S_2, \ldots\}$ is a partition of $[0,1]$.
(c) Use parts (a) and (b) to conclude that we would have $\bar{\ell}([0,1]) \in \{0, \infty\}$ if $S$ were Lebesgue measurable. Thus, $S$ cannot be Lebesgue measurable.$^{17}$
(d) Prove Vitali’s Theorem: There is no probability space $([0,1], 2^{[0,1]}, p)$ such that $p(S \oplus \alpha) = p(S)$ for all $S \subseteq [0,1]$.
14 Quiz. Prove!
15 The cardinality of $\mathcal{L}[0,1]$ is strictly larger than that of $\mathcal{B}[0,1]$. (The cardinality of $\mathcal{B}[0,1]$ is the same as that of $\mathbb{R}$.) This is clearly not the right place to prove these facts. If you are interested, have a look at Hewitt and Stromberg (1965), pp. 133-134.
16 For any $a, b \in [0,1]$, $a + b \pmod 1$ equals $a + b$ if $a + b \leq 1$, and $a + b - 1$ otherwise.
17 More generally, every set of positive Lebesgue measure in $[0,1]$ (or in $\mathbb{R}$) contains a Lebesgue nonmeasurable subset. The present proof, which is due to Giuseppe Vitali, can easily be modified to establish this stronger statement. Thomas (1985) provides an alternative proof that derives from basic graph theory.
Note. Lebesgue nonmeasurable sets cannot be found by the finite constructive method. Loosely said, Solovay (1970) has shown that the existence of such a set in $[0,1]$ cannot be proved (within the axiomatic system of standard set theory) without invoking the Axiom of Choice. (If you’re interested in this sort of thing, you may want to read the expository account of Briggs and Schaffter (1979).)
Vitali’s Theorem shows that the use of the probability spaces that take as the event space the power set of the sample space may sometimes be seriously limited. Insight: The σ-algebra technology is indispensable for the development of probability theory.
4 The Sierpinski Class Lemma
In this short section we provide a proof of the uniqueness part of Carathéodory’s Extension Theorem. As you will see later, the technique we will introduce for this purpose is useful in a good number of other occasions as well.
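Let us agree to call a class $\mathcal{S}$ of subsets of a given nonempty set $X$ a Sierpinski class (or for short, an S-class) on $X$, provided that
(i) if $A, B \in \mathcal{S}$ and $A \subseteq B$, then $B \setminus A \in \mathcal{S}$, and
(ii) if $A_1, A_2, \ldots \in \mathcal{S}$ and $A_1 \subseteq A_2 \subseteq \cdots$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{S}$.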
The smallest S-class on $X$ that contains a given class of subsets of $X$, say $\mathcal{A}$, is called the S-class generated by $\mathcal{A}$, and is denoted by $s(\mathcal{A})$. It is not difficult to verify that such a class exists. Indeed, we have
$$s(\mathcal{A}) = \bigcap \{\mathcal{C} \subseteq 2^X : \mathcal{A} \subseteq \mathcal{C} \text{ and } \mathcal{C} \text{ is an S-class on } X\}.$$
This follows from the fact that the intersection of any collection of S-classes on $X$ is again an S-class on $X$.
The following (easy) exercise reports a useful observation about S-classes that we will need shortly.
Exercise 30.$^{H}$ Let $X$ be a nonempty set, and $\mathcal{S}$ an S-class on $X$. Prove that if $X \in \mathcal{S}$ and $\mathcal{S}$ is closed under taking finite intersections, then $\mathcal{S}$ must be a σ-algebra.
Here comes a major result that we shall later use again and again.
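The Sierpinski Class Lemma.$^{18}$ Let $X$ be any nonempty set, and let $\mathcal{A} \subseteq 2^X$ be closed under taking finite intersections with $X \in \mathcal{A}$. If $\mathcal{S}$ is an S-class on $X$ such that $\mathcal{A} \subseteq \mathcal{S}$, then $\sigma(\mathcal{A}) \subseteq \mathcal{S}$.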
Proof. Let $\mathcal{S}_0 := s(\mathcal{A})$. Obviously, $X \in \mathcal{S}_0$. Thus, by Exercise 30, if we can show that $\mathcal{S}_0$ is closed under taking finite intersections, then we can conclude that $\mathcal{S}_0$ is a σ-algebra. From this it would follow that $\sigma(\mathcal{A}) \subseteq \mathcal{S}_0 \subseteq \mathcal{S}$, as we seek.$^{19}$
18 This result is often referred to as Dynkin’s π-λ Theorem (where an S-class is instead called a λ-system). However, historically speaking, I think it is more suitable to use the terminology we adopt here, for even a stronger result is proved by Sierpinski (1928), albeit in a non-probabilistic context. (I learned this from Bert Fristedt.)
19 So, my objective is to derive the statement
$$A \cap B \in \mathcal{S}_0 \text{ for all } (A, B) \in \mathcal{S}_0 \times \mathcal{S}_0,$$
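Define
$$\mathcal{S}_1 := \{A \subseteq X : A \cap B \in \mathcal{S}_0 \text{ for all } B \in \mathcal{A}\}.$$
By hypothesis, we have $\mathcal{A} \subseteq \mathcal{S}_1$. Moreover, $\mathcal{S}_1$ is an S-class on $X$. Indeed, if $A, C \in \mathcal{S}_1$ and $A \subseteq C$, then
$$(C \setminus A) \cap B = (C \cap B) \setminus (A \cap B) \in \mathcal{S}_0 \text{ for all } B \in \mathcal{A},$$
and if $A_1, A_2, \ldots \in \mathcal{S}_1$ and $A_1 \subseteq A_2 \subseteq \cdots$, then
$$\Big(\bigcup_{i=1}^{\infty} A_i\Big) \cap B = \bigcup_{i=1}^{\infty} (A_i \cap B) \in \mathcal{S}_0 \text{ for all } B \in \mathcal{A}.$$
It follows that $\mathcal{S}_0 \subseteq \mathcal{S}_1$, that is, $A \cap B \in \mathcal{S}_0$ for all $(A, B) \in \mathcal{S}_0 \times \mathcal{A}$.
Now define
$$\mathcal{S}_2 := \{B \subseteq X : A \cap B \in \mathcal{S}_0 \text{ for all } A \in \mathcal{S}_0\}.$$
By what is established in the previous paragraph, we have $\mathcal{A} \subseteq \mathcal{S}_2$. But, again, one can easily check that $\mathcal{S}_2$ is an S-class on $X$. Therefore, $\mathcal{S}_0 \subseteq \mathcal{S}_2$, that is, $A \cap B \in \mathcal{S}_0$ for all $A, B \in \mathcal{S}_0$. By induction, it follows that $\mathcal{S}_0$ is closed under taking finite intersections, and we are done.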
What is the point of all this? Well, the idea is the following. If we learned somehow that a property holds for all sets in a class $\mathcal{A}$ which contains the sample space, and is closed under taking finite intersections, and if, in addition, we managed to show that the class of all sets for which this property is true is an S-class, then we may use the Sierpinski Class Lemma to conclude that all sets in the σ-algebra generated by $\mathcal{A}$ actually belong to the latter class, and hence satisfy the property in question. Since it is usually easier to work with S-classes rather than σ-algebras, this observation may, in turn, provide help when one needs to “go from a given set to the σ-algebra generated by that set.” To illustrate, consider the following claim:
Proposition 4. Let $X$ and $\mathcal{A}$ be as in the Sierpinski Class Lemma. If $p$ and $q$ are two finite measures on $\sigma(\mathcal{A})$ such that $p|_{\mathcal{A}} = q|_{\mathcal{A}}$, then $p = q$.
A moment’s reflection will show that this is even stronger than the uniqueness part of Carathéodory’s Extension Theorem (for we do not require here $\mathcal{A}$ to be an algebra).

from the statement
$$A \cap B \in \mathcal{S}_0 \text{ for all } (A, B) \in \mathcal{A} \times \mathcal{A},$$
which is true by hypothesis. Watch out for a very pretty trick! I will first prove the intermediate statement
$$A \cap B \in \mathcal{S}_0 \text{ for all } (A, B) \in \mathcal{S}_0 \times \mathcal{A},$$
and then deal the final blow by using this intermediate step.
How does one prove something like this? Let’s use the idea outlined informally above. Define $\mathcal{S} := \{S \subseteq X : p(S) = q(S)\}$. Using Proposition 2, it is easy to verify that $\mathcal{S}$ is an S-class on $X$. But, by hypothesis, the property $p(S) = q(S)$ holds for all $S$ in $\mathcal{A}$. By the Sierpinski Class Lemma, then, $\sigma(\mathcal{A}) \subseteq \mathcal{S}$, that is, $p(S) = q(S)$ holds for all $S \in \sigma(\mathcal{A})$, and we are done. (Nice trick, no?)
Warning. The uniqueness result reported in Proposition 4 is not valid for infinite measures in general. However, if $X$ can be written as a countable union of disjoint sets $X_i$, and $p(X_i) = q(X_i)$ for each $i$, then Proposition 4 applies even though $p(X) = q(X) = \infty$.
We conclude by noting that closedness of $\mathcal{A}$ under taking finite intersections is crucial for Proposition 4. To see this, let $X := \{a, b, c, d\}$ and $\mathcal{A} := \{\{a,b\}, \{b,c\}\}$ so that $\sigma(\mathcal{A}) = 2^X$. Now let $p$ be the probability measure on $2^X$ that assigns probability $\frac{1}{2}$ to the outcomes $b$ and $d$, and let $q$ be the probability measure that assigns probability $\frac{1}{2}$ to the outcomes $a$ and $c$. Clearly, $p$ and $q$ are probability measures on $2^X$ with $p = q$ on $\mathcal{A}$ but $p \neq q$ in general. (Compare with Proposition 4.) What goes wrong here is that the Sierpinski Class Lemma does not work when $\mathcal{A}$ is not closed
under taking finite intersections. Indeed, $\mathcal{S} = \{\emptyset, \{a,b\}, \{b,c\}\}$ is a superset of $\mathcal{A}$ which is an S-class, and yet we have $\sigma(\mathcal{A}) = 2^X \not\subseteq \mathcal{S}$ in this example. Notice
that the problem would disappear if we replaced $\mathcal{A}$ with $\mathcal{A}' = \{\{a,b\}, \{b,c\}, \{b\}, X\}$. Since $\mathcal{A}'$ is closed under taking finite intersections, by the Sierpinski Class Lemma, any two probability measures on $2^X$ that agree on $\mathcal{A}'$ must agree on $\sigma(\mathcal{A}') = 2^X$.$^{20}$
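On a four-point set, the classes discussed above are small enough to enumerate. The sketch below is a brute-force check under the stated definitions; the function names `generated_s_class`, `generated_sigma_algebra` and `measure` are illustrative. It generates the S-class and the σ-algebra from a given class and compares the two measures $p$ and $q$ of the example.

```python
from fractions import Fraction

def generated_s_class(sets):
    """Smallest S-class containing `sets` on a finite ground set.
    In a finite class every increasing sequence is eventually constant, so closure
    under increasing unions is automatic; only differences B - A with A a subset
    of B need to be added."""
    cls = {frozenset(s) for s in sets}
    changed = True
    while changed:
        changed = False
        for A in list(cls):
            for B in list(cls):
                if A <= B and (B - A) not in cls:
                    cls.add(B - A)
                    changed = True
    return cls

def generated_sigma_algebra(sets, X):
    """Smallest class containing `sets` that is closed under complements and unions."""
    X = frozenset(X)
    cls = {frozenset(s) for s in sets} | {X, frozenset()}
    changed = True
    while changed:
        changed = False
        for A in list(cls):
            for B in list(cls):
                for C in (X - A, A | B):
                    if C not in cls:
                        cls.add(C)
                        changed = True
    return cls

def measure(weights, S):
    """Measure of S when each outcome t carries the point mass weights[t]."""
    return sum((weights[t] for t in S), Fraction(0))

X = {"a", "b", "c", "d"}
A = [{"a", "b"}, {"b", "c"}]
A_prime = A + [{"b"}, X]
half = Fraction(1, 2)
p = {"a": 0, "b": half, "c": 0, "d": half}
q = {"a": half, "b": 0, "c": half, "d": 0}

print(len(generated_s_class(A)), len(generated_sigma_algebra(A, X)))
print(len(generated_s_class(A_prime)))
print([measure(p, S) == measure(q, S) for S in A], measure(p, {"b"}) == measure(q, {"b"}))
```

The class generated by $\mathcal{A}$ stays far smaller than $2^X$, whereas adding $\{b\}$ and $X$ (which makes the generating class closed under finite intersections) forces the generated S-class to be all of $2^X$, as the lemma predicts.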
Exercise 31. If $X$ is a metric space and $p, q \in \mathcal{P}(X)$ with $p(O) = q(O)$ for all open subsets $O$ of $X$, then $p = q$. Give two proofs of this, one that uses the uniqueness part of Carathéodory’s Extension Theorem, and another that uses the Sierpinski Class Lemma.
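Exercise 32.$^{H}$ Let $X$ be a nonempty set. A class $\mathcal{M} \subseteq 2^X$ is said to be a monotone class on $X$ if, for any $(A_m) \in \mathcal{M}^\infty$, $A_1 \subseteq A_2 \subseteq \cdots$ implies $\bigcup_{i=1}^{\infty} A_i \in \mathcal{M}$ and $A_1 \supseteq A_2 \supseteq \cdots$ implies $\bigcap_{i=1}^{\infty} A_i \in \mathcal{M}$.
(a) Show that a monotone class on $X$ which is an algebra is a σ-algebra on $X$.
(b) Show that if $\mathcal{A} \subseteq 2^X$ is an algebra, the smallest monotone class that contains $\mathcal{A}$, denoted as $m(\mathcal{A})$, must be an algebra on $X$.
(c) (Halmos) Prove the Monotone Class Lemma: If $\mathcal{A}$ is an algebra on $X$, and if $\mathcal{M}$ is a monotone class on $X$, then $\mathcal{A} \subseteq \mathcal{M}$ implies $\sigma(\mathcal{A}) \subseteq \mathcal{M}$.
(d) Prove the uniqueness part of Carathéodory’s Extension Theorem by using the Monotone Class Lemma.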

20 For concreteness, here is a direct proof. Let $p$ and $q$ be two such probability measures. Then $p$ and $q$ agree on both $\{a,b\}$ and $\{b\}$ so that $p(\{a\}) = p(\{a,b\}) - p(\{b\}) = q(\{a,b\}) - q(\{b\}) = q(\{a\})$. One can similarly show that $p(\{c\}) = q(\{c\})$. Finally, these measures agree on $\{d\}$ as well, because
$$p(\{d\}) = p(X) - \sum_{t \in \{a,b,c\}} p(\{t\}) = q(X) - \sum_{t \in \{a,b,c\}} q(\{t\}) = q(\{d\}).$$
5 Random Variables
One is often interested in a particular characteristic of the outcome of a random experiment. To deal with such situations we need to transform a given probability space (that models the mother experiment) to another probability space the sample space of which is a subset of $\mathbb{R}$ (or a more complex metric space). This transformation is done by means of a random variable. For instance, consider the experiment of tossing (independently) two fair dice, and suppose that for some reason we are interested in the sum of the faces of these dice. We could model here the mother experiment by means of the probability space $(X, 2^X, p)$ where $X := \{1, \ldots, 6\}^2$ and $p(S) := |S|/36$ for any $S \in 2^X$. On the other hand, this is not immediately useful, for we are interested in the experiment only insofar as its implications for the sum of the faces of the two dice are concerned. To obtain the probability space that is tailored for our purposes here, we would use the map $x : X \to \{2, \ldots, 12\}$ which is defined by $x(i,j) := i + j$. (This map is an example of a random variable.) Indeed, the probability space we are after is none other than $(Y, 2^Y, q)$, where $Y = \{2, \ldots, 12\}$ and $q(S) := p(\{\omega \in X : x(\omega) \in S\})$ for each $S \in 2^Y$. Of course, we could get to this space directly by defining $q(S) := \sum_{a \in S} \frac{1}{36}(6 - |7 - a|)$ for each $S \in 2^Y$, but as you will see, the previous method is far superior.
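The two descriptions of $q$ in the dice example can be compared by enumeration. In the sketch below, the names `p`, `x`, `q` and `q_direct` mirror the symbols of the text, while the representation of outcomes as pairs of integers is an assumption made for the illustration.

```python
from fractions import Fraction
from itertools import product

X = list(product(range(1, 7), repeat=2))   # mother sample space {1,...,6}^2
Y = range(2, 13)                           # possible sums {2,...,12}

def p(S):
    """p(S) = |S| / 36 for an event S given as a collection of distinct outcomes in X."""
    return Fraction(len(S), 36)

def x(w):
    """The random variable: sum of the two faces."""
    return w[0] + w[1]

def q(S):
    """Induced probability q(S) = p({w in X : x(w) in S})."""
    return p([w for w in X if x(w) in S])

def q_direct(S):
    """Direct formula: the sum over a in S of (6 - |7 - a|) / 36."""
    return sum((Fraction(6 - abs(7 - a), 36) for a in S), Fraction(0))

for a in Y:
    print(a, q({a}), q_direct({a}))
```

The induced definition needs no separate counting argument for each sum: once the map $x$ is specified, $q$ is read off from $p$ through inverse images.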
None of this is really new to you, so let us move on to the formal development.
5.1 Random Variables as Measurable Functions
Here is the formal definition of a random variable.
Definition. Let $(X, \Sigma)$ be a measurable space. A mapping $x : X \to \mathbb{R}$ such that $x^{-1}(B) \in \Sigma$ for every Borel subset $B$ of $\mathbb{R}$ is called a random variable on $(X, \Sigma)$. More generally, if $Y$ is a metric space, and $x$ is a map from $X$ into $Y$ such that $x^{-1}(B) \in \Sigma$ for every $B \in \mathcal{B}(Y)$, then $x$ is called a $Y$-valued random variable on $(X, \Sigma)$.$^{21}$
Notation. In this book the set of all random variables on a measurable space $(X, \Sigma)$ is denoted as $RV(X, \Sigma)$. Moreover, we define
$$RV_+(X, \Sigma) := \{x \in RV(X, \Sigma) : x \geq 0\}.$$
(This notation is not standard in the literature.)
A few remarks on terminology are in order.
Remark 1. [1] One often talks about a random variable “on $(X, \Sigma, p)$,” but strictly speaking, this means that $x$ is a random variable on $(X, \Sigma)$. Indeed, the measure $p$
21 Thus, by convention, I call an $\mathbb{R}$-valued random variable simply a “random variable.”