4.1.3 Refer to Exercise 3.3.13, in Chapter 3, and determine the d.f.’s corre-
sponding to the p.d.f.’s given there.
4.1.4 Refer to Exercise 3.3.14, in Chapter 3, and determine the d.f.’s corre-
sponding to the p.d.f.’s given there.
4.1.5 Let X be an r.v. with d.f. F. Determine the d.f. of the following r.v.’s: −X, X², aX + b, XI_{[a,b)}(X) when:
i) X is continuous and F is strictly increasing;
ii) X is discrete.
4.1.6 Refer to the proof of Theorem 1 (iv) and show that we may assume that xₙ ↓ −∞ (xₙ ↑ ∞) instead of xₙ → −∞ (xₙ → ∞).
4.1.7 Let f and F be the p.d.f. and the d.f., respectively, of an r.v. X. Then
show that F is continuous, and dF(x)/dx = f(x) at the continuity points x of f.
4.1.8
i) Show that the following function F is a d.f. (Logistic distribution) and derive the corresponding p.d.f., f:

F(x) = 1 / (1 + e^{−(αx+β)}),  x ∈ ℝ, α > 0, β ∈ ℝ;

ii) Show that f(x) = αF(x)[1 − F(x)].
4.1.9 Refer to Exercise 3.3.17 in Chapter 3 and determine the d.f. F corre-
sponding to the p.d.f. f given there. Write out the expressions of F and f for
n = 2 and n = 3.
4.1.10 If X is an r.v. distributed as N(3, 0.25), use Table 3 in Appendix III in
order to compute the following probabilities:
i) P(X < −1);
ii) P(X > 2.5);
iii) P(−0.5 < X < 1.3).
4.1.11 The distribution of IQ’s of the people in a given group is well approximated by the Normal distribution with μ = 105 and σ = 20. What proportion of the individuals in the group in question has an IQ:
i) At least 150?
ii) At most 80?
iii) Between 95 and 125?
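The three proportions can be checked numerically; the sketch below assumes SciPy is available (the text itself, of course, relies on the Normal tables).

```python
# Exercise 4.1.11, numerical check (a sketch, assuming SciPy).
from scipy.stats import norm

X = norm(105, 20)                 # mu = 105, sigma = 20
print(X.sf(150))                  # i)   P(X >= 150) ~ 0.0122
print(X.cdf(80))                  # ii)  P(X <= 80)  ~ 0.1056
print(X.cdf(125) - X.cdf(95))     # iii) P(95 < X < 125) ~ 0.5328
```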
4.1.12 A certain manufacturing process produces light bulbs whose life length (in hours) is an r.v. X distributed as N(2,000, 200²). A light bulb is supposed to be defective if its lifetime is less than 1,800. If 25 light bulbs are tested, what is the probability that at most 15 of them are defective? (Use the required independence.)
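A sketch of the computation (assuming SciPy): each bulb is defective with probability p = Φ((1,800 − 2,000)/200) = Φ(−1), and by the required independence the number of defectives among 25 is B(25, p).

```python
# Exercise 4.1.12, numerical sketch (assuming SciPy).
from scipy.stats import norm, binom

p = norm.cdf((1800 - 2000) / 200)   # P(lifetime < 1800) = Phi(-1) ~ 0.1587
print(binom.cdf(15, 25, p))         # P(at most 15 of 25 defective) ~ 0.9999999
```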
4.1.13 A manufacturing process produces 1/2-inch ball bearings, which are assumed to be satisfactory if their diameter lies in the interval 0.5 ± 0.0006 and defective otherwise. A day’s production is examined, and it is found that the distribution of the actual diameters of the ball bearings is approximately normal with mean μ = 0.5007 inch and σ = 0.0005 inch. Compute the proportion of defective ball bearings.
4.1.14 If X is an r.v. distributed as N(μ, σ²), find the value of c (in terms of μ and σ) for which P(X < c) = 2 − 9P(X > c).
4.1.15 Refer to the Weibull p.d.f., f, given in Exercise 3.3.19 in Chapter 3 and do the following:
i) Calculate the corresponding d.f. F and the reliability function R(x) = 1 − F(x);
ii) Also, calculate the failure (or hazard) rate H(x) = f(x)/R(x), and draw its graph for α = 1 and β = 1/2, 1, 2;
iii) For s and t > 0, calculate the probability P(X > s + t | X > t), where X is an r.v. having the Weibull distribution;
iv) What do the quantities F(x), R(x), H(x) and the probability in part (iii) become in the special case of the Negative Exponential distribution?
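For orientation, here is a worked sketch under the assumption that the Weibull p.d.f. of Exercise 3.3.19 has the form f(x) = αβx^{β−1}e^{−αx^β} for x > 0 (an assumption; the reader should verify the parametrization actually used there):

```latex
\[
F(x) = \int_0^x \alpha\beta t^{\beta-1} e^{-\alpha t^{\beta}}\,dt
     = 1 - e^{-\alpha x^{\beta}}, \qquad
R(x) = e^{-\alpha x^{\beta}}, \qquad
H(x) = \frac{f(x)}{R(x)} = \alpha\beta x^{\beta-1}.
\]
```

Under the same assumption, P(X > s + t | X > t) = R(s + t)/R(t) = e^{−α[(s+t)^β − t^β]}; for β = 1, the Negative Exponential case, this reduces to e^{−αs}, free of t, which is the memoryless property.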
4.2 The d.f. of a Random Vector and Its Properties—Marginal and Conditional
d.f.’s and p.d.f.’s
For the case of a two-dimensional r. vector, a result analogous to Theorem 1 can be established. So consider the case that k = 2. We then have X = (X₁, X₂)′ and the d.f. F (or F_X or F_{X₁,X₂}) of X, or the joint distribution function of X₁, X₂, is F(x₁, x₂) = P(X₁ ≤ x₁, X₂ ≤ x₂). Then the following theorem holds true.
THEOREM 4 With the above notation we have
i) 0 ≤ F(x₁, x₂) ≤ 1 for all x₁, x₂ ∈ ℝ.
ii) The variation of F over rectangles with sides parallel to the axes, given in Fig. 4.2, is ≥ 0.
iii) F is continuous from the right with respect to each of the coordinates x₁, x₂, or both of them jointly.
[Figure 4.2: The rectangle with vertices (x₁, y₁), (x₂, y₁), (x₁, y₂), (x₂, y₂); the variation V of F over the rectangle is F(x₁, y₁) + F(x₂, y₂) − F(x₁, y₂) − F(x₂, y₁).]
iv) If both x₁, x₂ → ∞, then F(x₁, x₂) → 1, and if at least one of x₁, x₂ → −∞, then F(x₁, x₂) → 0. We express this by writing F(∞, ∞) = 1, F(−∞, x₂) = F(x₁, −∞) = F(−∞, −∞) = 0, where −∞ < x₁, x₂ < ∞.
PROOF
i) Obvious.
ii) V = P(x₁ < X₁ ≤ x₂, y₁ < X₂ ≤ y₂) and is hence, clearly, ≥ 0.
iii) Same as in Theorem 3. (If x = (x₁, x₂)′ and zₙ = (x₁ₙ, x₂ₙ)′, then zₙ ↓ x means x₁ₙ ↓ x₁, x₂ₙ ↓ x₂.)
iv) If x₁, x₂ ↑ ∞, then (−∞, x₁] × (−∞, x₂] ↑ ℝ², so that F(x₁, x₂) → P(S) = 1. If at least one of x₁, x₂ goes (↓) to −∞, then (−∞, x₁] × (−∞, x₂] ↓ ∅, hence F(x₁, x₂) → P(∅) = 0. ᭡
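The nonnegativity in part (ii) can be illustrated numerically; the sketch below (assuming SciPy) evaluates the variation of a Bivariate Normal d.f. over a rectangle, which by the proof equals the probability of that rectangle.

```python
# Theorem 4(ii), numerical illustration (a sketch, assuming SciPy).
from scipy.stats import multivariate_normal

F = multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]]).cdf
x1, x2, y1, y2 = -1.0, 0.5, -0.5, 1.0
V = F([x1, y1]) + F([x2, y2]) - F([x1, y2]) - F([x2, y1])
print(V)   # positive: the probability of the rectangle (x1, x2] x (y1, y2]
```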
REMARK 3 The function F(x₁, ∞) = F₁(x₁) is the d.f. of the random variable X₁. In fact,

F(x₁, ∞) = lim_{xₙ↑∞} P(X₁ ≤ x₁, X₂ ≤ xₙ) = P(X₁ ≤ x₁, −∞ < X₂ < ∞) = P(X₁ ≤ x₁) = F₁(x₁).

Similarly F(∞, x₂) = F₂(x₂) is the d.f. of the random variable X₂. F₁, F₂ are called marginal d.f.’s.
REMARK 4 It should be pointed out here that results like those discussed in parts (i)–(iv) in Remark 1 still hold true here (appropriately interpreted). In particular, part (iv) says that F(x₁, x₂) has second order partial derivatives and

∂²F(x₁, x₂)/∂x₁∂x₂ = f(x₁, x₂)

at continuity points of f.
For k > 2, we have a theorem strictly analogous to Theorems 3 and 6 and also remarks such as Remark 1(i)–(iv) following Theorem 3. In particular, the analog of (iv) says that F(x₁, ..., xₖ) has kth order partial derivatives and

∂ᵏF(x₁, ..., xₖ)/∂x₁∂x₂ ··· ∂xₖ = f(x₁, ..., xₖ)

at continuity points of f, where F, or F_X, or F_{X₁,···,Xₖ}, is the d.f. of X, or the joint distribution function of X₁, ..., Xₖ. As in the two-dimensional case,

Fⱼ(xⱼ) = F(∞, ..., ∞, xⱼ, ∞, ..., ∞)

is the d.f. of the random variable Xⱼ, and if m of the xⱼ’s are replaced by ∞ (1 < m < k), then the resulting function is the joint d.f. of the random variables corresponding to the remaining (k − m) Xⱼ’s. All these d.f.’s are called marginal distribution functions.
In Statement 2, we have seen that if X = (X₁, ..., Xₖ)′ is an r. vector, then Xⱼ, j = 1, 2, ..., k are r.v.’s and vice versa. Then the p.d.f. of X, f(x) = f(x₁, ..., xₖ), is also called the joint p.d.f. of the r.v.’s X₁, ..., Xₖ.
Consider first the case k = 2; that is, X = (X₁, X₂)′, f(x) = f(x₁, x₂), and set

f₁(x₁) = Σ_{x₂} f(x₁, x₂)  or  ∫_{−∞}^{∞} f(x₁, x₂) dx₂,
f₂(x₂) = Σ_{x₁} f(x₁, x₂)  or  ∫_{−∞}^{∞} f(x₁, x₂) dx₁.
Then f₁, f₂ are p.d.f.’s. In fact, f₁(x₁) ≥ 0 and

Σ_{x₁} f₁(x₁) = Σ_{x₁} Σ_{x₂} f(x₁, x₂) = 1,  or  ∫_{−∞}^{∞} f₁(x₁) dx₁ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x₁, x₂) dx₂ dx₁ = 1.
Similarly we get the result for f₂. Furthermore, f₁ is the p.d.f. of X₁, and f₂ is the p.d.f. of X₂. In fact,

P(X₁ ∈ B) = P(X₁ ∈ B, X₂ ∈ ℝ) = Σ_{x₁∈B} Σ_{x₂∈ℝ} f(x₁, x₂) = Σ_{x₁∈B} f₁(x₁),

or

P(X₁ ∈ B) = ∫_B [∫_{−∞}^{∞} f(x₁, x₂) dx₂] dx₁ = ∫_B f₁(x₁) dx₁.
Similarly f₂ is the p.d.f. of the r.v. X₂. We call f₁, f₂ the marginal p.d.f.’s. Now suppose f₁(x₁) > 0. Then define f(x₂|x₁) as follows:

f(x₂|x₁) = f(x₁, x₂)/f₁(x₁).
This is considered as a function of x₂, x₁ being an arbitrary, but fixed, value of X₁ (with f₁(x₁) > 0). Then f(·|x₁) is a p.d.f. In fact, f(x₂|x₁) ≥ 0 and

Σ_{x₂} f(x₂|x₁) = (1/f₁(x₁)) Σ_{x₂} f(x₁, x₂) = f₁(x₁)/f₁(x₁) = 1,

and

∫_{−∞}^{∞} f(x₂|x₁) dx₂ = (1/f₁(x₁)) ∫_{−∞}^{∞} f(x₁, x₂) dx₂ = f₁(x₁)/f₁(x₁) = 1.
In a similar fashion, if f₂(x₂) > 0, we define f(x₁|x₂) by:

f(x₁|x₂) = f(x₁, x₂)/f₂(x₂)
and show that f(·|x₂) is a p.d.f. Furthermore, if X₁, X₂ are both discrete, then f(x₂|x₁) has the following interpretation:

f(x₂|x₁) = f(x₁, x₂)/f₁(x₁) = P(X₁ = x₁, X₂ = x₂)/P(X₁ = x₁) = P(X₂ = x₂ | X₁ = x₁).
Hence P(X₂ ∈ B | X₁ = x₁) = Σ_{x₂∈B} f(x₂|x₁). For this reason, we call f(·|x₁) the conditional p.d.f. of X₂, given that X₁ = x₁ (provided f₁(x₁) > 0). For a similar reason, we call f(·|x₂) the conditional p.d.f. of X₁, given that X₂ = x₂ (provided f₂(x₂) > 0). For the case that the p.d.f.’s f and f₂ are of the continuous type, the conditional p.d.f. f(x₁|x₂) may be given an interpretation similar to the one given above. By assuming (without loss of generality) that h₁, h₂ > 0, one has

(1/h₁) P(x₁ < X₁ ≤ x₁ + h₁ | x₂ < X₂ ≤ x₂ + h₂)
  = [(1/(h₁h₂)) P(x₁ < X₁ ≤ x₁ + h₁, x₂ < X₂ ≤ x₂ + h₂)] / [(1/h₂) P(x₂ < X₂ ≤ x₂ + h₂)]
  = [(1/(h₁h₂)) [F(x₁ + h₁, x₂ + h₂) − F(x₁, x₂ + h₂) − F(x₁ + h₁, x₂) + F(x₁, x₂)]] / [(1/h₂) [F₂(x₂ + h₂) − F₂(x₂)]],
where F is the joint d.f. of X₁, X₂ and F₂ is the d.f. of X₂. By letting h₁, h₂ → 0 and assuming that (x₁, x₂)′ and x₂ are continuity points of f and f₂, respectively, the last expression on the right-hand side above tends to f(x₁, x₂)/f₂(x₂), which was denoted by f(x₁|x₂). Thus for small h₁, h₂, h₁f(x₁|x₂) is approximately equal to P(x₁ < X₁ ≤ x₁ + h₁ | x₂ < X₂ ≤ x₂ + h₂), so that h₁f(x₁|x₂) is approximately the conditional probability that X₁ lies in a small neighborhood (of length h₁) of x₁, given that X₂ lies in a small neighborhood of x₂. A similar interpretation may be given to f(x₂|x₁). We can also define the conditional d.f. of X₂, given X₁ = x₁, by means of
F(x₂|x₁) = Σ_{x₂′ ≤ x₂} f(x₂′|x₁)  or  ∫_{−∞}^{x₂} f(x₂′|x₁) dx₂′,

and similarly for F(x₁|x₂).
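A small numerical illustration of these definitions in the discrete case (the joint probabilities below are hypothetical, chosen only for the example):

```python
# Marginal and conditional p.d.f.'s and d.f.'s for a toy discrete joint.
joint = {(0, 0): 0.10, (0, 1): 0.20,
         (1, 0): 0.30, (1, 1): 0.40}          # f(x1, x2)

f1 = {}                                        # marginal: f1(x1) = sum over x2
for (x1, x2), p in joint.items():
    f1[x1] = f1.get(x1, 0.0) + p

def f_cond(x2, x1):
    """Conditional p.d.f. f(x2 | x1) = f(x1, x2)/f1(x1), for f1(x1) > 0."""
    return joint.get((x1, x2), 0.0) / f1[x1]

def F_cond(x2, x1):
    """Conditional d.f. F(x2 | x1): sum of f(t | x1) over t <= x2."""
    return sum(f_cond(t, x1) for t in (0, 1) if t <= x2)

print(f1)                          # {0: 0.3, 1: 0.7}
print(f_cond(1, 0))                # 0.20/0.30 = 2/3
print(F_cond(0, 0), F_cond(1, 0))  # 1/3, 1.0
```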
The concepts introduced thus far generalize in a straightforward way for k > 2. Thus if X = (X₁, ..., Xₖ)′ with p.d.f. f(x₁, ..., xₖ), then we have called f(x₁, ..., xₖ) the joint p.d.f. of the r.v.’s X₁, X₂, ..., Xₖ. If we sum (integrate) over t of the variables x₁, ..., xₖ keeping the remaining s fixed (t + s = k), the resulting function is the joint p.d.f. of the r.v.’s corresponding to the remaining s variables; that is,

f_{i_1,...,i_s}(x_{i_1}, ..., x_{i_s}) = Σ_{x_{j_1}} ··· Σ_{x_{j_t}} f(x₁, ..., xₖ)  or  ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f(x₁, ..., xₖ) dx_{j_1} ··· dx_{j_t}.
There are

\binom{k}{1} + \binom{k}{2} + ··· + \binom{k}{k−1} = 2ᵏ − 2

such p.d.f.’s, which are also called marginal p.d.f.’s. Also if x_{i_1}, ..., x_{i_s} are such that f_{i_1,...,i_s}(x_{i_1}, ..., x_{i_s}) > 0, then the function (of x_{j_1}, ..., x_{j_t}) defined by

f(x_{j_1}, ..., x_{j_t} | x_{i_1}, ..., x_{i_s}) = f(x₁, ..., xₖ) / f_{i_1,...,i_s}(x_{i_1}, ..., x_{i_s})
is a p.d.f. called the joint conditional p.d.f. of the r.v.’s X_{j_1}, ..., X_{j_t}, given X_{i_1} = x_{i_1}, ..., X_{i_s} = x_{i_s}, or just given X_{i_1}, ..., X_{i_s}. Again there are 2ᵏ − 2 joint conditional p.d.f.’s involving all k r.v.’s X₁, ..., Xₖ. Conditional distribution functions are defined in a way similar to the one for k = 2. Thus
F(x_{j_1}, ..., x_{j_t} | x_{i_1}, ..., x_{i_s}) = Σ_{x′_{j_1} ≤ x_{j_1}, ..., x′_{j_t} ≤ x_{j_t}} f(x′_{j_1}, ..., x′_{j_t} | x_{i_1}, ..., x_{i_s})
or
∫_{−∞}^{x_{j_1}} ··· ∫_{−∞}^{x_{j_t}} f(x′_{j_1}, ..., x′_{j_t} | x_{i_1}, ..., x_{i_s}) dx′_{j_1} ··· dx′_{j_t}.
We now present two examples of marginal and conditional p.d.f.’s, one taken from a discrete distribution and the other taken from a continuous distribution.

EXAMPLE 1 Let the r.v.’s X₁, ..., Xₖ have the Multinomial distribution with parameters n and p₁, ..., pₖ. Also, let s and t be integers such that 1 ≤ s, t < k and s + t = k. Then in the notation employed above, we have:
i)

f_{i_1,...,i_s}(x_{i_1}, ..., x_{i_s}) = [n! / (x_{i_1}! ··· x_{i_s}! (n − r)!)] p_{i_1}^{x_{i_1}} ··· p_{i_s}^{x_{i_s}} q^{n−r},
q = 1 − (p_{i_1} + ··· + p_{i_s}),  r = x_{i_1} + ··· + x_{i_s};

that is, the r.v.’s X_{i_1}, ..., X_{i_s} and Y = n − (X_{i_1} + ··· + X_{i_s}) have the Multinomial distribution with parameters n and p_{i_1}, ..., p_{i_s}, q.
ii)

f(x_{j_1}, ..., x_{j_t} | x_{i_1}, ..., x_{i_s}) = [(n − r)! / (x_{j_1}! ··· x_{j_t}!)] (p_{j_1}/q)^{x_{j_1}} ··· (p_{j_t}/q)^{x_{j_t}},  r = x_{i_1} + ··· + x_{i_s};

that is, the (joint) conditional distribution of X_{j_1}, ..., X_{j_t} given X_{i_1}, ..., X_{i_s} is Multinomial with parameters n − r and p_{j_1}/q, ..., p_{j_t}/q.
DISCUSSION
i) Clearly,

(X_{i_1} = x_{i_1}, ..., X_{i_s} = x_{i_s}) ⊆ (X_{i_1} + ··· + X_{i_s} = r) = (n − Y = r) = (Y = n − r),

so that

(X_{i_1} = x_{i_1}, ..., X_{i_s} = x_{i_s}) = (X_{i_1} = x_{i_1}, ..., X_{i_s} = x_{i_s}, Y = n − r).

Denoting by O the outcome which is the grouping of all n outcomes distinct from those designated by i₁, ..., i_s, we have that the probability of O is q, and the number of its occurrences is Y. Thus, the r.v.’s X_{i_1}, ..., X_{i_s} and Y are distributed as asserted.
ii) We have

f(x_{j_1}, ..., x_{j_t} | x_{i_1}, ..., x_{i_s})
  = f(x_{i_1}, ..., x_{i_s}, x_{j_1}, ..., x_{j_t}) / f_{i_1,...,i_s}(x_{i_1}, ..., x_{i_s})
  = f(x₁, ..., xₖ) / f_{i_1,...,i_s}(x_{i_1}, ..., x_{i_s})
  = {[n!/(x₁! ··· xₖ!)] p₁^{x₁} ··· pₖ^{xₖ}} / {[n!/(x_{i_1}! ··· x_{i_s}! (n − r)!)] p_{i_1}^{x_{i_1}} ··· p_{i_s}^{x_{i_s}} q^{n−r}}
  = [(n − r)!/(x_{j_1}! ··· x_{j_t}!)] (p_{j_1}/q)^{x_{j_1}} ··· (p_{j_t}/q)^{x_{j_t}},

since n − r = n − (x_{i_1} + ··· + x_{i_s}) = x_{j_1} + ··· + x_{j_t} and x₁! ··· xₖ! = x_{i_1}! ··· x_{i_s}! x_{j_1}! ··· x_{j_t}!,

as was to be seen.
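Example 1 can be spot-checked numerically; the sketch below (assuming SciPy) takes k = 3, conditions on X₁, and compares f(x₂, x₃ | x₁) with the asserted Multinomial(n − r; p₂/q, p₃/q) p.d.f.

```python
# Example 1, numerical spot-check (a sketch, assuming SciPy).
from scipy.stats import multinomial, binom

n, p = 8, [0.2, 0.3, 0.5]
x1, x2, x3 = 2, 3, 3                     # x1 + x2 + x3 = n
q = 1 - p[0]

joint = multinomial.pmf([x1, x2, x3], n, p)
marg  = binom.pmf(x1, n, p[0])           # by part (i), X1 and Y = n - X1 are Multinomial(n; p1, q)
cond  = multinomial.pmf([x2, x3], n - x1, [p[1] / q, p[2] / q])
print(abs(joint / marg - cond) < 1e-12)  # True
```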
EXAMPLE 2 Let the r.v.’s X₁ and X₂ have the Bivariate Normal distribution, and recall that their (joint) p.d.f. is given by:
f(x₁, x₂) = [1/(2πσ₁σ₂√(1 − ρ²))]
  × exp{ −[1/(2(1 − ρ²))] [((x₁ − μ₁)/σ₁)² − 2ρ((x₁ − μ₁)/σ₁)((x₂ − μ₂)/σ₂) + ((x₂ − μ₂)/σ₂)²] }.
We saw that the marginal p.d.f.’s f₁, f₂ are N(μ₁, σ₁²), N(μ₂, σ₂²), respectively; that is, X₁, X₂ are also normally distributed. Furthermore, in the process of proving that f(x₁, x₂) is a p.d.f., we rewrote it as follows:
f(x₁, x₂) = [1/(√(2π)σ₁)] exp[−(x₁ − μ₁)²/(2σ₁²)] · [1/(√(2π)σ₂√(1 − ρ²))] exp[−(x₂ − b)²/(2σ₂²(1 − ρ²))],

where

b = μ₂ + ρ(σ₂/σ₁)(x₁ − μ₁).
Hence

f(x₂|x₁) = f(x₁, x₂)/f₁(x₁) = [1/(√(2π)σ₂√(1 − ρ²))] exp[−(x₂ − b)²/(2σ₂²(1 − ρ²))],

which is the p.d.f. of an N(b, σ₂²(1 − ρ²)) r.v. Similarly f(x₁|x₂) is seen to be the p.d.f. of an N(b′, σ₁²(1 − ρ²)) r.v., where

b′ = μ₁ + ρ(σ₁/σ₂)(x₂ − μ₂).
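The conclusion of Example 2 can be checked numerically; the sketch below (assuming SciPy) compares the ratio f(x₁, x₂)/f₁(x₁) with the N(b, σ₂²(1 − ρ²)) density at an arbitrary point.

```python
# Example 2, numerical check (a sketch, assuming SciPy).
from scipy.stats import multivariate_normal, norm

mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.7
cov = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
x1, x2 = 0.3, -1.8

f12 = multivariate_normal([mu1, mu2], cov).pdf([x1, x2])
f1  = norm(mu1, s1).pdf(x1)
b   = mu2 + rho * (s2 / s1) * (x1 - mu1)
print(f12 / f1, norm(b, s2 * (1 - rho**2) ** 0.5).pdf(x2))  # the two agree
```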
Exercises
4.2.1 Refer to Exercise 3.2.17 in Chapter 3 and:
i) Find the marginal p.d.f.’s of the r.v.’s Xⱼ, j = 1, ..., 6;
ii) Calculate the probability that X₁ ≥ 5.
4.2.2 Refer to Exercise 3.2.18 in Chapter 3 and determine:
i) The marginal p.d.f. of each one of X₁, X₂, X₃;
ii) The conditional p.d.f. of X₁, X₂, given X₃; X₁, X₃, given X₂; X₂, X₃, given X₁;
iii) The conditional p.d.f. of X₁, given X₂, X₃; X₂, given X₃, X₁; X₃, given X₁, X₂.
If n = 20, provide expressions for the following probabilities:
iv) P(3X₁ + X₂ ≤ 5);
v) P(X₁ < X₂ < X₃);
vi) P(X₁ + X₂ = 10 | X₃ = 5);
vii) P(3 ≤ X₁ ≤ 10 | X₂ = X₃);
viii) P(X₁ < 3X₂ | X₁ > X₃).
4.2.3 Let X, Y be r.v.’s jointly distributed with p.d.f. f given by f(x, y) = 2/c² if 0 ≤ x ≤ y, 0 ≤ y ≤ c and 0 otherwise.
i) Determine the constant c;
ii) Find the marginal p.d.f.’s of X and Y;
iii) Find the conditional p.d.f. of X, given Y, and the conditional p.d.f. of Y,
given X;
iv) Calculate the probability that X ≤ 1.
4.2.4 Let the r.v.’s X, Y be jointly distributed with p.d.f. f given by f(x, y) = e^{−x−y} I_{(0,∞)×(0,∞)}(x, y). Compute the following probabilities:
i) P(X ≤ x);
ii) P(Y ≤ y);
iii) P(X < Y);
iv) P(X + Y ≤ 3).
4.2.5 If the joint p.d.f. f of the r.v.’s Xⱼ, j = 1, 2, 3, is given by

f(x₁, x₂, x₃) = c³ e^{−c(x₁+x₂+x₃)} I_A(x₁, x₂, x₃),  where A = (0, ∞) × (0, ∞) × (0, ∞),

i) Determine the constant c;
ii) Find the marginal p.d.f. of each one of the r.v.’s Xⱼ, j = 1, 2, 3;

iii) Find the conditional (joint) p.d.f. of X₁, X₂, given X₃, and the conditional p.d.f. of X₁, given X₂, X₃;
iv) Find the conditional d.f.’s corresponding to the conditional p.d.f.’s in (iii).
4.2.6 Consider the function given below:

f(x|y) = y^x e^{−y}/x!  for x = 0, 1, ...; y > 0,  and f(x|y) = 0 otherwise.
i) Show that for each fixed y, f(·|y) is a p.d.f., the conditional p.d.f. of an r.v. X, given that another r.v. Y equals y;
ii) If the marginal p.d.f. of Y is Negative Exponential with parameter λ = 1, what is the joint p.d.f. of X, Y?
iii) Show that the marginal p.d.f. of X is given by f(x) = (1/2)^{x+1} I_A(x), where A = {0, 1, 2, ...}.
4.2.7 Let Y be an r.v. distributed as P(λ) and suppose that the conditional distribution of the r.v. X, given Y = n, is B(n, p). Determine the p.d.f. of X and the conditional p.d.f. of Y, given X = x.
4.2.8 Consider the function f defined as follows:

f(x₁, x₂) = (1/(2π)) exp[−(x₁² + x₂²)/2] + (1/(4π)) x₁³x₂³ e^{−(x₁² + x₂²)/2} I_{[−1,1]×[−1,1]}(x₁, x₂),

and show that:
i) f is a non-Normal Bivariate p.d.f.;
ii) Both marginal p.d.f.’s

f₁(x₁) = ∫_{−∞}^{∞} f(x₁, x₂) dx₂  and  f₂(x₂) = ∫_{−∞}^{∞} f(x₁, x₂) dx₁

are Normal p.d.f.’s.

4.3 Quantiles and Modes of a Distribution
Let X be an r.v. with d.f. F and consider a number p such that 0 < p < 1. A pth quantile of the r.v. X, or of its d.f. F, is a number denoted by x_p and having the following property: P(X ≤ x_p) ≥ p and P(X ≥ x_p) ≥ 1 − p. For p = 0.25 we get a quartile of X, or its d.f., and for p = 0.5 we get a median of X, or its d.f. For illustrative purposes, consider the following simple examples.
EXAMPLE 3 Let X be an r.v. distributed as U(0, 1) and let p = 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80 and 0.90. Determine the respective x_{0.10}, x_{0.20}, x_{0.30}, x_{0.40}, x_{0.50}, x_{0.60}, x_{0.70}, x_{0.80}, and x_{0.90}.
Since for 0 ≤ x ≤ 1, F(x) = x, we get: x_{0.10} = 0.10, x_{0.20} = 0.20, x_{0.30} = 0.30, x_{0.40} = 0.40, x_{0.50} = 0.50, x_{0.60} = 0.60, x_{0.70} = 0.70, x_{0.80} = 0.80, and x_{0.90} = 0.90.
EXAMPLE 4 Let X be an r.v. distributed as N(0, 1) and let p = 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80 and 0.90. Determine the respective x_{0.10}, x_{0.20}, x_{0.30}, x_{0.40}, x_{0.50}, x_{0.60}, x_{0.70}, x_{0.80}, and x_{0.90}.
[Figure 4.3: Typical cases (a)–(e) of determining graphically the pth quantile x_p of a d.f. F. Observe that the figures demonstrate that, as defined, x_p need not be unique.]
From the Normal Tables (Table 3 in Appendix III), by linear interpolation and symmetry, we find: x_{0.10} = −1.282, x_{0.20} = −0.842, x_{0.30} = −0.524, x_{0.40} = −0.253, x_{0.50} = 0, x_{0.60} = 0.253, x_{0.70} = 0.524, x_{0.80} = 0.842, and x_{0.90} = 1.282.
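The table-derived quantiles can be reproduced with the inverse d.f. (a sketch, assuming SciPy):

```python
# Example 4, numerical check via the inverse d.f. (assuming SciPy).
from scipy.stats import norm

for p in (0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90):
    print(p, round(norm.ppf(p), 3))
# 0.1 -1.282, 0.2 -0.842, ..., 0.9 1.282, matching the interpolated values
```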
Knowledge of quantiles x_p for several values of p provides an indication as to how the unit probability mass is distributed over the real line. In Fig. 4.3 various cases are demonstrated for determining graphically the pth quantile of a d.f.

Let X be an r.v. with a p.d.f. f. Then a mode of f, if it exists, is any number
which maximizes f(x). In case f is a p.d.f. which is twice differentiable, a mode
can be found by differentiation. This process breaks down in the discrete cases.
The following theorems answer the question for two important discrete cases.
THEOREM 5 Let X be B(n, p); that is,

f(x) = \binom{n}{x} p^x q^{n−x},  0 < p < 1, q = 1 − p, x = 0, 1, ..., n.

Consider the number (n + 1)p and set m = [(n + 1)p], where [y] denotes the largest integer which is ≤ y. Then if (n + 1)p is not an integer, f(x) has a unique mode at x = m. If (n + 1)p is an integer, then f(x) has two modes obtained for x = m and x = m − 1.
PROOF For x ≥ 1, we have

f(x)/f(x − 1) = [\binom{n}{x} p^x q^{n−x}] / [\binom{n}{x−1} p^{x−1} q^{n−x+1}]
  = {[n!/(x!(n − x)!)] p^x q^{n−x}} / {[n!/((x − 1)!(n − x + 1)!)] p^{x−1} q^{n−x+1}}
  = ((n − x + 1)/x) · (p/q).
That is,

f(x)/f(x − 1) = ((n − x + 1)/x)(p/q).
Hence f(x) > f(x − 1) (f is increasing) if and only if (n − x + 1)p > x(1 − p), or np + p − xp > x − xp, or x < (n + 1)p. Thus if (n + 1)p is not an integer, f(x) keeps increasing for x ≤ m and then decreases, so the maximum occurs at x = m. If (n + 1)p is an integer, then the maximum occurs at x = (n + 1)p, where f(x) = f(x − 1) (from the above calculations). Thus x = (n + 1)p − 1 is a second point which gives the maximum value. ᭡
THEOREM 6 Let X be P(λ); that is,

f(x) = e^{−λ} λ^x / x!,  x = 0, 1, 2, ...; λ > 0.

Then if λ is not an integer, f(x) has a unique mode at x = [λ]. If λ is an integer, then f(x) has two modes obtained for x = λ and x = λ − 1.
PROOF For x ≥ 1, we have
f(x)/f(x − 1) = [e^{−λ} λ^x / x!] / [e^{−λ} λ^{x−1} / (x − 1)!] = λ/x.
Hence f(x) > f(x − 1) if and only if λ > x. Thus if λ is not an integer, f(x) keeps increasing for x ≤ [λ] and then decreases. Then the maximum of f(x) occurs at x = [λ]. If λ is an integer, then the maximum occurs at x = λ. But in this case f(x) = f(x − 1), which implies that x = λ − 1 is a second point which gives the maximum value to the p.d.f. ᭡
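Theorems 5 and 6 are easy to confirm by brute force (a sketch, assuming SciPy):

```python
# Modes of B(n, p) and P(lam) by enumeration of the p.d.f.'s.
from math import floor
from scipy.stats import binom, poisson

n, p = 10, 0.5                        # (n+1)p = 5.5, not an integer: unique mode
probs = [binom.pmf(x, n, p) for x in range(n + 1)]
print(probs.index(max(probs)), floor((n + 1) * p))          # 5 5

lam = 4                               # an integer: two modes, at lam and lam - 1
probs = [poisson.pmf(x, lam) for x in range(20)]
m = max(probs)
print([x for x, q in enumerate(probs) if abs(q - m) < 1e-12])  # [3, 4]
```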
Exercises
4.3.1 Determine the pth quantile x_p for each one of the p.d.f.’s given in Exercises 3.2.13–15, 3.3.13–16 (Exercise 3.2.14 for α = 1/4) in Chapter 3 if p = 0.75, 0.50.
4.3.2 Let X be an r.v. with p.d.f. f symmetric about a constant c (that is, f(c − x) = f(c + x) for all x ∈ ℝ). Then show that c is a median of f.
4.3.3 Draw four graphs—two each for B(n, p) and P(λ)—which represent the possible occurrences for modes of the distributions B(n, p) and P(λ).
4.3.4 Consider the same p.d.f.’s mentioned in Exercise 4.3.1 from the point of view of a mode.
4.4* Justification of Statements 1 and 2
In this section, a rigorous justification of Statements 1 and 2 made in Section
4.1 will be presented. For this purpose, some preliminary concepts and results
are needed and will be also discussed.
DEFINITION 1 A set G in ℝ is called open if for every x in G there exists an open interval containing x and contained in G. Without loss of generality, such intervals may be taken to be centered at x.

It follows from this definition that an open interval is an open set, the entire real line ℝ is an open set, and so is the empty set (in a vacuous manner).

LEMMA 1 Every open set in ℝ is measurable.

PROOF Let G be an open set in ℝ, and for each x ∈ G, consider an open interval centered at x and contained in G. Clearly, the union over x, as x varies in G, of such intervals is equal to G. The same is true if we consider only those intervals corresponding to all rationals x in G. These intervals are countably many and each one of them is measurable; then so is their union. ᭡
DEFINITION 2 A set G in ℝᵐ, m ≥ 1, is called open if for every x in G there exists an open cube in ℝᵐ containing x and contained in G; by the term open “cube” we mean the Cartesian product of m open intervals of equal length. Without loss of generality, such cubes may be taken to be centered at x.

LEMMA 2 Every open set in ℝᵐ is measurable.

PROOF It is analogous to that of Lemma 1. Indeed, let G be an open set in ℝᵐ, and for each x ∈ G, consider an open cube centered at x and contained in G. The union over x, as x varies in G, of such cubes clearly is equal to G. The same is true if we restrict ourselves to x’s in G whose m coordinates are rationals. Then the resulting cubes are countably many, and therefore their union is measurable, since so is each cube. ᭡

DEFINITION 3 Recall that a function g: S ⊆ ℝ → ℝ is said to be continuous at x₀ ∈ S if for every ε > 0 there exists a δ = δ(ε, x₀) > 0 such that |x − x₀| < ε implies |g(x) − g(x₀)| < δ. The function g is continuous in S if it is continuous for every x ∈ S.

It follows from the concept of continuity that ε → 0 implies δ → 0.

LEMMA 3 Let g: ℝ → ℝ be continuous. Then g is measurable.
PROOF By Theorem 5 in Chapter 1 it suffices to show that g⁻¹(G) are measurable sets for all open intervals G in ℝ. Set B = g⁻¹(G). Thus if B = ∅, the assertion is valid, so let B ≠ ∅ and let x₀ be an arbitrary point of B, so that g(x₀) ∈ G. Continuity of g at x₀ implies that for every ε > 0 there exists δ = δ(ε, x₀) > 0 such that |x − x₀| < ε implies |g(x) − g(x₀)| < δ. Equivalently, x ∈ (x₀ − ε, x₀ + ε) implies g(x) ∈ (g(x₀) − δ, g(x₀) + δ). Since g(x₀) ∈ G and G is open, by choosing ε sufficiently small, we can make δ so small that (g(x₀) − δ, g(x₀) + δ) is contained in G. Thus, for such a choice of ε and δ, x ∈ (x₀ − ε, x₀ + ε) implies that (g(x₀) − δ, g(x₀) + δ) ⊂ G. But B (= g⁻¹(G)) is the set of all x ∈ ℝ for which g(x) ∈ G. As all x ∈ (x₀ − ε, x₀ + ε) have this property, it follows that (x₀ − ε, x₀ + ε) ⊂ B. Since x₀ is arbitrary in B, it follows that B is open. Then by Lemma 1, it is measurable. ᭡
The concept of continuity generalizes, of course, to Euclidean spaces of higher dimensions, and then a result analogous to the one in Lemma 3 also holds true.

DEFINITION 4 A function g: S ⊆ ℝᵏ → ℝᵐ (k, m ≥ 1) is said to be continuous at x₀ ∈ ℝᵏ if for every ε > 0 there exists a δ = δ(ε, x₀) > 0 such that ‖x − x₀‖ < ε implies ‖g(x) − g(x₀)‖ < δ. The function g is continuous in S if it is continuous for every x ∈ S. Here ‖x‖ stands for the usual norm in ℝᵏ; i.e., for x = (x₁, ..., xₖ)′, ‖x‖ = (Σ_{i=1}^{k} xᵢ²)^{1/2}, and similarly for the other quantities.

Once again, from the concept of continuity it follows that ε → 0 implies δ → 0.

LEMMA 4 Let g: ℝᵏ → ℝᵐ be continuous. Then g is measurable.
PROOF The proof is similar to that of Lemma 3. The details are presented here for the sake of completeness. Once again, it suffices to show that g⁻¹(G) are measurable sets for all open cubes G in ℝᵐ. Set B = g⁻¹(G). If B = ∅ the assertion is true, and therefore suppose that B ≠ ∅ and let x₀ be an arbitrary point of B. Continuity of g at x₀ implies that for every ε > 0 there exists a δ = δ(ε, x₀) > 0 such that ‖x − x₀‖ < ε implies ‖g(x) − g(x₀)‖ < δ; equivalently, x ∈ S(x₀, ε) implies g(x) ∈ S(g(x₀), δ), where S(c, r) stands for the open sphere with center c and radius r. Since g(x₀) ∈ G and G is open, we can choose ε so small that the corresponding δ is sufficiently small to imply that S(g(x₀), δ) ⊂ G. Thus, for such a choice of ε and δ, x ∈ S(x₀, ε) implies that g(x) ∈ S(g(x₀), δ) ⊂ G. Since B (= g⁻¹(G)) is the set of all x ∈ ℝᵏ for which g(x) ∈ G, and x ∈ S(x₀, ε) implies that g(x) ∈ S(g(x₀), δ), it follows that S(x₀, ε) ⊂ B. At this point, observe that it is clear that there is a cube containing x₀ and contained in S(x₀, ε); call it C(x₀, ε). Then C(x₀, ε) ⊂ B, and therefore B is open. By Lemma 2, it is also measurable. ᭡
We may now proceed with the justification of Statement 1.

THEOREM 7 Let X: (S, A) → (ℝᵏ, Bᵏ) be a random vector, and let g: (ℝᵏ, Bᵏ) → (ℝᵐ, Bᵐ) be measurable. Then g(X): (S, A) → (ℝᵐ, Bᵐ) and is a random vector. (That is, measurable functions of random vectors are random vectors.)
PROOF To prove that [g(X)]⁻¹(B) ∈ A if B ∈ Bᵐ, we have

[g(X)]⁻¹(B) = X⁻¹[g⁻¹(B)] = X⁻¹(B₁),  where B₁ = g⁻¹(B) ∈ Bᵏ

by the measurability of g. Also, X⁻¹(B₁) ∈ A since X is measurable. The proof is completed. ᭡
To this theorem, we have the following

COROLLARY Let X be as above and g be continuous. Then g(X) is a random vector. (That is, continuous functions of random vectors are random vectors.)

PROOF The continuity of g implies its measurability by Lemma 4, and therefore the theorem applies and gives the result. ᭡
DEFINITION 5 For j = 1, ..., k, the jth projection function gⱼ is defined by: gⱼ: ℝᵏ → ℝ and gⱼ(x) = gⱼ(x₁, ..., xₖ) = xⱼ.

It so happens that projection functions are continuous; that is,

LEMMA 5 The coordinate functions gⱼ, j = 1, ..., k, as defined above, are continuous.
PROOF For an arbitrary point x₀ in ℝᵏ, consider x ∈ ℝᵏ such that ‖x − x₀‖ < ε for some ε > 0. This is equivalent to ‖x − x₀‖² < ε², or Σ_{j=1}^{k} (xⱼ − x_{0j})² < ε², which implies that (xⱼ − x_{0j})² < ε² for j = 1, ..., k, or |xⱼ − x_{0j}| < ε, j = 1, ..., k. This last expression is equivalent to |gⱼ(x) − gⱼ(x₀)| < ε, j = 1, ..., k. Thus the definition of continuity of gⱼ is satisfied here for δ = ε. ᭡
Now consider a k-dimensional function X defined on the sample space S. Then X may be written as X = (X₁, ..., Xₖ)′, where Xⱼ, j = 1, ..., k are real-valued functions. The question then arises as to how X and the Xⱼ, j = 1, ..., k are related from a measurability point of view. To this effect, we have the following result.
THEOREM 8 Let X = (X₁, ..., Xₖ)′: (S, A) → (ℝᵏ, Bᵏ). Then X is an r. vector if and only if Xⱼ, j = 1, ..., k are r.v.’s.

PROOF Suppose X is an r. vector and let gⱼ, j = 1, ..., k be the coordinate functions defined on ℝᵏ. Then the gⱼ’s are continuous by Lemma 5 and therefore measurable by Lemma 4. Then for each j = 1, ..., k, gⱼ(X) = gⱼ(X₁, ..., Xₖ) = Xⱼ is measurable and hence an r.v.

Next, assume that Xⱼ, j = 1, ..., k are r.v.’s. To show that X is an r. vector, by special case 3 in Section 2 of Chapter 1, it suffices to show that X⁻¹(B) ∈ A for each B = (−∞, x₁] × ··· × (−∞, xₖ], x₁, ..., xₖ ∈ ℝ. Indeed,

X⁻¹(B) = (X ∈ B) = (Xⱼ ∈ (−∞, xⱼ], j = 1, ..., k) = ⋂_{j=1}^{k} Xⱼ⁻¹((−∞, xⱼ]) ∈ A.

The proof is completed. ᭡

Exercises
4.4.1 If X and Y are functions defined on the sample space S into the real line ℝ, show that:

{s ∈ S; X(s) + Y(s) < x} = ⋃_{r∈Q} [{s ∈ S; X(s) < r} ∩ {s ∈ S; Y(s) < x − r}],

where Q is the set of rationals in ℝ.
4.4.2 Use Exercise 4.4.1 in order to show that, if X and Y are r.v.’s, then so is the function X + Y.
4.4.3
i) If X is an r.v., then show that so is the function −X.
ii) Use part (i) and Exercise 4.4.2 to show that, if X and Y are r.v.’s, then so is the function X − Y.
4.4.4
i) If X is an r.v., then show that so is the function X².
ii) Use the identity XY = (1/2)(X + Y)² − (1/2)(X² + Y²) in conjunction with part (i) and Exercises 4.4.2 and 4.4.3(ii) to show that, if X and Y are r.v.’s, then so is the function XY.
4.4.5
i) If X is an r.v., then show that so is the function 1/X, provided X ≠ 0.
ii) Use part (i) in conjunction with Exercise 4.4.4(ii) to show that, if X and Y are r.v.’s, then so is the function X/Y, provided Y ≠ 0.
Chapter 5
Moments of Random Variables—Some Moment and Probability Inequalities

5.1 Moments of Random Variables

In the definitions to be given shortly, the following remark will prove useful.
REMARK 1 We say that the (infinite) series Σ_x h(x), where x = (x₁, ..., xₖ)′ varies over a discrete set in ℝᵏ, k ≥ 1, converges absolutely if Σ_x |h(x)| < ∞. Also we say that the integral ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} h(x₁, ..., xₖ) dx₁ ··· dxₖ converges absolutely if ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} |h(x₁, ..., xₖ)| dx₁ ··· dxₖ < ∞.
In what follows, when we write (infinite) series or integrals it will always be
assumed that they converge absolutely. In this case, we say that the moments to
be defined below exist.
DEFINITION 1 Let X = (X₁, ..., Xₖ)′ be an r. vector with p.d.f. f and consider the (measurable) function g: ℝᵏ → ℝ, so that g(X) = g(X₁, ..., Xₖ) is an r.v. Then we give the following definition.

i) For n = 1, 2, ..., the nth moment of g(X) is denoted by E[g(X)]ⁿ and is defined by:

E[g(X)]ⁿ = Σ_x [g(x)]ⁿ f(x)  or  ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} [g(x₁, ..., xₖ)]ⁿ f(x₁, ..., xₖ) dx₁ ··· dxₖ.
For n = 1, we get

E[g(X)] = Σ_x g(x) f(x)  or  ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} g(x₁, ..., xₖ) f(x₁, ..., xₖ) dx₁ ··· dxₖ
and call it the mathematical expectation or mean value or just mean of g(X). Another notation for E[g(X)] which is often used is μ_{g(X)}, or μ[g(X)], or just μ, if no confusion is possible.
ii) For r > 0, the rth absolute moment of g(X) is denoted by E|g(X)|ʳ and is defined by:

E|g(X)|ʳ = Σ_x |g(x)|ʳ f(x)  or  ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} |g(x₁, ..., xₖ)|ʳ f(x₁, ..., xₖ) dx₁ ··· dxₖ.
iii) For an arbitrary constant c, and n and r as above, the nth moment and rth absolute moment of g(X) about c are denoted by E[g(X) − c]ⁿ, E|g(X) − c|ʳ, respectively, and are defined as follows:

E[g(X) − c]ⁿ = Σ_x [g(x) − c]ⁿ f(x)  or  ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} [g(x₁, ..., xₖ) − c]ⁿ f(x₁, ..., xₖ) dx₁ ··· dxₖ,

and

E|g(X) − c|ʳ = Σ_x |g(x) − c|ʳ f(x)  or  ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} |g(x₁, ..., xₖ) − c|ʳ f(x₁, ..., xₖ) dx₁ ··· dxₖ.
For c = E[g(X)], the moments are called central moments. The 2nd central moment of g(X), that is,

E{g(X) − E[g(X)]}² = Σ_x [g(x) − Eg(X)]² f(x)  or  ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} [g(x₁, ..., xₖ) − Eg(X)]² f(x₁, ..., xₖ) dx₁ ··· dxₖ,

is called the variance of g(X) and is also denoted by σ²[g(X)], or σ²_{g(X)}, or just σ², if no confusion is possible. The quantity +√(σ²[g(X)]) = σ[g(X)] is called the standard deviation (s.d.) of g(X) and is also denoted by σ_{g(X)}, or just σ, if no confusion is possible. The variance of an r.v. is referred to as the moment of inertia in Mechanics.
5.1.1 Important Special Cases

1. Let g(X₁, ..., Xₖ) = X₁^{n₁} ··· Xₖ^{nₖ}, where the nⱼ ≥ 0 are integers. Then E(X₁^{n₁} ··· Xₖ^{nₖ}) is called the (n₁, ..., nₖ)-joint moment of X₁, ..., Xₖ. In particular, for n₁ = ··· = n_{j−1} = n_{j+1} = ··· = nₖ = 0, nⱼ = n, we get
E(Xⱼⁿ) = Σ_{x₁} ··· Σ_{xₖ} xⱼⁿ f(x₁, ..., xₖ)  or  ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} xⱼⁿ f(x₁, ..., xₖ) dx₁ ··· dxₖ
       = Σ_{xⱼ} xⱼⁿ fⱼ(xⱼ)  or  ∫_{−∞}^{∞} xⱼⁿ fⱼ(xⱼ) dxⱼ,
which is the nth moment of the r.v. Xⱼ. Thus the nth moment of an r.v. X with p.d.f. f is

E(Xⁿ) = Σ_x xⁿ f(x)  or  ∫_{−∞}^{∞} xⁿ f(x) dx.
For n = 1, we get

E(X) = Σ_x x f(x)  or  ∫_{−∞}^{∞} x f(x) dx,
which is the mathematical expectation or mean value or just mean of X. This quantity is also denoted by μ_X or μ(X) or just μ when no confusion is possible.

The quantity μ_X can be interpreted as follows: It follows from the definition that if X is a discrete uniform r.v., then μ_X is just the arithmetic average of the possible outcomes of X. Also, if one recalls from physics or elementary calculus the definition of center of gravity and its physical interpretation as the point of balance of the distributed mass, the interpretation of μ_X as the mean or expected value of the random variable is the natural one, provided the probability distribution of X is interpreted as the unit mass distribution.
REMARK 2 In Definition 1, suppose X is a continuous r.v. Then E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx. On the other hand, from the last expression above, E(X) = ∫_{−∞}^{∞} x f(x) dx. There seems to be a discrepancy between these two definitions. More specifically, in the definition of E[g(X)], one would expect to use the p.d.f. of g(X) rather than that of X. Actually, the definition of E[g(X)], as given, is correct and its justification is roughly as follows: Consider E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx and set y = g(x). Suppose that g is differentiable and has an inverse g⁻¹, and that some further conditions are met. Then

∫_{−∞}^{∞} g(x) f(x) dx = ∫_{−∞}^{∞} y f[g⁻¹(y)] |(d/dy) g⁻¹(y)| dy.

On the other hand, if f_Y is the p.d.f. of Y, then

f_Y(y) = f[g⁻¹(y)] |(d/dy) g⁻¹(y)|.

Therefore the last integral above is equal to ∫_{−∞}^{∞} y f_Y(y) dy, which is consonant with the definition of E(X) = ∫_{−∞}^{∞} x f(x) dx. (A justification of the above derivations is given in Theorem 2 of Chapter 9.)
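The consistency asserted in Remark 2 can be verified numerically; the sketch below (assuming SciPy) takes X ∼ N(0, 1) and the increasing function g(x) = e^x, for which Y = g(X) has the Lognormal p.d.f.

```python
# Remark 2, numerical check (a sketch, assuming SciPy).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, lognorm

lhs, _ = quad(lambda x: np.exp(x) * norm.pdf(x), -np.inf, np.inf)
rhs, _ = quad(lambda y: y * lognorm.pdf(y, s=1), 0, np.inf)
print(lhs, rhs)   # both ~ e^{1/2} ~ 1.6487
```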
2. For g as above, that is, g(X₁, ..., Xₖ) = X₁^{n₁} ··· Xₖ^{nₖ}, and n₁ = ··· = n_{j−1} = n_{j+1} = ··· = nₖ = 0, nⱼ = 1, and c = E(Xⱼ), we get

E[Xⱼ − E(Xⱼ)]ⁿ = Σ_{x₁} ··· Σ_{xₖ} [xⱼ − E(Xⱼ)]ⁿ f(x₁, ..., xₖ)  or  ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} [xⱼ − E(Xⱼ)]ⁿ f(x₁, ..., xₖ) dx₁ ··· dxₖ
  = Σ_{xⱼ} [xⱼ − E(Xⱼ)]ⁿ fⱼ(xⱼ)  or  ∫_{−∞}^{∞} [xⱼ − E(Xⱼ)]ⁿ fⱼ(xⱼ) dxⱼ,
which is the nth central moment of the r.v. Xⱼ (or the nth moment of Xⱼ about its mean). Thus the nth central moment of an r.v. X with p.d.f. f and mean μ is

E(X − EX)ⁿ = E(X − μ)ⁿ = Σ_x (x − μ)ⁿ f(x)  or  ∫_{−∞}^{∞} (x − μ)ⁿ f(x) dx.
In particular, for n = 2 the 2nd central moment of X is denoted by σ²_X or σ²(X) or just σ² when no confusion is possible, and is called the variance of X. Its positive square root σ_X or σ(X) or just σ is called the standard deviation (s.d.) of X.

As in the case of μ_X, σ²_X has a physical interpretation also. Its definition corresponds to that of the second moment, or moment of inertia. One recalls that a large moment of inertia means the mass of the body is spread widely about its center of gravity. Likewise a large variance corresponds to a probability distribution which is not well concentrated about its mean value.
3. For g(X₁, ..., Xₖ) = (X₁ − EX₁)^{n₁} ··· (Xₖ − EXₖ)^{nₖ}, the quantity

E[(X₁ − EX₁)^{n₁} ··· (Xₖ − EXₖ)^{nₖ}]

is the (n₁, ..., nₖ)-central joint moment of X₁, ..., Xₖ or the (n₁, ..., nₖ)-joint moment of X₁, ..., Xₖ about their means.
4. For g(X₁, ..., Xₖ) = Xⱼ(Xⱼ − 1) ··· (Xⱼ − n + 1), j = 1, ..., k, the quantity

E[Xⱼ(Xⱼ − 1) ··· (Xⱼ − n + 1)] = Σ_{xⱼ} xⱼ(xⱼ − 1) ··· (xⱼ − n + 1) fⱼ(xⱼ)  or  ∫_{−∞}^{∞} xⱼ(xⱼ − 1) ··· (xⱼ − n + 1) fⱼ(xⱼ) dxⱼ

is the nth factorial moment of the r.v. Xⱼ. Thus the nth factorial moment of an r.v. X with p.d.f. f is

E[X(X − 1) ··· (X − n + 1)] = Σ_x x(x − 1) ··· (x − n + 1) f(x)  or  ∫_{−∞}^{∞} x(x − 1) ··· (x − n + 1) f(x) dx.
5.1.2 Basic Properties of the Expectation of an R.V.

From the very definition of E[g(X)], the following properties are immediate.
(E1) E(c) = c, where c is a constant.
(E2) E[cg(X)] = cE[g(X)], and, in particular, E(cX) = cE(X) if X is an r.v.
(E3) E[g(X) + d] = E[g(X)] + d, where d is a constant. In particular, E(X + d) = E(X) + d if X is an r.v.
(E4) Combining (E2) and (E3), we get E[cg(X) + d] = cE[g(X)] + d, and, in particular, E(cX + d) = cE(X) + d if X is an r.v.
(E4′) E[Σ_{j=1}^{n} cⱼgⱼ(X)] = Σ_{j=1}^{n} cⱼE[gⱼ(X)].
In fact, for example, in the continuous case, we have

E[Σ_{j=1}^{n} cⱼgⱼ(X)] = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} [Σ_{j=1}^{n} cⱼgⱼ(x₁, ..., xₖ)] f(x₁, ..., xₖ) dx₁ ··· dxₖ
  = Σ_{j=1}^{n} cⱼ ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} gⱼ(x₁, ..., xₖ) f(x₁, ..., xₖ) dx₁ ··· dxₖ = Σ_{j=1}^{n} cⱼE[gⱼ(X)].
The discrete case follows similarly. In particular,
(E4″) E(Σ_{j=1}^{n} cⱼXⱼ) = Σ_{j=1}^{n} cⱼE(Xⱼ).
(E5) If X ≥ 0, then E(X) ≥ 0.
Consequently, by means of (E5) and (E4″), we get that
(E5′) If X ≥ Y, then E(X) ≥ E(Y), where X and Y are r.v.’s (with finite expectations).
(E6) |E[g(X)]| ≤ E|g(X)|.
(E7) If E|X|ʳ < ∞ for some r > 0, where X is an r.v., then E|X|^{r′} < ∞ for all 0 < r′ < r.
This is a consequence of the obvious inequality |X|^{r′} ≤ 1 + |X|ʳ and (E5′).
Furthermore, since for n = 1, 2, ..., we have |Xⁿ| = |X|ⁿ, by means of (E6), it follows that
(E7′) If E(Xⁿ) exists (that is, E|X|ⁿ < ∞) for some n = 2, 3, ..., then E(X^{n′}) also exists for all n′ = 1, 2, ... with n′ < n.
5.1.3 Basic Properties of the Variance of an R.V.

Regarding the variance, the following properties are easily established by means of the definition of the variance.
(V1) σ²(c) = 0, where c is a constant.
(V2) σ²[cg(X)] = c²σ²[g(X)], and, in particular, σ²(cX) = c²σ²(X), if X is an r.v.
(V3) σ²[g(X) + d] = σ²[g(X)], where d is a constant. In particular, σ²(X + d) = σ²(X), if X is an r.v.
In fact,

σ²[g(X) + d] = E{[g(X) + d] − E[g(X) + d]}² = E{g(X) − E[g(X)]}² = σ²[g(X)].

(V4) Combining (V2) and (V3), we get
σ
2
[cg(X) + d] = c
2
σ
2
[g(X)],
and, in particular,
σ
2
(cX + d) = c
2
σ
2
(X), if X is an r.v.
(V5)
σ
2
[g(X)] = E[g(X)]
2
− [Eg(X)]
2
, and, in particular,
(V5′)
σ
2
(X) = E(X
2
) − (EX)

2
, if X is an r.v.
In fact,

σ²[g(X)] = E{g(X) − E[g(X)]}² = E{[g(X)]² − 2g(X)E[g(X)] + [Eg(X)]²}
  = E[g(X)]² − 2[Eg(X)]² + [Eg(X)]² = E[g(X)]² − [Eg(X)]²,

the equality before the last one being true because of (E4′).
(V6) σ²(X) = E[X(X − 1)] + EX − (EX)², if X is an r.v., as is easily seen. This formula is especially useful in calculating the variance of a discrete r.v., as is seen below.
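A quick numerical check of (V5′) and (V6) on a small discrete distribution (the probabilities are hypothetical, for illustration only):

```python
# Check that E(X^2) - (EX)^2 equals E[X(X-1)] + EX - (EX)^2.
xs = [0, 1, 2, 3]
fx = [0.1, 0.2, 0.3, 0.4]

EX   = sum(x * f for x, f in zip(xs, fx))
EX2  = sum(x * x * f for x, f in zip(xs, fx))
Efac = sum(x * (x - 1) * f for x, f in zip(xs, fx))   # E[X(X-1)]

print(EX2 - EX**2, Efac + EX - EX**2)   # both equal 1.0
```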
Exercises

5.1.1 Verify the details of properties (E1)–(E7).
5.1.2 Verify the details of properties (V1)–(V5).
5.1.3 For r′ < r, show that |X|^{r′} ≤ 1 + |X|ʳ and conclude that if E|X|ʳ < ∞, then E|X|^{r′} < ∞ for all 0 < r′ < r.
5.1.4 Verify the equality

E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx = ∫_{−∞}^{∞} y f_Y(y) dy

for the case that X ∼ N(0, 1) and Y = g(X) = X².
5.1.5 For any event A, consider the r.v. X = I_A, the indicator of A defined by I_A(s) = 1 for s ∈ A and I_A(s) = 0 for s ∈ Aᶜ, and calculate EXʳ, r > 0, and also σ²(X).
5.1.6 Let X be an r.v. such that

P(X = −c) = P(X = c) = 1/2.

Calculate EX, σ²(X) and show that

P(|X − EX| ≤ c) = σ²(X)/c².
5.1.7 Let X be an r.v. with finite EX.
i) For any constant c, show that E(X − c)² = E(X − EX)² + (EX − c)²;
ii) Use part (i) to conclude that E(X − c)² is minimum for c = EX.
5.1.8 Let X be an r.v. such that EX⁴ < ∞. Then show that
i) E(X − EX)³ = EX³ − 3(EX)(EX²) + 2(EX)³;
ii) E(X − EX)⁴ = EX⁴ − 4(EX)(EX³) + 6(EX)²(EX²) − 3(EX)⁴.
5.1.9 If EX⁴ < ∞, show that:

E[X(X − 1)] = EX² − EX;
E[X(X − 1)(X − 2)] = EX³ − 3EX² + 2EX;
E[X(X − 1)(X − 2)(X − 3)] = EX⁴ − 6EX³ + 11EX² − 6EX.

(These relations provide a way of calculating EXᵏ, k = 2, 3, 4 by means of the factorial moments E[X(X − 1)], E[X(X − 1)(X − 2)], E[X(X − 1)(X − 2)(X − 3)].)
5.1.10 Let X be the r.v. denoting the number of claims filed by a policyholder of an insurance company over a specified period of time. On the basis of an extensive study of the claim records, it may be assumed that the distribution of X is as follows:

x:    0     1     2     3     4     5     6
f(x): 0.304 0.287 0.208 0.115 0.061 0.019 0.006

i) Calculate the EX and the σ²(X);
ii) What premium should the company charge in order to break even?
iii) What should be the premium charged if the company is to expect to come ahead by $M for administrative expenses and profit?
5.1.11 A roulette wheel has 38 slots of which 18 are red, 18 black, and 2 green.
i) Suppose a gambler is placing a bet of $M on red. What is the gambler’s expected gain or loss and what is the standard deviation?
ii) If the same bet of $M is placed on green and if $kM is the amount the gambler wins, calculate the expected gain or loss and the standard deviation.
iii) For what value of k do the two expectations in parts (i) and (ii) coincide?
iv) Does this value of k depend on M?
v) How do the respective standard deviations compare?
5.1.12 Let X be an r.v. such that P(X = j) = (1/2)ʲ, j = 1, 2, ....
i) Compute EX, E[X(X − 1)];
ii) Use (i) in order to compute σ²(X).
5.1.13 If X is an r.v. distributed as U(α, β), show that

EX = (α + β)/2,  σ²(X) = (α − β)²/12.

5.1.14 Let the r.v. X be distributed as U(α, β). Calculate EXⁿ for any positive integer n.
5.1.15 Let X be an r.v. with p.d.f. f symmetric about a constant c (that is, f(c − x) = f(c + x) for every x).
i) Then if EX exists, show that EX = c;
ii) If c = 0 and EX^{2n+1} exists, show that EX^{2n+1} = 0 (that is, those moments of X of odd order which exist are all equal to zero).
5.1.16 Refer to Exercise 3.3.13(iv) in Chapter 3 and find the EX for those α’s for which this expectation exists, where X is an r.v. having the distribution in question.

5.1.17 Let X be an r.v. with p.d.f. given by

f(x) = (|x|/c²) I_{(−c,c)}(x).

Compute EXⁿ for any positive integer n, E|X|ʳ, r > 0, and σ²(X).
5.1.18 Let X be an r.v. with finite expectation and d.f. F.
i) Show that

EX = ∫₀^∞ [1 − F(x)] dx − ∫_{−∞}^{0} F(x) dx;

ii) Use the interpretation of the definite integral as an area in order to give a geometric interpretation of EX.
5.1.19 Let X be an r.v. of the continuous type with finite EX and p.d.f. f.
i) If m is a median of f and c is any constant, show that

E|X − c| = E|X − m| + 2∫_m^c (c − x) f(x) dx;

ii) Utilize (i) in order to conclude that E|X − c| is minimized for c = m. (Hint: Consider the two cases that c ≥ m and c < m, and in each one split the integral from −∞ to c and c to ∞ in order to remove the absolute value. Then the fact that ∫_{−∞}^{m} f(x) dx = ∫_m^∞ f(x) dx = 1/2 and simple manipulations prove part (i). For part (ii), observe that ∫_m^c (c − x) f(x) dx ≥ 0 whether c ≥ m or c < m.)
5.1.20 If the r.v. X is distributed according to the Weibull distribution (see Exercise 4.1.15 in Chapter 4), then:
i) Show that

EX = Γ(1 + 1/β)/α^{1/β},  EX² = Γ(1 + 2/β)/α^{2/β},

so that

σ²(X) = [Γ(1 + 2/β) − Γ²(1 + 1/β)]/α^{2/β},

where recall that the Gamma function Γ is defined by Γ(γ) = ∫₀^∞ t^{γ−1} e^{−t} dt, γ > 0;
ii) Determine the numerical values of EX and σ²(X) for α = 1 and β = 1/2, β = 1 and β = 2.
5.2 Expectations and Variances of Some r.v.’s

5.2.1 Discrete Case

1. Let X be B(n, p). Then E(X) = np, σ²(X) = npq. In fact,

E(X) = Σ_{x=0}^{n} x \binom{n}{x} p^x q^{n−x} = Σ_{x=1}^{n} x [n!/(x!(n − x)!)] p^x q^{n−x}
  = Σ_{x=1}^{n} [n!/((x − 1)!(n − x)!)] p^x q^{n−x}
  = np Σ_{x=1}^{n} [(n − 1)!/((x − 1)!((n − 1) − (x − 1))!)] p^{x−1} q^{(n−1)−(x−1)}
  = np Σ_{x=0}^{n−1} \binom{n−1}{x} p^x q^{(n−1)−x} = np(p + q)^{n−1} = np.
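The computation just carried out can be confirmed directly (a sketch in pure Python):

```python
# Direct check of E(X) = np for X ~ B(n, p).
from math import comb

n, p = 12, 0.3
q = 1 - p
EX = sum(x * comb(n, x) * p**x * q**(n - x) for x in range(n + 1))
print(EX, n * p)   # 3.6 3.6
```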