
CHAPTER 9
Random Matrices
The step from random vectors to random matrices (and higher order random arrays) is not as big as the step from individual random variables to random vectors. We will first give a few quite trivial verifications that the expected value operator is indeed a linear operator, and then make some not quite as trivial observations about the expected values and higher moments of quadratic forms.
9.1. Linearity of Expected Values
Definition 9.1.1. Let Z be a random matrix with elements z_ij. Then E[Z] is the matrix with elements E[z_ij].
Theorem 9.1.2. If A, B, and C are constant matrices, then E[AZB + C] = A E[Z] B + C.

Proof by multiplying out.
Theorem 9.1.3. E[Z^⊤] = (E[Z])^⊤; E[tr Z] = tr E[Z].
Theorem 9.1.4. For partitioned matrices, the expected value is taken blockwise:
$$E\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} E[X] \\ E[Y] \end{bmatrix}.$$
Special cases: If C is a constant, then E[C] = C; E[AX + BY] = A E[X] + B E[Y]; and E[a·X + b·Y] = a·E[X] + b·E[Y].
If X and Y are random matrices, then the covariance of these two matrices is
a four-way array containing the covariances of all elements of X with all elements
of Y . Certain conventions are necessary to arrange this four-way array in a two-
dimensional scheme that can be written on a sheet of paper. Before we develop
those, we will first define the covariance matrix for two random vectors.
Definition 9.1.5. The covariance matrix of two random vectors is defined as
(9.1.1)  C[x, y] = E[(x − E[x])(y − E[y])^⊤].
Theorem 9.1.6. C[x, y] = E[xy^⊤] − (E[x])(E[y])^⊤.
Theorem 9.1.7. C[Ax + b, Cy + d] = A C[x, y] C^⊤.

Problem 152. Prove theorem 9.1.7.
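As a numerical sanity check of theorem 9.1.7 (not a substitute for the requested proof), one can compare the sample covariance matrix of the transformed vectors with the transformed sample covariance matrix. The following is only a sketch, assuming numpy is available; the dimensions, constants, and joint distribution are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(3, 4)), rng.normal(size=3)
C, d = rng.normal(size=(2, 5)), rng.normal(size=2)

# draw x (4-vector) and y (5-vector) jointly normal and correlated
M = rng.normal(size=(9, 9))
z = rng.multivariate_normal(np.zeros(9), M @ M.T, size=200_000)
x, y = z[:, :4], z[:, 4:]

def cov(u, v):
    """Sample analogue of C[u, v] = E[(u - E[u])(v - E[v])^T]."""
    uc, vc = u - u.mean(axis=0), v - v.mean(axis=0)
    return uc.T @ vc / (len(u) - 1)

lhs = cov(x @ A.T + b, y @ C.T + d)   # C[Ax + b, Cy + d], estimated
rhs = A @ cov(x, y) @ C.T             # A C[x, y] C^T
print(np.abs(lhs - rhs).max())        # small; shrinks as the sample grows
```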
Theorem 9.1.8.
$$C\!\left[\begin{bmatrix} x \\ y \end{bmatrix}, \begin{bmatrix} u \\ v \end{bmatrix}\right] = \begin{bmatrix} C[x, u] & C[x, v] \\ C[y, u] & C[y, v] \end{bmatrix}.$$
Special case: C[Ax + By, Cu + Dv] = A C[x, u] C^⊤ + A C[x, v] D^⊤ + B C[y, u] C^⊤ + B C[y, v] D^⊤. To show this, express each of the arguments as a partitioned matrix, then use theorem 9.1.7.
Definition 9.1.9. V[x] = C[x, x] is called the dispersion matrix.
It follows from theorem 9.1.8 that
(9.1.2)
$$V[x] = \begin{bmatrix} \operatorname{var}[x_1] & \operatorname{cov}[x_1, x_2] & \cdots & \operatorname{cov}[x_1, x_n] \\ \operatorname{cov}[x_2, x_1] & \operatorname{var}[x_2] & \cdots & \operatorname{cov}[x_2, x_n] \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{cov}[x_n, x_1] & \operatorname{cov}[x_n, x_2] & \cdots & \operatorname{var}[x_n] \end{bmatrix}.$$
Theorem 9.1.10. V[Ax] = A V[x] A^⊤.

From this it follows that V[x] is nonnegative definite (or, as it is also called, positive semidefinite).
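A small simulation can illustrate theorem 9.1.10 and the nonnegative definiteness of a dispersion matrix. This is only a sketch, assuming numpy; the matrix A and the distribution of x are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[2., 1., 0., 0.],
                  [1., 2., 1., 0.],
                  [0., 1., 2., 1.],
                  [0., 0., 1., 2.]])          # a positive definite dispersion matrix
x = rng.multivariate_normal(np.zeros(4), Sigma, size=100_000)
A = rng.normal(size=(3, 4))

V_x  = np.cov(x, rowvar=False)                # empirical version of (9.1.2)
V_Ax = np.cov(x @ A.T, rowvar=False)          # empirical V[Ax]
print(np.abs(V_Ax - A @ V_x @ A.T).max())     # close to zero
print(np.linalg.eigvalsh(V_x).min())          # all eigenvalues nonnegative
```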
Problem 153. Assume y is a random vector, and var[y_i] exists for every component y_i. Then the whole dispersion matrix V[y] exists.
Theorem 9.1.11. V[x] is singular if and only if a nonzero vector a exists so that a^⊤x is almost surely a constant.
Proof: Call V[x] = Σ. Then Σ is singular iff a vector a ≠ o exists with Σa = o, iff such an a exists with a^⊤Σa = var[a^⊤x] = 0, iff such an a exists so that a^⊤x is almost surely a constant.
This means that singular random variables have a restricted range: their values are contained in a linear subspace. This has relevance for estimators involving singular random variables: two such estimators (i.e., functions of a singular random variable) should still be considered the same if their values coincide on the subspace in which the values of the random variable are concentrated, even if their values differ elsewhere.
Problem 154. [Seb77, exercise 1a–3 on p. 13] Let x = [x_1, . . . , x_n]^⊤ be a vector of random variables, and let y_1 = x_1 and y_i = x_i − x_{i−1} for i = 2, 3, . . . , n. What must the dispersion matrix V[x] be so that the y_i are uncorrelated with each other and each have unit variance?
Answer. cov[x_i, x_j] = min(i, j). To see this, write y = Ax, where (illustrated here for n = 5)
$$A = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 & 0 \\ 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & -1 & 1 \end{bmatrix}, \qquad A^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix},$$
and
$$A^{-1}(A^{-1})^\top = (A^\top A)^{-1} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 & 2 \\ 1 & 2 & 3 & 3 & 3 \\ 1 & 2 & 3 & 4 & 4 \\ 1 & 2 & 3 & 4 & 5 \end{bmatrix}.$$
The requirement V[y] = A V[x] A^⊤ = I therefore gives V[x] = A^{-1}(A^{-1})^⊤, whose (i, j) element is min(i, j). □
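The answer can also be checked numerically; the sketch below (assuming numpy) verifies for n = 5 that A^{-1}(A^{-1})^⊤ has entries min(i, j) and that this choice of V[x] indeed gives V[y] = I.

```python
import numpy as np

n = 5
A = np.eye(n) - np.eye(n, k=-1)                 # the differencing matrix above
V_x = np.minimum.outer(np.arange(1, n + 1), np.arange(1, n + 1))  # min(i, j)

A_inv = np.linalg.inv(A)
print(np.allclose(A_inv @ A_inv.T, V_x))        # V[x] = A^{-1}(A^{-1})^T
print(np.allclose(A @ V_x @ A.T, np.eye(n)))    # V[y] = A V[x] A^T = I
```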
9.2. Means and Variances of Quadratic Forms in Random Matrices
9.2.1. Expected Value of Quadratic Form.
Theorem 9.2.1. Assume E[y] = η, V[y] = σ²Ψ, and A is a matrix of constants. Then
(9.2.1)  E[y^⊤Ay] = σ² tr(AΨ) + η^⊤Aη.
Proof. Write y as the sum of η and ε = y − η; then
(9.2.2)  y^⊤Ay = (ε + η)^⊤A(ε + η)
(9.2.3)        = ε^⊤Aε + ε^⊤Aη + η^⊤Aε + η^⊤Aη.
η^⊤Aη is nonstochastic, and since E[ε] = o it follows that
(9.2.4)  E[y^⊤Ay] − η^⊤Aη = E[ε^⊤Aε]
(9.2.5)  = E[tr(ε^⊤Aε)] = E[tr(Aεε^⊤)] = tr(A E[εε^⊤])
(9.2.6)  = σ² tr(AΨ).
Here we used that tr(AB) = tr(BA) and that, if c is a scalar, i.e., a 1 × 1 matrix, then tr(c) = c. □
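Theorem 9.2.1 can also be checked by simulation; the following sketch (assuming numpy, with arbitrarily chosen η, Ψ, and A) compares a Monte Carlo estimate of E[y^⊤Ay] with σ² tr(AΨ) + η^⊤Aη.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2 = 4, 2.5
eta = rng.normal(size=n)
P = rng.normal(size=(n, n))
Psi = P @ P.T                                   # an arbitrary positive definite Psi
A = rng.normal(size=(n, n))

y = rng.multivariate_normal(eta, sigma2 * Psi, size=500_000)
lhs = np.einsum('ij,jk,ik->i', y, A, y).mean()  # Monte Carlo estimate of E[y'Ay]
rhs = sigma2 * np.trace(A @ Psi) + eta @ A @ eta
print(lhs, rhs)                                 # agree up to simulation error
```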
In tile notation (see Appendix B), the proof of theorem 9.2.1 is much more straightforward and no longer seems to rely on "tricks." From y ∼ (η, Σ), i.e., writing now σ²Ψ = Σ, follows E[yy^⊤] = ηη^⊤ + Σ. Equation (9.2.7) is the tile version of this identity, and (9.2.8) attaches the tile A to both arms of y, giving E[y^⊤Ay] = η^⊤Aη + tr(ΣA).
[Tile diagrams (9.2.7) and (9.2.8) omitted; see Appendix B for the notation.]
Problem 155. [Seb77, Exercise 1b–2 on p. 16] If y_1, y_2, . . . , y_n are mutually independent random variables with common mean η and with variances σ_1², σ_2², . . . , σ_n², respectively, prove that
(9.2.9)
$$\frac{1}{n(n-1)} \sum_i (y_i - \bar y)^2$$
is an unbiased estimator of var[ȳ]. It is recommended to use theorem 9.2.1 for this.
Answer. Write y = [y_1 y_2 . . . y_n]^⊤ and Σ = diag(σ_1², σ_2², . . . , σ_n²). Then the vector [y_1 − ȳ, y_2 − ȳ, . . . , y_n − ȳ]^⊤ can be written as (I − (1/n)ιι^⊤)y. Since (1/n)ιι^⊤ is idempotent, D = I − (1/n)ιι^⊤ is idempotent too. Our estimator is (1/(n(n−1))) y^⊤Dy, and since the mean vector η = ιη satisfies Dη = o, theorem 9.2.1 gives
(9.2.10)  E[y^⊤Dy] = tr(DΣ) = tr(Σ) − (1/n) tr(ιι^⊤Σ)
(9.2.11)  = (σ_1² + ··· + σ_n²) − (1/n) tr(ι^⊤Σι)
(9.2.12)  = ((n − 1)/n)(σ_1² + ··· + σ_n²).
Divide this by n(n − 1) to get (σ_1² + ··· + σ_n²)/n², which is var[ȳ], as claimed. □
For the variances of quadratic forms we need the third and fourth moments of
the underlying random variables.
Problem 156. Let µ_i = E[(y − E[y])^i] be the i-th centered moment of y, and let σ = √µ_2 be its standard deviation. Then the skewness is defined as γ_1 = µ_3/σ³, and the kurtosis as γ_2 = (µ_4/σ⁴) − 3. Show that skewness and kurtosis of ay + b are equal to those of y if a > 0; for a < 0 the skewness changes its sign. Show that skewness γ_1 and kurtosis γ_2 always satisfy
(9.2.13)  γ_1² ≤ γ_2 + 2.
Answer. Define ε = y − µ, and apply the Cauchy–Schwarz inequality to the variables ε and ε²:
(9.2.14)  (σ³γ_1)² = (E[ε³])² = (cov[ε, ε²])² ≤ var[ε] var[ε²] = σ⁶(γ_2 + 2). □
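Inequality (9.2.13) can be illustrated numerically for any particular distribution; the sketch below (assuming numpy) estimates skewness and kurtosis from an exponential sample, for which γ_1 = 2 and γ_2 = 6, so that γ_1² = 4 ≤ 8 = γ_2 + 2.

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.exponential(scale=1.0, size=2_000_000)
eps = y - y.mean()
mu2, mu3, mu4 = (eps**2).mean(), (eps**3).mean(), (eps**4).mean()
g1 = mu3 / mu2**1.5          # sample skewness, close to 2
g2 = mu4 / mu2**2 - 3        # sample kurtosis, close to 6
print(g1**2, g2 + 2, g1**2 <= g2 + 2)
```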
Problem 157. Show that any real numbers γ_1 and γ_2 satisfying (9.2.13) can be the skewness and kurtosis of a random variable.
Answer. To show that all combinations satisfying this inequality are possible, define
r = √(γ_2 + 3 − 3γ_1²/4),  a = r + γ_1/2,  b = r − γ_1/2,
and construct a random variable x which assumes the following three values:
(9.2.15)  x = a with probability 1/(2ar), x = 0 with probability 1 − 1/(γ_2 + 3 − γ_1²), and x = −b with probability 1/(2br).
This variable has expected value zero, variance 1, its third moment is γ_1, and its fourth moment is γ_2 + 3. □
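The construction in this answer is easy to verify by direct computation; the sketch below (assuming numpy, with γ_1 and γ_2 chosen arbitrarily subject to (9.2.13)) checks that the probabilities sum to one and that the first four moments are 0, 1, γ_1, and γ_2 + 3.

```python
import numpy as np

def three_point_moments(g1, g2):
    """Probabilities summing to one and moments of the three-point variable of Problem 157."""
    assert g1**2 <= g2 + 2                     # inequality (9.2.13)
    r = np.sqrt(g2 + 3 - 3 * g1**2 / 4)
    a, b = r + g1 / 2, r - g1 / 2
    x = np.array([a, 0.0, -b])
    p = np.array([1 / (2*a*r), 1 - 1 / (g2 + 3 - g1**2), 1 / (2*b*r)])
    return p.sum(), [np.sum(p * x**k) for k in range(1, 5)]

print(three_point_moments(g1=1.0, g2=2.0))     # (1.0, [0, 1, 1, 5]) up to rounding
```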
Theorem 9.2.2. Let ε be a random vector of independent variables ε_i with zero expected value E[ε_i] = 0 and with identical second and third moments. Call var[ε_i] = σ² and E[ε_i³] = σ³γ_1 (where σ is the positive square root of σ²). Here γ_1 is called the skewness of these variables. Then the following holds for the third mixed moments:
(9.2.16)  E[ε_i ε_j ε_k] = σ³γ_1 if i = j = k, and 0 otherwise,
and from (9.2.16) follows that for any n × 1 vector a and any symmetric n × n matrix C whose vector of diagonal elements is c,
(9.2.17)  E[(a^⊤ε)(ε^⊤Cε)] = σ³γ_1 a^⊤c.
Proof. If i = j = k = i, then E[ε
i
ε
j
ε
k
] = 0 · 0 · 0 = 0; if i = j = k then
E[ε
i
ε
j
ε
k
] = σ
2
· 0 = 0, same for i = j = k and j = i = k. Therefore only E[ε

3
i
]
remains, which proves (9.2.16). Now
(a

ε
ε
ε)(ε
ε
ε

C ε
ε
ε) =

i,j,k
a
i
c
jk
ε
i
ε
j
ε
k
(9.2.18)
E[a


ε
ε
εε
ε
ε

C ε
ε
ε] = σ
3
γ
1

i
a
i
c
ii
= σ
3
γ
1
a

c.(9.2.19)
One would like to have a matrix notation for (9.2.16) from which (9.2.17) follows by a trivial operation. This is not easily possible in the usual notation, but it is possible in tile notation: equation (9.2.20) states that the three-way array of third moments of ε equals γ_1σ³ times the three-armed diagonal array ∆, and (9.2.21) attaches the tiles a and C to its arms.
[Tile diagrams (9.2.20) and (9.2.21) omitted.]
Since ∆ contracted with C yields the vector of diagonal elements of C, called c, the last term in equation (9.2.21) is the scalar product a^⊤c. □
Given a random vector ε of independent variables ε_i with zero expected value E[ε_i] = 0 and identical second and fourth moments, call var[ε_i] = σ² and E[ε_i⁴] = σ⁴(γ_2 + 3), where γ_2 is the kurtosis. Then the following holds for the fourth moments:
(9.2.22)
$$E[\varepsilon_i \varepsilon_j \varepsilon_k \varepsilon_l] = \begin{cases} \sigma^4(\gamma_2 + 3) & \text{if } i = j = k = l \\ \sigma^4 & \text{if } i = j \neq k = l \text{ or } i = k \neq j = l \text{ or } i = l \neq j = k \\ 0 & \text{otherwise.} \end{cases}$$
It is not an accident that (9.2.22) is given element by element and not in matrix notation. It is not possible to do this, not even with the Kronecker product. But it is easy in tile notation:
(9.2.23)  [tile diagram omitted: the four-way array of fourth moments of ε is the sum of three terms, each σ⁴ times one of the three possible pairings of the four arms, plus γ_2σ⁴ times the four-armed diagonal array.]
Problem 158. [Seb77, pp. 14–16 and 52] Show that for any symmetric n × n matrices A and B, whose vectors of diagonal elements are a and b,
(9.2.24)  E[(ε^⊤Aε)(ε^⊤Bε)] = σ⁴(tr A tr B + 2 tr(AB) + γ_2 a^⊤b).
Answer. (9.2.24) is an immediate consequence of (9.2.23); this step is now trivial due to the linearity of the expected value: attach the tiles A and B to the arms of (9.2.23). [Tile equation omitted.] The first term is tr(AB). The second is tr(AB^⊤), but since A and B are symmetric, this is equal to tr(AB). The third term is tr A tr B. What is the fourth term? Diagonal arrays exist with any number of arms, and any connected concatenation of diagonal arrays is again a diagonal array, see (B.2.1); equation (9.2.25) gives an instance of this [tile identity omitted]. From this together with (B.1.4) one can see that the fourth term is the scalar product of the diagonal vectors of A and B. □
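Equation (9.2.24) can likewise be checked by Monte Carlo; the sketch below (assuming numpy) again uses centered exponential variables, for which σ² = 1 and γ_2 = 6, and two arbitrary symmetric matrices.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 2_000_000
eps = rng.exponential(1.0, size=(N, 3)) - 1.0       # iid, mean 0, sigma^2 = 1, gamma_2 = 6
A = np.array([[1., 2., 0.], [2., 3., 1.], [0., 1., 2.]])
B = np.array([[2., 0., 1.], [0., 1., 0.], [1., 0., 4.]])

qA = np.einsum('ij,jk,ik->i', eps, A, eps)          # eps'A eps for each draw
qB = np.einsum('ij,jk,ik->i', eps, B, eps)
lhs = (qA * qB).mean()
rhs = np.trace(A) * np.trace(B) + 2 * np.trace(A @ B) + 6.0 * (np.diag(A) @ np.diag(B))
print(lhs, rhs)                                     # agree up to simulation error
```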
Problem 159. Under the conditions of equation (9.2.23) show that, for symmetric matrices C and D with diagonal vectors c and d,
(9.2.26)  cov[ε^⊤Cε, ε^⊤Dε] = σ⁴γ_2 c^⊤d + 2σ⁴ tr(CD).
Answer. Use cov[ε^⊤Cε, ε^⊤Dε] = E[(ε^⊤Cε)(ε^⊤Dε)] − E[ε^⊤Cε] E[ε^⊤Dε]. The first term is given by (9.2.24). The second term is σ⁴ tr C tr D, according to (9.2.1). □
Problem 160. (Not eligible for in-class exams) Take any symmetric matrix A and denote the vector of diagonal elements by a. Let x = θ + ε, where ε satisfies the conditions of theorem 9.2.2 and equation (9.2.23). Then
(9.2.27)  var[x^⊤Ax] = 4σ²θ^⊤A²θ + 4σ³γ_1 θ^⊤Aa + σ⁴(γ_2 a^⊤a + 2 tr(A²)).
Answer. Proof: var[x^⊤Ax] = E[(x^⊤Ax)²] − (E[x^⊤Ax])². Since by assumption V[x] = σ²I, the second term is, by theorem 9.2.1, (σ² tr A + θ^⊤Aθ)². Now look at the first term. Again using the notation ε = x − θ, it follows from (9.2.3) that
(9.2.28)  (x^⊤Ax)² = (ε^⊤Aε)² + 4(θ^⊤Aε)² + (θ^⊤Aθ)²
(9.2.29)           + 4 ε^⊤Aε θ^⊤Aε + 2 ε^⊤Aε θ^⊤Aθ + 4 θ^⊤Aε θ^⊤Aθ.
We will take expectations of these terms one by one. Use (9.2.24) for the first term:
(9.2.30)  E[(ε^⊤Aε)²] = σ⁴(γ_2 a^⊤a + (tr A)² + 2 tr(A²)).
To deal with the second term in (9.2.29) define b = Aθ; then
(9.2.31)  (θ^⊤Aε)² = (b^⊤ε)² = b^⊤εε^⊤b = tr(b^⊤εε^⊤b) = tr(εε^⊤bb^⊤),
(9.2.32)  E[(θ^⊤Aε)²] = σ² tr(bb^⊤) = σ² b^⊤b = σ² θ^⊤A²θ.
The third term is a constant which remains as it is; for the fourth term use (9.2.17):
(9.2.33)  ε^⊤Aε θ^⊤Aε = ε^⊤Aε b^⊤ε,
(9.2.34)  E[ε^⊤Aε θ^⊤Aε] = σ³γ_1 a^⊤b = σ³γ_1 a^⊤Aθ.
If one takes expected values, the fifth term becomes 2σ² tr(A) θ^⊤Aθ, and the last term falls away. Putting the pieces together, the statement follows. □
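Formula (9.2.27) can be checked by simulation as well; the sketch below (assuming numpy) again uses centered exponential disturbances (σ = 1, γ_1 = 2, γ_2 = 6) and arbitrary θ and A.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 2_000_000
theta = np.array([1.0, -1.0, 2.0])
A = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 1.]])
a = np.diag(A)

x = theta + rng.exponential(1.0, size=(N, 3)) - 1.0   # sigma=1, gamma_1=2, gamma_2=6
q = np.einsum('ij,jk,ik->i', x, A, x)                 # x'Ax for each draw
lhs = q.var()
rhs = 4 * theta @ A @ A @ theta + 4 * 2.0 * theta @ A @ a \
      + (6.0 * a @ a + 2 * np.trace(A @ A))
print(lhs, rhs)                                        # agree up to simulation error
```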
CHAPTER 10
The Multivariate Normal Probability Distribution
10.1. More About the Univariate Case
By definition, z is a standard normal variable, in symbols z ∼ N(0, 1), if it has the density function
(10.1.1)
$$f_z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}.$$
To verify that this is a density function we have to check two conditions. (1) It is everywhere nonnegative. (2) Its integral from −∞ to ∞ is 1. In order to evaluate this integral, it is easier to work with the independent product of two standard normal variables x and y; their joint density function is f_{x,y}(x, y) = (1/(2π)) e^{−(x²+y²)/2}. In order to see that this joint density integrates to 1, go over to polar coordinates x = r cos φ, y = r sin φ, i.e., compute the joint distribution of r and φ from that of x and y: the absolute value of the Jacobian determinant is r, i.e., dx dy = r dr dφ, therefore
(10.1.2)
$$\int_{y=-\infty}^{\infty}\int_{x=-\infty}^{\infty}\frac{1}{2\pi}\, e^{-\frac{x^2+y^2}{2}}\, dx\, dy = \int_{\phi=0}^{2\pi}\int_{r=0}^{\infty}\frac{1}{2\pi}\, e^{-\frac{r^2}{2}}\, r\, dr\, d\phi.$$
By substituting t = r²/2, therefore dt = r dr, the inner integral becomes −(1/(2π)) e^{−t} evaluated between 0 and ∞, which is 1/(2π); therefore the whole integral is 1. Therefore the product of the integrals of the marginal densities is 1, and since each such marginal integral is positive and they are equal, each of the marginal integrals is 1 too.
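The claim that (10.1.1) integrates to one can also be confirmed by numerical quadrature; a minimal sketch, assuming scipy is available:

```python
import numpy as np
from scipy.integrate import quad

# integrate the standard normal density over the whole real line
val, err = quad(lambda z: np.exp(-z**2 / 2) / np.sqrt(2 * np.pi), -np.inf, np.inf)
print(val)   # 1.0 up to quadrature error
```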
Problem 161. 6 points The Gamma function can be defined as Γ(r) = ∫_0^∞ x^{r−1} e^{−x} dx. Show that Γ(1/2) = √π. (Hint: after substituting r = 1/2, apply the variable transformation x = z²/2 for nonnegative x and z only, and then reduce the resulting integral to the integral over the normal density function.)
Answer. With x = z²/2 one has dx = z dz and dx/√x = √2 dz. Therefore one can reduce it to the integral over the normal density:
(10.1.3)
$$\int_0^\infty \frac{1}{\sqrt{x}}\, e^{-x}\, dx = \sqrt{2}\int_0^\infty e^{-z^2/2}\, dz = \frac{1}{\sqrt{2}}\int_{-\infty}^{\infty} e^{-z^2/2}\, dz = \frac{\sqrt{2\pi}}{\sqrt{2}} = \sqrt{\pi}. \qquad \Box$$
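A one-line numerical check of Γ(1/2) = √π, using only the Python standard library:

```python
import math
print(math.gamma(0.5), math.sqrt(math.pi))   # both approximately 1.7724538509
```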
A univariate normal variable with mean µ and variance σ² is a variable x whose standardized version z = (x − µ)/σ ∼ N(0, 1). In this transformation from x to z, the Jacobian determinant is dz/dx = 1/σ; therefore the density function of x ∼ N(µ, σ²) is (two notations, the second perhaps more modern):
(10.1.4)
$$f_x(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} = (2\pi\sigma^2)^{-1/2} \exp\bigl(-(x-\mu)^2/2\sigma^2\bigr).$$
Problem 162. 3 points Given n independent observations of a normally distributed variable y ∼ N(µ, 1). Show that the sample mean ȳ is a sufficient statistic for µ. Here is a formulation of the factorization theorem for sufficient statistics, which you will need for this question: Given a family of probability densities f_y(y_1, . . . , y_n; θ) defined on R^n, which depend on a parameter θ ∈ Θ. The statistic T : R^n → R, (y_1, . . . , y_n) ↦ T(y_1, . . . , y_n), is sufficient for the parameter θ if and only if there exists a function of two variables g : R × Θ → R, (t, θ) ↦ g(t; θ), and a function of n variables h : R^n → R, (y_1, . . . , y_n) ↦ h(y_1, . . . , y_n), so that
(10.1.5)  f_y(y_1, . . . , y_n; θ) = g(T(y_1, . . . , y_n); θ) · h(y_1, . . . , y_n).

Answer. The joint density function can be written (factorization indicated by ·):
(10.1.6)
(2π)
−n/2
exp


1
2
n

i=1
(y
i
−µ)
2

= (2π)
−n/2
exp


1
2
n

i=1
(y
i
−¯y)

2

·exp


n
2
(¯y−µ)
2

= h(y
1
, . . . , y
n
)·g(¯y; µ).
264 10. MULTIVARIATE NORMAL

10.2. Definition of Multivariate Normal

The multivariate normal distribution is an important family of distributions with very nice properties. But one must be a little careful how to define it. One might naively think a multivariate normal is a vector random variable each component of which is univariate normal. But this is not the right definition. Normality of the components is a necessary but not sufficient condition for a multivariate normal vector. If u is the stacked vector [x^⊤ y^⊤]^⊤ with both x and y multivariate normal, u is not necessarily multivariate normal.
Here is a recursive definition from which one gets all multivariate normal distributions:
(1) The univariate standard normal z, considered as a vector with one component, is multivariate normal.
(2) If x and y are multivariate normal and they are independent, then the stacked vector u = [x^⊤ y^⊤]^⊤ is multivariate normal.
(3) If y is multivariate normal, and A a matrix of constants (which need not be square and is allowed to be singular), and b a vector of constants, then Ay + b is multivariate normal. In words: a vector consisting of linear combinations of the same set of multivariate normal variables is again multivariate normal.
For simplicity we will go over now to the bivariate Normal distribution.
10.3. Special Case: Bivariate Normal

The following two simple rules allow one to obtain all bivariate normal random variables:
(1) If x and y are independent and each of them has a (univariate) normal distribution with mean 0 and the same variance σ², then they are bivariate normal. (They would be bivariate normal even if their variances were different and their means not zero, but for the calculations below we will use only this special case, which together with principle (2) is sufficient to get all bivariate normal distributions.)
(2) If x = [x y]^⊤ is bivariate normal and P is a 2 × 2 nonrandom matrix and µ a nonrandom column vector with two elements, then P x + µ is bivariate normal as well.
All other properties of bivariate Normal variables can be derived from this.
First let us derive the density function of a bivariate normal distribution. Write x = [x y]^⊤, where x and y are independent N(0, σ²). Therefore by principle (1) above the vector x is bivariate normal. Take any nonsingular 2 × 2 matrix P and a 2-vector µ = [µ ν]^⊤, and define [u v]^⊤ = u = P x + µ. We need nonsingularity because otherwise the resulting variable would not have a bivariate density; its probability mass would be concentrated on one straight line in the two-dimensional plane. What is the joint density function of u? Since P is nonsingular, the transformation is one-to-one, therefore we can apply the transformation theorem for densities. Let us first write down the density function of x, which we know:
(10.3.1)
$$f_{x,y}(x, y) = \frac{1}{2\pi\sigma^2}\exp\Bigl(-\frac{1}{2\sigma^2}(x^2 + y^2)\Bigr).$$
For the next step, remember that we have to express the old variable in terms of the new one: x = P^{−1}(u − µ). The Jacobian determinant is therefore J = det(P^{−1}). Also notice that, after the substitution [x y]^⊤ = P^{−1}[u − µ, v − ν]^⊤, the exponent in the joint density function of x and y is
$$-\frac{1}{2\sigma^2}(x^2 + y^2) = -\frac{1}{2\sigma^2}\begin{bmatrix} x \\ y \end{bmatrix}^{\!\top}\begin{bmatrix} x \\ y \end{bmatrix} = -\frac{1}{2\sigma^2}\begin{bmatrix} u-\mu \\ v-\nu \end{bmatrix}^{\!\top}(P^{-1})^{\top}P^{-1}\begin{bmatrix} u-\mu \\ v-\nu \end{bmatrix}.$$
Therefore the transformation theorem of density functions gives
(10.3.2)
$$f_{u,v}(u, v) = \frac{1}{2\pi\sigma^2}\,\bigl|\det(P^{-1})\bigr|\,\exp\Bigl(-\frac{1}{2\sigma^2}\begin{bmatrix} u-\mu \\ v-\nu \end{bmatrix}^{\!\top}(P^{-1})^{\top}P^{-1}\begin{bmatrix} u-\mu \\ v-\nu \end{bmatrix}\Bigr).$$
This expression can be made nicer. Note that the covariance matrix of the transformed variables is V[[u v]^⊤] = σ²PP^⊤ = σ²Ψ, say. Since (P^{−1})^⊤P^{−1}PP^⊤ = I, it follows that (P^{−1})^⊤P^{−1} = Ψ^{−1} and |det(P^{−1})| = 1/√(det Ψ), therefore
(10.3.3)
$$f_{u,v}(u, v) = \frac{1}{2\pi\sigma^2}\,\frac{1}{\sqrt{\det\Psi}}\,\exp\Bigl(-\frac{1}{2\sigma^2}\begin{bmatrix} u-\mu \\ v-\nu \end{bmatrix}^{\!\top}\Psi^{-1}\begin{bmatrix} u-\mu \\ v-\nu \end{bmatrix}\Bigr).$$
This is the general formula for the density function of a bivariate normal with nonsingular covariance matrix σ²Ψ and mean vector µ. One can also use the following notation, which is valid for the multivariate normal variable with n dimensions, with mean vector µ and nonsingular covariance matrix σ²Ψ:
(10.3.4)
$$f_x(x) = (2\pi\sigma^2)^{-n/2}(\det\Psi)^{-1/2}\exp\Bigl(-\frac{1}{2\sigma^2}(x-\mu)^\top\Psi^{-1}(x-\mu)\Bigr).$$
Problem 163. 1 point Show that the matrix product of (P^{−1})^⊤P^{−1} and PP^⊤ is the identity matrix.
Problem 164. 3 points All vectors in this question are n × 1 column vectors. Let y = α + ε, where α is a vector of constants and ε is jointly normal with E[ε] = o. Often, the covariance matrix V[ε] is not given directly, but an n × n nonsingular matrix T is known which has the property that the covariance matrix of Tε is σ² times the n × n unit matrix, i.e.,
(10.3.5)  V[Tε] = σ² I_n.
Show that in this case the density function of y is
(10.3.6)  f_y(y) = (2πσ²)^{−n/2} |det(T)| exp(−(T(y − α))^⊤ T(y − α) / (2σ²)).
Hint: define z = Tε, write down the density function of z, and make a transformation between z and y.
Answer. Since E[z] = o and V[z] = σ²I_n, its density function is (2πσ²)^{−n/2} exp(−z^⊤z/(2σ²)). Now express z, whose density we know, as a function of y, whose density function we want to know: z = T(y − α), or, written out,
(10.3.7)  z_1 = t_11(y_1 − α_1) + t_12(y_2 − α_2) + ··· + t_1n(y_n − α_n)
(10.3.8)   ⋮
(10.3.9)  z_n = t_n1(y_1 − α_1) + t_n2(y_2 − α_2) + ··· + t_nn(y_n − α_n).
Therefore the Jacobian determinant is det(T). This gives the result. □
10.3.1. Most Natural Form of Bivariate Normal Density.

Problem 165. In this exercise we will write the bivariate normal density in its most natural form. For this we set the multiplicative "nuisance parameter" σ² = 1, i.e., we write the covariance matrix as Ψ instead of σ²Ψ.
• a. 1 point Write the covariance matrix Ψ = V[[u v]^⊤] in terms of the standard deviations σ_u and σ_v and the correlation coefficient ρ.
• b. 1 point Show that the inverse of a 2 × 2 matrix has the following form:
(10.3.10)
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
• c. 2 points Show that
(10.3.11)
$$q^2 = \begin{bmatrix} u-\mu & v-\nu \end{bmatrix}\Psi^{-1}\begin{bmatrix} u-\mu \\ v-\nu \end{bmatrix}$$
(10.3.12)
$$\phantom{q^2} = \frac{1}{1-\rho^2}\left(\frac{(u-\mu)^2}{\sigma_u^2} - 2\rho\,\frac{u-\mu}{\sigma_u}\,\frac{v-\nu}{\sigma_v} + \frac{(v-\nu)^2}{\sigma_v^2}\right).$$
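Part c can be verified numerically for particular values; the sketch below (assuming numpy, with arbitrary σ_u, σ_v, ρ, means, and evaluation point) compares the matrix form (10.3.11) with the expanded form (10.3.12).

```python
import numpy as np

su, sv, rho = 1.3, 0.7, -0.4
mu, nu = 2.0, -1.0
u, v = 2.8, -0.3

Psi = np.array([[su**2, rho * su * sv],
                [rho * su * sv, sv**2]])                    # answer to part a
d = np.array([u - mu, v - nu])
q2_matrix = d @ np.linalg.inv(Psi) @ d                      # (10.3.11)
q2_expanded = ((u - mu)**2 / su**2
               - 2 * rho * (u - mu) * (v - nu) / (su * sv)
               + (v - nu)**2 / sv**2) / (1 - rho**2)        # (10.3.12)
print(q2_matrix, q2_expanded)                               # equal
```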