
CHAPTER 24

The multivariate linear regression model

24.1 Introduction

The multivariate linear regression model is a direct extension of the linear regression model to the case where the dependent variable is an $m \times 1$ random vector $y_t$. That is, the statistical GM takes the form

$$y_t = B'x_t + u_t, \quad t \in \mathbb{T}, \tag{24.1}$$

where $y_t\colon m \times 1$, $B\colon k \times m$, $x_t\colon k \times 1$, $u_t\colon m \times 1$. The system (1) is effectively a system of $m$ linear regression equations:

$$y_{it} = \beta_i' x_t + u_{it}, \quad i = 1, 2, \ldots, m, \quad t \in \mathbb{T}, \tag{24.2}$$

with $B = (\beta_1, \beta_2, \ldots, \beta_m)$.

In direct analogy with the $m = 1$ case (see Chapter 19) the multivariate linear regression model will be derived from first principles based on the joint distribution of the observable random variables involved, $D(Z_t; \psi)$, where $Z_t = (y_t', X_t')'$ is $(m + k) \times 1$. Assuming that $Z_t$, for all $t \in \mathbb{T}$, is an IID normally distributed vector, i.e.

$$Z_t \sim N\!\left(0, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right), \quad t \in \mathbb{T}, \tag{24.3}$$

we can proceed to define the systematic and non-systematic components by

$$\mu_t = E(y_t/X_t = x_t) = B'x_t, \quad \text{where } B = \Sigma_{22}^{-1}\Sigma_{21}, \tag{24.4}$$

and

$$u_t = y_t - E(y_t/X_t = x_t), \quad t \in \mathbb{T}. \tag{24.5}$$

Moreover, by construction, $u_t$ and $\mu_t$ satisfy the following properties:

(i) $E(u_t) = E[E(u_t/X_t = x_t)] = 0$;

(ii) $E(u_t u_s') = E[E(u_t u_s'/X_t = x_t)] = \begin{cases} \Omega, & t = s, \\ 0, & t \neq s; \end{cases}$

(iii) $E(u_t \mu_t') = E[E(u_t \mu_t'/X_t = x_t)] = E[E(u_t/X_t = x_t)\mu_t'] = 0$, $\quad t \in \mathbb{T}$,

where $\Omega = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ (compare these with the results in Section 19.2).
The similarity between the $m = 1$ case and the general case allows us to consider several loose ends left in Chapter 19. The first is the use of the joint distribution $D(Z_t; \psi)$ in defining the model instead of concentrating exclusively on $D(y_t/X_t; \psi_1)$. The loss of generality in postulating the form of the joint distribution is more than compensated for by the additional insight provided. In practice it is often easier to 'judge' the plausibility of assumptions relating to the nature of $D(Z_t; \psi)$ rather than $D(y_t/X_t; \psi_1)$. Moreover, in misspecification analysis the relationship between the assumptions underlying the model and those underlying the random vector process $\{Z_t, t \in \mathbb{T}\}$ enhances our understanding of the nature of the possible departures. An interesting example of this is the relationship between the assumption that $\{Z_t, t \in \mathbb{T}\}$ is a

(1) normal (N);
(2) independent (I); and
(3) identically distributed (ID) process;

and the model assumptions:

[6] (i) $D(y_t/X_t; \theta)$ is normal;
(ii) $E(y_t/X_t = x_t)$ is linear in $x_t$;
(iii) $Cov(y_t/X_t = x_t)$ is homoskedastic (free of $x_t$);
[7] $\theta = (B, \Omega)$ are time-invariant;
[8] $\{y_t/X_t, t \in \mathbb{T}\}$ is an independent process.

The relationship between these components is shown diagrammatically below:

(N) ⇒ [6](i)–(iii); (ID) ⇒ [7]; (I) ⇒ [8].
The question which naturally arises is whether (i)–(iii) imply (N) or not. The following lemma shows that if (i)–(iii) are supplemented by the assumption that $X_t \sim N(0, \Sigma_{22})$, $\det(\Sigma_{22}) \neq 0$, the reverse implication holds.

Lemma 24.1

$Z_t \sim N(0, \Sigma)$ for $t \in \mathbb{T}$ if and only if

(i) $X_t \sim N(0, \Sigma_{22})$, $\det(\Sigma_{22}) \neq 0$;
(ii) $E(y_t/X_t = x_t) = \Sigma_{12}\Sigma_{22}^{-1}x_t$;
(iii) $Cov(y_t/X_t = x_t) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$;

that is, $(y_t/X_t = x_t) \sim N(B'x_t, \Omega)$ (see Barra (1981)).

The statistical GM (1) for the sample period $t = 1, 2, \ldots, T$ is written as

$$Y = XB + U, \tag{24.6}$$

where $Y\colon T \times m$, $X\colon T \times k$, $B\colon k \times m$, $U\colon T \times m$. The system in (1) can be viewed as the $t$th row of (6). The $i$th column, taking the form

$$y_i = X\beta_i + u_i, \quad i = 1, 2, \ldots, m, \tag{24.7}$$

represents all $T$ observations on the $i$th regression in (2). In order to define the conditional distribution $D(Y/X; \psi_1)$ we need the special notation of Kronecker products (see Appendix 2). Using this notation the matrix distribution can be written in the form

$$(Y/X) \sim N(XB, \Omega \otimes I_T), \tag{24.8}$$

where $\Omega \otimes I_T$ represents the covariance of

$$\mathrm{vec}(Y) = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}\colon\ Tm \times 1.$$

The vectoring operator $\mathrm{vec}(\cdot)$ transforms a matrix into a column vector by stacking the columns of the matrix one beneath the other. Using the vectoring operator we can express (6) in the form

$$\mathrm{vec}(Y) = (I_m \otimes X)\,\mathrm{vec}(B) + \mathrm{vec}(U) \tag{24.9}$$

or

$$y_* = X_*\beta_* + u_*, \tag{24.10}$$

in an obvious notation.
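The vec/Kronecker identity behind (24.9) is easy to verify numerically. The following minimal sketch (an added illustration, not part of the original text; the dimensions and random data are arbitrary) checks that $\mathrm{vec}(XB) = (I_m \otimes X)\,\mathrm{vec}(B)$:

```python
# Numerical check of the identity underlying (24.9): vec(XB) = (I_m kron X) vec(B).
import numpy as np

rng = np.random.default_rng(0)
T, k, m = 8, 3, 2                      # illustrative dimensions
X = rng.normal(size=(T, k))
B = rng.normal(size=(k, m))

# vec(.) stacks the columns one beneath the other, i.e. column-major order.
vec = lambda A: A.reshape(-1, order="F")

lhs = vec(X @ B)                       # vec(XB): Tm x 1
rhs = np.kron(np.eye(m), X) @ vec(B)   # (I_m kron X) vec(B)
assert np.allclose(lhs, rhs)
```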


The multivariate linear regression (MLR) model is of considerable interest in econometrics because of its direct relationship with the simultaneous equations formulation to be considered in Chapter 25. In particular, the latter formulation can be viewed as a reparametrisation of the MLR model where the statistical parameters of interest $\theta = (B, \Omega)$ do not coincide with the theoretical parameters of interest $\xi$. Instead, the two sets of parameters are related by some system of implicit equations of the form:

$$h_i(\theta, \xi) = 0, \quad i = 1, 2, \ldots, p. \tag{24.11}$$

These equations can be interpreted as providing an alternative parametrisation for the statistical GM in terms of the theoretical parameters of interest. In view of this relationship between the two statistical models a sound understanding of the MLR model will pave the way for the simultaneous equations formulation in Chapter 25.

24.2 Specification and estimation

In direct analogy to the linear regression model ($m = 1$) the multivariate linear regression model is specified as follows:

(I) Statistical GM:

[1] $y_t = B'x_t + u_t$, $t \in \mathbb{T}$, where $y_t\colon m \times 1$, $x_t\colon k \times 1$, $B\colon k \times m$.

The systematic and non-systematic components are:

$$\mu_t = E(y_t/X_t = x_t) = B'x_t, \quad u_t = y_t - E(y_t/X_t = x_t),$$

and by construction

$$E(u_t) = E[E(u_t/X_t = x_t)] = 0, \quad E(u_t\mu_t') = E[E(u_t\mu_t'/X_t = x_t)] = 0, \quad t \in \mathbb{T}.$$

[2] The statistical parameters of interest are $\theta = (B, \Omega)$, where $B = \Sigma_{22}^{-1}\Sigma_{21}$, $\Omega = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.

[3] $X_t$ is assumed to be weakly exogenous with respect to $\theta$.

[4] No a priori information on $\theta$.

[5] $\operatorname{Rank}(X) = k$, $X = (x_1, x_2, \ldots, x_T)'\colon T \times k$, for $T > k$.


(II) Probability model:

$$\Phi = \left\{ D(y_t/X_t; \theta) = (2\pi)^{-m/2}(\det \Omega)^{-1/2}\exp\!\left\{-\tfrac{1}{2}(y_t - B'x_t)'\Omega^{-1}(y_t - B'x_t)\right\},\ \theta \in \mathbb{R}^{km} \times C_+^m{}^\dagger,\ y_t \in \mathbb{R}^m \right\},$$

[6] (i) $D(y_t/X_t; \theta)$ — normal;
(ii) $E(y_t/X_t = x_t) = B'x_t$ — linear in $x_t$;
(iii) $Cov(y_t/X_t = x_t) = \Omega$ — homoskedastic (free of $x_t$);
[7] $\theta$ is time invariant.

(III) Sampling model:

[8] $Y = (y_1, y_2, \ldots, y_T)'$ is an independent sample sequentially drawn from $D(y_t/X_t; \theta)$, $t = 1, 2, \ldots, T$, and $T \geq m + k$.

† $C_+^m$ denotes the space of all real positive definite symmetric matrices of rank $m$.
The above specification is almost identical to that of the $m = 1$ case considered in Chapter 19. The discussion of the assumptions in that chapter applies to [1]–[8] above with only minor modifications due to $m > 1$. The only real change brought about by $m > 1$ is the increase in the number of statistical parameters of interest to $mk + \frac{1}{2}m(m+1)$. It should come as no surprise to learn that the similarities between the two statistical models extend to estimation, testing and prediction.
From assumptions [6]–[8] we can deduce that the likelihood function takes the form

$$L(\theta; Y) = c(Y)\prod_{t=1}^{T} D(y_t/X_t; \theta)$$

and the log likelihood is

$$\log L = \text{const} - \frac{T}{2}\log(\det \Omega) - \frac{1}{2}\sum_{t=1}^{T}(y_t - B'x_t)'\Omega^{-1}(y_t - B'x_t) \tag{24.12}$$

$$= \text{const} - \frac{1}{2}\left[T\log(\det \Omega) + \operatorname{tr} \Omega^{-1}(Y - XB)'(Y - XB)\right] \tag{24.13}$$

(see exercise 1). The first-order conditions are

$$\frac{\partial \log L}{\partial B} = (X'Y - X'XB)\Omega^{-1} = 0, \tag{24.14}$$

$$\frac{\partial \log L}{\partial \Omega^{-1}} = \frac{T}{2}\Omega - \frac{1}{2}(Y - XB)'(Y - XB) = 0. \tag{24.15}$$

These first-order conditions lead to the following MLE's:

$$\hat{B} = (X'X)^{-1}X'Y, \tag{24.16}$$

$$\hat{\Omega} = \frac{1}{T}\hat{U}'\hat{U}, \tag{24.17}$$

where $\hat{U} = Y - X\hat{B}$. For $\hat{\Omega}$ to be positive definite we need to assume that $T \geq m + k$ (see Dykstra (1970)). It is interesting to note that (16) amounts to estimating each regression equation separately by

$$\hat{\beta}_i = (X'X)^{-1}X'y_i, \quad i = 1, 2, \ldots, m. \tag{24.18}$$

Moreover, the residuals from these separate regressions $\hat{u}_i = y_i - X\hat{\beta}_i$ can be used to derive $\hat{\Omega}$ via $\hat{\omega}_{ij} = (1/T)\hat{u}_i'\hat{u}_j$, $i, j = 1, 2, \ldots, m$.
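The following minimal sketch (an added illustration with simulated data; the dimensions and seed are arbitrary) computes the MLE's (24.16)–(24.17) and confirms that the system estimator of $B$ coincides with equation-by-equation OLS as in (24.18):

```python
# MLE's (24.16)-(24.17) on simulated data, checked against per-equation OLS (24.18).
import numpy as np

rng = np.random.default_rng(1)
T, k, m = 50, 3, 2
X = rng.normal(size=(T, k))
B = rng.normal(size=(k, m))
Y = X @ B + rng.normal(size=(T, m))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # (X'X)^{-1} X'Y, k x m
U_hat = Y - X @ B_hat                       # residual matrix, T x m
Omega_hat = (U_hat.T @ U_hat) / T           # (1/T) U'U, m x m

# Each column of B_hat equals the OLS estimator of the corresponding equation.
for i in range(m):
    beta_i = np.linalg.solve(X.T @ X, X.T @ Y[:, i])
    assert np.allclose(B_hat[:, i], beta_i)
```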
As in the case of $\hat{\beta}$ in the linear regression model, the MLE $\hat{B}$ preserves the original orthogonality between the systematic and non-systematic components. That is, for $\hat{\mu}_t = \hat{B}'x_t$ and $\hat{u}_t = y_t - \hat{B}'x_t$,

$$y_t = \hat{\mu}_t + \hat{u}_t, \quad t = 1, 2, \ldots, T, \tag{24.19}$$

and $\hat{\mu}_t \perp \hat{u}_t$. This orthogonality can be used to define a goodness-of-fit measure by extending $R^2 = 1 - (\hat{u}'\hat{u})(y'y)^{-1}$ to

$$G = I - (\hat{U}'\hat{U})(Y'Y)^{-1} = (Y'Y - \hat{U}'\hat{U})(Y'Y)^{-1}. \tag{24.20}$$

The matrix $G$ varies between the identity matrix when $\hat{U} = 0$ and zero when $Y = \hat{U}$ (no explanation). In order to reduce this matrix goodness-of-fit measure to a scalar we can use the trace or the determinant:

$$d_1 = \frac{1}{m}\operatorname{tr} G, \quad d_2 = \det(G) \tag{24.21}$$

(see Hooper (1959)). In terms of the eigenvalues $(\lambda_1, \lambda_2, \ldots, \lambda_m)$ of $G$ the above measures of goodness of fit take the form

$$d_1 = \frac{1}{m}\sum_{i=1}^{m}\lambda_i, \quad d_2 = \prod_{i=1}^{m}\lambda_i. \tag{24.22}$$
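Continuing the simulated example above (it reuses `Y`, `U_hat` and `m`), a short sketch of the matrix $G$ and the two scalar measures:

```python
# Goodness-of-fit matrix G of (24.20) and the scalar measures (24.21)-(24.22).
import numpy as np  # Y, U_hat, m defined in the previous sketch

G = np.eye(m) - (U_hat.T @ U_hat) @ np.linalg.inv(Y.T @ Y)

d1 = np.trace(G) / m                  # trace measure
d2 = np.linalg.det(G)                 # determinant measure

# Equivalently via the eigenvalues of G, as in (24.22).
lam = np.linalg.eigvals(G)
assert np.isclose(d1, lam.sum().real / m)
assert np.isclose(d2, np.prod(lam).real)
```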

The orthogonality extends directly to $\hat{M} = X\hat{B}$ and $\hat{U}$ and can be used to show that $\hat{B}$ and $\hat{\Omega}$ are independent random matrices. In the present context this amounts to

$$Cov(\hat{B}, \hat{\Omega}) = 0, \tag{24.23}$$

where $E(\cdot)$ is relative to $D(Y/X; \theta)$.

Finite sample properties of $\hat{B}$ and $\hat{\Omega}$

From the fact that $\hat{B}$ and $\hat{\Omega}$ are MLE's we can deduce that they enjoy the invariance property of such estimators (see Chapter 13) and they are functions of the minimal sufficient statistics, if they exist. Using the Lehmann–Scheffé result (see Chapter 12) we can see that the ratio

$$\frac{D(Y/X; \theta)}{D(Y_0/X; \theta)} = \exp\!\left\{-\tfrac{1}{2}\operatorname{tr} \Omega^{-1}\left[Y'Y - Y_0'Y_0 - (Y - Y_0)'XB - B'X'(Y - Y_0)\right]\right\} \tag{24.24}$$

is independent of $\theta$ if $Y'Y = Y_0'Y_0$ and $Y'X = Y_0'X$. This implies that

$$\tau(Y) = (\tau_1(Y), \tau_2(Y)), \quad \tau_1(Y) = Y'Y, \quad \tau_2(Y) = Y'X,$$

defines the set of minimal sufficient statistics and

$$\hat{B} = (X'X)^{-1}\tau_2(Y)', \tag{24.25}$$

$$\hat{\Omega} = \frac{1}{T}\left[\tau_1(Y) - \tau_2(Y)(X'X)^{-1}\tau_2(Y)'\right]. \tag{24.26}$$
In order to discuss the other properties of $\hat{B}$ and $\hat{\Omega}$ let us derive their distributions. Since

$$\hat{B} = B + (X'X)^{-1}X'U = B + LU, \quad L = (X'X)^{-1}X', \tag{24.27}$$

we can deduce that

$$\hat{B} \sim N(B, \Omega \otimes (X'X)^{-1}). \tag{24.28}$$

This is because $\hat{B}$ is a linear function of $Y$, where $(Y/X) \sim N(XB, \Omega \otimes I_T)$.

Given that $T\hat{\Omega} = Y'M_XY$ with $M_X = I - X(X'X)^{-1}X'$, its distribution is the matrix equivalent of the chi-square, known as the Wishart distribution with $T - k$ degrees of freedom and scale matrix $\Omega$, written as

$$T\hat{\Omega} \sim W_m(\Omega, T - k) \tag{24.29}$$

(see Appendix 1). In the case where $m = 1$, $T\hat{\Omega} = \hat{u}'\hat{u}$ and

$$T\hat{\Omega} \sim \sigma^2\chi^2(T - k), \quad E(T\hat{\Omega}) = \sigma^2(T - k). \tag{24.30}$$

The Wishart distribution enjoys most of the attractive properties of the multivariate normal distribution (see Appendix 1). In direct analogy to (30),

$$E(T\hat{\Omega}) = (T - k)\Omega, \tag{24.31}$$

and thus $\tilde{\Omega} = [1/(T - k)]\hat{U}'\hat{U}$ is an unbiased estimator of $\Omega$. In view of (25)–(31) we can summarise the finite sample properties of the MLE's $\hat{B}$ and $\hat{\Omega}$ of $B$ and $\Omega$ respectively:

(1) $\hat{B}$ and $\hat{\Omega}$ are invariant (with respect to Borel functions of the form $g(\cdot)\colon \Theta \to \mathbb{R}^{mk + \frac{1}{2}m(m+1)}$).

(2) $\hat{B}$ and $\hat{\Omega}$ are functions of the minimal sufficient statistics $\tau_1(Y) = Y'Y$ and $\tau_2(Y) = Y'X$.

(3) $\hat{B}$ is an unbiased estimator of $B$ (i.e. $E(\hat{B}) = B$†) but $\hat{\Omega}$ is a biased estimator of $\Omega$; $\tilde{\Omega} = [1/(T - k)]\hat{U}'\hat{U}$ being unbiased.

(4) $\hat{B}$ is a fully efficient estimator of $B$ in view of the fact that $Cov(\hat{B}) = \Omega \otimes (X'X)^{-1}$ and the information matrix of $\theta = (B, \Omega)$ takes the form

$$I_T(\theta) = \begin{pmatrix} \Omega^{-1} \otimes X'X & 0 \\ 0 & \frac{T}{2}(\Omega^{-1} \otimes \Omega^{-1}) \end{pmatrix} \tag{24.32}$$

(see Rothenberg (1973)).

(5) $\hat{B}$ and $\hat{\Omega}$ are independent, in view of the orthogonality in (19).

† Note that the expectation operator $E(\cdot)$ is relative to the underlying probability model $D(y_t/X_t; \theta)$.
Asymptotic properties of $\hat{B}$ and $\hat{\Omega}$

Arguing again by analogy to the $m = 1$ case we can derive the asymptotic properties of the MLE's $\hat{B}$ and $\hat{\Omega}$ of $B$ and $\Omega$, respectively.

(i) Consistency: $\hat{B} \xrightarrow{P} B$, $\hat{\Omega} \xrightarrow{P} \Omega$.

In view of the result $(\hat{B} - B) \sim N(0, \Omega \otimes (X'X)_T^{-1})$ we can deduce that if $\lim_{T\to\infty}(X'X)_T^{-1} = 0$ then $Cov(\hat{B}) \to 0$ and thus $\hat{B}$ is a consistent estimator of $B$ (see Chapters 12 and 19). Similarly, given that $\lim_{T\to\infty}E(\hat{\Omega}) = \Omega$ and $\lim_{T\to\infty}Cov(\hat{\Omega}) = 0$, $\hat{\Omega} \xrightarrow{P} \Omega$.

Note that the following statements are equivalent:

(a) $\lim_{T\to\infty}(X'X)_T^{-1} = 0$;
(b) $\lambda_{\min}(X'X)_T \to \infty$ as $T \to \infty$;
(c) $\lambda_{\max}(X'X)_T^{-1} \to 0$ as $T \to \infty$;
(d) $\operatorname{tr}(X'X)_T^{-1} \to 0$ as $T \to \infty$;

where $\lambda_{\min}(X'X)_T$ and $\lambda_{\max}(X'X)_T^{-1}$ refer to the smallest and largest eigenvalues of $(X'X)_T$ and its inverse, respectively; see Amemiya (1985).

(ii) Strong consistency: If

$$\lim_{T\to\infty}(X'X)_T^{-1} = 0 \quad \text{and} \quad \frac{\lambda_{\max}(X'X)_T}{\lambda_{\min}(X'X)_T} \leq C$$

for some arbitrary constant $C$, then $\hat{B} \xrightarrow{a.s.} B$; see Anderson and Taylor (1979).
(iii) Asymptotic normality

From the theory of maximum likelihood estimation we know that under relatively mild conditions (see Chapter 13) the MLE $\hat{\theta}$ of $\theta$ satisfies $\sqrt{T}(\hat{\theta} - \theta) \sim N(0, I_\infty(\theta)^{-1})$. For this result to apply, however, we need the boundedness of the asymptotic information matrix $I_\infty(\theta) = \lim_{T\to\infty}(1/T)I_T(\theta)$ as well as its non-singularity. In the present case the asymptotic information matrix is bounded and non-singular (full rank) if $\lim_{T\to\infty}(X'X)/T = Q_x$ is finite and non-singular. Under this condition we can deduce that

$$\sqrt{T}(\hat{B} - B) \sim N(0, \Omega \otimes Q_x^{-1}) \tag{24.33}$$

and

$$\sqrt{T}(\hat{\Omega} - \Omega) \sim N(0, 2(\Omega \otimes \Omega)) \tag{24.34}$$

(see Rothenberg (1973)).

Note that if $\{(X'X)_T, T > k\}$ is a sequence of $k \times k$ positive definite matrices such that $(X'X)_{T+1} - (X'X)_T$ is positive semi-definite and $c'(X'X)_Tc \to \infty$ as $T \to \infty$ for every $c \neq 0$, then $\lim_{T\to\infty}(X'X)_T^{-1} = 0$.

(iv) In view of (iii) we can deduce that $\hat{B}$ and $\hat{\Omega}$ are both asymptotically unbiased and efficient.
24.3 A priori information

One particularly important departure from the assumptions underlying the multivariate linear regression model is the introduction of a priori restrictions related to $\theta$. When such additional information is available assumption [4] no longer applies and the results on estimation derived in Section 24.2 need to be modified. The importance of a priori information in the present context arises partly because it allows us to derive tests which can be usefully employed in misspecification testing and partly because this will provide the link between the multivariate linear regression model and the simultaneous equations model to be considered in Chapter 25.

(1) Linear restrictions 'related' to $X_t$

The first form of restrictions to be considered is

$$D_1B + C_1 = 0, \tag{24.35}$$

where $D_1\colon p \times k$ ($p < k$) and $C_1\colon p \times m$ are known matrices of constants. A particularly important special case of (35) is when

$$D_1 = (0, I_p), \quad B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}, \quad C_1 = 0, \tag{24.36}$$

and (35) takes the form $B_2 = 0$. That is, a subset of the coefficients in $B$ is zero. The thing to note about these restrictions is that they are not the same as the form

$$R\beta = r \tag{24.37}$$

discussed in the context of the $m = 1$ case (see Chapter 20). This is because the $D_1$ matrix affects all the columns of $B$ and thus the same restrictions, apart from the constants in $C_1$, are imposed on all $m$ linear regression equations. The form of restrictions comparable to (37) in the present context is

$$R\beta_* = r, \tag{24.38}$$

where $\beta_* = \mathrm{vec}(B) = (\beta_1', \beta_2', \ldots, \beta_m')'\colon mk \times 1$, $R\colon p \times mk$, $r\colon p \times 1$. This form of linear restrictions is more general than (35) as well as

$$B\Gamma_1 + \Delta_1 = 0. \tag{24.39}$$

All three forms, (35), (38) and (39), will be discussed in this section because
they are interesting for different reasons.
When the restrictions (35) are interpreted in the context of the statistical
GM
y,=—Bx,+u,,

tel,

(24.40)

we can see that they are directly related to the regressors X;,,i=1,2,...,k.
The easiest way to take (35) into consideration in the estimation of @=(B, Q)
is to ‘solve’ the system (35) for B and substitute the ‘solution’ into (40). In

order to do that we define two arbitrary matrices D¥:(k — p) x k, rank (D*)=


24.3.
k—p,

and

A priori information

C#: (k—p) xm, and

581

reformulate


(35) into

DB+C=0

(24.41)

where

D=(D,.D*),

C

kxk,

c-(C1)

k xm.

The fact that $\operatorname{rank}(D) = k$ enables us to solve (41) for $B$ to yield

$$B = -D^{-1}C = G_1C_1 + G_1^*C_1^*, \tag{24.42}$$

where $G = (G_1, G_1^*) = -D^{-1}$. Substituting this into (40) for $t = 1, 2, \ldots, T$ yields

$$Y^* = X^*C_1^* + U, \tag{24.43}$$

where $Y^* = Y - XG_1C_1$ and $X^* = XG_1^*$. The fact that the form of the underlying probability model is unchanged implies that the MLE of $C_1^*$ is

$$\hat{C}_1^* = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^* = (G_1^{*\prime}X'XG_1^*)^{-1}G_1^{*\prime}X'X(\hat{B} - G_1C_1). \tag{24.44}$$

Hence, from (42) the constrained MLE of $B$ is

$$\tilde{B} = G_1C_1 + G_1^*\hat{C}_1^* = G_1C_1 + L(\hat{B} - G_1C_1) = \hat{B} - P(\hat{B} - G_1C_1), \tag{24.45}$$

where

$$L = G_1^*(G_1^{*\prime}X'XG_1^*)^{-1}G_1^{*\prime}X'X, \quad P = I - L. \tag{24.46}$$

Given that $L^2 = L$, $P^2 = P$ and $LP = 0$ (i.e. they are orthogonal projections) we can deduce that $P$ takes the form

$$P = (X'X)^{-1}D_1'[D_1(X'X)^{-1}D_1']^{-1}D_1 \tag{24.47}$$

(see exercise 3). This implies that

$$\tilde{B} = \hat{B} - (X'X)^{-1}D_1'[D_1(X'X)^{-1}D_1']^{-1}(D_1\hat{B} + C_1), \tag{24.48}$$

since $D_1G_1 = -I_p$. Moreover, the constrained MLE of $\Omega$ is

$$\tilde{\Omega} = \frac{1}{T}\tilde{U}'\tilde{U} = \hat{\Omega} + \frac{1}{T}(\tilde{B} - \hat{B})'(X'X)(\tilde{B} - \hat{B}). \tag{24.49}$$

Looking at the constrained MLE's of $B$ and $\Omega$ we can see that they are direct extensions of the results in the case of $m = 1$ in Chapter 20.
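A small sketch of (24.48)–(24.49), continuing the simulated example above; the particular $D_1$ and $C_1$ (setting the last regressor's coefficients to zero in every equation) are an assumption of the illustration:

```python
# Constrained MLE's (24.48)-(24.49) under D1 B + C1 = 0.
import numpy as np  # X, Y, B_hat, Omega_hat, T, k, m as in the earlier sketch

D1 = np.zeros((1, k)); D1[0, k - 1] = 1.0      # D1 = (0, ..., 0, 1): 1 x k
C1 = np.zeros((1, m))                          # C1 = 0: last coefficient row is zero

XtX = X.T @ X
A = np.linalg.inv(XtX) @ D1.T @ np.linalg.inv(D1 @ np.linalg.inv(XtX) @ D1.T)
B_tilde = B_hat - A @ (D1 @ B_hat + C1)        # (24.48)

U_tilde = Y - X @ B_tilde
Omega_tilde = Omega_hat + (B_tilde - B_hat).T @ XtX @ (B_tilde - B_hat) / T  # (24.49)
assert np.allclose(Omega_tilde, U_tilde.T @ U_tilde / T)
assert np.allclose(D1 @ B_tilde + C1, 0)       # the restriction holds exactly
```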
Another important special case of (35) is the case where all the coefficients apart from the constant terms, say $B_1$, are zero. This can be expressed in the form (35) with

$$D_1 = (0, I_{k-1}), \quad B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}, \quad C_1 = 0,$$

where $B_1$ is the row of constant terms, and $H_0$ takes the form $B_2 = 0$.

(2) Linear restrictions 'related' to $y_t$


The second form of restrictions to be considered is
Br, +A, =0,

(24.50)

where I',: mx q (qThe restrictions in (50) represent linear between-equations restrictions
because the ith row of B represents the ith coefficient on all equations.
Interpreted in the context of (35) these restrictions are directly related to the
yS. This implies that if we follow the procedure used for the restrictions in
(38) we have to be much more careful because the form of the underlying
probability model might be affected. Richard (1979) shows how this
procedure can give rise to the restricted MLE’s of Band Q. For expositional
purposes we will adopt the Lagrange multiplier procedure. The Lagrangian
function is

\(B, Q, M) = -5 log(det 9)—‡ tr Q~!(Y —XBJ(Y —XB)
—tr[A(BI, + A,)],

(24.51)

where A is a matrix of Lagrange multipliers.

The first-order conditions are

$$\frac{\partial l}{\partial B} = (X'Y - X'XB)\Omega^{-1} - \Lambda\Gamma_1' = 0, \tag{24.52}$$

$$\frac{\partial l}{\partial \Omega^{-1}} = \frac{T}{2}\Omega - \frac{1}{2}(Y - XB)'(Y - XB) = 0, \tag{24.53}$$

$$\frac{\partial l}{\partial \Lambda} = -(B\Gamma_1 + \Delta_1) = 0 \tag{24.54}$$

(see Appendix 2). From (52) we can deduce that

$$(X'X)(\hat{B} - \tilde{B}) = \Lambda\Gamma_1'\tilde{\Omega}. \tag{24.55}$$

Postmultiplying by $\Gamma_1$ and solving for $\Lambda$ yields

$$\Lambda = (X'X)(\hat{B}\Gamma_1 - \tilde{B}\Gamma_1)(\Gamma_1'\tilde{\Omega}\Gamma_1)^{-1}, \tag{24.56}$$

which in view of (54) becomes

$$\Lambda = (X'X)(\hat{B}\Gamma_1 + \Delta_1)(\Gamma_1'\tilde{\Omega}\Gamma_1)^{-1}. \tag{24.57}$$


This implies that the constrained MLE's of $B$ and $\Omega$ are

$$\tilde{B} = \hat{B} - (\hat{B}\Gamma_1 + \Delta_1)(\Gamma_1'\tilde{\Omega}\Gamma_1)^{-1}\Gamma_1'\tilde{\Omega}, \tag{24.58}$$

$$\tilde{\Omega} = \frac{1}{T}\tilde{U}'\tilde{U} = \hat{\Omega} + \frac{1}{T}(\tilde{B} - \hat{B})'(X'X)(\tilde{B} - \hat{B}) \tag{24.59}$$

(see Richard (1979)). If we compare (58) with (48) we can see that the main difference is that $\tilde{\Omega}$ enters the MLE estimator of $B$, in view of the fact that the restrictions (50) affect the form of the probability model. It is interesting to note that postmultiplying (58) by $\Gamma_1$ yields (54). The above formulae, (58) and (59), will be of considerable value in Chapter 25.
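Note that $\tilde{\Omega}$ appears on the right-hand side of (58) while depending on $\tilde{B}$ through (59), so the pair is implicit. The sketch below (an added illustration continuing the simulated example; the fixed-point iteration and the particular $\Gamma_1$, $\Delta_1$ are assumptions of the sketch, not a procedure prescribed by the text) solves the pair numerically:

```python
# Fixed-point iteration on (24.58)-(24.59); Gamma1 equates the two columns of B.
import numpy as np  # X, B_hat, Omega_hat, T, k, m (= 2) as in the earlier sketch

Gamma1 = np.array([[1.0], [-1.0]])     # q = 1: restriction (column 1) - (column 2) = 0
Delta1 = np.zeros((k, 1))

Om = Omega_hat.copy()                  # start from the unconstrained estimate
for _ in range(100):
    Bt = B_hat - (B_hat @ Gamma1 + Delta1) @ np.linalg.inv(
        Gamma1.T @ Om @ Gamma1) @ Gamma1.T @ Om                         # (24.58)
    Om_new = Omega_hat + (Bt - B_hat).T @ (X.T @ X) @ (Bt - B_hat) / T  # (24.59)
    if np.allclose(Om_new, Om):
        break
    Om = Om_new

assert np.allclose(Bt @ Gamma1 + Delta1, 0)   # (24.54) holds at every step
```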
(3) Linear restrictions 'related' to both $y_t$ and $X_t$

A natural way to proceed is to combine the linear restrictions (35) and (50) in the form

$$D_1B\Gamma_1 + C = 0, \tag{24.60}$$

where $D_1\colon p \times k$, $\Gamma_1\colon m \times q$, $C\colon p \times q$ are known matrices with $\operatorname{rank}(D_1) = p$, $\operatorname{rank}(\Gamma_1) = q$. Using the Lagrangian function

$$l(B, \Omega, \Lambda) = -\frac{T}{2}\log(\det \Omega) - \frac{1}{2}\operatorname{tr} \Omega^{-1}(Y - XB)'(Y - XB) - \operatorname{tr}[\Lambda'(D_1B\Gamma_1 + C)], \tag{24.61}$$

we can show that the restricted MLE's are

$$\tilde{B} = \hat{B} - (X'X)^{-1}D_1'[D_1(X'X)^{-1}D_1']^{-1}(D_1\hat{B}\Gamma_1 + C)(\Gamma_1'\tilde{\Omega}\Gamma_1)^{-1}\Gamma_1'\tilde{\Omega}, \tag{24.62}$$

$$\tilde{\Omega} = \hat{\Omega} + \frac{1}{T}(\tilde{B} - \hat{B})'(X'X)(\tilde{B} - \hat{B}). \tag{24.63}$$

An alternative way to derive (62) and (63) is to consider

$$D_1B^* + C = 0 \tag{24.64}$$

for the transformed specification

$$Y^* = XB^* + E, \tag{24.65}$$

where $Y^* = Y\Gamma_1$, $B^* = B\Gamma_1$ and $E = U\Gamma_1$.

The linear restrictions in (60) in vector form can be written as

$$\mathrm{vec}(D_1B\Gamma_1 + C) = (\Gamma_1' \otimes D_1)\,\mathrm{vec}(B) + \mathrm{vec}(C) = 0 \tag{24.66}$$

or

$$(\Gamma_1' \otimes D_1)\beta_* = r, \tag{24.67}$$

where $\beta_* = \mathrm{vec}(B)$ and $r = -\mathrm{vec}(C)$. This suggests that an obvious way to generalise this is to substitute $(\Gamma_1' \otimes D_1)$ with a $p \times km$ matrix $R$ to formulate the restrictions in the form

$$R\beta_* = r, \tag{24.68}$$

where $\operatorname{rank}(R) = p$ ($p < km$). This is indeed the most general form of linear restrictions, in view of the fact that $\beta_*$ enables us to 'reach' each coefficient of $B$ directly and impose within-equation and between-equation restrictions separately.
In the case where only within-equation linear restrictions are available $R$ is block-diagonal, i.e.

$$R = \begin{pmatrix} R_1 & 0 & \cdots & 0 \\ 0 & R_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & R_m \end{pmatrix} \quad \text{and} \quad r = \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{pmatrix}, \tag{24.69}$$

where $R_i\colon p_i \times k$, $\operatorname{rank}(R_i) = p_i$, $r_i\colon p_i \times 1$, $i = 1, 2, \ldots, m$. Exclusion restrictions are a special case of within-equation restrictions where $R_i$ has a unit sub-matrix, of dimension equal to the number of excluded variables, and zeros everywhere else.


Across-equations linear restrictions can be accommodated in the off
block-diagonal submatrices R;,,i,j=1,2,....m,iAj ofR with R,, referring
to the restrictions between equations i and j.
Let us consider the derivation of the constrained MLE's of $B$ and $\Omega$ under the linear restrictions (68). The most convenient form of the statistical GM for the sample period $t = 1, 2, \ldots, T$ is not

$$Y = XB + U, \tag{24.70}$$

as in the case of (35) and (39), but its vectorised formulation

$$y_* = X_*\beta_* + u_*, \tag{24.71}$$

where

$$y_* = (y_1', y_2', \ldots, y_m')'\colon Tm \times 1, \quad \beta_* = (\beta_1', \beta_2', \ldots, \beta_m')'\colon mk \times 1,$$

$$X_* = (I_m \otimes X)\colon Tm \times mk, \quad u_* = (u_1', u_2', \ldots, u_m')'\colon Tm \times 1,$$

and $\Omega_* = (\Omega \otimes I_T)\colon Tm \times Tm$. The Lagrangian function is defined using the vector $\lambda\colon p \times 1$ of multipliers:

$$l(\beta_*, \Omega_*, \lambda) = -\frac{1}{2}\log(\det \Omega_*) - \frac{1}{2}(y_* - X_*\beta_*)'\Omega_*^{-1}(y_* - X_*\beta_*) - \lambda'(R\beta_* - r). \tag{24.72}$$


The first-order conditions are

$$\frac{\partial l}{\partial \beta_*} = X_*'\Omega_*^{-1}(y_* - X_*\beta_*) - R'\lambda = 0, \tag{24.73}$$

$$\frac{\partial l}{\partial \Omega^{-1}} = \frac{T}{2}\Omega - \frac{1}{2}\sum_{t=1}^{T}(y_t - B'x_t)(y_t - B'x_t)' = 0, \tag{24.74}$$

$$\frac{\partial l}{\partial \lambda} = -(R\beta_* - r) = 0. \tag{24.75}$$

Looking at the above first-order conditions (73)–(75) we can see that they constitute a system of non-linear equations which cannot be solved explicitly unless $\Omega$ is assumed to be known. In the latter case (73) and (75) imply that

$$\tilde{\beta}_* = \hat{\beta}_* - (X_*'\Omega_*^{-1}X_*)^{-1}R'[R(X_*'\Omega_*^{-1}X_*)^{-1}R']^{-1}(R\hat{\beta}_* - r) \tag{24.76}$$

and

$$\tilde{\Omega} = \frac{1}{T}\sum_{t=1}^{T}(y_t - \tilde{B}'x_t)(y_t - \tilde{B}'x_t)', \tag{24.77}$$

where

$$\hat{\beta}_* = (X_*'\Omega_*^{-1}X_*)^{-1}X_*'\Omega_*^{-1}y_*. \tag{24.78}$$

If we compare these formulae with those in the $m = 1$ case (see Chapter 20) we can see that the only difference (when $\Omega$ is known) is the presence of $\Omega_*$. This is because in the $m > 1$ case the restrictions $R\beta_* = r$ affect the underlying probability model by restricting $y_t$. In the econometric literature the estimator (78) is known as the generalised least-squares (GLS) estimator.

In practice $\Omega$ is unknown and thus in order to 'solve' the conditions (73)–(75) we need to resort to iterative numerical optimisation (see Harvey (1981), Quandt (1983), inter alia).

The purpose of the next section is to consider two special cases of (68) where the restrictions can be substituted directly into a reformulated statistical GM. These are the cases of exclusion and across-equations linear homogeneous restrictions. In these two cases the constrained MLE of $\beta_*$ takes a form similar to (78).

24.4 The Zellner and Malinvaud formulations

In econometric modelling two special cases of the general linear restrictions

$$R\beta_* = r \tag{24.79}$$

are particularly useful. These are the exclusion and across-equations linear homogeneous restrictions. In order to illustrate these let us consider the



two-equation case

$$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{12} & \beta_{22} & \beta_{32} \end{pmatrix}\begin{pmatrix} x_{1t} \\ x_{2t} \\ x_{3t} \end{pmatrix} + \begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}, \quad t \in \mathbb{T}. \tag{24.80}$$

(i) Exclusion restrictions: $\beta_{11} = 0$, $\beta_{32} = 0$;
(ii) Across-equation linear homogeneous restrictions: $\beta_{21} = \beta_{12}$.

It turns out that in these two cases the restrictions can be accommodated directly into a reformulation of the statistical GM and no constrained optimisation is necessary. The purpose of this section is to discuss the estimation of $\beta_*$ under these two forms of restrictions and to derive explicit formulae which will prove useful in Chapter 25.
Let us consider the exclusion restrictions first. The vectorised form of

$$Y = XB + U, \tag{24.81}$$

as defined in the previous sections, takes the explicit form

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} X & 0 & \cdots & 0 \\ 0 & X & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & X \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix} \tag{24.82}$$

or

$$y_* = X_*\beta_* + u_* \tag{24.83}$$

in an obvious notation. Exclusion restrictions can be accommodated directly into (82) by allowing the regressor matrix to differ across the regression equations $y_i = X_i\beta_i^* + u_i$, $i = 1, 2, \ldots, m$, and redefining the $\beta_i$'s accordingly. That is, reformulate (82) into

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & X_m \end{pmatrix}\begin{pmatrix} \beta_1^* \\ \beta_2^* \\ \vdots \\ \beta_m^* \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix} \tag{24.84}$$

or

$$y_* = X_*^*\beta_*^* + u_*, \tag{24.85}$$

where $X_i$ refers to the regressor data matrix for the $i$th equation and $\beta_i^*$ to the corresponding coefficient vector. In the case of the example in (80) with the

restrictions $\beta_{11} = 0$, $\beta_{32} = 0$, (84) takes the form

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} X_1 & 0 \\ 0 & X_2 \end{pmatrix}\begin{pmatrix} \beta_1^* \\ \beta_2^* \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \tag{24.86}$$

where $X_1 = (x_2, x_3)$, $X_2 = (x_1, x_2)$, $\beta_1^* = (\beta_{21}, \beta_{31})'$ and $\beta_2^* = (\beta_{12}, \beta_{22})'$.
The formulation (84) is known as the seemingly unrelated regression equations (SURE) formulation, a term coined by Zellner (1962), because the $m$ linear regression equations in (84) seem to be unrelated at first sight, but this turns out to be false. When different restrictions are placed on different equations the original statistical GM is affected and the various equations become interrelated. In particular the covariance matrix $\Omega$ enters the estimator of $\beta_*^*$. As shown in the previous section, in the case where $\Omega$ is known the MLE of $\beta_*^*$ takes the form

$$\hat{\beta}_*^* = (X_*^{*\prime}(\Omega^{-1} \otimes I_T)X_*^*)^{-1}X_*^{*\prime}(\Omega^{-1} \otimes I_T)y_*. \tag{24.87}$$

Otherwise, the MLE is derived using some iterative numerical procedure. For this case Zellner (1962) suggested the two-step least-squares estimator

$$\tilde{\beta}_*^* = (X_*^{*\prime}(\hat{\Omega}^{-1} \otimes I_T)X_*^*)^{-1}X_*^{*\prime}(\hat{\Omega}^{-1} \otimes I_T)y_*, \tag{24.88}$$

where $\hat{\Omega} = (1/T)\hat{U}'\hat{U}$, $\hat{U} = Y - X\hat{B}$. It is not very difficult to see that this estimator can be viewed as an approximation to the MLE defined in the previous section by the first-order conditions (73)–(75) where only two iterations were performed: one to derive $\hat{\Omega}$, which is then substituted into (87). Zellner went on to show that if

$$\lim_{T\to\infty}\left(\frac{1}{T}X_*^{*\prime}(\Omega^{-1} \otimes I_T)X_*^*\right) = Q_* \tag{24.89}$$

exists and is non-singular, the asymptotic distributions of (87) and (88) coincide, taking the form

$$\sqrt{T}(\tilde{\beta}_*^* - \beta_*^*) \sim N(0, Q_*^{-1}). \tag{24.90}$$

It is interesting to note that in the cases:

(a) $X_1 = X_2 = \cdots = X_m = X$; and
(b) $\Omega = \operatorname{diag}(\omega_{11}, \ldots, \omega_{mm})$,

the estimator (88) reduces to equation-by-equation OLS:

$$\tilde{\beta}_i^* = \hat{\beta}_i = (X_i'X_i)^{-1}X_i'y_i, \quad i = 1, 2, \ldots, m \tag{24.91}$$

(see Schmidt (1976)).
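A compact sketch of Zellner's two-step estimator (24.88) for the two-equation example (80) with $\beta_{11} = 0$ and $\beta_{32} = 0$ (an added illustration; the sample size, coefficient values and error covariance are arbitrary assumptions):

```python
# Two-step SURE: equation-by-equation OLS for Omega_hat, then feasible GLS (24.88).
import numpy as np

rng = np.random.default_rng(2)
T = 200
x1, x2, x3 = rng.normal(size=(3, T))
u = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=T)
y1 = 0.5 * x2 - 1.0 * x3 + u[:, 0]             # equation 1 excludes x1
y2 = 1.5 * x1 + 0.8 * x2 + u[:, 1]             # equation 2 excludes x3

X1 = np.column_stack([x2, x3])                 # X_1 = (x_2, x_3)
X2 = np.column_stack([x1, x2])                 # X_2 = (x_1, x_2)

# Step 1: OLS residuals give Omega_hat.
b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
U = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
Omega_hat = U.T @ U / T

# Step 2: feasible GLS on the stacked block-diagonal system (24.84).
Xs = np.block([[X1, np.zeros((T, 2))],
               [np.zeros((T, 2)), X2]])
ys = np.concatenate([y1, y2])
W = np.kron(np.linalg.inv(Omega_hat), np.eye(T))
beta_sure = np.linalg.solve(Xs.T @ W @ Xs, Xs.T @ W @ ys)
```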
Another important special case of the linear restrictions in (79) is the case of across-equation linear homogeneous restrictions such as $\beta_{21} = \beta_{12}$ in example (80). Such restrictions can be accommodated into the formulation (82) directly by redefining the regressor matrix as

$$X_t^* = \begin{pmatrix} x_{1t}' & 0 & \cdots & 0 \\ 0 & x_{2t}' & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & x_{mt}' \end{pmatrix} \tag{24.92}$$

(where $x_{it}$ refers to the regressors included in the $i$th equation) and the coefficient vector $\beta_*^*$ so as to include only the independent coefficients. The form

$$y_t = X_t^*\beta_*^* + u_t \tag{24.93}$$

is said to be the Malinvaud form (see Malinvaud (1970)). For the above example the restriction $\beta_{21} = \beta_{12}$ can be accommodated into (80) by defining $X_t^*$ and $\beta_*^*$ as

$$X_t^* = \begin{pmatrix} x_{2t} & x_{3t} & 0 \\ x_{1t} & 0 & x_{2t} \end{pmatrix} \quad \text{and} \quad \beta_*^* = \begin{pmatrix} \beta_{21} \\ \beta_{31} \\ \beta_{22} \end{pmatrix}. \tag{24.94}$$


The constrained MLE of $\beta_*^*$ in the case where $\Omega$ is known is

$$\hat{\beta}_*^* = \left(\sum_{t=1}^{T} X_t^{*\prime}\Omega^{-1}X_t^*\right)^{-1}\sum_{t=1}^{T} X_t^{*\prime}\Omega^{-1}y_t. \tag{24.95}$$

Given that $\Omega$ is usually unknown, the MLE of $\beta_*^*$ as defined in the previous section by (73)–(75) can be approximated by the GLS estimator based on the iterative formula

$$\hat{\beta}_{*,i+1}^* = \left(\sum_{t=1}^{T} X_t^{*\prime}\hat{\Omega}_i^{-1}X_t^*\right)^{-1}\sum_{t=1}^{T} X_t^{*\prime}\hat{\Omega}_i^{-1}y_t, \quad i = 1, 2, \ldots, l, \tag{24.96}$$

where $l$ refers to the number of iterations, which is either chosen a priori or determined by some convergence criterion such as

$$\|\hat{\beta}_{*,i+1}^* - \hat{\beta}_{*,i}^*\| < \varepsilon \quad \text{for some } \varepsilon > 0, \text{ e.g. } \varepsilon = 0.001. \tag{24.97}$$

In the case where $l = 2$ the estimator defined by (96) coincides with

$$\hat{\beta}_*^* = \left(\sum_{t=1}^{T} X_t^{*\prime}\hat{\Omega}^{-1}X_t^*\right)^{-1}\sum_{t=1}^{T} X_t^{*\prime}\hat{\Omega}^{-1}y_t, \tag{24.98}$$

where $\hat{\Omega} = (1/T)\hat{U}'\hat{U}$, $\hat{U} = Y - X\hat{B}$.
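A sketch of the iterative GLS formula (24.96) applied to the Malinvaud form (94) (an added illustration; it reuses the simulated $x_1, x_2, x_3, u$ from the SURE sketch above and imposes $\beta_{21} = \beta_{12}$ in the data-generating values, which are arbitrary):

```python
# Iterated feasible GLS (24.96) for the Malinvaud form (24.94).
import numpy as np  # x1, x2, x3, u and T as in the SURE sketch

y1 = 0.7 * x2 - 1.0 * x3 + u[:, 0]             # beta_21 = 0.7, beta_31 = -1.0
y2 = 0.7 * x1 + 0.8 * x2 + u[:, 1]             # beta_12 = beta_21 = 0.7, beta_22 = 0.8

# X_t*: row 1 = (x_2t, x_3t, 0), row 2 = (x_1t, 0, x_2t); shape T x 2 x 3.
Xt = np.stack([np.column_stack([x2, x3, np.zeros(T)]),
               np.column_stack([x1, np.zeros(T), x2])], axis=1)
yt = np.column_stack([y1, y2])

Om = np.eye(2)                                 # first pass is just OLS
for _ in range(20):
    Oi = np.linalg.inv(Om)
    A = np.einsum('tpj,pq,tqk->jk', Xt, Oi, Xt)    # sum_t X_t*' Om^-1 X_t*
    b = np.einsum('tpj,pq,tq->j', Xt, Oi, yt)      # sum_t X_t*' Om^-1 y_t
    beta = np.linalg.solve(A, b)                   # (beta_21, beta_31, beta_22)
    resid = yt - np.einsum('tpj,j->tp', Xt, beta)
    Om_new = resid.T @ resid / T
    if np.allclose(Om_new, Om):                    # convergence criterion (24.97)
        break
    Om = Om_new
```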


24.5 Specification testing

In the context of the linear regression model the F-type test proved to be by far the most useful test in both specification as well as misspecification analysis; see Chapters 19–22. The question which naturally arises is whether the F-type test can be extended to the multivariate linear regression model. The main purpose of this section is to derive an extended F-type test which serves the same purpose as the F-test in Chapters 19–22.

From Section 24.2 we know that for the MLE's $\hat{B}$ and $\hat{\Omega}$:

(i) $\hat{B} \sim N(B, \Omega \otimes (X'X)^{-1})$; (24.99)

(ii) $T\hat{\Omega} \sim W_m(\Omega, T - k)$. (24.100)

Using these results we can deduce that, in the case where we consider one regression from the system, say the $i$th,

$$y_i = X\beta_i + u_i, \tag{24.101}$$

the MLE's of $\beta_i$ and $\omega_{ii}$ are

$$\hat{\beta}_i = (X'X)^{-1}X'y_i, \quad \hat{\omega}_{ii} = \frac{1}{T}\hat{u}_i'\hat{u}_i, \quad \hat{u}_i = y_i - X\hat{\beta}_i. \tag{24.102}$$

Moreover, using the properties of the multivariate normal (see Chapter 15) and Wishart (see Appendix 1) distributions we can deduce that

$$\hat{\beta}_i \sim N(\beta_i, \omega_{ii}(X'X)^{-1}) \quad \text{and} \quad \frac{T\hat{\omega}_{ii}}{\omega_{ii}} \sim \chi^2(T - k). \tag{24.103}$$

These results ensure that, in the case of linear restrictions related to $\beta_i$ of the form $H_0\colon R_i\beta_i = r_i$ against $H_1\colon R_i\beta_i \neq r_i$, where $R_i$ and $r_i$ are $p_i \times k$ and $p_i \times 1$ known matrices and $\operatorname{rank}(R_i) = p_i$, the F-test based on the test statistic

$$FT(y) = \frac{(R_i\hat{\beta}_i - r_i)'[R_i(X'X)^{-1}R_i']^{-1}(R_i\hat{\beta}_i - r_i)}{\hat{u}_i'\hat{u}_i}\left(\frac{T - k}{p_i}\right) \tag{24.104}$$

is applicable without any changes. In particular, tests of significance for individual coefficients based on the usual Student's t test statistic (24.105), a special case of (104), are applicable in the present context without any modifications; see Chapters 19 and 20.
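A brief sketch of the statistic (24.104), continuing the first simulated example (it reuses `X`, `B_hat`, `U_hat`); the hypothesis tested, $R_i = I_k$, $r_i = 0$, is an arbitrary choice for illustration:

```python
# F statistic (24.104) for H0: R_i beta_i = r_i in equation i.
import numpy as np
from scipy import stats  # X, B_hat, U_hat as in the earlier sketch

T, k = X.shape
i = 0                                    # test the first equation
Ri = np.eye(k)                           # R_i = I_k: all k coefficients
ri = np.zeros(k)                         # r_i = 0
p_i = Ri.shape[0]

beta_i = B_hat[:, i]
u_i = U_hat[:, i]
d = Ri @ beta_i - ri
FT = (d @ np.linalg.solve(Ri @ np.linalg.inv(X.T @ X) @ Ri.T, d)
      / (u_i @ u_i)) * (T - k) / p_i
p_value = stats.f.sf(FT, p_i, T - k)     # reject H0 for large FT / small p-value
```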


Let us now consider the derivation of a test for the null hypothesis

$$H_0\colon DB - C = 0 \quad \text{against} \quad H_1\colon DB - C \neq 0, \tag{24.106}$$

where $D$ and $C$ are $p \times k$ and $p \times m$ known matrices, $\operatorname{rank}(D) = p$. A particularly important special case of (106) is when

$$D = (0, I_{k_2})\colon k_2 \times k, \quad B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} \quad \text{and} \quad C = 0\colon k_2 \times m,$$

i.e. $H_0\colon B_2 = 0$ against $H_1\colon B_2 \neq 0$. The constrained MLE's of $B$ and $\Omega$ under $H_0$ take the form

$$\tilde{B} = \hat{B} - (X'X)^{-1}D'[D(X'X)^{-1}D']^{-1}(D\hat{B} - C) \tag{24.107}$$

and

$$\tilde{\Omega} = \hat{\Omega} + \frac{1}{T}(\tilde{B} - \hat{B})'(X'X)(\tilde{B} - \hat{B}), \tag{24.108}$$

where $\hat{B} = (X'X)^{-1}X'Y$, $\hat{\Omega} = (1/T)\hat{U}'\hat{U}$, $\hat{U} = Y - X\hat{B}$ are the unconstrained MLE's of $B$ and $\Omega$ (see Section 24.3 above). Using the same intuitive argument as in the $m = 1$ case (see Chapter 20) a test for $H_0$ could be based on the distance

$$\|D\hat{B} - C\|. \tag{24.109}$$

The closer this distance is to zero, the more the support for $H_0$. If we normalise this distance by defining the matrix quadratic form

$$\hat{\Omega}^{-1}(D\hat{B} - C)'[D(X'X)^{-1}D']^{-1}(D\hat{B} - C), \tag{24.110}$$

the similarity between (110) and the F-test statistic (104) is all too apparent. Moreover, in view of the equality

$$\tilde{U}'\tilde{U} = \hat{U}'\hat{U} + (D\hat{B} - C)'[D(X'X)^{-1}D']^{-1}(D\hat{B} - C) \tag{24.111}$$

stemming from (108), (110) can be written in the form

$$(\tilde{U}'\tilde{U} - \hat{U}'\hat{U})(\hat{U}'\hat{U})^{-1}, \tag{24.112}$$

where $\tilde{U} = Y - X\tilde{B}$. This form constitutes a direct extension of the F-test statistic to the general $m > 1$ case. Continuing the analogy, we can show that

$$\hat{U}'\hat{U} \sim W_m(\Omega, T - k), \quad T \geq m + k, \tag{24.113}$$

where $\hat{U}'\hat{U} = U'M_XU$, $M_X = I - X(X'X)^{-1}X'$. Moreover, in view of (112) the distribution of $\tilde{U}'\tilde{U} - \hat{U}'\hat{U}$ is a direct extension of the non-central chi-square distribution, the non-central Wishart, denoted by

$$(\tilde{U}'\tilde{U} - \hat{U}'\hat{U}) \sim W_m(\Omega, p; \Delta), \quad T \geq m + k. \tag{24.114}$$