Book: Econometric Analysis of Cross Section and Panel Data, by Wooldridge, Chapter 7

7 Estimating Systems of Equations by OLS and GLS
7.1 Introduction
This chapter begins our analysis of linear systems of equations. The first method of
estimation we cover is system ordinary least squares, which is a direct extension of
OLS for single equations. In some important special cases the system OLS estimator
turns out to have a straightforward interpretation in terms of single-equation OLS
estimators. But the method is applicable to very general linear systems of equations.
We then turn to a generalized least squares (GLS) analysis. Under certain assumptions, GLS (or its operationalized version, feasible GLS) will turn out to be asymptotically more efficient than system OLS. However, we emphasize in this chapter that the efficiency of GLS comes at a price: it requires stronger assumptions than system OLS in order to be consistent. This is a practically important point that is often overlooked in traditional treatments of linear systems, particularly those which assume that explanatory variables are nonrandom.
As with our single-equation analysis, we assume that a random sample is available
from the population. Usually the unit of observation is obvious—such as a worker, a
household, a firm, or a city. For example, if we collect consumption data on various
commodities for a sample of families, the unit of observation is the family (not a
commodity).
The framework of this chapter is general enough to apply to panel data models.
Because the asymptotic analysis is done as the cross section dimension tends to infinity, the results are explicitly for the case where the cross section dimension is large
relative to the time series dimension. (For example, we may have observations on N
firms over the same T time periods for each firm. Then, we assume we have a random
sample of firms that have data in each of the T years.) The panel data model covered
here, while having many useful applications, does not fully exploit the replicability
over time. In Chapters 10 and 11 we explicitly consider panel data models that contain time-invariant, unobserved effects in the error term.
7.2 Some Examples
We begin with two examples of systems of equations. These examples are fairly general, and we will see later that variants of them can also be cast as a general linear system of equations.
Example 7.1 (Seemingly Unrelated Regressions): The population model is a set of $G$ linear equations,

$$y_1 = x_1\beta_1 + u_1$$
$$y_2 = x_2\beta_2 + u_2$$
$$\vdots$$
$$y_G = x_G\beta_G + u_G \qquad (7.1)$$

where $x_g$ is $1 \times K_g$ and $\beta_g$ is $K_g \times 1$, $g = 1, 2, \ldots, G$. In many applications $x_g$ is the same for all $g$ (in which case the $\beta_g$ necessarily have the same dimension), but the general model allows the elements and the dimension of $x_g$ to vary across equations. Remember, the system (7.1) represents a generic person, firm, city, or whatever from the population. The system (7.1) is often called Zellner's (1962) seemingly unrelated regressions (SUR) model (for cross section data in this case). The name comes from the fact that, since each equation in the system (7.1) has its own vector $\beta_g$, it appears that the equations are unrelated. Nevertheless, correlation across the errors in different equations can provide links that can be exploited in estimation; we will see this point later.
As a specific example, the system (7.1) might represent a set of demand functions for the population of families in a country:

$$\mathit{housing} = \beta_{10} + \beta_{11}\mathit{houseprc} + \beta_{12}\mathit{foodprc} + \beta_{13}\mathit{clothprc} + \beta_{14}\mathit{income} + \beta_{15}\mathit{size} + \beta_{16}\mathit{age} + u_1$$
$$\mathit{food} = \beta_{20} + \beta_{21}\mathit{houseprc} + \beta_{22}\mathit{foodprc} + \beta_{23}\mathit{clothprc} + \beta_{24}\mathit{income} + \beta_{25}\mathit{size} + \beta_{26}\mathit{age} + u_2$$
$$\mathit{clothing} = \beta_{30} + \beta_{31}\mathit{houseprc} + \beta_{32}\mathit{foodprc} + \beta_{33}\mathit{clothprc} + \beta_{34}\mathit{income} + \beta_{35}\mathit{size} + \beta_{36}\mathit{age} + u_3$$

In this example, $G = 3$ and $x_g$ (a $1 \times 7$ vector) is the same for $g = 1, 2, 3$.
When we need to write the equations for a particular random draw from the population, $y_g$, $x_g$, and $u_g$ will also contain an $i$ subscript: equation $g$ becomes $y_{ig} = x_{ig}\beta_g + u_{ig}$. For the purposes of stating assumptions, it does not matter whether or not we include the $i$ subscript. The system (7.1) has the advantage of being less cluttered while focusing attention on the population, as is appropriate for applications. But for derivations we will often need to indicate the equation for a generic cross section unit $i$.
When we study the asymptotic properties of various estimators of the $\beta_g$, the asymptotics is done with $G$ fixed and $N$ tending to infinity. In the household demand example, we are interested in a set of three demand functions, and the unit of observation is the family. Therefore, inference is done as the number of families in the sample tends to infinity.
The assumptions that we make about how the unobservables $u_g$ are related to the explanatory variables $(x_1, x_2, \ldots, x_G)$ are crucial for determining which estimators of the $\beta_g$ have acceptable properties. Often, when system (7.1) represents a structural model (without omitted variables, errors-in-variables, or simultaneity), we can assume that

$$E(u_g \mid x_1, x_2, \ldots, x_G) = 0, \qquad g = 1, \ldots, G \qquad (7.2)$$

One important implication of assumption (7.2) is that $u_g$ is uncorrelated with the explanatory variables in all equations, as well as with all functions of these explanatory variables. When system (7.1) is a system of equations derived from economic theory, assumption (7.2) is often very natural. For example, in the set of demand functions that we have presented, $x_g \equiv x$ is the same for all $g$, and so assumption (7.2) is the same as $E(u_g \mid x_g) = E(u_g \mid x) = 0$.

If assumption (7.2) is maintained, and if the $x_g$ are not the same across $g$, then any explanatory variables excluded from equation $g$ are assumed to have no effect on expected $y_g$ once $x_g$ has been controlled for. That is,

$$E(y_g \mid x_1, x_2, \ldots, x_G) = E(y_g \mid x_g) = x_g\beta_g, \qquad g = 1, 2, \ldots, G \qquad (7.3)$$

There are examples of SUR systems where assumption (7.3) is too strong, but standard SUR analysis either explicitly or implicitly makes this assumption.

Our next example involves panel data.
Example 7.2 (Panel Data Model): Suppose that for each cross section unit we observe data on the same set of variables for $T$ time periods. Let $x_t$ be a $1 \times K$ vector for $t = 1, 2, \ldots, T$, and let $\beta$ be a $K \times 1$ vector. The model in the population is

$$y_t = x_t\beta + u_t, \qquad t = 1, 2, \ldots, T \qquad (7.4)$$

where $y_t$ is a scalar. For example, a simple equation to explain annual family saving over a five-year span is

$$\mathit{sav}_t = \beta_0 + \beta_1\mathit{inc}_t + \beta_2\mathit{age}_t + \beta_3\mathit{educ}_t + u_t, \qquad t = 1, 2, \ldots, 5$$

where $\mathit{inc}_t$ is annual income, $\mathit{educ}_t$ is years of education of the household head, and $\mathit{age}_t$ is age of the household head. This is an example of a linear panel data model. It is a static model because all explanatory variables are dated contemporaneously with $\mathit{sav}_t$.
The panel data setup is conceptually very different from the SUR example. In Example 7.1, each equation explains a different dependent variable for the same cross section unit. Here we only have one dependent variable that we are trying to explain, sav, but we observe sav, and the explanatory variables, over a five-year period. (Therefore, the label "system of equations" is really a misnomer for panel data applications. At this point, we are using the phrase to denote more than one equation in any context.) As we will see in the next section, the statistical properties of estimators in SUR and panel data models can be analyzed within the same structure.

When we need to indicate that an equation is for a particular cross section unit $i$ during a particular time period $t$, we write $y_{it} = x_{it}\beta + u_{it}$. We will omit the $i$ subscript whenever its omission does not cause confusion.
What kinds of exogeneity assumptions do we use for panel data analysis? One possibility is to assume that $u_t$ and $x_t$ are orthogonal in the conditional mean sense:

$$E(u_t \mid x_t) = 0, \qquad t = 1, \ldots, T \qquad (7.5)$$

We call this contemporaneous exogeneity of $x_t$ because it only restricts the relationship between the disturbance and the explanatory variables in the same time period. It is very important to distinguish assumption (7.5) from the stronger assumption

$$E(u_t \mid x_1, x_2, \ldots, x_T) = 0, \qquad t = 1, \ldots, T \qquad (7.6)$$

which, combined with model (7.4), is identical to $E(y_t \mid x_1, x_2, \ldots, x_T) = E(y_t \mid x_t)$. Assumption (7.5) places no restrictions on the relationship between $x_s$ and $u_t$ for $s \neq t$, while assumption (7.6) implies that each $u_t$ is uncorrelated with the explanatory variables in all time periods. When assumption (7.6) holds, we say that the explanatory variables $\{x_1, x_2, \ldots, x_t, \ldots, x_T\}$ are strictly exogenous.

To illustrate the difference between assumptions (7.5) and (7.6), let $x_t \equiv (1, y_{t-1})$. Then assumption (7.5) holds if $E(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_0) = \beta_0 + \beta_1 y_{t-1}$, which imposes first-order dynamics in the conditional mean. However, assumption (7.6) must fail, since $x_{t+1} = (1, y_t)$, and therefore $E(u_t \mid x_1, x_2, \ldots, x_T) = E(u_t \mid y_0, y_1, \ldots, y_{T-1}) = u_t$ for $t = 1, 2, \ldots, T-1$ (because $u_t = y_t - \beta_0 - \beta_1 y_{t-1}$).
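The failure of strict exogeneity in the lagged-dependent-variable example can be seen in a short simulation. This sketch uses assumed parameter values ($\beta_0 = 1$, $\beta_1 = 0.5$) that are not from the text; it checks that $u_t$ is essentially uncorrelated with the contemporaneous regressor $y_{t-1}$ but clearly correlated with the next period's regressor $y_t$:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000                 # many independent draws; two periods suffice
b0, b1 = 1.0, 0.5           # assumed (illustrative) parameter values

y0 = rng.normal(size=N)     # initial condition
u1 = rng.normal(size=N)     # u_1, drawn independently of y0
y1 = b0 + b1 * y0 + u1      # y_1 = b0 + b1*y_0 + u_1

# Contemporaneous exogeneity: u_1 is uncorrelated with x_1 = (1, y_0)
print(np.corrcoef(u1, y0)[0, 1])   # near zero

# Strict exogeneity fails: u_1 is correlated with x_2 = (1, y_1)
print(np.corrcoef(u1, y1)[0, 1])   # clearly positive (about 0.9 here)
```

The second correlation is large by construction, since $y_1$ contains $u_1$ directly.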
Assumption (7.6) can fail even if $x_t$ does not contain a lagged dependent variable. Consider a model relating poverty rates to welfare spending per capita, at the city level. A finite distributed lag (FDL) model is

$$\mathit{poverty}_t = \theta_t + \delta_0\mathit{welfare}_t + \delta_1\mathit{welfare}_{t-1} + \delta_2\mathit{welfare}_{t-2} + u_t \qquad (7.7)$$

where we assume a two-year effect. The parameter $\theta_t$ simply denotes a different aggregate time effect in each year. It is reasonable to think that welfare spending reacts to lagged poverty rates. An equation that captures this feedback is

$$\mathit{welfare}_t = \eta_t + \rho_1\mathit{poverty}_{t-1} + r_t \qquad (7.8)$$

Even if equation (7.7) contains enough lags of welfare spending, assumption (7.6) would be violated if $\rho_1 \neq 0$ in equation (7.8), because $\mathit{welfare}_{t+1}$ depends on $u_t$ and $x_{t+1}$ includes $\mathit{welfare}_{t+1}$.

How we go about consistently estimating $\beta$ depends crucially on whether we maintain assumption (7.5) or the stronger assumption (7.6). Assuming that the $x_{it}$ are fixed in repeated samples is effectively the same as making assumption (7.6).
7.3 System OLS Estimation of a Multivariate Linear System
7.3.1 Preliminaries
We now analyze a general multivariate model that contains the examples in Section 7.2, and many others, as special cases. Assume that we have independent, identically distributed cross section observations $\{(X_i, y_i)\colon i = 1, 2, \ldots, N\}$, where $X_i$ is a $G \times K$ matrix and $y_i$ is a $G \times 1$ vector. Thus, $y_i$ contains the dependent variables for all $G$ equations (or time periods, in the panel data case). The matrix $X_i$ contains the explanatory variables appearing anywhere in the system. For notational clarity we include the $i$ subscript for stating the general model and the assumptions.

The multivariate linear model for a random draw from the population can be expressed as

$$y_i = X_i\beta + u_i \qquad (7.9)$$

where $\beta$ is the $K \times 1$ parameter vector of interest and $u_i$ is a $G \times 1$ vector of unobservables. Equation (7.9) explains the $G$ variables $y_{i1}, \ldots, y_{iG}$ in terms of $X_i$ and the unobservables $u_i$. Because of the random sampling assumption, we can state all assumptions in terms of a generic observation; in examples, we will often omit the $i$ subscript.

Before stating any assumptions, we show how the two examples introduced in Section 7.2 fit into this framework.
Example 7.1 (SUR, continued): The SUR model (7.1) can be expressed as in equation (7.9) by defining $y_i = (y_{i1}, y_{i2}, \ldots, y_{iG})'$, $u_i = (u_{i1}, u_{i2}, \ldots, u_{iG})'$, and

$$X_i = \begin{pmatrix} x_{i1} & 0 & \cdots & 0 \\ 0 & x_{i2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{iG} \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_G \end{pmatrix} \qquad (7.10)$$

Note that the dimension of $X_i$ is $G \times (K_1 + K_2 + \cdots + K_G)$, so we define $K \equiv K_1 + \cdots + K_G$.
Example 7.2 (Panel Data, continued): The panel data model (7.4) can be expressed as in equation (7.9) by choosing $X_i$ to be the $T \times K$ matrix $X_i = (x_{i1}', x_{i2}', \ldots, x_{iT}')'$.
7.3.2 Asymptotic Properties of System OLS
Given the model in equation (7.9), we can state the key orthogonality condition for consistent estimation of $\beta$ by system ordinary least squares (SOLS).

assumption SOLS.1: $E(X_i'u_i) = 0$.

Assumption SOLS.1 appears similar to the orthogonality condition for OLS analysis of single equations. What it implies differs across examples because of the multiple-equation nature of equation (7.9). For most applications, $X_i$ has a sufficient number of elements equal to unity so that Assumption SOLS.1 implies that $E(u_i) = 0$, and we assume zero mean for the sake of discussion.

It is informative to see what Assumption SOLS.1 entails in the previous examples.

Example 7.1 (SUR, continued): In the SUR case, $X_i'u_i = (x_{i1}u_{i1}, \ldots, x_{iG}u_{iG})'$, and so Assumption SOLS.1 holds if and only if

$$E(x_{ig}'u_{ig}) = 0, \qquad g = 1, 2, \ldots, G \qquad (7.11)$$

Thus, Assumption SOLS.1 does not require $x_{ih}$ and $u_{ig}$ to be uncorrelated when $h \neq g$.
Example 7.2 (Panel Data, continued): For the panel data setup, $X_i'u_i = \sum_{t=1}^{T} x_{it}'u_{it}$; therefore, a sufficient, and very natural, condition for Assumption SOLS.1 is

$$E(x_{it}'u_{it}) = 0, \qquad t = 1, 2, \ldots, T \qquad (7.12)$$

Like assumption (7.5), assumption (7.12) allows $x_{is}$ and $u_{it}$ to be correlated when $s \neq t$; in fact, assumption (7.12) is weaker than assumption (7.5). Therefore, Assumption SOLS.1 does not impose strict exogeneity in panel data contexts.

Assumption SOLS.1 is the weakest assumption we can impose in a regression framework to get consistent estimators of $\beta$. As the previous examples show, Assumption SOLS.1 allows some elements of $X_i$ to be correlated with elements of $u_i$. Much stronger is the zero conditional mean assumption

$$E(u_i \mid X_i) = 0 \qquad (7.13)$$
which implies, among other things, that every element of $X_i$ and every element of $u_i$ are uncorrelated. [Of course, assumption (7.13) is not as strong as assuming that $u_i$ and $X_i$ are actually independent.] Even though assumption (7.13) is stronger than Assumption SOLS.1, it is, nevertheless, reasonable in some applications.

Under Assumption SOLS.1 the vector $\beta$ satisfies

$$E[X_i'(y_i - X_i\beta)] = 0 \qquad (7.14)$$

or $E(X_i'X_i)\beta = E(X_i'y_i)$. For each $i$, $X_i'y_i$ is a $K \times 1$ random vector and $X_i'X_i$ is a $K \times K$ symmetric, positive semidefinite random matrix. Therefore, $E(X_i'X_i)$ is always a $K \times K$ symmetric, positive semidefinite nonrandom matrix (the expectation here is defined over the population distribution of $X_i$). To be able to estimate $\beta$ we need to assume that it is the only $K \times 1$ vector that satisfies equation (7.14).

assumption SOLS.2: $A \equiv E(X_i'X_i)$ is nonsingular (has rank $K$).
Under Assumptions SOLS.1 and SOLS.2 we can write $\beta$ as

$$\beta = [E(X_i'X_i)]^{-1}E(X_i'y_i) \qquad (7.15)$$

which shows that Assumptions SOLS.1 and SOLS.2 identify the vector $\beta$. The analogy principle suggests that we estimate $\beta$ by the sample analogue of equation (7.15). Define the system ordinary least squares (SOLS) estimator of $\beta$ as

$$\hat{\beta} = \left(N^{-1}\sum_{i=1}^{N} X_i'X_i\right)^{-1}\left(N^{-1}\sum_{i=1}^{N} X_i'y_i\right) \qquad (7.16)$$

For computing $\hat{\beta}$ using matrix language programming, it is sometimes useful to write $\hat{\beta} = (X'X)^{-1}X'Y$, where $X \equiv (X_1', X_2', \ldots, X_N')'$ is the $NG \times K$ matrix of stacked $X_i$ and $Y \equiv (y_1', y_2', \ldots, y_N')'$ is the $NG \times 1$ vector of stacked observations on the $y_i$. For asymptotic derivations, equation (7.16) is much more convenient. In fact, the consistency of $\hat{\beta}$ can be read off of equation (7.16) by taking probability limits. We summarize with a theorem:

theorem 7.1 (Consistency of System OLS): Under Assumptions SOLS.1 and SOLS.2, $\hat{\beta} \xrightarrow{p} \beta$.
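Both forms of the estimator, equation (7.16) and $\hat{\beta} = (X'X)^{-1}X'Y$, take only a few lines of matrix code. The following sketch simulates a system satisfying Assumptions SOLS.1 and SOLS.2; all dimensions and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N, G, K = 500, 3, 4
beta = np.array([1.0, -0.5, 2.0, 0.3])      # illustrative true parameters

# X has shape (N, G, K): one G x K matrix X_i per observation
X = rng.normal(size=(N, G, K))
u = rng.normal(size=(N, G))
Y = np.einsum('ngk,k->ng', X, beta) + u     # y_i = X_i beta + u_i

# System OLS, equation (7.16): solve (sum_i X_i'X_i) b = sum_i X_i'y_i
XtX = np.einsum('ngk,ngl->kl', X, X)
XtY = np.einsum('ngk,ng->k', X, Y)
beta_hat = np.linalg.solve(XtX, XtY)
print(beta_hat)                             # close to beta
```

Reshaping `X` to an $NG \times K$ stacked matrix and calling a least squares routine on it gives the same answer, which is the $(X'X)^{-1}X'Y$ form.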
It is useful to see what the system OLS estimator looks like for the SUR and panel
data examples.
Example 7.1 (SUR, continued): For the SUR model,

$$\sum_{i=1}^{N} X_i'X_i = \sum_{i=1}^{N}\begin{pmatrix} x_{i1}'x_{i1} & 0 & \cdots & 0 \\ 0 & x_{i2}'x_{i2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{iG}'x_{iG} \end{pmatrix}, \qquad \sum_{i=1}^{N} X_i'y_i = \sum_{i=1}^{N}\begin{pmatrix} x_{i1}'y_{i1} \\ x_{i2}'y_{i2} \\ \vdots \\ x_{iG}'y_{iG} \end{pmatrix}$$

Straightforward inversion of a block diagonal matrix shows that the OLS estimator from equation (7.16) can be written as $\hat{\beta} = (\hat{\beta}_1', \hat{\beta}_2', \ldots, \hat{\beta}_G')'$, where each $\hat{\beta}_g$ is just the single-equation OLS estimator from the $g$th equation. In other words, system OLS estimation of a SUR model (without restrictions on the parameter vectors $\beta_g$) is equivalent to OLS equation by equation. Assumption SOLS.2 is easily seen to hold if $E(x_{ig}'x_{ig})$ is nonsingular for all $g$.
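The equivalence between unrestricted system OLS on a SUR system and OLS equation by equation can be verified numerically. A sketch with simulated data (dimensions and parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, G, Kg = 400, 2, 3
betas = [np.array([1.0, 2.0, -1.0]), np.array([0.5, -0.5, 1.5])]

x = [rng.normal(size=(N, Kg)) for _ in range(G)]        # regressors per equation
y = [x[g] @ betas[g] + rng.normal(size=N) for g in range(G)]

# Per-equation OLS
b_single = [np.linalg.lstsq(x[g], y[g], rcond=None)[0] for g in range(G)]

# System OLS on the stacked block-diagonal system of equation (7.10)
X = np.zeros((N * G, G * Kg))
Y = np.zeros(N * G)
for i in range(N):
    for g in range(G):
        X[i * G + g, g * Kg:(g + 1) * Kg] = x[g][i]
        Y[i * G + g] = y[g][i]
b_system = np.linalg.lstsq(X, Y, rcond=None)[0]

print(np.allclose(b_system, np.concatenate(b_single)))  # True
```

The two sets of estimates agree up to numerical precision, as the block-diagonal algebra predicts.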
Example 7.2 (Panel Data, continued): In the panel data case,

$$\sum_{i=1}^{N} X_i'X_i = \sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}'x_{it}, \qquad \sum_{i=1}^{N} X_i'y_i = \sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}'y_{it}$$

Therefore, we can write $\hat{\beta}$ as

$$\hat{\beta} = \left(\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}'x_{it}\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}'y_{it}\right) \qquad (7.17)$$

This estimator is called the pooled ordinary least squares (POLS) estimator because it corresponds to running OLS on the observations pooled across $i$ and $t$. We mentioned this estimator in the context of independent cross sections in Section 6.3. The estimator in equation (7.17) is for the same cross section units sampled at different points in time. Theorem 7.1 shows that the POLS estimator is consistent under the orthogonality conditions in assumption (7.12) and the mild condition $\mathrm{rank}\, E\left(\sum_{t=1}^{T} x_{it}'x_{it}\right) = K$.
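Pooled OLS in equation (7.17) amounts to flattening the $(i, t)$ dimensions and running a single OLS regression. A brief sketch under simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 300, 5, 2
beta = np.array([0.8, -1.2])                 # illustrative true parameters

x = rng.normal(size=(N, T, K))               # x_it for each unit and period
y = x @ beta + rng.normal(size=(N, T))       # y_it = x_it beta + u_it

# Pooled OLS, equation (7.17): pool across i and t, then run OLS
X_pooled = x.reshape(N * T, K)
y_pooled = y.reshape(N * T)
beta_pols = np.linalg.solve(X_pooled.T @ X_pooled, X_pooled.T @ y_pooled)
print(beta_pols)                             # close to beta
```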
In the general system (7.9), the system OLS estimator does not necessarily have an interpretation as OLS equation by equation or as pooled OLS. As we will see in Section 7.7 for the SUR setup, sometimes we want to impose cross equation restrictions on the $\beta_g$, in which case the system OLS estimator has no simple interpretation.

While OLS is consistent under Assumptions SOLS.1 and SOLS.2, it is not necessarily unbiased. Assumption (7.13), and the finite sample assumption $\mathrm{rank}(X'X) = K$, do ensure unbiasedness of OLS conditional on $X$. [This conclusion follows because, under independent sampling, $E(u_i \mid X_1, X_2, \ldots, X_N) = E(u_i \mid X_i) = 0$ under assumption (7.13).] We focus on the weaker Assumption SOLS.1 because assumption (7.13) is often violated in economic applications, something we will see especially in our panel data analysis.
For inference, we need to find the asymptotic variance of the OLS estimator under essentially the same two assumptions; technically, the following derivation requires the elements of $X_i'u_iu_i'X_i$ to have finite expected absolute value. From equations (7.16) and (7.9) write

$$\sqrt{N}(\hat{\beta} - \beta) = \left(N^{-1}\sum_{i=1}^{N} X_i'X_i\right)^{-1}\left(N^{-1/2}\sum_{i=1}^{N} X_i'u_i\right)$$

Because $E(X_i'u_i) = 0$ under Assumption SOLS.1, the CLT implies that

$$N^{-1/2}\sum_{i=1}^{N} X_i'u_i \xrightarrow{d} \mathrm{Normal}(0, B) \qquad (7.18)$$

where

$$B \equiv E(X_i'u_iu_i'X_i) = \mathrm{Var}(X_i'u_i) \qquad (7.19)$$
In particular, $N^{-1/2}\sum_{i=1}^{N} X_i'u_i = O_p(1)$. But $(X'X/N)^{-1} = A^{-1} + o_p(1)$, so

$$\sqrt{N}(\hat{\beta} - \beta) = A^{-1}\left(N^{-1/2}\sum_{i=1}^{N} X_i'u_i\right) + \left[(X'X/N)^{-1} - A^{-1}\right]\left(N^{-1/2}\sum_{i=1}^{N} X_i'u_i\right)$$
$$= A^{-1}\left(N^{-1/2}\sum_{i=1}^{N} X_i'u_i\right) + o_p(1)\cdot O_p(1) = A^{-1}\left(N^{-1/2}\sum_{i=1}^{N} X_i'u_i\right) + o_p(1) \qquad (7.20)$$

Therefore, just as with single-equation OLS and 2SLS, we have obtained an asymptotic representation for $\sqrt{N}(\hat{\beta} - \beta)$ that is a nonrandom linear combination of a partial sum that satisfies the CLT. Equations (7.18) and (7.20) and the asymptotic equivalence lemma imply

$$\sqrt{N}(\hat{\beta} - \beta) \xrightarrow{d} \mathrm{Normal}(0, A^{-1}BA^{-1}) \qquad (7.21)$$

We summarize with a theorem.

theorem 7.2 (Asymptotic Normality of SOLS): Under Assumptions SOLS.1 and SOLS.2, equation (7.21) holds.
The asymptotic variance of $\hat{\beta}$ is

$$\mathrm{Avar}(\hat{\beta}) = A^{-1}BA^{-1}/N \qquad (7.22)$$

so that $\mathrm{Avar}(\hat{\beta})$ shrinks to zero at the rate $1/N$, as expected. Consistent estimation of $A$ is simple:

$$\hat{A} \equiv X'X/N = N^{-1}\sum_{i=1}^{N} X_i'X_i \qquad (7.23)$$

A consistent estimator of $B$ can be found using the analogy principle. First, because $B = E(X_i'u_iu_i'X_i)$, $N^{-1}\sum_{i=1}^{N} X_i'u_iu_i'X_i \xrightarrow{p} B$. Since the $u_i$ are not observed, we replace them with the SOLS residuals:

$$\hat{u}_i \equiv y_i - X_i\hat{\beta} = u_i - X_i(\hat{\beta} - \beta) \qquad (7.24)$$

Using matrix algebra and the law of large numbers, it can be shown that

$$\hat{B} \equiv N^{-1}\sum_{i=1}^{N} X_i'\hat{u}_i\hat{u}_i'X_i \xrightarrow{p} B \qquad (7.25)$$

[To establish equation (7.25), we need to assume that certain moments involving $X_i$ and $u_i$ are finite.] Therefore, $\mathrm{Avar}\,\sqrt{N}(\hat{\beta} - \beta)$ is consistently estimated by $\hat{A}^{-1}\hat{B}\hat{A}^{-1}$, and $\mathrm{Avar}(\hat{\beta})$ is estimated as

$$\hat{V} \equiv \left(\sum_{i=1}^{N} X_i'X_i\right)^{-1}\left(\sum_{i=1}^{N} X_i'\hat{u}_i\hat{u}_i'X_i\right)\left(\sum_{i=1}^{N} X_i'X_i\right)^{-1} \qquad (7.26)$$
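The sandwich estimator in equation (7.26) is straightforward to compute from the stacked data. The sketch below simulates a system with conditional heteroskedasticity (an illustrative design, not from the text) so that robust standard errors are genuinely needed:

```python
import numpy as np

rng = np.random.default_rng(4)
N, G, K = 1000, 3, 2
beta = np.array([1.0, -1.0])                 # illustrative true parameters

X = rng.normal(size=(N, G, K))
# Error variance depends on X, so Var(u_i | X_i) is not constant
u = rng.normal(size=(N, G)) * (1.0 + 0.5 * np.abs(X[:, :, 0]))
Y = np.einsum('ngk,k->ng', X, beta) + u

XtX = np.einsum('ngk,ngl->kl', X, X)
beta_hat = np.linalg.solve(XtX, np.einsum('ngk,ng->k', X, Y))

# Residuals and the sandwich estimator of Avar(beta_hat), equation (7.26)
uhat = Y - np.einsum('ngk,k->ng', X, beta_hat)
scores = np.einsum('ngk,ng->nk', X, uhat)    # X_i' uhat_i, one K-vector per i
B_hat = scores.T @ scores                    # sum_i X_i' uhat_i uhat_i' X_i
A_inv = np.linalg.inv(XtX)
V_hat = A_inv @ B_hat @ A_inv                # robust variance matrix estimate
se = np.sqrt(np.diag(V_hat))                 # asymptotic standard errors
print(se)
```

Note that the middle factor uses the residual outer products observation by observation, which is what makes the estimator robust to the arbitrary conditional variance built into the simulation.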
Under Assumptions SOLS.1 and SOLS.2, we perform inference on $\beta$ as if $\hat{\beta}$ is normally distributed with mean $\beta$ and variance matrix (7.26). The square roots of the diagonal elements of the matrix (7.26) are reported as the asymptotic standard errors. The t ratio, $\hat{\beta}_j/\mathrm{se}(\hat{\beta}_j)$, has a limiting normal distribution under the null hypothesis $H_0\colon \beta_j = 0$. Sometimes the t statistics are treated as being distributed as $t_{NG-K}$, which is asymptotically valid because $NG - K$ should be large.

The estimator in matrix (7.26) is another example of a robust variance matrix estimator because it is valid without any second-moment assumptions on the errors $u_i$ (except, as usual, that the second moments are well defined). In a multivariate setting it is important to know what this robustness allows. First, the $G \times G$ unconditional variance matrix, $\Omega \equiv E(u_iu_i')$, is entirely unrestricted. This fact allows cross equation correlation in an SUR system as well as different error variances in each equation. In panel data models, an unrestricted $\Omega$ allows for arbitrary serial correlation and time-varying variances in the disturbances. A second kind of robustness is that the conditional variance matrix, $\mathrm{Var}(u_i \mid X_i)$, can depend on $X_i$ in an arbitrary, unknown fashion. The generality afforded by formula (7.26) is possible because of the $N \to \infty$ asymptotics.

In special cases it is useful to impose more structure on the conditional and unconditional variance matrix of $u_i$ in order to simplify estimation of the asymptotic variance. We will cover an important case in Section 7.5.2. Essentially, the key restriction will be that the conditional and unconditional variances of $u_i$ are the same. There are also some special assumptions that greatly simplify the analysis of the pooled OLS estimator for panel data; see Section 7.8.
7.3.3 Testing Multiple Hypotheses

Testing multiple hypotheses in a very robust manner is easy once $\hat{V}$ in matrix (7.26) has been obtained. The robust Wald statistic for testing $H_0\colon R\beta = r$, where $R$ is $Q \times K$ with rank $Q$ and $r$ is $Q \times 1$, has its usual form, $W = (R\hat{\beta} - r)'(R\hat{V}R')^{-1}(R\hat{\beta} - r)$. Under $H_0$, $W \xrightarrow{a} \chi^2_Q$. In the SUR case this is the easiest and most robust way of testing cross equation restrictions on the parameters in different equations using system OLS. In the panel data setting, the robust Wald test provides a way of testing multiple hypotheses about $\beta$ without assuming homoskedasticity or serial independence of the errors.
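A sketch of the robust Wald statistic for $Q = 2$ restrictions, one of them a cross equation style equality (all data simulated under the null with illustrative values; compare $W$ to a $\chi^2_2$ critical value such as 5.99 at the 5 percent level):

```python
import numpy as np

rng = np.random.default_rng(5)
N, G, K = 800, 2, 3
beta = np.array([1.0, 1.0, 0.0])        # true beta satisfies H0 below

X = rng.normal(size=(N, G, K))
Y = np.einsum('ngk,k->ng', X, beta) + rng.normal(size=(N, G))

# System OLS and the robust variance matrix of equation (7.26)
XtX = np.einsum('ngk,ngl->kl', X, X)
beta_hat = np.linalg.solve(XtX, np.einsum('ngk,ng->k', X, Y))
uhat = Y - np.einsum('ngk,k->ng', X, beta_hat)
scores = np.einsum('ngk,ng->nk', X, uhat)
A_inv = np.linalg.inv(XtX)
V_hat = A_inv @ (scores.T @ scores) @ A_inv

# H0: beta_1 - beta_2 = 0 and beta_3 = 0  (Q = 2 restrictions)
R = np.array([[1.0, -1.0, 0.0],
              [0.0,  0.0, 1.0]])
r = np.zeros(2)
diff = R @ beta_hat - r
W = diff @ np.linalg.solve(R @ V_hat @ R.T, diff)   # robust Wald statistic
print(W)    # approximately chi-square with 2 df under H0
```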
7.4 Consistency and Asymptotic Normality of Generalized Least Squares

7.4.1 Consistency

System OLS is consistent under fairly weak assumptions, and we have seen how to perform robust inference using OLS. If we strengthen Assumption SOLS.1 and add assumptions on the conditional variance matrix of $u_i$, we can do better using a generalized least squares procedure. As we will see, GLS is not usually feasible because it requires knowing the variance matrix of the errors up to a multiplicative constant. Nevertheless, deriving the consistency and asymptotic distribution of the GLS estimator is worthwhile because it turns out that the feasible GLS estimator is asymptotically equivalent to GLS.

We start with the model (7.9), but consistency of GLS generally requires a stronger assumption than Assumption SOLS.1. We replace Assumption SOLS.1 with the assumption that each element of $u_i$ is uncorrelated with each element of $X_i$. We can state this succinctly using the Kronecker product:
assumption SGLS.1: $E(X_i \otimes u_i) = 0$.

Typically, at least one element of $X_i$ is unity, so in practice Assumption SGLS.1 implies that $E(u_i) = 0$. We will assume $u_i$ has a zero mean for our discussion but not in proving any results.

Assumption SGLS.1 plays a crucial role in establishing consistency of the GLS estimator, so it is important to recognize that it puts more restrictions on the explanatory variables than does Assumption SOLS.1. In other words, when we allow the explanatory variables to be random, GLS requires a stronger assumption than system OLS in order to be consistent. Sufficient for Assumption SGLS.1, but not necessary, is the zero conditional mean assumption (7.13). This conclusion follows from a standard iterated expectations argument.

For GLS estimation of multivariate equations with i.i.d. observations, the second-moment matrix of $u_i$ plays a key role. Define the $G \times G$ symmetric, positive semidefinite matrix

$$\Omega \equiv E(u_iu_i') \qquad (7.27)$$

As mentioned in Section 7.3.2, we call $\Omega$ the unconditional variance matrix of $u_i$. [In the rare case that $E(u_i) \neq 0$, $\Omega$ is not the variance matrix of $u_i$, but it is always the appropriate matrix for GLS estimation.] It is important to remember that expression (7.27) is definitional: because we are using random sampling, the unconditional variance matrix is necessarily the same for all $i$.

In place of Assumption SOLS.2, we assume that a weighted version of the expected outer product of $X_i$ is nonsingular.

assumption SGLS.2: $\Omega$ is positive definite and $E(X_i'\Omega^{-1}X_i)$ is nonsingular.

For the general treatment we assume that $\Omega$ is positive definite, rather than just positive semidefinite. In applications where the dependent variables across equations satisfy an adding up constraint (such as expenditure shares summing to unity), an equation must be dropped to ensure that $\Omega$ is nonsingular, a topic we return to in Section 7.7.3. As a practical matter, Assumption SGLS.2 is not very restrictive. The assumption that the $K \times K$ matrix $E(X_i'\Omega^{-1}X_i)$ has rank $K$ is the analogue of Assumption SOLS.2.
The usual motivation for the GLS estimator is to transform a system of equations where the error vector has a nonscalar variance-covariance matrix into a system where the error vector has a scalar variance-covariance matrix. We obtain this by multiplying equation (7.9) by $\Omega^{-1/2}$:

$$\Omega^{-1/2}y_i = (\Omega^{-1/2}X_i)\beta + \Omega^{-1/2}u_i, \quad \text{or} \quad y_i^* = X_i^*\beta + u_i^* \qquad (7.28)$$

Simple algebra shows that $E(u_i^*u_i^{*\prime}) = I_G$.
Now we estimate equation (7.28) by system OLS. (As yet, we have no real justification for this step, but we know SOLS is consistent under some assumptions.) Call this estimator $\beta^*$. Then

$$\beta^* \equiv \left(\sum_{i=1}^{N} X_i^{*\prime}X_i^*\right)^{-1}\left(\sum_{i=1}^{N} X_i^{*\prime}y_i^*\right) = \left(\sum_{i=1}^{N} X_i'\Omega^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{N} X_i'\Omega^{-1}y_i\right) \qquad (7.29)$$

This is the generalized least squares (GLS) estimator of $\beta$. Under Assumption SGLS.2, $\beta^*$ exists with probability approaching one as $N \to \infty$.

We can write $\beta^*$ using full matrix notation as $\beta^* = [X'(I_N \otimes \Omega^{-1})X]^{-1}[X'(I_N \otimes \Omega^{-1})Y]$, where $X$ and $Y$ are the data matrices defined in Section 7.3.2 and $I_N$ is the $N \times N$ identity matrix. But for establishing the asymptotic properties of $\beta^*$, it is most convenient to work with equation (7.29).
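When $\Omega$ is known, equation (7.29) can be computed directly by weighting each observation's cross products by $\Omega^{-1}$. A sketch with an assumed $\Omega$ (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
N, G, K = 2000, 2, 2
beta = np.array([1.0, -2.0])                 # illustrative true parameters
Omega = np.array([[1.0, 0.6],
                  [0.6, 2.0]])               # assumed known error variance matrix

L = np.linalg.cholesky(Omega)
X = rng.normal(size=(N, G, K))
u = rng.normal(size=(N, G)) @ L.T            # Var(u_i) = L L' = Omega
Y = np.einsum('ngk,k->ng', X, beta) + u

# GLS, equation (7.29): sum_i X_i' Oinv X_i and sum_i X_i' Oinv y_i
Oinv = np.linalg.inv(Omega)
A_hat = np.einsum('ngk,gh,nhl->kl', X, Oinv, X)
b_vec = np.einsum('ngk,gh,nh->k', X, Oinv, Y)
beta_gls = np.linalg.solve(A_hat, b_vec)
print(beta_gls)                              # close to beta
```

Equivalently, one could premultiply each $(y_i, X_i)$ by $\Omega^{-1/2}$ as in equation (7.28) and run system OLS on the transformed data; the two computations give the same estimate.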
We can establish consistency of $\beta^*$ under Assumptions SGLS.1 and SGLS.2 by writing

$$\beta^* = \beta + \left(N^{-1}\sum_{i=1}^{N} X_i'\Omega^{-1}X_i\right)^{-1}\left(N^{-1}\sum_{i=1}^{N} X_i'\Omega^{-1}u_i\right) \qquad (7.30)$$

By the weak law of large numbers (WLLN), $N^{-1}\sum_{i=1}^{N} X_i'\Omega^{-1}X_i \xrightarrow{p} E(X_i'\Omega^{-1}X_i)$. By Assumption SGLS.2 and Slutsky's theorem (Lemma 3.4), $\left(N^{-1}\sum_{i=1}^{N} X_i'\Omega^{-1}X_i\right)^{-1} \xrightarrow{p} A^{-1}$, where $A$ is now defined as

$$A \equiv E(X_i'\Omega^{-1}X_i) \qquad (7.31)$$

Now we must show that $\mathrm{plim}\; N^{-1}\sum_{i=1}^{N} X_i'\Omega^{-1}u_i = 0$. By the WLLN, it is sufficient that $E(X_i'\Omega^{-1}u_i) = 0$. This is where Assumption SGLS.1 comes in. We can argue this point informally: each element of $\Omega^{-1}X_i$ is a linear combination of the elements of $X_i$, and since each element of $X_i$ is uncorrelated with each element of $u_i$, any linear combination of $X_i$ is uncorrelated with $u_i$. We can also show this directly using the algebra of Kronecker products and vectorization. For conformable matrices $D$, $E$, and $F$, recall that $\mathrm{vec}(DEF) = (F' \otimes D)\,\mathrm{vec}(E)$, where $\mathrm{vec}(C)$ is the vectorization of the matrix $C$. [That is, $\mathrm{vec}(C)$ is the column vector obtained by stacking the columns of $C$ from first to last; see Theil (1983).] Therefore, under Assumption SGLS.1,

$$\mathrm{vec}\, E(X_i'\Omega^{-1}u_i) = E[(u_i' \otimes X_i')\,\mathrm{vec}(\Omega^{-1})] = E[(u_i \otimes X_i)']\,\mathrm{vec}(\Omega^{-1}) = 0$$

where we have also used the fact that the expectation and vec operators can be interchanged. We can now read the consistency of the GLS estimator off of equation (7.30). We do not state this conclusion as a theorem because the GLS estimator itself is rarely available.
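The vec identity $\mathrm{vec}(DEF) = (F' \otimes D)\,\mathrm{vec}(E)$ used in the argument above is easy to check numerically; note that vec stacks columns, so arrays must be raveled in column-major (Fortran) order:

```python
import numpy as np

rng = np.random.default_rng(7)
D = rng.normal(size=(3, 4))
E = rng.normal(size=(4, 2))
F = rng.normal(size=(2, 5))

def vec(C):
    # stack the columns of C from first to last (column-major order)
    return C.ravel(order='F')

lhs = vec(D @ E @ F)
rhs = np.kron(F.T, D) @ vec(E)
print(np.allclose(lhs, rhs))   # True
```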
The proof of consistency that we have sketched fails if we only make Assumption SOLS.1: $E(X_i'u_i) = 0$ does not imply $E(X_i'\Omega^{-1}u_i) = 0$, except when $\Omega$ and $X_i$ have special structures. If Assumption SOLS.1 holds but Assumption SGLS.1 fails, the transformation in equation (7.28) generally induces correlation between $X_i^*$ and $u_i^*$. This can be an important point, especially for certain panel data applications. If we are willing to make the zero conditional mean assumption (7.13), $\beta^*$ can be shown to be unbiased conditional on $X$.
7.4.2 Asymptotic Normality

We now sketch the asymptotic normality of the GLS estimator under Assumptions SGLS.1 and SGLS.2 and some weak moment conditions. The first step is familiar:

$$\sqrt{N}(\beta^* - \beta) = \left(N^{-1}\sum_{i=1}^{N} X_i'\Omega^{-1}X_i\right)^{-1}\left(N^{-1/2}\sum_{i=1}^{N} X_i'\Omega^{-1}u_i\right) \qquad (7.32)$$

By the CLT, $N^{-1/2}\sum_{i=1}^{N} X_i'\Omega^{-1}u_i \xrightarrow{d} \mathrm{Normal}(0, B)$, where

$$B \equiv E(X_i'\Omega^{-1}u_iu_i'\Omega^{-1}X_i) \qquad (7.33)$$

Further, since $N^{-1/2}\sum_{i=1}^{N} X_i'\Omega^{-1}u_i = O_p(1)$ and $\left(N^{-1}\sum_{i=1}^{N} X_i'\Omega^{-1}X_i\right)^{-1} - A^{-1} = o_p(1)$, we can write $\sqrt{N}(\beta^* - \beta) = A^{-1}\left(N^{-1/2}\sum_{i=1}^{N} X_i'\Omega^{-1}u_i\right) + o_p(1)$. It follows from the asymptotic equivalence lemma that

$$\sqrt{N}(\beta^* - \beta) \xrightarrow{a} \mathrm{Normal}(0, A^{-1}BA^{-1}) \qquad (7.34)$$

Thus,

$$\mathrm{Avar}(\beta^*) = A^{-1}BA^{-1}/N \qquad (7.35)$$

The asymptotic variance in equation (7.35) is not the asymptotic variance usually derived for GLS estimation of systems of equations. Usually the formula is reported as $A^{-1}/N$. But equation (7.35) is the appropriate expression under the assumptions made so far. The simpler form, which results when $B = A$, is not generally valid under Assumptions SGLS.1 and SGLS.2, because we have assumed nothing about the variance matrix of $u_i$ conditional on $X_i$. In Section 7.5.2 we make an assumption that simplifies equation (7.35).
7.5 Feasible GLS
7.5.1 Asymptotic Properties
Obtaining the GLS estimator b
Ã
requires knowing W up to scale. That is, we must be
able to write W ¼ s
2

C where C is a known G Â G positive definite matrix and s
2
is
allowed to be an unknown constant. Sometimes C is known (one case is C ¼ I
G
), but
much more often it is unknown. Therefore, we now turn to the analysis of feasible
GLS (FGLS) estimation.
In FGLS estimation we replace the unknown matrix $\Omega$ with a consistent estimator. Because the estimator of $\Omega$ appears highly nonlinearly in the expression for the FGLS estimator, deriving finite sample properties of FGLS is generally difficult. [However, under essentially assumption (7.13) and some additional assumptions, including symmetry of the distribution of $u_i$, Kakwani (1967) showed that the distribution of the FGLS estimator is symmetric about $\beta$, a property which means that the FGLS estimator is unbiased if its expected value exists; see also Schmidt (1976, Section 2.5).] The asymptotic properties of the FGLS estimator are easily established as $N \to \infty$ because, as we will show, its first-order asymptotic properties are identical to those of the GLS estimator under Assumptions SGLS.1 and SGLS.2. It is for this purpose that we spent some time on GLS. After establishing the asymptotic equivalence, we can easily obtain the limiting distribution of the FGLS estimator. Of course, GLS is trivially a special case of FGLS, where there is no first-stage estimation error.
We assume we have a consistent estimator, $\hat{\Omega}$, of $\Omega$:

$$
\operatorname{plim}_{N \to \infty} \hat{\Omega} = \Omega \qquad (7.36)
$$

[Because the dimension of $\hat{\Omega}$ does not depend on $N$, equation (7.36) makes sense when defined element by element.] When $\Omega$ is allowed to be a general positive definite matrix, the following estimation approach can be used. First, obtain the system OLS estimator of $\beta$, which we denote $\hat{\hat{\beta}}$ in this section to avoid confusion. We already showed that $\hat{\hat{\beta}}$ is consistent for $\beta$ under Assumptions SOLS.1 and SOLS.2, and therefore under Assumptions SGLS.1 and SOLS.2. (In what follows, we assume that Assumptions SOLS.2 and SGLS.2 both hold.) By the WLLN, $\operatorname{plim}(N^{-1} \sum_{i=1}^N u_i u_i') = \Omega$, and so a natural estimator of $\Omega$ is

$$
\hat{\Omega} \equiv N^{-1} \sum_{i=1}^N \hat{\hat{u}}_i \hat{\hat{u}}_i' \qquad (7.37)
$$
where $\hat{\hat{u}}_i \equiv y_i - X_i \hat{\hat{\beta}}$ are the SOLS residuals. We can show that this estimator is consistent for $\Omega$ under Assumptions SGLS.1 and SOLS.2 and standard moment conditions. First, write

$$
\hat{\hat{u}}_i = u_i - X_i(\hat{\hat{\beta}} - \beta) \qquad (7.38)
$$
so that

$$
\hat{\hat{u}}_i \hat{\hat{u}}_i' = u_i u_i' - u_i(\hat{\hat{\beta}} - \beta)' X_i' - X_i(\hat{\hat{\beta}} - \beta) u_i' + X_i(\hat{\hat{\beta}} - \beta)(\hat{\hat{\beta}} - \beta)' X_i' \qquad (7.39)
$$
Therefore, it suffices to show that the averages of the last three terms converge in probability to zero. Write the average of the vec of the second term as $N^{-1} \sum_{i=1}^N (X_i \otimes u_i) \cdot (\hat{\hat{\beta}} - \beta)$, which is $o_p(1)$ because $\operatorname{plim}(\hat{\hat{\beta}} - \beta) = 0$ and $N^{-1} \sum_{i=1}^N (X_i \otimes u_i) \stackrel{p}{\to} 0$. The
third term is the transpose of the second. For the last term in equation (7.39), note that the average of its vec can be written as

$$
N^{-1} \sum_{i=1}^N (X_i \otimes X_i) \cdot \operatorname{vec}\{(\hat{\hat{\beta}} - \beta)(\hat{\hat{\beta}} - \beta)'\} \qquad (7.40)
$$
Now $\operatorname{vec}\{(\hat{\hat{\beta}} - \beta)(\hat{\hat{\beta}} - \beta)'\} = o_p(1)$. Further, assuming that each element of $X_i$ has finite second moment, $N^{-1} \sum_{i=1}^N (X_i \otimes X_i) = O_p(1)$ by the WLLN. This step takes care of the last term, since $O_p(1) \cdot o_p(1) = o_p(1)$. We have shown that
$$
\hat{\Omega} = N^{-1} \sum_{i=1}^N u_i u_i' + o_p(1) \qquad (7.41)
$$

and so equation (7.36) follows immediately. [In fact, a more careful analysis shows that the $o_p(1)$ in equation (7.41) can be replaced by $o_p(N^{-1/2})$; see Problem 7.4.]
Sometimes the elements of $\Omega$ are restricted in some way (an important example is the random effects panel data model that we will cover in Chapter 10). In such cases a different estimator of $\Omega$ is often used that exploits these restrictions. As with $\hat{\Omega}$ in equation (7.37), such estimators typically use the system OLS residuals in some fashion and lead to consistent estimators assuming the structure of $\Omega$ is correctly specified. The advantage of equation (7.37) is that it is consistent for $\Omega$ quite generally. However, if $N$ is not very large relative to $G$, equation (7.37) can have poor finite sample properties.
Given $\hat{\Omega}$, the feasible GLS (FGLS) estimator of $\beta$ is

$$
\hat{\beta} = \left( \sum_{i=1}^N X_i' \hat{\Omega}^{-1} X_i \right)^{-1} \left( \sum_{i=1}^N X_i' \hat{\Omega}^{-1} y_i \right) \qquad (7.42)
$$

or, in full matrix notation, $\hat{\beta} = [X'(I_N \otimes \hat{\Omega}^{-1})X]^{-1}[X'(I_N \otimes \hat{\Omega}^{-1})Y]$.
We have already shown that the (infeasible) GLS estimator is consistent under Assumptions SGLS.1 and SGLS.2. Because $\hat{\Omega}$ converges to $\Omega$, it is not surprising that FGLS is also consistent. Rather than show this result separately, we verify the stronger result that FGLS has the same limiting distribution as GLS.
The limiting distribution of FGLS is obtained by writing

$$
\sqrt{N}(\hat{\beta} - \beta) = \left( N^{-1} \sum_{i=1}^N X_i' \hat{\Omega}^{-1} X_i \right)^{-1} \left( N^{-1/2} \sum_{i=1}^N X_i' \hat{\Omega}^{-1} u_i \right) \qquad (7.43)
$$
Now

$$
N^{-1/2} \sum_{i=1}^N X_i' \hat{\Omega}^{-1} u_i - N^{-1/2} \sum_{i=1}^N X_i' \Omega^{-1} u_i = \left[ N^{-1/2} \sum_{i=1}^N (u_i \otimes X_i)' \right] \operatorname{vec}(\hat{\Omega}^{-1} - \Omega^{-1})
$$
Under Assumption SGLS.1, the CLT implies that $N^{-1/2} \sum_{i=1}^N (u_i \otimes X_i) = O_p(1)$. Because $O_p(1) \cdot o_p(1) = o_p(1)$, it follows that

$$
N^{-1/2} \sum_{i=1}^N X_i' \hat{\Omega}^{-1} u_i = N^{-1/2} \sum_{i=1}^N X_i' \Omega^{-1} u_i + o_p(1)
$$
A similar argument shows that $N^{-1} \sum_{i=1}^N X_i' \hat{\Omega}^{-1} X_i = N^{-1} \sum_{i=1}^N X_i' \Omega^{-1} X_i + o_p(1)$.
Therefore, we have shown that

$$
\sqrt{N}(\hat{\beta} - \beta) = \left( N^{-1} \sum_{i=1}^N X_i' \Omega^{-1} X_i \right)^{-1} \left( N^{-1/2} \sum_{i=1}^N X_i' \Omega^{-1} u_i \right) + o_p(1) \qquad (7.44)
$$
The first term in equation (7.44) is just $\sqrt{N}(\beta^* - \beta)$, where $\beta^*$ is the GLS estimator. We can write equation (7.44) as

$$
\sqrt{N}(\hat{\beta} - \beta^*) = o_p(1) \qquad (7.45)
$$
which shows that $\hat{\beta}$ and $\beta^*$ are $\sqrt{N}$-equivalent. Recall from Chapter 3 that this statement is much stronger than simply saying that $\beta^*$ and $\hat{\beta}$ are both consistent for $\beta$. There are many estimators, such as system OLS, that are consistent for $\beta$ but are not $\sqrt{N}$-equivalent to $\beta^*$.
The asymptotic equivalence of $\hat{\beta}$ and $\beta^*$ has practically important consequences. The most important of these is that, for performing asymptotic inference about $\beta$ using $\hat{\beta}$, we do not have to worry that $\hat{\Omega}$ is an estimator of $\Omega$. Of course, whether the asymptotic approximation gives a reasonable approximation to the actual distribution of $\hat{\beta}$ is difficult to tell. With large $N$, the approximation is usually pretty good.
But if $N$ is small relative to $G$, ignoring estimation of $\Omega$ in performing inference about $\beta$ can be misleading.
We summarize the limiting distribution of FGLS with a theorem.
theorem 7.3 (Asymptotic Normality of FGLS): Under Assumptions SGLS.1 and SGLS.2,

$$
\sqrt{N}(\hat{\beta} - \beta) \stackrel{a}{\sim} \text{Normal}(0, A^{-1} B A^{-1}) \qquad (7.46)
$$
where A is defined in equation (7.31) and B is defined in equation (7.33).
In the FGLS context a consistent estimator of $A$ is

$$
\hat{A} \equiv N^{-1} \sum_{i=1}^N X_i' \hat{\Omega}^{-1} X_i \qquad (7.47)
$$
A consistent estimator of $B$ is also readily available after FGLS estimation. Define the FGLS residuals by

$$
\hat{u}_i \equiv y_i - X_i \hat{\beta}, \qquad i = 1, 2, \ldots, N \qquad (7.48)
$$

[The only difference between the FGLS and SOLS residuals is that the FGLS estimator is inserted in place of the SOLS estimator; in particular, the FGLS residuals are not from the transformed equation (7.28).] Using standard arguments, a consistent estimator of $B$ is

$$
\hat{B} \equiv N^{-1} \sum_{i=1}^N X_i' \hat{\Omega}^{-1} \hat{u}_i \hat{u}_i' \hat{\Omega}^{-1} X_i
$$
The estimator of $\text{Avar}(\hat{\beta})$ can be written as

$$
\hat{A}^{-1} \hat{B} \hat{A}^{-1} / N = \left( \sum_{i=1}^N X_i' \hat{\Omega}^{-1} X_i \right)^{-1} \left( \sum_{i=1}^N X_i' \hat{\Omega}^{-1} \hat{u}_i \hat{u}_i' \hat{\Omega}^{-1} X_i \right) \left( \sum_{i=1}^N X_i' \hat{\Omega}^{-1} X_i \right)^{-1} \qquad (7.49)
$$
This is the extension of the White (1980b) heteroskedasticity-robust asymptotic variance estimator to the case of systems of equations; see also White (1984). This estimator is valid under Assumptions SGLS.1 and SGLS.2; that is, it is completely robust.
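The robust variance matrix (7.49) can be sketched in numpy as follows. The data-generating step and every variable name are illustrative assumptions, and the FGLS step is repeated only to keep the snippet self-contained.

```python
import numpy as np

# Simulated data (illustrative); X[i] is G x K, errors have nondiagonal variance.
rng = np.random.default_rng(1)
N, G, K = 400, 2, 3
beta = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(N, G, K))
u = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.5]], size=N)
y = np.einsum('igk,k->ig', X, beta) + u

# FGLS using Omega estimated from pooled-OLS residuals.
b_sols = np.linalg.solve(np.einsum('igk,igl->kl', X, X),
                         np.einsum('igk,ig->k', X, y))
r = y - np.einsum('igk,k->ig', X, b_sols)
Omega_hat = r.T @ r / N
Oi = np.linalg.inv(Omega_hat)
A = np.einsum('igk,gh,ihl->kl', X, Oi, X)        # sum_i X_i' Omega^{-1} X_i
b_fgls = np.linalg.solve(A, np.einsum('igk,gh,ih->k', X, Oi, y))

# FGLS residuals, equation (7.48), and the middle matrix of (7.49).
uh = y - np.einsum('igk,k->ig', X, b_fgls)
s = np.einsum('igk,gh,ih->ik', X, Oi, uh)        # X_i' Omega^{-1} u_hat_i, per i
B = s.T @ s                                      # sum of outer products
Avar_robust = np.linalg.inv(A) @ B @ np.linalg.inv(A)   # equation (7.49)
robust_se = np.sqrt(np.diag(Avar_robust))
print(robust_se)
```

The three factors in `Avar_robust` are exactly the three sums in (7.49); no division by $N$ is needed because the sums, rather than the averages $\hat{A}$ and $\hat{B}$, are used.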
7.5.2 Asymptotic Variance of FGLS under a Standard Assumption
Under the assumptions so far, FGLS really has nothing to offer over SOLS. In addition to being computationally more difficult, FGLS is less robust than SOLS. So why is FGLS used? The answer is that, under an additional assumption, FGLS is asymptotically more efficient than SOLS (and other estimators). First, we state the weakest condition that simplifies estimation of the asymptotic variance for FGLS. For reasons to be seen shortly, we call this a system homoskedasticity assumption.
assumption SGLS.3: $E(X_i' \Omega^{-1} u_i u_i' \Omega^{-1} X_i) = E(X_i' \Omega^{-1} X_i)$, where $\Omega \equiv E(u_i u_i')$.
Another way to state this assumption is $B = A$, which, from expression (7.46), simplifies the asymptotic variance. As stated, Assumption SGLS.3 is somewhat difficult to interpret. When $G = 1$, it reduces to Assumption OLS.3. When $\Omega$ is diagonal and $X_i$ has either the SUR or panel data structure, Assumption SGLS.3 implies a kind of conditional homoskedasticity in each equation (or time period). Generally, Assumption SGLS.3 puts restrictions on the conditional variances and covariances of elements of $u_i$. A sufficient (though certainly not necessary) condition for Assumption SGLS.3 is easier to interpret:

$$
E(u_i u_i' \mid X_i) = E(u_i u_i') \qquad (7.50)
$$
If $E(u_i \mid X_i) = 0$, then assumption (7.50) is the same as assuming $\text{Var}(u_i \mid X_i) = \text{Var}(u_i) = \Omega$, which means that each variance and each covariance of elements involving $u_i$ must be constant conditional on all of $X_i$. This is a very natural way of stating a system homoskedasticity assumption, but it is sometimes too strong.
When $G = 2$, $\Omega$ contains three distinct elements, $\sigma_1^2 = E(u_{i1}^2)$, $\sigma_2^2 = E(u_{i2}^2)$, and $\sigma_{12} = E(u_{i1} u_{i2})$. These elements are not restricted by the assumptions we have made. (The inequality $|\sigma_{12}| < \sigma_1 \sigma_2$ must always hold for $\Omega$ to be a nonsingular covariance matrix.) However, assumption (7.50) requires $E(u_{i1}^2 \mid X_i) = \sigma_1^2$, $E(u_{i2}^2 \mid X_i) = \sigma_2^2$, and $E(u_{i1} u_{i2} \mid X_i) = \sigma_{12}$: the conditional variances and covariance must not depend on $X_i$.
That assumption (7.50) implies Assumption SGLS.3 is a consequence of iterated expectations:

$$
\begin{aligned}
E(X_i' \Omega^{-1} u_i u_i' \Omega^{-1} X_i) &= E[E(X_i' \Omega^{-1} u_i u_i' \Omega^{-1} X_i \mid X_i)] \\
&= E[X_i' \Omega^{-1} E(u_i u_i' \mid X_i) \Omega^{-1} X_i] = E(X_i' \Omega^{-1} \Omega \Omega^{-1} X_i) \\
&= E(X_i' \Omega^{-1} X_i)
\end{aligned}
$$
While assumption (7.50) is easier to interpret, we use Assumption SGLS.3 for stating the next theorem because there are cases, including some dynamic panel data models, where Assumption SGLS.3 holds but assumption (7.50) does not.
theorem 7.4 (Usual Variance Matrix for FGLS): Under Assumptions SGLS.1–SGLS.3, the asymptotic variance of the FGLS estimator is $\text{Avar}(\hat{\beta}) = A^{-1}/N \equiv [E(X_i' \Omega^{-1} X_i)]^{-1}/N$.
We obtain an estimator of $\text{Avar}(\hat{\beta})$ by using our consistent estimator of $A$:

$$
\widehat{\text{Avar}}(\hat{\beta}) = \hat{A}^{-1}/N = \left( \sum_{i=1}^N X_i' \hat{\Omega}^{-1} X_i \right)^{-1} \qquad (7.51)
$$
Equation (7.51) is the "usual" formula for the asymptotic variance of FGLS. It is nonrobust in the sense that it relies on Assumption SGLS.3 in addition to Assumptions SGLS.1 and SGLS.2. If heteroskedasticity in $u_i$ is suspected, then the robust estimator (7.49) should be used.
Assumption (7.50) also has important efficiency implications. One consequence of Problem 7.2 is that, under Assumptions SGLS.1, SOLS.2, SGLS.2, and (7.50), the FGLS estimator is more efficient than the system OLS estimator. We can actually say much more: FGLS is more efficient than any other estimator that uses the orthogonality conditions $E(X_i \otimes u_i) = 0$. This conclusion will follow as a special case of Theorem 8.4 in Chapter 8, where we define the class of competing estimators. If we replace Assumption SGLS.1 with the zero conditional mean assumption (7.13), then an even stronger efficiency result holds for FGLS, something we treat in Section 8.6.
7.6 Testing Using FGLS
Asymptotic standard errors are obtained in the usual fashion from the asymptotic variance estimates. We can use the nonrobust version in equation (7.51) or, even better, the robust version in equation (7.49), to construct t statistics and confidence intervals. Testing multiple restrictions is fairly easy using the Wald test, which always has the same general form. The important consideration lies in choosing the asymptotic variance estimate, $\hat{V}$. Standard Wald statistics use equation (7.51), and this approach produces limiting chi-square statistics under the homoskedasticity assumption SGLS.3. Completely robust Wald statistics are obtained by choosing $\hat{V}$ as in equation (7.49).
If Assumption SGLS.3 holds under $H_0$, we can define a statistic based on the weighted sums of squared residuals. To obtain the statistic, we estimate the model with and without the restrictions imposed on $\beta$, where the same estimator of $\Omega$, usually based on the unrestricted SOLS residuals, is used in obtaining the restricted and unrestricted FGLS estimators. Let $\tilde{u}_i$ denote the residuals from constrained FGLS (with $Q$ restrictions imposed on $\tilde{\beta}$) using variance matrix $\hat{\Omega}$. It can be shown that, under $H_0$ and Assumptions SGLS.1–SGLS.3,
$$
\left( \sum_{i=1}^N \tilde{u}_i' \hat{\Omega}^{-1} \tilde{u}_i - \sum_{i=1}^N \hat{u}_i' \hat{\Omega}^{-1} \hat{u}_i \right) \stackrel{a}{\sim} \chi^2_Q \qquad (7.52)
$$
Gallant (1987) shows expression (7.52) for nonlinear models with fixed regressors; essentially the same proof works here under Assumptions SGLS.1–SGLS.3, as we will show more generally in Chapter 12.

The statistic in expression (7.52) is the difference between the transformed sum of squared residuals from the restricted and unrestricted models, but it is just as easy to calculate expression (7.52) directly. Gallant (1987, Chapter 5) has found that an F statistic has better finite sample properties. The F statistic in this context is defined as
$$
F = \left[ \left( \sum_{i=1}^N \tilde{u}_i' \hat{\Omega}^{-1} \tilde{u}_i - \sum_{i=1}^N \hat{u}_i' \hat{\Omega}^{-1} \hat{u}_i \right) \Big/ \left( \sum_{i=1}^N \hat{u}_i' \hat{\Omega}^{-1} \hat{u}_i \right) \right] \cdot [(NG - K)/Q] \qquad (7.53)
$$
Why can we treat this equation as having an approximate F distribution? First, for $NG - K$ large, $F_{Q,\,NG-K} \stackrel{a}{\sim} \chi^2_Q / Q$. Therefore, dividing expression (7.52) by $Q$ gives us an approximate $F_{Q,\,NG-K}$ distribution. The presence of the other two terms in equation (7.53) is to improve the F-approximation. Since $E(u_i' \Omega^{-1} u_i) = \operatorname{tr}\{E(\Omega^{-1} u_i u_i')\} = \operatorname{tr}\{E(\Omega^{-1} \Omega)\} = G$, it follows that $(NG)^{-1} \sum_{i=1}^N u_i' \Omega^{-1} u_i \stackrel{p}{\to} 1$; replacing $u_i' \Omega^{-1} u_i$ with $\hat{u}_i' \hat{\Omega}^{-1} \hat{u}_i$ does not affect this consistency result. Subtracting off $K$ as a degrees-of-freedom adjustment changes nothing asymptotically, and so $(NG - K)^{-1} \sum_{i=1}^N \hat{u}_i' \hat{\Omega}^{-1} \hat{u}_i \stackrel{p}{\to} 1$. Multiplying expression (7.52) by the inverse of this quantity does not affect its asymptotic distribution.
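Here is a sketch of how the weighted SSR statistic (7.52) and the F form (7.53) might be computed in numpy. The data, variable names, and the particular restriction tested (the last coefficient is zero, which holds in the simulated design, so $Q = 1$) are all illustrative; the same $\hat{\Omega}$ is used for the restricted and unrestricted fits, as the text requires.

```python
import numpy as np

# Simulated system with H0 true: the third coefficient is zero.
rng = np.random.default_rng(2)
N, G, K, Q = 300, 2, 3, 1
beta = np.array([1.0, -0.5, 0.0])
X = rng.normal(size=(N, G, K))
u = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=N)
y = np.einsum('igk,k->ig', X, beta) + u

def fgls(Xa, ya, Oi):
    """FGLS coefficients for regressor array Xa, given a weighting matrix Oi."""
    A = np.einsum('igk,gh,ihl->kl', Xa, Oi, Xa)
    return np.linalg.solve(A, np.einsum('igk,gh,ih->k', Xa, Oi, ya))

# Omega estimated once, from the unrestricted SOLS residuals.
b0 = np.linalg.solve(np.einsum('igk,igl->kl', X, X),
                     np.einsum('igk,ig->k', X, y))
r = y - np.einsum('igk,k->ig', X, b0)
Oi = np.linalg.inv(r.T @ r / N)

def wssr(b):
    """Weighted sum of squared residuals, sum_i e_i' Omega^{-1} e_i."""
    e = y - np.einsum('igk,k->ig', X, b)
    return np.einsum('ig,gh,ih->', e, Oi, e)

b_u = fgls(X, y, Oi)                         # unrestricted FGLS
b_r = np.append(fgls(X[:, :, :2], y, Oi), 0.0)   # restricted: drop last regressor

stat = wssr(b_r) - wssr(b_u)                 # ~ chi^2_Q under H0, expression (7.52)
F = (stat / wssr(b_u)) * (N * G - K) / Q     # equation (7.53)
print(stat, F)
```

Because the unrestricted FGLS estimate minimizes the weighted SSR given $\hat{\Omega}$, the statistic is nonnegative by construction.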
7.7 Seemingly Unrelated Regressions, Revisited
We now return to the SUR system in assumption (7.2). We saw in Section 7.3 how to write this system in the form (7.9) if there are no cross equation restrictions on the $\beta_g$. We also showed that the system OLS estimator corresponds to estimating each equation separately by OLS.

As mentioned earlier, in most applications of SUR it is reasonable to assume that $E(x_{ig}' u_{ih}) = 0$, $g, h = 1, 2, \ldots, G$, which is just Assumption SGLS.1 for the SUR structure. Under this assumption, FGLS will consistently estimate the $\beta_g$.
OLS equation by equation is simple to use and leads to standard inference for each $\beta_g$ under the OLS homoskedasticity assumption $E(u_{ig}^2 \mid x_{ig}) = \sigma_g^2$, which is standard in SUR contexts. So why bother using FGLS in such applications? There are two answers. First, as mentioned in Section 7.5.2, if we can maintain assumption (7.50) in addition to Assumption SGLS.1 (and SGLS.2), FGLS is asymptotically at least as efficient as system OLS. Second, while OLS equation by equation allows us to easily test hypotheses about the coefficients within an equation, it does not provide a convenient way for testing cross equation restrictions. It is possible to use OLS for testing cross equation restrictions by using the variance matrix (7.26), but if we are willing to go through that much trouble, we should just use FGLS.
7.7.1 Comparison between OLS and FGLS for SUR Systems
There are two cases where OLS equation by equation is algebraically equivalent to
FGLS. The first case is fairly straightforward to analyze in our setting.
theorem 7.5 (Equivalence of FGLS and OLS, I): If $\hat{\Omega}$ is a diagonal matrix, then OLS equation by equation is identical to FGLS.
Proof: If $\hat{\Omega}$ is diagonal, then $\hat{\Omega}^{-1} = \operatorname{diag}(\hat{\sigma}_1^{-2}, \ldots, \hat{\sigma}_G^{-2})$. With $X_i$ defined as in the matrix (7.10), straightforward algebra shows that

$$
X_i' \hat{\Omega}^{-1} X_i = \hat{C}^{-1} X_i' X_i \quad \text{and} \quad X_i' \hat{\Omega}^{-1} y_i = \hat{C}^{-1} X_i' y_i
$$

where $\hat{C}$ is the block diagonal matrix with $\hat{\sigma}_g^2 I_{k_g}$ as its $g$th block. It follows that the FGLS estimator can be written as

$$
\hat{\beta} = \left( \sum_{i=1}^N \hat{C}^{-1} X_i' X_i \right)^{-1} \left( \sum_{i=1}^N \hat{C}^{-1} X_i' y_i \right) = \left( \sum_{i=1}^N X_i' X_i \right)^{-1} \left( \sum_{i=1}^N X_i' y_i \right)
$$

which is the system OLS estimator.
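Theorem 7.5 is easy to verify numerically. In this sketch (simulated data, with our own variable names), FGLS with an arbitrary diagonal weighting matrix reproduces equation-by-equation OLS exactly.

```python
import numpy as np

# Two-equation SUR system with different regressors in each equation.
rng = np.random.default_rng(3)
N, k = 60, 2
x1 = np.c_[np.ones(N), rng.normal(size=N)]     # regressors for equation 1
x2 = np.c_[np.ones(N), rng.normal(size=N)]     # regressors for equation 2
y1 = x1 @ [1.0, 2.0] + rng.normal(size=N)
y2 = x2 @ [-1.0, 0.5] + rng.normal(size=N)

# Equation-by-equation OLS.
b1 = np.linalg.lstsq(x1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(x2, y2, rcond=None)[0]

# FGLS with a diagonal weighting matrix: X_i = diag(x_{i1}, x_{i2}).
X = np.zeros((N, 2, 2 * k))
X[:, 0, :k], X[:, 1, k:] = x1, x2
y = np.c_[y1, y2]
Oi = np.diag([1 / 1.7, 1 / 0.4])               # any diagonal weights work here
A = np.einsum('igk,gh,ihl->kl', X, Oi, X)
b_fgls = np.linalg.solve(A, np.einsum('igk,gh,ih->k', X, Oi, y))

print(np.allclose(b_fgls, np.r_[b1, b2]))      # True: the two coincide
```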
In applications, $\hat{\Omega}$ would not be diagonal unless we impose a diagonal structure. Nevertheless, we can use Theorem 7.5 to obtain an asymptotic equivalence result when $\Omega$ is diagonal. If $\Omega$ is diagonal, then GLS and OLS are algebraically identical (because GLS uses $\Omega$). We know that FGLS and GLS are $\sqrt{N}$-asymptotically equivalent for any $\Omega$. Therefore, OLS and FGLS are $\sqrt{N}$-asymptotically equivalent if $\Omega$ is diagonal, even though they are not algebraically equivalent (because $\hat{\Omega}$ is not diagonal).
The second algebraic equivalence result holds without any restrictions on $\hat{\Omega}$. It is special in that it assumes that the same regressors appear in each equation.

theorem 7.6 (Equivalence of FGLS and OLS, II): If $x_{i1} = x_{i2} = \cdots = x_{iG}$ for all $i$, that is, if the same regressors show up in each equation (for all observations), then OLS equation by equation and FGLS are identical.
In practice, Theorem 7.6 holds when the population model has the same explanatory variables in each equation. The usual proof of this result groups all $N$ observations for the first equation followed by the $N$ observations for the second equation, and so on (see, for example, Greene, 1997, Chapter 17). Problem 7.5 asks you to prove Theorem 7.6 in the current setup, where we have ordered the observations to be amenable to asymptotic analysis.
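Theorem 7.6 can be checked the same way: with identical regressors in both equations, FGLS with a nondiagonal weighting matrix still reproduces OLS exactly. The data and the particular $\hat{\Omega}$ below are made up for illustration.

```python
import numpy as np

# Two equations sharing the same regressor matrix x.
rng = np.random.default_rng(4)
N, k, G = 80, 3, 2
x = np.c_[np.ones(N), rng.normal(size=(N, k - 1))]   # common regressors
Y = np.c_[x @ [1.0, 2.0, -1.0] + rng.normal(size=N),
          x @ [0.5, -0.3, 0.8] + rng.normal(size=N)]

b_ols = np.linalg.lstsq(x, Y, rcond=None)[0]         # k x G, OLS per equation

# FGLS with an arbitrary nondiagonal positive definite weighting matrix.
Omega_hat = np.array([[1.0, 0.6], [0.6, 2.0]])
Oi = np.linalg.inv(Omega_hat)
X = np.zeros((N, G, G * k))
X[:, 0, :k], X[:, 1, k:] = x, x                      # X_i = I_G (kron) x_i
A = np.einsum('igk,gh,ihl->kl', X, Oi, X)
b_fgls = np.linalg.solve(A, np.einsum('igk,gh,ih->k', X, Oi, Y))

print(np.allclose(b_fgls, b_ols.T.ravel()))          # True
```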
It is important to know that when every equation contains the same regressors in an SUR system, there is still a good reason to use a SUR software routine in obtaining the estimates: we may be interested in testing joint hypotheses involving parameters in different equations. In order to do so we need to estimate the variance matrix of $\hat{\beta}$ (not just the variance matrix of each $\hat{\beta}_g$, which only allows tests of the coefficients within an equation). Estimating each equation by OLS does not directly yield the covariances between the estimators from different equations. Any SUR routine will perform this operation automatically, then compute F statistics as in equation (7.53) (or the chi-square alternative, the Wald statistic).
Example 7.3 (SUR System for Wages and Fringe Benefits): We use the data on wages and fringe benefits in FRINGE.RAW to estimate a two-equation system for hourly wage and hourly benefits. There are 616 workers in the data set. The FGLS estimates are given in Table 7.1, with asymptotic standard errors in parentheses below estimated coefficients.

The estimated coefficients generally have the signs we expect. Other things equal, people with more education have higher hourly wage and benefits, males have higher predicted wages and benefits ($1.79 and 27 cents higher, respectively), and people with more tenure have higher earnings and benefits, although the effect is diminishing in both cases. (The turning point for hrearn is at about 10.8 years, while for hrbens it is 22.5 years.) The coefficients on experience are interesting. Experience is estimated to have a diminishing effect for benefits but an increasing effect for earnings, although the estimated upturn for earnings is not until 9.5 years.

Belonging to a union implies higher wages and benefits, with the benefits coefficient being especially statistically significant ($t \approx 7.5$).

The errors across the two equations appear to be positively correlated, with an estimated correlation of about .32. This result is not surprising: the same unobservables, such as ability, that lead to higher earnings, also lead to higher benefits.

Clearly there are significant differences between males and females in both earnings and benefits. But what about between whites and nonwhites, and married and unmarried people? The F-type statistic for joint significance of married and white in both equations is $F = 1.83$. We are testing four restrictions ($Q = 4$), $N = 616$, $G = 2$, and $K = 2(13) = 26$, so the degrees of freedom in the F distribution are 4 and 1,206. The p-value is about .121, so these variables are jointly insignificant at the 10 percent level.
If the regressors are different in different equations, $\Omega$ is not diagonal, and the conditions in Section 7.5.2 hold, then FGLS is generally asymptotically more efficient than OLS equation by equation. One thing to remember is that the efficiency of FGLS comes at the price of assuming that the regressors in each equation are uncorrelated with the errors in each equation. For SOLS and FGLS to be different, the $x_g$ must vary across $g$. If $x_g$ varies across $g$, certain explanatory variables have been intentionally omitted from some equations. If we are interested in, say, the first equation, but we make a mistake in specifying the second equation, FGLS will generally produce inconsistent estimators of the parameters in all equations. However, OLS estimation of the first equation is consistent if $E(x_1' u_1) = 0$.

The previous discussion reflects the trade-off between efficiency and robustness that we often encounter in estimation problems.
Table 7.1
An Estimated SUR Model for Hourly Wages and Hourly Benefits

Explanatory Variables    hrearn            hrbens
educ                       .459  (.069)      .077  (.008)
exper                     -.076  (.057)      .023  (.007)
exper^2                    .0040 (.0012)    -.0005 (.0001)
tenure                     .110  (.084)      .054  (.010)
tenure^2                  -.0051 (.0033)    -.0012 (.0004)
union                      .808  (.408)      .366  (.049)
south                     -.457  (.552)     -.023  (.066)
nrtheast                 -1.151  (0.606)    -.057  (.072)
nrthcen                   -.636  (.556)     -.038  (.066)
married                    .642  (.418)      .058  (.050)
white                     1.141  (0.612)     .090  (.073)
male                      1.785  (0.398)     .268  (.048)
intercept                -2.632  (1.228)    -.890  (.147)

Asymptotic standard errors in parentheses.
7.7.2 Systems with Cross Equation Restrictions
So far we have studied SUR under the assumption that the $\beta_g$ are unrelated across equations. When systems of equations are used in economics, especially for modeling consumer and producer theory, there are often cross equation restrictions on the parameters. Such models can still be written in the general form we have covered, and so they can be estimated by system OLS and FGLS. We still refer to such systems as SUR systems, even though the equations are now obviously related, and system OLS is no longer OLS equation by equation.
Example 7.4 (SUR with Cross Equation Restrictions): Consider the two-equation population model

$$
y_1 = \gamma_{10} + \gamma_{11} x_{11} + \gamma_{12} x_{12} + a_1 x_{13} + a_2 x_{14} + u_1 \qquad (7.54)
$$
$$
y_2 = \gamma_{20} + \gamma_{21} x_{21} + a_1 x_{22} + a_2 x_{23} + \gamma_{24} x_{24} + u_2 \qquad (7.55)
$$
where we have imposed cross equation restrictions on the parameters in the two equations because $a_1$ and $a_2$ show up in each equation. We can put this model into the form of equation (7.9) by appropriately defining $X_i$ and $\beta$. For example, define $\beta = (\gamma_{10}, \gamma_{11}, \gamma_{12}, a_1, a_2, \gamma_{20}, \gamma_{21}, \gamma_{24})'$, which we know must be an $8 \times 1$ vector because there are 8 parameters in this system. The order in which these elements appear in $\beta$ is up to us, but once $\beta$ is defined, $X_i$ must be chosen accordingly. For each observation $i$, define the $2 \times 8$ matrix
$$
X_i = \begin{pmatrix}
1 & x_{i11} & x_{i12} & x_{i13} & x_{i14} & 0 & 0 & 0 \\
0 & 0 & 0 & x_{i22} & x_{i23} & 1 & x_{i21} & x_{i24}
\end{pmatrix}
$$

Multiplying $X_i$ by $\beta$ gives the equations (7.54) and (7.55).
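The bookkeeping in Example 7.4 can be sketched in code. The numerical values below are invented purely to show that multiplying $X_i$ by $\beta$ reproduces (7.54) and (7.55) with $a_1$ and $a_2$ shared across the two equations.

```python
import numpy as np

def build_X_i(x1row, x2row):
    """Build the 2 x 8 matrix X_i; x1row = (x11, x12, x13, x14),
    x2row = (x21, x22, x23, x24). Illustrative helper, not from the text."""
    x11, x12, x13, x14 = x1row
    x21, x22, x23, x24 = x2row
    # Columns ordered as beta = (g10, g11, g12, a1, a2, g20, g21, g24)'.
    return np.array([
        [1.0, x11, x12, x13, x14, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, x22, x23, 1.0, x21, x24],
    ])

# Made-up parameter values and regressor values.
beta = np.array([1.0, 0.2, -0.5, 0.3, 0.7, 2.0, -1.0, 0.4])
Xi = build_X_i((1.5, 2.0, 0.5, -1.0), (0.8, 1.2, -0.4, 2.5))
yi = Xi @ beta     # (y1, y2) from (7.54)-(7.55), with a1 and a2 shared
print(yi)
```

Once the $X_i$ are stacked this way, the system can be handed to SOLS or FGLS exactly as in the unrestricted case.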
In applications such as the previous example, it is fairly straightforward to test the cross equation restrictions, especially using the sum of squared residuals statistics [equation (7.52) or (7.53)]. The unrestricted model simply allows each explanatory variable in each equation to have its own coefficient. We would use the unrestricted estimates to obtain $\hat{\Omega}$, and then obtain the restricted estimates using $\hat{\Omega}$.
7.7.3 Singular Variance Matrices in SUR Systems
In our treatment so far we have assumed that the variance matrix $\Omega$ of $u_i$ is nonsingular. In consumer and producer theory applications this assumption is not always true in the original structural equations, because of additivity constraints.

Example 7.5 (Cost Share Equations): Suppose that, for a given year, each firm in a particular industry uses three inputs, capital (K), labor (L), and materials (M).