Book Econometric Analysis of Cross Section and Panel Data By Wooldridge - Chapter 6 ppt

6 Additional Single-Equation Topics
6.1 Estimation with Generated Regressors and Instruments
6.1.1 OLS with Generated Regressors
We often need to draw on results for OLS estimation when one or more of the
regressors have been estimated from a first-stage procedure. To illustrate the issues,
consider the model
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K + \gamma q + u \qquad (6.1)$$
We observe $x_1, \ldots, x_K$, but $q$ is unobserved. However, suppose that $q$ is related to observable data through the function $q = f(w, \delta)$, where $f$ is a known function and $w$ is a vector of observed variables, but the vector of parameters $\delta$ is unknown (which is why $q$ is not observed). Often, but not always, $q$ will be a linear function of $w$ and $\delta$. Suppose that we can consistently estimate $\delta$, and let $\hat{\delta}$ be the estimator. For each observation $i$, $\hat{q}_i = f(w_i, \hat{\delta})$ effectively estimates $q_i$. Pagan (1984) calls $\hat{q}_i$ a generated regressor. It seems reasonable that replacing $q_i$ with $\hat{q}_i$ in running the OLS regression
$y_i$ on $1, x_{i1}, x_{i2}, \ldots, x_{iK}, \hat{q}_i$, $\quad i = 1, \ldots, N$ (6.2)
should produce consistent estimates of all parameters, including $\gamma$. The question is, What assumptions are sufficient?
While we do not cover the asymptotic theory needed for a careful proof until Chapter 12 (which treats nonlinear estimation), we can provide some intuition here.
Because $\operatorname{plim} \hat{\delta} = \delta$, by the law of large numbers it is reasonable that
$$N^{-1}\sum_{i=1}^{N} \hat{q}_i u_i \xrightarrow{p} E(q_i u_i), \qquad N^{-1}\sum_{i=1}^{N} x_{ij}\hat{q}_i \xrightarrow{p} E(x_{ij} q_i)$$
From this relation it is easily shown that the usual OLS assumption in the population, that $u$ is uncorrelated with $(x_1, x_2, \ldots, x_K, q)$, suffices for the two-step procedure to be consistent (along with the rank condition of Assumption OLS.2 applied to the expanded vector of explanatory variables). In other words, for consistency, replacing $q_i$ with $\hat{q}_i$ in an OLS regression causes no problems.
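The consistency claim can be checked informally on simulated data. The following is a minimal sketch, not a procedure from the text: it assumes $f(w, \delta) = w\delta$ is linear and that $\delta$ is estimated by OLS from a hypothetical noisy signal called `target`; all names and parameter values are illustrative assumptions.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients from regressing y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_step_ols(y, x, w, target):
    """Two-step OLS with a generated regressor (illustrative sketch).

    Step 1: estimate delta by OLS of `target` on w (an assumed first stage).
    Step 2: regress y on (1, x, q_hat), where q_hat = w @ delta_hat.
    """
    delta_hat = ols(w, target)
    q_hat = w @ delta_hat
    X = np.column_stack([np.ones(len(y)), x, q_hat])
    return ols(X, y)

# Simulated data; every value below is an assumption made for illustration.
rng = np.random.default_rng(0)
N = 50_000
w = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
delta = np.array([0.5, 1.0, -1.0])
q = w @ delta                        # unobserved in practice: delta is unknown
target = q + rng.normal(size=N)      # hypothetical signal with E(target | w) = w @ delta
x = 0.3 * q + rng.normal(size=N)     # observed regressor, correlated with q
u = rng.normal(size=N)               # uncorrelated with (x, q)
beta0, beta1, gamma = 1.0, 2.0, 0.7
y = beta0 + beta1 * x + gamma * q + u

coef = two_step_ols(y, x, w, target)  # estimates of (beta0, beta1, gamma)
```

With a large sample the three estimates should sit close to the values used to generate the data. The sketch says nothing about the standard errors from the second step.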
Things are not so simple when it comes to inference: the standard errors and test statistics obtained from regression (6.2) are generally invalid because they ignore the sampling variation in $\hat{\delta}$. Since $\hat{\delta}$ is also obtained using data, usually the same sample of data, uncertainty in the estimate should be accounted for in the second step.
Nevertheless, there is at least one important case where the sampling variation of $\hat{\delta}$ can be ignored, at least asymptotically: if
$$E[\nabla_\delta f(w, \delta)' u] = 0 \qquad (6.3)$$
$$\gamma = 0 \qquad (6.4)$$
then the $\sqrt{N}$-limiting distribution of the OLS estimators from regression (6.2) is the same as the OLS estimators when $q$ replaces $\hat{q}$. Condition (6.3) is implied by the zero conditional mean condition
$$E(u \mid x, w) = 0 \qquad (6.5)$$
which usually holds in generated regressor contexts.
We often want to test the null hypothesis $H_0\colon \gamma = 0$ before including $\hat{q}$ in the final regression. Fortunately, the usual $t$ statistic on $\hat{q}$ has a limiting standard normal distribution under $H_0$, so it can be used to test $H_0$. It simply requires the usual homoskedasticity assumption, $E(u^2 \mid x, q) = \sigma^2$. The heteroskedasticity-robust statistic works if heteroskedasticity is present in $u$ under $H_0$.
Even if condition (6.3) holds, if $\gamma \neq 0$, then an adjustment is needed for the asymptotic variances of all OLS estimators that are due to estimation of $\delta$. Thus, standard $t$ statistics, $F$ statistics, and LM statistics will not be asymptotically valid when $\gamma \neq 0$. Using the methods of Chapter 3, it is not difficult to derive an adjustment to the usual variance matrix estimate that accounts for the variability in $\hat{\delta}$ (and also allows for heteroskedasticity). It is not true that replacing $q_i$ with $\hat{q}_i$ simply introduces heteroskedasticity into the error term; this is not the correct way to think about the generated regressors issue. Accounting for the fact that $\hat{\delta}$ depends on the same random sample used in the second-stage estimation is much different from having heteroskedasticity in the error. Of course, we might want to use a heteroskedasticity-robust standard error for testing $H_0\colon \gamma = 0$ because heteroskedasticity in the population error $u$ can always be a problem. However, just as with the usual OLS standard error, this is generally justified only under $H_0\colon \gamma = 0$.
A general formula for the asymptotic variance of 2SLS in the presence of generated regressors is given in the appendix to this chapter; this covers OLS with generated regressors as a special case. A general framework for handling these problems is given in Newey (1984) and Newey and McFadden (1994), but we must hold off until Chapter 14 to give a careful treatment.
6.1.2 2SLS with Generated Instruments
In later chapters we will need results on 2SLS estimation when the instruments have
been estimated in a preliminary stage. Write the population model as
$$y = x\beta + u \qquad (6.6)$$
$$E(z'u) = 0 \qquad (6.7)$$
where $x$ is a $1 \times K$ vector of explanatory variables and $z$ is a $1 \times L$ ($L \geq K$) vector of instrumental variables. Assume that $z = g(w, \lambda)$, where $g(\cdot, \lambda)$ is a known function but $\lambda$ needs to be estimated. For each $i$, define the generated instruments $\hat{z}_i \equiv g(w_i, \hat{\lambda})$. What can we say about the 2SLS estimator when the $\hat{z}_i$ are used as instruments?
By the same reasoning for OLS with generated regressors, consistency follows under weak conditions. Further, under conditions that are met in many applications, we can ignore the fact that the instruments were estimated in using 2SLS for inference. Sufficient are the assumptions that $\hat{\lambda}$ is $\sqrt{N}$-consistent for $\lambda$ and that
$$E[\nabla_\lambda g(w, \lambda)' u] = 0 \qquad (6.8)$$
Under condition (6.8), which holds when $E(u \mid w) = 0$, the $\sqrt{N}$-asymptotic distribution of $\hat{\beta}$ is the same whether we use $\lambda$ or $\hat{\lambda}$ in constructing the instruments. This fact greatly simplifies calculation of asymptotic standard errors and test statistics. Therefore, if we have a choice, there are practical reasons for using 2SLS with generated instruments rather than OLS with generated regressors. We will see some examples in Part IV.
One consequence of this discussion is that, if we add the 2SLS homoskedasticity assumption (2SLS.3), the usual 2SLS standard errors and test statistics are asymptotically valid. If Assumption 2SLS.3 is violated, we simply use the heteroskedasticity-robust standard errors and test statistics. Of course, the finite sample properties of the estimator using $\hat{z}_i$ as instruments could be notably different from those using $z_i$ as instruments, especially for small sample sizes. Determining whether this is the case requires either more sophisticated asymptotic approximations or simulations on a case-by-case basis.
6.1.3 Generated Instruments and Regressors
We will encounter examples later where some instruments and some regressors are estimated in a first stage. Generally, the asymptotic variance needs to be adjusted because of the generated regressors, although there are some special cases where the usual variance matrix estimators are valid. As a general example, consider the model
$$y = x\beta + \gamma f(w, \delta) + u, \qquad E(u \mid z, w) = 0$$
and we estimate $\delta$ in a first stage. If $\gamma = 0$, then the 2SLS estimator of $(\beta', \gamma)'$ in the equation
$$y_i = x_i\beta + \gamma \hat{f}_i + \text{error}_i$$
using instruments $(z_i, \hat{f}_i)$, has a limiting distribution that does not depend on the limiting distribution of $\sqrt{N}(\hat{\delta} - \delta)$ under conditions (6.3) and (6.8). Therefore, the usual 2SLS $t$ statistic for $\hat{\gamma}$, or its heteroskedasticity-robust version, can be used to test $H_0\colon \gamma = 0$.
6.2 Some Specification Tests
In Chapters 4 and 5 we covered what is usually called classical hypothesis testing for OLS and 2SLS. In this section we cover some tests of the assumptions underlying either OLS or 2SLS. These are easy to compute and should be routinely reported in applications.
6.2.1 Testing for Endogeneity
We start with the linear model and a single possibly endogenous variable. For notational clarity we now denote the dependent variable by $y_1$ and the potentially endogenous explanatory variable by $y_2$. As in all 2SLS contexts, $y_2$ can be continuous or binary, or it may have continuous and discrete characteristics; there are no restrictions. The population model is
$$y_1 = z_1\delta_1 + \alpha_1 y_2 + u_1 \qquad (6.9)$$
where $z_1$ is $1 \times L_1$ (including a constant), $\delta_1$ is $L_1 \times 1$, and $u_1$ is the unobserved disturbance. The set of all exogenous variables is denoted by the $1 \times L$ vector $z$, where $z_1$ is a strict subset of $z$. The maintained exogeneity assumption is
$$E(z'u_1) = 0 \qquad (6.10)$$
It is important to keep in mind that condition (6.10) is assumed throughout this section. We also assume that equation (6.9) is identified when $E(y_2 u_1) \neq 0$, which requires that $z$ have at least one element not in $z_1$ (the order condition); the rank condition is that at least one element of $z$ not in $z_1$ is partially correlated with $y_2$ (after netting out $z_1$). Under these assumptions, we now wish to test the null hypothesis that $y_2$ is actually exogenous.
Hausman (1978) suggested comparing the OLS and 2SLS estimators of $\beta_1 \equiv (\delta_1', \alpha_1)'$ as a formal test of endogeneity: if $y_2$ is uncorrelated with $u_1$, the OLS and 2SLS estimators should differ only by sampling error. This reasoning leads to the Hausman test for endogeneity.
The original form of the statistic turns out to be cumbersome to compute because the matrix appearing in the quadratic form is singular, except when no exogenous variables are present in equation (6.9). As pointed out by Hausman (1978, 1983), there is a regression-based form of the test that turns out to be asymptotically equivalent to the original form of the Hausman test. In addition, it extends easily to other situations, including some nonlinear models that we cover in Chapters 15, 16, and 19.
To derive the regression-based test, write the linear projection of $y_2$ on $z$ in error form as
$$y_2 = z\pi_2 + v_2 \qquad (6.11)$$
$$E(z'v_2) = 0 \qquad (6.12)$$
where $\pi_2$ is $L \times 1$. Since $u_1$ is uncorrelated with $z$, it follows from equations (6.11) and (6.12) that $y_2$ is endogenous if and only if $E(u_1 v_2) \neq 0$. Thus we can test whether the structural error, $u_1$, is correlated with the reduced form error, $v_2$. Write the linear projection of $u_1$ onto $v_2$ in error form as
$$u_1 = \rho_1 v_2 + e_1 \qquad (6.13)$$
where $\rho_1 = E(v_2 u_1)/E(v_2^2)$, $E(v_2 e_1) = 0$, and $E(z'e_1) = 0$ (since $u_1$ and $v_2$ are each orthogonal to $z$). Thus, $y_2$ is exogenous if and only if $\rho_1 = 0$.
Plugging equation (6.13) into equation (6.9) gives the equation
$$y_1 = z_1\delta_1 + \alpha_1 y_2 + \rho_1 v_2 + e_1 \qquad (6.14)$$
The key is that $e_1$ is uncorrelated with $z_1$, $y_2$, and $v_2$ by construction. Therefore, a test of $H_0\colon \rho_1 = 0$ can be done using a standard $t$ test on the variable $v_2$ in an OLS regression that includes $z_1$ and $y_2$. The problem is that $v_2$ is not observed. Nevertheless, the reduced form parameters $\pi_2$ are easily estimated by OLS. Let $\hat{v}_2$ denote the OLS residuals from the first-stage reduced form regression of $y_2$ on $z$ (remember that $z$ contains all exogenous variables). If we replace $v_2$ with $\hat{v}_2$ we have the equation
$$y_1 = z_1\delta_1 + \alpha_1 y_2 + \rho_1 \hat{v}_2 + \text{error} \qquad (6.15)$$
and $\delta_1$, $\alpha_1$, and $\rho_1$ can be consistently estimated by OLS. Now we can use the results on generated regressors in Section 6.1.1: the usual OLS $t$ statistic for $\hat{\rho}_1$ is a valid test of $H_0\colon \rho_1 = 0$, provided the homoskedasticity assumption $E(u_1^2 \mid z, y_2) = \sigma_1^2$ is satisfied under $H_0$. (Remember, $y_2$ is exogenous under $H_0$.) A heteroskedasticity-robust $t$ statistic can be used if heteroskedasticity is suspected under $H_0$.
As shown in Problem 5.1, the OLS estimates of $\delta_1$ and $\alpha_1$ from equation (6.15) are in fact identical to the 2SLS estimates. This fact is convenient because, along with being computationally simple, regression (6.15) allows us to compare the magnitudes of the OLS and 2SLS estimates in order to determine whether the differences are practically significant, rather than just finding statistically significant evidence of endogeneity of $y_2$. It also provides a way to verify that we have computed the statistic correctly.
We should remember that the OLS standard errors that would be reported from equation (6.15) are not valid unless $\rho_1 = 0$, because $\hat{v}_2$ is a generated regressor. In practice, if we reject $H_0\colon \rho_1 = 0$, then, to get the appropriate standard errors and other test statistics, we estimate equation (6.9) by 2SLS.
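The regression-based test can be sketched in a few lines. This is an illustration on simulated data, not the text's own code: the helper names (`ols`, `tsls`, `endogeneity_test`) and all parameter values are assumptions, and the $t$ statistic uses the usual homoskedastic standard error. The sketch also compares the coefficients on $(z_1, y_2)$ from regression (6.15) with the 2SLS estimates, which should agree up to rounding.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, residuals, and usual (homoskedastic) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    return b, resid, se

def tsls(X, Z, y):
    """2SLS coefficients of y on X using instruments Z."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

def endogeneity_test(y1, z1, y2, z):
    """Regression (6.15): add reduced-form residuals, return coefficients and t on v2_hat."""
    _, v2_hat, _ = ols(z, y2)                          # residuals from y2 on z
    X = np.column_stack([z1, y2, v2_hat])
    b, _, se = ols(X, y1)
    return b, b[-1] / se[-1]

# Simulated data; all values are illustrative assumptions.
rng = np.random.default_rng(1)
N = 5_000
z1 = np.column_stack([np.ones(N), rng.normal(size=N)])
z2 = rng.normal(size=(N, 2))                           # excluded instruments
z = np.column_stack([z1, z2])
v2 = rng.normal(size=N)
u1 = 0.5 * v2 + rng.normal(size=N)                     # rho_1 = 0.5, so y2 is endogenous
y2 = z @ np.array([1.0, 0.5, 1.0, -1.0]) + v2
y1 = z1 @ np.array([1.0, -0.5]) + 0.8 * y2 + u1

b, t_rho = endogeneity_test(y1, z1, y2, z)
b_2sls = tsls(np.column_stack([z1, y2]), z, y1)
```

Here `b[:3]` holds the estimates of $(\delta_1, \alpha_1)$ and `t_rho` is the statistic for $H_0\colon \rho_1 = 0$; because the data were generated with $\rho_1 \neq 0$, a large value of `t_rho` is expected.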
Example 6.1 (Testing for Endogeneity of Education in a Wage Equation): Consider the wage equation
$$\log(wage) = \delta_0 + \delta_1\, exper + \delta_2\, exper^2 + \alpha_1\, educ + u_1 \qquad (6.16)$$
for working women, where we believe that educ and $u_1$ may be correlated. The instruments for educ are parents' education and husband's education. So, we first regress educ on 1, exper, exper$^2$, motheduc, fatheduc, and huseduc and obtain the residuals, $\hat{v}_2$. Then we simply include $\hat{v}_2$ along with unity, exper, exper$^2$, and educ in an OLS regression and obtain the $t$ statistic on $\hat{v}_2$. Using the data in MROZ.RAW gives the result $\hat{\rho}_1 = .047$ and $t_{\hat{\rho}_1} = 1.65$. We find evidence of endogeneity of educ at the 10 percent significance level against a two-sided alternative, and so 2SLS is probably a good idea (assuming that we trust the instruments). The correct 2SLS standard errors are given in Example 5.3.
Rather than comparing the OLS and 2SLS estimates of a particular linear combination of the parameters, as the original Hausman test does, it often makes sense to compare just the estimates of the parameter of interest, which is usually $\alpha_1$. If, under $H_0$, Assumptions 2SLS.1–2SLS.3 hold with $w$ replacing $z$, where $w$ includes all nonredundant elements in $x$ and $z$, obtaining the test is straightforward. Under these assumptions it can be shown that $\mathrm{Avar}(\hat{\alpha}_{1,2SLS} - \hat{\alpha}_{1,OLS}) = \mathrm{Avar}(\hat{\alpha}_{1,2SLS}) - \mathrm{Avar}(\hat{\alpha}_{1,OLS})$. [This conclusion essentially holds because of Theorem 5.3; Problem 6.12 asks you to show this result formally. Hausman (1978), Newey and McFadden (1994, Section 5.3), and Section 14.5.1 contain more general treatments.] Therefore, the Hausman $t$ statistic is simply
$$(\hat{\alpha}_{1,2SLS} - \hat{\alpha}_{1,OLS}) \big/ \{[\mathrm{se}(\hat{\alpha}_{1,2SLS})]^2 - [\mathrm{se}(\hat{\alpha}_{1,OLS})]^2\}^{1/2},$$
where the standard errors are the usual ones computed under homoskedasticity. The denominator in the $t$ statistic is the standard error of $(\hat{\alpha}_{1,2SLS} - \hat{\alpha}_{1,OLS})$. If there is heteroskedasticity under $H_0$, this standard error is invalid because the asymptotic variance of the difference is no longer the difference in asymptotic variances.
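The Hausman $t$ statistic above is simple enough to compute by hand, but a small helper makes the arithmetic explicit. This is a sketch with a hypothetical function name, and the numbers passed to it are made up for illustration; they are not estimates from the text. The guard reflects the fact that, in finite samples, the difference in squared standard errors need not be positive.

```python
import math

def hausman_t(a_2sls, a_ols, se_2sls, se_ols):
    """Hausman t statistic for one coefficient, using homoskedastic standard errors."""
    var_diff = se_2sls**2 - se_ols**2
    if var_diff <= 0:
        # Can happen in finite samples; the statistic is then undefined.
        raise ValueError("se_2sls must exceed se_ols for the statistic to be defined")
    return (a_2sls - a_ols) / math.sqrt(var_diff)

# Hypothetical inputs: the difference is 0.03 and the denominator is
# sqrt(0.0025 - 0.0009) = 0.04, so the statistic is about 0.75.
t_stat = hausman_t(0.10, 0.07, 0.05, 0.03)
```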
Extending the regression-based Hausman test to several potentially endogenous explanatory variables is straightforward. Let $y_2$ denote a $1 \times G_1$ vector of possible endogenous variables in the population model
$$y_1 = z_1\delta_1 + y_2\alpha_1 + u_1, \qquad E(z'u_1) = 0 \qquad (6.17)$$
where $\alpha_1$ is now $G_1 \times 1$. Again, we assume the rank condition for 2SLS. Write the reduced form as $y_2 = z\Pi_2 + v_2$, where $\Pi_2$ is $L \times G_1$ and $v_2$ is the $1 \times G_1$ vector of population reduced form errors. For a generic observation let $\hat{v}_2$ denote the $1 \times G_1$ vector of OLS residuals obtained from each reduced form. (In other words, take each element of $y_2$ and regress it on $z$ to obtain the RF residuals; then collect these in the row vector $\hat{v}_2$.) Now, estimate the model
$$y_1 = z_1\delta_1 + y_2\alpha_1 + \hat{v}_2\rho_1 + \text{error} \qquad (6.18)$$
and do a standard $F$ test of $H_0\colon \rho_1 = 0$, which tests $G_1$ restrictions in the unrestricted model (6.18). The restricted model is obtained by setting $\rho_1 = 0$, which means we estimate the original model (6.17) by OLS. The test can be made robust to heteroskedasticity in $u_1$ (since $u_1 = e_1$ under $H_0$) by applying the heteroskedasticity-robust Wald statistic in Chapter 4. In some regression packages, such as Stata, the robust test is implemented as an $F$-type test.
An alternative to the $F$ test is an LM-type test. Let $\hat{u}_1$ be the OLS residuals from the regression $y_1$ on $z_1, y_2$ (the residuals obtained under the null that $y_2$ is exogenous). Then, obtain the usual R-squared (assuming that $z_1$ contains a constant), say $R_u^2$, from the regression
$\hat{u}_1$ on $z_1, y_2, \hat{v}_2$ (6.19)
and use $NR_u^2$ as asymptotically $\chi^2_{G_1}$. This test again maintains homoskedasticity under $H_0$. The test can be made heteroskedasticity-robust using the method described in equation (4.17): take $x_1 = (z_1, y_2)$ and $x_2 = \hat{v}_2$. See also Wooldridge (1995b).
Example 6.2 (Endogeneity of Education in a Wage Equation, continued): We add the interaction term black·educ to the log(wage) equation estimated by Card (1995); see also Problem 5.4. Write the model as
$$\log(wage) = \alpha_1\, educ + \alpha_2\, black \cdot educ + z_1\delta_1 + u_1 \qquad (6.20)$$
where $z_1$ contains a constant, exper, exper$^2$, black, smsa, 1966 regional dummy variables, and a 1966 SMSA indicator. If educ is correlated with $u_1$, then we also expect black·educ to be correlated with $u_1$. If nearc4, a binary indicator for whether a worker grew up near a four-year college, is valid as an instrumental variable for educ, then a natural instrumental variable for black·educ is black·nearc4. Note that black·nearc4 is uncorrelated with $u_1$ under the conditional mean assumption $E(u_1 \mid z) = 0$, where $z$ contains all exogenous variables.
The equation estimated by OLS is
$$\widehat{\log(wage)} = \underset{(0.75)}{4.81} + \underset{(.004)}{.071}\, educ + \underset{(.006)}{.018}\, black \cdot educ - \underset{(.079)}{.419}\, black + \cdots$$
Therefore, the return to education is estimated to be about 1.8 percentage points
higher for blacks than for nonblacks, even though wages are substantially lower for
blacks at all but unrealistically high levels of education. (It takes an estimated 23.3
years of education before a black worker earns as much as a nonblack worker.)
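The 23.3-year figure can be reproduced from the reported OLS coefficients. The short calculation below assumes that the omitted terms in the estimated equation do not involve black, so that the estimated black-nonblack difference in log(wage) at a given level of education is $.018 \cdot educ - .419$; the symbol $educ^{*}$ for the break-even level is introduced here only for illustration.

```latex
.018 \cdot educ^{*} - .419 = 0
\quad\Longrightarrow\quad
educ^{*} = \frac{.419}{.018} \approx 23.3
```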
To test whether educ is exogenous we must test whether educ and black·educ are uncorrelated with $u_1$. We do so by first regressing educ on all instrumental variables: those elements in $z_1$ plus nearc4 and black·nearc4. (The interaction black·nearc4 should be included because it might be partially correlated with educ.) Let $\hat{v}_{21}$ be the OLS residuals from this regression. Similarly, regress black·educ on $z_1$, nearc4, and black·nearc4, and save the residuals $\hat{v}_{22}$. By the way, the fact that the dependent variable in the second reduced form regression, black·educ, is zero for a large fraction of the sample has no bearing on how we test for endogeneity.
Adding $\hat{v}_{21}$ and $\hat{v}_{22}$ to the OLS regression and computing the joint $F$ test yields $F = 0.54$ and $p$-value = 0.581; thus we do not reject exogeneity of educ and black·educ.
Incidentally, the reduced form regressions confirm that educ is partially correlated with nearc4 (but not black·nearc4) and black·educ is partially correlated with black·nearc4 (but not nearc4). It is easily seen that these findings mean that the rank condition for 2SLS is satisfied; see Problem 5.15c. Even though educ does not appear to be endogenous in equation (6.20), we estimate the equation by 2SLS:
$$\widehat{\log(wage)} = \underset{(0.97)}{3.84} + \underset{(.057)}{.127}\, educ + \underset{(.040)}{.011}\, black \cdot educ - \underset{(.506)}{.283}\, black + \cdots$$
The 2SLS point estimates certainly differ from the OLS estimates, but the standard errors are so large that the 2SLS and OLS estimates are not statistically different.
6.2.2 Testing Overidentifying Restrictions
When we have more instruments than we need to identify an equation, we can test whether the additional instruments are valid in the sense that they are uncorrelated with $u_1$. To explain the various procedures, write the equation in the form
$$y_1 = z_1\delta_1 + y_2\alpha_1 + u_1 \qquad (6.21)$$
where $z_1$ is $1 \times L_1$ and $y_2$ is $1 \times G_1$. The $1 \times L$ vector of all exogenous variables is again $z$; partition this as $z = (z_1, z_2)$ where $z_2$ is $1 \times L_2$ and $L = L_1 + L_2$. Because the model is overidentified, $L_2 > G_1$. Under the usual identification conditions we could use any $1 \times G_1$ subset of $z_2$ as instruments for $y_2$ in estimating equation (6.21) (remember the elements of $z_1$ act as their own instruments). Following his general principle, Hausman (1978) suggested comparing the 2SLS estimator using all instruments to 2SLS using a subset that just identifies equation (6.21). If all instruments are valid, the estimates should differ only as a result of sampling error. As with testing for endogeneity, constructing the original Hausman statistic is computationally cumbersome. Instead, a simple regression-based procedure is available.
It turns out that, under homoskedasticity, a test for validity of the overidentification restrictions is obtained as $NR_u^2$ from the OLS regression
$\hat{u}_1$ on $z$ (6.22)
where $\hat{u}_1$ are the 2SLS residuals using all of the instruments $z$ and $R_u^2$ is the usual R-squared (assuming that $z_1$ and $z$ contain a constant; otherwise it is the uncentered R-squared). In other words, simply estimate regression (6.21) by 2SLS and obtain the 2SLS residuals, $\hat{u}_1$. Then regress these on all exogenous variables (including a constant). Under the null that $E(z'u_1) = 0$ and Assumption 2SLS.3, $NR_u^2 \overset{a}{\sim} \chi^2_{Q_1}$, where $Q_1 \equiv L_2 - G_1$ is the number of overidentifying restrictions.
The usefulness of the Hausman test is that, if we reject the null hypothesis, then our logic for choosing the IVs must be reexamined. If we fail to reject the null, then we can have some confidence in the overall set of instruments used. Of course, it could also be that the test has low power for detecting endogeneity of some of the instruments.
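The $NR_u^2$ procedure can be sketched as follows. This is an illustrative implementation on simulated data, with hypothetical helper names and made-up parameter values; it covers only the homoskedastic version, and it assumes $z_1$ contains a constant so that the centered R-squared applies.

```python
import numpy as np

def tsls_resid(y1, X, Z):
    """2SLS residuals of y1 on X with instruments Z."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    b = np.linalg.solve(X_hat.T @ X, X_hat.T @ y1)
    return y1 - X @ b                      # residuals use the actual X, not X_hat

def overid_stat(y1, z1, y2, z2):
    """N * R-squared from regressing 2SLS residuals on all exogenous variables."""
    Z = np.column_stack([z1, z2])
    X = np.column_stack([z1, y2])
    u_hat = tsls_resid(y1, X, Z)
    fitted = Z @ np.linalg.lstsq(Z, u_hat, rcond=None)[0]
    centered = u_hat - u_hat.mean()
    r2 = 1.0 - ((u_hat - fitted) @ (u_hat - fitted)) / (centered @ centered)
    return len(y1) * r2

# Simulated data with G1 = 1 and L2 = 3, so Q1 = 2; all values are assumptions.
rng = np.random.default_rng(2)
N = 5_000
z1 = np.ones((N, 1))
z2 = rng.normal(size=(N, 3))
v2 = rng.normal(size=N)
y2 = z2.sum(axis=1) + v2
e = rng.normal(size=N)
u_valid = 0.5 * v2 + e                     # uncorrelated with z
u_invalid = u_valid + 0.5 * z2[:, 2]       # third instrument is correlated with the error

stat_valid = overid_stat(1.0 + 0.8 * y2 + u_valid, z1, y2, z2)
stat_invalid = overid_stat(1.0 + 0.8 * y2 + u_invalid, z1, y2, z2)
```

Each statistic would be compared with a $\chi^2_2$ critical value (about 5.99 at the 5 percent level). In this design `stat_invalid` should be far larger than `stat_valid`.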
A heteroskedasticity-robust version is a little more complicated but is still easy to obtain. Let $\hat{y}_2$ denote the fitted values from the first-stage regressions (each element of $y_2$ onto $z$). Now, let $h_2$ be any $1 \times Q_1$ subset of $z_2$. (It does not matter which elements of $z_2$ we choose, as long as we choose $Q_1$ of them.) Regress each element of $h_2$ onto $(z_1, \hat{y}_2)$ and collect the residuals, $\hat{r}_2$ ($1 \times Q_1$). Then an asymptotic $\chi^2_{Q_1}$ test statistic is obtained as $N - SSR_0$ from the regression 1 on $\hat{u}_1\hat{r}_2$. The proof that this method works is very similar to that for the heteroskedasticity-robust test for exclusion restrictions. See Wooldridge (1995b) for details.
Example 6.3 (Overidentifying Restrictions in the Wage Equation): In estimating equation (6.16) by 2SLS, we used (motheduc, fatheduc, huseduc) as instruments for educ. Therefore, there are two overidentifying restrictions. Letting $\hat{u}_1$ be the 2SLS residuals from equation (6.16) using all instruments, the test statistic is $N$ times the R-squared from the OLS regression
$\hat{u}_1$ on 1, exper, exper$^2$, motheduc, fatheduc, huseduc
Under $H_0$ and homoskedasticity, $NR_u^2 \overset{a}{\sim} \chi^2_2$. Using the data on working women in MROZ.RAW gives $R_u^2 = .0026$, and so the overidentification test statistic is about 1.11. The $p$-value is about .574, so the overidentifying restrictions are not rejected at any reasonable level.
For the heteroskedasticity-robust version, one approach is to obtain the residuals, $\hat{r}_1$ and $\hat{r}_2$, from the OLS regressions motheduc on 1, exper, exper$^2$, and $\widehat{educ}$ and fatheduc on 1, exper, exper$^2$, and $\widehat{educ}$, where $\widehat{educ}$ are the first-stage fitted values from the regression educ on 1, exper, exper$^2$, motheduc, fatheduc, and huseduc. Then obtain $N - SSR$ from the OLS regression 1 on $\hat{u}_1 \cdot \hat{r}_1$, $\hat{u}_1 \cdot \hat{r}_2$. Using only the 428 observations on working women to obtain $\hat{r}_1$ and $\hat{r}_2$, the value of the robust test statistic is about 1.04 with $p$-value = .595, which is similar to the $p$-value for the nonrobust test.
6.2.3 Testing Functional Form
Sometimes we need a test with power for detecting neglected nonlinearities in models estimated by OLS or 2SLS. A useful approach is to add nonlinear functions, such as squares and cross products, to the original model. This approach is easy when all explanatory variables are exogenous: $F$ statistics and LM statistics for exclusion restrictions are easily obtained. It is a little tricky for models with endogenous explanatory variables because we need to choose instruments for the additional nonlinear functions of the endogenous variables. We postpone this topic until Chapter 9 when we discuss simultaneous equation models. See also Wooldridge (1995b).
Putting in squares and cross products of all exogenous variables can consume many degrees of freedom. An alternative is Ramsey's (1969) RESET, which has degrees of freedom that do not depend on $K$. Write the model as
$$y = x\beta + u \qquad (6.23)$$
$$E(u \mid x) = 0 \qquad (6.24)$$
[You should convince yourself that it makes no sense to test for functional form if we only assume that $E(x'u) = 0$. If equation (6.23) defines a linear projection, then, by definition, functional form is not an issue.] Under condition (6.24) we know that any function of $x$ is uncorrelated with $u$ (hence the previous suggestion of putting squares and cross products of $x$ as additional regressors). In particular, if condition (6.24) holds, then $(x\beta)^p$ is uncorrelated with $u$ for any integer $p$. Since $\beta$ is not observed, we replace it with the OLS estimator, $\hat{\beta}$. Define $\hat{y}_i = x_i\hat{\beta}$ as the OLS fitted values and $\hat{u}_i$ as the OLS residuals. By definition of OLS, the sample covariance between $\hat{u}_i$ and $\hat{y}_i$ is zero. But we can test whether the $\hat{u}_i$ are sufficiently correlated with low-order polynomials in $\hat{y}_i$, say $\hat{y}_i^2$, $\hat{y}_i^3$, and $\hat{y}_i^4$, as a test for neglected nonlinearity. There are a couple of ways to do so. Ramsey suggests adding these terms to equation (6.23) and doing a standard $F$ test [which would have an approximate $F_{3, N-K-3}$ distribution under equation (6.23) and the homoskedasticity assumption $E(u^2 \mid x) = \sigma^2$]. Another possibility is to use an LM test: Regress $\hat{u}_i$ onto $x_i$, $\hat{y}_i^2$, $\hat{y}_i^3$, and $\hat{y}_i^4$ and use $N$ times the R-squared from this regression as $\chi^2_3$. The methods discussed in Chapter 4 for obtaining heteroskedasticity-robust statistics can be applied here as well. Ramsey's test uses generated regressors, but the null is that each generated regressor has zero population coefficient, and so the usual limit theory applies. (See Section 6.1.1.)
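The LM form of RESET described above can be sketched directly. The function name `reset_lm`, the simulated designs, and the parameter values are assumptions made for illustration; the statistic is the nonrobust $N R^2$ version, and the regressor matrix is assumed to include a constant.

```python
import numpy as np

def reset_lm(y, X, powers=(2, 3, 4)):
    """LM form of RESET: N * R-squared from u_hat on (X, y_hat**p for p in powers)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ b
    u_hat = y - y_hat
    W = np.column_stack([X] + [y_hat**p for p in powers])
    fitted = W @ np.linalg.lstsq(W, u_hat, rcond=None)[0]
    ssr = (u_hat - fitted) @ (u_hat - fitted)
    sst = u_hat @ u_hat          # u_hat has mean zero because X contains a constant
    return len(y) * (1.0 - ssr / sst)

# Simulated data; all values are illustrative assumptions.
rng = np.random.default_rng(3)
N = 5_000
x1, x2 = rng.normal(size=N), rng.normal(size=N)
X = np.column_stack([np.ones(N), x1, x2])
u = rng.normal(size=N)
y_linear = 1.0 + x1 - x2 + u               # E(y | x) is linear in x
y_nonlinear = y_linear + 0.5 * x1**2       # neglected quadratic term

stat_linear = reset_lm(y_linear, X)
stat_nonlinear = reset_lm(y_nonlinear, X)
```

With three powers of the fitted values, each statistic would be compared with a $\chi^2_3$ critical value. The statistic for the quadratic design should be much larger than the one for the linear design.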
There is some misunderstanding in the testing literature about the merits of RESET. It has been claimed that RESET can be used to test for a multitude of specification problems, including omitted variables and heteroskedasticity. In fact, RESET is generally a poor test for either of these problems. It is easy to write down models where an omitted variable, say $q$, is highly correlated with each $x$, but RESET has the same distribution that it has under $H_0$. A leading case is seen when $E(q \mid x)$ is linear in $x$. Then $E(y \mid x)$ is linear in $x$ [even though $E(y \mid x) \neq E(y \mid x, q)$], and the asymptotic power of RESET equals its asymptotic size. See Wooldridge (1995b) and Problem 6.4a. The following is an empirical illustration.
Example 6.4 (Testing for Neglected Nonlinearities in a Wage Equation): We use OLS and the data in NLS80.RAW to estimate the equation from Example 4.3:
$$\log(wage) = \beta_0 + \beta_1\, exper + \beta_2\, tenure + \beta_3\, married + \beta_4\, south + \beta_5\, urban + \beta_6\, black + \beta_7\, educ + u$$
The null hypothesis is that the expected value of $u$ given the explanatory variables in the equation is zero. The R-squared from the regression $\hat{u}$ on $x$, $\hat{y}^2$, and $\hat{y}^3$ yields $R_u^2 = .0004$, so the chi-square statistic is .374 with $p$-value $\approx .83$. (Adding $\hat{y}^4$ only increases the $p$-value.) Therefore, RESET provides no evidence of functional form misspecification.
Even though we already know IQ shows up very significantly in the equation ($t$ statistic = 3.60; see Example 4.3), RESET does not, and should not be expected to, detect the omitted variable problem. It can only test whether the expected value of $y$ given the variables actually in the regression is linear in those variables.
6.2.4 Testing for Heteroskedasticity
As we have seen for both OLS and 2SLS, heteroskedasticity does not affect the consistency of the estimators, and it is only a minor nuisance for inference. Nevertheless, sometimes we want to test for the presence of heteroskedasticity in order to justify use of the usual OLS or 2SLS statistics. If heteroskedasticity is present, more efficient estimation is possible.
We begin with the case where the explanatory variables are exogenous in the sense that $u$ has zero mean given $x$:
$$y = \beta_0 + x\beta + u, \qquad E(u \mid x) = 0$$
The reason we do not assume the weaker assumption $E(x'u) = 0$ is that the following class of tests we derive, which encompasses all of the widely used tests for heteroskedasticity, is not valid unless $E(u \mid x) = 0$ is maintained under $H_0$. Thus we maintain that the mean $E(y \mid x)$ is correctly specified, and then we test the constant conditional variance assumption. If we do not assume correct specification of $E(y \mid x)$, a significant heteroskedasticity test might just be detecting misspecified functional form in $E(y \mid x)$; see Problem 6.4c.
Because $E(u \mid x) = 0$, the null hypothesis can be stated as $H_0\colon E(u^2 \mid x) = \sigma^2$. Under the alternative, $E(u^2 \mid x)$ depends on $x$ in some way. Thus it makes sense to test $H_0$ by looking at covariances
$$\mathrm{Cov}[h(x), u^2] \qquad (6.25)$$
for some $1 \times Q$ vector function $h(x)$. Under $H_0$, the covariance in expression (6.25) should be zero for any choice of $h(\cdot)$.
Of course a general way to test zero correlation is to use a regression. Putting $i$ subscripts on the variables, write the model
$$u_i^2 = \delta_0 + h_i\delta + v_i \qquad (6.26)$$
where $h_i \equiv h(x_i)$; we make the standard rank assumption that $\mathrm{Var}(h_i)$ has rank $Q$, so that there is no perfect collinearity in $h_i$. Under $H_0$, $E(v_i \mid h_i) = E(v_i \mid x_i) = 0$, $\delta = 0$, and $\delta_0 = \sigma^2$. Thus we can apply an $F$ test or an LM test for the null $H_0\colon \delta = 0$ in equation (6.26). One thing to notice is that $v_i$ cannot have a normal distribution under $H_0$: because $v_i = u_i^2 - \sigma^2$, $v_i \geq -\sigma^2$. This does not matter for asymptotic analysis; the OLS regression from equation (6.26) gives a consistent, $\sqrt{N}$-asymptotically normal estimator of $\delta$ whether or not $H_0$ is true. But to apply a standard $F$ or LM test, we must assume that, under $H_0$, $E(v_i^2 \mid x_i)$ is constant: that is, the errors in equation (6.26) are homoskedastic. In terms of the original error $u_i$, this assumption implies that
$$E(u_i^4 \mid x_i) = \text{constant} \equiv \kappa^2 \qquad (6.27)$$
under $H_0$. This is called the homokurtosis (constant conditional fourth moment) assumption. Homokurtosis always holds when $u$ is independent of $x$, but there are conditional distributions for which $E(u \mid x) = 0$ and $\mathrm{Var}(u \mid x) = \sigma^2$ but $E(u^4 \mid x)$ depends on $x$.
As a practical matter, we cannot test $\delta = 0$ in equation (6.26) directly because $u_i$ is not observed. Since $u_i = y_i - x_i\beta$ and we have a consistent estimator of $\beta$, it is natural to replace $u_i^2$ with $\hat{u}_i^2$, where the $\hat{u}_i$ are the OLS residuals for observation $i$. Doing this step and applying, say, the LM principle, we obtain $NR_c^2$ from the regression
$\hat{u}_i^2$ on $1, h_i$, $\quad i = 1, 2, \ldots, N$ (6.28)
where $R_c^2$ is just the usual centered R-squared. Now, if the $u_i^2$ were used in place of the $\hat{u}_i^2$, we know that, under $H_0$ and condition (6.27), $NR_c^2 \overset{a}{\sim} \chi^2_Q$, where $Q$ is the dimension of $h_i$.
What adjustment is needed because we have estimated $u_i^2$? It turns out that, because of the structure of these tests, no adjustment is needed to the asymptotics. (This statement is not generally true for regressions where the dependent variable has been estimated in a first stage; the current setup is special in that regard.) After tedious algebra, it can be shown that
$$N^{-1/2}\sum_{i=1}^{N} h_i'(\hat{u}_i^2 - \hat{\sigma}^2) = N^{-1/2}\sum_{i=1}^{N} (h_i - \mu_h)'(u_i^2 - \sigma^2) + o_p(1) \qquad (6.29)$$
see Problem 6.5. Along with condition (6.27), this equation can be shown to justify the $NR_c^2$ test from regression (6.28).
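The $NR_c^2$ statistic from regression (6.28) can be sketched for a generic choice of $h_i$. The function name `het_lm`, the choice $h_i = x_i$ in the usage lines, the variance function $\exp(0.5 x_1)$, and all parameter values are assumptions made for illustration; the sketch covers only the version that relies on the homokurtosis condition (6.27).

```python
import numpy as np

def het_lm(y, X, H):
    """N * centered R-squared from regressing squared OLS residuals on (1, H)."""
    u_hat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    u2 = u_hat**2
    W = np.column_stack([np.ones(len(y)), H])
    fitted = W @ np.linalg.lstsq(W, u2, rcond=None)[0]
    ssr = (u2 - fitted) @ (u2 - fitted)
    sst = (u2 - u2.mean()) @ (u2 - u2.mean())
    return len(y) * (1.0 - ssr / sst)

# Simulated data; all values are illustrative assumptions.
rng = np.random.default_rng(4)
N = 5_000
x = rng.normal(size=(N, 2))
X = np.column_stack([np.ones(N), x])
e = rng.normal(size=N)
y_homo = 1.0 + x @ np.array([1.0, -1.0]) + e                      # Var(u | x) constant
y_het = 1.0 + x @ np.array([1.0, -1.0]) + e * np.exp(0.25 * x[:, 0])  # Var(u | x) = exp(0.5 * x1)

stat_homo = het_lm(y_homo, X, x)   # Q = 2 here, so compare with chi-square(2)
stat_het = het_lm(y_het, X, x)
```

In this design the conditional variance rises with the first regressor, so `stat_het` should be large relative to a $\chi^2_2$ critical value while `stat_homo` should not.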
Two popular tests are special cases. Koenker's (1981) version of the Breusch and Pagan (1979) test is obtained by taking $h_i \equiv x_i$, so that $Q = K$. [The original version of the Breusch-Pagan test relies heavily on normality of the $u_i$, in particular $\kappa^2 = 3\sigma^4$, so that Koenker's version based on $NR_c^2$ in regression (6.28) is preferred.] White's (1980b) test is obtained by taking $h_i$ to be all nonconstant, unique elements of $x_i$ and $x_i'x_i$: the levels, squares, and cross products of the regressors in the conditional mean.
The Breusch-Pagan and White tests have degrees of freedom that depend on the
number of regressors in $E(y \mid x)$. Sometimes we want to conserve on degrees of
freedom. A test that combines features of the Breusch-Pagan and White tests, but which
has only two degrees of freedom, takes $\hat{h}_i \equiv (\hat{y}_i, \hat{y}_i^2)$,
where the $\hat{y}_i$ are the OLS fitted values. (Recall that these are linear
functions of the $x_i$.) To justify this test, we must be able to replace $h(x_i)$
with $h(x_i, \hat{\beta})$. We discussed the generated regressors problem for OLS in
Section 6.1.1 and concluded that, for testing purposes, using estimates from earlier
stages causes no complications. This is the case here as well: $N R_c^2$ from
$\hat{u}_i^2$ on 1, $\hat{y}_i$, $\hat{y}_i^2$, $i = 1, 2, \ldots, N$ has a limiting
$\chi_2^2$ distribution under the null, along with condition (6.27). This is easily
seen to be a special case of the White test because $(\hat{y}_i, \hat{y}_i^2)$
contains two linear combinations of the squares and cross products of all elements
in $x_i$.
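The $N R_c^2$ statistics described above can be sketched numerically. The following is a minimal illustration, assuming NumPy is available; the function name `nr2_het_test`, the simulated data, and the variance function $\exp(0.5 x_1)$ are hypothetical choices made only for this example and do not come from the text.

```python
import numpy as np

def nr2_het_test(y, X, H):
    """Return N * R_c^2 from regressing squared OLS residuals on (1, H)."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])            # regressors plus a constant
    resid = y - Xc @ np.linalg.lstsq(Xc, y, rcond=None)[0]
    u2 = resid ** 2
    Hc = np.column_stack([np.ones(n), H])            # auxiliary regressors plus a constant
    fitted = Hc @ np.linalg.lstsq(Hc, u2, rcond=None)[0]
    r2_centered = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2_centered

# Simulated data (an assumption of this sketch): Var(u | x) = exp(0.5 * x1).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
u = rng.normal(size=n) * np.exp(0.25 * X[:, 0])
y = 1.0 + X @ np.array([1.0, -1.0]) + u

# Koenker's version of the Breusch-Pagan test: h_i = x_i, so Q = K = 2.
bp_stat = nr2_het_test(y, X, X)

# Two-degree-of-freedom variant: h_i = (yhat_i, yhat_i^2) from the OLS fitted values.
Xc = np.column_stack([np.ones(n), X])
yhat = Xc @ np.linalg.lstsq(Xc, y, rcond=None)[0]
fitted_stat = nr2_het_test(y, X, np.column_stack([yhat, yhat ** 2]))

# Each statistic is compared with a chi-square(2) critical value (5.99 at 5%).
print(bp_stat, fitted_stat)
```

Both statistics have two degrees of freedom here only because the simulated model has K = 2 regressors; with more regressors the first test gains degrees of freedom while the fitted-value version keeps two.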
A simple modification is available for relaxing the auxiliary homokurtosis
assumption (6.27). Following the work of Wooldridge (1990), or working directly from
the representation in equation (6.29) as in Problem 6.5, it can be shown that
$N - \mathrm{SSR}_0$ from the regression (without a constant)
$$1 \text{ on } (h_i - \bar{h})(\hat{u}_i^2 - \hat{\sigma}^2), \quad i = 1, 2, \ldots, N \tag{6.30}$$
is distributed asymptotically as $\chi_Q^2$ under $H_0$ [there are Q regressors in
regression (6.30)]. This test is very similar to the heteroskedasticity-robust LM
statistics derived in Chapter 4. It is sometimes called a heterokurtosis-robust test
for heteroskedasticity.
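As a rough illustration of regression (6.30), the sketch below computes $N - \mathrm{SSR}_0$ from regressing a vector of ones on the Q products $(h_i - \bar{h})(\hat{u}_i^2 - \hat{\sigma}^2)$. It assumes NumPy; the function name `robust_het_test` and the simulated design are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def robust_het_test(y, X, H):
    """Return N - SSR_0 from regressing 1 on (h_i - hbar) * (uhat_i^2 - sigmahat^2)."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    resid = y - Xc @ np.linalg.lstsq(Xc, y, rcond=None)[0]
    u2 = resid ** 2
    sigma2_hat = u2.mean()                            # estimate of sigma^2
    H = np.asarray(H, dtype=float).reshape(n, -1)     # N x Q matrix of test variables
    R = (H - H.mean(axis=0)) * (u2 - sigma2_hat)[:, None]
    ones = np.ones(n)
    coef = np.linalg.lstsq(R, ones, rcond=None)[0]    # no constant in this regression
    ssr0 = np.sum((ones - R @ coef) ** 2)
    return n - ssr0

# Simulated data (an assumption of this sketch): Var(u | x) = exp(0.5 * x1).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
u = rng.normal(size=n) * np.exp(0.25 * X[:, 0])
y = 1.0 + X @ np.array([1.0, -1.0]) + u

robust_stat = robust_het_test(y, X, X)                # Q = 2 regressors in (6.30)
print(robust_stat)                                    # compare with chi-square(2), 5.99 at 5%
```

Because $\mathrm{SSR}_0$ can never exceed the total sum of squares of the dependent variable (which is N), the statistic always lies between 0 and N.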
If we allow some elements of $x_i$ to be endogenous but assume we have instruments
$z_i$ such that $E(u_i \mid z_i) = 0$ and the rank condition holds, then we can test
$H_0$: $E(u_i^2 \mid z_i) = \sigma^2$ (which implies Assumption 2SLS.3). Let
$h_i \equiv h(z_i)$ be a $1 \times Q$ function of the exogenous variables. The
statistics are computed as in either regression (6.28) or (6.30), depending on
whether the homokurtosis is maintained, where the $\hat{u}_i$ are the 2SLS residuals.
There is, however, one caveat. For the validity of the asymptotic variances that these
regressions implicitly use, an additional assumption is needed under $H_0$:
$\mathrm{Cov}(x_i, u_i \mid z_i)$ must be constant. This covariance is zero when
$z_i = x_i$, so there is no additional assumption when the regressors are exogenous.
Without the assumption of constant conditional covariance, the tests for
heteroskedasticity are more complicated. For details, see Wooldridge (1990).
You should remember that $h_i$ (or $\hat{h}_i$) must only be a function of exogenous
variables and estimated parameters; it should not depend on endogenous elements of
$x_i$. Therefore, when $x_i$ contains endogenous variables, it is not valid to use
$x_i\hat{\beta}$ and $(x_i\hat{\beta})^2$ as elements of $\hat{h}_i$. It is valid to
use, say, $\hat{x}_i\hat{\beta}$ and $(\hat{x}_i\hat{\beta})^2$, where the $\hat{x}_i$
are the first-stage fitted values from regressing $x_i$ on $z_i$.
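The distinction between the invalid choice $x_i\hat{\beta}$ and the valid choice $\hat{x}_i\hat{\beta}$ can be made concrete with a short sketch. It assumes NumPy, a single endogenous regressor, and two instruments; the function name `tsls_het_test` and the data-generating process are hypothetical. The simulated design keeps $\mathrm{Cov}(x_i, u_i \mid z_i)$ constant, so only the conditional variance of u depends on z.

```python
import numpy as np

def tsls_het_test(y, X, Z):
    """N * R_c^2 test with 2SLS residuals and h_i = (xhat_i*b, (xhat_i*b)^2)."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])             # structural regressors
    Zc = np.column_stack([np.ones(n), Z])             # instruments, including the constant
    Xhat = Zc @ np.linalg.lstsq(Zc, Xc, rcond=None)[0]   # first-stage fitted values
    beta = np.linalg.lstsq(Xhat, y, rcond=None)[0]    # 2SLS coefficients
    u2 = (y - Xc @ beta) ** 2                         # 2SLS residuals use the actual x_i
    index = Xhat @ beta                               # depends only on z_i and estimates
    Hc = np.column_stack([np.ones(n), index, index ** 2])
    fitted = Hc @ np.linalg.lstsq(Hc, u2, rcond=None)[0]
    r2_centered = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2_centered

# Simulated data (an assumption of this sketch).
rng = np.random.default_rng(0)
n = 2000
Z = rng.normal(size=(n, 2))
v = rng.normal(size=n)
x = Z[:, 0] + Z[:, 1] + v                             # x is endogenous because u contains v
u = 0.5 * v + rng.normal(size=n) * np.exp(0.25 * Z[:, 0])
y = 1.0 + x + u

tsls_stat = tsls_het_test(y, x, Z)
print(tsls_stat)                                      # compare with chi-square(2), 5.99 at 5%
```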
6.3 Single-Equation Methods under Other Sampling Schemes
So far our treatment of OLS and 2SLS has been explicitly for the case of random
samples. In this section we briefly discuss some issues that arise for other sampling
schemes that are sometimes assumed for cross section data.
6.3.1 Pooled Cross Sections over Time
A data structure that is useful for a variety of purposes, including policy analysis, is
what we will call pooled cross sections over time. The idea is that during each year a
new random sample is taken from the relevant population. Since distributions of
variables tend to change over time, the identical distribution assumption is not
usually valid, but the independence assumption is. This approach gives rise to
independent, not identically distributed (i.n.i.d.) observations. It is important not
to confuse a pooling of independent cross sections with a different data structure,
panel data, which we treat starting in Chapter 7. Briefly, in a panel data set we
follow the same group of individuals, firms, cities, and so on over time. In a pooling
of cross sections over time, there is no replicability over time. (Or, if units appear
in more than one time period, their recurrence is treated as coincidental and ignored.)
Every method we have learned for pure cross section analysis can be applied to
pooled cross sections, including corrections for heteroskedasticity, specification test-
ing, instrumental variables, and so on. But in using pooled cross sections, we should
usually include year (or other time period) dummies to account for aggregate changes
over time. If year dummies appear in a model, and it is estimated by 2SLS, the year
dummies are their own instruments, as the passage of time is exogenous. For an ex-
ample, see Problem 6.8. Time dummies can also appear in tests for heteroskedasticity
to determine whether the unconditional error variance has changed over time.
In some cases we interact some explanatory variables with the time dummies to
allow partial effects to change over time. This procedure can be very useful for policy
analysis. In fact, much of the recent literature in policy analysis using natural
experiments can be cast as a pooled cross section analysis with appropriately chosen
dummy variables and interactions.
In the simplest case, we have two time periods, say year 1 and year 2. There are
also two groups, which we will call a control group and an experimental group or
treatment group. In the natural experiment literature, people (or firms, or cities, and
so on) find themselves in the treatment group essentially by accident. For example, to
study the effects of an unexpected change in unemployment insurance on unemployment
duration, we choose the treatment group to be unemployed individuals from a
state that has a change in unemployment compensation. The control group could be
unemployed workers from a neighboring state. The two time periods chosen would
straddle the policy change.
As another example, the treatment group might consist of houses in a city under-
going unexpected property tax reform, and the control group would be houses in a
nearby, similar town that is not subject to a property tax change. Again, the two (or
more) years of data would include the period of the policy change. Treatment means
that a house is in the city undergoing the regime change.
To formalize the discussion, call A the control group, and let B denote the treatment
group; the dummy variable dB equals unity for those in the treatment group
and is zero otherwise. Letting d2 denote a dummy variable for the second
(post-policy-change) time period, the simplest equation for analyzing the impact of
the policy change is
$$y = \beta_0 + \delta_0\, d2 + \beta_1\, dB + \delta_1\, d2 \cdot dB + u \tag{6.31}$$
where y is the outcome variable of interest. The period dummy d2 captures aggregate
factors that affect y over time in the same way for both groups. The presence of dB
by itself captures possible differences between the treatment and control groups
before the policy change occurs. The coefficient of interest, $\delta_1$, multiplies
the interaction term, $d2 \cdot dB$ (which is simply a dummy variable equal to unity
for those observations in the treatment group in the second year).
The OLS estimator, $\hat{\delta}_1$, has a very interesting interpretation. Let
$\bar{y}_{A,1}$ denote the sample average of y for the control group in the first
year, and let $\bar{y}_{A,2}$ be the average of y for the control group in the second
year. Define $\bar{y}_{B,1}$ and $\bar{y}_{B,2}$ similarly. Then $\hat{\delta}_1$ can
be expressed as
$$\hat{\delta}_1 = (\bar{y}_{B,2} - \bar{y}_{B,1}) - (\bar{y}_{A,2} - \bar{y}_{A,1}) \tag{6.32}$$
This estimator has been labeled the difference-in-differences (DID) estimator in the
recent program evaluation literature, although it has a long history in analysis of
variance.
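The equivalence between the OLS coefficient on the interaction in (6.31) and the difference of mean changes in (6.32) is easy to check numerically. The sketch below assumes NumPy and uses simulated data with arbitrary, hypothetical parameter values; only the algebraic identity is being illustrated.

```python
import numpy as np

# Simulated pooled cross sections (hypothetical parameter values).
rng = np.random.default_rng(0)
n = 4000
d2 = rng.integers(0, 2, size=n)                      # 1 = second (post-policy) period
dB = rng.integers(0, 2, size=n)                      # 1 = treatment group
y = 1.0 + 0.5 * d2 + 0.3 * dB + 0.2 * d2 * dB + rng.normal(size=n)

# OLS on a constant, d2, dB, and the interaction, as in equation (6.31).
Z = np.column_stack([np.ones(n), d2, dB, d2 * dB])
coef = np.linalg.lstsq(Z, y, rcond=None)[0]
delta1_ols = coef[3]

def cell_mean(group, period):
    """Sample average of y for one group-period cell."""
    return y[(dB == group) & (d2 == period)].mean()

# Difference-in-differences of the four cell means, as in equation (6.32).
delta1_did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(delta1_ols, delta1_did)                        # the two agree up to rounding error
```

The identity holds because the four dummy regressors saturate the two-by-two design, so the fitted values are exactly the four cell means.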
To see how effective $\hat{\delta}_1$ is for estimating policy effects, we can compare
it with some alternative estimators. One possibility is to ignore the control group
completely and use the change in the mean over time for the treatment group,
$\bar{y}_{B,2} - \bar{y}_{B,1}$, to measure the policy effect. The problem with this
estimator is that the mean response can change over time for reasons unrelated to the
policy change. Another possibility is to ignore the first time period and compute the
difference in means for the treatment and control groups in the second time period,
$\bar{y}_{B,2} - \bar{y}_{A,2}$. The problem with this pure cross section approach is
that there might be systematic, unmeasured differences in the treatment and control
groups that have nothing to do with the treatment; attributing the difference in
averages to a particular policy might be misleading.
By comparing the time changes in the means for the treatment and control groups,
both group-specific and time-specific effects are allowed for. Nevertheless,
unbiasedness of the DID estimator still requires that the policy change not be
systematically related to other factors that affect y (and are hidden in u).
In most applications, additional covariates appear in equation (6.31); for example,
characteristics of unemployed people or housing characteristics. These account for
the possibility that the random samples within a group have systematically different
characteristics in the two time periods. The OLS estimator of $\delta_1$ no longer has
the simple representation in equation (6.32), but its interpretation is essentially
unchanged.
Example 6.5 (Length of Time on Workers’ Compensation): Meyer, Viscusi, and
Durbin (1995) (hereafter, MVD) study the length of time (in weeks) that an injured
worker receives workers’ compensation. On July 15, 1980, Kentucky raised the cap
on weekly earnings that were covered by workers’ compensation. An increase in the
cap has no effect on the benefit for low-income workers, but it makes it less costly
for a high-income worker to stay on workers’ comp. Therefore, the control group is
low-income workers, and the treatment group is high-income workers; high-income
workers are defined as those for whom the pre-policy-change cap on benefits is
binding. Using random samples both before and after the policy change, MVD are
able to test whether more generous workers’ compensation causes people to stay out
of work longer (everything else fixed). MVD start with a difference-in-differences
analysis, using log(durat) as the dependent variable. The variable afchnge is the
dummy variable for observations after the policy change, and highearn is the dummy
variable for high earners. The estimated equation (standard errors in parentheses) is
$$\widehat{\log(durat)} = \underset{(0.031)}{1.126} + \underset{(.0447)}{.0077}\, afchnge + \underset{(.047)}{.256}\, highearn + \underset{(.069)}{.191}\, afchnge \cdot highearn \tag{6.33}$$
$$N = 5{,}626, \quad R^2 = .021$$
Therefore, $\hat{\delta}_1 = .191$ ($t = 2.77$), which implies that the average
duration on workers’ compensation increased by about 19 percent due to the higher
earnings cap. The coefficient on afchnge is small and statistically insignificant: as
is expected, the increase in the earnings cap had no effect on duration for
low-earnings workers. The coefficient on highearn shows that, even in the absence of
any change in the earnings cap, high earners spent much more time (on the order of
$100 \cdot [\exp(.256) - 1] = 29.2$ percent) on workers’ compensation.
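The arithmetic in this example can be reproduced in a few lines. The sketch below uses only the coefficients and standard error reported in equation (6.33); the variable names are hypothetical. It also shows the exact percentage change implied by the interaction coefficient, which the text approximates by 100 times the coefficient.

```python
import math

# Estimates reported in equation (6.33).
coef_interaction, se_interaction = 0.191, 0.069
coef_highearn = 0.256

t_stat = coef_interaction / se_interaction

# Exact percentage differences implied by a log dependent variable.
pct_highearn = 100 * (math.exp(coef_highearn) - 1)
pct_interaction = 100 * (math.exp(coef_interaction) - 1)

print(round(t_stat, 2))           # 2.77
print(round(pct_highearn, 1))     # 29.2
print(round(pct_interaction, 1))  # 21.0
```

The gap between the 19 percent approximation and the exact 21.0 percent reflects the usual inaccuracy of reading a log-point coefficient of this size directly as a percentage.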
MVD also add a variety of controls for gender, marital status, age, industry, and
type of injury. These allow for the fact that the kind of people and type of injuries
differ systematically in the two years. Perhaps not surprisingly, controlling for these
factors has little effect on the estimate of $\delta_1$; see the MVD article and
Problem 6.9.
Sometimes the two groups consist of people or cities in different states in the
United States, often close geographically. For example, to assess the impact of
changing alcohol taxes on alcohol consumption, we can obtain random samples on
individuals from two states for two years. In state A, the control group, there was no
change in alcohol taxes. In state B, taxes increased between the two years. The
outcome variable would be a measure of alcohol consumption, and equation (6.31) can
be estimated to determine the effect of the tax on alcohol consumption. Other factors,
such as age, education, and gender can be controlled for, although this procedure is
not necessary for consistency if sampling is random in both years and in both states.
The basic equation (6.31) can be easily modified to allow for continuous, or at least
nonbinary, “treatments.” An example is given in Problem 6.7, where the “treatment”
for a particular home is its distance from a garbage incinerator site. In other words,
there is not really a control group: each unit is put somewhere on a continuum of
possible treatments. The analysis is similar because the treatment dummy, dB, is
simply replaced with the nonbinary treatment.
For a survey on the natural experiment methodology, as well as several additional
examples, see Meyer (1995).
6.3.2 Geographically Stratified Samples
Various kinds of stratified sampling, where units in the sample are represented with
different frequencies than they are in the population, are also common in the social
sciences. We treat general kinds of stratification in Chapter 17. Here, we discuss some
issues that arise with geographical stratification, where random samples are taken
from separate geographical units.
If the geographically stratified sample can be treated as being independent but not
identically distributed, no substantive modifications are needed to apply the previous
econometric methods. However, it is prudent to allow different intercepts across
strata, and even different slopes in some cases. For example, if people are sampled
from states in the United States, it is often important to include state dummy
variables to allow for systematic differences in the response and explanatory variables
across states.
If we are interested in the effects of variables measured at the strata level, and the
individual observations are correlated because of unobserved strata effects,
estimation and inference are much more complicated. A model with strata-level
covariates and within-strata correlation is
$$y_{is} = x_{is}\beta + z_s\gamma + q_s + e_{is} \tag{6.34}$$
where i is for individual and s is for stratum. The covariates in $x_{is}$ change with
the individual, while $z_s$ changes only at the strata level. That is, there is
correlation in the covariates across individuals within the same stratum. The variable
$q_s$ is an unobserved stratum effect. We would typically assume that the observations
are independently distributed across strata, that the $e_{is}$ are independent across
i, and that $E(e_{is} \mid X_s, z_s, q_s) = 0$ for all i and s, where $X_s$ is the set
of explanatory variables for all units in stratum s.
The presence of the unobservable $q_s$ induces correlation in the composite error
$u_{is} = q_s + e_{is}$ within each stratum. If we are interested in the coefficients
on the individual-specific variables, that is, $\beta$, then there is a simple
solution: include stratum dummies along with $x_{is}$. That is, we estimate the model
$y_{is} = \alpha_s + x_{is}\beta + e_{is}$ by OLS, where $\alpha_s$ is the
stratum-specific intercept.
Things are more interesting when we want to estimate $\gamma$. The OLS estimators of
$\beta$ and $\gamma$ in the regression of $y_{is}$ on $x_{is}$, $z_s$ are still
unbiased if $E(q_s \mid X_s, z_s) = 0$, but consistency and asymptotic normality are
tricky, because, with a small number of strata and many observations within each
stratum, the asymptotic analysis makes sense only if the number of observations within
each stratum grows, usually with the number of strata fixed. Because the observations
within a stratum are correlated, the usual law of large numbers and central limit
theorem cannot be applied. By means of a simulation study, Moulton (1990) shows that
ignoring the within-group correlation when obtaining standard errors for
$\hat{\gamma}$ can be very misleading. Moulton also gives some corrections to the OLS
standard errors, but it is not clear what kind of asymptotic analysis justifies them.
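A small Monte Carlo experiment in the spirit of this point can be sketched as follows. It is not a reproduction of Moulton's study: the design (20 strata, 50 units each, unit-variance stratum effect and idiosyncratic error, no individual-level covariates) and all names are assumptions made for illustration, and NumPy is assumed to be available. The experiment compares the sampling standard deviation of $\hat{\gamma}$ across replications with the average of the usual OLS standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)
S, M, gamma, reps = 20, 50, 1.0, 300                 # strata, units per stratum, true gamma

z = np.repeat(rng.normal(size=S), M)                 # stratum-level covariate z_s, held fixed
Z = np.column_stack([np.ones(S * M), z])
ZtZ_inv = np.linalg.inv(Z.T @ Z)

gamma_hats = np.empty(reps)
usual_ses = np.empty(reps)
for r in range(reps):
    q = np.repeat(rng.normal(size=S), M)             # unobserved stratum effect q_s
    e = rng.normal(size=S * M)                       # idiosyncratic error e_is
    y = gamma * z + q + e
    coef = ZtZ_inv @ (Z.T @ y)
    resid = y - Z @ coef
    s2 = resid @ resid / (S * M - 2)                 # usual error variance estimate
    gamma_hats[r] = coef[1]
    usual_ses[r] = np.sqrt(s2 * ZtZ_inv[1, 1])       # ignores within-stratum correlation

empirical_sd = gamma_hats.std(ddof=1)
mean_usual_se = usual_ses.mean()
print(empirical_sd, mean_usual_se)                   # the usual SE is far too small here
```

In this design the intraclass correlation of the composite error is 0.5, so the usual standard error understates the true sampling variability by a large factor.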
If the strata are, say, states in the United States, and we are interested in the
effect of state-level policy variables on economic behavior, one way to proceed is to
use state-level data on all variables. This avoids the within-stratum correlation in
the composite error in equation (6.34). A drawback is that state policies that can be
taken as exogenous at the individual level are often endogenous at the aggregate
level. However, if $z_s$ in equation (6.34) contains policy variables, perhaps we
should question whether these would be uncorrelated with $q_s$. If $q_s$ and $z_s$
are correlated, OLS using individual-level data would be biased and inconsistent.

Related issues arise when aggregate-level variables are used as instruments in
equations describing individual behavior. For example, in a birth weight equation,
Currie and Cole (1993) use measures of state-level AFDC benefits as instruments for
individual women’s participation in AFDC. (Therefore, the binary endogenous
explanatory variable is at the individual level, while the instruments are at the state
level.) If state-level AFDC benefits are exogenous in the birth weight equation, and
AFDC participation is sufficiently correlated with state benefit levels (a question
that can be checked using the first-stage regression), then the IV approach will yield
a consistent estimator of the effect of AFDC participation on birth weight.
Moffitt (1996) discusses assumptions under which using aggregate-level IVs yields
consistent estimators. He gives the example of using observations on workers from
two cities to estimate the impact of job training programs. In each city, some people
received some job training while others did not. The key element in $x_{is}$ is a job
training indicator. If, say, city A exogenously offered more job training slots than
city B, a city dummy variable can be used as an IV for whether each worker received
training. See Moffitt (1996) and Problem 5.13b for an interpretation of such estimators.
If there are unobserved group effects in the error term, then at a minimum, the
usual 2SLS standard errors will be inappropriate. More problematic is that
aggregate-level variables might be correlated with $q_s$. In the birth weight
example, the level of AFDC benefits might be correlated with unobserved health care
quality variables that are in $q_s$. In the job training example, city A may have
spent more on job training because its workers are, on average, less productive than
the workers in city B. Unfortunately, controlling for $q_s$ by putting in strata
dummies and applying 2SLS does not work: by definition, the instruments only vary
across strata, not within strata, and so $\beta$ in equation (6.34) would be
unidentified. In the job training example, we would put in a dummy variable for city
of residence as an explanatory variable, and therefore we could not use this dummy
variable as an IV for job training participation: we would be short one instrument.
6.3.3 Spatial Dependence
As the previous subsection suggests, cross section data that are not the result of
independent sampling can be difficult to handle. Spatial correlation, or, more
generally, spatial dependence, typically occurs when cross section units are large
relative to the population, such as when data are collected at the county, state,
province, or country level. Outcomes from adjacent units are likely to be correlated.
If the correlation arises mainly through the explanatory variables (as opposed to
unobservables), then, practically speaking, nothing needs to be done (although the
asymptotic analysis can be complicated). In fact, sometimes covariates for one county
or state appear as explanatory variables in the equation for neighboring units, as a
way of capturing spillover effects. This fact in itself causes no real difficulties.
When the unobservables are correlated across nearby geographical units, OLS can
still have desirable properties (often unbiasedness, consistency, and asymptotic
normality can be established), but the asymptotic arguments are not nearly as unified
as in the random sampling case, and estimating asymptotic variances becomes difficult.
6.3.4 Cluster Samples
Cluster sampling is another case where cross section observations are correlated, but
it is somewhat easier to handle. The key is that we randomly sample a large number
of clusters, and each cluster consists of relatively few units (compared with the overall
sample size). While we allow the units within each cluster to be correlated, we assume
independence across clusters. An example is studying teenage peer effects using a
large sample of neighborhoods (the clusters) with relatively few teenagers per
neighborhood. Another is using siblings in a large sample of families. The asymptotic
analysis holds the cluster sizes fixed while the number of clusters gets large. As we
will see in Section 11.5, handling within-cluster correlation in this context is
relatively straightforward. In fact, when the explanatory variables are exogenous, OLS
is consistent and asymptotically normal, but the asymptotic variance matrix needs to
be adjusted. The same holds for 2SLS.
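One common way to adjust the OLS variance matrix for within-cluster correlation is a sandwich estimator that sums the products $X_g'\hat{u}_g\hat{u}_g'X_g$ over clusters. The sketch below illustrates that standard technique; it is an assumption of this example that this form matches the treatment deferred to Section 11.5, and it omits any finite-sample degrees-of-freedom correction. NumPy is assumed, and the function name and simulated data are hypothetical.

```python
import numpy as np

def ols_with_cluster_se(y, X, cluster_ids):
    """OLS coefficients with conventional and cluster-robust standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta

    # Conventional standard errors: uncorrelated, homoskedastic errors assumed.
    s2 = resid @ resid / (n - k)
    usual_se = np.sqrt(s2 * np.diag(XtX_inv))

    # Middle of the sandwich: sum over clusters of (X_g' u_g)(X_g' u_g)'.
    meat = np.zeros((k, k))
    for g in np.unique(cluster_ids):
        rows = cluster_ids == g
        score = X[rows].T @ resid[rows]
        meat += np.outer(score, score)
    cluster_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, usual_se, cluster_se

# Simulated cluster sample: many clusters (500), few units per cluster (4).
rng = np.random.default_rng(0)
G, M = 500, 4
cluster_ids = np.repeat(np.arange(G), M)
w = np.repeat(rng.normal(size=G), M)                 # cluster-level regressor
c = np.repeat(rng.normal(size=G), M)                 # unobserved cluster effect
y = 1.0 + 0.5 * w + c + rng.normal(size=G * M)
X = np.column_stack([np.ones(G * M), w])

beta, usual_se, cluster_se = ols_with_cluster_se(y, X, cluster_ids)
print(beta, usual_se, cluster_se)
```

With a regressor that varies only across clusters and a sizable cluster effect, the cluster-robust standard error on the slope is noticeably larger than the conventional one.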
Problems
6.1. a. In Problem 5.4d, test the null hypothesis that educ is exogenous.
b. Test the single overidentifying restriction in this example.
6.2. In Problem 5.8b, test the null hypothesis that educ and IQ are exogenous in the
equation estimated by 2SLS.
6.3. Consider a model for individual data to test whether nutrition affects
productivity (in a developing country):
$$\log(produc) = \delta_0 + \delta_1\, exper + \delta_2\, exper^2 + \delta_3\, educ + \alpha_1\, calories + \alpha_2\, protein + u_1 \tag{6.35}$$
where produc is some measure of worker productivity, calories is caloric intake per
day, and protein is a measure of protein intake per day. Assume here that exper,
$exper^2$, and educ are all exogenous. The variables calories and protein are possibly
correlated with $u_1$ (see Strauss and Thomas, 1995, for discussion). Possible
instrumental variables for calories and protein are regional prices of various goods
such as grains, meats, breads, dairy products, and so on.
a. Under what circumstances do prices make good IVs for calories and proteins?
What if prices reflect quality of food?
b. How many prices are needed to identify equation (6.35)?
c. Suppose we have M prices, $p_1, \ldots, p_M$. Explain how to test the null
hypothesis that calories and protein are exogenous in equation (6.35).
6.4. Consider a structural linear model with unobserved variable q:
$$y = x\beta + q + v, \quad E(v \mid x, q) = 0$$
Suppose, in addition, that $E(q \mid x) = x\delta$ for some $K \times 1$ vector
$\delta$; thus, q and x are possibly correlated.
a. Show that $E(y \mid x)$ is linear in x. What consequences does this fact have for
tests of functional form to detect the presence of q? Does it matter how strongly q
and x are correlated? Explain.
b. Now add the assumptions $\mathrm{Var}(v \mid x, q) = \sigma_v^2$ and
$\mathrm{Var}(q \mid x) = \sigma_q^2$. Show that $\mathrm{Var}(y \mid x)$ is constant.
[Hint: $E(qv \mid x) = 0$ by iterated expectations.] What does this fact imply about
using tests for heteroskedasticity to detect omitted variables?
c. Now write the equation as $y = x\beta + u$, where $E(x'u) = 0$ and
$\mathrm{Var}(u \mid x) = \sigma^2$. If $E(u \mid x) \neq E(u)$, argue that an LM test
of the form (6.28) will detect “heteroskedasticity” in u, at least in large samples.
6.5. a. Verify equation (6.29) under the assumptions $E(u \mid x) = 0$ and
$E(u^2 \mid x) = \sigma^2$.
b. Show that, under the additional assumption (6.27),
$$E[(u_i^2 - \sigma^2)^2 (h_i - \mu_h)'(h_i - \mu_h)] = \eta^2 E[(h_i - \mu_h)'(h_i - \mu_h)]$$
where $\eta^2 = E[(u^2 - \sigma^2)^2]$.
c. Explain why parts a and b imply that the LM statistic from regression (6.28) has a
limiting $\chi_Q^2$ distribution.
d. If condition (6.27) does not hold, obtain a consistent estimator of
$E[(u_i^2 - \sigma^2)^2 (h_i - \mu_h)'(h_i - \mu_h)]$. Show how this leads to the
heterokurtosis-robust test for heteroskedasticity.
6.6. Using the test for heteroskedasticity based on the auxiliary regression
$\hat{u}^2$ on $\hat{y}$, $\hat{y}^2$, test the log(wage) equation in Example 6.4 for
heteroskedasticity. Do you detect heteroskedasticity at the 5 percent level?
6.7. For this problem use the data in HPRICE.RAW, which is a subset of the data
used by Kiel and McClain (1995). The file contains housing prices and characteristics
for two years, 1978 and 1981, for homes sold in North Andover, Massachusetts. In
1981 construction on a garbage incinerator began. Rumors about the incinerator
being built were circulating in 1979, and it is for this reason that 1978 is used as the
base year. By 1981 it was very clear that the incinerator would be operating soon.
a. Using the 1981 cross section, estimate a bivariate, constant elasticity model
relating housing price to distance from the incinerator. Is this regression appropriate
for determining the causal effects of incinerator on housing prices? Explain.
b. Pooling the two years of data, consider the model
$$\log(price) = \delta_0 + \delta_1\, y81 + \delta_2 \log(dist) + \delta_3\, y81 \cdot \log(dist) + u$$
If the incinerator has a negative effect on housing prices for homes closer to the
incinerator, what sign is $\delta_3$? Estimate this model and test the null hypothesis
that building the incinerator had no effect on housing prices.
c. Add the variables log(intst), $[\log(intst)]^2$, log(area), log(land), age,
$age^2$, rooms, baths to the model in part b, and test for an incinerator effect. What
do you conclude?
6.8. The data in FERTIL1.RAW are a pooled cross section on more than a thou-
sand U.S. women for the even years between 1972 and 1984, inclusive; the data set is
similar to the one used by Sander (1992). These data can be used to study the rela-
tionship between women’s education and fertility.
a. Use OLS to estimate a model relating number of children ever born to a woman
(kids) to years of education, age, region, race, and type of environment reared in.
You should use a quadratic in age and should include year dummies. What is the
estimated relationship between fertility and education? Holding other factors fixed,
has there been any notable secular change in fertility over the time period?
b. Reestimate the model in part a, but use motheduc and fatheduc as instruments for
educ. First check that these instruments are sufficiently partially correlated with educ.
Test whether educ is in fact exogenous in the fertility equation.
c. Now allow the effect of education to change over time by including interaction
terms such as $y74 \cdot educ$, $y76 \cdot educ$, and so on in the model. Use
interactions of time dummies and parents’ education as instruments for the interaction
terms. Test that there has been no change in the relationship between fertility and
education over time.
6.9. Use the data in INJURY.RAW for this question.
a. Using the data for Kentucky, reestimate equation (6.33) adding as explanatory
variables male, married, and a full set of industry- and injury-type dummy variables.
How does the estimate on $afchnge \cdot highearn$ change when these other factors are
controlled for? Is the estimate still statistically significant?
b. What do you make of the small R-squared from part a? Does this mean the
equation is useless?

c. Estimate equation (6.33) using the data for Michigan. Compare the estimate on the
interaction term for Michigan and Kentucky, as well as their statistical significance.
6.10. Consider a regression model with interactions and squares of some explanatory
variables: $E(y \mid x) = z\beta$, where z contains a constant, the elements of x, and
quadratics and interactions of terms in x.
a. Let $\mu = E(x)$ be the population mean of x, and let $\bar{x}$ be the sample
average based on the N available observations. Let $\hat{\beta}$ be the OLS estimator
of $\beta$ using the N observations on y and z. Show that
$\sqrt{N}(\hat{\beta} - \beta)$ and $\sqrt{N}(\bar{x} - \mu)$ are asymptotically
uncorrelated. [Hint: Write $\sqrt{N}(\hat{\beta} - \beta)$ as in equation (4.8), and
ignore the $o_p(1)$ term. You will need to use the fact that $E(u \mid x) = 0$.]
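b. In the model of Problem 4.8, use part a to argue that
$$\mathrm{Avar}(\hat{\alpha}_1) = \mathrm{Avar}(\tilde{\alpha}_1) + \beta_3^2\, \mathrm{Avar}(\bar{x}_2) = \mathrm{Avar}(\tilde{\alpha}_1) + \beta_3^2 (\sigma_2^2 / N)$$
where $\alpha_1 = \beta_1 + \beta_3 \mu_2$, $\tilde{\alpha}_1$ is the estimator of
$\alpha_1$ if we knew $\mu_2$, and $\sigma_2^2 = \mathrm{Var}(x_2)$.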
c. How would you obtain the correct asymptotic standard error of $\hat{\alpha}_1$,
having run the regression in Problem 4.8d? [Hint: The standard error you get from the
regression is really $\mathrm{se}(\tilde{\alpha}_1)$. Thus you can square this to
estimate $\mathrm{Avar}(\tilde{\alpha}_1)$, then use the preceding formula. You need
to estimate $\sigma_2^2$, too.]
d. Apply the result from part c to the model in Problem 4.8; in particular, find the
corrected asymptotic standard error for $\hat{\alpha}_1$, and compare it with the
uncorrected one from Problem 4.8d. (Both can be nonrobust to heteroskedasticity.) What
do you conclude?
6.11. The following wage equation represents the populations of working people in
1978 and 1985:
$$\log(wage) = \beta_0 + \delta_0\, y85 + \beta_1\, educ + \delta_1\, y85 \cdot educ + \beta_2\, exper + \beta_3\, exper^2 + \beta_4\, union + \beta_5\, female + \delta_5\, y85 \cdot female + u$$
where the explanatory variables are standard. The variable union is a dummy variable
equal to one if the person belongs to a union and zero otherwise. The variable
y85 is a dummy variable equal to one if the observation comes from 1985 and zero if
it comes from 1978. In the file CPS78_85.RAW there are 550 workers in the sample
in 1978 and a different set of 534 people in 1985.
a. Estimate this equation and test whether the return to education has changed over
the seven-year period.
b. What has happened to the gender gap over the period?
c. Wages are measured in nominal dollars. What coefficients would change if we
measure wage in 1978 dollars in both years? [Hint: Use the fact that for all 1985
observations, $\log(wage_i / P85) = \log(wage_i) - \log(P85)$, where P85 is the common
deflator; $P85 = 1.65$ according to the Consumer Price Index.]
d. Is there evidence that the variance of the error has changed over time?
e. With wages measured nominally, and holding other factors fixed, what is the
estimated increase in nominal wage for a male with 12 years of education? Propose a
regression to obtain a confidence interval for this estimate. (Hint: You must replace
$y85 \cdot educ$ with something else.)
6.12. In the linear model $y = x\beta + u$, assume that Assumptions 2SLS.1 and 2SLS.3
hold with w in place of z, where w contains all nonredundant elements of x and z.
Further, assume that the rank conditions hold for OLS and 2SLS. Show that
$$\mathrm{Avar}[\sqrt{N}(\hat{\beta}_{2SLS} - \hat{\beta}_{OLS})] = \mathrm{Avar}[\sqrt{N}(\hat{\beta}_{2SLS} - \beta)] - \mathrm{Avar}[\sqrt{N}(\hat{\beta}_{OLS} - \beta)]$$
[Hint: First, $\mathrm{Avar}[\sqrt{N}(\hat{\beta}_{2SLS} - \hat{\beta}_{OLS})] = V_1 + V_2 - (C + C')$,
where $V_1 = \mathrm{Avar}[\sqrt{N}(\hat{\beta}_{2SLS} - \beta)]$,
$V_2 = \mathrm{Avar}[\sqrt{N}(\hat{\beta}_{OLS} - \beta)]$, and C is the asymptotic
covariance between $\sqrt{N}(\hat{\beta}_{2SLS} - \beta)$ and
$\sqrt{N}(\hat{\beta}_{OLS} - \beta)$. You can stack the formulas for the 2SLS and OLS
estimators and show that
$C = \sigma^2 [E(x^{*\prime} x^*)]^{-1} E(x^{*\prime} x) [E(x'x)]^{-1} = \sigma^2 [E(x'x)]^{-1} = V_2$.
To show the second equality, it will be helpful to use
$E(x^{*\prime} x) = E(x^{*\prime} x^*)$.]
Appendix 6A
We derive the asymptotic distribution of the 2SLS estimator in an equation with
generated regressors and generated instruments. The tools needed to make the proof
rigorous are introduced in Chapter 12, but the key components of the proof can be
given here in the context of the linear model. Write the model as
$$y = x\beta + u, \quad E(u \mid v) = 0$$
where $x = f(w, \delta)$, $\delta$ is a $Q \times 1$ vector, and $\beta$ is
$K \times 1$. Let $\hat{\delta}$ be a $\sqrt{N}$-consistent estimator of $\delta$. The
instruments for each i are $\hat{z}_i = g(v_i, \hat{\lambda})$ where $g(v, \lambda)$
is a $1 \times L$ vector, $\lambda$ is an $S \times 1$ vector of parameters, and
$\hat{\lambda}$ is $\sqrt{N}$-consistent for $\lambda$. Let $\hat{\beta}$ be the
2SLS estimator from the equation
$$y_i = \hat{x}_i \beta + error_i$$
where $\hat{x}_i = f(w_i, \hat{\delta})$, using instruments $\hat{z}_i$:
$$\hat{\beta} = \left[ \left( \sum_{i=1}^{N} \hat{x}_i' \hat{z}_i \right) \left( \sum_{i=1}^{N} \hat{z}_i' \hat{z}_i \right)^{-1} \left( \sum_{i=1}^{N} \hat{z}_i' \hat{x}_i \right) \right]^{-1} \left( \sum_{i=1}^{N} \hat{x}_i' \hat{z}_i \right) \left( \sum_{i=1}^{N} \hat{z}_i' \hat{z}_i \right)^{-1} \left( \sum_{i=1}^{N} \hat{z}_i' y_i \right)$$
Write $y_i = \hat{x}_i \beta + (x_i - \hat{x}_i)\beta + u_i$, where
$x_i = f(w_i, \delta)$. Plugging this in and multiplying through by $\sqrt{N}$ gives