
3 Basic Asymptotic Theory
This chapter summarizes some definitions and limit theorems that are important for studying large-sample theory. Most claims are stated without proof, as several require tedious epsilon-delta arguments. We do prove some results that build on fundamental definitions and theorems. A good, general reference for background in asymptotic analysis is White (1984). In Chapter 12 we introduce further asymptotic methods that are required for studying nonlinear models.
3.1 Convergence of Deterministic Sequences
Asymptotic analysis is concerned with the various kinds of convergence of sequences
of estimators as the sample size grows. We begin with some definitions regarding
nonstochastic sequences of numbers. When we apply these results in econometrics, N
is the sample size, and it runs through all positive integers. You are assumed to have
some familiarity with the notion of a limit of a sequence.
Definition 3.1: (1) A sequence of nonrandom numbers $\{a_N : N = 1, 2, \ldots\}$ converges to $a$ (has limit $a$) if for all $\varepsilon > 0$, there exists $N_\varepsilon$ such that if $N > N_\varepsilon$ then $|a_N - a| < \varepsilon$. We write $a_N \to a$ as $N \to \infty$.
(2) A sequence $\{a_N : N = 1, 2, \ldots\}$ is bounded if and only if there is some $b < \infty$ such that $|a_N| \le b$ for all $N = 1, 2, \ldots.$ Otherwise, we say that $\{a_N\}$ is unbounded.
These definitions apply to vectors and matrices element by element.
Example 3.1: (1) If $a_N = 2 + 1/N$, then $a_N \to 2$. (2) If $a_N = (-1)^N$, then $a_N$ does not have a limit, but it is bounded. (3) If $a_N = N^{1/4}$, $a_N$ is not bounded. Because $a_N$ increases without bound, we write $a_N \to \infty$.
Definition 3.2: (1) A sequence $\{a_N\}$ is $O(N^\lambda)$ (at most of order $N^\lambda$) if $N^{-\lambda} a_N$ is bounded. When $\lambda = 0$, $\{a_N\}$ is bounded, and we also write $a_N = O(1)$ (big oh one).
(2) $\{a_N\}$ is $o(N^\lambda)$ if $N^{-\lambda} a_N \to 0$. When $\lambda = 0$, $a_N$ converges to zero, and we also write $a_N = o(1)$ (little oh one).
From the definitions, it is clear that if $a_N = o(N^\lambda)$, then $a_N = O(N^\lambda)$; in particular, if $a_N = o(1)$, then $a_N = O(1)$. If each element of a sequence of vectors or matrices is $O(N^\lambda)$, we say the sequence of vectors or matrices is $O(N^\lambda)$, and similarly for $o(N^\lambda)$.
Example 3.2: (1) If $a_N = \log(N)$, then $a_N = o(N^\lambda)$ for any $\lambda > 0$. (2) If $a_N = 10 + \sqrt{N}$, then $a_N = O(N^{1/2})$ and $a_N = o(N^{(1/2)+\gamma})$ for any $\gamma > 0$.
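These rates are easy to check numerically. The short Python snippet below is our illustration, not part of the original text; the grid of values of $N$ and the choice $\lambda = 1/2$ are arbitrary. The first ratio settles down to a constant, while the second drifts to zero.

```python
import numpy as np

# Numerical check of Example 3.2, with an arbitrary grid of sample sizes.
# a_N = 10 + sqrt(N) is O(N^{1/2}):  N^{-1/2} a_N stays bounded (it tends to 1).
# a_N = log(N) is o(N^{1/2}):        N^{-1/2} log(N) tends to zero.
for N in [10**2, 10**4, 10**6, 10**8]:
    bounded_ratio = (10 + np.sqrt(N)) / np.sqrt(N)
    vanishing_ratio = np.log(N) / np.sqrt(N)
    print(f"N = {N:>9d}:  (10 + sqrt(N))/sqrt(N) = {bounded_ratio:.4f},  "
          f"log(N)/sqrt(N) = {vanishing_ratio:.6f}")
```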
3.2 Convergence in Probability and Bounded in Probability
Definition 3.3: (1) A sequence of random variables $\{x_N : N = 1, 2, \ldots\}$ converges in probability to the constant $a$ if for all $\varepsilon > 0$,

$$P[|x_N - a| > \varepsilon] \to 0 \quad \text{as } N \to \infty$$

We write $x_N \xrightarrow{p} a$ and say that $a$ is the probability limit (plim) of $x_N$: $\operatorname{plim} x_N = a$.
(2) In the special case where $a = 0$, we also say that $\{x_N\}$ is $o_p(1)$ (little oh p one). We also write $x_N = o_p(1)$ or $x_N \xrightarrow{p} 0$.
(3) A sequence of random variables $\{x_N\}$ is bounded in probability if and only if for every $\varepsilon > 0$, there exists a $b_\varepsilon < \infty$ and an integer $N_\varepsilon$ such that

$$P[|x_N| \ge b_\varepsilon] < \varepsilon \quad \text{for all } N \ge N_\varepsilon$$

We write $x_N = O_p(1)$ ($\{x_N\}$ is big oh p one).
If $c_N$ is a nonrandom sequence, then $c_N = O_p(1)$ if and only if $c_N = O(1)$; $c_N = o_p(1)$ if and only if $c_N = o(1)$. A simple, and very useful, fact is that if a sequence converges in probability to any real number, then it is bounded in probability.
Lemma 3.1: If $x_N \xrightarrow{p} a$, then $x_N = O_p(1)$. This lemma also holds for vectors and matrices.

The proof of Lemma 3.1 is not difficult; see Problem 3.1.
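To make Definition 3.3 concrete, the simulation sketch below (with our own illustrative choices of population, $\varepsilon$, and replication count) estimates $P[|x_N - a| > \varepsilon]$ for $x_N$ equal to the sample mean of i.i.d. Exponential(1) draws, whose plim is 1. The estimated probabilities shrink toward zero as $N$ grows, as the definition requires.

```python
import numpy as np

# Monte Carlo illustration of Definition 3.3: x_N = sample mean of N i.i.d.
# Exponential(1) draws, so plim x_N = 1. The distribution, eps = 0.1, and
# the 2,000 replications are our illustrative choices.
rng = np.random.default_rng(0)
eps, reps = 0.1, 2_000
for N in [10, 100, 1_000, 5_000]:
    xbar = rng.exponential(scale=1.0, size=(reps, N)).mean(axis=1)
    prob = np.mean(np.abs(xbar - 1.0) > eps)
    print(f"N = {N:>5d}:  P[|x_N - 1| > {eps}] ~= {prob:.4f}")
```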

Definition 3.4: (1) A random sequence $\{x_N : N = 1, 2, \ldots\}$ is $o_p(a_N)$, where $\{a_N\}$ is a nonrandom, positive sequence, if $x_N / a_N = o_p(1)$. We write $x_N = o_p(a_N)$.
(2) A random sequence $\{x_N : N = 1, 2, \ldots\}$ is $O_p(a_N)$, where $\{a_N\}$ is a nonrandom, positive sequence, if $x_N / a_N = O_p(1)$. We write $x_N = O_p(a_N)$.
We could have started by defining a sequence $\{x_N\}$ to be $o_p(N^\delta)$ for $\delta \in \mathbb{R}$ if $N^{-\delta} x_N \xrightarrow{p} 0$, in which case we obtain the definition of $o_p(1)$ when $\delta = 0$. This is where the one in $o_p(1)$ comes from. A similar remark holds for $O_p(1)$.
Example 3.3: If $z$ is a random variable, then $x_N \equiv \sqrt{N}\, z$ is $O_p(N^{1/2})$ and $x_N = o_p(N^\delta)$ for any $\delta > \frac{1}{2}$.
Lemma 3.2: If $w_N = o_p(1)$, $x_N = o_p(1)$, $y_N = O_p(1)$, and $z_N = O_p(1)$, then (1) $w_N + x_N = o_p(1)$; (2) $y_N + z_N = O_p(1)$; (3) $y_N z_N = O_p(1)$; and (4) $x_N z_N = o_p(1)$.
In derivations, we will write relationships 1 to 4 as $o_p(1) + o_p(1) = o_p(1)$, $O_p(1) + O_p(1) = O_p(1)$, $O_p(1) \cdot O_p(1) = O_p(1)$, and $o_p(1) \cdot O_p(1) = o_p(1)$, respectively. Because an $o_p(1)$ sequence is $O_p(1)$, Lemma 3.2 also implies that $o_p(1) + O_p(1) = O_p(1)$ and $o_p(1) \cdot o_p(1) = o_p(1)$.
All of the previous definitions apply element by element to sequences of random vectors or matrices. For example, if $\{x_N\}$ is a sequence of $K \times 1$ random vectors, $x_N \xrightarrow{p} a$, where $a$ is a $K \times 1$ nonrandom vector, if and only if $x_{Nj} \xrightarrow{p} a_j$, $j = 1, \ldots, K$. This is equivalent to $\|x_N - a\| \xrightarrow{p} 0$, where $\|b\| \equiv (b'b)^{1/2}$ denotes the Euclidean length of the $K \times 1$ vector $b$. Also, $Z_N \xrightarrow{p} B$, where $Z_N$ and $B$ are $M \times K$, is equivalent to $\|Z_N - B\| \xrightarrow{p} 0$, where $\|A\| \equiv [\operatorname{tr}(A'A)]^{1/2}$ and $\operatorname{tr}(C)$ denotes the trace of the square matrix $C$.
A result that we often use for studying the large-sample properties of estimators for linear models is the following. It is easily proven by repeated application of Lemma 3.2 (see Problem 3.2).

Lemma 3.3: Let $\{Z_N : N = 1, 2, \ldots\}$ be a sequence of $J \times K$ matrices such that $Z_N = o_p(1)$, and let $\{x_N\}$ be a sequence of $J \times 1$ random vectors such that $x_N = O_p(1)$. Then $Z_N' x_N = o_p(1)$.
The next lemma is known as Slutsky's theorem.

Lemma 3.4: Let $g \colon \mathbb{R}^K \to \mathbb{R}^J$ be a function continuous at some point $c \in \mathbb{R}^K$. Let $\{x_N : N = 1, 2, \ldots\}$ be a sequence of $K \times 1$ random vectors such that $x_N \xrightarrow{p} c$. Then $g(x_N) \xrightarrow{p} g(c)$ as $N \to \infty$. In other words,

$$\operatorname{plim} g(x_N) = g(\operatorname{plim} x_N) \quad (3.1)$$

if $g(\cdot)$ is continuous at $\operatorname{plim} x_N$.
Slutsky's theorem is perhaps the most useful feature of the plim operator: it shows that the plim passes through nonlinear functions, provided they are continuous. The expectations operator does not have this feature, and this lack makes finite sample analysis difficult for many estimators. Lemma 3.4 shows that plims behave just like regular limits when applying a continuous function to the sequence.
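The contrast is easy to see in a small simulation (a sketch; the Exponential(1) population and the function $g(x) = 1/x$ are our choices). With $x_N$ the sample mean, $\operatorname{plim} x_N = 1$, and Slutsky's theorem gives $\operatorname{plim} g(x_N) = g(1) = 1$, even though $E[g(x_N)] \neq g(E[x_N]) = 1$ at every finite $N$.

```python
import numpy as np

# Slutsky's theorem (Lemma 3.4) by simulation: x_N is the sample mean of
# i.i.d. Exponential(1) draws, g(x) = 1/x is continuous at plim x_N = 1,
# so g(x_N) ->p 1. Note E[g(x_N)] = N/(N-1) != 1 for finite N, which is
# exactly the feature the expectations operator lacks.
rng = np.random.default_rng(0)
reps = 2_000
for N in [5, 50, 500, 5_000]:
    xbar = rng.exponential(scale=1.0, size=(reps, N)).mean(axis=1)
    g = 1.0 / xbar
    print(f"N = {N:>5d}:  P[|g(x_N) - 1| > 0.05] ~= "
          f"{np.mean(np.abs(g - 1.0) > 0.05):.4f},  mean of g(x_N) ~= {g.mean():.4f}")
```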
Definition 3.5: Let $(\Omega, \mathcal{F}, P)$ be a probability space. A sequence of events $\{\Omega_N : N = 1, 2, \ldots\} \subset \mathcal{F}$ is said to occur with probability approaching one (w.p.a.1) if and only if $P(\Omega_N) \to 1$ as $N \to \infty$.
Definition 3.5 allows that $\Omega_N^c$, the complement of $\Omega_N$, can occur for each $N$, but its chance of occurring goes to zero as $N \to \infty$.
Corollary 3.1: Let $\{Z_N : N = 1, 2, \ldots\}$ be a sequence of random $K \times K$ matrices, and let $A$ be a nonrandom, invertible $K \times K$ matrix. If $Z_N \xrightarrow{p} A$, then
(1) $Z_N^{-1}$ exists w.p.a.1;
(2) $Z_N^{-1} \xrightarrow{p} A^{-1}$ or $\operatorname{plim} Z_N^{-1} = A^{-1}$ (in an appropriate sense).
Proof: Because the determinant is a continuous function on the space of all square matrices, $\det(Z_N) \xrightarrow{p} \det(A)$. Because $A$ is nonsingular, $\det(A) \neq 0$. Therefore, it follows that $P[\det(Z_N) \neq 0] \to 1$ as $N \to \infty$. This completes the proof of part 1.
Part 2 requires a convention about how to define $Z_N^{-1}$ when $Z_N$ is singular. Let $\Omega_N$ be the set of outcomes $\omega$ such that $Z_N(\omega)$ is nonsingular; we just showed that $P(\Omega_N) \to 1$ as $N \to \infty$. Define a new sequence of matrices by

$$\tilde{Z}_N(\omega) \equiv Z_N(\omega) \text{ when } \omega \in \Omega_N, \qquad \tilde{Z}_N(\omega) \equiv I_K \text{ when } \omega \notin \Omega_N$$

Then $P(\tilde{Z}_N = Z_N) = P(\Omega_N) \to 1$ as $N \to \infty$. Then, because $Z_N \xrightarrow{p} A$, $\tilde{Z}_N \xrightarrow{p} A$. The inverse operator is continuous on the space of invertible matrices, so $\tilde{Z}_N^{-1} \xrightarrow{p} A^{-1}$. This is what we mean by $Z_N^{-1} \xrightarrow{p} A^{-1}$; the fact that $Z_N$ can be singular with vanishing probability does not affect asymptotic analysis.
3.3 Convergence in Distribution
Definition 3.6: A sequence of random variables $\{x_N : N = 1, 2, \ldots\}$ converges in distribution to the continuous random variable $x$ if and only if

$$F_N(x) \to F(x) \quad \text{as } N \to \infty \text{ for all } x \in \mathbb{R}$$

where $F_N$ is the cumulative distribution function (c.d.f.) of $x_N$ and $F$ is the (continuous) c.d.f. of $x$. We write $x_N \xrightarrow{d} x$.
When $x \sim \mathrm{Normal}(\mu, \sigma^2)$ we write $x_N \xrightarrow{d} \mathrm{Normal}(\mu, \sigma^2)$ or $x_N \stackrel{a}{\sim} \mathrm{Normal}(\mu, \sigma^2)$ ($x_N$ is asymptotically normal).
In Definition 3.6, $x_N$ is not required to be continuous for any $N$. A good example of where $x_N$ is discrete for all $N$ but has an asymptotically normal distribution is the de Moivre–Laplace theorem (a special case of the central limit theorem given in Section 3.4), which says that $x_N \equiv (s_N - Np)/[Np(1-p)]^{1/2}$ has a limiting standard normal distribution, where $s_N$ has the binomial $(N, p)$ distribution.
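The following simulation sketch illustrates this (the value of $p$, the grid of $N$, and the evaluation point are our choices): even though $s_N$ is integer valued, the c.d.f. of the standardized statistic approaches the standard normal c.d.f.

```python
import math
import numpy as np

# de Moivre-Laplace by simulation: s_N ~ Binomial(N, p) is discrete for
# every N, yet x_N = (s_N - N p)/sqrt(N p (1-p)) ->d Normal(0, 1).
# We compare P[x_N <= 1] with Phi(1); p = 0.3 and the grid of N are
# illustrative choices.
rng = np.random.default_rng(0)
p, reps = 0.3, 100_000
phi_1 = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))   # Phi(1) ~ 0.8413
for N in [10, 100, 1_000]:
    s = rng.binomial(N, p, size=reps)
    x = (s - N * p) / math.sqrt(N * p * (1.0 - p))
    print(f"N = {N:>5d}:  P[x_N <= 1] ~= {np.mean(x <= 1.0):.4f}   (Phi(1) = {phi_1:.4f})")
```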
Definition 3.7: A sequence of $K \times 1$ random vectors $\{x_N : N = 1, 2, \ldots\}$ converges in distribution to the continuous random vector $x$ if and only if for any $K \times 1$ nonrandom vector $c$ such that $c'c = 1$, $c'x_N \xrightarrow{d} c'x$, and we write $x_N \xrightarrow{d} x$.
When $x \sim \mathrm{Normal}(\mu, V)$ the requirement in Definition 3.7 is that $c'x_N \xrightarrow{d} \mathrm{Normal}(c'\mu, c'Vc)$ for every $c \in \mathbb{R}^K$ such that $c'c = 1$; in this case we write $x_N \xrightarrow{d} \mathrm{Normal}(\mu, V)$ or $x_N \stackrel{a}{\sim} \mathrm{Normal}(\mu, V)$. For the derivations in this book, $\mu = 0$.
Lemma 3.5: If $x_N \xrightarrow{d} x$, where $x$ is any $K \times 1$ random vector, then $x_N = O_p(1)$.

As we will see throughout this book, Lemma 3.5 turns out to be very useful for establishing that a sequence is bounded in probability. Often it is easiest to first verify that a sequence converges in distribution.
Lemma 3.6: Let $\{x_N\}$ be a sequence of $K \times 1$ random vectors such that $x_N \xrightarrow{d} x$. If $g \colon \mathbb{R}^K \to \mathbb{R}^J$ is a continuous function, then $g(x_N) \xrightarrow{d} g(x)$.

The usefulness of Lemma 3.6, which is called the continuous mapping theorem, cannot be overstated. It tells us that once we know the limiting distribution of $x_N$, we can find the limiting distribution of many interesting functions of $x_N$. This is especially useful for determining the asymptotic distribution of test statistics once the limiting distribution of an estimator is known; see Section 3.5.
The continuity of $g$ is not necessary in Lemma 3.6, but some restrictions are needed. We will only need the form stated in Lemma 3.6.
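As a quick illustration of Lemma 3.6 (with our own choices of population and function): if $z_N$ is a standardized sample mean, the CLT gives $z_N \xrightarrow{d} \mathrm{Normal}(0, 1)$, and the continuous map $g(z) = z^2$ then gives $z_N^2 \xrightarrow{d} \chi^2_1$.

```python
import math
import numpy as np

# Continuous mapping theorem (Lemma 3.6) by simulation: z_N is the
# standardized mean of i.i.d. Uniform(0,1) draws, so z_N ->d Normal(0,1),
# and g(z) = z^2 is continuous, so z_N^2 ->d chi-square(1). We check
# P[z_N^2 <= 1], whose limit is P[chi2_1 <= 1] = erf(1/sqrt(2)) ~ 0.6827.
rng = np.random.default_rng(0)
reps = 20_000
limit = math.erf(1.0 / math.sqrt(2.0))
mu, sd = 0.5, math.sqrt(1.0 / 12.0)          # mean and s.d. of Uniform(0,1)
for N in [5, 50, 500]:
    zbar = rng.uniform(size=(reps, N)).mean(axis=1)
    z = math.sqrt(N) * (zbar - mu) / sd
    print(f"N = {N:>4d}:  P[z_N^2 <= 1] ~= {np.mean(z**2 <= 1.0):.4f}   (limit {limit:.4f})")
```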
Corollary 3.2: If $\{z_N\}$ is a sequence of $K \times 1$ random vectors such that $z_N \xrightarrow{d} \mathrm{Normal}(0, V)$, then
(1) For any $K \times M$ nonrandom matrix $A$, $A'z_N \xrightarrow{d} \mathrm{Normal}(0, A'VA)$.
(2) $z_N' V^{-1} z_N \xrightarrow{d} \chi^2_K$ (or $z_N' V^{-1} z_N \stackrel{a}{\sim} \chi^2_K$).
Lemma 3.7: Let $\{x_N\}$ and $\{z_N\}$ be sequences of $K \times 1$ random vectors. If $z_N \xrightarrow{d} z$ and $x_N - z_N \xrightarrow{p} 0$, then $x_N \xrightarrow{d} z$.

Lemma 3.7 is called the asymptotic equivalence lemma. In Section 3.5.1 we discuss generally how Lemma 3.7 is used in econometrics. We use the asymptotic equivalence lemma so frequently in asymptotic analysis that after a while we will not even mention that we are using it.
3.4 Limit Theorems for Random Samples
In this section we state two classic limit theorems for independent, identically distributed (i.i.d.) sequences of random vectors. These apply when sampling is done randomly from a population.
Theorem 3.1: Let $\{w_i : i = 1, 2, \ldots\}$ be a sequence of independent, identically distributed $G \times 1$ random vectors such that $E(|w_{ig}|) < \infty$, $g = 1, \ldots, G$. Then the sequence satisfies the weak law of large numbers (WLLN): $N^{-1} \sum_{i=1}^{N} w_i \xrightarrow{p} \mu_w$, where $\mu_w \equiv E(w_i)$.
Theorem 3.2 (Lindeberg–Levy): Let $\{w_i : i = 1, 2, \ldots\}$ be a sequence of independent, identically distributed $G \times 1$ random vectors such that $E(w_{ig}^2) < \infty$, $g = 1, \ldots, G$, and $E(w_i) = 0$. Then $\{w_i : i = 1, 2, \ldots\}$ satisfies the central limit theorem (CLT); that is,

$$N^{-1/2} \sum_{i=1}^{N} w_i \xrightarrow{d} \mathrm{Normal}(0, B)$$

where $B = \operatorname{Var}(w_i) = E(w_i w_i')$ is necessarily positive semidefinite. For our purposes, $B$ is almost always positive definite.
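Both theorems can be checked together in a short simulation (a sketch with a bivariate design of our own choosing, $G = 2$): sample means of mean-zero i.i.d. vectors collapse to zero (WLLN), while $N^{-1/2} \sum_i w_i$ keeps a stable, nondegenerate spread whose Monte Carlo variance matrix approximates $B = \operatorname{Var}(w_i)$ (CLT).

```python
import numpy as np

# Joint check of Theorems 3.1 and 3.2 with G = 2 and an illustrative
# bivariate design: w_i = (u_i, u_i + v_i)' with u_i ~ Exponential(1) - 1
# and v_i ~ Normal(0, 1), so E(w_i) = 0 and B = [[1, 1], [1, 2]].
rng = np.random.default_rng(0)
N, reps = 2_000, 5_000
scaled_means = np.empty((reps, 2))
for r in range(reps):
    u = rng.exponential(scale=1.0, size=N) - 1.0
    v = rng.normal(size=N)
    w = np.column_stack([u, u + v])                  # N draws of the G x 1 vector
    scaled_means[r] = np.sqrt(N) * w.mean(axis=0)    # N^{-1/2} sum_i w_i
print("WLLN: a typical wbar_N =", (scaled_means[0] / np.sqrt(N)).round(4))
print("CLT: Monte Carlo Var of N^{-1/2} sum w_i:\n", np.cov(scaled_means.T).round(3))
print("Theoretical B:\n", np.array([[1.0, 1.0], [1.0, 2.0]]))
```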
3.5 Limiting Behavior of Estimators and Test Statistics
In this section, we apply the previous concepts to sequences of estimators. Because
estimators depend on the random outcomes of data, they are properly viewed as
random vectors.
3.5.1 Asymptotic Properties of Estimators
Definition 3.8: Let $\{\hat{\theta}_N : N = 1, 2, \ldots\}$ be a sequence of estimators of the $P \times 1$ vector $\theta \in \Theta$, where $N$ indexes the sample size. If

$$\hat{\theta}_N \xrightarrow{p} \theta \quad (3.2)$$

for any value of $\theta$, then we say $\hat{\theta}_N$ is a consistent estimator of $\theta$.
Because there are other notions of convergence, in the theoretical literature condition (3.2) is often referred to as weak consistency. This is the only kind of consistency we will be concerned with, so we simply call condition (3.2) consistency. (See White, 1984, Chapter 2, for other kinds of convergence.) Since we do not know $\theta$, the consistency definition requires condition (3.2) for any possible value of $\theta$.
Definition 3.9: Let $\{\hat{\theta}_N : N = 1, 2, \ldots\}$ be a sequence of estimators of the $P \times 1$ vector $\theta \in \Theta$. Suppose that

$$\sqrt{N}\,(\hat{\theta}_N - \theta) \xrightarrow{d} \mathrm{Normal}(0, V) \quad (3.3)$$

where $V$ is a $P \times P$ positive semidefinite matrix. Then we say that $\hat{\theta}_N$ is $\sqrt{N}$-asymptotically normally distributed and $V$ is the asymptotic variance of $\sqrt{N}\,(\hat{\theta}_N - \theta)$, denoted $\operatorname{Avar} \sqrt{N}\,(\hat{\theta}_N - \theta) = V$.
Even though $V/N = \operatorname{Var}(\hat{\theta}_N)$ holds only in special cases, and $\hat{\theta}_N$ rarely has an exact normal distribution, we treat $\hat{\theta}_N$ as if

$$\hat{\theta}_N \sim \mathrm{Normal}(\theta, V/N) \quad (3.4)$$

whenever statement (3.3) holds. For this reason, $V/N$ is called the asymptotic variance of $\hat{\theta}_N$, and we write

$$\operatorname{Avar}(\hat{\theta}_N) = V/N \quad (3.5)$$

However, the only sense in which $\hat{\theta}_N$ is approximately normally distributed with mean $\theta$ and variance $V/N$ is contained in statement (3.3), and this is what is needed to perform inference about $\theta$. Statement (3.4) is a heuristic statement that leads to the appropriate inference.
When we discuss consistent estimation of asymptotic variances—a topic that will arise often—we should technically focus on estimation of $V \equiv \operatorname{Avar} \sqrt{N}\,(\hat{\theta}_N - \theta)$. In most cases, we will be able to find at least one, and usually more than one, consistent estimator $\hat{V}_N$ of $V$. Then the corresponding estimator of $\operatorname{Avar}(\hat{\theta}_N)$ is $\hat{V}_N / N$, and we write

$$\widehat{\operatorname{Avar}}(\hat{\theta}_N) = \hat{V}_N / N \quad (3.6)$$
The division by $N$ in equation (3.6) is practically very important. What we call the asymptotic variance of $\hat{\theta}_N$ is estimated as in equation (3.6). Unfortunately, there has not been a consistent usage of the term "asymptotic variance" in econometrics. Taken literally, a statement such as "$\hat{V}_N / N$ is consistent for $\operatorname{Avar}(\hat{\theta}_N)$" is not very meaningful because $V/N$ converges to 0 as $N \to \infty$; typically, $\hat{V}_N / N \xrightarrow{p} 0$ whether or not $\hat{V}_N$ is consistent for $V$. Nevertheless, it is useful to have an admittedly imprecise shorthand. In what follows, if we say that "$\hat{V}_N / N$ consistently estimates $\operatorname{Avar}(\hat{\theta}_N)$," we mean that $\hat{V}_N$ consistently estimates $\operatorname{Avar} \sqrt{N}\,(\hat{\theta}_N - \theta)$.
Definition 3.10: If $\sqrt{N}\,(\hat{\theta}_N - \theta) \stackrel{a}{\sim} \mathrm{Normal}(0, V)$, where $V$ is positive definite with $j$th diagonal element $v_{jj}$, and $\hat{V}_N \xrightarrow{p} V$, then the asymptotic standard error of $\hat{\theta}_{Nj}$, denoted $\operatorname{se}(\hat{\theta}_{Nj})$, is $(\hat{v}_{Njj}/N)^{1/2}$.
In other words, the asymptotic standard error of an estimator, which is almost always reported in applied work, is the square root of the appropriate diagonal element of $\hat{V}_N / N$. The asymptotic standard errors can be loosely thought of as estimating the standard deviations of the elements of $\hat{\theta}_N$, and they are the appropriate quantities to use when forming (asymptotic) t statistics and confidence intervals. Obtaining valid asymptotic standard errors (after verifying that the estimator is asymptotically normally distributed) is often the biggest challenge when using a new estimator.
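For the sample mean (the simplest case, used here purely for illustration), statement (3.3) holds with $V = \sigma^2$, the sample variance is a consistent estimator $\hat{V}_N$, and the recipe above reduces to the familiar $\operatorname{se}(\bar{y}_N) = s/\sqrt{N}$. A minimal sketch:

```python
import numpy as np

# Asymptotic standard error (Definition 3.10) for the sample mean, an
# estimator chosen purely for illustration: theta_hat = ybar, with
# V = sigma^2 estimated by the sample variance, so se = sqrt(V_hat / N).
# The Exponential(2) population and N = 500 are illustrative choices.
rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=500)
theta_hat = y.mean()
V_hat = y.var(ddof=1)                      # consistent estimator of V = sigma^2
se = np.sqrt(V_hat / y.size)               # sqrt of the diagonal of V_hat / N
print(f"theta_hat = {theta_hat:.4f},  se(theta_hat) = {se:.4f}")
print(f"asymptotic 95% CI: [{theta_hat - 1.96*se:.4f}, {theta_hat + 1.96*se:.4f}]")
```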
If statement (3.3) holds, it follows by Lemma 3.5 that $\sqrt{N}\,(\hat{\theta}_N - \theta) = O_p(1)$, or $\hat{\theta}_N - \theta = O_p(N^{-1/2})$, and we say that $\hat{\theta}_N$ is a $\sqrt{N}$-consistent estimator of $\theta$. $\sqrt{N}$-consistency certainly implies that $\operatorname{plim} \hat{\theta}_N = \theta$, but it is much stronger because it tells us that the rate of convergence is almost the square root of the sample size $N$: $\hat{\theta}_N - \theta = o_p(N^{-c})$ for any $0 \le c < \frac{1}{2}$. In this book, almost every consistent estimator we will study—and every one we consider in any detail—is $\sqrt{N}$-asymptotically normal, and therefore $\sqrt{N}$-consistent, under reasonable assumptions.
If one $\sqrt{N}$-asymptotically normal estimator has an asymptotic variance that is smaller than another's (in the matrix sense), it is easy to choose between the estimators based on asymptotic considerations.
Definition 3.11: Let $\hat{\theta}_N$ and $\tilde{\theta}_N$ be estimators of $\theta$ each satisfying statement (3.3), with asymptotic variances $V = \operatorname{Avar} \sqrt{N}\,(\hat{\theta}_N - \theta)$ and $D = \operatorname{Avar} \sqrt{N}\,(\tilde{\theta}_N - \theta)$ (these generally depend on the value of $\theta$, but we suppress that consideration here). (1) $\hat{\theta}_N$ is asymptotically efficient relative to $\tilde{\theta}_N$ if $D - V$ is positive semidefinite for all $\theta$; (2) $\hat{\theta}_N$ and $\tilde{\theta}_N$ are $\sqrt{N}$-equivalent if $\sqrt{N}\,(\hat{\theta}_N - \tilde{\theta}_N) = o_p(1)$.
When two estimators are $\sqrt{N}$-equivalent, they have the same limiting distribution (multivariate normal in this case, with the same asymptotic variance). This conclusion follows immediately from the asymptotic equivalence lemma (Lemma 3.7). Sometimes, to find the limiting distribution of, say, $\sqrt{N}\,(\hat{\theta}_N - \theta)$, it is easiest to first find the limiting distribution of $\sqrt{N}\,(\tilde{\theta}_N - \theta)$, and then to show that $\hat{\theta}_N$ and $\tilde{\theta}_N$ are $\sqrt{N}$-equivalent. A good example of this approach is in Chapter 7, where we find the limiting distribution of the feasible generalized least squares estimator, after we have found the limiting distribution of the GLS estimator.
Definition 3.12: Partition $\hat{\theta}_N$ satisfying statement (3.3) into vectors $\hat{\theta}_{N1}$ and $\hat{\theta}_{N2}$. Then $\hat{\theta}_{N1}$ and $\hat{\theta}_{N2}$ are asymptotically independent if

$$V = \begin{pmatrix} V_1 & 0 \\ 0 & V_2 \end{pmatrix}$$

where $V_1$ is the asymptotic variance of $\sqrt{N}\,(\hat{\theta}_{N1} - \theta_1)$ and similarly for $V_2$. In other words, the asymptotic variance of $\sqrt{N}\,(\hat{\theta}_N - \theta)$ is block diagonal.
Throughout this section we have been careful to index estimators by the sample size, $N$. This is useful to fix ideas on the nature of asymptotic analysis, but it is cumbersome when applying asymptotics to particular estimation methods. After this chapter, an estimator of $\theta$ will be denoted $\hat{\theta}$, which is understood to depend on the sample size $N$. When we write, for example, $\hat{\theta} \xrightarrow{p} \theta$, we mean convergence in probability as the sample size $N$ goes to infinity.
3.5.2 Asymptotic Properties of Test Statistics
We begin with some important definitions in the large-sample analysis of test statistics.
Definition 3.13: (1) The asymptotic size of a testing procedure is defined as the limiting probability of rejecting $H_0$ when it is true. Mathematically, we can write this as $\lim_{N \to \infty} P_N(\text{reject } H_0 \mid H_0)$, where the $N$ subscript indexes the sample size.
(2) A test is said to be consistent against the alternative $H_1$ if the null hypothesis is rejected with probability approaching one when $H_1$ is true: $\lim_{N \to \infty} P_N(\text{reject } H_0 \mid H_1) = 1$.
In practice, the asymptotic size of a test is obtained by finding the limiting distribution of a test statistic—in our case, normal or chi-square, or simple modifications of these that can be used as t distributed or F distributed—and then choosing a critical value based on this distribution. Thus, testing using asymptotic methods is practically the same as testing using the classical linear model.
A test is consistent against the alternative $H_1$ if the probability of rejecting $H_0$ tends to unity as the sample size grows without bound. Just as consistency of an estimator is a minimal requirement, so is consistency of a test statistic. Consistency rarely allows us to choose among tests: most tests are consistent against alternatives that they are supposed to have power against. For consistent tests with the same asymptotic size, we can use the notion of local power analysis to choose among tests. We will cover this briefly in Chapter 12 on nonlinear estimation, where we introduce the notion of local alternatives—that is, alternatives to $H_0$ that converge to $H_0$ at rate $1/\sqrt{N}$. Generally, test statistics will have desirable asymptotic properties when they are based on estimators with good asymptotic properties (such as efficiency).
We now derive the limiting distribution of a test statistic that is used very often in
econometrics.
Lemma 3.8: Suppose that statement (3.3) holds, where $V$ is positive definite. Then for any nonstochastic $Q \times P$ matrix $R$, $Q \le P$, with $\operatorname{rank}(R) = Q$,

$$\sqrt{N}\,R(\hat{\theta}_N - \theta) \stackrel{a}{\sim} \mathrm{Normal}(0, RVR')$$

and

$$[\sqrt{N}\,R(\hat{\theta}_N - \theta)]' [RVR']^{-1} [\sqrt{N}\,R(\hat{\theta}_N - \theta)] \stackrel{a}{\sim} \chi^2_Q$$

In addition, if $\operatorname{plim} \hat{V}_N = V$, then

$$[\sqrt{N}\,R(\hat{\theta}_N - \theta)]' [R\hat{V}_N R']^{-1} [\sqrt{N}\,R(\hat{\theta}_N - \theta)] = (\hat{\theta}_N - \theta)' R' [R(\hat{V}_N/N)R']^{-1} R(\hat{\theta}_N - \theta) \stackrel{a}{\sim} \chi^2_Q$$
For testing the null hypothesis $H_0\colon R\theta = r$, where $r$ is a $Q \times 1$ nonrandom vector, define the Wald statistic for testing $H_0$ against $H_1\colon R\theta \neq r$ as

$$W_N \equiv (R\hat{\theta}_N - r)' [R(\hat{V}_N/N)R']^{-1} (R\hat{\theta}_N - r) \quad (3.7)$$

Under $H_0$, $W_N \stackrel{a}{\sim} \chi^2_Q$. If we abuse the asymptotics and treat $\hat{\theta}_N$ as being distributed as $\mathrm{Normal}(\theta, \hat{V}_N/N)$, we get equation (3.7) exactly.
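A compact sketch of equation (3.7), using an estimator and hypothesis of our own choosing (the sample mean vector of simulated bivariate data, with $R$ testing equality of the two means):

```python
import numpy as np

# Wald statistic (3.7) for an illustrative setting: theta_hat is the mean
# of i.i.d. bivariate data, V_hat its sample covariance matrix (consistent
# for V), and H0: R theta = r with R = [1, -1], r = 0, i.e. equal means.
# Under H0, W_N ~a chi-square(1); the 5% critical value is 3.841.
rng = np.random.default_rng(0)
N = 1_000
y = rng.normal(loc=[1.0, 1.2], scale=[1.0, 2.0], size=(N, 2))
theta_hat = y.mean(axis=0)
V_hat = np.cov(y.T)
R = np.array([[1.0, -1.0]])
r = np.array([0.0])
diff = R @ theta_hat - r
W = float(diff @ np.linalg.solve(R @ (V_hat / N) @ R.T, diff))
print(f"W_N = {W:.3f}   (chi2_1 5% critical value: 3.841)")
```

Changing $R$ and $r$ changes the hypothesis; with $Q$ restrictions the statistic is compared with a $\chi^2_Q$ critical value.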
Lemma 3.9: Suppose that statement (3.3) holds, where $V$ is positive definite. Let $c\colon \Theta \to \mathbb{R}^Q$ be a continuously differentiable function on the parameter space $\Theta \subset \mathbb{R}^P$, where $Q \le P$, and assume that $\theta$ is in the interior of the parameter space. Define $C(\theta) \equiv \nabla_\theta\, c(\theta)$ as the $Q \times P$ Jacobian of $c$. Then

$$\sqrt{N}\,[c(\hat{\theta}_N) - c(\theta)] \stackrel{a}{\sim} \mathrm{Normal}[0, C(\theta)VC(\theta)'] \quad (3.8)$$

and

$$\{\sqrt{N}\,[c(\hat{\theta}_N) - c(\theta)]\}' [C(\theta)VC(\theta)']^{-1} \{\sqrt{N}\,[c(\hat{\theta}_N) - c(\theta)]\} \stackrel{a}{\sim} \chi^2_Q$$

Define $\hat{C}_N \equiv C(\hat{\theta}_N)$. Then $\operatorname{plim} \hat{C}_N = C(\theta)$. If $\operatorname{plim} \hat{V}_N = V$, then

$$\{\sqrt{N}\,[c(\hat{\theta}_N) - c(\theta)]\}' [\hat{C}_N \hat{V}_N \hat{C}_N']^{-1} \{\sqrt{N}\,[c(\hat{\theta}_N) - c(\theta)]\} \stackrel{a}{\sim} \chi^2_Q \quad (3.9)$$
Equation (3.8) is very useful for obtaining asymptotic standard errors for nonlinear functions of $\hat{\theta}_N$. The appropriate estimator of $\operatorname{Avar}[c(\hat{\theta}_N)]$ is $\hat{C}_N(\hat{V}_N/N)\hat{C}_N' = \hat{C}_N[\widehat{\operatorname{Avar}}(\hat{\theta}_N)]\hat{C}_N'$. Thus, once $\widehat{\operatorname{Avar}}(\hat{\theta}_N)$ and the estimated Jacobian of $c$ are obtained, we can easily obtain

$$\widehat{\operatorname{Avar}}[c(\hat{\theta}_N)] = \hat{C}_N[\widehat{\operatorname{Avar}}(\hat{\theta}_N)]\hat{C}_N' \quad (3.10)$$

The asymptotic standard errors are obtained as the square roots of the diagonal elements of equation (3.10). In the scalar case $\hat{\gamma}_N = c(\hat{\theta}_N)$, the asymptotic standard error of $\hat{\gamma}_N$ is $\{\nabla_\theta\, c(\hat{\theta}_N)[\widehat{\operatorname{Avar}}(\hat{\theta}_N)]\nabla_\theta\, c(\hat{\theta}_N)'\}^{1/2}$.
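For instance, the scalar formula can be coded in a few lines; the function $c(\theta) = \log(\theta)$ and the numbers below are hypothetical, chosen only to show the mechanics:

```python
import numpy as np

# Scalar delta method from equation (3.10), with hypothetical numbers:
# gamma_hat = c(theta_hat) = log(theta_hat), so the Jacobian is
# C(theta) = 1/theta and se(gamma_hat) = |1/theta_hat| * se(theta_hat).
theta_hat, se_theta = 2.5, 0.8             # hypothetical estimate and std. error
gamma_hat = np.log(theta_hat)
C_hat = 1.0 / theta_hat                    # estimated Jacobian of c at theta_hat
se_gamma = abs(C_hat) * se_theta           # sqrt(C * Avar * C') in the scalar case
print(f"gamma_hat = {gamma_hat:.4f},  se(gamma_hat) = {se_gamma:.4f}")
```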
Equation (3.9) is useful for testing nonlinear hypotheses of the form $H_0\colon c(\theta) = 0$ against $H_1\colon c(\theta) \neq 0$. The Wald statistic is

$$W_N = [\sqrt{N}\,c(\hat{\theta}_N)]' [\hat{C}_N \hat{V}_N \hat{C}_N']^{-1} [\sqrt{N}\,c(\hat{\theta}_N)] = c(\hat{\theta}_N)' [\hat{C}_N(\hat{V}_N/N)\hat{C}_N']^{-1} c(\hat{\theta}_N) \quad (3.11)$$

Under $H_0$, $W_N \stackrel{a}{\sim} \chi^2_Q$.
The method of establishing equation (3.8), given that statement (3.3) holds, is often called the delta method, and it is used very often in econometrics. It gets its name from its use of calculus. The argument is as follows. Because $\theta$ is in the interior of $\Theta$, and because $\operatorname{plim} \hat{\theta}_N = \theta$, $\hat{\theta}_N$ is in an open, convex subset of $\Theta$ containing $\theta$ with probability approaching one; therefore, w.p.a.1 we can use a mean value expansion $c(\hat{\theta}_N) = c(\theta) + \bar{C}_N \cdot (\hat{\theta}_N - \theta)$, where $\bar{C}_N$ denotes the matrix $C(\theta)$ with rows evaluated at mean values between $\hat{\theta}_N$ and $\theta$. Because these mean values are trapped between $\hat{\theta}_N$ and $\theta$, they converge in probability to $\theta$. Therefore, by Slutsky's theorem, $\bar{C}_N \xrightarrow{p} C(\theta)$, and we can write

$$\sqrt{N}\,[c(\hat{\theta}_N) - c(\theta)] = \bar{C}_N \cdot \sqrt{N}\,(\hat{\theta}_N - \theta)$$
$$= C(\theta)\sqrt{N}\,(\hat{\theta}_N - \theta) + [\bar{C}_N - C(\theta)]\sqrt{N}\,(\hat{\theta}_N - \theta)$$
$$= C(\theta)\sqrt{N}\,(\hat{\theta}_N - \theta) + o_p(1) \cdot O_p(1) = C(\theta)\sqrt{N}\,(\hat{\theta}_N - \theta) + o_p(1)$$

We can now apply the asymptotic equivalence lemma and Lemma 3.8 [with $R \equiv C(\theta)$] to get equation (3.8).
Problems
3.1. Prove Lemma 3.1.
3.2. Using Lemma 3.2, prove Lemma 3.3.
3.3. Explain why, under the assumptions of Lemma 3.4, $g(x_N) = O_p(1)$.
3.4. Prove Corollary 3.2.
3.5. Let $\{y_i : i = 1, 2, \ldots\}$ be an independent, identically distributed sequence with $E(y_i^2) < \infty$. Let $\mu = E(y_i)$ and $\sigma^2 = \operatorname{Var}(y_i)$.
a. Let $\bar{y}_N$ denote the sample average based on a sample size of $N$. Find $\operatorname{Var}[\sqrt{N}\,(\bar{y}_N - \mu)]$.
b. What is the asymptotic variance of $\sqrt{N}\,(\bar{y}_N - \mu)$?
c. What is the asymptotic variance of $\bar{y}_N$? Compare this with $\operatorname{Var}(\bar{y}_N)$.
d. What is the asymptotic standard deviation of $\bar{y}_N$?
e. How would you obtain the asymptotic standard error of $\bar{y}_N$?
3.6. Give a careful (albeit short) proof of the following statement: If $\sqrt{N}\,(\hat{\theta}_N - \theta) = O_p(1)$, then $\hat{\theta}_N - \theta = o_p(N^{-c})$ for any $0 \le c < \frac{1}{2}$.
3.7. Let $\hat{\theta}$ be a $\sqrt{N}$-asymptotically normal estimator for the scalar $\theta > 0$. Let $\hat{\gamma} = \log(\hat{\theta})$ be an estimator of $\gamma = \log(\theta)$.
a. Why is $\hat{\gamma}$ a consistent estimator of $\gamma$?
b. Find the asymptotic variance of $\sqrt{N}\,(\hat{\gamma} - \gamma)$ in terms of the asymptotic variance of $\sqrt{N}\,(\hat{\theta} - \theta)$.
c. Suppose that, for a sample of data, $\hat{\theta} = 4$ and $\operatorname{se}(\hat{\theta}) = 2$. What is $\hat{\gamma}$ and its (asymptotic) standard error?
d. Consider the null hypothesis $H_0\colon \theta = 1$. What is the asymptotic t statistic for testing $H_0$, given the numbers from part c?
e. Now state $H_0$ from part d equivalently in terms of $\gamma$, and use $\hat{\gamma}$ and $\operatorname{se}(\hat{\gamma})$ to test $H_0$. What do you conclude?
3.8. Let $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2)'$ be a $\sqrt{N}$-asymptotically normal estimator for $\theta = (\theta_1, \theta_2)'$, with $\theta_2 \neq 0$. Let $\hat{\gamma} = \hat{\theta}_1 / \hat{\theta}_2$ be an estimator of $\gamma = \theta_1 / \theta_2$.
a. Show that $\operatorname{plim} \hat{\gamma} = \gamma$.
b. Find $\operatorname{Avar}(\hat{\gamma})$ in terms of $\theta$ and $\operatorname{Avar}(\hat{\theta})$ using the delta method.
c. If, for a sample of data, $\hat{\theta} = (-1.5, .5)'$ and $\operatorname{Avar}(\hat{\theta})$ is estimated as $\begin{pmatrix} 1 & -.4 \\ -.4 & 2 \end{pmatrix}$, find the asymptotic standard error of $\hat{\gamma}$.
3.9. Let $\hat{\theta}$ and $\tilde{\theta}$ be two consistent, $\sqrt{N}$-asymptotically normal estimators of the $P \times 1$ parameter vector $\theta$, with $\operatorname{Avar} \sqrt{N}\,(\hat{\theta} - \theta) = V_1$ and $\operatorname{Avar} \sqrt{N}\,(\tilde{\theta} - \theta) = V_2$. Define a $Q \times 1$ parameter vector by $\gamma = g(\theta)$, where $g(\cdot)$ is a continuously differentiable function. Show that, if $\hat{\theta}$ is asymptotically more efficient than $\tilde{\theta}$, then $\hat{\gamma} \equiv g(\hat{\theta})$ is asymptotically efficient relative to $\tilde{\gamma} \equiv g(\tilde{\theta})$.
