Handbook of Empirical Economics and Finance _14 doc

P1: BINAYA KUMAR DASH
November 3, 2010 16:25 C7035 C7035˙C013
384 Handbook of Empirical Economics and Finance
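Conditional on $h$, the MLE of $\delta' = (\pi', \rho, \beta')$ is identical to the GLS,

$$\hat{\delta}_{GLS} = \left[ \sum_{i=1}^{N} \Delta\tilde{X}_i' \, \Omega^{*-1} \Delta\tilde{X}_i \right]^{-1} \left[ \sum_{i=1}^{N} \Delta\tilde{X}_i' \, \Omega^{*-1} \Delta y_i \right], \tag{13.43}$$

where

$$\Delta\tilde{X}_i = \begin{pmatrix} \Delta x_i' & 0 & 0' \\ 0 & \Delta y_{i,-1} & \Delta X_i \end{pmatrix}. \tag{13.44}$$

When $h$ is unknown, one can use a two-step procedure. In the first step, we regress $\Delta y_{i1}$ on $\Delta x_i$ to obtain $\hat{\sigma}_v^2$ and apply GMM to obtain $\hat{\sigma}_\epsilon^2$. In the second step, we substitute the estimated $\hat{h}$ for $h$ in Equation 13.43. However, the feasible GLS is not as efficient as GLS (for details, see Hsiao, Pesaran, and Tahmiscioglu 2002).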
13.7 Models with Both Individual- and Time-Specific
Additive Effects
When time-specific effects also appear in $v_{it}$ as in Equation 13.2, the estimators that ignore the presence of $\lambda_t$, like those discussed in Sections 13.3 to 13.6, are no longer consistent when $T$ is finite. For notational ease and without loss of generality, we illustrate the fundamental issues of dynamic models with both individual- and time-specific additive effects by restricting $\beta = 0$ in Equation 13.1; thus the model becomes

$$y_{it} = \rho y_{i,t-1} + v_{it}, \tag{13.45}$$

$$v_{it} = \alpha_i + \lambda_t + \epsilon_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T, \; y_{i0} \text{ observable}. \tag{13.46}$$

The panel data estimators discussed in Sections 13.5 and 13.6 assume that no $\lambda_t$ are present (i.e., $\lambda_t = 0 \; \forall t$). When $\lambda_t$ are indeed present, those estimators are not consistent if $T$ is finite when $N \to \infty$. For instance, the consistency of GMM (Equation 13.33) is based on the assumption that $\frac{1}{N} \sum_{i=1}^{N} y_{i,t-j} \Delta v_{it}$ converges to the population moments (Equation 13.32). However, if $\lambda_t$ are also present as in Equation 13.46, this condition is likely to be violated. To see this, taking the first difference of Equation 13.45 yields

$$\Delta y_{it} = \rho \Delta y_{i,t-1} + \Delta v_{it} = \rho \Delta y_{i,t-1} + \Delta\lambda_t + \Delta\epsilon_{it}, \quad i = 1, \ldots, N, \; t = 2, \ldots, T. \tag{13.47}$$

Dynamic Panel Data Models 385
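Although

$$E(y_{i,t-j} \Delta v_{it}) = 0 \quad \text{for } j = 2, \ldots, t, \tag{13.48}$$

the sample moment, as $N \to \infty$,

$$\frac{1}{N} \sum_{i=1}^{N} y_{i,t-j} \Delta v_{it} = \frac{1}{N} \sum_{i=1}^{N} y_{i,t-j} \Delta\lambda_t + \frac{1}{N} \sum_{i=1}^{N} y_{i,t-j} \Delta\epsilon_{it} \tag{13.49}$$

converges to $\bar{y}_{t-j} \Delta\lambda_t$, which in general is not equal to zero, in particular if $y_{it}$ has a mean different from zero,$^{5}$ where $\bar{y}_t = \frac{1}{N} \sum_{i=1}^{N} y_{it}$.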
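The failure of the sample moment in Equation 13.49 can be checked numerically. The following is a minimal simulation sketch rather than part of the original derivation: the parameter values, the fixed sequence `lam`, the non-zero mean of `alpha`, and the initial condition for `y[:, 0]` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, rho = 200_000, 4, 0.5

# Illustrative design (assumption): alpha_i has a non-zero mean, so y_it does too.
alpha = rng.normal(1.0, 1.0, size=N)
lam = np.array([0.0, 0.5, -1.0, 1.0, 2.0])    # lambda_0..lambda_T, treated as fixed
eps = rng.normal(0.0, 1.0, size=(N, T + 1))

y = np.empty((N, T + 1))
y[:, 0] = alpha / (1.0 - rho) + eps[:, 0]      # simple initial condition (assumption)
for t in range(1, T + 1):
    y[:, t] = rho * y[:, t - 1] + alpha + lam[t] + eps[:, t]

t, j = 3, 2
d_lam = lam[t] - lam[t - 1]
d_v = d_lam + (eps[:, t] - eps[:, t - 1])      # Delta v_it (alpha_i differences out)

sample_moment = np.mean(y[:, t - j] * d_v)     # left-hand side of Equation 13.49
limit = y[:, t - j].mean() * d_lam             # ybar_{t-j} * Delta lambda_t

print(sample_moment, limit)
```

With a large cross-section the two printed numbers should be close to each other and far from zero, which is the source of the inconsistency discussed in the text.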
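To obtain consistent estimators of $\rho$, we need to take explicit account of the presence of $\lambda_t$ in addition to $\alpha_i$. If $\alpha_i$ and $\lambda_t$ are random and satisfy Equation 13.4, because $E y_{i0} v_{it} \neq 0$, we either have to write Equation 13.45 conditional on $y_{i0}$ or to complete the system (Equation 13.45) by deriving the marginal distribution of $y_{i0}$. By continuous substitution, we have

$$y_{i0} = \frac{1 - \rho^m}{1 - \rho} \alpha_i + \sum_{j=0}^{m-1} \lambda_{-j} \rho^j + \sum_{j=0}^{m-1} \epsilon_{i,-j} \rho^j = v_{i0}, \tag{13.50}$$

assuming the process started at period $-m$.

Under Equation 13.4, $E y_{i0} = E v_{i0} = 0$, $\operatorname{Var}(y_{i0}) = \sigma_0^2$, $E(v_{i0} v_{it}) = \frac{1 - \rho^m}{1 - \rho} \sigma_\alpha^2 = c$, and $E v_{i0} v_{j0} = d$ for $i \neq j$. Stacking the $T + 1$ time series observations for the $i$th individual into a vector gives $y_i = (y_{i0}, \ldots, y_{iT})'$, $y_{i,-1} = (0, y_{i0}, \ldots, y_{i,T-1})'$, and $v_i = (v_{i0}, \ldots, v_{iT})'$. Let $y = (y_1', \ldots, y_N')'$, $y_{-1} = (y_{1,-1}', \ldots, y_{N,-1}')'$, and $v = (v_1', \ldots, v_N')'$; then

$$y = y_{-1} \rho + v, \tag{13.51}$$

$$E v = 0,$$

$$E v v' = \sigma_\epsilon^2 I_N \otimes \begin{pmatrix} \omega & 0' \\ 0 & I_T \end{pmatrix} + \sigma_\alpha^2 I_N \otimes \begin{pmatrix} 0 & c^* e_T' \\ c^* e_T & e_T e_T' \end{pmatrix} + \sigma_\lambda^2 e_N e_N' \otimes \begin{pmatrix} d^* & 0' \\ 0 & I_T \end{pmatrix}, \tag{13.52}$$

$$\omega = \frac{\sigma_0^2 - d}{\sigma_\epsilon^2}, \quad d^* = \frac{d}{\sigma_\lambda^2}, \quad c^* = \frac{c}{\sigma_\alpha^2}, \tag{13.53}$$

where $\otimes$ denotes the Kronecker product. The system (Equation 13.51) has a fixed number of unknowns $(\rho, \sigma_\epsilon^2, \sigma_\alpha^2, \sigma_\lambda^2, \sigma_0^2, c, d)$ as $N$ and $T$ increase. Therefore, the MLE (or quasi-MLE or GLS) of Equation 13.51 is consistent and asymptotically normally distributed.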
$^{5}$ For instance, if $y_{it}$ is also a function of exogenous variables, as in Equation 13.1.

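When $\alpha_i$ and $\lambda_t$ are fixed constants, we note that first differencing only eliminates $\alpha_i$ from the specification. The time-specific effects, $\Delta\lambda_t$, remain in Equation 13.47. To further eliminate $\Delta\lambda_t$, we note that the cross-sectional mean $\Delta\bar{y}_t = \frac{1}{N} \sum_{i=1}^{N} \Delta y_{it}$ is equal to

$$\Delta\bar{y}_t = \rho \Delta\bar{y}_{t-1} + \Delta\lambda_t + \Delta\bar{\epsilon}_t, \tag{13.54}$$

where $\Delta\bar{\epsilon}_t = \frac{1}{N} \sum_{i=1}^{N} \Delta\epsilon_{it}$. Taking the deviation of Equation 13.47 from Equation 13.54 yields

$$\Delta y_{it}^* = \rho \Delta y_{i,t-1}^* + \Delta\epsilon_{it}^*, \quad i = 1, \ldots, N, \; t = 2, \ldots, T, \tag{13.55}$$

where $\Delta y_{it}^* = (\Delta y_{it} - \Delta\bar{y}_t)$ and $\Delta\epsilon_{it}^* = (\Delta\epsilon_{it} - \Delta\bar{\epsilon}_t)$. The system (Equation 13.55) no longer involves $\alpha_i$ and $\lambda_t$.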
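That the transformation leading to Equation 13.55 removes both sets of effects can be verified on simulated data. The sketch below is illustrative only; the function names `simulate` and `demeaned_differences` and all parameter values are assumptions made for the example. Two panels are generated with identical $\alpha_i$, $\epsilon_{it}$, and $y_{i0}$ but different $\lambda_t$, and the transformed data are compared.

```python
import numpy as np

def simulate(lam, alpha, eps, y0, rho):
    """y_it = rho*y_{i,t-1} + alpha_i + lambda_t + eps_it for t = 1..T."""
    N, T = eps.shape
    y = np.empty((N, T + 1))
    y[:, 0] = y0
    for t in range(1, T + 1):
        y[:, t] = rho * y[:, t - 1] + alpha + lam[t - 1] + eps[:, t - 1]
    return y

def demeaned_differences(y):
    """Delta y*_it: first differences minus their cross-sectional mean, t = 1..T."""
    dy = np.diff(y, axis=1)
    return dy - dy.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
N, T, rho = 50, 6, 0.6
alpha = rng.normal(size=N)
eps = rng.normal(size=(N, T))
y0 = rng.normal(size=N)

y_a = simulate(np.zeros(T), alpha, eps, y0, rho)                     # no time effects
y_b = simulate(rng.normal(scale=5.0, size=T), alpha, eps, y0, rho)   # large time effects

dys_a, dys_b = demeaned_differences(y_a), demeaned_differences(y_b)
same_transformed_data = np.allclose(dys_a, dys_b)

# Equation 13.55: Delta y*_it - rho * Delta y*_{i,t-1} equals Delta eps*_it for t = 2..T.
d_eps = np.diff(eps, axis=1)
d_eps_star = d_eps - d_eps.mean(axis=0, keepdims=True)
residual = dys_b[:, 1:] - rho * dys_b[:, :-1]
matches_13_55 = np.allclose(residual, d_eps_star)

print(same_transformed_data, matches_13_55)
```

Both checks rely only on the linearity of the model: the time effects shift every individual's $\Delta y_{it}$ by the same amount, and the cross-sectional mean absorbs that shift.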
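Since

$$E[y_{i,t-j} \Delta\epsilon_{it}^*] = 0 \quad \text{for } j = 2, \ldots, t, \; t = 2, \ldots, T, \tag{13.56}$$

the $\frac{1}{2} T(T-1)$ orthogonality conditions can be represented as

$$E(W_i \Delta\tilde{\epsilon}_i^*) = 0, \tag{13.57}$$

where $\Delta\tilde{\epsilon}_i^* = (\Delta\epsilon_{i2}^*, \ldots, \Delta\epsilon_{iT}^*)'$,

$$W_i = \begin{pmatrix} q_{i2} & 0 & \cdots & 0 \\ 0 & q_{i3} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & q_{iT} \end{pmatrix}, \quad i = 1, \ldots, N,$$

and $q_{it} = (y_{i0}, y_{i1}, \ldots, y_{i,t-2})'$, $t = 2, 3, \ldots, T$. Following Arellano and Bond (1991), we can propose a generalized method of moments (GMM) estimator,$^{6}$

$$\tilde{\rho}_{GMM} = \left\{ \left[ \frac{1}{N} \sum_{i=1}^{N} \Delta\tilde{y}_{i,-1}^{*\prime} W_i' \right] \hat{\Psi}^{-1} \left[ \frac{1}{N} \sum_{i=1}^{N} W_i \Delta\tilde{y}_{i,-1}^* \right] \right\}^{-1} \left\{ \left[ \frac{1}{N} \sum_{i=1}^{N} \Delta\tilde{y}_{i,-1}^{*\prime} W_i' \right] \hat{\Psi}^{-1} \left[ \frac{1}{N} \sum_{i=1}^{N} W_i \Delta\tilde{y}_i^* \right] \right\}, \tag{13.58}$$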
$^{6}$ For ease of exposition, we have only considered the GMM that makes use of orthogonality conditions. For additional moment conditions such as homoscedasticity or initial observations see, e.g., Ahn and Schmidt (1995), Blundell and Bond (1998).

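where $\Delta\tilde{y}_i^* = (\Delta y_{i2}^*, \ldots, \Delta y_{iT}^*)'$, $\Delta\tilde{y}_{i,-1}^* = (\Delta y_{i1}^*, \ldots, \Delta y_{i,T-1}^*)'$, and

$$\hat{\Psi} = \frac{1}{N^2} \left[ \sum_{i=1}^{N} W_i \Delta\hat{\tilde{\epsilon}}_i^* \right] \left[ \sum_{i=1}^{N} W_i \Delta\hat{\tilde{\epsilon}}_i^* \right]', \tag{13.59}$$

with $\Delta\hat{\tilde{\epsilon}}_i^* = \Delta\tilde{y}_i^* - \Delta\tilde{y}_{i,-1}^* \tilde{\rho}$, where $\tilde{\rho}$ denotes some initial consistent estimator of $\rho$, say a simple instrumental variable estimator.

The asymptotic covariance matrix of $\tilde{\rho}_{GMM}$ can be approximated by

$$\text{asy. cov}(\tilde{\rho}_{GMM}) = \left\{ \left[ \sum_{i=1}^{N} \Delta\tilde{y}_{i,-1}^{*\prime} W_i' \right] \hat{\Psi}^{-1} \left[ \sum_{i=1}^{N} W_i \Delta\tilde{y}_{i,-1}^* \right] \right\}^{-1}. \tag{13.60}$$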
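The block-diagonal layout of the instrument matrix $W_i$ is easy to get wrong in practice, so a small construction sketch may help. The function name `instrument_matrix` and the toy numbers are assumptions for illustration; the layout follows the definition of $q_{it}$ given with Equation 13.57.

```python
import numpy as np

def instrument_matrix(y_i):
    """Build W_i for one individual from y_i = (y_i0, ..., y_iT).

    Column t-2 (for t = 2..T) holds q_it = (y_i0, ..., y_{i,t-2})' in its own
    block of rows, so W_i has shape (T(T-1)/2, T-1).
    """
    T = len(y_i) - 1
    W = np.zeros((T * (T - 1) // 2, T - 1))
    row = 0
    for t in range(2, T + 1):
        q = np.asarray(y_i[: t - 1], dtype=float)   # y_i0 .. y_{i,t-2}
        W[row : row + len(q), t - 2] = q
        row += len(q)
    return W

# Toy series with T = 3 (values are arbitrary).
W = instrument_matrix([10, 11, 12, 13])
print(W)
```

Multiplying `W` by the $(T-1)$-vector of transformed errors stacks the products $y_{i,t-j} \Delta\epsilon_{it}^*$ for $j = 2, \ldots, t$, which are exactly the terms appearing in Equation 13.56.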
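To implement the likelihood approach, we need to complete the system (Equation 13.55) by deriving the marginal distribution of $\Delta y_{i1}^*$ through continuous substitution,

$$\Delta y_{i1}^* = \sum_{j=0}^{m-1} \Delta\epsilon_{i,1-j}^* \rho^j = \Delta\tilde{\epsilon}_{i1}^*, \quad i = 1, \ldots, N. \tag{13.61}$$

Let $\Delta y_i^{*\prime} = (\Delta y_{i1}^*, \ldots, \Delta y_{iT}^*)$, $\Delta y_{i,-1}^{*\prime} = (0, \Delta y_{i1}^*, \ldots, \Delta y_{i,T-1}^*)$, and $\Delta\tilde{\epsilon}_i^{*\prime} = (\Delta\tilde{\epsilon}_{i1}^*, \Delta\epsilon_{i2}^*, \ldots, \Delta\epsilon_{iT}^*)$; the system

$$\Delta y_i^* = \Delta y_{i,-1}^* \rho + \Delta\tilde{\epsilon}_i^* \tag{13.62}$$

does not involve $\alpha_i$ and $\lambda_t$. The MLE conditional on $\omega = \operatorname{Var}(\Delta y_{i1}^*) / \sigma_\epsilon^2$ is identical to the GLS

$$\hat{\rho}_{GLS} = \left[ \sum_{i=1}^{N} \Delta y_{i,-1}^{*\prime} \tilde{A}^{-1} \Delta y_{i,-1}^* \right]^{-1} \left[ \sum_{i=1}^{N} \Delta y_{i,-1}^{*\prime} \tilde{A}^{-1} \Delta y_i^* \right], \tag{13.63}$$

where

$$\tilde{A} = \begin{pmatrix} \omega & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \ddots & & \vdots \\ \vdots & & \ddots & \ddots & -1 & 0 \\ 0 & 0 & \cdots & -1 & 2 & -1 \\ 0 & 0 & \cdots & 0 & -1 & 2 \end{pmatrix}. \tag{13.64}$$

The GLS is consistent and asymptotically normally distributed with covariance matrix equal to

$$\operatorname{Var}(\hat{\rho}_{GLS}) = \sigma_\epsilon^2 \left[ \sum_{i=1}^{N} \Delta y_{i,-1}^{*\prime} \tilde{A}^{-1} \Delta y_{i,-1}^* \right]^{-1}. \tag{13.65}$$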
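The structure of $\tilde{A}$ and the GLS formula in Equation 13.63 can be sketched as follows. This is a hedged illustration, not the authors' code: the function names `a_tilde` and `gls_rho`, the value $\omega = 3$, and the noise-free toy data are assumptions chosen so that the result can be checked by hand.

```python
import numpy as np

def a_tilde(T, omega):
    """T x T matrix of Equation 13.64: omega in the (1,1) cell, 2 elsewhere on
    the diagonal, and -1 on the first off-diagonals."""
    A = 2.0 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1)
    A[0, 0] = omega
    return A

def gls_rho(dy, dy_lag, omega):
    """GLS of Equation 13.63 for (N, T) arrays of transformed data."""
    A_inv = np.linalg.inv(a_tilde(dy.shape[1], omega))
    num = np.einsum("it,ts,is->", dy_lag, A_inv, dy)
    den = np.einsum("it,ts,is->", dy_lag, A_inv, dy_lag)
    return float(num / den)

# The 2 / -1 band is the covariance pattern of first-differenced i.i.d. errors.
T = 5
D = np.diff(np.eye(T), axis=0)     # (T-1) x T first-difference operator
band = D @ D.T                     # Cov(Delta eps_t) / sigma^2 for t = 2..T

# Noise-free toy data: dy = 0.7 * dy_lag, so GLS should return 0.7 (up to rounding).
rng = np.random.default_rng(5)
dy_lag = rng.normal(size=(4, T))
dy_lag[:, 0] = 0.0                 # the first lagged entry is zero by construction
rho_check = gls_rho(0.7 * dy_lag, dy_lag, omega=3.0)
print(rho_check)
```

Only the $(1,1)$ element of $\tilde{A}$ depends on the initial-condition parameter $\omega$; the remaining block is fixed by the differencing itself.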

Remark 13.7 The GLS with $\lambda_t$ present is basically of the same form as the GLS without the time-specific effects (i.e., $\lambda = 0$) (Hsiao, Pesaran, and Tahmiscioglu 2002) (Equation 13.25). However, there is an important difference between the two. The estimator (Equation 13.63) uses $\Delta y_{i,t-1}^*$ as the regressor for the equation $\Delta y_{it}^*$ (Equation 13.62), rather than $\Delta y_{i,t-1}$ as the regressor for the equation $\Delta y_{it}$ (Equation 13.47). If there are indeed common shocks that affect all the cross-sectional units, then the estimator Equation 13.25 is inconsistent while Equation 13.63 is consistent (for details, see Hsiao and Tahmiscioglu 2008). Note also that even when there are no time-specific effects, Equation 13.63 remains consistent, although it will not be as efficient as Equation 13.25.

Remark 13.8 The estimator (Equation 13.63) and the estimator Equation 13.58 remain consistent and asymptotically normally distributed when the effects are random because the transformation (Equation 13.54) effectively removes the individual- and time-specific effects from the specification. However, if the effects are indeed random, then the MLE or GLS of Equation 13.51 is more efficient.

Remark 13.9 The GLS (Equation 13.63) assumes known $\omega$. If $\omega$ is unknown, one may substitute it by a consistent estimator $\hat{\omega}$, then apply the feasible GLS. However, there is an important difference between the GLS and the feasible GLS in a dynamic setting. The feasible GLS is not asymptotically equivalent to the GLS when $T$ is finite. However, if both $N$ and $T \to \infty$ and $\lim (N/T) = c > 0$, then the FGLS will be asymptotically equivalent to the GLS (Hsiao and Tahmiscioglu 2008).

Remark 13.10 The MLE or GLS of Equation 13.63 can also be derived by treating $\Delta\lambda_t$ as fixed parameters in the system (Equation 13.47). Through continuous substitution, we have

$$\Delta y_{i1} = \lambda_1^* + \Delta\tilde{\epsilon}_{i1}, \tag{13.66}$$

where $\lambda_1^* = \sum_{j=0}^{m} \rho^j \Delta\lambda_{1-j}$ and $\Delta\tilde{\epsilon}_{i1} = \sum_{j=0}^{m} \rho^j \Delta\epsilon_{i,1-j}$. Let $\Delta y_i' = (\Delta y_{i1}, \ldots, \Delta y_{iT})$, $\Delta y_{i,-1}' = (0, \Delta y_{i1}, \ldots, \Delta y_{i,T-1})$, $\Delta\epsilon_i' = (\Delta\tilde{\epsilon}_{i1}, \ldots, \Delta\epsilon_{iT})$, and $\Delta\lambda' = (\lambda_1^*, \Delta\lambda_2, \ldots, \Delta\lambda_T)$; we may write

$$\underset{NT \times 1}{\Delta y} = \begin{pmatrix} \Delta y_1 \\ \vdots \\ \Delta y_N \end{pmatrix} = \begin{pmatrix} \Delta y_{1,-1} \\ \vdots \\ \Delta y_{N,-1} \end{pmatrix} \rho + (e_N \otimes I_T) \Delta\lambda + \begin{pmatrix} \Delta\epsilon_1 \\ \vdots \\ \Delta\epsilon_N \end{pmatrix} = \Delta y_{-1} \rho + (e_N \otimes I_T) \Delta\lambda + \Delta\epsilon. \tag{13.67}$$

If $\epsilon_{it}$ is i.i.d. normal with mean 0 and variance $\sigma_\epsilon^2$, then $\Delta\epsilon_i$ is independently normally distributed across $i$ with mean $0$ and covariance matrix $\sigma_\epsilon^2 \tilde{A}$, and $\omega = \operatorname{Var}(\Delta\tilde{\epsilon}_{i1}) / \sigma_\epsilon^2$.

The log-likelihood function of $\Delta y$ takes the form

$$\log L = -\frac{NT}{2} \log \sigma_\epsilon^2 - \frac{N}{2} \log |\tilde{A}| - \frac{1}{2\sigma_\epsilon^2} [\Delta y - \Delta y_{-1} \rho - (e_N \otimes I_T) \Delta\lambda]' (I_N \otimes \tilde{A}^{-1}) [\Delta y - \Delta y_{-1} \rho - (e_N \otimes I_T) \Delta\lambda]. \tag{13.68}$$

Taking the partial derivative of Equation 13.68 with respect to $\Delta\lambda$ and solving for $\Delta\lambda$ yields

$$\Delta\hat{\lambda} = (N^{-1} e_N' \otimes I_T)(\Delta y - \Delta y_{-1} \rho). \tag{13.69}$$

Substituting Equation 13.69 into Equation 13.68 yields the concentrated log-likelihood function

$$\log L_c = -\frac{NT}{2} \log \sigma_\epsilon^2 - \frac{N}{2} \log |\tilde{A}| - \frac{1}{2\sigma_\epsilon^2} (\Delta y^* - \Delta y_{-1}^* \rho)' (I_N \otimes \tilde{A}^{-1}) (\Delta y^* - \Delta y_{-1}^* \rho). \tag{13.70}$$

Maximizing Equation 13.70 conditional on $\omega$ yields Equation 13.63.

Remark 13.11 When $\rho$ approaches 1 and $\sigma_\alpha^2$ is large relative to $\sigma_\epsilon^2$, the GMM estimator of the form (Equation 13.58) suffers from the weak instrumental variables issue and performs poorly (e.g., Binder, Hsiao, and Pesaran 2005). On the other hand, the performance of the likelihood or GLS estimator (Equation 13.63) is not affected by these problems.

Remark 13.12 Hahn and Moon (2006) propose a bias-corrected estimator as

$$\tilde{\rho}_b = \tilde{\rho}_{cv} + \frac{1}{T} (1 + \tilde{\rho}_{cv}). \tag{13.71}$$

They show that when $N/T \to c$ as both $N$ and $T$ tend to infinity, where $0 < c < \infty$,

$$\sqrt{NT} (\tilde{\rho}_b - \rho) \Rightarrow N(0, 1 - \rho^2). \tag{13.72}$$

The limited Monte Carlo studies conducted by Hsiao and Tahmiscioglu (2008) to investigate the finite sample properties of the feasible GLS (FGLS), the GMM, and the bias-corrected (BC) estimator of Hahn and Moon (2006) have shown that in terms of bias and root mean square errors, FGLS dominates. However, the BC rapidly improves as $T$ increases. In terms of the closeness of actual size to the nominal size, again FGLS dominates and rapidly approaches the nominal size when $N$ or $T$ increases. The GMM also has actual sizes close to nominal sizes except for the cases when $\rho$ is close to unity (here $\rho = 0.8$). The BC has significant size distortion, presumably because the correction of bias is based on $\tilde{\rho}_{cv}$ and because of the use of an asymptotic covariance matrix that is significantly downward biased in finite samples.
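The correction in Equation 13.71 can be illustrated with a small simulation. The sketch below assumes that $\tilde{\rho}_{cv}$ denotes the covariance (within-group) estimator and, for brevity, simulates a model with individual effects only; the function names and all parameter values are illustrative assumptions, not the design used in the Monte Carlo studies cited above.

```python
import numpy as np

def within_estimator(y):
    """Covariance (within-group) estimator of rho from y of shape (N, T+1)."""
    y_cur, y_lag = y[:, 1:], y[:, :-1]
    y_cur = y_cur - y_cur.mean(axis=1, keepdims=True)   # remove individual means
    y_lag = y_lag - y_lag.mean(axis=1, keepdims=True)
    return float((y_lag * y_cur).sum() / (y_lag ** 2).sum())

def bias_corrected(rho_cv, T):
    """Equation 13.71: rho_cv + (1 + rho_cv) / T."""
    return rho_cv + (1.0 + rho_cv) / T

rng = np.random.default_rng(2)
N, T, rho = 2000, 20, 0.5
alpha = rng.normal(size=N)

y = np.empty((N, T + 1))
# Start each series from its stationary distribution given alpha_i (assumption).
y[:, 0] = alpha / (1 - rho) + rng.normal(size=N) / np.sqrt(1 - rho ** 2)
for t in range(1, T + 1):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=N)

rho_cv = within_estimator(y)
rho_b = bias_corrected(rho_cv, T)
print(rho_cv, rho_b)
```

In this design the within-group estimate is expected to fall below the true value of 0.5 by roughly $(1 + \rho)/T$, and the corrected estimate should lie noticeably closer to it.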

Remark 13.13 Hsiao and Tahmiscioglu (2008) also compared the FGLS and
GMM with and without the correction of time-specific effects in the presence
of both individual- and time-specific effects or in the presence of individual-
specific effects only. It is interesting to note that when both individual- and
time-specific effects are present, the biases and root mean square errors are
large for estimators assuming no time-specific effects. On the other hand, even
in the case of no time-specific effects in the true data generating process, there
is hardly any efficiency loss for the FGLS or GMM that makes the correction
for the presumed presence of time-specific effects. Therefore, if an investigator is
not sure whether the assumption of cross-sectional independence is valid, it
might be advisable to use estimators that take account of both individual- and
time-specific effects.
13.8 Estimation of Multiplicative Models
In this section we consider the estimation of Equation 13.1, where $v_{it}$ is assumed to be of the form

$$v_{it} = \alpha_i \lambda_t + \epsilon_{it}. \tag{13.73}$$

When $\alpha_i$ is independently distributed across $i$ with mean 0 and variance $\sigma_\alpha^2$ and $\lambda_t$ is independently distributed over $t$ with mean 0 and variance $\sigma_\lambda^2$, $E v_{it} = 0$, $E v_{it}^2 = \sigma_\epsilon^2 + \sigma_\alpha^2 \sigma_\lambda^2 = \sigma_v^2$, and $E v_{it} v_{is} = 0$ for $t \neq s$, $E v_{it} v_{js} = 0$ for $i \neq j$. In other words, Equation 13.1 has error terms that are uncorrelated over time and across individuals, with constant variance $\sigma_v^2$. Hence the least squares estimator is consistent and asymptotically normally distributed when either $N$ or $T$ or both tend to infinity.

When $\alpha_i$ and $\lambda_t$ are treated as fixed constants, the MLE are inconsistent if $T$ is finite for the same basic reason as in the additive model (Equation 13.2). Ahn, Lee, and Schmidt (2001), Bai (2007), Kiefer (1980), etc., have proposed nonlinear GMM and iterative LS estimators for the static model with multiplicative effects. Their nonlinear GMM approach can be similarly generalized to obtain a consistent estimator of $\rho$ (e.g., Hsiao 2008).

Let $\theta_t = \lambda_t / \lambda_{t-1}$; then

$$(y_{it} - \theta_t y_{i,t-1}) = \rho (y_{i,t-1} - \theta_t y_{i,t-2}) + (\epsilon_{it} - \theta_t \epsilon_{i,t-1}), \quad t = 2, \ldots, T. \tag{13.74}$$

It follows that

$$E[y_{i,t-j} (\epsilon_{it} - \theta_t \epsilon_{i,t-1})] = 0, \quad \text{for } j = 2, \ldots, t. \tag{13.75}$$
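The quasi-differencing identity in Equation 13.74 holds exactly for any data generated by Equations 13.45 and 13.73, which can be confirmed numerically. The following sketch uses arbitrary illustrative values; the only substantive assumption is that every $\lambda_t$ is kept away from zero so that $\theta_t$ is well defined.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, rho = 8, 6, 0.4
alpha = rng.normal(size=N)
lam = rng.uniform(0.5, 2.0, size=T + 1)    # lambda_0..lambda_T, bounded away from 0
eps = rng.normal(size=(N, T + 1))

y = np.empty((N, T + 1))
y[:, 0] = rng.normal(size=N)
for t in range(1, T + 1):
    y[:, t] = rho * y[:, t - 1] + alpha * lam[t] + eps[:, t]

max_gap = 0.0
for t in range(2, T + 1):
    theta = lam[t] / lam[t - 1]
    lhs = y[:, t] - theta * y[:, t - 1]
    rhs = rho * (y[:, t - 1] - theta * y[:, t - 2]) + (eps[:, t] - theta * eps[:, t - 1])
    max_gap = max(max_gap, float(np.max(np.abs(lhs - rhs))))

print(max_gap)   # only floating-point rounding error should remain
```

The term $\alpha_i (\lambda_t - \theta_t \lambda_{t-1})$ vanishes by the definition of $\theta_t$, which is why no individual effect appears on the right-hand side.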

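Let

$$\underset{\frac{T(T-1)}{2} \times (T-1)}{W_i} = \begin{pmatrix} q_{i2} & 0 & \cdots & 0 \\ 0 & q_{i3} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & q_{iT} \end{pmatrix}, \quad \underset{(T-1) \times (T-1)}{\Theta} = \begin{pmatrix} \theta_2 & 0 & \cdots & 0 \\ 0 & \theta_3 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \theta_T \end{pmatrix},$$

$$q_{it}' = (y_{i0}, \ldots, y_{i,t-2}), \; t = 2, \ldots, T, \quad \epsilon_i = (\epsilon_{i2}, \ldots, \epsilon_{iT})', \quad \epsilon_{i,-1} = (\epsilon_{i1}, \ldots, \epsilon_{i,T-1})'.$$

Then a GMM estimator of $\rho$ and $\Theta$ can be obtained from the moment conditions

$$E[W_i (\epsilon_i - \Theta \epsilon_{i,-1})] = 0. \tag{13.76}$$

The nonlinear GMM estimators of $\rho$ and $\Theta$ amount to applying nonlinear three-stage least squares to the system

$$y_i = [\rho I_{T-1} + \Theta] y_{i,-1} - \rho \Theta y_{i,-2} + \epsilon_i - \Theta \epsilon_{i,-1}, \quad i = 1, \ldots, N, \tag{13.77}$$

using $W_i$ as instruments, where $y_i = (y_{i2}, \ldots, y_{iT})'$, $y_{i,-1} = (y_{i1}, \ldots, y_{i,T-1})'$, and $y_{i,-2} = (y_{i0}, \ldots, y_{i,T-2})'$.

The nonlinear GMM estimators of $\rho$ and $\theta_t$ are consistent and asymptotically normally distributed as $N \to \infty$. From the $\theta_t$, we can solve for $\lambda_t$ through the normalization rule $\lambda_1 = 1$ or $\sum_{t=1}^{T} \lambda_t^2 = 1$. From $\rho$ and $\lambda_t$, we obtain

$$\hat{\alpha}_i = \frac{1}{\sum_{t=1}^{T} \hat{\lambda}_t^2} \left[ \sum_{t=1}^{T} \hat{\lambda}_t y_{it} - \hat{\rho} \sum_{t=1}^{T} \hat{\lambda}_t y_{i,t-1} \right], \quad i = 1, \ldots, N. \tag{13.78}$$

The estimator (Equation 13.78) is consistent if $T \to \infty$.

The implementation of nonlinear GMM is quite complicated. Pesaran (2006, 2007) notes that

$$\bar{y}_t = \rho \bar{y}_{t-1} + \bar{\alpha} \lambda_t + \bar{\epsilon}_t, \tag{13.79}$$

where

$$\bar{y}_t = \frac{1}{N} \sum_{i=1}^{N} y_{it}, \quad \bar{\alpha} = \frac{1}{N} \sum_{i=1}^{N} \alpha_i, \quad \bar{\epsilon}_t = \frac{1}{N} \sum_{i=1}^{N} \epsilon_{it}.$$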

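When $N \to \infty$, $\bar{\epsilon}_t \to 0$. Assuming $\bar{\alpha} \neq 0$, substituting $\lambda_t = \bar{\alpha}^{-1} (\bar{y}_t - \rho \bar{y}_{t-1})$ into Equation 13.45 yields

$$y_{it} = \rho y_{i,t-1} + \gamma_{1i} \bar{y}_t + \gamma_{2i} \bar{y}_{t-1} + \epsilon_{it}, \tag{13.80}$$

where $\gamma_{1i} = \alpha_i / \bar{\alpha}$ and $\gamma_{2i} = -\rho \alpha_i / \bar{\alpha}$. Therefore, Pesaran (2006, 2007) suggests estimating the cross-sectional mean augmented regression (Equation 13.80) and shows that as both $N$ and $T \to \infty$, the least squares estimator of Equation 13.80 yields a consistent and asymptotically normally distributed $\hat{\rho}$.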
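A least squares fit of the augmented regression in Equation 13.80 can be sketched as follows. This is an illustrative implementation under stated assumptions, not Pesaran's own procedure: the coefficients on $\bar{y}_t$ and $\bar{y}_{t-1}$ are left unrestricted for each individual and are partialled out with a projection matrix before the common $\rho$ is estimated by pooled least squares. The parameter values and the distributions of `alpha` and `lam` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, rho = 200, 200, 0.5
alpha = rng.normal(1.0, 0.5, size=N)    # non-zero mean, as the text requires
lam = rng.normal(size=T + 1)

y = np.zeros((N, T + 1))                # y_i0 = 0 for simplicity (assumption)
for t in range(1, T + 1):
    y[:, t] = rho * y[:, t - 1] + alpha * lam[t] + rng.normal(size=N)

ybar = y.mean(axis=0)
Z = np.column_stack([ybar[1:], ybar[:-1]])           # (ybar_t, ybar_{t-1}), t = 1..T
# Residual-maker matrix: removes, for every i, the part explained by Z.
M = np.eye(T) - Z @ np.linalg.solve(Z.T @ Z, Z.T)

y_cur = y[:, 1:] @ M                    # M is symmetric, so each row becomes (M y_i)'
y_lag = y[:, :-1] @ M
rho_hat = float((y_lag * y_cur).sum() / (y_lag ** 2).sum())
print(rho_hat)
```

With both dimensions reasonably large, `rho_hat` should be close to the value of `rho` used in the simulation, in line with the large-$N$, large-$T$ result quoted above.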
13.9 Test of Additive versus Multiplicative Model
A multiplicative model implies a departure from additivity in the effects on outcomes. It is shown by Bai (2007) that the additive model is embedded in the model of multiple common factors with heterogeneous responses by letting

$$\boldsymbol{\alpha}_i = \begin{pmatrix} \alpha_i \\ 1 \end{pmatrix}, \quad \boldsymbol{\lambda}_t = \begin{pmatrix} 1 \\ \lambda_t \end{pmatrix};$$

then Equation 13.2 becomes

$$v_{it} = \boldsymbol{\alpha}_i' \boldsymbol{\lambda}_t + \epsilon_{it}. \tag{13.81}$$

When $N \to \infty$, one may solve for $\boldsymbol{\lambda}_t$ from Equation 13.79, which yields

$$\hat{\boldsymbol{\lambda}}_t = (\bar{\boldsymbol{\alpha}} \bar{\boldsymbol{\alpha}}')^{-} \bar{\boldsymbol{\alpha}} (\bar{y}_t - \rho \bar{y}_{t-1}), \tag{13.82}$$

where $(\bar{\boldsymbol{\alpha}} \bar{\boldsymbol{\alpha}}')^{-}$ denotes the generalized inverse of $(\bar{\boldsymbol{\alpha}} \bar{\boldsymbol{\alpha}}')$. Substituting Equation 13.82 into Equation 13.45 again yields Equation 13.80. Therefore, the Pesaran cross-sectional mean augmented regression of Equation 13.80 is consistent whether the unobserved heterogeneity is additive or multiplicative, but Equation 13.80 is inefficient if the unobserved heterogeneities are additive compared to Equation 13.58 or Equation 13.63. However, if the underlying model is multiplicative, Equation 13.80 is consistent, but Equation 13.58 and Equation 13.63 are not. Therefore, a Hausman-type specification test can be proposed to test the null:

$H_0$: Equation 13.2 holds

versus

$H_1$: Equation 13.2 does not hold

by considering the test statistic

$$\frac{\hat{\rho}_A - \hat{\rho}_m}{\sqrt{\operatorname{Var}(\hat{\rho}_m) - \operatorname{Var}(\hat{\rho}_A)}} \sim N(0, 1), \tag{13.83}$$

where $\hat{\rho}_A$ denotes the efficient estimator of Equation 13.1 under the additive assumption (Equation 13.2) and $\hat{\rho}_m$ is the estimator of Equation 13.1 under the multiplicative assumption (Equation 13.73).
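Computing the statistic in Equation 13.83 is straightforward once the two estimates and their variances are available. The helper below is a hypothetical sketch: the function name, the input numbers, and the 5% two-sided critical value of 1.96 are assumptions for illustration. It also guards against a non-positive variance difference, which can occur in finite samples.

```python
import math

def hausman_statistic(rho_a, var_a, rho_m, var_m):
    """Equation 13.83: (rho_A - rho_m) / sqrt(Var(rho_m) - Var(rho_A))."""
    diff_var = var_m - var_a
    if diff_var <= 0:
        raise ValueError("Var(rho_m) - Var(rho_A) must be positive")
    return (rho_a - rho_m) / math.sqrt(diff_var)

# Made-up inputs, purely to show the call.
stat = hausman_statistic(rho_a=0.50, var_a=0.003, rho_m=0.45, var_m=0.004)
reject_additive = abs(stat) > 1.96      # two-sided test at the 5% level
print(stat, reject_additive)
```

A large absolute value of the statistic is evidence against the additive specification.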
13.10 Concluding Remarks
In this chapter we review three fundamental issues of modeling dynamic panel data in the presence of unobserved heterogeneity across individuals and over time: fixed effects versus random effects modeling of unobserved individual- and time-specific heterogeneity; additive versus multiplicative effects; and the likelihood versus method of moments approach.

We have not discussed issues of modeling multivariate dynamic panel models (e.g., Binder, Hsiao, and Pesaran 2005), panel unit root tests (e.g., Breitung and Pesaran 2008; Moon and Perron 2004; Phillips and Sul 2003), parameter heterogeneity (e.g., Hsiao and Pesaran 2008), etc. However, in principle, those issues can also be put in these perspectives.

The advantage of the fixed effects specification is that there is no need to specify the relations between the unobserved effects and observed conditional (or explanatory) variables. The disadvantages are that (1) unless both the cross-sectional dimension and the time dimension of panels are large, the fixed effects specification introduces incidental parameters issues on the individual-specific effects, $\alpha_i$, if the time dimension is fixed, and on the time-specific effects, $\lambda_t$, if the cross-sectional dimension is small; (2) the impact of time-invariant but individual-specific variables, such as gender or socio-demographic background variables, in the presence of additive individual-specific effects, and the impact of time-specific but individual-invariant variables, such as price and some macro-variables, in the presence of additive time-specific effects, are unidentified; and (3) the fixed effects inference only makes use of within-group variation. The between-group information is ignored.

The advantages of the random effects specification are that (1) there are no incidental parameter issues; (2) the impacts of observed individual-specific but time-invariant and individual-invariant but time-varying variables can be identified; and (3) both the within-group and between-group information are used for inference. Since the between-group variation in general is much larger than the within-group variation, the RE specification can lead to much more efficient use of sample information. The disadvantage is that the relationship between the unobserved effects and observed conditional variables needs to be specified. In short, the advantages of the random effects specification are the disadvantages of the fixed effects specification, and the advantages of the fixed effects specification are the disadvantages of the random effects specification.

Statistical inference procedures for additive effects models are simpler than those for multiplicative effects models. However, if the data generating process calls for a multiplicative effects specification, statistical inference procedures based on an additive effects specification will be misleading. On the other hand, if the effects are additive, statistical procedures based on multiplicative effects will also be misleading. In this chapter, we have proposed a testing procedure for additive versus multiplicative effects.

Inference procedures based on the likelihood and moments approaches are reviewed. The likelihood approach uses a fixed number of moment conditions. The number of moment conditions used in the moments approach increases at the order of the square of the time series dimension of the panel. In finite samples the moments approach is likely to generate larger bias than the likelihood approach, as shown in the Monte Carlo studies by Binder, Hsiao, and Pesaran (2005), Hsiao and Tahmiscioglu (2008), Hsiao, Pesaran, and Tahmiscioglu (2002), Ziliak (1997), etc. Moreover, if the observed outcomes in the time dimension are persistent (when the coefficient of the lagged variable, $\rho$, is close to one) or if the variance of individual-specific effects is large relative to the overall variance, the moments approach either breaks down or suffers from the weak instrumental variables issue, but the performance of the likelihood approach is not affected.
13.11 Acknowledgment
I would like to thank a referee for helpful comments.
References
Ahn, S. C., and P. Schmidt. 1995. Efficient Estimation of Models for Dynamic Panel
Data. Journal of Econometrics 68:5–27.
Ahn, S. C., Y. H. Lee, and P. Schmidt. 2001. GMM Estimation of Linear Panel Data
Models with Time-Varying Individual Effects. Journal of Econometrics 101:219–255.
Amemiya, T., and W. A. Fuller. 1967. A Comparative Study of Alternative Estimators
in a Distributed-Lag Model. Econometrica 35:509–529.
Amemiya, T., and T. E. MaCurdy. 1986. Instrumental-Variable Estimation of an Error-
Components Model. Econometrica 54:869–880.
Anderson, T. W., and C. Hsiao. 1981. Estimation of Dynamic Models with Error Com-
ponents. Journal of the American Statistical Association 76:598–606.
Anderson, T. W., and C. Hsiao. 1982. Formulation and Estimation of Dynamic Models
Using Panel Data. Journal of Econometrics 18:47–82.
Arellano, M., and S. R. Bond. 1991. Some Tests of Specification for Panel Data: Monte
Carlo Evidence and an Application to Employment Equations. Review of Economic
Studies 58:277–297.
Arellano, M., and O. Bover. 1995. Another Look at the Instrumental Variable Estimation
of Error-Components Models. Journal of Econometrics 68:29–51.
Bai, J. 2009. Panel Data Models with Interactive Fixed Effects. Econometrica 77:1229–
1279.

Bhargava, A., and D. Sargan. 1983. Estimating Dynamic Random Effects Models from
Panel Data Covering Short Time Periods. Econometrica 51:1635–1659.
Binder, M., C. Hsiao, and M. H. Pesaran. 2005. Estimation and Inference in Short Panel
Vector Autoregressions with Unit Roots and Cointegration. Econometric Theory
21:795–837.
Blundell, R., and S. Bond. 1998. Initial Conditions and Moment Restrictions in Dynamic
Panel Data Models. Journal of Econometrics 87:115–143.
Breitung, J., and M. H. Pesaran. 2008. Unit Roots and Cointegration in Panels. In The
Econometrics of Panel Data. Berlin: Springer. pp. 279–322.
Breusch, T., G. E. Mizon, and P. Schmidt. 1989. Efficient Estimation Using Panel Data.
Econometrica 57:695–700.
Cheng, L. K., and Y. K. Kwan. 2000. The Location of Foreign Direct Investment in
Chinese Regions – Further Analysis of Labor Quality. In The Role of Foreign Direct
Investment in East Asian Economic Development. T. Ito and A.O. Krueger (eds).
Chicago: Chicago University Press. pp. 213–238.
Hahn, J., and G. Kuersteiner. 2002. Asymptotically Unbiased Inference for a Dynamic
Panel Model with Fixed Effects When Both n and T are Large. Econometrica
70:1639–1659.
Hahn, J., and H. R. Moon. 2006. Reducing Bias of MLE in a Dynamic Panel Model.
Econometric Theory 22:499–512.
Harris, M. N., L. Matyas, and P. Sevestre. 2008. Dynamic Models for Short Panels. In
The Econometrics of Panel Data. 3rd ed. L. Matyas and P. Sevestre (eds). Berlin:
Springer. pp. 249–278.
Hayakawa, K. 2009. On the Effect of Mean-Nonstationary Initial Conditions in Dy-
namic Panel Data Models. Journal of Econometrics (forthcoming).
Hsiao, C. 2003. Analysis of Panel Data. 2nd ed. Econometric Society Monograph 36,
New York: Cambridge University Press.
Hsiao, C. 2007. Panel Data Analysis – Advantages and Challenges. Test 16:1–22.
Hsiao, C. 2008. Dynamic Panel Data Models with Interactive Effects. mimeo.
Hsiao, C., and M. H. Pesaran. 2008. Random Coefficients Models. In The Economet-
rics of Panel Data. 3rd ed. L. Matyas and P. Sevestre (eds). Berlin: Springer.
pp. 187–216.
Hsiao, C., M. H. Pesaran, and A. K. Tahmiscioglu. 2002. Maximum Likelihood Estima-
tion of Fixed Effects Dynamic Panel Data Models Covering Short Time Periods.
Journal of Econometrics 109:107–150.
Hsiao, C., and A. K. Tahmiscioglu. 2008. Estimation of Dynamic Panel Data Models
with Both Individual- and Time-Specific Effects. Journal of Statistical Planning and
Inference 138:2698–2721.
Kiefer, N. 1980. Estimation of Fixed Effect Models for Time Series of Cross-Sections
with Arbitrary Intertemporal Covariance. Journal of Econometrics 14:195–202.
Moon, H. R. and B. Perron. 2004. Testing for a Unit Root in Panels with Dynamic
Factors. Journal of Econometrics 122:81–126.
Nerlove, M. 2002. Essays in Panel Data Econometrics. Cambridge: Cambridge University
Press.
Neyman, J., and E. Scott. 1948. Consistent Estimates Based on Partially Consistent
Observations. Econometrica 16:1–32.
Pesaran, M. H. 2006. Estimation and Inference in Large Heterogeneous Panels with
Cross-Section Dependence. Econometrica 74:967–1012.
Pesaran, M. H. 2007. A Simple Panel Unit Root Test in the Presence of Cross-Section
Dependence. Journal of Applied Econometrics 22:265–312.

Phillips, P.C.B., and D. Sul. 2003. Dynamic Panel Estimation and Homogeneity Testing
Under Cross-Section Dependence. Econometrics Journal 6: 217–259.
Ziliak, J. P. 1997. Efficient Estimation with Panel Data When Instruments Are Prede-
termined: An Empirical Comparison of Moment-Condition Estimators. Journal
of Business and Economic Statistics 15:419–431.

P1: NARESH CHANDRA
November 12, 2010 18:3 C7035 C7035˙C014
14
A Unified Estimation Approach for Spatial
Dynamic Panel Data Models: Stability,
Spatial Co-integration, and Explosive Roots
Lung-fei Lee and Jihai Yu
CONTENTS
14.1 Introduction
14.2 The Model
14.2.1 The DGP
14.2.2 Data Transformation
14.2.3 The Log-Likelihood Function
14.3 Asymptotic Properties of QMLE
14.3.1 Consistency
14.3.2 Asymptotic Distribution
14.3.3 Bias Correction
14.3.4 Testing
14.4 Monte Carlo Results
14.5 Conclusion
Appendices
References
14.1 Introduction
In recent decades, there has been a growing literature on the estimation of dynamic panel data models (see Phillips and Moon 1999; Hahn and Kuersteiner 2002; Alvarez and Arellano 2003; Hahn and Newey 2004, etc.). For panel data with spatial interactions, Kapoor, Kelejian, and Prucha (2007) extend the asymptotic analysis of the method of moments estimators to a spatial panel model with error components, where $T$ is finite. Baltagi, Song, Jung, and Koh (2007) consider the testing of spatial and serial dependence in an extended model, where serial correlation on each spatial unit over time and spatial dependence across spatial units are allowed in the disturbances. Su and Yang (2007) study the dynamic panel data with spatial error and random effects. These panel models specify the spatial correlation by including spatially correlated disturbances but do not incorporate a spatial autoregressive term in the regression equation. With large $n$ and moderate or large $T$, Korniotis (2005) studies a time-space recursive model where only an individual time lag and a spatial time lag are present but not a contemporaneous spatial lag. A general model could be the spatial dynamic panel data (SDPD) model, where a contemporaneous spatial lag is also included. Yu, de Jong, and Lee (2007, 2008) and Yu and Lee (2010) study, respectively, the spatial cointegration, stable, and unit root SDPD models, where the individual time lag, spatial time lag, and contemporaneous spatial lag are all included.
When the SDPD model has time dummy effects, we might need to transform the data to reduce the possible bias caused by the estimation of time effects (see Lee and Yu, 2010a), especially when $n$ is proportional to $T$, or $n$ is small relative to $T$. Yu, de Jong, and Lee (2007) have a different bias correction procedure from that of the stable case in Yu, de Jong, and Lee (2008). In this chapter, we propose a data transformation approach based on a spatial difference operator, which can eliminate the time dummy effects as well as possible unstable and/or explosive components. After the data transformation, we can estimate the model by the method of maximum likelihood (ML) or quasi-maximum likelihood (QML) similar to Yu, de Jong, and Lee (2008), where there are neither time dummy effects nor unstable and explosive components. We derive the asymptotics for the ML estimator (MLE) and QML estimator (QMLE). We propose a bias correction procedure that can be applied to different types of DGPs.
This chapter is organized as follows. In Section 14.2, the model is presented.
We show that the stochastic process can be decomposed into stable, unstable
or explosive, and time components. A spatial difference operator motivated
by the spatial co-integration can provide a unified data transformation to
eliminate the time component and the possible unstable or explosive compo-
nents. We explain our method of estimation, which is a concentrated QML.
Section 14.3 establishes the consistency and asymptotic distribution of the
QMLE of the unified transformation approach. A bias correction procedure
is also proposed. A Monte Carlo study is conducted in Section 14.4 to inves-
tigate finite sample performance of the estimators under different DGPs, and
also the power of hypothesis testing of spatial co-integration using this uni-
fied approach. Section 14.5 concludes the chapter. Some useful lemmas and
proofs are collected in the appendices.
14.2 The Model
14.2.1 The DGP

Consider the general SDPD model:

$$Y_{nt} = \lambda_0 W_n Y_{nt} + \gamma_0 Y_{n,t-1} + \rho_0 W_n Y_{n,t-1} + X_{nt} \beta_0 + c_{n0} + \alpha_{t0} l_n + V_{nt}, \quad t = 1, 2, \ldots, T, \tag{14.1}$$
where Y
nt
= (y
1t
,y
2t
, ,y
nt
)

and V
nt
= (v
1t
,v
2t
, ,v
nt
)

are n ×1 column vec-
tors, and v
it

is i.i.d. acrossi and t withzero mean and variance ␴
2
0
. W
n
is an n×n
nonstochastic spatial weights matrix, X
nt
is an n × k matrix of nonstochastic
regressors, c
n0
is an n × 1 column vector of individual fixed effects, ␣
t0
is a
scalar of time effect, and l
n
is an n ×1 column vector of ones.
1
Therefore, the
total number of parameters in this model is equal to the sum of the number
of individuals n and the number of time periods T, plus the dimension of the
common parameters (␥, ␳, ␤

, ␭, ␴
2
)

which is k + 4. In practice, W
n
is usually

row-normalized with zero diagonals. A row-normalized W
n
has the property
W
n
l
n
= l
n
. The row-normalization of W
n
ensures that all the weights are be-
tween 0 and 1 and weighting operations can be interpreted as an average
of the neighboring values. In this chapter, the row-normalization feature is
imposed for our estimation approach.
Define $S_n(\lambda) = I_n - \lambda W_n$ and $S_n \equiv S_n(\lambda_0) = I_n - \lambda_0 W_n$. Then, presuming that $S_n$ is invertible and denoting $A_n = S_n^{-1}(\gamma_0 I_n + \rho_0 W_n)$, Equation 14.1 can be rewritten as
$$Y_{nt} = A_n Y_{n,t-1} + S_n^{-1}X_{nt}\beta_0 + S_n^{-1}c_{n0} + \alpha_{t0}S_n^{-1}l_n + S_n^{-1}V_{nt}. \tag{14.2}$$
In the SDPD model, when all the eigenvalues of $A_n$ are smaller than 1, we have the stable case. When some eigenvalues of $A_n$ are equal to 1 but not all being 1, we have the spatial co-integration case. When some of them are greater than 1, we have the explosive case. Let $\varpi_n = \mathrm{diag}\{\varpi_{n1}, \varpi_{n2}, \ldots, \varpi_{nn}\}$ be the $n \times n$ diagonal eigenvalues matrix of $W_n$ such that $W_n = R_n\varpi_n R_n^{-1}$, where $R_n$ is the corresponding eigenvector matrix. As $A_n = S_n^{-1}(\gamma_0 I_n + \rho_0 W_n)$, the eigenvalues matrix of $A_n$ is $D_n = (I_n - \lambda_0\varpi_n)^{-1}(\gamma_0 I_n + \rho_0\varpi_n)$ such that $A_n = R_n D_n R_n^{-1}$.
When $W_n$ is row-normalized, all its eigenvalues are less than or equal to 1 in absolute value, and at least one of them equals 1. Let $m_n$ be the number of unit eigenvalues of $W_n$ and let the first $m_n$ eigenvalues of $W_n$ be the unit ones. Hence, $D_n$ can be decomposed into two parts, one corresponding to the unit eigenvalues of $W_n$, and the other corresponding to the eigenvalues of $W_n$ which are smaller than 1. Define $J_n = \mathrm{diag}\{1_{m_n}', 0, \ldots, 0\}$ with $1_{m_n}$ being an $m_n \times 1$ vector of ones and $\tilde{D}_n = \mathrm{diag}\{0, \ldots, 0, d_{n,m_n+1}, \ldots, d_{nn}\}$, where $|d_{ni}| < 1$ for $i = m_n+1, \ldots, n$ is assumed.² As $J_n \cdot \tilde{D}_n = 0$, we have $A_n^h = \left(\frac{\gamma_0+\rho_0}{1-\lambda_0}\right)^h R_n J_n R_n^{-1} + B_n^h$, where $B_n^h = R_n\tilde{D}_n^h R_n^{-1}$ for any $h = 1, 2, \ldots$.
Hence, depending on the value of $\frac{\gamma_0+\rho_0}{1-\lambda_0}$, we have three cases. As $|\lambda_0| < 1$, which will be maintained under Assumptions 1 and 3 (see Section 14.3), we have the stable case when $\gamma_0+\rho_0+\lambda_0 < 1$; the spatial co-integration case when $\gamma_0+\rho_0+\lambda_0 = 1$ but $\gamma_0 \neq 1$; and the explosive case when $\gamma_0+\rho_0+\lambda_0 > 1$. For the stable case, the rate of convergence of the QMLEs is $\sqrt{nT}$, as shown in Yu, de Jong, and Lee (2008). For the spatial co-integration case where $Y_{nt}$ and
¹ Due to the presence of fixed individual and time effects, $X_{nt}$ will not include time invariant or individual invariant regressors.
² We note that $d_{ni} = (\gamma_0 + \rho_0\varpi_{ni})/(1 - \lambda_0\varpi_{ni})$. Hence, if $\gamma_0 + \lambda_0 + \rho_0 < 1$, we have $d_{ni} < 1$ as $|\varpi_{ni}| \le 1$. Some additional conditions are needed to ensure that $d_{ni} > -1$. See Appendix A.1.

$W_n Y_{nt}$ are spatially co-integrated, Yu, de Jong, and Lee (2007) show that the QMLEs for such a model are $\sqrt{nT}$ consistent and asymptotically normal, but the presence of the unstable components will make the estimators' asymptotic variance matrix singular. Consequently, a linear combination of the spatial and dynamic effects estimates can converge at a higher rate.³ In addition to the above stable case and the spatial co-integration case, we may also have an explosive case in the event that some eigenvalues of $A_n$ are greater than unity in absolute value.⁴ In this chapter, we propose a unified transformation approach that can be used to estimate all three cases, namely, the stable, spatially co-integrated, and explosive cases.
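The classification into cases can be checked numerically. The sketch below is illustrative only: the ring-shaped weights matrix, the function names `ring_weights` and `classify_sdpd`, the tolerance, and the parameter values are all assumptions made for the example. It computes the eigenvalues of $A_n = S_n^{-1}(\gamma_0 I_n + \rho_0 W_n)$ and labels the process accordingly; eigenvalues on the unit circle other than 1 are not treated specially.

```python
import numpy as np

def ring_weights(n):
    """Row-normalized weights on a ring: two neighbors, weight 1/2 each (hypothetical example)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def classify_sdpd(W, lam, gamma, rho, tol=1e-8):
    """Label the SDPD process from the eigenvalues of A_n (illustrative helper)."""
    n = W.shape[0]
    I = np.eye(n)
    A = np.linalg.solve(I - lam * W, gamma * I + rho * W)   # A_n = S_n^{-1}(gamma I + rho W)
    d = np.linalg.eigvals(A)
    if np.max(np.abs(d)) > 1 + tol:
        return "explosive"
    unit = np.isclose(d, 1.0, atol=tol)
    if unit.all():
        return "pure unit root"           # every eigenvalue of A_n equals 1
    if unit.any():
        return "spatial co-integration"   # some, but not all, eigenvalues equal 1
    return "stable"

W = ring_weights(7)
labels = {
    (0.2, 0.3, 0.1): classify_sdpd(W, 0.2, 0.3, 0.1),   # lam + gamma + rho < 1
    (0.2, 0.5, 0.3): classify_sdpd(W, 0.2, 0.5, 0.3),   # sum equals 1, gamma != 1
    (0.3, 0.6, 0.3): classify_sdpd(W, 0.3, 0.6, 0.3),   # sum exceeds 1
}

# The eigenvalues of A_n are d_ni = (gamma + rho*w_i) / (1 - lam*w_i).
w = np.linalg.eigvalsh(W)                 # the ring matrix is symmetric
d_formula = np.sort((0.3 + 0.1 * w) / (1 - 0.2 * w))
d_direct = np.sort(np.linalg.eigvals(
    np.linalg.solve(np.eye(7) - 0.2 * W, 0.3 * np.eye(7) + 0.1 * W)).real)
```

Working with the eigenvalues of $A_n$ directly, rather than with the sum $\lambda_0+\gamma_0+\rho_0$ alone, also distinguishes the pure unit root case $\gamma_0 = 1$.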
In earlier studies of the SDPD model, Yu, de Jong, and Lee (2007, 2008) consider the QMLE of the model with only the individual fixed effects. Subsequently, Lee and Yu (2010a) study the SDPD model with additional time effects when the process is stable. They propose a data transformation based on the deviation from the cross-sectional mean, $I_n - \frac{1}{n}l_n l_n'$, to eliminate the time effects. That approach may be applied to study the unstable SDPD models with time effects but might not be able to eliminate unstable or explosive components. In this chapter, we report the use of a spatial difference operator, $I_n - W_n$, which may eliminate not only the time dummy effects, but also the possible unstable or explosive components generated from the spatial co-integration or explosive roots. This implies that the spatial difference transformation can be applied to DGPs with stability, spatial co-integration, or explosive roots. The asymptotics of the resulting estimates can then be easily established for these DGPs. Thus, the transformation $I_n - W_n$ provides a unified estimation procedure for SDPD models.⁵
Denote $W_n^u = R_n J_n R_n^{-1}$. Then, for $t \ge 0$, $Y_{nt}$ can be decomposed into a sum of a possible stable part, a possible unstable or explosive part, and a time effect part (see Appendix A.2 for proof)
$$Y_{nt} = Y_{nt}^{u} + Y_{nt}^{s} + Y_{nt}^{\alpha}, \tag{14.3}$$
³ When $\gamma_0 + \lambda_0 + \rho_0 = 1$ and $\gamma_0 = 1$, the asymptotic properties of estimators are considered in Yu and Lee (2010). The QML estimate of the dynamic coefficient is $\sqrt{nT^3}$ consistent and the estimates of other parameters are $\sqrt{nT}$ consistent, and they are all asymptotically normal. Also, the sum of the contemporaneous and dynamic spatial effects will converge at the $\sqrt{nT^3}$ rate.
⁴ For the autoregressive AR(1) process in time series, asymptotic properties of the ordinary least squares estimator have been investigated in White (1958, 1959), Anderson (1959), Nielsen (2001, 2005) and Phillips and Magdalinos (2007). For the SDPD, due to its complexity, properties of a possible QMLE have not been investigated.
⁵ We note that the spatial difference operator can be applied to cross-sectional units. However, its function is different from the time difference operator for a time series. The spatial difference operator does not eliminate pure time series unit roots or explosive roots. Thus, the unified approach cannot be applied to the pure unit root SDPD models in Yu and Lee (2010).

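where
$$Y_{nt}^{s} = \sum_{h=0}^{\infty} B_n^h S_n^{-1}(c_{n0} + X_{n,t-h}\beta_0 + V_{n,t-h}),$$
$$Y_{nt}^{u} = W_n^u\left\{\left(\frac{\gamma_0+\rho_0}{1-\lambda_0}\right)^{t+1} Y_{n,-1} + \frac{1}{1-\lambda_0}\sum_{h=0}^{t}\left(\frac{\gamma_0+\rho_0}{1-\lambda_0}\right)^{h}(c_{n0} + X_{n,t-h}\beta_0 + V_{n,t-h})\right\},$$
$$Y_{nt}^{\alpha} = \frac{1}{1-\lambda_0}\, l_n \sum_{h=0}^{t}\alpha_{t-h,0}\left(\frac{\gamma_0+\rho_0}{1-\lambda_0}\right)^{h}.$$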
The $Y_{nt}^{u}$ can be an unstable component when $\frac{\gamma_0+\rho_0}{1-\lambda_0} = 1$, which occurs when $\gamma_0 + \rho_0 + \lambda_0 = 1$ and $\lambda_0 \neq 1$. When $\gamma_0 + \rho_0 + \lambda_0 > 1$, it implies $\frac{\gamma_0+\rho_0}{1-\lambda_0} > 1$ and, hence, $Y_{nt}^{u}$ can be explosive. The $Y_{nt}^{\alpha}$ can be rather complicated as it depends on what exactly the time dummies represent. The $Y_{nt}^{\alpha}$ can be explosive when $\alpha_{t0}$ represents some explosive function of $t$, even when $\frac{\gamma_0+\rho_0}{1-\lambda_0}$ is smaller than 1. Without a specific time structure for $\alpha_{t0}$, it is desirable to eliminate this component for the estimation. The $Y_{nt}^{s}$ can be a stable component unless $\gamma_0+\rho_0+\lambda_0$ is much larger than 1. If the sum $\gamma_0+\rho_0+\lambda_0$ were too big, some of the eigenvalues $d_{ni}$ in $Y_{nt}^{s}$ might become larger than 1.
14.2.2 Data Transformation
Both the deviation from the cross-sectional mean $I_n - \frac{1}{n}l_n l_n'$ and the spatial difference operator $I_n - W_n$ can eliminate the $Y_{nt}^{\alpha}$ component in $Y_{nt}$. The transformation $I_n - W_n$ can be motivated via a feature of spatial co-integration below. Because $(I_n - W_n)l_n = 0$, we have $(I_n - W_n)Y_{nt}^{\alpha} = 0$, so $(I_n - W_n)Y_{nt}$ does not involve time dummies. In addition, because $W_n^u = R_n J_n R_n^{-1}$ and $I_n - W_n = R_n(I_n - \varpi_n)R_n^{-1}$, it follows that $(I_n - W_n)W_n^u = R_n(I_n - \varpi_n)J_n R_n^{-1} = 0$, and $(I_n - W_n)Y_{nt}^{u} = 0$. Therefore, $(I_n - W_n)Y_{nt} = (I_n - W_n)Y_{nt}^{s}$. That is, the transformation $I_n - W_n$ can eliminate not only time dummies but also the unstable component. Therefore, after the $(I_n - W_n)$ transformation, we will end up with the following equation:
$$(I_n - W_n)Y_{nt} = \lambda_0 W_n(I_n - W_n)Y_{nt} + \gamma_0(I_n - W_n)Y_{n,t-1} + \rho_0 W_n(I_n - W_n)Y_{n,t-1} + (I_n - W_n)X_{nt}\beta_0 + (I_n - W_n)c_{n0} + (I_n - W_n)V_{nt}. \tag{14.4}$$
The variance of $(I_n - W_n)V_{nt}$ is $\sigma_0^2\Sigma_n$, where $\Sigma_n = (I_n - W_n)(I_n - W_n)'$. This transformed equation has fewer degrees of freedom than $n$. Denote the degrees of freedom of Equation 14.4 as $n^*$. Then, $n^*$ is the rank of the variance matrix of $(I_n - W_n)V_{nt}$, which is the number of nonzero eigenvalues of $\Sigma_n$. Hence,

$n^* = n - m_n$ is also the number of non-unit eigenvalues⁶ of $W_n$. Thus, the transformed variables do not have time effects and are all stable even when $\lambda_0 + \gamma_0 + \rho_0$ is equal to or greater than 1.
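A small numerical check of these properties may help. The sketch below uses a hypothetical four-unit row-normalized matrix (an assumption for illustration, not data from the chapter) to confirm that $I_n - W_n$ annihilates $l_n$ and $W_n^u$, and that the rank of $\Sigma_n$ equals $n - m_n$.

```python
import numpy as np

# Hypothetical row-normalized weights for n = 4 units (illustration only).
W = np.array([[0, 1/2, 1/2, 0],
              [1/2, 0, 1/2, 0],
              [1/3, 1/3, 0, 1/3],
              [0, 0, 1, 0]])
n = W.shape[0]
I = np.eye(n)
D = I - W                                   # spatial difference operator I_n - W_n
l_n = np.ones(n)

# W_n^u = R_n J_n R_n^{-1}: the part of W_n tied to its unit eigenvalue(s).
w, R = np.linalg.eig(W)
unit = np.isclose(w, 1.0)
W_u = (R @ np.diag(unit.astype(float)) @ np.linalg.inv(R)).real

m_n = int(unit.sum())                       # number of unit eigenvalues of W_n
Sigma = D @ D.T                             # Sigma_n = (I_n - W_n)(I_n - W_n)'
n_star = int(np.sum(np.linalg.eigvalsh(Sigma) > 1e-10))   # rank of Sigma_n
```

Here `D @ l_n` and `D @ W_u` are numerically zero, so both the time effect part and the unstable part drop out after the transformation.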
Let $[F_n, H_n]$ be the orthonormal matrix of eigenvectors and $\Lambda_n$ be the diagonal matrix of nonzero eigenvalues of $\Sigma_n$ such that $\Sigma_n F_n = F_n\Lambda_n$ and $\Sigma_n H_n = 0$. That is, the columns of $F_n$ consist of eigenvectors of nonzero eigenvalues and those of $H_n$ are for zero eigenvalues of $\Sigma_n$. The $F_n$ is an $n \times n^*$ matrix and $\Lambda_n$ is an $n^* \times n^*$ diagonal matrix. Denote $W_n^* = \Lambda_n^{-1/2}F_n' W_n F_n\Lambda_n^{1/2}$, which is an $n^* \times n^*$ matrix. As is derived in Appendix A.3, we have
$$Y_{nt}^* = \lambda_0 W_n^* Y_{nt}^* + \gamma_0 Y_{n,t-1}^* + \rho_0 W_n^* Y_{n,t-1}^* + X_{nt}^*\beta_0 + c_{n0}^* + V_{nt}^*, \tag{14.5}$$
where $Y_{nt}^* = \Lambda_n^{-1/2}F_n'(I_n - W_n)Y_{nt}$ and other variables are defined accordingly. Note that this transformed $Y_{nt}^*$ is an $n^*$ dimensional vector. Thus, at each $t$, after the removal of the time dummy variables as well as the unstable or explosive components in $Y_{nt}$, the remaining observations at period $t$ have only $n^*$ degrees of freedom. While the sum of the coefficients $\lambda_0 + \gamma_0 + \rho_0$ of this transformed equation can be equal to or greater than 1, the eigenvalues of $W_n^*$ are exactly those eigenvalues of $W_n$ not equal to unity (see Appendix A.4) but less than 1 in absolute value. It follows that the eigenvalues of $A_n^* = (I_{n^*} - \lambda_0 W_n^*)^{-1}(\gamma_0 I_{n^*} + \rho_0 W_n^*)$ are all less than 1 in absolute value even when $\lambda_0+\gamma_0+\rho_0 = 1$ with $|\lambda_0| < 1$ and $|\gamma_0| < 1$. For the explosive case with $\lambda_0+\gamma_0+\rho_0 > 1$, the eigenvalues of $A_n^*$ can be less than 1 only if $\frac{\rho_0+\lambda_0}{1-\gamma_0} < \frac{1}{\varpi_{\max 1}}$, where $\varpi_{\max 1}$ is the maximum positive eigenvalue of $W_n$ less than unity (see Appendix A.1). Hence, the transformed model (Equation 14.5) is a stable one as long as $\lambda_0 + \gamma_0 + \rho_0$ is not much bigger than 1.⁷
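The construction of $F_n$, $\Lambda_n$, and $W_n^*$ can be sketched numerically. The example below assumes a hypothetical four-unit weights matrix and a hypothetical helper name `spatial_difference_basis`; it checks that the eigenvalues of $W_n^*$ coincide with the non-unit eigenvalues of $W_n$ and that the transformed data do not change when a common constant is added to all units.

```python
import numpy as np

# Hypothetical row-normalized weights for n = 4 units (illustration only).
W = np.array([[0, 1/2, 1/2, 0],
              [1/2, 0, 1/2, 0],
              [1/3, 1/3, 0, 1/3],
              [0, 0, 1, 0]])

def spatial_difference_basis(W, tol=1e-10):
    """Return (F_n, nonzero eigenvalues of Sigma_n) for Sigma_n = (I - W)(I - W)'."""
    n = W.shape[0]
    D = np.eye(n) - W
    evals, evecs = np.linalg.eigh(D @ D.T)   # Sigma_n is symmetric
    keep = evals > tol                       # drop the zero eigenvalues (columns of H_n)
    return evecs[:, keep], evals[keep]

F, lam_diag = spatial_difference_basis(W)
L_neg = np.diag(lam_diag ** -0.5)            # Lambda_n^{-1/2}
L_pos = np.diag(lam_diag ** 0.5)             # Lambda_n^{1/2}

W_star = L_neg @ F.T @ W @ F @ L_pos         # n* x n* transformed weights matrix
eig_star = np.sort(np.linalg.eigvals(W_star).real)
eig_W = np.sort(np.linalg.eigvals(W).real)
non_unit = eig_W[~np.isclose(eig_W, 1.0)]

def transform(Y):
    """Y* = Lambda_n^{-1/2} F_n' (I_n - W_n) Y."""
    return L_neg @ F.T @ (np.eye(W.shape[0]) - W) @ Y

Y = np.array([1.0, 2.0, 3.0, 4.0])
Y_star = transform(Y)
Y_star_shifted = transform(Y + 5.0)          # add a common "time effect" to every unit
```

Because the eigenvectors returned by `eigh` are orthonormal, `F.T @ F` is the identity, which is what makes the transformed disturbances uncorrelated with variance $\sigma_0^2 I_{n^*}$.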
The transformation $I_n - W_n$ for the case with $\gamma_0 + \rho_0 + \lambda_0 = 1$ but $\gamma_0 \neq 1$ has an interpretation as a spatial co-integrating matrix for elements of $Y_{nt}$. Denote the time difference as $\Delta Y_{nt} = Y_{nt} - Y_{n,t-1}$. The reduced form (Equation 14.2) implies that $\Delta Y_{nt} = (A_n - I_n)Y_{n,t-1} + S_n^{-1}(X_{nt}\beta_0 + c_{n0} + V_{nt} + \alpha_{t0}l_n)$. For the case $\lambda_0+\gamma_0+\rho_0 = 1$ with $\gamma_0 \neq 1$, $A_n - I_n = (I_n - \lambda_0 W_n)^{-1}(\gamma_0 I_n + \rho_0 W_n) - I_n = (1-\gamma_0)(I_n - \lambda_0 W_n)^{-1}(W_n - I_n)$. Hence, we have a vector error correction model (VECM) representation of Equation 14.2 as
$$\Delta Y_{nt} = (1-\gamma_0)(I_n - \lambda_0 W_n)^{-1}(W_n - I_n)Y_{n,t-1} + S_n^{-1}(X_{nt}\beta_0 + c_{n0} + V_{nt} + \alpha_{t0}l_n).$$
The matrix $I_n - W_n = R_n(I_n - \varpi_n)R_n^{-1}$ has its rank equal to the number of eigenvalues of $W_n$ different from 1. With the VECM representation, one may
⁶ This is so, because (1) the set $K_n$ of eigenvectors corresponding to the zero eigenvalues of $(I_n - W_n)(I_n - W_n)'$ is the same as that of $(I_n - W_n)'$; (2) the dimension of $K_n$ is the number of unit eigenvalues of $W_n'$; (3) $W_n = R_n\varpi_n R_n^{-1}$ if and only if $W_n' = R_n'^{-1}\varpi_n R_n'$, i.e., the eigenvalues of $W_n$ and $W_n'$ are the same.
⁷ Similar to Yu, de Jong, and Lee (2007) for the spatial co-integration case, we assume that the eigenvalues of $W_n$ with absolute values less than 1 are bounded away from 1 for all $n$. Appendix A.1 provides sufficient conditions on the parameters of the model, which can imply this regularity condition.

regard $I_n - W_n$ as a co-integrating matrix with the co-integration rank as the number of non-unit eigenvalues of $W_n$. Hence, this transformation method has exploited the spatial co-integration of the $Y_{nt}$'s for the estimation.
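The algebra behind the VECM representation can be verified numerically. In the sketch below, the weights matrix and the parameter values (chosen so that $\lambda_0+\gamma_0+\rho_0 = 1$ with $\gamma_0 \neq 1$) are hypothetical.

```python
import numpy as np

# Hypothetical row-normalized weights for n = 4 units (illustration only).
W = np.array([[0, 1/2, 1/2, 0],
              [1/2, 0, 1/2, 0],
              [1/3, 1/3, 0, 1/3],
              [0, 0, 1, 0]])
I = np.eye(4)

lam0, gamma0, rho0 = 0.2, 0.5, 0.3           # sums to 1: spatial co-integration case
S = I - lam0 * W
A = np.linalg.solve(S, gamma0 * I + rho0 * W)                # A_n
error_correction = (1 - gamma0) * np.linalg.solve(S, W - I)  # (1-gamma0) S_n^{-1} (W_n - I_n)

# Co-integration rank: rank of I_n - W_n, i.e. the number of non-unit eigenvalues of W_n.
coint_rank = np.linalg.matrix_rank(I - W, tol=1e-10)
```

The identity `A - I == error_correction` relies on $\rho_0 + \lambda_0 = 1 - \gamma_0$; with other parameter values the two matrices differ.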
14.2.3 The Log-Likelihood Function
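Suppose that $V_{nt}$ is normally distributed as $N(0, \sigma_0^2 I_n)$. Then the transformed $V_{nt}^*$ in Equation 14.5 will be $N(0, \sigma_0^2 I_{n^*})$. Denote $\delta = (\gamma, \rho, \beta')'$, $\theta = (\delta', \lambda, \sigma^2)'$ and $S_n^*(\lambda) = I_{n^*} - \lambda W_n^*$. The log-likelihood function for $Y_{nt}^*$ in Equation 14.5 is
$$\ln L_{n,T}(\theta, c_n^*) = -\frac{n^*T}{2}\ln 2\pi - \frac{n^*T}{2}\ln\sigma^2 + T\ln|S_n^*(\lambda)| - \frac{1}{2\sigma^2}\sum_{t=1}^{T}V_{nt}^{*\prime}(\theta, c_n^*)V_{nt}^*(\theta, c_n^*), \tag{14.6}$$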
where $V_{nt}^*(\theta, c_n^*) = S_n^*(\lambda)Y_{nt}^* - Z_{nt}^*\delta - c_n^*$ and $Z_{nt}^* = (Y_{n,t-1}^*, W_n^* Y_{n,t-1}^*, X_{nt}^*)$. In order to use Equation 14.6 for an effective estimation, the determinant and inverse of $S_n^*(\lambda)$ are needed. As is derived in Appendix A.4, using $S_n^*(\lambda) = \Lambda_n^{-1/2}F_n' S_n(\lambda)F_n\Lambda_n^{1/2}$, we have
$$|S_n^*(\lambda)| = \frac{1}{(1-\lambda)^{n-n^*}}|S_n(\lambda)|, \quad \text{and} \quad S_n^{*-1}(\lambda) = \Lambda_n^{-1/2}F_n' S_n^{-1}(\lambda)F_n\Lambda_n^{1/2}. \tag{14.7}$$
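Both identities in Equation 14.7 can be checked on a small example. The following sketch assumes a hypothetical four-unit weights matrix and an arbitrary value $\lambda = 0.4$; it is a numerical check, not a proof.

```python
import numpy as np

# Hypothetical row-normalized weights for n = 4 units (illustration only).
W = np.array([[0, 1/2, 1/2, 0],
              [1/2, 0, 1/2, 0],
              [1/3, 1/3, 0, 1/3],
              [0, 0, 1, 0]])
n = W.shape[0]
I = np.eye(n)
D = I - W

# F_n and Lambda_n from the nonzero eigenvalues of Sigma_n = (I - W)(I - W)'.
evals, evecs = np.linalg.eigh(D @ D.T)
keep = evals > 1e-10
F, Lam = evecs[:, keep], evals[keep]
n_star = F.shape[1]
L_neg, L_pos = np.diag(Lam ** -0.5), np.diag(Lam ** 0.5)

lam = 0.4                                    # arbitrary trial value of lambda
S = I - lam * W                              # S_n(lambda)
S_star = L_neg @ F.T @ S @ F @ L_pos         # S_n^*(lambda)

det_direct = np.linalg.det(S_star)
det_formula = np.linalg.det(S) / (1 - lam) ** (n - n_star)
inv_formula = L_neg @ F.T @ np.linalg.inv(S) @ F @ L_pos
```

In practice the point of Equation 14.7 is that $|S_n^*(\lambda)|$ can be obtained from $|S_n(\lambda)|$ without forming $W_n^*$ at every trial value of $\lambda$.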
Hence, the computation of the determinant of $S_n^*(\lambda)$ is not more complicated than that of $S_n(\lambda)$. Also,
$$\begin{aligned}
V_{nt}^*(\theta, c_n^*) &= S_n^*(\lambda)Y_{nt}^* - Z_{nt}^*\delta - c_n^* \\
&= \Lambda_n^{-1/2}F_n' S_n(\lambda)F_n F_n'(I_n - W_n)Y_{nt} - \Lambda_n^{-1/2}F_n'(I_n - W_n)Z_{nt}\delta - \Lambda_n^{-1/2}F_n'(I_n - W_n)c_n \\
&= \Lambda_n^{-1/2}F_n'(I_n - W_n)[S_n(\lambda)Y_{nt} - Z_{nt}\delta - c_n] \\
&= \Lambda_n^{-1/2}F_n'(I_n - W_n)V_{nt}(\theta, c_n),
\end{aligned}$$
by using $F_n F_n' + H_n H_n' = I_n$ and $H_n'(I_n - W_n) = 0$, where $Z_{nt} = (Y_{n,t-1}, W_n Y_{n,t-1}, X_{nt})$ and $V_{nt}(\theta, c_n) = S_n(\lambda)Y_{nt} - Z_{nt}\delta - c_n$. Hence,
$$V_{nt}^{*\prime}(\theta, c_n^*)V_{nt}^*(\theta, c_n^*) = V_{nt}'(\theta, c_n)(I_n - W_n)'\Sigma_n^{+}(I_n - W_n)V_{nt}(\theta, c_n), \tag{14.8}$$
where $\Sigma_n^{+} = F_n\Lambda_n^{-1}F_n'$ is the generalized inverse of $\Sigma_n = (I_n - W_n)(I_n - W_n)'$.
By using Equation 14.7 and 14.8, the log-likelihood function (Equation 14.6)

for $Y_{nt}^*$ can be expressed in terms of $Y_{nt}$ as
$$\ln L_{n,T}(\theta, c_n) = -\frac{n^*T}{2}\ln(2\pi\sigma^2) - (n - n^*)T\ln(1-\lambda) + T\ln|S_n(\lambda)| - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(S_n(\lambda)Y_{nt} - Z_{nt}\delta - c_n)'(I_n - W_n)'\Sigma_n^{+}(I_n - W_n)(S_n(\lambda)Y_{nt} - Z_{nt}\delta - c_n). \tag{14.9}$$
Hence, after the transformation, the QML method is to estimate the SDPD model with only individual effects with $n^*$ cross-section units and $T$ time periods, where Equation 14.6 is the objective function. Alternatively, one may maximize Equation 14.9 expressed in terms of the original variables. However, although the components of $V_{nt}$ are i.i.d. in the model, the elements of $V_{nt}^*$ might not be independent (they are uncorrelated). The asymptotic analysis in Yu, de Jong, and Lee (2008) may not be directly carried over to the transformed model with the disturbances $V_{nt}^*$.⁸ As Equation 14.6 is equivalent to Equation 14.9, we can analyze the asymptotic distribution of the estimator via Equation 14.9.
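Using first order conditions, we concentrate out $c_n$ in Equation 14.9 to obtain the concentrated likelihood function in terms of $\theta$. For an $n \times 1$ vector at period $t$, $\Upsilon_{nt}$, we define the deviations from time means as $\tilde{\Upsilon}_{nt} = \Upsilon_{nt} - \bar{\Upsilon}_{nT}$ and $\tilde{\Upsilon}_{n,t-1} = \Upsilon_{n,t-1} - \bar{\Upsilon}_{nT,-1}$, where $\bar{\Upsilon}_{nT} = \frac{1}{T}\sum_{t=1}^{T}\Upsilon_{nt}$ and $\bar{\Upsilon}_{nT,-1} = \frac{1}{T}\sum_{t=1}^{T}\Upsilon_{n,t-1}$.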
The concentrated log-likelihood is
$$\ln L_{n,T}(\theta) = -\frac{n^*T}{2}\ln 2\pi - \frac{n^*T}{2}\ln\sigma^2 - (n - n^*)T\ln(1-\lambda) + T\ln|I_n - \lambda W_n| - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\tilde{V}_{nt}'(\theta)(I_n - W_n)'\Sigma_n^{+}(I_n - W_n)\tilde{V}_{nt}(\theta), \tag{14.10}$$
where $\tilde{V}_{nt}(\theta) = S_n(\lambda)\tilde{Y}_{nt} - \tilde{Z}_{nt}\delta$ and $(I_n - W_n)\tilde{V}_{nt}(\theta) = (I_n - W_n)[S_n(\lambda)\tilde{Y}_{nt} - \tilde{Z}_{nt}\delta - \tilde{\alpha}_t l_n]$ because $(I_n - W_n)l_n = 0$. At $\theta_0$, $\tilde{V}_{nt} = S_n\tilde{Y}_{nt} - \tilde{Z}_{nt}\delta_0$. For Equation 14.10, its first- and second-order derivatives are Equations A.16 and A.17 in Appendix C.2.
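To make the structure of Equation 14.10 concrete, the following sketch evaluates the concentrated log-likelihood at a given parameter vector. It is an illustration under stated assumptions, not the authors' implementation: the function name `concentrated_loglik`, the array layout, the ordering of `theta` as $(\gamma, \rho, \beta', \lambda, \sigma^2)$, the simulated inputs, and the eigenvalue tolerance are all hypothetical choices.

```python
import numpy as np

def concentrated_loglik(theta, Y, X, W, tol=1e-10):
    """Evaluate the concentrated log-likelihood of Equation 14.10 (illustrative sketch).

    theta : (gamma, rho, beta_1, ..., beta_k, lam, sigma2), with lam < 1
    Y     : array (T + 1, n); Y[0] is the initial observation
    X     : array (T, n, k) of regressors
    W     : array (n, n), row-normalized spatial weights
    """
    T, n, k = X.shape
    gamma, rho = theta[0], theta[1]
    beta = np.asarray(theta[2:2 + k])
    lam, sigma2 = theta[2 + k], theta[3 + k]

    I = np.eye(n)
    D = I - W
    evals, evecs = np.linalg.eigh(D @ D.T)          # Sigma_n = (I - W)(I - W)'
    keep = evals > tol
    F, Lam = evecs[:, keep], evals[keep]
    n_star = int(keep.sum())
    Sigma_plus = F @ np.diag(1.0 / Lam) @ F.T       # generalized inverse of Sigma_n
    proj = D.T @ Sigma_plus @ D                     # (I - W)' Sigma_n^+ (I - W)

    # Deviations from time means, one row per period.
    Y_til = Y[1:] - Y[1:].mean(axis=0)
    Ylag_til = Y[:-1] - Y[:-1].mean(axis=0)
    X_til = X - X.mean(axis=0)

    S = I - lam * W
    # Row t holds V_t' = (S y_t - gamma y_{t-1} - rho W y_{t-1} - X_t beta)'.
    V = Y_til @ S.T - gamma * Ylag_til - rho * (Ylag_til @ W.T) - X_til @ beta
    quad = np.einsum("ti,ij,tj->", V, proj, V)      # sum_t V_t' proj V_t
    _, logdet_S = np.linalg.slogdet(S)

    return (-0.5 * n_star * T * np.log(2 * np.pi * sigma2)
            - (n - n_star) * T * np.log(1 - lam)
            + T * logdet_S
            - quad / (2 * sigma2))

# Hypothetical inputs: 4 units, 6 periods, 1 regressor, random data.
W = np.array([[0, 1/2, 1/2, 0],
              [1/2, 0, 1/2, 0],
              [1/3, 1/3, 0, 1/3],
              [0, 0, 1, 0]])
rng = np.random.default_rng(0)
Y = rng.normal(size=(7, 4))
X = rng.normal(size=(6, 4, 1))
theta = np.array([0.3, 0.1, 1.0, 0.2, 1.0])

ll = concentrated_loglik(theta, Y, X, W)
# Adding an arbitrary common shift a_t to all units in period t leaves the value unchanged.
shift = rng.normal(size=(7, 1))
ll_shifted = concentrated_loglik(theta, Y + shift, X, W)
```

The invariance of the value to common period shifts reflects $(I_n - W_n)l_n = 0$: time effects never need to be estimated. A numerical optimizer over `theta` could be wrapped around this function, although the chapter's asymptotic results concern the exact maximizer and its bias correction.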
14.3 Asymptotic Properties of QMLE
For our analysis of the asymptotic properties of estimators, we make the following assumptions. Denote $J_n^* = (I_n - W_n)'\Sigma_n^{+}(I_n - W_n)$. We note that $J_n^*$ is an orthonormal projector with rank $n^*$ (see Appendix A.5).
⁸ One could not treat the components of $V_{nt}^*$ as if they were independent when the disturbances are not normally distributed. Furthermore, it is not clear whether $W_n^*$ and $A_n^* = (I_{n^*} - \lambda_0 W_n^*)^{-1}(\gamma_0 I_{n^*} + \rho_0 W_n^*)$ would be uniformly bounded in both row and column sums even though $W_n$ and $A_n$ are.

Assumption 1 $W_n$ is a row-normalized nonstochastic spatial weights matrix with zero diagonals.
Assumption 2 The disturbances $\{v_{it}\}$, $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$, are i.i.d. across $i$ and $t$ with zero mean, variance $\sigma_0^2$ and $E|v_{it}|^{4+\eta} < \infty$ for some $\eta > 0$.
Assumption 3 $S_n(\lambda)$ is invertible for all $\lambda \in \Lambda$. Furthermore, $\Lambda$ is compact and the true parameter $\lambda_0$ is in the interior of $\Lambda$.
Assumption 4 The elements of $X_{nt}$ are nonstochastic and bounded, uniformly in $n$ and $t$, and the limit of $\frac{1}{nT}\sum_{t=1}^{T}\tilde{X}_{nt}' J_n^*\tilde{X}_{nt}$ exists and is nonsingular.
Assumption 5 $W_n$ is uniformly bounded in row and column sums in absolute value (for short, UB).⁹ Also $S_n^{-1}(\lambda)$ is UB, uniformly in $\lambda \in \Lambda$.
Assumption 6 $\sum_{h=1}^{\infty}\mathrm{abs}(B_n^h)$ is UB, where $[\mathrm{abs}(B_n)]_{ij} = |B_{n,ij}|$.
Assumption 7 $n^*$ is a nondecreasing function of $T$ and $T$ goes to infinity.
Assumption 1 is a standard normalization assumption in spatial econometrics. In many empirical applications, the rows of $W_n$ sum to 1, which ensures that all the weights are between 0 and 1. Assumption 2 provides regularity assumptions for $v_{it}$. Assumption 3 guarantees that Equation 14.2 is valid. When exogenous variables $X_{nt}$ are included in the model, it is convenient to assume that their elements are uniformly bounded¹⁰ as in Assumption 4. Assumption 5 originated with Kelejian and Prucha (1998, 2001) and is also used in Lee (2004, 2007). The uniform boundedness of $W_n$ and $S_n^{-1}(\lambda)$ is a condition that limits the spatial correlation to a manageable degree. Assumption 6 is the absolute summability condition and row/column sum boundedness condition, which will play an important role for asymptotic properties of the QML estimator. In order to justify the absolute summability of $B_n$, a sufficient condition is $\|B_n\| < 1$ for any matrix norm (see Horn and Johnson (1985), Corollary 5.6.16) that satisfies $\|B_n\| = \|\mathrm{abs}(B_n)\|$. When $\|B_n\| < 1$, $\sum_{h=0}^{\infty}B_n^h$ exists and can be defined as $(I_n - B_n)^{-1}$. Assumption 7 allows two cases: (1) $n^* \to \infty$ as $T \to \infty$; (2) $n^*$ can remain finite as $T \to \infty$. Because (2) is similar to a vector autoregressive (VAR) model, our main interest is in (1). If Assumption 7 holds, then we say that $n^*, T \to \infty$ simultaneously. These assumptions are similar to those in Yu, de Jong, and Lee (2008).
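The row and column sum norms behind the UB condition, and the Neumann series used for Assumption 6, can be illustrated numerically. In the sketch below the weights matrix is hypothetical, and `B = 0.4 * W` is only a stand-in for a matrix with row sum norm below 1; it is not the chapter's $B_n$.

```python
import numpy as np

def row_sum_norm(P):
    """Maximum absolute row sum of P."""
    return np.abs(P).sum(axis=1).max()

def col_sum_norm(P):
    """Maximum absolute column sum of P."""
    return np.abs(P).sum(axis=0).max()

# Hypothetical row-normalized weights for n = 4 units (illustration only).
W = np.array([[0, 1/2, 1/2, 0],
              [1/2, 0, 1/2, 0],
              [1/3, 1/3, 0, 1/3],
              [0, 0, 1, 0]])
I = np.eye(4)

B = 0.4 * W                       # stand-in matrix with row sum norm 0.4 < 1
series = np.zeros_like(B)
power = I.copy()
for _ in range(200):              # partial sum of B^h for h = 0, ..., 199
    series += power
    power = power @ B
closed_form = np.linalg.inv(I - B)
```

The row sum norm satisfies $\|B\| = \|\mathrm{abs}(B)\|$, so a bound below 1 gives absolute summability of the powers, and the partial sums approach $(I - B)^{-1}$ at a geometric rate.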
14.3.1 Consistency
For the log-likelihood function (Equation 14.10) divided by the effective sample size $n^*T$, we have the corresponding $Q_{n,T}(\theta) = E\max_{c_n}\frac{1}{n^*T}\ln L_{n,T}(\theta, c_n)$.
⁹ We say a (sequence of $n \times n$) matrix $P_n$ is uniformly bounded in row and column sums if $\sup_{n \ge 1}\|P_n\|_{\infty} < \infty$ and $\sup_{n \ge 1}\|P_n\|_{1} < \infty$, where $\|P_n\|_{\infty} \equiv \sup_{1 \le i \le n}\sum_{j=1}^{n}|p_{ij,n}|$ is the row sum norm and $\|P_n\|_{1} = \sup_{1 \le j \le n}\sum_{i=1}^{n}|p_{ij,n}|$ is the column sum norm.
¹⁰ If $X_{nt}$ is allowed to be stochastic, appropriate moment conditions can be imposed instead.

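Hence,¹¹
$$Q_{n,T}(\theta) = \frac{1}{n^*T}E\ln L_{n,T}(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{n - n^*}{n^*}\ln(1-\lambda) + \frac{1}{n^*}\ln|S_n(\lambda)| - \frac{1}{2\sigma^2}\frac{1}{n^*T}E\left(\sum_{t=1}^{T}\tilde{V}_{nt}'(\theta)J_n^*\tilde{V}_{nt}(\theta)\right). \tag{14.11}$$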
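It is shown in Appendix D.2 that, under Assumptions 1–7, $\frac{1}{n^*T}\ln L_{n,T}(\theta) - Q_{n,T}(\theta) \xrightarrow{p} 0$ uniformly in $\theta \in \Theta$ and $Q_{n,T}(\theta)$ is uniformly equicontinuous for $\theta \in \Theta$. For the identification, denote the information matrix $\Sigma_{\theta_0,nT} = -E\left(\frac{1}{n^*T}\frac{\partial^2\ln L_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'}\right)$. If $\Sigma_{\theta_0,nT}$ is nonsingular and $-E\left(\frac{1}{n^*T}\frac{\partial^2\ln L_{n,T}(\theta)}{\partial\theta\,\partial\theta'}\right)$ has full rank for $\theta$ in some neighborhood $N(\theta_0)$ of $\theta_0$, the parameters are locally identified (see Rothenberg 1971). Denote $H_{nT} = \frac{1}{n^*T}\sum_{t=1}^{T}(\tilde{Z}_{nt}, G_n\tilde{Z}_{nt}\delta_0)' J_n^*(\tilde{Z}_{nt}, G_n\tilde{Z}_{nt}\delta_0)$ and $G_n^* = W_n^* S_n^{*-1}$. Using Lemma 15 in Yu, de Jong, and Lee (2008),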
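$$\Sigma_{\theta_0,nT} = \frac{1}{\sigma_0^2}\begin{pmatrix} EH_{nT} & 0_{(k+3)\times 1} \\ 0_{1\times(k+3)} & 0 \end{pmatrix} + \begin{pmatrix} 0_{(k+2)\times(k+2)} & 0_{(k+2)\times 1} & 0_{(k+2)\times 1} \\ 0_{1\times(k+2)} & \frac{1}{n^*}\left[\mathrm{tr}(G_n^{*\prime}G_n^*) + \mathrm{tr}(G_n^{*2})\right] & \frac{1}{\sigma_0^2 n^*}\mathrm{tr}(G_n^*) \\ 0_{1\times(k+2)} & \frac{1}{\sigma_0^2 n^*}\mathrm{tr}(G_n^*) & \frac{1}{2\sigma_0^4} \end{pmatrix} + O\left(\frac{1}{T}\right), \tag{14.12}$$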
which is nonsingular if $EH_{nT}$ is nonsingular or $\frac{1}{n^*}\left[\mathrm{tr}(G_n^{*\prime}G_n^*) + \mathrm{tr}(G_n^{*2}) - \frac{2\,\mathrm{tr}^2(G_n^*)}{n^*}\right]$ is positive (see Appendix D.1). Also, its rank does not change in a small neighborhood of $\theta_0$ (see Equation 14.49).
When $\lim_{T\to\infty}EH_{nT}$ is nonsingular, the parameters are identified.
Theorem 14.1 Under Assumptions 1–7, if $\lim_{T\to\infty}EH_{nT}$ is nonsingular, $\theta_0$ is identified and $\hat{\theta}_{nT} \xrightarrow{p} \theta_0$.
Proof See Appendix D.2.
When $\lim_{T\to\infty}EH_{nT}$ is singular, identification can still be obtained from the following theorem. Denote $\sigma_n^2(\lambda) = \frac{\sigma_0^2}{n^*}\mathrm{tr}\left(S_n^{\prime-1}S_n'(\lambda)J_n^* S_n(\lambda)S_n^{-1}\right)$.
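¹¹ Because $W_n = R_n\varpi_n R_n^{-1}$, $|S_n(\lambda)| = |I_n - \lambda\varpi_n| = (1-\lambda)^{m_n}\prod_{j=m_n+1}^{n}(1 - \lambda\varpi_{nj})$. Therefore, $\frac{1}{n^*}\ln|S_n(\lambda)| - \frac{n-n^*}{n^*}\ln(1-\lambda) = \frac{1}{n^*}\sum_{j=m_n+1}^{n}\ln(1 - \lambda\varpi_{nj})$ shows that the division by $n^*$ is proper.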

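Theorem 14.2 Under Assumptions 1–7, if $\lim_{n^*\to\infty}\left(\frac{1}{n^*}\ln\left|\sigma_0^2 S_n^{*-1\prime}S_n^{*-1}\right| - \frac{1}{n^*}\ln\left|\sigma_n^2(\lambda)S_n^{*-1\prime}(\lambda)S_n^{*-1}(\lambda)\right|\right) \neq 0$ for $\lambda \neq \lambda_0$, then $\theta_0$ is identified¹² and $\hat{\theta}_{nT} \xrightarrow{p} \theta_0$.
Proof See Appendix D.3.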
14.3.2 Asymptotic Distribution
As $Z_{nt} = (Y_{n,t-1}, W_n Y_{n,t-1}, X_{nt})$, we can decompose $(I_n - W_n)\tilde{Z}_{nt}$ such that
$$(I_n - W_n)\tilde{Z}_{nt} = (I_n - W_n)\tilde{Z}_{nt}^{(c)} - \left((I_n - W_n)\bar{U}_{nT,-1},\ (I_n - W_n)W_n\bar{U}_{nT,-1},\ 0_{n\times k}\right), \tag{14.13}$$
where $\tilde{Z}_{nt}^{(c)} = \left((\tilde{\mathcal{X}}_{n,t-1} + U_{n,t-1}),\ (W_n\tilde{\mathcal{X}}_{n,t-1} + W_n U_{n,t-1}),\ \tilde{X}_{nt}\right)$ with $\tilde{\mathcal{X}}_{n,t-1} = \mathcal{X}_{n,t-1} - \bar{\mathcal{X}}_{nT,-1}$, $\mathcal{X}_{nt} \equiv \sum_{h=0}^{\infty}B_n^h S_n^{-1}X_{n,t-h}\beta_0$ and $U_{nt} \equiv \sum_{h=0}^{\infty}B_n^h S_n^{-1}V_{n,t-h}$. Hence, $(I_n - W_n)\tilde{Z}_{nt}$ has two components: one is $(I_n - W_n)\tilde{Z}_{nt}^{(c)}$, which is uncorrelated with $V_{nt}$; the remaining one can be correlated with $V_{nt}$ when $t \le T - 1$. Here, after the data transformation by $I_n - W_n$, the unstable or explosive components and the time component in $\tilde{Z}_{nt}$ are all eliminated. Therefore, from Equation 14.45, the score can be decomposed into two parts such that
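$$\frac{1}{\sqrt{n^*T}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta} = \frac{1}{\sqrt{n^*T}}\frac{\partial\ln L_{n,T}^{(c)}(\theta_0)}{\partial\theta} - \Delta_{nT}, \tag{14.14}$$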
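where
$$\frac{1}{\sqrt{n^*T}}\frac{\partial\ln L_{n,T}^{(c)}(\theta_0)}{\partial\theta} = \begin{pmatrix} \frac{1}{\sigma_0^2}\frac{1}{\sqrt{n^*T}}\sum_{t=1}^{T}\tilde{Z}_{nt}^{(c)\prime}J_n^* V_{nt} \\ \frac{1}{\sigma_0^2}\frac{1}{\sqrt{n^*T}}\sum_{t=1}^{T}(G_n\tilde{Z}_{nt}^{(c)}\delta_0)' J_n^* V_{nt} + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{n^*T}}\sum_{t=1}^{T}\left(V_{nt}'G_n' J_n^* V_{nt} - \sigma_0^2\,\mathrm{tr}\,G_n^*\right) \\ \frac{1}{2\sigma_0^4}\frac{1}{\sqrt{n^*T}}\sum_{t=1}^{T}\left(V_{nt}'J_n^* V_{nt} - n^*\sigma_0^2\right) \end{pmatrix}, \tag{14.15}$$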
¹² For our asymptotic analysis, finite $n^*$ is allowed as long as $T$ is tending to infinity, even though that is not an interesting case for SAR models. When $n^*$ is finite, the condition is $\frac{1}{n^*}\ln\left|\sigma_0^2 S_n^{*-1\prime}S_n^{*-1}\right| - \frac{1}{n^*}\ln\left|\sigma_n^2(\lambda)S_n^{*-1\prime}(\lambda)S_n^{*-1}(\lambda)\right| \neq 0$ for $\lambda \neq \lambda_0$.

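and
$$\Delta_{nT} = \sqrt{\frac{n^*}{T}}\begin{pmatrix} \frac{1}{\sigma_0^2}\frac{T}{n^*}\left(J_n^*\bar{U}_{nT,-1},\ J_n^* W_n\bar{U}_{nT,-1},\ 0\right)'\bar{V}_{nT} \\ \frac{1}{\sigma_0^2}\frac{T}{n^*}\left(J_n^* G_n(\bar{U}_{nT,-1},\ W_n\bar{U}_{nT,-1},\ 0)\delta_0\right)'\bar{V}_{nT} + \frac{1}{\sigma_0^2}\frac{T}{n^*}\bar{V}_{nT}'G_n' J_n^*\bar{V}_{nT} \\ \frac{1}{2\sigma_0^4}\frac{T}{n^*}\bar{V}_{nT}'J_n^*\bar{V}_{nT} \end{pmatrix}. \tag{14.16}$$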
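Similarly to Yu, de Jong, and Lee (2008), the variance matrix of $\frac{1}{\sqrt{n^*T}}\frac{\partial\ln L_{n,T}^{(c)}(\theta_0)}{\partial\theta}$ is equal to
$$E\left(\frac{1}{\sqrt{n^*T}}\frac{\partial\ln L_{n,T}^{(c)}(\theta_0)}{\partial\theta}\cdot\frac{1}{\sqrt{n^*T}}\frac{\partial\ln L_{n,T}^{(c)}(\theta_0)}{\partial\theta'}\right) = \Sigma_{\theta_0,nT} + \Omega_{\theta_0,n} + O(T^{-1}), \tag{14.17}$$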
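where $\Sigma_{\theta_0,nT}$ is in Equation 14.12 and
$$\Omega_{\theta_0,n} = \frac{\mu_4 - 3\sigma_0^4}{\sigma_0^4}\begin{pmatrix} 0_{(k+2)\times(k+2)} & 0_{(k+2)\times 1} & 0_{(k+2)\times 1} \\ 0_{1\times(k+2)} & \frac{1}{n^*}\sum_{i=1}^{n}(G_n^{*2})_{ii} & \frac{1}{2\sigma_0^2 n^*}\mathrm{tr}(G_n^*) \\ 0_{1\times(k+2)} & \frac{1}{2\sigma_0^2 n^*}\mathrm{tr}(G_n^*) & \frac{1}{4\sigma_0^4} \end{pmatrix}$$
is a symmetric matrix with $\mu_4$ being the fourth moment of $v_{it}$. When $V_{nt}$ is normally distributed, $\Omega_{\theta_0,n} = 0$ because $\mu_4 - 3\sigma_0^4 = 0$. Denote $\Sigma_{\theta_0} = \lim_{T\to\infty}\Sigma_{\theta_0,nT}$ and $\Omega_{\theta_0} = \lim_{T\to\infty}\Omega_{\theta_0,n}$. The asymptotic distribution of $\frac{1}{\sqrt{n^*T}}\frac{\partial\ln L_{n,T}^{(c)}(\theta_0)}{\partial\theta}$ can be derived from the central limit theorem for martingale difference arrays (Lemma 14.3). For the term $\Delta_{nT}$, from Equation 14.36 in Lemma 14.1 and Equation 14.38 in Lemma 14.2, $\Delta_{nT} = \sqrt{\frac{n^*}{T}}\,a_{\theta_0,n} + O\left(\sqrt{\frac{n^*}{T^3}}\right) + O_p\left(\frac{1}{\sqrt{T}}\right)$ where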
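$$a_{\theta_0,n} = \begin{pmatrix} \frac{1}{n^*}\mathrm{tr}\left(J_n^*\left(\sum_{h=0}^{\infty}B_n^h\right)S_n^{-1}\right) \\ \frac{1}{n^*}\mathrm{tr}\left(W_n' J_n^*\left(\sum_{h=0}^{\infty}B_n^h\right)S_n^{-1}\right) \\ 0_{k\times 1} \\ \frac{1}{n^*}\gamma_0\,\mathrm{tr}\left(G_n J_n^*\left(\sum_{h=0}^{\infty}B_n^h\right)S_n^{-1}\right) + \frac{1}{n^*}\rho_0\,\mathrm{tr}\left(G_n W_n J_n^*\left(\sum_{h=0}^{\infty}B_n^h\right)S_n^{-1}\right) + \frac{1}{n^*}\mathrm{tr}\,G_n^* \\ \frac{1}{2\sigma_0^2} \end{pmatrix} \tag{14.18}$$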

×