Class Notes in Statistics and Econometrics, Part 26

CHAPTER 51
Distinguishing Random Variables from Variables
Created by a Deterministic Chaotic Process
Dynamical systems are described either by recursive functions (discrete time) or by differential equations (continuous time).

With discrete time, recursive functions (i.e., difference equations, the discrete analog of differential equations) can easily produce chaotic behavior, e.g., the tent map or the logistic map.

The problem is: how does one distinguish the output of such a process from randomly generated output?
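As an illustration (a minimal sketch, not from the notes; assumes numpy), the fully chaotic logistic map $x_{t+1} = 4x_t(1 - x_t)$ — conjugate to the tent map but better behaved in floating-point arithmetic — produces a deterministic series whose sample autocorrelations are nevertheless close to zero:

```python
import numpy as np

def logistic_series(x0=0.3, n=5000):
    """Iterate the fully chaotic logistic map x_{t+1} = 4 x_t (1 - x_t)."""
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        x[t + 1] = 4.0 * x[t] * (1.0 - x[t])
    return x

x = logistic_series()
xc = x - x.mean()
# Lag-1 sample autocorrelation: near zero (order 1/sqrt(n)), although the
# series is generated by a purely deterministic recursion.
print(np.corrcoef(xc[:-1], xc[1:])[0, 1])
```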
The same problem can also happen in the continuous case. First-order differential equations can be visualized as vector fields.
An attractor A is a compact set which has a neighborhood U such that A is the
limit set of all trajectories starting in U. That means, every trajectory starting in
U comes arbitrarily close to each point of the attractor.
In $\mathbb{R}^2$, there are three different types of attractors: fixed points, limit cycles, and saddle loops. But in $\mathbb{R}^3$ and higher, chaos can occur, i.e., the trajectory can have a "strange attractor." Example: the Lorenz attractor.
There is no commonly accepted definition of a strange attractor; roughly, it is an attractor that is neither a point nor a closed curve, and trajectories attracted by it take vastly different courses after a short time.
Now fractal dimensions: first the Hausdorff dimension as $\lim_{\varepsilon\to 0}\frac{\log N(\varepsilon)}{\log(1/\varepsilon)}$, indicating the exponent with which the number of covering pieces $N(\varepsilon)$ increases as the diameter of the pieces diminishes.
Examples with integer dimensions: for points we have $N(\varepsilon) = 1$ always, therefore the dimension is 0. For straight lines of length $L$, $N(\varepsilon) = L/\varepsilon$, therefore we get $\lim_{\varepsilon\to 0}\frac{\log(L/\varepsilon)}{\log(1/\varepsilon)} = 1$, and for an area with surface $S$ it is $\lim_{\varepsilon\to 0}\frac{\log(S/\varepsilon^2)}{\log(1/\varepsilon)} = 2$.
A famous example of a set with fractal dimension is the Cantor set: start with the unit interval, take the middle third out, then take the middle third of the two remaining segments out, etc. For $\varepsilon = 1/3$ one gets $N(\varepsilon) = 2$, for $\varepsilon = 1/9$ one gets $N(\varepsilon) = 4$, and generally, for $\varepsilon = (1/3)^m$ one gets $N(\varepsilon) = 2^m$. Therefore the dimension is $\lim_{m\to\infty}\frac{\log 2^m}{\log 3^m} = \frac{\log 2}{\log 3} \approx 0.63$.
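A small numerical check of this limit (a sketch, not part of the original notes): count the covering intervals of the Cantor construction at scale $\varepsilon = 3^{-m}$ and compare $\log N(\varepsilon)/\log(1/\varepsilon)$ with $\log 2/\log 3$.

```python
import numpy as np

def cantor_intervals(m):
    """Intervals covering the Cantor set after m middle-third removals."""
    intervals = [(0.0, 1.0)]
    for _ in range(m):
        intervals = [piece
                     for a, b in intervals
                     for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return intervals

for m in range(1, 8):
    eps = 3.0 ** (-m)
    n_eps = len(cantor_intervals(m))                # N(eps) = 2**m
    print(m, np.log(n_eps) / np.log(1.0 / eps))     # -> log 2 / log 3 ≈ 0.6309
```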
A concept related to the Hausdorff dimension is the correlation dimension. To compute this one needs $C(\varepsilon)$, the fraction of the total number of points that are within Euclidean distance $\varepsilon$ of a given point. (This $C(\varepsilon)$ is a quotient of two infinite numbers, but in finite samples it is a quotient of two large but finite numbers; this is why it is more tractable than the Hausdorff dimension.) Example again with the straight line and the area, using the sup norm: line: $C(\varepsilon) = 2\varepsilon/L$; area: $C(\varepsilon) = 4\varepsilon^2/S$. Then the correlation dimension is $\lim_{\varepsilon\to 0}\frac{\log C(\varepsilon)}{\log\varepsilon}$, again indicating how this count varies with the distance.
To compute it, use $\log C_M(\varepsilon)$, the sample analog of $\log C(\varepsilon)$ for a sample of size $M$, and plot it against $\log\varepsilon$. To get this sample analog, look at all pairs of different points, count those which are less than $\varepsilon$ apart, and divide by the total number of pairs of different points, $M(M-1)/2$.

Clearly, if $\varepsilon$ is too small, it falls through between the points, and if it is too large, it extends beyond the boundaries of the set. Therefore one cannot look at the slope at the origin but must look at the slope of a straight-line segment near the origin. (Another reason for not looking at too small an $\varepsilon$ is that there may be measurement error.)
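A sketch of how $C_M(\varepsilon)$ might be computed (assumes numpy; the function name and the test set are illustrative, not from the notes):

```python
import numpy as np

def correlation_integral(points, eps):
    """Fraction of distinct pairs of points within Euclidean distance eps."""
    m = len(points)
    dist = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
    iu = np.triu_indices(m, k=1)                # all pairs of different points
    return np.count_nonzero(dist[iu] < eps) / (m * (m - 1) / 2)

# Check on a set of known dimension: points on a line segment give slope ≈ 1.
rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(0, 1, 1000), np.zeros(1000)])
eps_grid = np.logspace(-2.3, -1, 10)
logC = np.log([correlation_integral(pts, e) for e in eps_grid])
slope = np.polyfit(np.log(eps_grid), logC, 1)[0]
print(slope)                                    # close to 1 for a one-dimensional set
```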
It seems the correlation dimension is close to and cannot exceed the Hausdorff dimension. What one really wants is apparently the Hausdorff dimension, but the correlation dimension is a numerically convenient surrogate.

Importance of fractal dimensions: If an attractor has a fractal dimension, then it is likely to be a strange attractor (although strictly speaking this is neither necessary nor sufficient). E.g., it seems to me the precise Hausdorff dimension of the Lorenz attractor is not known, but the correlation dimension is around 2.05.
51.1. Empirical Methods: Grassberger-Procaccia Plots.
With conventional statistical means, it is hard to distinguish chaotic deterministic from random time series. In a time series generated by a tent map, one obtains for almost all initial conditions a time series whose autocorrelation function is zero for all lags. We need sophisticated results from chaos theory to be able to tell them apart.
Here is the first such result: Assume there is a time series of $n$-dimensional vectors $x_t$ that has followed a deterministic chaotic motion for a long time, so that for all practical purposes it has arrived at its strange attractor, but at every time point $t$ you only observe the $j$th component $x_{jt}$. Then an embedding of dimension $m$ is an artificial dynamical system formed by the $m$-histories of this $j$th component. Takens proved that if $x_t$ lies on a strange attractor and the embedding dimension $m > 2n - 1$, then the embedding is topologically equivalent to the original time series. In particular this means that it has the same correlation dimension.
This has important implications: if a time series is part of a deterministic system
also including other time series, then one can draw certain conclusions about the
attractor without knowing the other time series.
Next point: the correlation dimension of this embedding is $\lim_{\varepsilon\to 0}\frac{\log C(\varepsilon, m)}{\log\varepsilon}$, where the embedding dimension $m$ is added as a second argument to the function $C$. If the system is deterministic, the correlation dimension settles to a stationary value as the embedding dimension $m$ increases; for a random system it keeps increasing, and in the i.i.d. case it is $m$. (In the special case that this i.i.d. distribution is the uniform one, the $m$-histories are uniformly distributed on the $m$-dimensional unit cube, and this follows immediately, like our examples above.) Therefore the Grassberger-Procaccia plots show for each $m$ one curve, plotting $\log C(\varepsilon, m)$ against $\log\varepsilon$.
For ε small, i.e., log ε going towards −∞, the plots of the true C’s become
asymptotically a straight line emanating from the origin with a given slope which
indicates the dimension. Now one cannot make ε very small for two reasons: (1)
there are only finitely many data points, and (2) there is also a measurement error
whose effect disappears if ε becomes bigger than a few standard deviations of this
measurement error. Therefore one looks at the slope for values of ε that are not too
small.
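A sketch of how such slopes might be computed in practice (assumes numpy; the series, the grid of $\varepsilon$ values, and the helper names are illustrative assumptions):

```python
import numpy as np

def m_histories(series, m):
    """Stack the m-histories (delay vectors) of a scalar series."""
    n = len(series) - m + 1
    return np.column_stack([series[i:i + n] for i in range(m)])

def gp_slope(series, m, eps_grid):
    """Slope of log C(eps, m) against log eps over the given grid of eps values."""
    emb = m_histories(series, m)
    d = np.sqrt(((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1))
    pairs = d[np.triu_indices(len(emb), k=1)]
    logC = np.log([np.count_nonzero(pairs < e) / len(pairs) for e in eps_grid])
    return np.polyfit(np.log(eps_grid), logC, 1)[0]

rng = np.random.default_rng(1)
x = np.empty(1000); x[0] = 0.4
for t in range(999):                       # deterministic chaotic series (logistic map)
    x[t + 1] = 4 * x[t] * (1 - x[t])
noise = rng.uniform(0, 1, 1000)            # i.i.d. benchmark

eps_grid = np.logspace(-1.2, -0.3, 8)      # epsilons that are not too small, as discussed
for m in (1, 2, 3, 4):
    print(m, round(gp_slope(x, m, eps_grid), 2), round(gp_slope(noise, m, eps_grid), 2))
# The slope for the deterministic series roughly levels off as m grows,
# while the slope for the i.i.d. series keeps increasing with m.
```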
One method to see whether there is a deterministic structure is to compare this sample correlation dimension with that of "scrambled" data, and to check whether the slopes of the original data level off while those of the scrambled data keep becoming steeper as the embedding dimension grows. Scrambling means: fit an autoregression (capturing the linear autocorrelation structure) and then randomly redraw the residuals.
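One possible implementation of the scrambling step is sketched below (an AR(1) fitted by least squares with resampled residuals; the lag order and fitting method are illustrative assumptions, not prescribed by the text):

```python
import numpy as np

def scramble_ar1(series, rng):
    """Fit an AR(1) by least squares and rebuild the series from resampled residuals."""
    y, ylag = series[1:], series[:-1]
    X = np.column_stack([np.ones(len(ylag)), ylag])
    a, b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - (a + b * ylag)
    out = np.empty_like(series)
    out[0] = series[0]
    shuffled = rng.permutation(resid)
    for t in range(1, len(series)):
        out[t] = a + b * out[t - 1] + shuffled[t - 1]
    return out

rng = np.random.default_rng(2)
x = np.empty(1000); x[0] = 0.4
for t in range(999):
    x[t + 1] = 4 * x[t] * (1 - x[t])
scrambled = scramble_ar1(x, rng)
# The scrambled series has (approximately) the same linear autocorrelation structure,
# but the deterministic map is destroyed, so its Grassberger-Procaccia slopes keep
# growing with the embedding dimension while those of x do not.
```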
This is a powerful tool for distinguishing random noise from a deterministic
system.
CHAPTER 52
Instrumental Variables
Compare here [DM93, chapter 7] and [Gre97, Section 6.7.8]. Greene first introduces the simple instrumental variables estimator and then shows that the generalized one picks out the best linear combinations for forming simple instruments. I will follow [DM93] and first introduce the generalized instrumental variables estimator, and then specialize to the simple one.
In this chapter, we will discuss a sequence of models $y_n = X_n\beta + \varepsilon_n$, where $\varepsilon_n \sim (o_n, \sigma^2 I_n)$, the $X_n$ are $n\times k$ matrices of random regressors, and the number of observations $n \to \infty$. We do not make the assumption $\operatorname{plim}\frac{1}{n}X_n^\top\varepsilon_n = o$ which would ensure consistency of the OLS estimator (compare Problem 394). Instead, a sequence of $n\times m$ matrices of (random or nonrandom) "instrumental variables" $W_n$ is available which satisfies the following three conditions:
(52.0.1) $\operatorname{plim}\tfrac{1}{n}W_n^\top\varepsilon_n = o$

(52.0.2) $\operatorname{plim}\tfrac{1}{n}W_n^\top W_n = Q$ exists, is nonrandom and nonsingular

(52.0.3) $\operatorname{plim}\tfrac{1}{n}W_n^\top X_n = D$ exists, is nonrandom and has full column rank
Full column rank in (52.0.3) is only possible if m ≥ k.
In this situation, regression of $y$ on $X$ is inconsistent. But if one regresses $y$ on the projection of $X$ on $\mathcal{R}[W]$, the column space of $W$, one obtains a consistent estimator. This is called the instrumental variables estimator.
If $x_i$ is the $i$th column vector of $X$, then $W(W^\top W)^{-1}W^\top x_i$ is the projection of $x_i$ on the space spanned by the columns of $W$. Therefore the matrix $W(W^\top W)^{-1}W^\top X$ consists of the columns of $X$ projected on $\mathcal{R}[W]$. This is what we meant by the projection of $X$ on $\mathcal{R}[W]$. With these projections as regressors, the vector of regression coefficients becomes the "generalized instrumental variables estimator"
(52.0.4) $\tilde\beta = \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1} X^\top W(W^\top W)^{-1}W^\top y$
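A minimal numerical sketch of (52.0.4) (assumes numpy; the data-generating process with an endogenous regressor is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Endogenous regressor: x is correlated with the error, w1 and w2 are valid instruments.
w1, w2 = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n)
x = 0.8 * w1 + 0.5 * w2 + 0.6 * eps + rng.normal(size=n)
y = 2.0 * x + eps

X = x[:, None]
W = np.column_stack([w1, w2])

# Generalized IV estimator (52.0.4): regress y on the projection of X onto R[W].
PW_X = W @ np.linalg.solve(W.T @ W, W.T @ X)
beta_iv = np.linalg.solve(X.T @ PW_X, PW_X.T @ y)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_ols, beta_iv)   # OLS is biased away from the true value 2; IV is close to 2
```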
Problem 460. 3 points We are in the model $y = X\beta + \varepsilon$ and we have a matrix $W$ of "instrumental variables" which satisfies the following three conditions: $\operatorname{plim}\frac{1}{n}W^\top\varepsilon = o$; $\operatorname{plim}\frac{1}{n}W^\top W = Q$ exists, is nonrandom and positive definite; and $\operatorname{plim}\frac{1}{n}W^\top X = D$ exists, is nonrandom and has full column rank. Show that the instrumental variables estimator

(52.0.5) $\tilde\beta = \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1} X^\top W(W^\top W)^{-1}W^\top y$

is consistent. Hint: Write $\tilde\beta_n - \beta = B_n\cdot\frac{1}{n}W^\top\varepsilon$ and show that the sequence of matrices $B_n$ has a plim.
Answer. Write it as

$\tilde\beta_n = \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1} X^\top W(W^\top W)^{-1}W^\top(X\beta + \varepsilon)$
$\qquad = \beta + \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1} X^\top W(W^\top W)^{-1}W^\top\varepsilon$
$\qquad = \beta + \Bigl((\tfrac{1}{n}X^\top W)(\tfrac{1}{n}W^\top W)^{-1}(\tfrac{1}{n}W^\top X)\Bigr)^{-1}(\tfrac{1}{n}X^\top W)(\tfrac{1}{n}W^\top W)^{-1}\,\tfrac{1}{n}W^\top\varepsilon,$

i.e., the $B_n$ and $B$ of the hint are as follows:

$B_n = \Bigl((\tfrac{1}{n}X^\top W)(\tfrac{1}{n}W^\top W)^{-1}(\tfrac{1}{n}W^\top X)\Bigr)^{-1}(\tfrac{1}{n}X^\top W)(\tfrac{1}{n}W^\top W)^{-1}$
$B = \operatorname{plim} B_n = (D^\top Q^{-1}D)^{-1}D^\top Q^{-1}.$ □
Problem 461. Assume $\operatorname{plim}\frac{1}{n}X^\top X$ exists, and $\operatorname{plim}\frac{1}{n}X^\top\varepsilon$ exists. (We only need the existence, not that the first is nonsingular and the second zero.) Show that $\sigma^2$ can be estimated consistently by $s^2 = \frac{1}{n}(y - X\tilde\beta)^\top(y - X\tilde\beta)$.
Answer. $y - X\tilde\beta = X\beta + \varepsilon - X\tilde\beta = \varepsilon - X(\tilde\beta - \beta)$. Therefore

$\frac{1}{n}(y - X\tilde\beta)^\top(y - X\tilde\beta) = \frac{1}{n}\varepsilon^\top\varepsilon - \frac{2}{n}\varepsilon^\top X(\tilde\beta - \beta) + (\tilde\beta - \beta)^\top\frac{1}{n}X^\top X\,(\tilde\beta - \beta).$

All summands have plims; the plim of the first is $\sigma^2$ and those of the other two are zero. □
Problem 462. In the situation of Problem 460, add the stronger assumption $\frac{1}{\sqrt{n}}W^\top\varepsilon \to N(o, \sigma^2 Q)$, and show that $\sqrt{n}(\tilde\beta_n - \beta) \to N\bigl(o, \sigma^2(D^\top Q^{-1}D)^{-1}\bigr)$.

Answer. $\tilde\beta_n - \beta = B_n\frac{1}{n}W_n^\top\varepsilon_n$, therefore $\sqrt{n}(\tilde\beta_n - \beta) = B_n n^{-1/2}W_n^\top\varepsilon_n \to B\,N(o, \sigma^2 Q) = N(o, \sigma^2 BQB^\top)$. Since $B = (D^\top Q^{-1}D)^{-1}D^\top Q^{-1}$, the result follows. □
From Problem 462 it follows that for finite samples approximately $\tilde\beta_n - \beta \sim N\bigl(o, \frac{\sigma^2}{n}(D^\top Q^{-1}D)^{-1}\bigr)$. Since $\frac{1}{n}(D^\top Q^{-1}D)^{-1} = \bigl(nD^\top(nQ)^{-1}nD\bigr)^{-1}$, $\operatorname{MSE}[\tilde\beta; \beta]$ can be estimated by $s^2\bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1}$.
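A self-contained sketch (assumes numpy; the simulated data are illustrative) of this estimated MSE matrix alongside the point estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
w = rng.normal(size=(n, 2))
eps = rng.normal(size=n)
x = w @ np.array([0.8, 0.5]) + 0.6 * eps + rng.normal(size=n)
X, W, y = x[:, None], w, 2.0 * x + eps

A = X.T @ W @ np.linalg.solve(W.T @ W, W.T @ X)        # X'W (W'W)^{-1} W'X
beta_iv = np.linalg.solve(A, X.T @ W @ np.linalg.solve(W.T @ W, W.T @ y))
resid = y - X @ beta_iv
s2 = resid @ resid / n                                  # consistent estimate of sigma^2
cov_beta = s2 * np.linalg.inv(A)                        # estimated MSE[beta~; beta]
print(beta_iv, np.sqrt(np.diag(cov_beta)))              # estimate and its standard error
```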
The estimator (52.0.4) is sometimes called the two stage least squares (2SLS) estimator, because the projection of $X$ on the column space of $W$ can be considered the predicted values if one regresses every column of $X$ on $W$. I.e., instead of regressing $y$ on $X$ one regresses $y$ on those linear combinations of the columns of $W$ which best approximate the columns of $X$. Here is more detail: the matrix of estimated coefficients in the first regression is $\hat\Pi = (W^\top W)^{-1}W^\top X$, and the predicted values in this regression are $\hat X = W\hat\Pi = W(W^\top W)^{-1}W^\top X$. The second regression, which regresses $y$ on $\hat X$, gives the coefficient vector

(52.0.6) $\tilde\beta = (\hat X^\top\hat X)^{-1}\hat X^\top y.$

If you plug this in you see that this is exactly (52.0.4) again.
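A quick numerical check (a sketch, assumes numpy) that the two-stage computation (52.0.6) reproduces the direct formula (52.0.4):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, m = 500, 2, 3
W = rng.normal(size=(n, m))
X = W @ rng.normal(size=(m, k)) + rng.normal(size=(n, k))
y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)

# Stage 1: regress each column of X on W; Stage 2: regress y on the fitted values.
Pi_hat = np.linalg.solve(W.T @ W, W.T @ X)
X_hat = W @ Pi_hat
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# Direct generalized IV formula (52.0.4).
A = X.T @ W @ np.linalg.solve(W.T @ W, W.T @ X)
beta_iv = np.linalg.solve(A, X.T @ W @ np.linalg.solve(W.T @ W, W.T @ y))
print(np.allclose(beta_2sls, beta_iv))   # True
```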
Now let's look at the geometry of instrumental variable regression of one variable $y$ on one other variable $x$ with $w$ as an instrument. The specification is $y = x\beta + \varepsilon$. On p. 851 we visualized the asymptotic results if $\varepsilon$ is asymptotically orthogonal to $x$. Now let us assume $\varepsilon$ is asymptotically not orthogonal to $x$. One can visualize this as three vectors, again normalized by dividing by $\sqrt{n}$, but now even in the asymptotic case the $\varepsilon$-vector is not orthogonal to $x$. (Draw $\varepsilon$ vertically, and make $x$ long enough that $\beta < 1$.) We assume $n$ is large enough so that the asymptotic results hold for the sample already (or, perhaps better, that the difference between the sample and its plim is only infinitesimal). Therefore the OLS regression, which estimates $\beta$ by $x^\top y/x^\top x$, is inconsistent. Let $O$ be the origin, $A$ the point on the $x$-vector where $\varepsilon$ branches off (i.e., the end of $x\beta$), furthermore let $B$ be the point on the $x$-vector where the orthogonal projection of $y$ comes down, and $C$ the end of the $x$-vector. Then $x^\top y = \overline{OC}\cdot\overline{OB}$ and $x^\top x = \overline{OC}^2$, therefore $x^\top y/x^\top x = \overline{OB}/\overline{OC}$, which would be the $\beta$ if the errors were orthogonal. Now introduce a new variable $w$ which is orthogonal to the errors. (Since $\varepsilon$ is vertical, $w$ is on the horizontal axis.) Call $D$ the projection of $y$ on $w$, which is the prolongation of the vector $\varepsilon$, call $E$ the end of the $w$-vector, and call $F$ the projection of $x$ on $w$. Then $w^\top y = \overline{OE}\cdot\overline{OD}$ and $w^\top x = \overline{OE}\cdot\overline{OF}$. Therefore $w^\top y/w^\top x = (\overline{OE}\cdot\overline{OD})/(\overline{OE}\cdot\overline{OF}) = \overline{OD}/\overline{OF} = \overline{OA}/\overline{OC} = \beta$. Or geometrically it is obvious that the regression of $y$ on the projection of $x$ on $w$ will give the right $\hat\beta$. One also sees here why the $s^2$ based on this second regression is inconsistent.
If I allow two instruments, the two instruments must be in the horizontal plane perpendicular to the vector $\varepsilon$, which is assumed still vertical. Here we project $x$ on this horizontal plane and then regress $y$, which stays where it is, on this projected $x$. In this way the residuals have the right direction!
What if there is one instrument, but it does not lie in the same plane as $x$ and $y$? This is the most general case as long as there is only one regressor and one instrument. This instrument $w$ must lie somewhere in the horizontal plane. We have to project $x$ on it, and then regress $y$ on this projection. Look at it this way: take the plane orthogonal to $w$ which goes through point $C$. The projection of $x$ on $w$ is the intersection of the ray generated by $w$ with this plane. Now move this plane parallel until it intersects point $A$. Then the intersection with the $w$-ray is the projection of $y$ on $w$. But this latter plane contains $\varepsilon$, since $\varepsilon$ is orthogonal to $w$. This makes sure that the regression gives the right results.
Problem 463. 4 points The asymptotic MSE matrix of the instrumental variables estimator with $W$ as matrix of instruments is $\sigma^2\operatorname{plim}\bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1}$. Show that if one adds more instruments, then this asymptotic MSE matrix can only decrease. It is sufficient to show that the inequality holds before going over to the plim, i.e., if $W = \begin{bmatrix}U & V\end{bmatrix}$, then

(52.0.7) $\bigl(X^\top U(U^\top U)^{-1}U^\top X\bigr)^{-1} - \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1}$

is nonnegative definite. Hints: (1) Use theorem A.5.5 in the Appendix (proof is not required). (2) Note that $U = WG$ for some $G$. Can you write this $G$ in partitioned matrix form? (3) Show that, whatever $W$ and $G$, $W(W^\top W)^{-1}W^\top - WG(G^\top W^\top WG)^{-1}G^\top W^\top$ is idempotent.
Answer.

(52.0.8) $U = \begin{bmatrix}U & V\end{bmatrix}\begin{bmatrix}I\\O\end{bmatrix} = WG$ where $G = \begin{bmatrix}I\\O\end{bmatrix}$. □
Problem 464. 2 points Show: if a matrix D has full column rank and is square,
then it has an inverse.
Answer. Here you need that column rank is row rank: if D has full column rank it also
has full row rank. And to make the proof complete you need: if A has a left inverse L and a
right inverse R, then L is the only left inverse and R the only right inverse and L = R. Proof:
$L = L(AR) = (LA)R = R$. □
Problem 465. 2 points If $W^\top X$ is square and has full column rank, then it is nonsingular. Show that in this case (52.0.4) simplifies to the "simple" instrumental variables estimator:

(52.0.9) $\tilde\beta = (W^\top X)^{-1}W^\top y$
Answer. In this case the big inverse can be split into three:

(52.0.10) $\tilde\beta = \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1} X^\top W(W^\top W)^{-1}W^\top y$
(52.0.11) $\phantom{\tilde\beta} = (W^\top X)^{-1}W^\top W(X^\top W)^{-1}X^\top W(W^\top W)^{-1}W^\top y = (W^\top X)^{-1}W^\top y.$ □
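A numerical confirmation (a sketch, assumes numpy) that with exactly as many instruments as regressors the generalized formula (52.0.4) and the simple estimator (52.0.9) coincide:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 400, 2
W = rng.normal(size=(n, k))                 # m = k instruments
X = W + 0.3 * rng.normal(size=(n, k))
y = X @ np.array([1.5, -2.0]) + rng.normal(size=n)

A = X.T @ W @ np.linalg.solve(W.T @ W, W.T @ X)
beta_gen = np.linalg.solve(A, X.T @ W @ np.linalg.solve(W.T @ W, W.T @ y))
beta_simple = np.linalg.solve(W.T @ X, W.T @ y)   # (W'X)^{-1} W'y
print(np.allclose(beta_gen, beta_simple))         # True
```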
Problem 466. We only have one regressor with intercept, i.e., $X = \begin{bmatrix}\iota & x\end{bmatrix}$, and we have one instrument $w$ for $x$ (while the constant term is its own instrument), i.e., $W = \begin{bmatrix}\iota & w\end{bmatrix}$. Show that the instrumental variables estimators for slope and intercept are

(52.0.12) $\tilde\beta = \dfrac{\sum(w_t - \bar w)(y_t - \bar y)}{\sum(w_t - \bar w)(x_t - \bar x)}$
(52.0.13) $\tilde\alpha = \bar y - \tilde\beta\bar x$

Hint: the math is identical to that in question 238.
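A sketch (assumes numpy) checking that the simple IV estimator with intercept reduces to the demeaned cross-product ratio (52.0.12)–(52.0.13):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
w = rng.normal(size=n)
x = 0.7 * w + rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Simple IV with X = [iota x], W = [iota w].
iota = np.ones(n)
X = np.column_stack([iota, x])
W = np.column_stack([iota, w])
alpha_iv, beta_iv = np.linalg.solve(W.T @ X, W.T @ y)

beta_ratio = ((w - w.mean()) * (y - y.mean())).sum() / ((w - w.mean()) * (x - x.mean())).sum()
alpha_ratio = y.mean() - beta_ratio * x.mean()
print(np.allclose([alpha_iv, beta_iv], [alpha_ratio, beta_ratio]))   # True
```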
Problem 467. 2 points Show that, if there are as many instruments as there are
observations, then the instrumental variables estimator (52.0.4) becomes identical to
OLS.
Answer. In this case $W$ has an inverse, therefore the projection on $\mathcal{R}[W]$ is the identity. Staying in the algebraic paradigm, $(W^\top W)^{-1} = W^{-1}(W^\top)^{-1}$, so that (52.0.4) reduces to $(X^\top X)^{-1}X^\top y$. □
An implication of Problem 467 is that one must be careful not to include too
many instruments if one has a small sample. Asymptotically it is better to have more
instruments, but for n = m, the instrumental variables estimator is equal to OLS, i.e.,
the sequence of instrumental variables estimators starts at the (inconsistent) OLS.
If one uses fewer instruments, then the asymptotic MSE matrix is not so good, but
one may get a sequence of estimators which moves away from the inconsistent OLS
more quickly.