GRADUATE COURSE MATERIALS: STATA ANALYSIS 3

Pham Thi Bich Ngoc, Ph.D. (University of Kiel, Germany)
FEC/Hoa Sen University




UNIVERSITY OF ECONOMICS HO CHI MINH CITY, June 2014
June 14 - Dr. Pham Thi Bich Ngoc
• Endogeneity means that an independent variable (IV) included in the model is a choice variable rather than an exogenous one.
• Omitted variable bias
• Sample selection bias / measurement error
• Simultaneity
Omitting a variable ($X_2$) creates a bias only if:
1. $X_2$ is an explanator of $Y$ (so, when omitted, it becomes a component of the error term), and
2. $X_2$ is correlated with $X_1$ (so that $X_2$ creates a correlation between $X_1$ and the error term).
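The two conditions can be checked with a small simulation. This is a minimal sketch, not part of the original slides: the coefficient values, the sample size, and the helper names `cov` and `slope` are all illustrative assumptions.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    return cov(x, y) / cov(x, x)

random.seed(0)
n = 20000
a1, a2 = 1.0, 2.0  # hypothetical true coefficients on X1 and X2

x1 = [random.gauss(0, 1) for _ in range(n)]
# Case A: X2 is correlated with X1, so both conditions hold
x2_corr = [0.5 * v + random.gauss(0, 1) for v in x1]
# Case B: X2 is independent of X1, so condition 2 fails
x2_indep = [random.gauss(0, 1) for _ in range(n)]

y_corr = [a1 * p + a2 * q + random.gauss(0, 1) for p, q in zip(x1, x2_corr)]
y_indep = [a1 * p + a2 * q + random.gauss(0, 1) for p, q in zip(x1, x2_indep)]

# Regress Y on X1 only, omitting X2 in both cases
slope_corr = slope(x1, y_corr)
slope_indep = slope(x1, y_indep)

print(f"X2 correlated with X1: slope on X1 = {slope_corr:.2f} (true a1 = {a1})")
print(f"X2 independent of X1:  slope on X1 = {slope_indep:.2f} (true a1 = {a1})")
```

Under these assumed values the first slope should land near a1 + a2 × 0.5 = 2.0, while the second should stay near the true value of 1.0, because an omitted variable that is unrelated to X1 only adds noise.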
• Sample selection bias may occur when the study subjects are not a random sample of the population, either on the dependent variable or on an independent variable.
• Example:
◦ Estimating a wage equation for women, where wages are observed only for women who work.
• Measurement error also induces a correlation between our included explanator and the error term.
• Instead of observing $X_i$, we observe $M_i = X_i + v_i$.
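To see where the correlation comes from, consider a sketch that assumes a single-regressor model $Y_i = a_0 + a_1 X_i + u_i$ (the coefficient names are borrowed from the later slides) and assumes that the measurement error $v_i$ is independent of both $X_i$ and $u_i$. Substituting $X_i = M_i - v_i$ gives:

```latex
\begin{align*}
Y_i &= a_0 + a_1 (M_i - v_i) + u_i \\
    &= a_0 + a_1 M_i + (u_i - a_1 v_i)
\end{align*}
```

Because $M_i$ contains $v_i$, the observed regressor is correlated with the composite error $u_i - a_1 v_i$ whenever $a_1 \neq 0$. Under these assumptions the OLS slope is pulled toward zero.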
Sample selection bias:
• Union workers may be less able than non-union workers, and would be earning less than non-union workers had they not joined a union.
• b underestimates the wage gain of joining a union.
[Figure: wages of non-union and union workers, showing the observed earnings of union workers, the observed earnings of non-union workers, the earnings of non-union workers if they joined the union, and the earnings of union workers had they not joined the union. The observed gap b, the true union effect, and the sample selection bias are marked.]
Suppose that the sample is selected such that only students with a GPA higher than B are included.
[Figure: SAT against College GPA, with the GPA cutoff B marked, comparing the true relationship with the estimated relationship.]
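The effect of selecting on GPA can be sketched with simulated data. The example below assumes that College GPA is the dependent variable and SAT the explanator, uses standardized scores, and picks an arbitrary cutoff `B`; none of these values come from the slides.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    return cov(x, y) / cov(x, x)

random.seed(1)
n = 20000
true_slope = 1.0  # hypothetical effect of SAT on GPA

sat = [random.gauss(0, 1) for _ in range(n)]              # standardized SAT
gpa = [true_slope * s + random.gauss(0, 1) for s in sat]  # standardized GPA

B = 0.0  # hypothetical cutoff: keep only students with GPA above B
kept = [(s, g) for s, g in zip(sat, gpa) if g > B]
sat_kept = [s for s, _ in kept]
gpa_kept = [g for _, g in kept]

slope_full = slope(sat, gpa)
slope_truncated = slope(sat_kept, gpa_kept)

print(f"Full sample slope:     {slope_full:.2f}")
print(f"Selected sample slope: {slope_truncated:.2f}")
```

In the selected sample, low-SAT students appear only if they had an unusually good shock to GPA, so the fitted line is noticeably flatter than the true relationship.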
• An independent variable (IV) included in the model is a choice variable, potentially affected by the dependent variable (DV).
• Examples:
◦ IV = using a tutor or not; DV = grade
◦ IV = education; DV = income
◦ IV = union status; DV = wages
• Both X and Y are jointly determined.
• The process that generates Y also generates X at the same time.
• Because X and Y are determined simultaneously, X can adjust in response to shocks to Y ($e$).
• Thus X will be correlated with $e$.
• The classic example of simultaneous causality in economics is supply and demand.
• Both prices and quantities adjust until supply and demand are in equilibrium.
• A shock to demand or supply causes BOTH prices and quantities to move.
• Thus, any attempt to estimate the relationship between prices and quantities (say, to estimate a demand elasticity) suffers from SIMULTANEITY BIAS.
• Econometricians have a frequent interest in estimating elasticities resulting from such an equilibrium process. Simultaneity bias is a MAJOR problem.
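A small simulation illustrates the problem. It is only a sketch: the linear demand and supply curves, their slopes, and the shock variances are assumed for illustration and do not come from the slides.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    return cov(x, y) / cov(x, x)

random.seed(2)
n = 20000
b_demand, b_supply = -1.0, 0.5  # hypothetical slopes of the two curves

e_d = [random.gauss(0, 1) for _ in range(n)]  # demand shocks
e_s = [random.gauss(0, 1) for _ in range(n)]  # supply shocks

# Demand: q = b_demand * p + e_d.  Supply: q = b_supply * p + e_s.
# Setting the two equal and solving gives the equilibrium price and quantity.
price = [(d - s) / (b_supply - b_demand) for d, s in zip(e_d, e_s)]
quantity = [b_supply * p + s for p, s in zip(price, e_s)]

# Regress equilibrium quantity on equilibrium price
ols_slope = slope(price, quantity)

print(f"Demand slope: {b_demand}, supply slope: {b_supply}")
print(f"OLS slope of quantity on price: {ols_slope:.2f}")
```

Because every observed price already reflects both shocks, the OLS slope is a mixture of the two curves and recovers neither the demand slope nor the supply slope.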

• Consider a simple OLS regression:
◦ $Y_{it} = a_0 + a_1 X_{1it} + u_{it}$
• Recall that our estimate of $a_1$ will be unbiased only if we can assume that $X_{1it}$ is uncorrelated with the error term ($u_{it}$).
• We have discussed two ways to help ensure that this assumption is true:
◦ First, we should control for any observable variables that affect $Y_{it}$ and which are correlated with $X_{1it}$. For example, we should control for $X_{2it}$ if $X_{2it}$ affects $Y_{it}$ and $X_{2it}$ is correlated with $X_{1it}$:
◦ $Y_{it} = a_0 + a_1 X_{1it} + a_2 X_{2it} + u_{it}$

• Second, if we have panel data, we can control for any unobservable firm-specific characteristics ($u_i$) that affect $Y_{it}$ and which are correlated with the X variables.
• From Chapter 4:
◦ $Y_{it} = a_0 + a_1 X_{1it} + a_2 X_{2it} + u_i + e_{it}$
• We control for the correlations between $u_i$ and the X variables by estimating fixed effects models.
• Our estimates of $a_1$ and $a_2$ are unbiased if the X variables are uncorrelated with $e_{it}$. In this case, we say that the X variables are “exogenous”.
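The fixed effects idea can be sketched by demeaning each firm's data, which removes $u_i$. The example below assumes a single X variable, a balanced panel, and illustrative parameter values; the helper `demean_by_group` is hypothetical and written only for this sketch.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    return cov(x, y) / cov(x, x)

def demean_by_group(values, groups):
    """Subtract each group's mean from its own observations."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

random.seed(3)
n_firms, n_periods = 500, 10
a1 = 1.0  # hypothetical true coefficient on X

x, y, firm = [], [], []
for i in range(n_firms):
    u_i = random.gauss(0, 1)  # unobserved firm-specific effect
    for t in range(n_periods):
        x_it = u_i + random.gauss(0, 1)               # X is correlated with u_i
        y_it = a1 * x_it + u_i + random.gauss(0, 1)   # e_it is independent of X
        x.append(x_it)
        y.append(y_it)
        firm.append(i)

pooled = slope(x, y)  # ignores u_i, so u_i ends up in the error term
within = slope(demean_by_group(x, firm), demean_by_group(y, firm))

print(f"Pooled OLS slope:    {pooled:.2f}")
print(f"Fixed effects slope: {within:.2f} (true a1 = {a1})")
```

The pooled slope absorbs the correlation between X and $u_i$, while the within-firm slope should sit close to the true value because the demeaning removes anything that is constant for a firm.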
• Unfortunately, multiple regression and fixed effects models do not always ensure that the X variables are uncorrelated with the error term:
◦ if we do not observe all the variables that affect Y and that are correlated with X, multiple regression will not solve the problem.
◦ if we do not have panel data, the fixed effects models cannot be estimated.
◦ even if we have panel data, the Y and X variables may display little variation over time, in which case the fixed effects models can be unreliable (Zhou, 2001).
◦ even if we have panel data and the Y and X variables display sufficient variation over time, the unobservable variables that are correlated with X may not be constant over time, in which case the fixed effects models will not solve the problem.
• A variable is more likely to be correlated with the error term if it is “endogenous”.
• “Endogenous” means that the variable is determined within the economic model that we are trying to estimate.
• For example, suppose that $Y_{2it}$ is an endogenous explanatory variable:
◦ $Y_{1it} = a_0 + a_1 Y_{2it} + a_2 X_{it} + u_{it}$ (1)
◦ $Y_{2it} = b_0 + b_1 X_{it} + b_2 Z_{it} + v_{it}$ (2)
• Equations (1) and (2) have a “triangular” structure since $Y_{2it}$ is assumed to affect $Y_{1it}$, but $Y_{1it}$ is assumed not to affect $Y_{2it}$.
• Given this triangular structure, the OLS estimate of $a_1$ in equation (1) is unbiased only if $v_{it}$ is uncorrelated with $u_{it}$.
• If $v_{it}$ is correlated with $u_{it}$, then $Y_{2it}$ is correlated with $u_{it}$, which means that the OLS estimate of $a_1$ would be biased.
• To avoid this bias, we must estimate equation (1) using “instrumental variables” (IV) regression rather than OLS.
• Equations (1) and (2) are called “structural” equations because they describe the economic relationship between $Y_{1it}$ and $Y_{2it}$.
• We can obtain a “reduced-form” equation by substituting eq. (2) into eq. (1):
◦ $Y_{1it} = a_0 + a_1 (b_0 + b_1 X_{it} + b_2 Z_{it} + v_{it}) + a_2 X_{it} + u_{it}$
◦ In this “reduced-form” equation, all the explanatory variables ($X_{it}$ and $Z_{it}$) are exogenous.
• The basic idea underlying IV regression is to remove $v_{it}$ from the $Y_{1it}$ model so that our estimate of $a_1$ is unbiased.
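Collecting terms makes the reduced form easier to read. The following is only an algebraic rearrangement of the substituted equation above and introduces no new assumptions:

```latex
\begin{align*}
Y_{1it} &= a_0 + a_1 (b_0 + b_1 X_{it} + b_2 Z_{it} + v_{it}) + a_2 X_{it} + u_{it} \\
        &= (a_0 + a_1 b_0) + (a_1 b_1 + a_2) X_{it} + a_1 b_2 Z_{it} + (a_1 v_{it} + u_{it})
\end{align*}
```

The coefficient on $Z_{it}$ is the product $a_1 b_2$, and the composite error $a_1 v_{it} + u_{it}$ is where $v_{it}$ enters the $Y_{1it}$ model.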
• Note that $v_{it}$ is removed from the $Y_{1it}$ model if we use the predicted rather than the actual values of $Y_{2it}$ on the right-hand side.
• We predict $Y_{2it}$ using all the exogenous variables in the system (in our example, we use the two exogenous variables $X_{it}$ and $Z_{it}$).
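• We then use the predicted rather than the actual values of $Y_{2it}$ when estimating the $Y_{1it}$ model:
◦ $Y_{1it} = a_0 + a_1 (b_0 + b_1 X_{it} + b_2 Z_{it} + v_{it}) + a_2 X_{it} + u_{it}$ (3)
◦ $Y_{1it} = a_0 + a_1 (b_0 + b_1 X_{it} + b_2 Z_{it}) + a_2 X_{it} + u_{it}$ (4)
• The $a_1$ estimate is biased in eq. (3) but it is unbiased in eq. (4) because the $v_{it}$ term has been removed.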
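The two-step logic can be sketched on simulated data that follows eqs. (1) and (2). Everything numeric here is an assumption made for illustration: the parameter values, the shared shock that makes $u$ and $v$ correlated, and the small `ols` helper, which solves the normal equations in plain Python.

```python
import random

def ols(y, columns):
    """Return OLS coefficients [intercept, coef_1, ...] via the normal equations."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Build X'X and X'y
    A = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    b = [sum(r[p] * y[i] for i, r in enumerate(rows)) for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        b[p], b[pivot] = b[pivot], b[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [A[r][c] - f * A[p][c] for c in range(k)]
                b[r] -= f * b[p]
    return [b[p] / A[p][p] for p in range(k)]

random.seed(4)
n = 20000
a0, a1, a2 = 1.0, 1.0, 0.5  # hypothetical parameters of eq. (1)
b0, b1, b2 = 0.5, 1.0, 1.0  # hypothetical parameters of eq. (2)

x = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
common = [random.gauss(0, 1) for _ in range(n)]  # shared shock: u and v are correlated
u = [c + random.gauss(0, 1) for c in common]
v = [c + random.gauss(0, 1) for c in common]

y2 = [b0 + b1 * xi + b2 * zi + vi for xi, zi, vi in zip(x, z, v)]
y1 = [a0 + a1 * yi + a2 * xi + ui for yi, xi, ui in zip(y2, x, u)]

# OLS on eq. (1) with the actual values of Y2
ols_a1 = ols(y1, [y2, x])[1]

# Stage 1: predict Y2 from all exogenous variables (X and Z)
g0, g1, g2 = ols(y2, [x, z])
y2_hat = [g0 + g1 * xi + g2 * zi for xi, zi in zip(x, z)]

# Stage 2: re-estimate eq. (1) with predicted Y2 in place of actual Y2
tsls_a1 = ols(y1, [y2_hat, x])[1]

print(f"OLS estimate of a1:       {ols_a1:.2f}")
print(f"Two-step estimate of a1:  {tsls_a1:.2f} (true a1 = {a1})")
```

The OLS estimate drifts away from the true value because $Y_{2it}$ carries $v_{it}$, while the two-step estimate should sit close to it. The sketch reproduces only the point estimate; standard errors taken from a hand-run second stage are not correct, which is one reason to use a dedicated IV command in practice.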
• In eq. (4) the estimated coefficient for the $Z_{it}$ variable is $a_1 b_2$.
• We already know the value of $b_2$ from eq. (2).
• Therefore, $a_1 = (a_1 b_2) / b_2$.
• It is important to note that the $a_1$ coefficient can be estimated only if there is at least one exogenous variable in the structural model for $Y_{2it}$ that is excluded from the structural model for $Y_{1it}$.
◦ This is the $Z_{it}$ variable in eq. (2).
• In eq. (4) the $a_1$ coefficient is “just” identified because there is only one exogenous variable ($Z_{it}$) that is in the $Y_{2it}$ model and that is excluded from the $Y_{1it}$ model.
• Suppose we had included $Z_{it}$ in both models:
◦ $Y_{1it} = a_0 + a_1 Y_{2it} + a_2 X_{it} + a_3 Z_{it} + u_{it}$
◦ $Y_{2it} = b_0 + b_1 X_{it} + b_2 Z_{it} + v_{it}$
• In this case, the $a_1$ coefficient cannot be identified because we estimate only $(a_1 b_2 + a_3)$ and $b_2$.
◦ In other words, we cannot determine whether the effect of $Z_{it}$ on $Y_{1it}$ is a main effect ($a_3$) or an indirect effect through $Y_{2it}$ ($a_1 b_2$).
• Here we say that the system of equations is “under-identified”.
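The identification failure can be made explicit by writing out the reduced form. The sketch below assumes that the $Y_{1it}$ model gains an extra term $a_3 Z_{it}$ while eq. (2) is unchanged, and then substitutes eq. (2) as before:

```latex
\begin{align*}
Y_{1it} &= a_0 + a_1 (b_0 + b_1 X_{it} + b_2 Z_{it} + v_{it}) + a_2 X_{it} + a_3 Z_{it} + u_{it} \\
        &= (a_0 + a_1 b_0) + (a_1 b_1 + a_2) X_{it} + (a_1 b_2 + a_3) Z_{it} + (a_1 v_{it} + u_{it})
\end{align*}
```

The data pin down the combined coefficient $a_1 b_2 + a_3$ and, from eq. (2), $b_2$. That leaves one equation in the two unknowns $a_1$ and $a_3$, so any value of $a_1$ can be matched by a suitable $a_3$.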
• Suppose we had included two exogenous variables in the $Y_{2it}$ model and we excluded both these variables from the $Y_{1it}$ model:
◦ $Y_{1it} = a_0 + a_1 Y_{2it} + a_2 X_{it} + u_{it}$
◦ $Y_{2it} = b_0 + b_1 X_{it} + b_2 Z_{1it} + b_3 Z_{2it} + v_{it}$
• Now we have estimates of $a_1 b_2$, $a_1 b_3$, $b_2$, and $b_3$.
• Therefore: $a_1 = (a_1 b_2) / b_2$ and $a_1 = (a_1 b_3) / b_3$.
• Here we say that the system of equations is “over-identified”.
• In this example, the system is “triangular” because there are two equations and one endogenous right-hand side variable.
• When the models have a triangular structure, the models can be estimated using the ivregress command.
◦ The models can be estimated using 2SLS or GMM.
◦ 2SLS is more commonly used in practice.
• STATA
◦ xtivreg2 depvar1 [varlist1] (depvar2 = varlist_iv)
• depvar1 is the dependent variable for the model which has an endogenous regressor
• varlist1 are the exogenous variables in the model that has the endogenous regressor
• depvar2 is the endogenous regressor
• varlist_iv are the exogenous variables that are believed to affect the endogenous regressor
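• We should test whether:
◦ our chosen instruments are exogenous (i.e., they should be uncorrelated with the error term) and
◦ it is valid to exclude some of them from the model that has the endogenous regressor.
• If they are not exogenous or they should not be excluded, they are not valid instruments.
• After ivregress, estat endogenous tests whether the endogenous regressor can in fact be treated as exogenous, and estat overid tests the over-identifying restrictions when there are more instruments than endogenous regressors.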