STATA ANALYSIS MATERIALS 2 (GRADUATE COURSE)

Pham Thi Bich Ngoc, Ph.D. (University of Kiel, Germany)
FEC/Hoa Sen University




UNIVERSITY OF ECONOMICS HO CHI MINH CITY, June 2014
(1) Multicollinearity
 Multicollinearity occurs when two or more independent variables in a regression model are highly correlated with each other
 The standard error of an OLS parameter estimate will be higher the more highly correlated the corresponding independent variable is with the other independent variables in the model
 Perfect multicollinearity occurs when there is a perfect linear correlation between two or more independent variables

 It also occurs when an independent variable takes a constant value in all observations
 The symptoms of a multicollinearity problem:

1. Independent variable(s) considered critical in explaining the model's dependent variable are not statistically significant according to the tests
2. High R², a highly significant F-test, but few or no statistically significant t tests

3. Parameter estimates drastically change values and become statistically significant when some independent variables are excluded from the regression
 A simple test for multicollinearity is to conduct “artificial” regressions between each independent variable (as the “dependent” variable) and the remaining independent variables

 Variance Inflation Factors (VIF_j) are calculated as:

   VIF_j = 1 / (1 − R_j²)

  where R_j² is the R² of the artificial regression for independent variable X_j
 VIF_j = 2, for example, means that the variance is twice what it would be if X_j were not affected by multicollinearity

 A VIF_j > 10 is clear evidence that the estimation of B_j is being affected by multicollinearity
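As a worked illustration (the R² value here is assumed, not taken from the slides): if the artificial regression of X_j on the other regressors yields R_j² = 0.90, then VIF_j = 1/(1 − 0.90) = 10, exactly at the rule-of-thumb threshold above.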


 Although it is useful to be aware of the
presence of multicollinearity, it is not easy to
remedy severe (non-perfect) multicollinearity

 If possible, adding observations or taking a
new sample might help lessen multicollinearity


 Exclude the independent variables that appear to be causing the problem

 Modifying the model specification sometimes helps, for example:

 using real instead of nominal economic data

 using a reciprocal instead of a polynomial specification for a given independent variable
(2) Heteroskedasticity
 Homoskedasticity assumption [MLR.5]: Var(u|x) = σ², i.e. the error variance is constant
 Recall that the homoskedasticity assumption implies that, conditional on the explanatory variables, the variance of the unobserved error u is constant
 If this is not true, that is, if the variance of u differs for different values of the x's, then the errors are heteroskedastic
[Figure: Picture of Heteroskedasticity. The conditional distribution f(y|x) is shown at x1, x2, and x3 around the regression line E(y|x) = b0 + b1x, with the spread of y differing across the x values]
 This provides a consistent estimator of the variance of the estimated coefficient b̂_j
 The square root of this can be used as a standard error for inference
 These are typically called robust or heteroskedasticity-consistent standard errors
 [also known as White standard errors or Huber standard errors…]
 Important to remember that these robust standard errors only have an asymptotic justification: with small sample sizes, t statistics formed with robust standard errors will not have a distribution close to the t distribution, and inferences will not be correct
 In Stata, robust standard errors are easily obtained using the robust option of regress
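A minimal sketch of this option, assuming the slides' (hypothetical) production-function variables are already loaded:

   * OLS with heteroskedasticity-robust (White/Huber) standard errors
   regress lnY lnK lnL lnM, robust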


(3) Autocorrelation
Autocorrelation occurs in time-series studies
when the errors associated with a given time
period carry over into future time periods.
For example, if we are predicting the growth of
stock dividends, an overestimate in one year is
likely to lead to overestimates in succeeding
years.
 Test: Durbin-Watson statistic:

   d = Σ (e_i − e_{i−1})² / Σ e_i² ,  for n and K−1 d.f.

 Decision zones on the 0–4 scale:
   0 to d-lower: positive autocorrelation (autocorrelation is clearly evident)
   d-lower to d-upper: zone of indecision (ambiguous – cannot rule out autocorrelation)
   d-upper to 4−d-upper: no autocorrelation (autocorrelation is not evident)
   4−d-upper to 4−d-lower: zone of indecision (ambiguous – cannot rule out autocorrelation)
   4−d-lower to 4: negative autocorrelation (autocorrelation is clearly evident)
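As a worked illustration (all numbers assumed, not taken from the slides): if d = 0.85 while the tabulated bounds for the given n and K−1 are d-lower = 1.20 and d-upper = 1.41, then d falls below d-lower, so positive autocorrelation is clearly evident.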
 regress lnY lnK lnL lnM horizontal Bam Bch
 estat vif

 calculates the centered or uncentered variance inflation factors (VIFs) for the independent variables specified in a linear regression model
 regress lnY lnK lnL lnM horizontal Bam Bch
 estat hettest

 the Breusch-Pagan (1979) / Cook-Weisberg (1983) test for heteroskedasticity
 H0: constant variance (the variance of the residuals is homogeneous)
 regress lnY lnK lnL lnM horizontal Bam Bch

 estat bgodfrey  Breusch-Godfrey test for higher-order serial correlation
   H0: no serial correlation

 estat dwatson  Durbin-Watson d statistic to test for first-order serial correlation
 The Durbin-Watson statistic ranges from 0 to 4 with a midpoint of 2

 For panel data:
 xtserial varY varX, output
 regress lnY lnK lnL lnM horizontal Bam Bch
 estat ovtest

 Ramsey regression specification-error test for omitted variables
 H0: the model has no omitted variables
 xtreg lnY lnK lnL lnM horizontal Bam Bch
 Multicollinearity: not problematic
 Heteroskedasticity:
 xtreg lnY lnK lnL lnM horizontal Bam Bch, robust
 Heteroskedasticity + autocorrelation:
 xtreg lnY lnK lnL lnM horizontal Bam Bch, cluster(id)
 Pooled OLS
 Fixed Effects (FE), Random Effects (RE), and the Hausman test (see the sketch below)
 Two-Stage Least Squares (2SLS)
 Generalized Method of Moments (GMM)

David Roodman, 2009. "How to do xtabond2: An introduction to difference and system GMM in Stata," Stata Journal, StataCorp LP, vol. 9(1), pages 86–136, March.
David Roodman, 2006. "How to Do xtabond2: An Introduction to "Difference" and "System" GMM in Stata," Working Papers 103, Center for Global Development.
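A minimal sketch of the FE/RE comparison followed by the Hausman test, with hypothetical panel-identifier and variable names:

   * declare the panel, estimate FE and RE, then test FE against RE
   xtset id year
   xtreg lnY lnK lnL lnM, fe
   estimates store fe
   xtreg lnY lnK lnL lnM, re
   estimates store re
   hausman fe re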


• Suppose y is firm output and x is the number of employees
• We have i = 1…N firms and t = 1…T time periods (years)
• A simple econometric model:

   y_it = a_0 + a_1 x_it + u_it

  where u_it is a random error term, u_it ~ N(0, σ²)

• Assumptions: the intercept and slope coefficients are constant across time and firms, and the error term captures the differences over time and over firms???

Pooled regression by OLS (STATA: xtreg …)
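A minimal pooled-OLS sketch of this model (hypothetical variable names; the panel structure is simply ignored, so regress can be used):

   * pooled OLS: one common intercept and slope across all firms and years
   regress y x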
Pooled regression by OLS may result in heterogeneity bias:

   Pooled regression: y_it = a_0 + a_1 x_it + u_it

[Figure: heterogeneity bias. The true models of Firm 1, Firm 2, Firm 3, and Firm 4 each have their own intercept in the y–x plane, while a single pooled regression line is fitted through all observations]
(One Way) Fixed Effects Model:
If we allow each group (firm) to have its own intercept:

   y_it = a_0i + a_1 x_it + u_it

HOW?  create a set of dummy (binary) variables, one for each firm, and include them as regressors:

   y_it = Σ_{i=1}^{N} a_0i D_i + a_1 x_it + u_it

 This form of estimation is also known as Least Squares Dummy Variables (LSDV).

Fixed Effects Estimation:
STATA: xtreg … i.year
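A minimal sketch of the two equivalent routes, using hypothetical variable and panel-identifier names:

   * declare the panel structure
   xtset id year
   * LSDV: enter the firm dummies explicitly
   regress y x i.id
   * within (fixed effects) estimator; the slope estimate on x matches LSDV
   xtreg y x, fe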
(Two Way) Fixed Effects Model:
 allow the intercept to vary across the different time periods as well (Two Way Fixed Effects):

   y_it = Σ_{i=1}^{N} a_0i D_i + Σ_{t=2}^{T} a_t T_t + a_1 x_it + u_it

STATA: xtreg … i.id i.year
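A common way to run such a specification in practice (a minimal sketch with hypothetical variable names; with xtreg, fe the firm effects are absorbed by the within transformation, so only the year dummies are added explicitly):

   * two-way fixed effects: firm effects via the within transformation, year effects via dummies
   xtset id year
   xtreg y x i.year, fe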