Pham Thi Bich Ngoc, Ph.D. (University of Kiel, Germany)
FEC/Hoa Sen University
UNIVERSITY OF ECONOMICS HO CHI MINH CITY, June 2014
June 14 - Dr. Pham Thi Bich Ngoc 1
Multicollinearity occurs when two or more
independent variables in a regression model
are highly correlated with each other
The standard error of an OLS parameter
estimate will be higher the more highly the
corresponding independent variable is
correlated with the other independent
variables in the model
Perfect multicollinearity occurs when there is a
perfect linear relationship between two or more
independent variables, or when an independent
variable takes a constant value in all
observations (making it perfectly collinear
with the intercept)
The symptoms of a multicollinearity problem:
1. Independent variable(s) considered
critical in explaining the model's
dependent variable are not
statistically significant according to
the t tests
2. High R², highly significant F-test,
but few or no statistically significant
t tests
3. Parameter estimates drastically
change values and become
statistically significant when
excluding some independent
variables from the regression
A simple test for multicollinearity is to
conduct “artificial” regressions between each
independent variable (as the “dependent”
variable) and the remaining independent
variables
Variance Inflation Factors (VIF_j) are calculated
as:

VIF_j = 1 / (1 − R_j²)

where R_j² is the R² from the artificial
regression with X_j as the "dependent" variable
VIF_j = 2, for example, means that the variance is
twice what it would be if X_j were not affected
by multicollinearity

A VIF_j > 10 is clear evidence that the estimation
of B_j is being affected by multicollinearity
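The slides use Stata for estimation; as a language-neutral illustration of the artificial-regression idea, the following NumPy sketch computes each VIF_j by regressing X_j on the remaining columns. The function name and toy data are assumptions for the example, not part of the original slides:

```python
import numpy as np

def vif(X):
    """Variance inflation factors via 'artificial' regressions:
    regress each column X_j on the remaining columns (plus an
    intercept) and compute VIF_j = 1 / (1 - R_j^2)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        sst = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - np.sum(resid ** 2) / sst
        out.append(1.0 / (1.0 - r2))
    return out

# two orthogonal, mean-zero columns: no multicollinearity,
# so both VIFs equal 1 (up to floating-point rounding)
X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
print(vif(X))
```

Replacing the second column with a near-copy of the first drives its VIF far above the rule-of-thumb threshold of 10.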
Although it is useful to be aware of the
presence of multicollinearity, it is not easy to
remedy severe (non-perfect) multicollinearity
If possible, adding observations or taking a
new sample might help lessen multicollinearity
Exclude the independent variables that appear
to be causing the problem
Modifying the model specification sometimes
helps, for example:
using real instead of nominal economic data
using a reciprocal instead of a polynomial
specification for a given independent variable
Var(u|x) = σ²  [MLR.5]

Homoskedasticity assumption: the error variance
is constant
Recall that the assumption of homoskedasticity
implied that, conditional on the explanatory
variables, the variance of the unobserved
error u was constant

If this is not true, that is, if the variance of
u is different for different values of the x's,
then the errors are heteroskedastic
[Figure: Picture of Heteroskedasticity. The conditional density f(y|x) is drawn at x1, x2, x3 around the regression line E(y|x) = b0 + b1x; the spread of f(y|x) varies with x.]
This provides an estimator of the variance of
b̂_j which is consistent

The square root of this can be used as a
standard error for inference

Typically we call these robust or
heteroskedasticity-consistent standard errors
[or White standard errors, or Huber standard
errors…]
Important to remember that these robust
standard errors only have an asymptotic
justification: with small sample sizes,
t statistics formed with robust standard errors
will not have a distribution close to the
t distribution, and inferences will not be correct

In Stata, robust standard errors are easily
obtained using the robust option of
regress
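Behind Stata's robust option is the sandwich formula; as an illustration outside Stata, a NumPy sketch of the original White (HC0) form follows. The function name and toy data are assumptions for the example:

```python
import numpy as np

def ols_with_robust_se(X, y):
    """OLS coefficients plus White (HC0) heteroskedasticity-consistent
    standard errors: Var(b) = (X'X)^-1 X' diag(e^2) X (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                      # OLS residuals
    meat = X.T @ (e[:, None] ** 2 * X)    # X' diag(e^2) X
    cov = XtX_inv @ meat @ XtX_inv        # the sandwich estimator
    return beta, np.sqrt(np.diag(cov))

# toy data: intercept plus one regressor
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 2.5, 2.0, 4.5, 4.0, 7.0])
beta, robust_se = ols_with_robust_se(X, y)
```

The coefficient estimates are the usual OLS ones; only the standard errors change, which is why the point estimates in Stata are identical with and without the robust option.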
(3) Autocorrelation
Autocorrelation occurs in time-series studies
when the errors associated with a given time
period carry over into future time periods.
For example, if we are predicting the growth of
stock dividends, an overestimate in one year is
likely to lead to overestimates in succeeding
years.
Test: Durbin-Watson statistic:

d = Σ (e_i − e_{i−1})² / Σ e_i² , for n and K−1 d.f.
Positive          Zone of      No              Zone of      Negative
autocorrelation   indecision   autocorrelation indecision   autocorrelation
|____________|____________|____________|____________|____________|____________|
0         d-lower      d-upper        2       4-d-upper    4-d-lower        4

Near 0: autocorrelation is clearly evident
In the zones of indecision: ambiguous, cannot rule out autocorrelation
Near 2: autocorrelation is not evident
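The statistic itself is simple to compute from the residuals; a minimal Python sketch (the function name is an assumption for illustration):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson d = sum (e_i - e_{i-1})^2 / sum e_i^2.
    d near 2: no first-order autocorrelation;
    d near 0: positive; d near 4: negative autocorrelation."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0: strongly positive pattern
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: sign-flipping, negative pattern
```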
regress lnY lnK lnL lnM horizontal Bam Bch
estat vif
calculates the centered or uncentered variance inflation
factors (VIFs) for the independent variables specified in a
linear regression model
regress lnY lnK lnL lnM horizontal Bam Bch
estat hettest
the Breusch-Pagan (1979) and Cook-Weisberg (1983) test for
heteroskedasticity
Ho: Constant variance (the variance of the residuals is
homogeneous)
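Outside Stata, the same idea can be sketched by hand: regress the squared OLS residuals on the regressors and use the LM statistic n·R² of that auxiliary regression (asymptotically chi-squared under Ho). The function name and toy data are assumptions for illustration:

```python
import numpy as np

def breusch_pagan_lm(X, y):
    """Breusch-Pagan LM statistic: regress squared OLS residuals
    on the regressors (X includes an intercept column) and return
    n * R^2 of that auxiliary regression."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2              # squared OLS residuals
    gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)
    resid = e2 - X @ gamma
    sst = np.sum((e2 - e2.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / sst
    return len(y) * r2

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.3, 5.7, 7.4, 7.6])
lm = breusch_pagan_lm(X, y)  # compare to a chi-squared critical value
```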
regress lnY lnK lnL lnM horizontal Bam Bch

estat bgodfrey
Breusch-Godfrey test for higher-order serial correlation
H0: no serial correlation

estat dwatson
Durbin-Watson d statistic to test for first-order serial
correlation
The Durbin-Watson statistic has a range from 0 to
4 with a midpoint of 2

For panel data:
xtserial varY varX, output
regress lnY lnK lnL lnM horizontal Bam Bch
estat ovtest
Ramsey regression specification-error test for omitted
variables
Ho: model has no omitted variables
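For intuition, the RESET logic can be sketched outside Stata: re-estimate the model with powers of the fitted values added, and F-test their joint significance. The function name and toy data are assumptions for illustration:

```python
import numpy as np

def ramsey_reset_F(X, y, powers=(2, 3)):
    """Ramsey RESET: add powers of the fitted values (yhat^2, yhat^3)
    to the regression and F-test their joint significance.
    Ho: model has no omitted variables (added terms do not help)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ beta
    ssr0 = np.sum((y - yhat) ** 2)              # restricted SSR
    Xa = np.column_stack([X] + [yhat ** p for p in powers])
    beta_a, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    ssr1 = np.sum((y - Xa @ beta_a) ** 2)       # unrestricted SSR
    q = len(powers)
    return ((ssr0 - ssr1) / q) / (ssr1 / (n - k - q))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.2, 1.8, 3.1, 4.2, 4.8, 6.3, 6.9, 8.1])
F = ramsey_reset_F(X, y)  # compare to an F(q, n-k-q) critical value
```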
xtreg lnY lnK lnL lnM horizontal Bam Bch
Multicollinearity: not problematic
Heteroskedasticity:
xtreg lnY lnK lnL lnM horizontal Bam Bch, robust
Heteroskedasticity + Autocorrelation:
xtreg lnY lnK lnL lnM horizontal Bam Bch, cluster(id)
Pooled OLS
Fixed Effects (FE), Random Effects (RE), and
Hausman test
Two-Stage Least Squares (2SLS)
Generalized Method of Moments (GMM)
David Roodman, 2009. "How to do xtabond2: An introduction to
difference and system GMM in Stata," Stata Journal, StataCorp LP, vol.
9(1), pages 86-136, March.
David Roodman, 2006. "How to Do xtabond2: An Introduction to
"Difference" and "System" GMM in Stata," Working Papers 103, Center
for Global Development.
• Suppose y is firm output and x is the number of employees
• We have i = 1…n firms and t = 1…T time periods (years)
• A simple econometric model:

y_it = a_0 + a_1 x_it + u_it

where u_it is a random error term: u_it ~ N(0, σ²)

Assumptions: the intercept and slope coefficients are constant
across time and firms, and the error term captures
differences over time and over firms

Pooled regression by OLS (STATA: regress …)
Pooled regression by OLS may result in heterogeneity bias:

Pooled regression: y_it = a_0 + a_1 x_it + u_it

[Figure: scatter of y against x with a separate cloud of points for each of Firms 1–4; each firm's true model is a distinct line, so a single pooled regression line fits none of them well.]
(One Way) Fixed Effects Model:

Allow each group (firm) to have its own intercept:

y_it = a_0i + a_1 x_it + u_it

HOW? Create a set of dummy (binary) variables D_i, one for
each firm, and include them as regressors:

y_it = a_0 + Σ (i=1..N) a_i D_i + a_1 x_it + u_it

This form of estimation is also known as Least Squares
Dummy Variables (LSDV)

Fixed Effects Estimation:
STATA: xtreg … i.year
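The LSDV idea can be sketched outside Stata as well: running OLS with one dummy per firm yields the same slope as demeaning y and x within each firm (the within estimator). The toy panel below is an assumption for illustration:

```python
import numpy as np

# toy panel: 3 firms, 4 periods; each firm has its own intercept
firm = np.repeat([0, 1, 2], 4)
x = np.array([1.0, 2.0, 3.0, 4.0] * 3)
alpha = np.array([1.0, 5.0, 9.0])            # firm-specific intercepts
noise = np.array([0.1, -0.1, 0.2, -0.2, 0.0, 0.1,
                  -0.1, 0.0, 0.2, -0.2, 0.1, -0.1])
y = alpha[firm] + 2.0 * x + noise            # true slope is 2

# LSDV: one dummy per firm (no common intercept), plus x
D = (firm[:, None] == np.arange(3)).astype(float)
X = np.column_stack([D, x])
beta_lsdv, *_ = np.linalg.lstsq(X, y, rcond=None)
slope_lsdv = beta_lsdv[-1]

# within estimator: demean y and x within each firm, then regress
xd = x - np.array([x[firm == g].mean() for g in range(3)])[firm]
yd = y - np.array([y[firm == g].mean() for g in range(3)])[firm]
slope_within = (xd @ yd) / (xd @ xd)
# slope_lsdv and slope_within coincide: both are the FE slope
```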
(Two Way) Fixed Effects Model:

Allow the intercept to vary across the different time periods
as well (Two Way Fixed Effects):

y_it = a_0 + Σ (i=1..N) a_i D_i + Σ (t=2..T) a_t T_t + a_1 x_it + u_it

STATA: xtreg … i.id i.year