Dr. Pham Thi Bich Ngoc
Hoa Sen University
Learn and use STATA?
“Introductory Econometrics: A Modern Approach”
- Jeffrey M. Wooldridge (2012)
“Econometric Analysis of Cross Section and
Panel Data” - Jeffrey M. Wooldridge (2010)
ENTITY DIMENSION AND TIME DIMENSION
These are models that combine cross-section and time-series data
In panel data the same cross-sectional unit
(industry, firm, country) is surveyed over
time, so we have data which is pooled over
space as well as time.
i : ID (firm, individual, household, country, industry, ...)
t : time (day, week, quarter, year, ...)
ID      YEAR  WAGE  EDU  EXP  MARRIED
KHOA    2010     7   12    6        0
KHOA    2011     8   12    7        0
KHOA    2012     8   12    8        0
PHUONG  2010     5   12    1        0
PHUONG  2011     5   13    3        0
Excel file: BT1
If all the cross-sectional units have the same number of time
series observations, the panel is balanced; if not, it is
unbalanced.
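Columns: cross section (units i = 1, ..., N). Rows: time series (periods t = 1, ..., T).
$$
\begin{bmatrix}
y_{11} & y_{21} & \cdots & y_{i1} & \cdots & y_{N1} \\
y_{12} & y_{22} & \cdots & y_{i2} & \cdots & y_{N2} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1t} & y_{2t} & \cdots & y_{it} & \cdots & y_{Nt} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1T} & y_{2T} & \cdots & y_{iT} & \cdots & y_{NT}
\end{bmatrix}
$$
- a matrix of balanced panel data observations on variable y,
N cross-sectional observations, T time series observations.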
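The balanced/unbalanced distinction can be checked mechanically. The following is a minimal Python sketch (not Stata), assuming the definition given above: a panel is balanced when every unit has the same number of time-series observations. The function name `is_balanced` is made up for illustration; the (ID, YEAR) pairs are the KHOA/PHUONG rows from the example table.

```python
from collections import Counter

# (ID, YEAR) pairs from the KHOA / PHUONG example table in these slides
rows = [
    ("KHOA", 2010), ("KHOA", 2011), ("KHOA", 2012),
    ("PHUONG", 2010), ("PHUONG", 2011),
]

def is_balanced(rows):
    """True if every unit has the same number of time-series observations."""
    obs_per_unit = Counter(unit for unit, _ in rows)
    return len(set(obs_per_unit.values())) == 1

print(is_balanced(rows))                        # False: PHUONG has no 2012 row
print(is_balanced(rows + [("PHUONG", 2012)]))   # True once the missing row is added
```

This counts observations only; it does not check that the units are observed in the same calendar periods.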
1. Panel data can take explicit account of individual-specific heterogeneity (“individual” here means
related to the micro-unit)
2. By combining data in two dimensions, panel data
gives more data variation, less collinearity and
more degrees of freedom.
3. Panel data is better suited than cross-sectional
data for studying the dynamics of change. For
example it is well suited to understanding
transition behaviour – for example company
bankruptcy or merger; the effects of technological
change, or economic cycles.
Grunfeld and Griliches [1960]
$I_{it} = \alpha_i + \beta_1 F_{it} + \beta_2 C_{it} + \varepsilon_{it}$
◦ i = 10 firms: GM, CH, GE, WE, US, AF, DM, GY, UN,
IBM; t = 20 years: 1935-1954
◦ Iit = Gross investment
◦ Fit = Market value
◦ Cit = Value of the stock of plant and equipment
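$y_{it} = \alpha_t + \rho\, y_{i,t-1} + \beta_1 \ln(s_i) + \beta_2 \ln(n_i + g + d) + \beta_3 \mathrm{COM}_i + \beta_4 \mathrm{OPEC}_i + \varepsilon_{it}$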
yit = Real per capita GDP
si = Average saving rate (over 1960-1985)
ni = Average population growth rate (over 1960-1985)
g+d = 5%
COMi = 1 if communist, 0 otherwise
OPECi =1 if OPEC, 0 otherwise
LWAGE = log of wage = dependent variable in regressions
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
BLK = 1 if individual is black
Pooled OLS
Difference-in-Differences, First Differences
(FD), Between Effects, Fixed Effects (FE),
Random Effects (RE), and Hausman test
Two-Stage Least Squares (2SLS)
Generalized Method of Moments (GMM)
David Roodman, 2009. "How to do xtabond2: An introduction to
difference and system GMM in Stata," Stata Journal, StataCorp LP, vol.
9(1), pages 86-136, March.
David Roodman, 2006. "How to Do xtabond2: An Introduction to
"Difference" and "System" GMM in Stata," Working Papers 103, Center
for Global Development.
A. Pooled OLS
(Pooled Cross Section)
The term panel data is often used loosely to
refer to any data set that has both a cross-sectional dimension and a time-series
dimension
More precisely, panel data follow the
same cross-section units over time
Otherwise it’s a pooled cross-section (typically
estimated by pooled OLS, POLS)
Pooling treats every time-period observation as an ordinary observation
The time dimension captures the combined effect of events affecting the economy.
We may want to pool cross sections just to
get bigger sample sizes
We may want to pool cross sections to
investigate the effect of time
We may want to pool cross sections to
investigate whether relationships have
changed over time
Pooled regression by OLS
• Suppose y is firm output and x is the number of employees
• We have i = 1…n firms and t = 1…T time periods (years)
$y_{it} = \beta_0 + \beta_1 x_{it1} + \dots + \beta_k x_{itk} + \epsilon_{it}$
• A simple econometric model:
$y_{it} = a_0 + a_1 x_{it} + \epsilon_{it}$
ϵit is a random error term: ϵit ~ N(0, σ²), so E(ϵit) = 0
Assumptions: the intercept and slope coefficients are constant
across time and firms, and the error term captures all
differences over time and across firms. Is this plausible?
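To make the pooling idea concrete, here is a minimal Python sketch (not Stata) of the simple pooled model with one regressor. It stacks all unit-year rows and ignores which unit each row belongs to. The WAGE and EXP values are the five KHOA/PHUONG rows from the example table; the function name `pooled_ols` is an illustrative assumption.

```python
def pooled_ols(x, y):
    """Intercept a0 and slope a1 of y = a0 + a1*x fitted on the stacked observations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1

# Stacked rows: KHOA 2010-2012, then PHUONG 2010-2011 (values from the example table)
exp_years = [6, 7, 8, 1, 3]
wage = [7, 8, 8, 5, 5]

a0, a1 = pooled_ols(exp_years, wage)
print(round(a0, 3), round(a1, 3))   # 4.1 0.5
```

With only five rows this is purely illustrative; the point is that the unit identifier plays no role in the calculation.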
Pooled regression by POLS may result in heterogeneity bias:
POLS cannot account for unit-specific characteristics
[Figure: y plotted against x. The pooled regression line $y_{it} = a_0 + a_1 x_{it} + u_{it}$ is shown together with the true model lines for Firm 1, Firm 2, Firm 3 and Firm 4.]
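A small numerical illustration of heterogeneity bias, as a Python sketch (not Stata). The data are hypothetical and chosen so that two firms share the same true slope but have different intercepts, with the high-intercept firm having the low x values. The helper `ols_line` is an illustrative name.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: true slope is 1 for both firms, intercepts differ
firm1_x, firm1_y = [1, 2, 3], [11, 12, 13]   # y = 10 + x
firm2_x, firm2_y = [4, 5, 6], [8, 9, 10]     # y = 4 + x

_, slope_firm1 = ols_line(firm1_x, firm1_y)
_, slope_firm2 = ols_line(firm2_x, firm2_y)
_, slope_pooled = ols_line(firm1_x + firm2_x, firm1_y + firm2_y)

print(slope_firm1, slope_firm2)   # 1.0 1.0
print(round(slope_pooled, 3))     # -0.543
```

In this constructed case the pooled line even gets the sign of the slope wrong, because the firm-specific intercepts are correlated with x.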
reg depvar [indepvars] [i.year]
Heteroskedasticity:
reg depvar [indepvars] [i.year], robust
Heteroskedasticity + Autocorrelation:
reg depvar [indepvars] [i.year], vce(cluster id)
B. Fixed Effects Model
Fixed Effects Estimation:
(One-Way) Fixed Effects Model: (individual effects)
Allow each group (firm) to have its own intercept:
$y_{it} = a_{0i} + a_1 x_{it} + \epsilon_{it}$
HOW? Create a set of dummy (binary) variables, one for
each firm, and include them as regressors.
This form of estimation is also known as Least Squares
Dummy Variables (LSDV).
$y_{it} = \sum_{j=1}^{N} a_{0j} D_{jit} + a_1 x_{it} + \epsilon_{it}$, where $D_{jit} = 1$ if observation $it$ belongs to firm $j$ and 0 otherwise
STATA: reg depvar [indepvars] i.id
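The LSDV idea can be sketched in a few lines of Python (not Stata). This sketch assumes hypothetical two-firm data with a common slope of 1 and firm intercepts 10 and 4; `solve` and `ols` are small illustrative helpers, not library functions. It uses one dummy per firm and no common constant, whereas Stata's `i.id` keeps a constant and omits one base category, which gives the same slope.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    Xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(XtX, Xty)

# Hypothetical data: y = 10 + x for firm 1, y = 4 + x for firm 2
firm = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [11.0, 12.0, 13.0, 8.0, 9.0, 10.0]

firms = sorted(set(firm))
# Regressors: one dummy per firm, then x (no common constant)
X = [[1.0 if f == g else 0.0 for g in firms] + [xi] for f, xi in zip(firm, x)]
*intercepts, slope = ols(X, y)
print([round(a, 6) for a in intercepts], round(slope, 6))   # [10.0, 4.0] 1.0
```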
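(Two-Way) Fixed Effects Model: (individual + time effects)
Also allow the intercept to vary across the different time periods
(Two-Way Fixed Effects):
$y_{it} = \sum_{j=1}^{N} a_{0j} D_{jit} + \sum_{s=1}^{T} a_{2s} T_{sit} + a_1 x_{it} + \epsilon_{it}$, where $D_{jit} = 1$ if observation $it$ belongs to firm $j$, $T_{sit} = 1$ if it falls in period $s$, and both are 0 otherwise
STATA: reg depvar [indepvars] i.id i.year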
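For a balanced panel, including firm and year dummies gives the same slope as removing unit means and period means from the data. The Python sketch below (not Stata) illustrates this with hypothetical data generated as y = firm effect + year effect + 2x; the function name `two_way_demean` and all numbers are assumptions for illustration, and the shortcut is not valid for unbalanced panels.

```python
def two_way_demean(z):
    """z[i][t] for a balanced panel; returns z_it - zbar_i - zbar_t + zbar."""
    N, T = len(z), len(z[0])
    unit_mean = [sum(z[i]) / T for i in range(N)]
    time_mean = [sum(z[i][t] for i in range(N)) / N for t in range(T)]
    grand = sum(unit_mean) / N
    return [[z[i][t] - unit_mean[i] - time_mean[t] + grand for t in range(T)]
            for i in range(N)]

# Hypothetical balanced panel: 2 firms, 3 years
firm_effect = [10.0, 4.0]
year_effect = [0.0, 1.0, 3.0]
x = [[1.0, 2.0, 4.0], [3.0, 5.0, 6.0]]
y = [[firm_effect[i] + year_effect[t] + 2.0 * x[i][t] for t in range(3)]
     for i in range(2)]

xd, yd = two_way_demean(x), two_way_demean(y)
num = sum(a * b for rx, ry in zip(xd, yd) for a, b in zip(rx, ry))
den = sum(a * a for rx in xd for a in rx)
slope = num / den
print(round(slope, 6))   # 2.0
```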
Fixed Effects/Within:
discards all variation between individuals and uses only
variation over time within an individual
$y_{it} - \bar{y}_i = (a_{0i} - a_{0i}) + a_1 (x_{it} - \bar{x}_i) + (e_{it} - \bar{e}_i)$
$y_{it} - \bar{y}_i = a_1 (x_{it} - \bar{x}_i) + u_{it}$, with $u_{it} = e_{it} - \bar{e}_i$
STATA:
xtreg depvar [indepvars] [if] [in] [weight] , fe [FE_options]
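The within transformation can be sketched directly in Python (not Stata): subtract each unit's own mean from x and y, then run OLS on the demeaned data. The example uses WAGE and EXP for KHOA and PHUONG from the example table (an unbalanced panel); the function name `within_slope` is an illustrative assumption, and standard errors are not computed.

```python
from collections import defaultdict

def within_slope(ids, x, y):
    """Slope from regressing (y - unit mean of y) on (x - unit mean of x)."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])   # per unit: sum of x, sum of y, count
    for i, xi, yi in zip(ids, x, y):
        totals[i][0] += xi
        totals[i][1] += yi
        totals[i][2] += 1
    num = den = 0.0
    for i, xi, yi in zip(ids, x, y):
        sum_x, sum_y, n = totals[i]
        x_dev, y_dev = xi - sum_x / n, yi - sum_y / n
        num += x_dev * y_dev
        den += x_dev * x_dev
    return num / den

ids = ["KHOA", "KHOA", "KHOA", "PHUONG", "PHUONG"]
exp_years = [6, 7, 8, 1, 3]
wage = [7, 8, 8, 5, 5]

b_within = within_slope(ids, exp_years, wage)
print(round(b_within, 3))   # 0.25
```

Only the change in WAGE relative to each person's own average contributes, so the level difference between KHOA and PHUONG plays no role.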
C. Random Effects Model
Previously we’ve assumed that ui was
correlated with the x’s, but what if it’s not?
OLS would be consistent in that case, but
composite error will be serially correlated
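Need to transform the model and do GLS
to solve the problem and make correct
inferences
End up with a sort of weighted average
of OLS and Fixed Effects, using quasi-demeaned data
$$\theta = 1 - \left[ \frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2} \right]^{1/2}$$
$$y_{it} - \theta \bar{y}_i = \beta_0 (1 - \theta) + \beta_1 (x_{it1} - \theta \bar{x}_{i1}) + \dots + \beta_k (x_{itk} - \theta \bar{x}_{ik}) + (v_{it} - \theta \bar{v}_i)$$
In this formula (Wooldridge's notation) $\sigma_a^2$ is the variance of the unobserved unit effect and $\sigma_u^2$ is the variance of the idiosyncratic error.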
If θ = 1, then this is just the fixed effects
estimator
If θ = 0, then this is just the OLS estimator
So, the bigger the variance of the
unobserved effect, the closer it is to FE
The smaller the variance of the unobserved
effect, the closer it is to OLS
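The limiting cases can be checked numerically. This Python sketch (not Stata) assumes the Wooldridge form θ = 1 − [σu² / (σu² + T·σa²)]^(1/2), where σa² is the variance of the unobserved effect and σu² the variance of the idiosyncratic error; the function names and the variance values are illustrative assumptions. In practice the variances are unknown and must be estimated.

```python
from math import sqrt

def theta(sigma2_u, sigma2_a, T):
    """theta = 1 - sqrt(sigma_u^2 / (sigma_u^2 + T * sigma_a^2))."""
    return 1.0 - sqrt(sigma2_u / (sigma2_u + T * sigma2_a))

def quasi_demean(z, th):
    """Subtract the fraction th of one unit's mean from each of its observations."""
    z_bar = sum(z) / len(z)
    return [v - th * z_bar for v in z]

print(theta(1.0, 0.0, 5))               # 0.0: no unobserved effect, pooled OLS
print(theta(1.0, 1.0, 3))               # 0.5: halfway between OLS and FE
print(round(theta(1.0, 1e6, 5), 3))     # 1.0 after rounding: almost fixed effects

print(quasi_demean([1.0, 2.0, 3.0], 0.0))   # [1.0, 2.0, 3.0]: data unchanged
print(quasi_demean([1.0, 2.0, 3.0], 1.0))   # [-1.0, 0.0, 1.0]: full within transformation
```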
Random Effects Estimation:
RE versus FE?
FE assumes that each group (firm) has a non-stochastic
group-specific component to y.
RE treats these unobservable effects as being stochastic
(i.e. random).
$y_{it} = a_0 + a_1 x_{it} + u_i + e_{it}$
ui is the random error term that varies between groups but not
within groups.
eit is the element of the error which varies over group and
time.
We assume that:
$E(u_i) = E(e_{it}) = 0$
$E(u_i^2) = \sigma_v^2$ and $E(e_{it}^2) = \sigma_e^2$ (both components homoskedastic)
$E(e_{it} u_j) = 0 \;\; \forall\, i, t, j$ (independence of two components)
$E(e_{it} e_{js}) = 0$ if $t \neq s$ or $i \neq j$ (no autocorrelation)
$E(u_i u_j) = 0$ if $i \neq j$ (no across group correlation)
$E(u_i x_{it}) = E(e_{it} x_{it}) = 0$ (both independent of regressor)
STATA:
xtreg depvar [indepvars] [if] [in] [weight] , re