Topic 1: Panel data models


Dr. Pham Thi Bich Ngoc
Hoa Sen University

Learn and use STATA
"Introductory Econometrics: A Modern Approach" - Jeffrey M. Wooldridge (2012)
"Econometric Analysis of Cross Section and Panel Data" - Jeffrey M. Wooldridge (2010)


ENTITY (CROSS-SECTION) DIMENSION AND TIME DIMENSION

Panel data models combine cross-section and time-series data.
In panel data the same cross-sectional unit (industry, firm, country) is surveyed over time, so the data are pooled over space as well as time.
i : ID (firm, individual, household, country, industry, ...)
t : TIME (day, week, quarter, year, ...)

ID       YEAR   WAGE   EDU   EXP   MARRIED
KHOA     2010   7      12    6     0
KHOA     2011   8      12    7     0
KHOA     2012   8      12    8     0
PHUONG   2010   5      12    1     0
PHUONG   2011   5      13    3     0

(Excel file BT1)


If all the cross-sectional units have the same number of time
series observations the panel is balanced, if not it is
unbalanced.
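Cross-section units i = 1, ..., N run across the columns; time periods t = 1, ..., T run down the rows:

\[
\begin{pmatrix}
y_{11} & y_{21} & \cdots & y_{i1} & \cdots & y_{N1} \\
y_{12} & y_{22} & \cdots & y_{i2} & \cdots & y_{N2} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1t} & y_{2t} & \cdots & y_{it} & \cdots & y_{Nt} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1T} & y_{2T} & \cdots & y_{iT} & \cdots & y_{NT}
\end{pmatrix}
\]

- a matrix of balanced panel data observations on variable y,
N cross-sectional observations, T time series observations.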
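
The balanced/unbalanced distinction can be checked mechanically. The sketch below is only an illustration in Python (not part of the original slides): it uses the (ID, YEAR) pairs from the small wage table in this topic, and the function names `periods_per_unit` and `is_balanced` are hypothetical helpers introduced here.

```python
from collections import defaultdict

# (id, year) pairs taken from the example wage table in this topic
observations = [
    ("KHOA", 2010), ("KHOA", 2011), ("KHOA", 2012),
    ("PHUONG", 2010), ("PHUONG", 2011),
]


def periods_per_unit(obs):
    """Count the distinct time periods observed for each cross-sectional unit."""
    periods = defaultdict(set)
    for unit, year in obs:
        periods[unit].add(year)
    return {unit: len(years) for unit, years in periods.items()}


def is_balanced(obs):
    """Balanced here means every unit has the same number of time periods."""
    counts = periods_per_unit(obs)
    return len(set(counts.values())) == 1


counts = periods_per_unit(observations)
balanced = is_balanced(observations)
print(counts, "balanced" if balanced else "unbalanced")
```

In this table KHOA is observed for three years and PHUONG for two, so the panel is unbalanced.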


1. Panel data can take explicit account of individual-specific
heterogeneity ("individual" here refers to the micro-unit).

2. By combining data in two dimensions, panel data
gives more data variation, less collinearity and
more degrees of freedom.
3. Panel data is better suited than cross-sectional
data for studying the dynamics of change. For
example, it is well suited to understanding
transition behaviour such as company bankruptcy
or merger, the effects of technological change,
or economic cycles.




Grunfeld and Griliches [1960]

\[ I_{it} = \alpha_i + \beta F_{it} + \gamma C_{it} + \epsilon_{it} \]

◦ i = 10 firms: GM, CH, GE, WE, US, AF, DM, GY, UN,
IBM; t = 20 years: 1935-1954
◦ I_{it} = Gross investment
◦ F_{it} = Market value
◦ C_{it} = Value of the stock of plant and equipment


\[ y_{it} = \alpha_t + \rho\, y_{i,t-1} + \beta \ln(s_i) + \gamma \ln(n_i + g + d) + \delta\, COM_i + \lambda\, OPEC_i + \epsilon_{it} \]

y_{it} = Real per capita GDP
s_i = Average saving rate (over 1960-1985)
n_i = Average population growth rate (over 1960-1985)
g + d = 5%
COM_i = 1 if communist, 0 otherwise
OPEC_i = 1 if OPEC, 0 otherwise





LWAGE = log of wage = dependent variable in regressions
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
BLK = 1 if individual is black








Pooled OLS

Difference-in-Differences, First Differences
(FD), Between Effects, Fixed Effects (FE),
Random Effects (RE), and the Hausman test
Two-Stage Least Squares (2SLS)

Generalized Method of Moments (GMM)

David Roodman, 2009. "How to do xtabond2: An introduction to
difference and system GMM in Stata," Stata Journal, StataCorp LP, vol.
9(1), pages 86-136, March.
David Roodman, 2006. "How to Do xtabond2: An Introduction to
"Difference" and "System" GMM in Stata," Working Papers 103, Center
for Global Development.


A. Pooled OLS
(Pooled Cross Section)








The term panel data is often used loosely to
refer to any data set that has both a cross-sectional
dimension and a time-series dimension.
More precisely, panel data follow the
same cross-section units over time.
Otherwise the data set is a pooled cross-section,
usually estimated by pooled OLS (POLS).


Pooled OLS treats all observations, whatever their time period, as ordinary observations.
The time period captures the combined effect of events affecting the economy.







We may want to pool cross sections just to
get bigger sample sizes
We may want to pool cross sections to
investigate the effect of time
We may want to pool cross sections to
investigate whether relationships have
changed over time


Pooled regression by OLS
• Suppose y is firm output and x is the number of employees
• We have i = 1, ..., n firms and t = 1, ..., T time periods (years)

• General model with k regressors:

\[ y_{it} = \beta_0 + \beta_1 x_{it1} + \dots + \beta_k x_{itk} + \epsilon_{it} \]

• A simple econometric model:

\[ y_{it} = a_0 + a_1 x_{it} + \epsilon_{it} \]

ϵ_it is a random error term: ϵ_it ~ N(0, σ²), so E(ϵ_it) = 0

Assumptions: the intercept and slope coefficients are constant
across time and firms, and the error term captures all
differences over time and over firms. Is that realistic?
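
To make the simple pooled model concrete, here is a minimal Python sketch (an illustration, not the course's Stata workflow). It stacks every firm-year row and fits one intercept and one slope by ordinary least squares. The function name `pooled_ols` and the toy numbers are hypothetical; the data are constructed so that output = 5 + 2 × employees holds exactly.

```python
def pooled_ols(x, y):
    """Return (a0, a1) for y = a0 + a1 * x + error, using all stacked rows."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a1 = sxy / sxx            # common slope for all firms and years
    a0 = y_bar - a1 * x_bar   # common intercept for all firms and years
    return a0, a1


# Hypothetical toy panel: 2 firms x 3 years, stacked into one sample
employees = [10, 12, 14, 20, 22, 24]
output = [25, 29, 33, 45, 49, 53]

a0, a1 = pooled_ols(employees, output)
print(f"a0 = {a0:.2f}, a1 = {a1:.2f}")
```

Because both firms share the same intercept in this toy data, pooling loses nothing; the next slides deal with the case where they do not.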


Pooled regression by POLS may result in heterogeneity bias:
POLS cannot account for unit-specific characteristics.

[Figure: scatter plot of y against x. The pooled regression line y_{it} = a_0 + a_1 x_{it} + u_{it} is fitted through all points, while the true models for Firm 1, Firm 2, Firm 3 and Firm 4 are drawn separately.]



STATA (pooled OLS, optionally with year dummies):

reg depvar [indepvars] [i.year]

Heteroskedasticity (robust standard errors):

reg depvar [indepvars] [i.year], robust

Heteroskedasticity + autocorrelation (standard errors clustered by id):

reg depvar [indepvars] [i.year], cluster(id)


B. Fixed Effects Model


Fixed Effects Estimation:
(One Way) Fixed Effects Model: (individual effects)

Allow each group (firm) to have its own intercept:

\[ y_{it} = a_{0i} + a_1 x_{it} + \epsilon_{it} \]

HOW? Create a set of dummy (binary) variables, one for
each firm, and include them as regressors.
This form of estimation is also known as Least Squares
Dummy Variables (LSDV).

\[ y_{it} = \sum_{j=1}^{N} a_{0j} D_{j,it} + a_1 x_{it} + \epsilon_{it} \]

where D_{j,it} = 1 if observation (i, t) belongs to firm j, and 0 otherwise.

STATA: reg depvar [indepvars] i.id

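(Two Way) Fixed Effects Model: (individual + time effects)
Also allow the intercept to vary across the different time periods
(Two Way Fixed Effects):

\[ y_{it} = \sum_{j=1}^{N} a_{0j} D_{j,it} + \sum_{s=1}^{T} a_{2s} T_{s,it} + a_1 x_{it} + \epsilon_{it} \]

where D_{j,it} = 1 if the observation belongs to firm j and T_{s,it} = 1 if it belongs to period s. In estimation one dummy is dropped as the base category to avoid perfect collinearity; Stata's i. notation does this automatically.

STATA: reg depvar [indepvars] i.id i.year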


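Fixed Effects/Within:
Discards all variation between individuals and uses only
variation over time within an individual. Subtracting each individual's
time averages (\bar{y}_i, \bar{x}_i, \bar{e}_i) removes the fixed effect a_{0i}:

\[ y_{it} - \bar{y}_i = (a_{0i} - a_{0i}) + a_1 (x_{it} - \bar{x}_i) + (e_{it} - \bar{e}_i) \]
\[ y_{it} - \bar{y}_i = a_1 (x_{it} - \bar{x}_i) + u_{it}, \quad u_{it} = e_{it} - \bar{e}_i \]

STATA (after declaring the panel, e.g. xtset id year):
xtreg depvar [indepvars] [if] [in] [weight], fe [FE_options]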
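
The within transformation, and the heterogeneity bias it removes, can be illustrated with a small Python sketch. This is only a sketch under stated assumptions, not the Stata implementation: the helper names `ols_slope` and `within_slope` and the two-firm data are hypothetical. The data are built so that the true slope is 1 for both firms, while firm A has intercept 10 and firm B has intercept 0.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def within_slope(ids, x, y):
    """Within (fixed-effects) slope: demean x and y by unit, then pool."""
    rows_by_unit = defaultdict(list)
    for row, unit in enumerate(ids):
        rows_by_unit[unit].append(row)
    x_dm, y_dm = [0.0] * len(x), [0.0] * len(y)
    for rows in rows_by_unit.values():
        x_bar = sum(x[r] for r in rows) / len(rows)
        y_bar = sum(y[r] for r in rows) / len(rows)
        for r in rows:
            x_dm[r] = x[r] - x_bar
            y_dm[r] = y[r] - y_bar
    # Demeaned data have zero mean, so no intercept is needed.
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)


# Hypothetical data: y = a0i + 1.0 * x, with a0 = 10 for firm A and 0 for firm B
ids = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 6, 7, 8]
y = [11, 12, 13, 6, 7, 8]

pooled = ols_slope(x, y)          # ignores the firm-specific intercepts
within = within_slope(ids, x, y)  # removes them by demeaning
print(f"pooled slope = {pooled:.3f}, within slope = {within:.3f}")
```

In this constructed example the pooled slope is negative even though the slope is 1 within each firm, because the firm with the larger x values has the smaller intercept; demeaning by firm recovers the slope of 1.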


C. Random Effects Model






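Previously we've assumed that the unobserved individual effect was
correlated with the x's, but what if it's not?
OLS would be consistent in that case, but the
composite error (unobserved effect plus idiosyncratic error) will be
serially correlated, so the usual OLS standard errors are not valid.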
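
Why the composite error is serially correlated can be seen in a short derivation. The notation here is an assumption made for this sketch: a_i is the unobserved effect with variance \sigma_a^2, u_{it} is the idiosyncratic error with variance \sigma_u^2, the two are uncorrelated, and u_{it} is itself serially uncorrelated.

```latex
\begin{align*}
v_{it} &= a_i + u_{it} \\
\operatorname{Var}(v_{it}) &= \sigma_a^2 + \sigma_u^2 \\
\operatorname{Cov}(v_{it}, v_{is}) &= \operatorname{Var}(a_i) = \sigma_a^2
  \quad \text{for } t \neq s \\
\operatorname{Corr}(v_{it}, v_{is}) &= \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}
  \quad \text{for } t \neq s
\end{align*}
```

Under these assumptions the correlation is positive whenever \sigma_a^2 > 0, because the same a_i appears in every period's error for unit i.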






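Need to transform the model and do GLS
to solve the problem and make correct
inferences.
End up with a sort of weighted average
of OLS and Fixed Effects, using quasi-demeaned data:

\[ \theta = 1 - \left[ \frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2} \right]^{1/2} \]

\[ y_{it} - \theta \bar{y}_i = \beta_0 (1 - \theta) + \beta_1 (x_{it1} - \theta \bar{x}_{i1}) + \dots + \beta_k (x_{itk} - \theta \bar{x}_{ik}) + (\nu_{it} - \theta \bar{\nu}_i) \]

where \sigma_a^2 is the variance of the unobserved effect, \sigma_u^2 is the variance of the idiosyncratic error, and \nu_{it} is the composite error.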










If θ = 1, then this is just the fixed effects
estimator
If θ = 0, then this is just the OLS estimator
So, the bigger the variance of the
unobserved effect, the closer it is to FE
The smaller the variance of the unobserved
effect, the closer it is to OLS
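
The limiting cases can be checked numerically. The sketch below assumes the standard random-effects weight θ = 1 − sqrt(σ_u² / (σ_u² + T σ_a²)), with σ_a² the variance of the unobserved effect and σ_u² the variance of the idiosyncratic error; the function name `re_theta` and the input values are hypothetical.

```python
import math


def re_theta(sigma2_u, sigma2_a, T):
    """Quasi-demeaning weight: 1 - sqrt(sigma2_u / (sigma2_u + T * sigma2_a))."""
    return 1.0 - math.sqrt(sigma2_u / (sigma2_u + T * sigma2_a))


# Hold the idiosyncratic variance and T fixed; vary the unobserved-effect variance.
for sigma2_a in (0.0, 1.0, 100.0):
    theta = re_theta(sigma2_u=1.0, sigma2_a=sigma2_a, T=3)
    print(f"sigma2_a = {sigma2_a:>5}: theta = {theta:.3f}")
```

With no unobserved-effect variance θ is 0 (pooled OLS), and as that variance grows θ moves toward 1 (fixed effects). A larger T pushes θ toward 1 as well.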


Random Effects Estimation:
RE vs. FE?

FE assumes that each group (firm) has a non-stochastic
group-specific component to y.
RE treats these unobservable effects as being stochastic
(i.e. random).

\[ y_{it} = a_0 + a_1 x_{it} + u_i + e_{it} \]

u_i is the random error term that varies between groups but not
within groups.

e_{it} is the element of the error that varies over both group and
time.


We assume that:

\[ E(u_i) = E(e_{it}) = 0 \]
\[ E(u_i^2) = \sigma_v^2, \quad E(e_{it}^2) = \sigma_\epsilon^2 \]   (both components homoscedastic)
\[ E(e_{it} u_j) = 0 \quad \forall\, i, t, j \]   (independence of two components)
\[ E(e_{it} e_{js}) = 0 \ \text{if } t \neq s \text{ or } i \neq j \]   (no autocorrelation)
\[ E(u_i u_j) = 0 \ \text{if } i \neq j \]   (no across group correlation)
\[ E(u_i x_{it}) = E(e_{it} x_{it}) = 0 \]   (both independent of regressor)

STATA (re is the default estimator of xtreg):
xtreg depvar [indepvars] [if] [in] [weight], re
