Pham Thi Bich Ngoc, Ph.D. (University of Kiel, Germany)
FEC/Hoa Sen University
University of Economics Ho Chi Minh City, 03 June 2014
June14 - Dr. Pham Thi Bich Ngoc 1
Learn and use STATA?
“Econometric Analysis of Cross Section and
Panel Data” - Jeffrey M. Wooldridge (2010)
These are models that combine cross-section and time-series data.
In panel data the same cross-sectional unit (industry, firm, country)
is surveyed over time, so we have data that is pooled over space as
well as time.
1. Panel data can take explicit account of
individual-specific heterogeneity (“individual”
here means related to the microunit)
2. By combining data in two dimensions, panel
data gives more data variation, less collinearity
and more degrees of freedom.
3. Panel data is better suited than cross-sectional data for
studying the dynamics of change. For example, it is well suited
to understanding transition behaviour, such as company
bankruptcy or merger.
4. Panel data is better at detecting and
measuring effects that cannot be observed
in either cross-section or time-series data.
5. Panel data enables the study of more
complex behavioural models – for example
the effects of technological change, or
economic cycles.
6. Panel data can minimise the effects of
aggregation bias, from aggregating firms
into broad groups.
If all the cross-sectional units have the same number of time
series observations, the panel is balanced; if not, it is unbalanced.
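\[
\begin{bmatrix}
y_{11} & y_{21} & \cdots & y_{i1} & \cdots & y_{N1} \\
y_{12} & y_{22} & \cdots & y_{i2} & \cdots & y_{N2} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1t} & y_{2t} & \cdots & y_{it} & \cdots & y_{Nt} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1T} & y_{2T} & \cdots & y_{iT} & \cdots & y_{NT}
\end{bmatrix}
\]
A matrix of balanced panel data observations on variable y:
N cross-sectional observations (columns, cross section) and
T time series observations (rows, time series).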
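A minimal Stata sketch of checking whether a panel is balanced. The data are a toy example with made-up numbers, and the variable names id, year, y and n_obs are assumptions chosen for illustration.

```stata
* Toy panel (hypothetical numbers): 2 units, 3 years each
clear
input id year y
1 2005 2.1
1 2006 2.4
1 2007 2.9
2 2005 1.0
2 2006 1.3
2 2007 1.1
end

* Declare the panel structure; the output reports whether it is balanced
xtset id year

* Count the time observations available for each unit
bysort id: generate n_obs = _N

* A single value of n_obs across all units means the panel is balanced
tabulate n_obs

* Show the pattern of observations over time
xtdescribe
```

If some unit had fewer rows than the others, tabulate n_obs would show more than one value and xtset would report an unbalanced panel.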
Grunfeld and Griliches [1960]
◦ i = 10 firms: GM, CH, GE, WE, US, AF, DM, GY, UN, IBM;
t = 20 years: 1935-1954
◦ I_it = Gross investment
◦ F_it = Market value
◦ C_it = Value of the stock of plant and equipment
\[
I_{it} = \alpha_i + \beta_1 F_{it} + \beta_2 C_{it} + \varepsilon_{it}
\]
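A hedged sketch of estimating an investment equation of this kind in Stata. It assumes the example dataset grunfeld that ships with Stata's online examples (loaded with webuse, so it needs an internet connection), where invest, mvalue and kstock play the roles of I, F and C. That dataset's firms may not match the list above exactly.

```stata
* Load Stata's example Grunfeld data (assumption: stands in for the slide's data)
webuse grunfeld, clear

* Declare firm (company) and time (year) dimensions
xtset company year

* Firm-specific intercepts: fixed-effects (within) estimator
xtreg invest mvalue kstock, fe
```

The fe option corresponds to letting the intercept differ by firm while the slopes on market value and capital stock are common to all firms.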
y_it = Real per capita GDP
s_i = Average saving rate (over 1960-1985)
n_i = Average population growth rate (over 1960-1985)
g + d = 5%
COM_i = 1 if communist, 0 otherwise
OPEC_i = 1 if OPEC, 0 otherwise
\[
\ln(y_{it}) = \alpha_t + \theta \ln(y_{i,t-1}) + \beta_1 s_i + \beta_2 (n_i + g + d) + \beta_3 COM_i + \beta_4 OPEC_i + \varepsilon_{it}
\]
LWAGE = log of wage = dependent variable in regressions
EXP = work experience
WKS = weeks worked
OCC = 1 if blue-collar occupation
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
BLK = 1 if individual is black
Two basic windows
◦ Command
◦ Results
Optional windows
◦ Variable list
◦ History of commands
Other functions
◦ Data browser/editor
◦ Do file editor
◦ Viewer (for log files, help files, etc.)
The usual – open, save, print
Log-file open/suspend/close
Do-file editor
Browse and Edit
Break
Open draft-student.dta
Create .do file/.log file
A simple three-factor Cobb-Douglas function:
\[
\ln Y_i = a_0 + a_1 \ln K_i + a_2 \ln L_i + a_3 \ln M_i + u_i
\]
lnY: log of output
lnK: log of capital
lnL: log of labor
lnM: log of material
…
generate newvar = exp
◦ Create new variables
replace … if … (==, !=, >, <, >=, <=)
drop … if …
keep … if …
count
count if …
Eg. gen D7 = .
replace D7 = 1 if year == 2007
replace D7 = 0 if year != 2007 & !missing(year)
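A minimal sketch of building the 2007 dummy D7 in a single step, shown on a toy list of years (made-up data for illustration). The logical expression year == 2007 evaluates to 1 or 0, and the if qualifier leaves D7 missing where year itself is missing.

```stata
* Toy data (hypothetical): four years plus one missing value
clear
input year
2005
2006
2007
2008
.
end

* D7 = 1 in 2007, 0 in other years, missing if year is missing
generate D7 = (year == 2007) if !missing(year)

* Check the result, including the missing category
tabulate D7, missing

* Number of observations flagged as 2007
count if D7 == 1
```

Writing the zero case as year != 2007 rather than year >= 2007 matters: the latter would overwrite the 2007 observations with 0 and leave earlier years missing.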
summarize [varlist] [, detail]
◦ # obs, mean, SD, range
◦ “, detail” gets you more detail (median, etc)
Eg. sum lnY lnK lnL lnM
ci [varlist]
◦ Mean, standard error of mean, and confidence
intervals
◦ Works for dichotomous variables, too.
◦ Eg. ci lnY lnK lnL lnM
(in Stata 14 and later: ci means lnY lnK lnL lnM)
histogram varname
◦ Simple histogram of your variable
◦ Eg. histogram lnY
histogram lnY, frac by(D7, title("Firm Sales in 2007 and the Rest") subtitle("(in VND)"))
qnorm varname
◦ Quantile plot of your variable to check normality
◦ Eg. qnorm lnY
regress lnY lnK lnL lnM horizontal Bam Bch
predict r, resid
kdensity r, normal
tabulate [varname]
◦ Counts and percentages
◦ (see also, table - this is very different!)
tabulate [varname], missing
Eg. tab D7
tabulate [var1] [var2]
◦ “Cross-tab”
◦ Descriptive options
, row (row percentages)
, col (column percentages)
Eg. tab D7 sectorcode if sectorcode<11
scatter [var1] [var2]
◦ Scatterplot of the two variables
◦ Extension:
twoway lfit [var1] [var2]
twoway scatter [var1] [var2] || lfit [var1] [var2] ||, by(var3, total row(1))
Eg. Graph lnY against lnK (scatter plot with linear fit)
pwcorr [varlist] [, sig]
◦ Pairwise correlations between variables
◦ “sig” option gives p-values
spearman [varlist] [, stats(rho p)]
Eg: Correlation between lnY/lnK/lnL/lnM?
regress depvar [indepvars] [if] [in] [weight] [, options]
regress fits a model of depvar on indepvars using linear regression.
regress lnY lnK lnL lnM horizontal Bam Bch
Checking Homoscedasticity of Residuals
rvfplot, yline(0)
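A sketch of the residual check on simulated data, so it can be run without draft-student.dta. The coefficient values, sample size and seed are arbitrary assumptions. estat hettest (the Breusch-Pagan / Cook-Weisberg test) is added here as one possible formal complement to the residual-versus-fitted plot; it is not part of the original exercise.

```stata
* Simulated Cobb-Douglas data in logs (hypothetical parameter values)
clear
set seed 2014
set obs 200
generate lnK = rnormal()
generate lnL = rnormal()
generate lnM = rnormal()
generate lnY = 1 + 0.3*lnK + 0.5*lnL + 0.2*lnM + rnormal(0, 0.2)

* OLS regression of log output on log inputs
regress lnY lnK lnL lnM

* Residuals against fitted values: look for a fan or funnel shape
rvfplot, yline(0)

* Formal test; the null hypothesis is constant error variance
estat hettest
```

With homoscedastic simulated errors like these, the plot should show a band of roughly constant width around zero.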
xtset id year
xtreg lnY lnK lnL lnM …
xtreg lnY lnK lnL lnM … i.year
xtreg lnY lnK lnL lnM … i.year i.industry
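A self-contained sketch of the panel commands above on simulated data. The 50 firms, 5 years, parameter values and the firm effect a_i are all assumptions made for illustration. Note that xtreg without an option fits the random-effects model; the fe option requests fixed effects.

```stata
* Simulated balanced panel: 50 hypothetical firms, 5 years each
clear
set seed 12345
set obs 50
generate id = _n
generate a_i = rnormal()              // unobserved firm-specific effect
expand 5                              // 5 rows per firm
bysort id: generate year = 2004 + _n  // years 2005-2009

* Inputs; lnK is correlated with the firm effect by construction
generate lnK = rnormal() + 0.5*a_i
generate lnL = rnormal()
generate lnM = rnormal()
generate lnY = 1 + 0.3*lnK + 0.5*lnL + 0.2*lnM + a_i + rnormal(0, 0.1)

* Declare the panel, then estimate with year dummies
xtset id year
xtreg lnY lnK lnL lnM i.year          // random effects (default)
xtreg lnY lnK lnL lnM i.year, fe      // fixed effects
```

Because lnK is built to correlate with a_i, the random-effects and fixed-effects estimates of its coefficient can be expected to differ in this simulation.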