GRADUATE PROGRAM - STATA ANALYSIS MATERIALS 1

Pham Thi Bich Ngoc, Ph.D. (University of Kiel, Germany)
FEC/Hoa Sen University




UNIVERSITY OF ECONOMICS HOCHIMINHCITY, 03 June 2014
June14 - Dr. Pham Thi Bich Ngoc 1
 Learn and use STATA?


 “Econometric Analysis of Cross Section and Panel Data” - Jeffrey M. Wooldridge (2010)

 Panel data models combine cross-section and time-series data.
 In panel data the same cross-sectional unit (industry, firm, country) is surveyed over time, so the data are pooled over space as well as time.

1. Panel data can take explicit account of
individual-specific heterogeneity (“individual”
here means related to the microunit)


2. By combining data in two dimensions, panel
data gives more data variation, less collinearity
and more degrees of freedom.
3. Panel data is better suited than cross-sectional data for studying the dynamics of change. For example, it is well suited to understanding transition behaviour, such as company bankruptcy or merger.

4. Panel data is better at detecting and
measuring effects that cannot be observed
in either cross-section or time-series data.
5. Panel data enables the study of more
complex behavioural models – for example
the effects of technological change, or
economic cycles.
6. Panel data can minimise the effects of
aggregation bias, from aggregating firms
into broad groups.

If all the cross-sectional units have the same number of time series observations, the panel is balanced; if not, it is unbalanced.
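
\[
\begin{bmatrix}
y_{11} & y_{21} & \cdots & y_{i1} & \cdots & y_{N1} \\
y_{12} & y_{22} & \cdots & y_{i2} & \cdots & y_{N2} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1t} & y_{2t} & \cdots & y_{it} & \cdots & y_{Nt} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1T} & y_{2T} & \cdots & y_{iT} & \cdots & y_{NT}
\end{bmatrix}
\]

A matrix of balanced panel data observations on variable y: N cross-sectional observations (columns, cross section) and T time series observations (rows, time series).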
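
As a rough illustration of the balanced/unbalanced distinction, the sketch below declares a panel in Stata and checks its balance. It assumes the Grunfeld example file that Stata distributes through `webuse grunfeld` (internet access needed); the variable names company and year belong to that file, not to the course data.

```stata
* Sketch only: Stata's online example file, not the course dataset
webuse grunfeld, clear

* Declare the panel: company is the unit i, year is the time index t
xtset company year

* xtset stores the balance status in r(balanced)
display "Panel is: `r(balanced)'"

* Pattern of observations per unit
xtdescribe

* Dropping a single firm-year leaves one firm with fewer observations
drop if company == 1 & year == 1935
xtset company year
display "Panel is now: `r(balanced)'"
```

Stata distinguishes "strongly balanced", "weakly balanced" and "unbalanced" panels, so the second display line should no longer report a balanced panel.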
 Grunfeld and Griliches [1960]
◦ i = 10 firms: GM, CH, GE, WE, US, AF, DM, GY, UN, IBM; t = 20 years: 1935-1954
◦ $I_{it}$ = Gross investment
◦ $F_{it}$ = Market value
◦ $C_{it}$ = Value of the stock of plant and equipment

\[
I_{it} = \alpha_i + \beta F_{it} + \gamma C_{it} + \varepsilon_{it}
\]
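
One possible way to estimate an investment equation with a separate intercept for each firm is least squares with firm dummies. The sketch below assumes Stata's own example copy of the Grunfeld data (`webuse grunfeld`, internet access needed), where invest, mvalue and kstock play the roles of gross investment, market value and capital stock; firms are numbered 1 to 10 in that file rather than named.

```stata
* Sketch only: Stata's example copy of the Grunfeld data
webuse grunfeld, clear

* OLS with one dummy per firm, so each firm gets its own intercept
regress invest mvalue kstock i.company

* Joint test that all firm intercepts are equal
testparm i.company

* Same slope estimates, with the firm dummies absorbed instead of listed
areg invest mvalue kstock, absorb(company)
```

Listing the dummies is convenient with 10 firms; absorbing them is the more practical choice when the number of units is large.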


 $y_{it}$ = Real per capita GDP
 $s_i$ = Average saving rate (over 1960-1985)
 $n_i$ = Average population growth rate (over 1960-1985)
 $g + \delta$ = 5%
 $COM_i$ = 1 if communist, 0 otherwise
 $OPEC_i$ = 1 if OPEC, 0 otherwise

\[
\ln(y_{it}) = \alpha_t + \theta \ln(y_{i,t-1}) + \beta_1 s_i + \beta_2 (n_i + g + \delta) + \beta_3 COM_i + \beta_4 OPEC_i + \varepsilon_{it}
\]
 LWAGE = log of wage = dependent variable in regressions
 EXP = work experience
 WKS = weeks worked
 OCC = occupation, 1 if blue collar
 IND = 1 if manufacturing industry
 SOUTH = 1 if resides in south
 SMSA = 1 if resides in a city (SMSA)
 MS = 1 if married
 FEM = 1 if female
 UNION = 1 if wage set by union contract
 ED = years of education
 BLK = 1 if individual is black
 Two basic windows
◦ Command
◦ Results

 Optional windows
◦ Variable list
◦ History of commands
 Other functions
◦ Data browser/editor
◦ Do file editor
◦ Viewer (for log, help files, etc.)

 The usual – open, save, print
 Log-file open/suspend/close
 Do-file editor
 Browse and Edit
 Break
 Open draft-student.dta
 Create .do file/.log file
 A 3-factor Cobb-Douglas function (simple):

\[
\ln Y = a_0 + a_1 \ln K + a_2 \ln L + a_3 \ln M + u_i
\]

lnY: log of output
lnK: log of capital
lnL: log of labor
lnM: log of material
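
The steps on this slide (open the data, start a log, estimate the log-linear function) might look roughly like the do-file below. Since draft-student.dta is not reproduced here, the sketch simulates stand-in data; the log file name session1.log, the level variables Y, K, L, M and the coefficient values are all hypothetical.

```stata
* Sketch only: simulated data standing in for draft-student.dta
capture log close
log using "session1.log", replace text

clear
set seed 12345
set obs 500

* Hypothetical input levels and output (true elasticities 0.3, 0.5, 0.2)
generate K = exp(rnormal(3, 1))
generate L = exp(rnormal(2, 1))
generate M = exp(rnormal(4, 1))
generate Y = exp(1 + 0.3*ln(K) + 0.5*ln(L) + 0.2*ln(M) + rnormal(0, 0.2))

* Taking logs makes the Cobb-Douglas function linear in the parameters
generate lnY = ln(Y)
generate lnK = ln(K)
generate lnL = ln(L)
generate lnM = ln(M)

* OLS estimates of a0, a1, a2, a3; the slopes are output elasticities
regress lnY lnK lnL lnM

log close
```

With the real data the simulation block would be replaced by a single `use "draft-student.dta", clear`.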



 generate newvar = exp
◦ Create new variables

 replace … if … (==, !=, >, <, >=, <=)
 drop … if …
 keep … if …
 count …
 count … if …

 EG. gen D7=.
replace D7=1 if year==2007
replace D7=0 if year!=2007
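
A small self-contained sketch of building such a year dummy, using a toy year variable rather than the course data (the variable D7b is a hypothetical name for the one-line version):

```stata
* Sketch only: six toy observations, years 2005 to 2010
clear
set obs 6
generate year = 2004 + _n

* Step-by-step version: start from missing, then fill in both cases
generate D7 = .
replace D7 = 1 if year == 2007
replace D7 = 0 if year != 2007

* One-line equivalent: a logical expression evaluates to 1 or 0
generate D7b = (year == 2007)

* Checks: how many observations are flagged, and do the versions agree
count if D7 == 1
count if D7 != D7b
```

Note that the second replace must not overlap with the first condition; a condition such as year>=2007 would overwrite the 2007 observations with 0 and leave earlier years missing.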


 summarize [varlist] [, detail]
◦ # obs, mean, SD, range
◦ “, detail” gets you more detail (median, etc.)
◦ Eg. sum lnY lnK lnL lnM

 ci [varlist]
◦ Mean, standard error of mean, and confidence intervals
◦ Also works for dichotomous variables.
◦ Eg. ci lnY lnK lnL lnM
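
To try these commands without the course file, one option is auto.dta, which ships with Stata. The sketch below is only an illustration on that dataset; price, mpg and weight are its variables, not the lnY, lnK, lnL, lnM of the exercise.

```stata
* Sketch only: auto.dta ships with Stata; it is not the course data
sysuse auto, clear

* Number of observations, mean, SD, min and max
summarize price mpg weight

* Adds percentiles (including the median), skewness and kurtosis
summarize price, detail
display "Median price: " r(p50)

* Mean, standard error and 95% confidence interval
* (syntax of Stata 14.1 and later; earlier versions use: ci price mpg)
ci means price mpg
```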

 histogram varname
◦ Simple histogram of your variable
◦ Eg. histogram lnY
◦ Eg. histogram lnY, frac by(D7, title("Firm Sales in 2007 and the Rest") subtitle("(in VND)"))

 qnorm varname
◦ Quantile plot of your variable to check normality
◦ Eg. qnorm lnY
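
The same two plots can be tried on Stata's bundled auto.dta. This is a sketch on stand-in data: the grouping variable foreign replaces D7, and the titles are made up for the illustration.

```stata
* Sketch only: auto.dta ships with Stata; it is not the course data
sysuse auto, clear

* Fraction of cars in each price bin, one panel per group of foreign
histogram price, fraction by(foreign, title("Car prices by origin") subtitle("(in USD)"))

* Quantiles of price against quantiles of a normal distribution;
* points far from the reference line suggest non-normality
qnorm price
```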




 regress lnY lnK lnL lnM horizontal Bam Bch
 predict r, resid
 kdensity r, normal
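
The regress, predict, kdensity sequence can be sketched on Stata's bundled auto.dta as follows. The regression of price on mpg and weight is only a stand-in for the production function, and the residual is stored in double precision as a design choice of this sketch.

```stata
* Sketch only: auto.dta ships with Stata; it is not the course data
sysuse auto, clear

* Fit the model, then store the residuals in a new variable r
regress price mpg weight
predict double r, resid

* With a constant in the model, OLS residuals average to zero
summarize r

* Kernel density of the residuals with a normal density overlaid
kdensity r, normal
```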





 tabulate [varname]
◦ Counts and percentages
◦ (see also, table - this is very different!)
 tabulate [varname], missing

 Eg. tab D7
 tabulate [var1] [var2]
◦ “Cross-tab”
◦ Descriptive options:
  , row (row percentages)
  , col (column percentages)

Eg. tab D7 sectorcode if sectorcode<11
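
A short sketch of one-way and two-way tables on Stata's bundled auto.dta; the variables foreign and rep78 are stand-ins for D7 and sectorcode.

```stata
* Sketch only: auto.dta ships with Stata; it is not the course data
sysuse auto, clear

* One-way table: counts and percentages
tabulate foreign

* The missing option adds a row for missing values of rep78
tabulate rep78, missing

* Two-way table with row and column percentages
tabulate foreign rep78, row col

* Restricting the sample with an if condition
tabulate foreign rep78 if rep78 < 5, row
```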

 scatter [var1] [var2]
◦ Scatterplot of the two variables
◦ Extension:
twoway lfit [var1] [var2]
twoway scatter [var1] [var2] || lfit [var1] [var2] ||, by(var3, total row(1))
/>aphdocs/twoway-linear-prediction-plot/index.html

Eg. Graph lnY against lnK (scatter plot with linear fit)
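
The plot commands above can be tried on Stata's bundled auto.dta, as in the sketch below; mpg, weight and foreign are that file's variables, standing in for lnY, lnK and a grouping variable.

```stata
* Sketch only: auto.dta ships with Stata; it is not the course data
sysuse auto, clear

* Plain scatterplot: first variable on the y axis, second on the x axis
scatter mpg weight

* Fitted line from a linear regression of mpg on weight
twoway lfit mpg weight

* Scatter and fitted line overlaid in one graph
twoway scatter mpg weight || lfit mpg weight

* One panel per group of foreign, plus a panel for the total, in one row
twoway scatter mpg weight || lfit mpg weight ||, by(foreign, total row(1))
```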


 pwcorr [varlist] [, sig]
◦ Pairwise correlations between variables
◦ “sig” option gives p-values
 spearman [varlist] [, stats(rho p)]
Eg: Correlation between lnY/lnK/lnL/lnM?
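
A sketch of both correlation commands on Stata's bundled auto.dta (stand-in variables price, mpg and weight):

```stata
* Sketch only: auto.dta ships with Stata; it is not the course data
sysuse auto, clear

* Pearson correlations, each computed on the pairwise-complete sample;
* sig prints the p-value under each coefficient
pwcorr price mpg weight, sig

* Spearman rank correlations with their p-values
spearman price mpg weight, stats(rho p)
```

Spearman's coefficient uses ranks, so it is less sensitive to outliers and to monotone but non-linear relationships than the Pearson coefficient.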

 regress depvar [indepvars] [if] [in]
[weight] [, options]
regress fits a model of depvar on indepvars
using linear regression.
 regress lnY lnK lnL lnM horizontal Bam Bch


 Checking Homoscedasticity of Residuals
 rvfplot, yline(0)
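
The residual-versus-fitted plot can be complemented by a formal test. The sketch below uses Stata's bundled auto.dta as stand-in data; adding estat hettest (the Breusch-Pagan / Cook-Weisberg test) is a suggestion of this sketch and does not appear on the slide.

```stata
* Sketch only: auto.dta ships with Stata; it is not the course data
sysuse auto, clear

regress price mpg weight

* Residuals against fitted values; a funnel shape around the zero line
* suggests heteroscedasticity
rvfplot, yline(0)

* Formal test; the null hypothesis is constant error variance
estat hettest
```

If the test rejects, one common response is to re-estimate with robust standard errors, for example `regress price mpg weight, vce(robust)`.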


xtset id year

xtreg lnY lnK lnL lnM …

xtreg lnY lnK lnL lnM … i.year

xtreg lnY lnK lnL lnM … i.year i.industry
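
A runnable sketch of the same sequence, assuming Stata's example copy of the Grunfeld data (`webuse grunfeld`, internet access needed) in place of the course file. The choice between random and fixed effects, and the test of the year dummies, are illustrations rather than the specification intended in the elided commands above.

```stata
* Sketch only: Stata's example copy of the Grunfeld data
webuse grunfeld, clear

* Declare the panel before using any xt command
xtset company year

* Random-effects GLS is the default estimator of xtreg
xtreg invest mvalue kstock

* Fixed-effects (within) estimator
xtreg invest mvalue kstock, fe

* Fixed effects plus a full set of year dummies
xtreg invest mvalue kstock i.year, fe

* Joint significance of the year dummies
testparm i.year
```

With the fe option, regressors that do not vary over time within a unit are dropped, which is worth keeping in mind before adding a unit-level set of dummies.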

