Group 10
Econometrics Report

I. INTRODUCTION
   1. Overview of econometrics
   2. Why choose OLS?
II. QUESTION OF INTEREST
III. ECONOMIC MODEL
   1. Choosing the variables
   2. Embedding that target in a general unrestricted model (GUM)
IV. ECONOMETRIC MODEL
   1. Population regression function (PRF)
   2. Sample regression function (SRF)
V. DATA COLLECTION
   1. Data overview
   2. Data description
VI. ESTIMATION OF ECONOMETRIC MODEL
   1. Checking the correlation among variables
   2. Regression run
VII. CHECK MULTICOLLINEARITY AND HETEROSCEDASTICITY
   1. Multicollinearity
   2. Heteroskedasticity
VIII. HYPOTHESES POSTULATED
   1. The t test
   2. Confidence Intervals
   3. P Value
   4. Testing the overall significance: The F test
IX. RESULT ANALYSIS AND POLICY IMPLICATION
X. CONCLUSION
XI. REFERENCES
I. INTRODUCTION
1. Overview of econometrics
Econometrics is the application of statistical methods to
economic data and is described as the branch of economics that
aims to give empirical content to economic relations. More
precisely, it is the quantitative analysis of actual economic
problems, based on the concurrent development of theory and
observation, related by appropriate methods of inference.
Economists often compare econometrics to a tool that converts
mountains of data into simple relationships.
Econometrics is effective because it draws on economic theory,
statistical theory and mathematical statistics to develop and
evaluate its methods. In practice, econometrics helps economists
to assess economic theories, build econometric models, analyze
economic history and make forecasts.
Aware of the importance of econometrics for understanding
economic phenomena, our group decided to carry out an econometric
study on the topic “The factors that have influence on median
housing price”. We aim to analyze the data and to point out the
differences in price levels and the reasons behind them.
The data set has 506 observations on 12 variables. We choose 6
variables for the study: price, crime, nox, rooms, dist and
proptax. Price is the dependent variable and the other five are
independent variables. The model is estimated by ordinary least
squares (OLS), and all estimation is run in Stata.
While carrying out this research, our group was fortunate to
be guided thoroughly by Dr. Dinh Thi Thanh Binh. We are grateful
for everything you have taught us!
This is the first time our group has carried out an econometric
study, so mistakes are unavoidable. We would be glad to receive
your feedback so that we can do better next time.
2. Why choose OLS?
Ordinary least squares (OLS) is a type of linear least
squares method for estimating the unknown parameters in a linear
regression model. OLS chooses the parameters of a linear
function of a set of explanatory variables by the principle
of least squares: minimizing the sum of the squares of the
differences between the observed dependent variable in the
given dataset and those predicted by the linear function.
With the six selected variables, we use OLS because we treat
all regressors as exogenous and assume that the effects of the
independent variables on the dependent variable are linear. In
addition, when the basic OLS assumptions hold, the OLS
estimators are linear, unbiased and have the smallest variance
among all linear unbiased estimators (they are BLUE).
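The least squares principle described above can be sketched for the simplest case of one regressor. This is a minimal illustration in Python, not the Stata procedure used in the report; the five data points and the function names `ols_simple` and `sum_sq_resid` are invented for the example.

```python
# Toy data, invented for illustration only (not taken from the report's data set).
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
price = [12000.0, 19000.0, 22000.0, 31000.0, 36000.0]


def ols_simple(x, y):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_sq_resid(x, y, intercept, slope):
    """Sum of squared differences between observed and fitted values."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


b0, b1 = ols_simple(rooms, price)
ssr_ols = sum_sq_resid(rooms, price, b0, b1)
# A different line, chosen arbitrarily, gives a larger sum of squared residuals.
ssr_other = sum_sq_resid(rooms, price, b0 + 500.0, b1 - 100.0)
print(b0, b1, ssr_ols, ssr_other)
```

With several regressors the same criterion is minimised over all coefficients at once, which is what the Stata command reg does.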
When using OLS, we make the following basic assumptions:
1. The regression model is linear in the parameters.
2. X values are fixed in repeated sampling, which means Xi
and ui are uncorrelated.
3. Zero mean value of the disturbance: E(ui) = 0.
4. Homoscedasticity, or equal variance of ui: var(ui) = σ².
5. No correlation between disturbances.
6. The model is correctly specified.
7. The number of observations must be greater than the number
of parameters to be estimated.
8. X values in a given sample must not all be the same.
9. No perfect multicollinearity.
10. The disturbances ui are normally distributed.
II. QUESTION OF INTEREST
We have always wondered: “Why do housing prices differ so much
among locations and regions?”. Housing prices are affected by
many different factors such as structure, neighborhood,
accessibility, air pollution and so on. To answer that question,
our group uses the collected data to build and run a regression
model, and then analyzes the results to answer the question of
interest above.
III. ECONOMIC MODEL
Given the data provided, the economic model used in this
report is an empirical one. Note that the fundamental model is
mathematical; in an empirical model, however, data are gathered
for the variables and accepted statistical techniques are used
to estimate the model's parameters.
1. Choosing the variables
Describing the data in file… with the Stata command “des” gives
the following result:
. des
obs:           506
vars:           12                 31 Oct 1996 16:37
size:       22,770

variable name  storage type  display format  value label  variable label
price          float         %9.0g                        median housing price, $
crime          float         %9.0g                        crimes committed per capita
nox            float         %9.0g                        nit ox concen; parts per 100m
rooms          float         %9.0g                        avg number of rooms
dist           float         %9.0g                        wght dist to 5 employ centers
radial         byte          %9.0g                        access. index to rad. hghwys
proptax        float         %9.0g                        property tax per $1000
stratio        float         %9.0g                        average student teacher ratio
lowstat        float         %9.0g                        perc of people 'lower status'
lprice         float         %9.0g                        log(price)
lnox           float         %9.0g                        log(nox)
lproptax       float         %9.0g                        log(proptax)
Figure 1
The table above lists the factors that may influence housing
prices, recorded for 506 observations. After careful discussion,
our group chose price as the dependent variable Y and the
following independent variables:
X1: crime
X2: nox
X3: rooms
X4: dist
X5: proptax
2. Embedding that target in a general unrestricted model
(GUM)
In its simplest acceptable representation (which will later be
specified in the econometric model), the GUM is determined to
be:
$price = f(crime, nox, rooms, dist, proptax)$
A brief description of each variable is given in Figure 1.
Variable type              Name     Meaning                                   Expected sign
Dependent variable (Y)     Price    Median housing price                      +
Independent variables (X)  Crime    Number of crimes committed per capita
                           Nox      Nitrogen oxide concentration in the
                                    air, parts per 100m
                           Rooms    Average number of rooms                   +
                           Dist     Weighted distance to 5 employment
                                    centers
                           Proptax  Property tax per $1000
Figure 2
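IV. ECONOMETRIC MODEL
1. Population regression function (PRF)
PRF: $price_i = \beta_0 + \beta_1 crime_i + \beta_2 nox_i + \beta_3 rooms_i + \beta_4 dist_i + \beta_5 proptax_i + u_i$
2. Sample regression function (SRF)
SRF: $price_i = \hat{\beta}_0 + \hat{\beta}_1 crime_i + \hat{\beta}_2 nox_i + \hat{\beta}_3 rooms_i + \hat{\beta}_4 dist_i + \hat{\beta}_5 proptax_i + \hat{u}_i$
where:
$\beta_0$ is the intercept of the regression model
$\beta_j$ is the slope coefficient of the independent variable $x_j$ (j = 1, ..., 5)
$u_i$ is the disturbance of the regression model
$\hat{\beta}_0$ is the estimator of $\beta_0$
$\hat{\beta}_j$ is the estimator of $\beta_j$
$\hat{u}_i$ is the residual (the estimator of $u_i$)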
V. DATA COLLECTION
1. Data overview
This data set comes from a given source rather than being
collected by our group, so it is secondary data.
Structure of the economic data: cross-sectional data.
2. Data description
To obtain summary statistics of the variables, the following
Stata command is used:
. sum
Variable   Obs       Mean   Std. Dev.     Min      Max
price      506   22511.51    9208.856    5000    50001
crime      506   3.611536    8.590247   0.006   88.976
nox        506   5.549783    1.158395    3.85     8.71
rooms      506   6.284051   0.7025938    3.56     8.78
dist       506   3.795751    2.106137    1.13    12.13
proptax    506   40.82372    16.85371    18.7     71.1
Figure 3
where:
Obs is the number of observations
Std. Dev is the standard deviation of the variable
Min is the minimum value of the variable
Max is the maximum value of the variable
VI. ESTIMATION OF ECONOMETRIC MODEL
1. Checking the correlation among variables:
First, the correlation between price and each of crime, nox,
rooms, dist and proptax is checked by calculating the correlation
coefficients among these variables. The correlation coefficient
measures the strength and direction of the linear relationship
between two variables. In Stata, the correlation matrix is
generated with the command:
corr price crime nox rooms dist proptax
             price    crime      nox    rooms     dist  proptax
price       1.0000
crime      -0.3879   1.0000
nox        -0.4260   0.4212   1.0000
rooms       0.6958  -0.2188  -0.3028   1.0000
dist        0.2493  -0.3799  -0.7702   0.2054   1.0000
proptax    -0.4671   0.5828   0.6670  -0.2921  -0.5344   1.0000
Figure 4
From the matrix, the correlation between price and each of the
independent variables is strong enough to justify running the
regression model. Specifically:
Correlation coefficient between price and crime is -0.3879
=> price and crime have a moderate negative relationship.
Correlation coefficient between price and nox is -0.426 =>
price and nox have a moderate negative relationship.
Correlation coefficient between price and rooms is 0.6958 =>
price and rooms have a fairly strong positive relationship.
Correlation coefficient between price and dist is 0.2493 =>
price and dist have a weak positive relationship.
Correlation coefficient between price and proptax is -0.4671
=> price and proptax have a moderate negative relationship.
Rooms and dist have positive correlation coefficients with
price, which means they move in the same direction as the
dependent variable, while crime, nox and proptax are negatively
correlated with price. The highest coefficient in absolute value,
0.6958 (between rooms and price), indicates that rooms has the
strongest linear association with price: when rooms increases,
price tends to increase markedly. On the other hand, the
correlation coefficient between price and dist is only 0.2493,
which implies a weak connection: when dist increases, price tends
to increase, but only slightly.
In addition, no pair of independent variables has a correlation
coefficient larger than 0.8 in absolute value (the largest is
0.7702, between nox and dist), so serious multicollinearity is
unlikely in this model.
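As an illustration of what the correlation coefficient measures and of the 0.8 screening rule, here is a short Python sketch. The function `pearson_r` and the toy series are assumptions made for the example; the dictionary holds the absolute values of the regressor correlations reported in Figure 4.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Toy series, invented for illustration (not the report's data).
toy_rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
toy_crime = [8.0, 6.0, 5.0, 3.0, 1.0]
toy_price = [12000.0, 19000.0, 22000.0, 31000.0, 36000.0]
r_rooms = pearson_r(toy_rooms, toy_price)  # close to +1: same direction
r_crime = pearson_r(toy_crime, toy_price)  # negative: opposite direction

# Absolute correlations among the regressors, as printed in Figure 4.
abs_corr = {
    ("crime", "nox"): 0.4212, ("crime", "rooms"): 0.2188,
    ("crime", "dist"): 0.3799, ("crime", "proptax"): 0.5828,
    ("nox", "rooms"): 0.3028, ("nox", "dist"): 0.7702,
    ("nox", "proptax"): 0.667, ("rooms", "dist"): 0.2054,
    ("rooms", "proptax"): 0.2921, ("dist", "proptax"): 0.5344,
}
worst_pair = max(abs_corr, key=abs_corr.get)
flagged = [pair for pair, r in abs_corr.items() if r > 0.8]
print(r_rooms, r_crime, worst_pair, flagged)
```

The empty `flagged` list corresponds to the statement that no pair of regressors exceeds the 0.8 threshold.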
2. Regression run
Having checked the correlations among the variables, we can
run the regression. In Stata, this is done with the command:
reg price crime nox rooms dist proptax
Source         SS    df          MS      Number of obs =     506
Model    2.52E+10     5    5.04E+09      F(5, 500)     =  142.92
Residual 1.76E+10   500  35258403.7      Prob > F      =  0.0000
Total    4.28E+10   505    84803032      R-squared     =  0.5883
                                         Adj R-squared =  0.5842
                                         Root MSE      =  5937.9

price        Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
crime    -150.0703   38.11571  -3.94   0.000   -224.957   -75.18364
nox       -1737.66   410.7763  -4.23   0.000   -2544.72   -930.5992
rooms     7707.327   399.0772  19.31   0.000   6923.252    8491.402
dist     -791.2588   197.9444  -4.00   0.000  -1180.164   -402.3535
proptax  -89.95717   23.61555  -3.81   0.000  -136.3551   -43.55923
_cons    -9060.303   3978.871  -2.28   0.023  -16877.67   -1242.937
Figure 5
From the table above we have the sample regression function:
price = -9060.303 - 150.0703*crime - 1737.66*nox + 7707.327*rooms
- 791.2588*dist - 89.95717*proptax
From the result, it can be inferred that crime, nox, rooms,
dist and proptax all have statistically significant effects on
price at the 5% significance level (all p-values are smaller
than 0.05). In particular, those effects can be specified by the
regression coefficients as follows:
β0 = -9060.303 is the estimated intercept.
β1 = -150.0703 means that if crimes committed per capita
increases by one unit, the predicted housing price decreases by
150.0703, provided that other factors do not change.
β2 = -1737.66 means that if the nitrogen oxide concentration
(parts per 100m) increases by one unit, the predicted housing
price decreases by 1737.66, provided that other factors do not
change.
β3 = 7707.327 means that if the average number of rooms
increases by one, the predicted housing price increases by
7707.327, provided that other factors do not change.
β4 = -791.2588 means that if the weighted distance to 5
employment centers increases by one unit, the predicted housing
price decreases by 791.2588, provided that other factors do not
change.
β5 = -89.95717 means that if property tax per $1000 increases
by one unit, the predicted housing price decreases by 89.95717,
provided that other factors do not change.
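The interpretation of the coefficients can be checked with a small calculation. The sketch below is an illustration in Python rather than part of the Stata workflow: it plugs the sample means from Figure 3 into the estimated equation, using the coefficients from Figure 5 with the signs implied by the interpretations above. An OLS regression with an intercept passes through the point of sample means, so the prediction should reproduce the mean price. The names `coef`, `means` and `predict` are chosen for the example.

```python
# Estimated coefficients from Figure 5 (signs as implied by the interpretations).
coef = {
    "_cons": -9060.303,
    "crime": -150.0703,
    "nox": -1737.66,
    "rooms": 7707.327,
    "dist": -791.2588,
    "proptax": -89.95717,
}

# Sample means of the regressors from Figure 3.
means = {
    "crime": 3.611536,
    "nox": 5.549783,
    "rooms": 6.284051,
    "dist": 3.795751,
    "proptax": 40.82372,
}


def predict(x):
    """Fitted price for a dictionary of regressor values."""
    return coef["_cons"] + sum(coef[name] * value for name, value in x.items())


price_at_means = predict(means)

# Ceteris paribus effect of one additional room: only 'rooms' changes.
one_more_room = dict(means, rooms=means["rooms"] + 1.0)
room_effect = predict(one_more_room) - price_at_means
print(round(price_at_means, 2), round(room_effect, 3))
```

The first number should be close to the sample mean of price in Figure 3 (22511.51), and the second equals the rooms coefficient, which is exactly what "other factors do not change" means.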
The coefficient of determination R-squared = 0.5883: the
independent variables (crime, nox, rooms, dist, proptax)
jointly explain 58.83% of the variation in the dependent
variable (price); factors not included in the model account
for the remaining 41.17% of the variation in price.
Other indicators:
Adjusted coefficient of determination adj R-squared = 0.5842
Total Sum of Squares TSS = 4.28E+10
Explained Sum of Squares ESS = 2.52E+10
Residual Sum of Squares RSS = 1.76E+10
The degree of freedom of Model Dfm = 5
The degree of freedom of residual Dfr = 500
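These indicators are linked by simple formulas, which the following Python sketch reproduces from the numbers in the regression output. It is only an arithmetic check; the sums of squares are the rounded values printed in the table, so the ratio ESS/TSS matches the reported R-squared only approximately.

```python
import math

n, k = 506, 5                      # observations and number of regressors
ess, rss, tss = 2.52e10, 1.76e10, 4.28e10   # rounded values from the table
ms_resid = 35258403.7              # residual mean square from the table

r2_from_sums = ess / tss           # approximately the reported 0.5883
r2 = 0.5883                        # reported R-squared
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
root_mse = math.sqrt(ms_resid)     # standard error of the regression

print(round(r2_from_sums, 4), round(adj_r2, 4), round(root_mse, 1))
```

The adjusted R-squared penalises the number of regressors, which is why it is slightly below R-squared.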
VII. CHECK MULTICOLLINEARITY AND HETEROSCEDASTICITY
1. Multicollinearity
Multicollinearity is a high degree of correlation among the
explanatory variables. It can make it difficult to separate out
the effects of the individual regressors: standard errors may be
inflated and t-values depressed.
Detect multicollinearity
o Method 1: Use the corr command to examine multicollinearity
If independent variables are strongly correlated (|r| > 0.8),
multicollinearity may occur.
             price    crime      nox    rooms     dist  proptax
price       1.0000
crime      -0.3879   1.0000
nox        -0.4260   0.4212   1.0000
rooms       0.6958  -0.2188  -0.3028   1.0000
dist        0.2493  -0.3799  -0.7702   0.2054   1.0000
proptax    -0.4671   0.5828   0.6670  -0.2921  -0.5344   1.0000
Figure 6
From the table above, the correlation coefficients among the
independent variables are all smaller than 0.8 in absolute value
(the largest is 0.7702, between nox and dist). As a result, this
check gives no sign of serious multicollinearity in the model.
o Method 2: Use the variance inflation factor (VIF)
If VIF > 10, multicollinearity occurs.
Variable     VIF      1/VIF
nox         3.24   0.308352
dist        2.49   0.401709
proptax     2.27   0.440742
crime       1.54   0.651256
rooms       1.13   0.888073
Mean VIF    2.13
Figure 7
The table shows that all VIF values are smaller than 10, so
this check does not indicate multicollinearity either.
From the two methods above we conclude that multicollinearity
is not a worrisome problem for this data set.
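The two columns of the VIF table are reciprocals of each other: for regressor j, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing x_j on the other regressors, and 1/VIF_j = 1 - R_j² is the tolerance. The Python sketch below, an illustration only, recomputes the VIF column and the mean VIF from the reported 1/VIF values.

```python
# Tolerance (1/VIF) values as reported in Figure 7.
tolerance = {
    "nox": 0.308352,
    "dist": 0.401709,
    "proptax": 0.440742,
    "crime": 0.651256,
    "rooms": 0.888073,
}

vif = {name: 1.0 / tol for name, tol in tolerance.items()}
# R-squared of each auxiliary regression implied by the tolerance.
aux_r2 = {name: 1.0 - tol for name, tol in tolerance.items()}
mean_vif = sum(vif.values()) / len(vif)
any_above_10 = any(value > 10 for value in vif.values())

for name in vif:
    print(name, round(vif[name], 2), round(aux_r2[name], 3))
print("Mean VIF", round(mean_vif, 2), any_above_10)
```

For example, nox has the largest VIF because about 69% of its variation is explained by the other four regressors, which is still well below the level implied by a VIF of 10 (90%).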
2. Heteroskedasticity
Another problem that our model can suffer from is
heteroskedasticity. Under heteroskedasticity the least squares
estimators are still unbiased but are no longer efficient. In
addition, the usual estimators of their variances become biased,
which makes the standard errors, t tests and confidence intervals
of our model unreliable.
The homoskedasticity assumption states that the variance of the
error term ui is the same for every observation i = 1, 2, ..., n.
It can be written as:
Var(ui) = Var(uj) for all i, j = 1, 2, 3, …, n
When that assumption is violated, heteroskedasticity appears.
Causes
o Nature of the economic phenomena: the subjects examined
differ in scale, or they are observed over periods whose levels
of fluctuation differ.
o The functional form of the model is misspecified, for
example because relevant variables are omitted or the functional
form is wrong.
o The data cannot fully and correctly reflect the nature of
the economic phenomena, for example when outliers appear.
Including or excluding such observations has a great impact on
the regression analysis.
o Errors tend to decrease as data collection, storage and
processing techniques improve.
o People learn from their past behavior.
Hypothesis:
H0: constant variance (homoskedasticity)
H1: non-constant variance (heteroskedasticity)
Using the command estat hettest in Stata:
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of price
chi2(1) = 26.56
Prob > chi2 = 0.0000
Since Prob > chi2 = 0.0000 < 0.05, we reject H0 in favor of H1.
We conclude that heteroskedasticity is present in this model.
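The reported Prob > chi2 can be reproduced from the test statistic. For a chi-square distribution with 1 degree of freedom, the upper-tail probability is P(X > x) = erfc(sqrt(x / 2)), which is available in the Python standard library. The helper name `chi2_sf_1df` is an assumption for this sketch.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


bp_stat = 26.56                       # chi2(1) reported by estat hettest
p_value = chi2_sf_1df(bp_stat)
reject_h0 = p_value < 0.05

# Sanity check of the helper: 3.841 is the usual 5% critical value of chi2(1).
p_at_critical = chi2_sf_1df(3.841)
print(p_value, reject_h0, round(p_at_critical, 3))
```

The p-value is far below 0.0001, which Stata displays as 0.0000.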
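Correcting for heteroskedasticity
To obtain inference that remains valid under heteroskedasticity,
we re-estimate the model with heteroskedasticity-robust standard
errors, using the command:
reg price crime nox rooms dist proptax, robust
The result is: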
                                         Number of obs =     506
                                         F(5, 500)     =  103.22
                                         Prob > F      =  0.0000
                                         R-squared     =  0.5883
                                         Root MSE      =  5937.9

                       Robust
price        Coef.  Std. Err.      t   P>|t|   [95% Conf. Interval]
crime    -150.0703   30.45247  -4.93   0.000  -209.9009   -90.23976
nox       -1737.66   389.6642  -4.46   0.000  -2503.241   -972.0787
rooms     7707.327   670.6304  11.49   0.000   6389.726    9024.928
dist     -791.2588    175.744  -4.50   0.000  -1136.546   -445.9712
proptax  -89.95717   26.84788  -3.35   0.001  -142.7057   -37.20862
_cons    -9060.303   5398.964  -1.68   0.094  -19667.75    1547.148
Figure 8
Compared with the earlier regression, none of the coefficient
estimates has changed, but the standard errors, and hence the t
values, are different. The p-values based on the robust standard
errors remain valid under heteroskedasticity.
VIII. HYPOTHESES POSTULATED
1. The t test
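The tests below use the robust standard errors from Figure 8.
For each slope coefficient, the test statistic is
$t_s = \hat{\beta}_j / se(\hat{\beta}_j)$ and the critical value
is $c = t_{0.025}(500) = 1.965$.
Hypothesis: $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$
$t_s = -150.0703 / 30.45247 = -4.93$
$c = 1.965 < |t_s| = 4.93$ => Reject $H_0$
Conclusion: Number of crimes committed per capita has a
statistically significant effect on median housing price. The
higher the number of crimes committed per capita, the lower the
median housing price.
Hypothesis: $H_0: \beta_2 = 0$, $H_1: \beta_2 \neq 0$
$t_s = -1737.66 / 389.6642 = -4.46$
$c = 1.965 < |t_s| = 4.46$ => Reject $H_0$
Conclusion: Nitrogen oxide concentration per 100m has a
statistically significant effect on median housing price. The
higher the nitrogen oxide concentration per 100m, the lower the
median housing price.
Hypothesis: $H_0: \beta_3 = 0$, $H_1: \beta_3 \neq 0$
$t_s = 7707.327 / 670.6304 = 11.49$
$c = 1.965 < |t_s| = 11.49$ => Reject $H_0$
Conclusion: The average number of rooms has a statistically
significant effect on median housing price. The higher the
average number of rooms, the higher the median housing price.
Hypothesis: $H_0: \beta_4 = 0$, $H_1: \beta_4 \neq 0$
$t_s = -791.2588 / 175.744 = -4.5$
$c = 1.965 < |t_s| = 4.5$ => Reject $H_0$
Conclusion: Weighted distance to 5 employment centers has a
statistically significant effect on median housing price. The
higher the weighted distance to 5 employment centers, the lower
the median housing price.
Hypothesis: $H_0: \beta_5 = 0$, $H_1: \beta_5 \neq 0$
$t_s = -89.95717 / 26.84788 = -3.35$
$c = 1.965 < |t_s| = 3.35$ => Reject $H_0$
Conclusion: Property tax per $1000 has a statistically
significant effect on median housing price. The higher the
property tax per $1000, the lower the median housing price.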
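The t statistics used in these tests are the ratio of each coefficient to its robust standard error. The Python sketch below recomputes them from the values in Figure 8 and applies the decision rule with the critical value 1.965 quoted in this section; the dictionary layout is an assumption made for the example.

```python
T_CRIT = 1.965  # two-sided 5% critical value with 500 degrees of freedom

# (coefficient, robust standard error) from Figure 8.
robust = {
    "crime": (-150.0703, 30.45247),
    "nox": (-1737.66, 389.6642),
    "rooms": (7707.327, 670.6304),
    "dist": (-791.2588, 175.744),
    "proptax": (-89.95717, 26.84788),
}

t_stats = {name: b / se for name, (b, se) in robust.items()}
# Reject H0: beta_j = 0 when the absolute t statistic exceeds the critical value.
reject = {name: abs(t) > T_CRIT for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(name, round(t, 2), "reject H0" if reject[name] else "do not reject H0")
```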
2. Confidence Intervals
Test the following hypotheses at the 5% significance level:
$H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$, for j = 1, ..., 5.
Variable  Coefficient  Significance level  Confidence interval
Const     β0           5%                  (-19667.75 ; 1547.148)
X1        β1           5%                  (-209.9009 ; -90.23976)
X2        β2           5%                  (-2503.241 ; -972.0787)
X3        β3           5%                  (6389.726 ; 9024.928)
X4        β4           5%                  (-1136.546 ; -445.9712)
X5        β5           5%                  (-142.7057 ; -37.20862)
Figure 9
For all five slope coefficients, 0 does not belong to the
confidence interval, so we reject the hypotheses H0: β1 = 0,
β2 = 0, β3 = 0, β4 = 0, β5 = 0. Only the interval for the
constant contains 0.
Conclusion: Number of crimes committed per capita, nitrogen oxide
concentration per 100m, the average number of rooms, weighted
distance to 5 employment centers and property tax per $1000 all
have a statistically significant effect on median housing price
at the 95% confidence level.
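Each interval in Figure 9 is the coefficient plus or minus the critical value times the robust standard error. The following Python sketch rebuilds the intervals with the rounded critical value 1.965, so the bounds agree with the Stata output only up to small rounding differences. The function name `conf_int` is an assumption for the example.

```python
T_CRIT = 1.965  # rounded two-sided 5% critical value, 500 degrees of freedom

# (coefficient, robust standard error) from Figure 8.
estimates = {
    "_cons": (-9060.303, 5398.964),
    "crime": (-150.0703, 30.45247),
    "nox": (-1737.66, 389.6642),
    "rooms": (7707.327, 670.6304),
    "dist": (-791.2588, 175.744),
    "proptax": (-89.95717, 26.84788),
}


def conf_int(b, se, t_crit=T_CRIT):
    """95% confidence interval (lower, upper) for one coefficient."""
    half_width = t_crit * se
    return b - half_width, b + half_width


intervals = {name: conf_int(b, se) for name, (b, se) in estimates.items()}
contains_zero = {name: lo <= 0.0 <= hi for name, (lo, hi) in intervals.items()}

for name, (lo, hi) in intervals.items():
    print(name, round(lo, 2), round(hi, 2), contains_zero[name])
```

A coefficient is significant at the 5% level exactly when its 95% interval excludes 0, which is why this section reaches the same conclusions as the t test.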
3. P Value
Hypothesis testing: $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$
P-value < 0.001 < α = 0.05 => Reject H0
Number of crimes committed per capita has a statistically
significant effect on median housing price. The higher the number
of crimes committed per capita, the lower the median housing
price.
In particular, with the sample we have, the estimated result
shows that a one-unit increase in crimes committed per capita
decreases median housing price by $150.07, holding other factors
fixed.
Hypothesis testing: $H_0: \beta_2 = 0$, $H_1: \beta_2 \neq 0$
P-value < 0.001 < α = 0.05 => Reject H0
Nitrogen oxide concentration per 100m has a statistically
significant effect on median housing price. The higher the
nitrogen oxide concentration per 100m, the lower the median
housing price.
In particular, with the sample we have, the estimated result
shows that one more unit of nitrogen oxide concentration per 100m
decreases median housing price by $1737.66, holding other factors
fixed.
Hypothesis testing: $H_0: \beta_3 = 0$, $H_1: \beta_3 \neq 0$
P-value < 0.001 < α = 0.05 => Reject H0
The average number of rooms has a statistically significant
effect on median housing price. The higher the average number of
rooms, the higher the median housing price.
In particular, with the sample we have, the estimated result
shows that one more room on average increases median housing
price by $7707.33, holding other factors fixed.
Hypothesis testing: $H_0: \beta_4 = 0$, $H_1: \beta_4 \neq 0$
P-value < 0.001 < α = 0.05 => Reject H0
Weighted distance to 5 employment centers has a statistically
significant effect on median housing price. The higher the
weighted distance to 5 employment centers, the lower the median
housing price.
In particular, with the sample we have, the estimated result
shows that a one-unit increase in weighted distance to 5
employment centers decreases median housing price by $791.26,
holding other factors fixed.
Hypothesis testing: $H_0: \beta_5 = 0$, $H_1: \beta_5 \neq 0$
P-value = 0.0008 < α = 0.05 => Reject H0
Property tax per $1000 has a statistically significant effect on
median housing price. The higher the property tax per $1000, the
lower the median housing price.
In particular, with the sample we have, the estimated result
shows that a one-unit increase in property tax per $1000
decreases median housing price by $89.96, holding other factors
fixed.
4. Testing the overall significance: The F test
This test examines whether the slope parameters βj of all
independent variables can be zero at the same time.
The hypothesis is as follows:
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
$H_1$: at least one $\beta_j \neq 0$
$F_s = 142.92 > F_{0.05}(5, 500) \approx 2.23$
As a result, there is enough evidence to reject the null
hypothesis and conclude that at least one independent variable
has explanatory or predictive power for price, so the model as a
whole is statistically significant and we keep these variables in
the model.
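The F statistic can be recovered from the regression output in two equivalent ways: from R-squared, F = (R²/k) / ((1 - R²)/(n - k - 1)), or as the ratio of the model mean square to the residual mean square. The Python sketch below is an arithmetic check with the rounded values from Figure 5; the critical value 2.23 is an approximate figure from standard F tables and is treated here as an assumption.

```python
n, k = 506, 5
r2 = 0.5883                               # reported R-squared
ms_model, ms_resid = 5.04e9, 35258403.7   # mean squares from Figure 5

f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))
f_from_ms = ms_model / ms_resid

F_CRIT = 2.23   # approximate 5% critical value of F(5, 500), assumed from tables
reject_h0 = f_from_r2 > F_CRIT

print(round(f_from_r2, 2), round(f_from_ms, 2), reject_h0)
```

Both values differ from the reported 142.92 only because the inputs are rounded.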
IX. RESULT ANALYSIS AND POLICY IMPLICATION
From the data analysis in the previous sections, we have
gained an overall view of the data set in terms of the
statistical relationship between housing prices and each of the
proposed factors. As mentioned at the beginning of this report,
we aim to learn how the security of the neighborhood, air
pollution, the size of the house, accessibility and the property
tax are associated with housing prices. In other words, we are
concerned with buyers' willingness to pay for these components.
Following the data analysis, the regression run and the
hypothesis testing, it can be concluded that the security of the
neighborhood, air pollution, the size of the house, accessibility
and the property tax all have a statistically significant effect
on housing prices. Therefore, tenants, investors and constructors
should take all of these factors into account when making deals.
X. CONCLUSION
This report was completed thanks to the dedicated contribution
of each member and the knowledge gained from our study of
Econometrics. This research has given us a good opportunity to
practice what we have learned and to get a deeper understanding
of data analysis and the relevant tests. From this application,
we hope that our research sheds some light on the relationship
between housing prices and some other factors.
Again, due to our limited understanding and resources, our
report may contain misinterpretations. We hope that our teacher
and readers can give us constructive comments on the report so
that we can improve and do better in the future.
XI. REFERENCES
1. …Feb_2011.pdf
2. …doi=10.1.1.926.5532&rep=rep1&type=pdf
3. D.A. Belsley, E. Kuh, and R. Welsch, Regression Diagnostics:
Identifying Influential Data and Sources of Collinearity, New
York: Wiley (1990).