FOREIGN TRADE UNIVERSITY
FACULTY OF INTERNATIONAL ECONOMICS
-----------------o0o-----------------

ECONOMETRIC REPORT

Topic: Factors that determine housing prices

Class: KTEE 309.1
Group No.: 7
Student Name - ID:
Nguyen Ha Trang - 1711150066 - 40%
Nguyen Mai Thuy Tien - 1711150064 - 30%
Nguyen Thi Lan Huong - 1715150032 - 30%
Supervisor: Dr. Dinh Thanh Binh

Hanoi, 2018


Group 7

Econometric Report

Table of Contents
I. Introduction
II. Literature overview
1. Questions of interest
2. Procedure and program used
III. Economic model
1. Specifying the object for modeling
2. Defining the target for modeling by the choice of the variables to analyze, denoted x_i
3. Embedding that target in a general unrestricted model (GUM)
IV. Econometric model
V. Data collection
1. Data overview
2. Data description
VI. Estimation of econometric model
1. Checking the correlation among variables
2. Regression run
VII. Check multicollinearity and heteroscedasticity
1. Multicollinearity
2. Heteroskedasticity
VIII. Hypothesis postulated
1. The impact of neighborhood factors
2. The impact of accessibility factors
IX. Result analysis & Policy implication
X. Conclusion
XI. References

Exhibit 1: Definition of variables in the Housing Price model
Exhibit 2: Statistic indicators of variables in the Housing Price model
Exhibit 3: Correlation matrix
Exhibit 4: Scatterplot of variables in the Housing Price model
Exhibit 5: Regression model
Exhibit 6: Multicollinearity test
Exhibit 7: Heteroskedasticity test
Exhibit 8: Residual-versus-fitted plot of the Housing Price model
Exhibit 9: Correcting heteroskedasticity
Exhibit 10: Hypothesis testing of multiple regression model of neighborhood factors
Exhibit 11: Hypothesis testing of multiple regression model of accessibility factors


I. Introduction
Economics is a science concerned with social development in general and national growth in particular; econometrics applies statistical techniques to understand these issues and to test economic theories. Without evidence, economic theories are
abstract and might have no bearing on reality (even if they are completely rigorous).
Econometrics is a set of tools we can use to confront theory with real-world data.
Given the data set, our group (Nguyen Ha Trang, Nguyen Mai Thuy Tien, and Nguyen Thi Lan Huong)
follows an eight-step econometric methodology to analyze the data. Because little documentation
accompanies the data set, our reading of the variable abbreviations and related details rests on
assumptions and our own research. We have therefore tried to make our logic and reasoning
explicit throughout.
Given its limited scope and resources, this report still has shortcomings, but we hope to give
readers a sound overview of the data set and of the knowledge that we have gained through
Dr. Dinh Thanh Binh's Econometrics course.


II. Literature overview
1. Questions of interest
“Why do housing prices differ among locations and regions?” This is the basic question that
this report seeks to answer. Although many factors might affect housing prices, we group them
into four main categories: structure, neighborhood, accessibility, and air pollution.
Variables representing each of these categories are then examined to determine whether they
have a statistically significant effect on housing prices.
In the following parts, we build the models, use the data to run the regression, and analyze
the results to answer the question of interest above.
2. Procedure and program used
Procedure:
Step 1: Questions of interest
Step 2: Economic model
Step 3: Econometric model
Step 4: Data collection
Step 5: Estimation of econometric model
Step 6: Check multicollinearity and heteroscedasticity
Step 7: Hypothesis postulated
Step 8: Result analysis & Policy implication
Program: Stata is the primary tool used to analyze the data and run the regressions.



III. Economic model
As data are provided up front, the economic model used in this report is an empirical one.
The underlying model is mathematical; in an empirical model, data are gathered for the
variables and accepted statistical techniques are used to estimate the model's parameters.
Empirical model discovery and theory evaluation are suggested to involve five key steps; given
the limited scope and resources of this report, this part follows only three of them:
(1) specifying the object for modeling, (2) defining the target for modeling, and (3) embedding
that target in a general unrestricted model.
1. Specifying the object for modeling

$$price = f(x) \quad (1)$$

This report therefore examines the relationship between housing price, which is the object for
modeling, and each of the related factors: structure, neighborhood, accessibility, and air
pollution.
2. Defining the target for modeling by the choice of the variables to analyze, denoted $x_i$
As mentioned above, four main categories are expected to affect housing prices:
structure, neighborhood, accessibility, and air pollution. Hence, the $x_i$ are the
variables that represent these categories. After thorough research, the factors have been
narrowed down to eight: (structure) number of rooms; (neighborhood) crimes, property
tax, the percentage of people of low status, and student-teacher ratio; (accessibility)
distances to employment centers and accessibility to radial highways; and (air pollution)
nitrous oxide.
3. Embedding that target in a general unrestricted model (GUM)
In its simplest acceptable representation (which will later be specified in the econometric
model), the GUM is:

$$lprice = f(crime, nox, rooms, dist, radial, proptax, stratio, lowstat)$$

A brief description of each variable is given in Exhibit 1.
Exhibit 1: Definition of variables in the Housing Price model
Variable    Definition
lprice      logarithm of median housing price, $
crime       crimes committed per capita
nox         nitrous oxide, parts per 100 million
rooms       average number of rooms per house
dist        weighted distances to 5 employment centers
radial      accessibility index to radial highways
proptax     property tax per $1000
stratio     average student-teacher ratio
lowstat     percentage of people of low status


IV. Econometric model
To demonstrate the relationship between housing price and the other factors, the regression
function is constructed as follows:
Population regression function (PRF):

$$lprice_i = \beta_0 + \beta_1 crime_i + \beta_2 nox_i + \beta_3 rooms_i + \beta_4 dist_i + \beta_5 radial_i + \beta_6 proptax_i + \beta_7 stratio_i + \beta_8 lowstat_i + u_i$$

Sample regression function (SRF):

$$lprice_i = \hat{\beta}_0 + \hat{\beta}_1 crime_i + \hat{\beta}_2 nox_i + \hat{\beta}_3 rooms_i + \hat{\beta}_4 dist_i + \hat{\beta}_5 radial_i + \hat{\beta}_6 proptax_i + \hat{\beta}_7 stratio_i + \hat{\beta}_8 lowstat_i + \hat{u}_i$$

where:
$\beta_0$ is the intercept of the regression model
$\beta_j$ is the slope coefficient of the independent variable $x_j$ ($j = 1, \dots, 8$)
$u_i$ is the disturbance of the regression model
$\hat{\beta}_0$ is the estimator of $\beta_0$
$\hat{\beta}_j$ is the estimator of $\beta_j$
$\hat{u}_i$ is the residual (the estimator of $u_i$)

From this model, this report is interested in explaining lprice in terms of each of the eight
independent variables (crime, nox, rooms, dist, radial, proptax, stratio, lowstat).

V. Data collection
1. Data overview
Type of data: secondary data, collected from a given source.
Data source: Regression Diagnostics: Identifying Influential Data and Sources of
Collinearity, by D.A. Belsley, E. Kuh, and R. Welsch, 1990. New York: Wiley.
Structure of the economic data: cross-sectional data.
2. Data description
To obtain summary statistics for the variables, the following Stata command is used:
sum lprice crime nox rooms dist radial proptax stratio lowstat

The result is shown in Exhibit 2.
Exhibit 2: Statistic indicators of variables in the Housing Price model

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
      lprice |   506    9.941057   .4092549   8.517193    10.8198
       crime |   506    3.611536   8.590247       .006     88.976
         nox |   506    5.549783   1.158395       3.85       8.71
       rooms |   506    6.284051   .7025938       3.56       8.78
        dist |   506    3.795751   2.106137       1.13      12.13
      radial |   506    9.549407   8.707259          1         24
     proptax |   506    40.82372   16.85371       18.7       71.1
     stratio |   506    18.45929    2.16582       12.6         22
     lowstat |   506    12.70148   7.238066       1.73      39.07



where:
Obs is the number of observations
Std. Dev is the standard deviation of the variable
Min is the minimum value of the variable
Max is the maximum value of the variable

VI. Estimation of econometric model
1. Checking the correlation among variables
First, the correlation between lprice and each of crime, nox, rooms, dist, radial, proptax,
stratio, and lowstat is checked by calculating the pairwise correlation coefficients. The
correlation coefficient r measures the strength and direction of the linear relationship
between two variables. In Stata, the correlation matrix is generated with the command:
corr lprice crime nox rooms dist radial proptax stratio lowstat

The result is shown in Exhibit 3.

Exhibit 3: Correlation matrix

             |  lprice   crime     nox   rooms    dist  radial proptax stratio lowstat
-------------+------------------------------------------------------------------------
      lprice |  1.0000
       crime | -0.5275  1.0000
         nox | -0.5088  0.4212  1.0000
       rooms |  0.6329 -0.2188 -0.3028  1.0000
        dist |  0.3420 -0.3799 -0.7702  0.2054  1.0000
      radial | -0.4810  0.6254  0.6103 -0.2098 -0.4951  1.0000
     proptax | -0.5597  0.5828  0.6670 -0.2921 -0.5344  0.9102  1.0000
     stratio | -0.4976  0.2887  0.1869 -0.3540 -0.2293  0.4642  0.4542  1.0000
     lowstat | -0.7914  0.4470  0.5856 -0.6096 -0.4956  0.4760  0.5276  0.3654  1.0000

From the matrix, lprice is correlated with every independent variable strongly enough to
justify including each of them in the regression model. Specifically:
- lprice and crime have a moderate downhill relationship
- lprice and nox have a moderate downhill relationship
- lprice and rooms have a moderate uphill relationship
- lprice and dist have a weak uphill relationship
- lprice and radial have a moderate downhill relationship
- lprice and proptax have a moderate downhill relationship
- lprice and stratio have a moderate downhill relationship
- lprice and lowstat have a strong downhill relationship
The correlation between each pair of variables can be visualized using the scatter
command in Stata.
The result is shown in Exhibit 4.

Exhibit 4: Scatterplot of variables in the Housing Price model


2. Regression run
Having checked the required condition of correlation among variables, the regression model
is ready to run. In Stata, this is done by using the command:
reg lprice crime nox rooms dist radial proptax stratio lowstat

The result is shown in Exhibit 5.
Exhibit 5: Regression model

      Source |       SS           df       MS      Number of obs   =       506
-------------+----------------------------------   F(8, 497)       =    204.33
       Model |  64.8618936         8   8.1077367   Prob > F        =    0.0000
    Residual |  19.7203314       497  .039678735   R-squared       =    0.7669
-------------+----------------------------------   Adj R-squared   =    0.7631
       Total |   84.582225       505  .167489554   Root MSE        =     .1992

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       crime |  -.0111825   .0013614    -8.21   0.000    -.0138573   -.0085078
         nox |  -.0754564   .0146936    -5.14   0.000    -.1043256   -.0465873
       rooms |   .0996545   .0167697     5.94   0.000     .0667061    .1326028
        dist |  -.0463708   .0067557    -6.86   0.000    -.0596441   -.0330975
      radial |   .0133694   .0026525     5.04   0.000      .008158    .0185808
     proptax |  -.0062133   .0013807    -4.50   0.000     -.008926   -.0035006
     stratio |  -.0413327   .0050633    -8.16   0.000    -.0512807   -.0313846
     lowstat |  -.0280384   .0019154   -14.64   0.000    -.0318016   -.0242752
       _cons |   11.19507   .2037294    54.95   0.000     10.79479    11.59535
------------------------------------------------------------------------------


From the result, it can be inferred that:
crime, nox, rooms, dist, radial, proptax, stratio and lowstat all have statistically significant
effects on lprice at the 5% significance level (all p-values are smaller than 0.05). Because
the dependent variable is the natural logarithm of price, each slope coefficient multiplied by
100 gives the approximate percentage change in housing price for a one-unit increase in that
variable, holding the other variables fixed:
- $\hat{\beta}_0 = 11.1951$: When all the independent variables are zero, the expected value of
lprice is 11.1951, which corresponds to a housing price of about $e^{11.1951}$.
- $\hat{\beta}_1 = -0.0112$: When the number of crimes committed per capita increases by one, the
expected value of housing price decreases by about 1.12%.
- $\hat{\beta}_2 = -0.0755$: When nitrous oxide increases by one unit (one part per 100 million),
the expected value of housing price decreases by about 7.55%.
- $\hat{\beta}_3 = 0.0997$: When the number of rooms increases by one, the expected value of
housing price increases by about 9.97%.
- $\hat{\beta}_4 = -0.0464$: When the distance to the 5 employment centers increases by one unit,
the expected value of housing price decreases by about 4.64%.
- $\hat{\beta}_5 = 0.0134$: When the accessibility index to radial highways increases by one unit,
the expected value of housing price increases by about 1.34%.
- $\hat{\beta}_6 = -0.0062$: When the property tax per $1000 increases by $1, the expected value
of housing price decreases by about 0.62%.
- $\hat{\beta}_7 = -0.0413$: When the student-teacher ratio increases by one, the expected value
of housing price decreases by about 4.13%.
- $\hat{\beta}_8 = -0.0280$: When the percentage of people of low status increases by one
percentage point, the expected value of housing price decreases by about 2.80%.
The coefficient of determination R-squared = 0.7669: the independent variables (crime,
nox, rooms, dist, radial, proptax, stratio, lowstat) jointly explain 76.69% of the variation
in the dependent variable (lprice); the remaining 23.31% of the variation in lprice is left
unexplained by the model.
Other indicators:
- Adjusted coefficient of determination: adj R-squared = 0.7631
- Total Sum of Squares: TSS = 84.5822
- Explained Sum of Squares: ESS = 64.8619
- Residual Sum of Squares: RSS = 19.7203
- Degrees of freedom of the model: Dfm = 8
- Degrees of freedom of the residual: Dfr = 497

Based on the estimates in Exhibit 5, the sample regression function (coefficients rounded to
two decimals) is:

$$lprice = 11.20 - 0.01\,crime - 0.08\,nox + 0.10\,rooms - 0.05\,dist + 0.01\,radial - 0.01\,proptax - 0.04\,stratio - 0.03\,lowstat + \hat{u}$$
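
Multiplying a coefficient by 100 is only an approximation to the percentage effect in a log-linear model. The sketch below, written in Python rather than Stata so that it can be run on its own, compares that approximation with the exact figure 100·(exp(b) − 1). It assumes lprice is the natural logarithm of price; the helper names approx_pct and exact_pct are ours and do not come from the report.

```python
import math

# Coefficients copied from Exhibit 5 (dependent variable: lprice).
# Assumption: lprice is the natural logarithm of the median housing price.
coefs = {
    "crime": -0.0111825,
    "nox": -0.0754564,
    "rooms": 0.0996545,
    "dist": -0.0463708,
    "radial": 0.0133694,
    "proptax": -0.0062133,
    "stratio": -0.0413327,
    "lowstat": -0.0280384,
}


def approx_pct(b):
    """Approximate % change in price for a one-unit increase: 100 * b."""
    return 100.0 * b


def exact_pct(b):
    """Exact % change in price for a one-unit increase: 100 * (exp(b) - 1)."""
    return 100.0 * (math.exp(b) - 1.0)


for name, b in coefs.items():
    print(f"{name:8s} approx {approx_pct(b):7.2f}%   exact {exact_pct(b):7.2f}%")
```

For small coefficients such as crime the two figures nearly coincide; for rooms the exact effect is roughly 10.5% rather than 9.97%, so the approximation is least reliable for the largest coefficients.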

VII. Check multicollinearity and heteroscedasticity
1. Multicollinearity
Multicollinearity is a high degree of correlation among the explanatory variables. It can
make it difficult to separate the effects of the individual regressors, inflate the standard
errors, and depress the t-values. The problem can be detected by examining the correlation
matrix of the regressors and by carrying out auxiliary regressions among them. In Stata, the
vif command, which stands for variance inflation factor, is used after the regression.
Exhibit 6 shows the result.
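Exhibit 6: Multicollinearity test

    Variable |       VIF       1/VIF
-------------+----------------------
     proptax |      6.89    0.145103
      radial |      6.79    0.147301
         nox |      3.69    0.271206
        dist |      2.58    0.388106
     lowstat |      2.45    0.408804
       rooms |      1.77    0.565985
       crime |      1.74    0.574531
     stratio |      1.53    0.653369
-------------+----------------------
    Mean VIF |      3.43

Every VIF is below 10, a common rule-of-thumb threshold, indicating that multicollinearity is
not too worrisome a problem for this set of data. The largest values belong to proptax and
radial, which is consistent with their correlation of 0.9102 in Exhibit 3.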
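
Each VIF is tied to an auxiliary regression of one regressor on all the others through VIF_j = 1 / (1 − R_j²). The Python sketch below inverts that identity to show how much of each regressor's variation the other regressors explain. The VIF values are copied from Exhibit 6; the function name aux_r2 is ours, and the auxiliary R² values are implied by the reported VIFs rather than taken from separate Stata output.

```python
# VIF values copied from Exhibit 6.
vif = {
    "proptax": 6.89,
    "radial": 6.79,
    "nox": 3.69,
    "dist": 2.58,
    "lowstat": 2.45,
    "rooms": 1.77,
    "crime": 1.74,
    "stratio": 1.53,
}


def aux_r2(v):
    """R-squared of the auxiliary regression implied by VIF = 1 / (1 - R^2)."""
    return 1.0 - 1.0 / v


mean_vif = sum(vif.values()) / len(vif)

for name, v in vif.items():
    print(f"{name:8s} VIF {v:5.2f}   implied auxiliary R2 {aux_r2(v):.3f}")
print(f"Mean VIF {mean_vif:.2f}")
```

Under this reading, a VIF of 10 corresponds to an auxiliary R² of 0.90, which is why 10 is often used as the threshold.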
2. Heteroskedasticity
Heteroskedasticity means that the variance of the error term is not constant. The least
squares estimators are then no longer efficient, and the usual t tests and F tests may be
misleading. The problem can be detected informally by plotting the residuals against the
fitted values or against each of the regressors, and formally by a test, most popularly
White's test. One remedy is to respecify the model, for example by looking for missing
variables. In Stata, the imtest, white command is used after the regression; imtest stands
for information matrix test.
Exhibit 7 shows the result.
Exhibit 7: Heteroskedasticity test

. imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(44)     =    235.31
         Prob > chi2  =    0.0000

Cameron & Trivedi's decomposition of IM-test

              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |     235.31     44    0.0000
            Skewness |      34.20      8    0.0000
            Kurtosis |      12.38      1    0.0004
---------------------+-----------------------------
               Total |     281.89     53    0.0000


At the 5% significance level, there is enough evidence to reject the null hypothesis of
homoskedasticity (p-value = 0.0000) and conclude that heteroskedasticity is present in this model.
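
The 44 degrees of freedom reported for White's test in Exhibit 7 can be cross-checked by counting the terms in the auxiliary regression. The Python sketch below assumes the standard form of White's test (squared residuals regressed on the k regressors, their squares, and all pairwise cross-products) and that none of these terms is redundant. The function name white_test_df is ours and is not a Stata command.

```python
def white_test_df(k):
    """Number of auxiliary-regression terms in White's test for k regressors.

    Counts the k regressors in levels, their k squares, and the
    k * (k - 1) / 2 pairwise cross-products (intercept excluded).
    Assumes no term is redundant, which need not hold with dummy variables.
    """
    levels = k
    squares = k
    cross_products = k * (k - 1) // 2
    return levels + squares + cross_products


k = 8  # crime, nox, rooms, dist, radial, proptax, stratio, lowstat
print(white_test_df(k))
```

With eight regressors this count agrees with the chi2(44) line in Exhibit 7.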
Another way to check for heteroskedasticity is to draw the residual-versus-fitted plot,
which can be generated with the rvfplot, yline(0) command in Stata.
The result is shown in Exhibit 8.


Exhibit 8: Residual-versus-fitted plot of the Housing Price model

In a well-fitted model, there should be no pattern to the residuals plotted against the fitted
values, which is not true of our model. Ignoring the outliers at the top center of the graph,
we see curvature in the pattern of the residuals, suggesting a violation of the assumption that
lprice is linear in our independent variables. Such a plot can also reveal increasing or
decreasing variation in the residuals, that is, heteroskedasticity.
To address the problem, robust standard errors are used; they relax the assumption that the
errors are independent and identically distributed. In Stata, the regression is rerun with the
robust option, using the command:
reg lprice crime nox rooms dist radial proptax stratio lowstat, robust

Exhibit 9 shows the result.
Exhibit 9: Correcting heteroskedasticity

Linear regression                               Number of obs     =        506
                                                F(8, 497)         =     179.01
                                                Prob > F          =     0.0000
                                                R-squared         =     0.7669
                                                Root MSE          =      .1992

------------------------------------------------------------------------------
             |               Robust
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       crime |  -.0111825   .0019035    -5.87   0.000    -.0149225   -.0074426
         nox |  -.0754564   .0150626    -5.01   0.000    -.1050506   -.0458622
       rooms |   .0996545    .025796     3.86   0.000     .0489718    .1503372
        dist |  -.0463708   .0068001    -6.82   0.000    -.0597312   -.0330103
      radial |   .0133694   .0029003     4.61   0.000     .0076711    .0190677
     proptax |  -.0062133   .0013641    -4.55   0.000    -.0088935   -.0035331
     stratio |  -.0413327   .0042322    -9.77   0.000    -.0496478   -.0330175
     lowstat |  -.0280384    .003584    -7.82   0.000    -.0350801   -.0209967
       _cons |   11.19507   .2672806    41.89   0.000     10.66993    11.72021
------------------------------------------------------------------------------


Compared with the earlier regression, none of the coefficient estimates has changed, but the
standard errors, and hence the t values and confidence intervals, are different. The resulting
p-values remain valid under heteroskedasticity, and all eight variables are still significant
at the 5% level.
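
Since the coefficients are unchanged, each t value in Exhibit 9 is simply the Exhibit 5 coefficient divided by its robust standard error. The Python sketch below recomputes both sets of t values from the reported figures for three of the variables; the dictionary and function names are ours, and the remaining rows of the two exhibits can be added in the same way.

```python
# (coefficient, OLS std. err. from Exhibit 5, robust std. err. from Exhibit 9)
estimates = {
    "crime": (-0.0111825, 0.0013614, 0.0019035),
    "rooms": (0.0996545, 0.0167697, 0.025796),
    "lowstat": (-0.0280384, 0.0019154, 0.003584),
}


def t_value(coef, std_err):
    """t statistic for H0: coefficient = 0."""
    return coef / std_err


t_ols = {name: t_value(b, se) for name, (b, se, _) in estimates.items()}
t_robust = {name: t_value(b, rse) for name, (b, _, rse) in estimates.items()}

for name in estimates:
    print(f"{name:8s} t (OLS) {t_ols[name]:7.2f}   t (robust) {t_robust[name]:7.2f}")
```

The largest change is for lowstat, whose robust standard error is almost twice the conventional one, yet its t value remains far beyond the usual critical values.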

VIII. Hypothesis postulated

1. The impact of neighborhood factors
The question of interest: In the multiple regression model (full model):

$$lprice = \beta_0 + \beta_1 crime + \beta_2 nox + \beta_3 rooms + \beta_4 dist + \beta_5 radial + \beta_6 proptax + \beta_7 stratio + \beta_8 lowstat + u$$

does the subset of independent variables (crime, proptax, stratio, lowstat) contribute to
explaining or predicting lprice? Or would the model do just as well if these variables were
dropped and it were reduced to (reduced model):

$$lprice = \beta_0 + \beta_2 nox + \beta_3 rooms + \beta_4 dist + \beta_5 radial + u$$

From this question, the following hypotheses are postulated:
Null Hypothesis: The subset does not contribute to the model's explanatory power
Alternative Hypothesis: At least one of the independent variables in the subset is useful
in explaining/predicting lprice
which is expressed as:

$$H_0: \beta_1 = \beta_6 = \beta_7 = \beta_8 = 0$$
$$H_1: \text{at least one } \beta_j \neq 0, \quad j \in \{1, 6, 7, 8\}$$

In Stata, the test statistic F is calculated using the command:
test crime proptax lowstat stratio

The result is shown in Exhibit 10.
Exhibit 10: Hypothesis testing of multiple regression model of neighborhood factors

 ( 1)  crime = 0
 ( 2)  proptax = 0
 ( 3)  lowstat = 0
 ( 4)  stratio = 0

       F(  4,   497) =  112.88
            Prob > F =    0.0000


Since the p-value (0.0000) is below 0.05, there is enough evidence to reject the null
hypothesis and conclude that at least one independent variable in the subset (crime, proptax,
stratio, lowstat) has explanatory or predictive power for lprice, so we do not reduce the
model by dropping this subset.
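
For a regression with conventional standard errors, the F statistic for dropping q variables can be written in terms of residual sums of squares: F = ((RSS_r − RSS_ur) / q) / (RSS_ur / df), where RSS_r belongs to the reduced model and RSS_ur to the full model. The report does not give RSS for the reduced models, so the Python sketch below applies the formula to the one case that can be verified from Exhibit 5: dropping all eight regressors, where the reduced model contains only the intercept and its RSS equals the total sum of squares. The function name f_statistic is ours.

```python
def f_statistic(rss_restricted, rss_unrestricted, q, df_resid):
    """F statistic for q exclusion restrictions.

    rss_restricted:   residual sum of squares of the reduced model
    rss_unrestricted: residual sum of squares of the full model
    q:                number of restrictions (variables dropped)
    df_resid:         residual degrees of freedom of the full model
    """
    numerator = (rss_restricted - rss_unrestricted) / q
    denominator = rss_unrestricted / df_resid
    return numerator / denominator


# Figures from Exhibit 5. With all eight regressors dropped, the reduced
# model is intercept-only, so its RSS equals the total sum of squares.
TSS = 84.582225
RSS_FULL = 19.7203314

f_overall = f_statistic(TSS, RSS_FULL, q=8, df_resid=497)
print(f"F(8, 497) = {f_overall:.2f}")
```

The result reproduces the overall F statistic in Exhibit 5. The same function would give the subset F statistic once the reduced model is estimated and its RSS is known; if the test is run after the robust regression, Stata reports a robust statistic that need not match this formula.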



2. The impact of accessibility factors
The question of interest: In the multiple regression model (full model):

$$lprice = \beta_0 + \beta_1 crime + \beta_2 nox + \beta_3 rooms + \beta_4 dist + \beta_5 radial + \beta_6 proptax + \beta_7 stratio + \beta_8 lowstat + u$$

does the subset of independent variables (dist, radial) contribute to explaining or predicting
lprice? Or would the model do just as well if these variables were dropped and it were
reduced to (reduced model):

$$lprice = \beta_0 + \beta_1 crime + \beta_2 nox + \beta_3 rooms + \beta_6 proptax + \beta_7 stratio + \beta_8 lowstat + u$$

From this question, the following hypotheses are postulated:
Null Hypothesis: The subset does not contribute to the model's explanatory power
Alternative Hypothesis: At least one of the independent variables in the subset is
useful in explaining/predicting lprice
which is expressed as:

$$H_0: \beta_4 = \beta_5 = 0$$
$$H_1: \text{at least one } \beta_j \neq 0, \quad j \in \{4, 5\}$$

In Stata, the test statistic F is calculated using the command:
test dist radial

The result is shown in Exhibit 11.
Exhibit 11: Hypothesis testing of multiple regression model of accessibility factors

 ( 1)  dist = 0
 ( 2)  radial = 0

       F(  2,   497) =   34.91
            Prob > F =    0.0000


Since the p-value (0.0000) is below 0.05, there is enough evidence to reject the null
hypothesis and conclude that at least one independent variable in the subset (dist, radial)
has explanatory or predictive power for lprice, so we do not reduce the model by dropping
this subset.


IX. Result analysis & Policy implication
From the data analysis in the preceding sections, we have obtained statistical evidence on the
relationship between housing prices and each of the proposed factors. As stated at the
beginning of this report, we aim to learn how structure, neighborhood, accessibility, and air
pollution features are associated with housing price. In other words, we are concerned with
buyers' willingness to pay for these features.
Following the data analysis, regression, and hypothesis testing, it can be concluded that
structure, neighborhood, accessibility, and air pollution factors all have statistically
significant effects on housing prices. Therefore, tenants, investors, and builders should
take all of these factors into account when making deals.


X. Conclusion
This report was completed thanks to the dedicated contribution of each member and the
knowledge gained from our study of Econometrics. It also gave us a good opportunity to practice
what we have learned and to deepen our understanding of data analysis and the relevant tests.
We hope that our work sheds some light on the relationship between housing prices and
structure, neighborhood, accessibility, and air pollution factors.
Again, because of our limited understanding and resources, the report may contain
misinterpretations. We hope that Dr. Dinh Thanh Binh and readers can give us constructive
comments on the report so that we can improve and do better in the future.
Sincerely,
Group 7


XI. References
D.A. Belsley, E. Kuh, and R. Welsch, Regression Diagnostics: Identifying Influential Data
and Sources of Collinearity, New York: Wiley (1990).