Financial Econometrics Essay: Factors That Affect Housing Prices Across Locations and Regions


Group 10
Econometrics Report

I. INTRODUCTION
   1. Overall about econometrics
   2. Why choosing OLS?
II. QUESTION OF INTEREST
III. ECONOMIC MODEL
   1. Choosing the variables
   2. Embedding that target in a general unrestricted model (GUM)
IV. ECONOMETRICS MODEL
   1. Population regression function (PRF)
   2. Sample regression function (SRF)
V. DATA COLLECTION
   1. Data overview
   2. Data description
VI. ESTIMATION OF ECONOMETRIC MODEL
   1. Checking the correlation among variables
   2. Regression run
VII. CHECK MULTICOLLINEARITY AND HETEROSCEDASTICITY
   1. Multicollinearity
   2. Heteroskedasticity
VIII. HYPOTHESES POSTULATED
   1. The t test
   2. Confidence Intervals
   3. P Value
   4. Testing the overall significance: The F test
IX. RESULT ANALYSIS AND POLICY IMPLICATION
X. CONCLUSION

I. INTRODUCTION
1. Overall about econometrics
Econometrics is the application of statistical methods to
economic data and is described as the branch of economics that
aims to give empirical content to economic relations. More
precisely, it is the quantitative analysis of actual economic
problems, based on the concurrent development of theory and
observation, related by appropriate methods of inference. It is
therefore unsurprising that economists compare econometrics to a
tool for sifting through mountains of data to extract simple
relationships.
Econometrics is effective because it combines economic theory
with statistical theory and mathematical statistics to develop
and evaluate its methods. In practice, econometrics helps
economists test economic theories, build econometric models, and
analyze and forecast economic phenomena.
Aware of the importance of econometrics to economic phenomena,
our group carried out an econometric study titled “The factors
that have influence on median housing price”. We aim to analyze
the data and to identify differences in price levels and the
reasons behind them.
The data set has 506 observations on 12 variables in total.
We use 6 variables: price, crime, nox, rooms, dist and proptax.
Price is the dependent variable and the other five are
independent variables. The estimation method used in this
research is OLS (ordinary least squares), and all estimation is
carried out in Stata.

While carrying out this research, our group was fortunate to
be guided thoroughly by Dr. Dinh Thi Thanh Binh. We are grateful
for everything you have taught us!
This is the first time our group has carried out an
econometric study, so mistakes are unavoidable. We would be glad
to receive your feedback so that we can do better next time.

2. Why choosing OLS?
Ordinary least squares (OLS) is a type of linear least
squares method for estimating the unknown parameters in a linear
regression model. OLS chooses the parameters of a linear
function of a set of explanatory variables by the principle
of least squares: minimizing the sum of the squares of the
differences between the observed dependent variable in the
given dataset and the values predicted by the linear function.
With the six selected variables, we use OLS because we assume
that all regressors are exogenous and that the effects of the
independent variables on the dependent variable are linear. In
addition, when the classical assumptions hold, the OLS estimators
are linear and unbiased, and they have the smallest variance among
all linear unbiased estimators (they are BLUE).
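The least squares principle described above can be illustrated with a minimal sketch for a single regressor. The toy numbers and the helper names `ols_simple` and `ssr` are assumptions made for illustration only; they are not taken from the housing data set or from the report's Stata session.

```python
def ols_simple(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form least squares solution for one regressor plus an intercept.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def ssr(x, y, intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data (NOT from the housing data set).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_simple(x, y)
best = ssr(x, y, b0, b1)

# Moving either parameter away from the OLS solution increases the SSR.
shifted_intercept = ssr(x, y, b0 + 0.1, b1)
shifted_slope = ssr(x, y, b0, b1 + 0.05)

print(f"intercept={b0:.4f}, slope={b1:.4f}, SSR={best:.4f}")
print(f"SSR with shifted intercept={shifted_intercept:.4f}")
print(f"SSR with shifted slope={shifted_slope:.4f}")
```

With several regressors, as in this report, the same criterion is minimized over all coefficients at once; Stata's regression command does this internally.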
When using OLS, we rely on the following basic assumptions:
1. The regression model is linear in the parameters.
2. X values are fixed in repeated sampling, which means \( X_i \)
   and \( u_i \) are uncorrelated.
3. Zero mean value of the disturbance: \( E(u_i) = 0 \).
4. Homoscedasticity, or equal variance of \( u_i \):
   \( \mathrm{var}(u_i) = \sigma^2 \).
5. No correlation between disturbances.
6. The model is correctly specified.
7. The number of observations must be greater than the number
   of parameters to be estimated.
8. X values in a given sample must not all be the same.
9. No perfect multicollinearity.
10. The disturbances \( u_i \) are normally distributed.

II. QUESTION OF INTEREST
We have always wondered: “Why do housing prices differ so much
across locations and regions?” Housing prices are affected by
many factors, such as structure, neighborhood, accessibility and
air pollution. To answer that question, our group uses the
collected data to build and run a regression model, and then
analyzes the results.

III. ECONOMIC MODEL
Given the provided data, the economic model used in this
report is an empirical one. The underlying model is
mathematical; in an empirical model, data are gathered for the
variables and accepted statistical techniques are used to
estimate the model's parameters.
1. Choosing the variables
Describing the data with the command “des” in file… in Stata
gives the following result:
. des

obs:           506
vars:           12                          31 Oct 1996 16:37
size:       22,770

variable name   storage type   display format   value label   variable label
price           float          %9.0g                          median housing price, $
crime           float          %9.0g                          crimes committed per capita
nox             float          %9.0g                          nit ox concen; parts per 100m
rooms           float          %9.0g                          avg number of rooms
dist            float          %9.0g                          wght dist to 5 employ centers
radial          byte           %9.0g                          access. index to rad. hghwys
proptax         float          %9.0g                          property tax per $1000
stratio         float          %9.0g                          average student-teacher ratio
lowstat         float          %9.0g                          perc of people 'lower status'
lprice          float          %9.0g                          log(price)
lnox            float          %9.0g                          log(nox)
lproptax        float          %9.0g                          log(proptax)

Figure 1

The table above lists the variables available for explaining
housing prices, each with 506 observations. After careful
discussion, our group chose price as the dependent variable Y
and the following independent variables:
- X1: crime
- X2: nox
- X3: rooms
- X4: dist
- X5: proptax

2. Embedding that target in a general unrestricted model (GUM)
In its simplest acceptable representation (which will later be
specified in the econometric model), the GUM is determined to
be:

\[ price = f(crime,\ nox,\ rooms,\ dist,\ proptax) \]

A brief description of each variable is given in Figure 2.

                            Name      Meaning                                           Expected sign
Dependent variable (Y)      Price     Median housing price                              +
Independent variables (X)   Crime     Number of crimes committed per capita
                            Nox       Nitrogen oxide concentration in the air,
                                      in parts per 100m
                            Rooms     The average number of rooms                       +
                            Dist      Weighted distance to 5 employment centers
                            Proptax   Property tax per $1000

Figure 2
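IV. ECONOMETRICS MODEL

1. Population regression function (PRF)
PRF:
\[ price_i = \beta_0 + \beta_1 crime_i + \beta_2 nox_i + \beta_3 rooms_i + \beta_4 dist_i + \beta_5 proptax_i + u_i \]
2. Sample regression function (SRF)
SRF:
\[ price_i = \hat{\beta}_0 + \hat{\beta}_1 crime_i + \hat{\beta}_2 nox_i + \hat{\beta}_3 rooms_i + \hat{\beta}_4 dist_i + \hat{\beta}_5 proptax_i + \hat{u}_i \]

where:
\( \beta_0 \) is the intercept of the regression model
\( \beta_i \) is the slope coefficient of the independent variable \( x_i \)
\( u_i \) is the disturbance of the regression model
\( \hat{\beta}_0 \) is the estimator of \( \beta_0 \)
\( \hat{\beta}_i \) is the estimator of \( \beta_i \)
\( \hat{u}_i \) is the residual (the estimator of \( u_i \))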

V. DATA COLLECTION
1. Data overview
- The data set was obtained from a given source rather than
  collected by our group, so it is secondary data.
- Structure of the economic data: cross-sectional data.
2. Data description
To obtain summary statistics for the variables, the following
Stata command is used:
. sum

Variable    Obs       Mean    Std. Dev.      Min       Max
price       506   22511.51     9208.856     5000     50001
crime       506   3.611536     8.590247    0.006    88.976
nox         506   5.549783     1.158395     3.85      8.71
rooms       506   6.284051    0.7025938     3.56      8.78
dist        506   3.795751     2.106137     1.13     12.13
proptax     506   40.82372     16.85371     18.7      71.1

Figure 3
where:
Obs is the number of observations
Mean is the sample mean of the variable
Std. Dev. is the standard deviation of the variable
Min is the minimum value of the variable
Max is the maximum value of the variable

VI. ESTIMATION OF ECONOMETRIC MODEL
1. Checking the correlation among variables:

            price     crime       nox     rooms      dist   proptax
price           1
crime     -0.3879         1
nox        -0.426    0.4212         1
rooms      0.6958   -0.2188   -0.3028         1
dist       0.2493   -0.3799   -0.7702    0.2054         1
proptax   -0.4671    0.5828     0.667   -0.2921   -0.5344         1

Figure 4

First, the correlation between price and each of crime, nox,
rooms, dist and proptax is checked by computing the correlation
coefficients among these variables. The correlation coefficient
measures the strength and direction of the linear relationship
between two variables. In Stata, the correlation matrix is
generated by the command:
corr price crime nox rooms dist proptax
The matrix shows that the correlation between price and each
independent variable is large enough to justify running the
regression model. Specifically:

Correlation coefficient between price and crime is -0.3879
=> price and crime have a moderate negative relationship.

Correlation coefficient between price and nox is -0.426 =>
price and nox have a moderate negative relationship.

Correlation coefficient between price and rooms is 0.6958 =>
price and rooms have a fairly strong positive relationship.

Correlation coefficient between price and dist is 0.2493 =>
price and dist have a weak positive relationship.

Correlation coefficient between price and proptax is -0.4671
=> price and proptax have a moderate negative relationship.


Rooms and dist have correlation coefficients with price that
are larger than 0, which means they move in the same direction
as the dependent variable; crime, nox and proptax move in the
opposite direction. The largest coefficient in absolute value is
0.6958 (between rooms and price), so rooms has the strongest
linear association with price: when rooms increases, price tends
to increase markedly. On the other hand, the correlation
coefficient between price and dist is only 0.2493, which implies
a weak connection: when dist increases, price tends to increase,
but not by much.
In addition, all pairwise correlations among the independent
variables are below 0.8 in absolute value (the largest is 0.7702
in absolute value, between nox and dist), so at this stage there
is no sign of a serious multicollinearity problem.
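As a minimal sketch of how a correlation coefficient of this kind is computed, the Pearson formula can be written out directly. The function name `pearson_r` and the toy series are hypothetical and are not taken from the housing data; in the report itself the matrix comes from Stata's corr command.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical toy series (NOT the report's data).
rooms_toy = [4.0, 5.0, 6.0, 7.0, 8.0]
price_up = [10.0, 14.0, 19.0, 23.0, 30.0]   # rises with rooms_toy
price_down = list(reversed(price_up))        # falls with rooms_toy

r_up = pearson_r(rooms_toy, price_up)
r_down = pearson_r(rooms_toy, price_down)

print(f"r (rising series)  = {r_up:.4f}")
print(f"r (falling series) = {r_down:.4f}")
```

The sign gives the direction of the linear relationship and the absolute value gives its strength, which is how the coefficients in Figure 4 are read.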

2. Regression run
Having checked the correlations among the variables, we can
run the regression model. In Stata, this is done with the
command:
reg price crime nox rooms dist proptax

Source          SS        df           MS         Number of obs  =     506
Model      2.52E+10        5     5.04E+09         F(5, 500)      =  142.92
Residual   1.76E+10      500   35258403.7         Prob > F       =  0.0000
Total      4.28E+10      505     84803032         R-squared      =  0.5883
                                                  Adj R-squared  =  0.5842
                                                  Root MSE       =  5937.9

price         Coef.    Std. Err.       t    P>|t|    [95% Conf. Interval]
crime     -150.0703    38.11571    -3.94    0.000    -224.957   -75.18364
nox        -1737.66    410.7763    -4.23    0.000    -2544.72   -930.5992
rooms      7707.327    399.0772    19.31    0.000    6923.252    8491.402
dist      -791.2588    197.9444    -4.00    0.000   -1180.164   -402.3535
proptax   -89.95717    23.61555    -3.81    0.000   -136.3551   -43.55923
_cons     -9060.303    3978.871    -2.28    0.023   -16877.67   -1242.937

Figure 5
From the table above, the sample regression function is:
\[ \widehat{price} = -9060.303 - 150.0703\,crime - 1737.66\,nox + 7707.327\,rooms - 791.2588\,dist - 89.95717\,proptax \]
The results show that crime, nox, rooms, dist and proptax all
have statistically significant effects on price at the 5%
significance level (all p-values are smaller than 0.05). These
effects are given by the regression coefficients as follows:

\( \hat{\beta}_0 = -9060.303 \) is the estimated intercept.
\( \hat{\beta}_1 = -150.0703 \) means that if crimes committed per
capita increase by one unit, the average housing price decreases
by 150.0703, holding other factors constant.
\( \hat{\beta}_2 = -1737.66 \) means that if the nitrogen oxide
concentration (parts per 100m) increases by one unit, the average
housing price decreases by 1737.66, holding other factors
constant.
\( \hat{\beta}_3 = 7707.327 \) means that if the average number of
rooms increases by one, the average housing price increases by
7707.327, holding other factors constant.
\( \hat{\beta}_4 = -791.2588 \) means that if the weighted distance
to 5 employment centers increases by one unit, the average
housing price decreases by 791.2588, holding other factors
constant.
\( \hat{\beta}_5 = -89.95717 \) means that if the property tax per
$1000 increases by one unit, the average housing price decreases
by 89.95717, holding other factors constant.
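A quick arithmetic check of the fitted equation can be sketched as follows. It assumes the coefficients reported in Figure 5, taken with the signs implied by the interpretation above (negative for crime, nox, dist, proptax and the constant), and the sample means reported in Figure 3. The helper name `predict_price` is hypothetical. Because an OLS regression with an intercept passes through the point of sample means, the prediction at the means should be close to the mean price of 22511.51.

```python
# Coefficients as reported in Figure 5 (signs follow the text's interpretation).
coef = {
    "_cons": -9060.303,
    "crime": -150.0703,
    "nox": -1737.66,
    "rooms": 7707.327,
    "dist": -791.2588,
    "proptax": -89.95717,
}

# Sample means as reported in Figure 3.
means = {
    "crime": 3.611536,
    "nox": 5.549783,
    "rooms": 6.284051,
    "dist": 3.795751,
    "proptax": 40.82372,
}


def predict_price(values):
    """Evaluate the sample regression function at the given regressor values."""
    return coef["_cons"] + sum(coef[name] * v for name, v in values.items())


at_means = predict_price(means)

# Ceteris paribus effect of one extra room: only `rooms` changes.
one_more_room = dict(means, rooms=means["rooms"] + 1)
effect_of_room = predict_price(one_more_room) - at_means

print(f"predicted price at sample means: {at_means:.2f}")
print(f"change from one extra room:      {effect_of_room:.3f}")
```

The second number simply reproduces the rooms coefficient, which is what "holding other factors constant" means in a linear model.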


The coefficient of determination R-squared = 0.5883: the
independent variables (crime, nox, rooms, dist, proptax)
jointly explain 58.83% of the variation in the dependent
variable (price); factors not included in the model account
for the remaining 41.17% of the variation in price.
Other indicators:
Adjusted coefficient of determination: adj R-squared = 0.5842
Total sum of squares: TSS = 4.28E+10
Explained sum of squares: ESS = 2.52E+10
Residual sum of squares: RSS = 1.76E+10
Degrees of freedom of the model: Dfm = 5
Degrees of freedom of the residual: Dfr = 500
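The indicators above are linked by simple identities, which the following sketch checks. It assumes the rounded sums of squares shown in the regression table of Figure 5 (of order E+10), so the recomputed R-squared matches the reported value only up to rounding; the variable names are illustrative.

```python
n = 506          # observations
k = 5            # slope coefficients (crime, nox, rooms, dist, proptax)

# Sums of squares as reported (rounded) in the regression table.
ess = 2.52e10    # explained (model) sum of squares
rss = 1.76e10    # residual sum of squares
tss = 4.28e10    # total sum of squares

df_model = k
df_resid = n - k - 1

# R-squared is the explained share of total variation.
r2_from_ss = ess / tss

# Adjusted R-squared penalizes additional regressors.
r2_reported = 0.5883
adj_r2 = 1 - (1 - r2_reported) * (n - 1) / df_resid

print(f"df model = {df_model}, df residual = {df_resid}")
print(f"R-squared from rounded sums of squares: {r2_from_ss:.4f}")
print(f"adjusted R-squared: {adj_r2:.4f}")
```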


VII. CHECK MULTICOLLINEARITY AND HETEROSCEDASTICITY
1. Multicollinearity

- Multicollinearity is a high degree of correlation among the
  explanatory variables. It can make it difficult to separate
  the effects of the individual regressors: standard errors may
  be inflated and t-values depressed.
- Detecting multicollinearity
o
Method 1: Use the corr command to examine multicollinearity.
If independent variables are strongly correlated (|r| > 0.8),
multicollinearity may occur.

            price     crime       nox     rooms      dist   proptax
price      1.0000
crime     -0.3879    1.0000
nox        -0.426    0.4212    1.0000
rooms      0.6958   -0.2188   -0.3028    1.0000
dist       0.2493   -0.3799   -0.7702    0.2054    1.0000
proptax   -0.4671    0.5828     0.667   -0.2921   -0.5344    1.0000

Figure 6

From the table above, the correlation coefficients among the
independent variables are all smaller than 0.8 in absolute
value. As a result, we conclude that multicollinearity is not a
serious problem in this model.
o
Method 2: Use the variance inflation factor (VIF).
If VIF > 10, multicollinearity occurs.

Variable      VIF      1/VIF
nox          3.24   0.308352
dist         2.49   0.401709
proptax      2.27   0.440742
crime        1.54   0.651256
rooms        1.13   0.888073
Mean VIF     2.13

Figure 7

The table shows that all VIF values are smaller than 10; thus,
multicollinearity does not occur in this model.
From the two methods above, we conclude that multicollinearity
is not a worrisome problem for this data set.
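The VIF table can be re-derived from its own 1/VIF column, as in the sketch below. It uses the standard identity VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing regressor j on the other regressors, so the 1/VIF column equals 1 - R_j^2. The dictionary names are illustrative, and the threshold of 10 is the rule of thumb stated in the text.

```python
# 1/VIF ("tolerance") values as reported in Figure 7.
tolerance = {
    "nox": 0.308352,
    "dist": 0.401709,
    "proptax": 0.440742,
    "crime": 0.651256,
    "rooms": 0.888073,
}

# VIF is the reciprocal of the tolerance.
vif = {name: 1.0 / t for name, t in tolerance.items()}

# Implied R-squared of each auxiliary regression (regressor j on the others).
aux_r2 = {name: 1.0 - t for name, t in tolerance.items()}

mean_vif = sum(vif.values()) / len(vif)

# Rule of thumb used in the text: VIF > 10 signals multicollinearity.
flagged = [name for name, value in vif.items() if value > 10]

for name in vif:
    print(f"{name:8s} VIF={vif[name]:.2f}  auxiliary R2={aux_r2[name]:.3f}")
print(f"mean VIF = {mean_vif:.2f}, flagged = {flagged}")
```

Read this way, the largest VIF (nox) corresponds to an auxiliary R-squared of about 0.69, which is consistent with the fairly high correlation between nox and dist in Figure 6.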
2. Heteroskedasticity
Another problem our model may suffer from is
heteroskedasticity. Under heteroskedasticity, the least squares
estimators remain unbiased but are no longer efficient, and the
usual estimators of their variances become biased, which makes
the standard t and F tests unreliable.
Homoskedasticity is the assumption that the variance of each
error term \( u_i \) is unchanged as i moves from 1 to n. It can
also be written as:

\( Var(u_i) = Var(u_j) \)   for all i, j = 1, 2, 3, …, n

When that assumption is violated, heteroskedasticity appears.
- Causes
o
The nature of the economic phenomenon: the subjects examined
differ in scale, or they are observed over periods with
different levels of fluctuation.
o
The functional form of the model is misspecified, for example
because relevant variables are missing or the functional form
is wrong.
o
The data cannot fully and correctly reflect the nature of the
economic phenomenon, for example when outliers appear. Including
or excluding these observations has a large impact on the
regression analysis.
o
Errors tend to decrease as the techniques for collecting,
storing and processing data improve.
o
Error learning: as people learn from past behavior, their
errors tend to become smaller over time.

Hypothesis:
\( H_0 \): constant variance (homoskedasticity)
\( H_1 \): non-constant variance (heteroskedasticity)
Using the command estat hettest in Stata:
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of price
         chi2(1)      =    26.56
         Prob > chi2  =   0.0000
Since Prob > chi2 = 0.0000 < 0.05, we reject H0 in favor of H1
and conclude that heteroskedasticity is present in this model.
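The reported Prob > chi2 can be checked by hand, as in the following sketch. It relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper tail probability can be written with the complementary error function from the standard library. The function name `chi2_sf_1df` is hypothetical, and the formula is valid for 1 degree of freedom only.

```python
import math


def chi2_sf_1df(x):
    """Upper tail probability P(X > x) for a chi-square with 1 degree of freedom.

    Uses X = Z**2 with Z standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


chi2_stat = 26.56          # test statistic reported by estat hettest
alpha = 0.05

p_value = chi2_sf_1df(chi2_stat)
reject_h0 = p_value < alpha

print(f"p-value = {p_value:.2e}, reject H0 at 5%: {reject_h0}")

# Sanity check: 3.841 is the familiar 5% critical value of chi2(1).
print(f"P(chi2(1) > 3.841) = {chi2_sf_1df(3.841):.4f}")
```

The p-value is far below 0.0001, which is why Stata displays it as 0.0000.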
Correcting heteroskedasticity
We use the command:
reg price crime nox rooms dist proptax, robust
and obtain the following result:

                                                  Number of obs  =     506
                                                  F(5, 500)      =  103.22
                                                  Prob > F       =  0.0000
                                                  R-squared      =  0.5883
                                                  Root MSE       =  5937.9

                          Robust
price         Coef.    Std. Err.       t    P>|t|    [95% Conf. Interval]
crime     -150.0703    30.45247    -4.93    0.000   -209.9009   -90.23976
nox        -1737.66    389.6642    -4.46    0.000   -2503.241   -972.0787
rooms      7707.327    670.6304    11.49    0.000    6389.726    9024.928
dist      -791.2588     175.744    -4.50    0.000   -1136.546   -445.9712
proptax   -89.95717    26.84788    -3.35    0.001   -142.7057   -37.20862
_cons     -9060.303    5398.964    -1.68    0.094   -19667.75    1547.148

Figure 8

Compared with the earlier regression, none of the coefficient
estimates changed, but the standard errors, and hence the t
values, are different. The robust standard errors remain valid
under heteroskedasticity, so the resulting p-values are more
reliable.
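VIII. HYPOTHESES POSTULATED
1. The t test
The test statistics below are computed from the robust
regression in Figure 8.
Hypothesis: \( H_0: \beta_1 = 0 \); \( H_1: \beta_1 \neq 0 \)
\( t_s = \hat{\beta}_1 / se(\hat{\beta}_1) = -4.93 \)
\( c_{0.025}(500) = 1.965 < |t_s| \) => Reject \( H_0 \)
Conclusion: The number of crimes committed per capita has a
statistically significant effect on median housing price. The
higher the number of crimes committed per capita, the lower the
median housing price.
Hypothesis: \( H_0: \beta_2 = 0 \); \( H_1: \beta_2 \neq 0 \)
\( t_s = \hat{\beta}_2 / se(\hat{\beta}_2) = -4.46 \)
\( c_{0.025}(500) = 1.965 < |t_s| \) => Reject \( H_0 \)
Conclusion: The nitrogen oxide concentration (parts per 100m)
has a statistically significant effect on median housing price.
The higher the nitrogen oxide concentration, the lower the
median housing price.
Hypothesis: \( H_0: \beta_3 = 0 \); \( H_1: \beta_3 \neq 0 \)
\( t_s = \hat{\beta}_3 / se(\hat{\beta}_3) = 11.49 \)
\( c_{0.025}(500) = 1.965 < |t_s| \) => Reject \( H_0 \)
Conclusion: The average number of rooms has a statistically
significant effect on median housing price. The higher the
average number of rooms, the higher the median housing price.
Hypothesis: \( H_0: \beta_4 = 0 \); \( H_1: \beta_4 \neq 0 \)
\( t_s = \hat{\beta}_4 / se(\hat{\beta}_4) = -4.5 \)
\( c_{0.025}(500) = 1.965 < |t_s| \) => Reject \( H_0 \)
Conclusion: The weighted distance to 5 employment centers has a
statistically significant effect on median housing price. The
higher the weighted distance to 5 employment centers, the lower
the median housing price.

Hypothesis: \( H_0: \beta_5 = 0 \); \( H_1: \beta_5 \neq 0 \)
\( t_s = \hat{\beta}_5 / se(\hat{\beta}_5) = -3.35 \)
\( c_{0.025}(500) = 1.965 < |t_s| \) => Reject \( H_0 \)
Conclusion: Property tax per $1000 has a statistically
significant effect on median housing price. The higher the
property tax per $1000, the lower the median housing price.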
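The t tests above can be reproduced with a few lines of arithmetic. The sketch assumes the coefficients and robust standard errors from Figure 8, with negative signs for crime, nox, dist and proptax as implied by the conclusions, and the critical value 1.965 quoted in the text; the dictionary names are illustrative.

```python
# (coefficient, robust standard error) for each slope, as reported in Figure 8.
robust_estimates = {
    "crime": (-150.0703, 30.45247),
    "nox": (-1737.66, 389.6642),
    "rooms": (7707.327, 670.6304),
    "dist": (-791.2588, 175.744),
    "proptax": (-89.95717, 26.84788),
}

T_CRIT = 1.965  # two-sided 5% critical value with 500 degrees of freedom (from the text)

# t statistic for H0: beta_j = 0 is the estimate divided by its standard error.
t_stats = {name: b / se for name, (b, se) in robust_estimates.items()}

# Reject H0 when the absolute t statistic exceeds the critical value.
significant = {name: abs(t) > T_CRIT for name, t in t_stats.items()}

for name, t in t_stats.items():
    decision = "reject H0" if significant[name] else "do not reject H0"
    print(f"{name:8s} t = {t:6.2f}  -> {decision}")
```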
2. Confidence Intervals
Test the following hypothesis:
\( H_0: \beta_j = 0 \); \( H_1: \beta_j \neq 0 \)

Variable   Coefficient   Significance level   Confidence interval
Const      β0            5%                   (-19667.75 ; 1547.148)
X1         β1            5%                   (-209.9009 ; -90.23976)
X2         β2            5%                   (-2503.241 ; -972.0787)
X3         β3            5%                   (6389.726 ; 9024.928)
X4         β4            5%                   (-1136.546 ; -445.9712)
X5         β5            5%                   (-142.7057 ; -37.20862)

Figure 9
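For every slope coefficient, 0 does not belong to the
confidence interval, so we reject the hypotheses
\( H_0: \beta_1 = 0 \), \( \beta_2 = 0 \), \( \beta_3 = 0 \),
\( \beta_4 = 0 \), \( \beta_5 = 0 \). The interval for the constant
does contain 0, so the intercept is not significantly different
from zero at the 5% level.
Conclusion: The number of crimes committed per capita, the
nitrogen oxide concentration (parts per 100m), the average
number of rooms, the weighted distance to 5 employment centers
and the property tax per $1000 all have a statistically
significant effect on median housing price at the 95%
confidence level.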
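The intervals can be approximately rebuilt as estimate plus or minus critical value times standard error. The sketch below assumes the robust estimates from Figure 8 (with the signs implied by the conclusions) and the rounded critical value 1.965 from the t test section, so the bounds differ slightly from Stata's, which uses an unrounded critical value. The helper name `conf_int` is hypothetical.

```python
T_CRIT = 1.965  # rounded two-sided 5% critical value, 500 degrees of freedom

# (coefficient, robust standard error), as reported in Figure 8.
estimates = {
    "_cons": (-9060.303, 5398.964),
    "crime": (-150.0703, 30.45247),
    "nox": (-1737.66, 389.6642),
    "rooms": (7707.327, 670.6304),
    "dist": (-791.2588, 175.744),
    "proptax": (-89.95717, 26.84788),
}


def conf_int(b, se, t_crit=T_CRIT):
    """Two-sided confidence interval: estimate +/- critical value * std. error."""
    half_width = t_crit * se
    return b - half_width, b + half_width


intervals = {name: conf_int(b, se) for name, (b, se) in estimates.items()}

# H0: beta_j = 0 is rejected at the 5% level when 0 lies outside the interval.
contains_zero = {name: lo <= 0.0 <= hi for name, (lo, hi) in intervals.items()}

for name, (lo, hi) in intervals.items():
    note = "contains 0" if contains_zero[name] else "excludes 0"
    print(f"{name:8s} ({lo:11.3f} ; {hi:11.3f})  {note}")
```

Only the interval for the constant contains 0, in line with its p-value of 0.094 in Figure 8.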
3. P Value
Hypothesis testing: \( H_0: \beta_1 = 0 \); \( H_1: \beta_1 \neq 0 \)
P-value = 0.000 (Figure 8) < α = 0.05 => Reject H0
The number of crimes committed per capita has a statistically
significant effect on median housing price. The higher the
number of crimes committed per capita, the lower the median
housing price.
In particular, with the sample we have, the estimated result
shows that a one-unit increase in crimes committed per capita
decreases median housing price by $150.07, holding other factors
fixed.
Hypothesis testing: \( H_0: \beta_2 = 0 \); \( H_1: \beta_2 \neq 0 \)
P-value = 0.000 (Figure 8) < α = 0.05 => Reject H0
The nitrogen oxide concentration (parts per 100m) has a
statistically significant effect on median housing price. The
higher the nitrogen oxide concentration, the lower the median
housing price.
In particular, with the sample we have, the estimated result
shows that a one-unit increase in the nitrogen oxide
concentration decreases median housing price by $1737.66,
holding other factors fixed.

Hypothesis testing: \( H_0: \beta_3 = 0 \); \( H_1: \beta_3 \neq 0 \)
P-value = 0.000 (Figure 8) < α = 0.05 => Reject H0
The average number of rooms has a statistically significant
effect on median housing price. The higher the average number
of rooms, the higher the median housing price.
In particular, with the sample we have, the estimated result
shows that one more room increases median housing price by
$7707.33, holding other factors fixed.
Hypothesis testing: \( H_0: \beta_4 = 0 \); \( H_1: \beta_4 \neq 0 \)
P-value = 0.000 (Figure 8) < α = 0.05 => Reject H0
The weighted distance to 5 employment centers has a
statistically significant effect on median housing price. The
higher the weighted distance to 5 employment centers, the lower
the median housing price.
In particular, with the sample we have, the estimated result
shows that a one-unit increase in the weighted distance to 5
employment centers decreases median housing price by $791.26,
holding other factors fixed.
Hypothesis testing: \( H_0: \beta_5 = 0 \); \( H_1: \beta_5 \neq 0 \)
P-value = 0.0008 < α = 0.05 => Reject H0
Property tax per $1000 has a statistically significant effect
on median housing price. The higher the property tax per $1000,
the lower the median housing price.
In particular, with the sample we have, the estimated result
shows that a one-unit increase in property tax per $1000
decreases median housing price by $89.96, holding other factors
fixed.
4. Testing the overall significance: The F test
This test examines whether the slope coefficients βi of all
independent variables can be zero at the same time.
The hypothesis is as follows:
\( H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0 \)
\( H_1 \): at least one \( \beta_j \neq 0 \)
\( F_s = 142.92 > F_{0.05}(5, 500) \approx 2.23 \)
As a result, there is enough evidence to reject the null
hypothesis and conclude that at least one independent variable
has explanatory or predictive power for price, so we do not
reduce the model by dropping this set of variables.
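The overall F statistic can be recovered from quantities already reported, as sketched below. It assumes R-squared = 0.5883, n = 506 and k = 5 from Figure 5, and the rounded mean squares from the same table; the critical value of about 2.23 for F(5, 500) at the 5% level is taken from standard tables and is an assumption of this sketch, not a number printed by Stata. Small differences from the reported 142.92 come from rounding.

```python
n = 506           # observations
k = 5             # restrictions tested (all five slopes equal to zero)
r2 = 0.5883       # R-squared reported in Figure 5

# F statistic for overall significance expressed through R-squared.
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))

# Equivalent form: model mean square divided by residual mean square.
ms_model = 5.04e9         # rounded, as reported
ms_resid = 35258403.7     # as reported
f_from_ms = ms_model / ms_resid

F_CRIT = 2.23     # approximate 5% critical value of F(5, 500); assumption from tables

reject_h0 = f_from_r2 > F_CRIT

print(f"F from R-squared:    {f_from_r2:.2f}")
print(f"F from mean squares: {f_from_ms:.2f}")
print(f"reject H0 (all slopes zero): {reject_h0}")
```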
IX. RESULT ANALYSIS AND POLICY IMPLICATION
From the data analysis in the previous sections, we have
gained an overall view of the data set in terms of the
statistical relationship between housing prices and each of the
proposed factors. As mentioned at the beginning of this report,
we aim to learn how the security of the neighborhood, air
pollution, the size of the house, accessibility and the
property tax are associated with housing price. In other words,
we are interested in buyers' willingness to pay for these
attributes.
Following the data analysis, the regression run and the
hypothesis tests, it can be concluded that the security of the
neighborhood, air pollution, the size of the house,
accessibility and the property tax all have a statistically
significant effect on housing prices. Therefore, tenants,
investors and builders should take all of these factors into
account when making deals.
X. CONCLUSION
This report was completed thanks to the dedicated
contribution of each member and the knowledge gained from our
study of Econometrics. The research has given us a good
opportunity to practice what we have learned and to gain a
deeper understanding of data analysis and the relevant tests.
We hope that it sheds some light on the relationship between
housing prices and the factors considered.
Again, because of our limited understanding and resources, the
report may contain misinterpretations. We hope that our teacher
and readers will give us constructive comments so that we can
improve and do better in the future.