FOREIGN TRADE UNIVERSITY
FACULTY OF INTERNATIONAL ECONOMY

REPORT
ECONOMETRICS
HOUSING PRICES AND FACTORS AFFECTING
HOUSING PRICES IN CALIFORNIA, USA
Instructor: MSc. Chu Thi Mai Phuong
Student: Group AH – KTEE309.2
1. Nguyen Lan Anh - 1612250005
2. Le Mai Huong - 1612250015
3. Le Thu Huong - 1616250014

Hanoi, December 2017


EVALUATION
Name:
- Nguyen Lan Anh
- Le Mai Huong
- Le Thu Huong

Job:
- Estimate the lin-lin model and the log-log model.
- Test the lin-lin model and the log-log model.
- Find the source data.
- Write the conclusion.

- Describe the data, variables and correlation.
- Estimate the log-lin model.
- Test the log-lin model.
- Write the introduction and abstract.

- Final check and edit the report.

Evaluation:
- Enthusiastic and responsible for the assigned work; completed the work well.
- Willing to help the other team members with their work.

- Responsible for the work; motivates and supports the other members.
- Divided the work reasonably among the members.
- Met the deadline.

Mark:
10/10
10/10
9/10


TABLE OF CONTENTS
EVALUATION
INTRODUCTION
ABSTRACT
ANALYSIS
SECTION 1: DESCRIBE THE VARIABLES, DATA AND CORRELATION
I. Describe the variables
II. Describe the data
III. Describe the correlation between variables
SECTION 2: ESTIMATED MODEL AND STATISTICAL INFERENCES
I. Linear – linear model
II. Log – linear model
III. Log – log model
CONCLUSION
REFERENCES


INTRODUCTION
Econometrics is a branch of the social sciences in which the tools of
economic theory, mathematics and statistical inference are applied to analyze
economic problems. It uses the methods of mathematical statistics to uncover the
relationships in the data, to draw conclusions from the collected statistics and
to make predictions about economic phenomena.
Since its inception, econometrics has provided economists with a sharp
instrument for measuring economic relations. As economics students, we recognize
the need to study econometrics in order to build logical and analytical skills. To
better understand how to apply econometrics effectively and correctly in
practice, our team prepared this ECONOMETRICS REPORT under the guidance of
MSc. Chu Thi Mai Phuong. In this report, we use the econometric software GRETL
to analyze the topic "Housing Prices and Factors Affecting Housing Prices in
California, USA".
We sincerely thank our instructor, MSc. Chu Thi Mai Phuong, for helping us
complete this report. Despite our best efforts, errors are unavoidable, and we
look forward to your comments so that our team can improve this report.



ABSTRACT
According to a recent report by the National Association of Realtors in the
United States, Vietnam is among the 10 countries that invest the most in US real
estate (VnExpress). California is home to many overseas Vietnamese and has a
vibrant real estate market that attracts investors from Vietnam and other
countries. So what affects housing prices in this area? As economics students
interested in real estate, we decided to research the topic "Housing Prices and
Factors Affecting Housing Prices in California, USA".
While searching for literature, we read many foreign studies on the factors
affecting housing prices in various regions and countries, such as
"Macroeconomic Determinants of the Housing Market" (LSE) and "House Price
Dynamics in the United States" (IMF). After synthesizing and discussing them, we
selected a few factors that affect the price of a house for our study.
Because of limited time, we could examine only a few prominent factors, and we
hope for your understanding. Thank you!



ANALYSIS
SECTION 1: DESCRIBE THE VARIABLES, DATA AND
CORRELATION
I. Describe the variables
The model in this report includes the following variables:
Dependent variable: salepric - sale price of a house in two communities of
California, Dove Canyon and Coto de Caza (thousands of dollars)
Independent variables:
sqft – living area in square feet
garage – number of car spaces
city – city dummy: 1 for Coto de Caza and 0 for Dove Canyon

II. Describe the data
The data on sale prices and house characteristics in the two California
communities, Dove Canyon and Coto de Caza, are taken from the Ramanathan data
files in Gretl.

III. Describe the correlation between variables

● Correlation matrix for the linear – linear model:
Correlation coefficients, using the observations 1 - 224
5% critical value (two-tailed) = 0.1311 for n = 224

    salepric        sqft      garage        city
      1.0000      0.9193      0.6536      0.5033   salepric
                  1.0000      0.5818      0.4275   sqft
                              1.0000      0.2421   garage
                                          1.0000   city

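The 5% critical value printed with the correlation matrix can be reproduced from the t distribution. The sketch below is only an illustration: the two-tailed critical t value for 222 degrees of freedom (about 1.9707) is an assumption taken from standard t tables, and the function names are hypothetical, not part of Gretl.

```python
import math

# Assumed two-tailed 5% critical value of Student's t with n - 2 = 222
# degrees of freedom (taken from standard tables, not from the report).
T_CRIT_222 = 1.9707


def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| that is significant, given the critical t for n - 2 df."""
    df = n - 2
    return t_crit / math.sqrt(df + t_crit ** 2)


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing that a correlation coefficient is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    n = 224
    print("critical |r|:", round(critical_r(T_CRIT_222, n), 4))
    # Weakest correlation in the matrix: garage vs. city.
    print("t for r = 0.2421:", round(t_from_r(0.2421, n), 3))
```

Every coefficient in the matrix exceeds the critical value in absolute terms, so each pairwise correlation is significant at the 5% level.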


- salepric is positively correlated with sqft. The correlation is quite
strong (r = 0.9193).
- salepric is positively correlated with garage. The correlation is
moderate (r = 0.6536).
- salepric is positively correlated with city. The correlation is
moderate (r = 0.5033).

● Correlation matrix for the log – linear model:
Correlation coefficients, using the observations 1 - 224
5% critical value (two-tailed) = 0.1311 for n = 224

  l_salepric        sqft      garage        city
      1.0000      0.8857      0.6135      0.6486   l_salepric
                  1.0000      0.5818      0.4275   sqft
                              1.0000      0.2421   garage
                                          1.0000   city

As we can see:
- l_salepric is positively correlated with sqft. The correlation is quite
strong (r = 0.8857).
- l_salepric is positively correlated with garage. The correlation is
moderate (r = 0.6135).
- l_salepric is positively correlated with city. The correlation is
moderate (r = 0.6486).

● Correlation matrix for the log – log model:
Correlation coefficients, using the observations 1 - 224
5% critical value (two-tailed) = 0.1311 for n = 224

  l_salepric      l_sqft    l_garage        city
      1.0000      0.9001      0.5988      0.6486   l_salepric
                  1.0000      0.5665      0.4750   l_sqft
                              1.0000      0.2365   l_garage
                                          1.0000   city

- l_salepric is positively correlated with l_sqft. The correlation is quite
strong (r = 0.9001).
- l_salepric is positively correlated with l_garage. The correlation is
moderate (r = 0.5988).
- l_salepric is positively correlated with city. The correlation is
moderate (r = 0.6486).



SECTION 2: ESTIMATED MODEL AND STATISTICAL INFERENCES
I. Linear – linear model.

1. Estimation.
Model 1: OLS, using observations 1-224
Dependent variable: salepric

             Coefficient    Std. Error     t-ratio    p-value
  const      −704.854       53.3132        −13.22     <0.0001   ***
  sqft          0.220060     0.00891506     24.68     <0.0001   ***
  garage      129.286       20.3863          6.342    <0.0001   ***
  city        101.275       19.0453          5.318    <0.0001   ***

  Mean dependent var     642.9294     S.D. dependent var     371.3762
  Sum squared resid      3641423      S.E. of regression     128.6543
  R-squared              0.881604     Adjusted R-squared     0.879989
  F(3, 220)              546.0556     P-value(F)             1.3e-101
  Log-likelihood        −1403.821     Akaike criterion       2815.642
  Schwarz criterion      2829.289     Hannan-Quinn           2821.150

● Regression functions obtained from the estimation:
- The population regression function:
$\text{salepric}_i = \beta_1 + \beta_2\,\text{sqft}_i + \beta_3\,\text{garage}_i + \beta_4\,\text{city}_i + u_i$
- The sample regression function:
$\widehat{\text{salepric}}_i = \hat{\beta}_1 + \hat{\beta}_2\,\text{sqft}_i + \hat{\beta}_3\,\text{garage}_i + \hat{\beta}_4\,\text{city}_i$
- Estimated regression equation:
$\widehat{\text{salepric}} = -704.854 + 0.220060\,\text{sqft} + 129.286\,\text{garage} + 101.275\,\text{city}$
● Interpretation of the coefficients:
- $\hat{\beta}_2 = 0.220060$: When sqft increases by 1 square foot, holding
garage and city constant, the estimated value of salepric increases by
0.220060 thousand dollars.
- $\hat{\beta}_3 = 129.286$: When garage increases by 1 car space, holding
sqft and city constant, the estimated value of salepric increases by
129.286 thousand dollars.
- $\hat{\beta}_4 = 101.275$: Holding sqft and garage constant, the expected
sale price of a house in Coto de Caza is 101.275 thousand dollars higher
than that of a house in Dove Canyon.
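To make the interpretation concrete, the estimated equation can be evaluated for a specific house. The sketch below uses the Model 1 coefficients reported above; the example house (3,000 square feet, 3 car spaces) and the function name are hypothetical and chosen only for illustration.

```python
# Model 1 (lin-lin) coefficients as reported in the Gretl output above.
COEF = {"const": -704.854, "sqft": 0.220060, "garage": 129.286, "city": 101.275}


def predict_salepric(sqft: float, garage: float, city: int) -> float:
    """Fitted sale price in thousands of dollars (city: 1 = Coto de Caza)."""
    return (COEF["const"]
            + COEF["sqft"] * sqft
            + COEF["garage"] * garage
            + COEF["city"] * city)


if __name__ == "__main__":
    # Hypothetical house used only to illustrate the equation.
    in_coto = predict_salepric(3000, 3, 1)
    in_dove = predict_salepric(3000, 3, 0)
    print(f"Coto de Caza: {in_coto:.3f}")
    print(f"Dove Canyon:  {in_dove:.3f}")
    print(f"City premium: {in_coto - in_dove:.3f}")
```

The difference between the two fitted prices equals the city coefficient, which is exactly what the dummy-variable interpretation says.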

 The coefficient of determination R2:
The coefficient of determination R2 measures the share of the variation of the
dependent variable around its mean that is explained by the model.
Here R2 = 0.881604 is quite high, which suggests that the model fits the data
well: 88.1604% of the sample variation in the dependent variable (sale price)
is explained by the independent variables (living area, number of car spaces
and city).
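The adjusted R-squared and the F statistic in the output are both functions of R2, the sample size n = 224 and the number of estimated parameters k = 4. The following sketch recomputes them from the reported R2 as a consistency check; the function names are hypothetical, and small differences from the Gretl output come from rounding of R2.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k parameters (incl. constant)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


def overall_f(r2: float, n: int, k: int) -> float:
    """F statistic for H0: all slope coefficients are zero."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


if __name__ == "__main__":
    r2, n, k = 0.881604, 224, 4  # values reported for Model 1
    print("adjusted R2:", round(adjusted_r2(r2, n, k), 6))
    print("F(3, 220):  ", round(overall_f(r2, n, k), 2))
```

The same two functions can be applied to the R2 values of the other models in this report.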
2. Testing
2.1. Testing hypothesis
2.1.1. Testing an individual regression coefficient
● Purpose: test the statistical significance of each coefficient, that is,
whether each independent variable has an effect on the dependent one.
Significance level: α = 0.05.
● Testing the variable living area in square feet (sqft):
Hypotheses: $H_0: \beta_2 = 0$ against $H_1: \beta_2 \neq 0$.
The p-value of sqft is < 0.0001 < 0.05 → Reject H0 → The coefficient β2 is statistically significant.

● Testing the variable number of car spaces (garage):
Hypotheses: $H_0: \beta_3 = 0$ against $H_1: \beta_3 \neq 0$.
The p-value of garage is < 0.0001 < 0.05 → Reject H0 → The coefficient β3 is statistically significant.

● Testing the variable city:
Hypotheses: $H_0: \beta_4 = 0$ against $H_1: \beta_4 \neq 0$.
The p-value of city is < 0.0001 < 0.05 → Reject H0 → The coefficient β4 is statistically significant.
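Each t-ratio in the Model 1 output is simply the coefficient divided by its standard error. The sketch below recomputes them from the reported values. The p-value function uses the standard normal distribution as a large-sample stand-in for the t distribution with 220 degrees of freedom, which is an assumption of this example and not how Gretl computes its p-values.

```python
from statistics import NormalDist

# (coefficient, standard error) pairs reported for Model 1.
MODEL_1 = {
    "const": (-704.854, 53.3132),
    "sqft": (0.220060, 0.00891506),
    "garage": (129.286, 20.3863),
    "city": (101.275, 19.0453),
}


def t_ratio(coef: float, se: float) -> float:
    """t statistic for H0: coefficient = 0."""
    return coef / se


def approx_two_sided_p(t: float) -> float:
    """Two-sided p-value under a normal approximation (assumption)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


if __name__ == "__main__":
    for name, (coef, se) in MODEL_1.items():
        t = t_ratio(coef, se)
        print(f"{name:7s} t = {t:8.3f}   approx. p = {approx_two_sided_p(t):.2e}")
```

All four approximate p-values fall far below 0.05, in line with the "<0.0001" entries in the output.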

2.1.2. Testing the overall significance.
● Purpose: test the null hypothesis that none of the explanatory variables
has an effect on the dependent variable. Significance level: α = 0.05.
Hypotheses: $H_0: \beta_2 = \beta_3 = \beta_4 = 0$ against
$H_1: \exists\, \beta_i \neq 0$ (i = 2, 3, 4).
P-value(F) = 1.3e-101 < α = 0.05 → Reject H0 → The slope coefficients are not
all equal to zero → At least one independent variable has an effect on the
dependent one.
The model is statistically significant overall.

2.2. Testing the model’s problems.


2.2.1. Testing for omitted variables
● Hypotheses:
H0: the model has no omitted variables
H1: the model has omitted variables
● Ramsey’s RESET:
Auxiliary regression for RESET specification test
OLS, using observations 1-224
Dependent variable: salepric

             coefficient      std. error     t-ratio   p-value
  ------------------------------------------------------------
  const      419.370          278.731          1.505   0.1339
  sqft        −0.0255144        0.0615390     −0.4146  0.6788
  garage     −14.1284          41.3909        −0.3413  0.7332
  city        53.0970          27.8779         1.905   0.0581   *
  yhat^2       0.000847862      0.000268149    3.162   0.0018   ***
  yhat^3      −1.83128e-07      7.65971e-08   −2.391   0.0177   **

Test statistic: F = 13.777161,
with p-value = P(F(2,218) > 13.7772) = 2.32e-006

p-value = P(F(2,218) > 13.7772) = 2.32e-006 < α = 0.05 → Reject H0 → The model omits relevant variables.

Remedy: because our research so far is limited, we will need to read more of
the literature to find out which variables are omitted.
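The RESET p-value can be checked by hand because an F distribution with 2 numerator degrees of freedom has a closed-form upper tail: P(F(2, d) > f) = (1 + 2f/d)^(-d/2). The sketch below applies that identity to the F statistic reported above; the function name is hypothetical and the identity holds only when the numerator degrees of freedom equal 2.

```python
def f2_upper_tail(f_stat: float, denom_df: int) -> float:
    """P(F(2, denom_df) > f_stat); valid only for 2 numerator df."""
    return (1.0 + 2.0 * f_stat / denom_df) ** (-denom_df / 2.0)


if __name__ == "__main__":
    # RESET statistic reported for the lin-lin model, F(2, 218).
    p_value = f2_upper_tail(13.777161, 218)
    print(f"p-value = {p_value:.3e}")
    print("reject H0 at 5%:", p_value < 0.05)
```

The same function can be applied to the RESET statistics of the other two models, which also have (2, 218) degrees of freedom.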

2.2.2. Testing multicollinearity.
● We use the “VIF” command in Gretl, which reports the variance inflation
factor of each regressor. If the VIF of a variable is greater than 10, the
model may suffer from multicollinearity.
● Using the “VIF” command in Gretl, we have the following result:
Variance Inflation Factors
Minimum possible value = 1.0
Values > 10.0 may indicate a collinearity problem

        sqft    1.742
      garage    1.512
        city    1.224

VIF(j) = 1/(1 - R(j)^2), where R(j) is the multiple correlation coefficient
between variable j and the other independent variables

Belsley-Kuh-Welsch collinearity diagnostics:

                         --- variance proportions ---
    lambda      cond     const      sqft    garage      city
     3.591     1.000     0.002     0.004     0.001     0.022
     0.354     3.185     0.008     0.003     0.004     0.858
     0.044     9.020     0.169     0.787     0.013     0.120
     0.011    18.192     0.821     0.206     0.981     0.001

  lambda = eigenvalues of X'X, largest to smallest
  cond   = condition index
  note: variance proportions columns sum to 1.0

● We see: VIF(sqft) = 1.742 < 10
VIF(garage) = 1.512 < 10
VIF(city) = 1.224 < 10

The model does not show a serious multicollinearity problem.
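The VIF formula quoted in the output can be inverted to see how much of each regressor is explained by the others, and the condition index in the Belsley-Kuh-Welsch table is the square root of the ratio of the largest eigenvalue to each eigenvalue. The sketch below illustrates both relations with the reported numbers; the function names are hypothetical, and the eigenvalues are the rounded ones printed above, so the condition indices match only approximately.

```python
import math


def implied_r2(vif: float) -> float:
    """R(j)^2 of regressor j on the other regressors, from VIF = 1/(1 - R^2)."""
    return 1.0 - 1.0 / vif


def condition_index(lambda_max: float, lambda_j: float) -> float:
    """Belsley-Kuh-Welsch condition index for eigenvalue lambda_j."""
    return math.sqrt(lambda_max / lambda_j)


if __name__ == "__main__":
    for name, vif in {"sqft": 1.742, "garage": 1.512, "city": 1.224}.items():
        print(f"{name:7s} VIF = {vif:.3f}  implied R^2 = {implied_r2(vif):.3f}")
    # Second row of the diagnostics table (rounded eigenvalues).
    print("cond:", round(condition_index(3.591, 0.354), 3))
```

A VIF of 10 corresponds to an implied R(j)^2 of 0.9, which is why values below 10 are read as no serious collinearity.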

2.2.3. Testing heteroskedasticity.
● Hypotheses:
H0: the error variance is constant (homoskedasticity)
H1: the error variance is not constant (heteroskedasticity)
● White’s test:
White's test for heteroskedasticity
OLS, using observations 1-224
Dependent variable: uhat^2

               coefficient     std. error      t-ratio   p-value
  ----------------------------------------------------------------
  const        93359.5         63100.7           1.480   0.1405
  sqft           −55.8510         15.7454       −3.547   0.0005     ***
  garage       26287.7         31028.8           0.8472  0.3978
  city        −41070.9         62326.7          −0.6590  0.5106
  sq_sqft          0.0106259       0.000988871  10.75    7.78e-022  ***
  X2_X3           −7.81731         4.36037      −1.793   0.0744     *
  X2_X4          −10.5893         10.5405       −1.005   0.3162
  sq_garage    −2287.85         5619.95         −0.4071  0.6843
  X3_X4        27866.6         18698.6           1.490   0.1376

Unadjusted R-squared = 0.666709
Test statistic: TR^2 = 149.342835,
with p-value = P(Chi-square(8) > 149.342835) = 2.68837e-028

p-value = P(Chi-square(8) > 149.342835) = 2.68837e-028 < α = 0.05 → Reject H0 → The model has a heteroskedasticity problem.
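White's statistic is the number of observations times the R-squared of the auxiliary regression, compared with a chi-square distribution whose degrees of freedom equal the number of auxiliary regressors (8 here). For an even number of degrees of freedom the chi-square upper tail has a closed form, which the sketch below uses to recheck the reported p-value. The function names are hypothetical, and the closed form is valid only for even degrees of freedom.

```python
import math


def white_statistic(n_obs: int, aux_r2: float) -> float:
    """White's LM statistic: T times R^2 of the auxiliary regression."""
    return n_obs * aux_r2


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(Chi-square(df) > x) for even df, via the Poisson-sum identity."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


if __name__ == "__main__":
    stat = white_statistic(224, 0.666709)
    print("TR^2 =", round(stat, 4))
    print(f"p-value = {chi2_upper_tail_even_df(149.342835, 8):.4e}")
```

The small gap between the recomputed TR^2 and the printed one comes from the rounding of the auxiliary R-squared.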



Remedy: re-estimate the model with heteroskedasticity-robust standard errors:
Model 2: OLS, using observations 1-224
Dependent variable: salepric
Heteroskedasticity-robust standard errors, variant HC1

             Coefficient    Std. Error     t-ratio    p-value
  const      −704.854       91.3694        −7.714     <0.0001   ***
  sqft          0.220060     0.0310242      7.093     <0.0001   ***
  garage      129.286       45.0903         2.867      0.0045   ***
  city        101.275       19.4662         5.203     <0.0001   ***

  Mean dependent var     642.9294     S.D. dependent var     371.3762
  Sum squared resid      3641423      S.E. of regression     128.6543
  R-squared              0.881604     Adjusted R-squared     0.879989
  F(3, 220)              196.1526     P-value(F)             6.77e-62
  Log-likelihood        −1403.821     Akaike criterion       2815.642
  Schwarz criterion      2829.289     Hannan-Quinn           2821.150

The coefficient estimates are unchanged and all of them remain significant at
the 5% level under robust standard errors, so inference on them is valid.
However, the heteroskedasticity itself is still present, so the OLS estimator
is no longer BLUE (it is unbiased but not the most efficient).
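The idea behind the HC1 correction can be shown on a one-regressor model, where the formulas are short. The sketch below is an illustration only: it uses made-up toy data (not the housing data), hypothetical function names, and the simple-regression form of the sandwich estimator, in which each squared residual is weighted by the squared deviation of x and the result is scaled by n/(n - k).

```python
import math


def ols_simple(x, y):
    """Intercept, slope and residuals of y = a + b*x estimated by OLS."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid


def slope_standard_errors(x, y):
    """Return (classical SE, HC1 robust SE) of the slope."""
    n, k = len(x), 2
    _, _, resid = ols_simple(x, y)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dev)
    # Classical: one common error variance for all observations.
    classical_var = (sum(e * e for e in resid) / (n - k)) / sxx
    # HC0 sandwich: each observation keeps its own squared residual.
    hc0_var = sum(d * d * e * e for d, e in zip(dev, resid)) / sxx ** 2
    hc1_var = hc0_var * n / (n - k)  # HC1 degrees-of-freedom scaling
    return math.sqrt(classical_var), math.sqrt(hc1_var)


if __name__ == "__main__":
    x_toy = [0.0, 1.0, 2.0, 3.0]   # toy data, not from the report
    y_toy = [1.0, 3.0, 4.0, 8.0]
    print("slope:", ols_simple(x_toy, y_toy)[1])
    print("classical SE, HC1 SE:", slope_standard_errors(x_toy, y_toy))
```

As in Model 2, the slope estimate itself does not depend on which standard error is used; only the measure of its precision changes.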

2.2.4. Testing normality of residual.
● Hypotheses:
H0: the residuals are normally distributed
H1: the residuals are not normally distributed
● Using the normality of residual test in Gretl:
Test for normality of residual
Null hypothesis: error is normally distributed
Test statistic: Chi-square(2) = 265.203
with p-value = 2.58197e-058

Chi-square(2) = 265.203 with p-value 2.58197e-058 < α = 0.05 → Reject H0 → The residuals are not normally distributed.
Remedy: increase the number of observations until n ≥ 384.
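With 2 degrees of freedom the chi-square upper tail reduces to exp(-x/2), so the p-value of the normality test can be verified in one line. The sketch below does this for the statistic reported above; the function name is hypothetical.

```python
import math


def chi2_df2_upper_tail(x: float) -> float:
    """P(Chi-square(2) > x), which equals exp(-x / 2)."""
    return math.exp(-x / 2.0)


if __name__ == "__main__":
    p_value = chi2_df2_upper_tail(265.203)
    print(f"p-value = {p_value:.5e}")
    print("reject normality at 5%:", p_value < 0.05)
```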

II. Log – linear model.
1. Estimation.
Model 3: OLS, using observations 1-224
Dependent variable: l_salepric

             Coefficient     Std. Error     t-ratio    p-value
  const      5.01704         0.0561415      89.36      <0.0001   ***
  sqft       0.000207498     9.38800e-06    22.10      <0.0001   ***
  garage     0.117941        0.0214678       5.494     <0.0001   ***
  city       0.267482        0.0200557      13.34      <0.0001   ***

  Mean dependent var     6.365959     S.D. dependent var     0.403646
  Sum squared resid      4.038026     S.E. of regression     0.135479
  R-squared              0.888862     Adjusted R-squared     0.887346
  F(3, 220)              586.5050     P-value(F)             1.2e-104
  Log-likelihood         131.9375     Akaike criterion      −255.8749
  Schwarz criterion     −242.2283     Hannan-Quinn          −250.3665

● Regression functions obtained from the estimation:
- The population regression function:
$\ln(\text{salepric}_i) = \beta_1 + \beta_2\,\text{sqft}_i + \beta_3\,\text{garage}_i + \beta_4\,\text{city}_i + u_i$
- The sample regression function:
$\widehat{\ln(\text{salepric}_i)} = \hat{\beta}_1 + \hat{\beta}_2\,\text{sqft}_i + \hat{\beta}_3\,\text{garage}_i + \hat{\beta}_4\,\text{city}_i$
- Estimated regression equation:
$\widehat{\ln(\text{salepric})} = 5.01704 + 0.000207498\,\text{sqft} + 0.117941\,\text{garage} + 0.267482\,\text{city}$
● Interpretation of the coefficients:
- $\hat{\beta}_2 = 0.000207498$: When sqft increases by 1 square foot,
keeping garage and city constant, the expected value of salepric increases
by approximately 0.0207498%.
- $\hat{\beta}_3 = 0.117941$: When garage increases by 1 car space, keeping
sqft and city constant, the expected value of salepric increases by
approximately 11.7941%.
- $\hat{\beta}_4 = 0.267482$: Keeping sqft and garage constant, the expected
sale price of a house in Coto de Caza is approximately 26.7482% higher than
that of a house in Dove Canyon.
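The percentages above use the usual approximation 100 times the coefficient, which is accurate for small coefficients. For larger ones, such as the city dummy, the exact percentage change implied by a log-linear model is 100 * (exp(b) - 1). The sketch below compares the two for the Model 3 coefficients; the function names are hypothetical.

```python
import math


def approx_pct_change(b: float) -> float:
    """Approximate % change in salepric for a one-unit change: 100 * b."""
    return 100.0 * b


def exact_pct_change(b: float) -> float:
    """Exact % change implied by the log-linear form: 100 * (exp(b) - 1)."""
    return 100.0 * (math.exp(b) - 1.0)


if __name__ == "__main__":
    model_3 = {"sqft": 0.000207498, "garage": 0.117941, "city": 0.267482}
    for name, b in model_3.items():
        print(f"{name:7s} approx = {approx_pct_change(b):8.4f}%   "
              f"exact = {exact_pct_change(b):8.4f}%")
```

For sqft the two figures are practically identical, while for garage and city the exact effect is somewhat larger than the approximation.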

● The coefficient of determination R2:
The coefficient of determination R2 measures the share of the variation of the
dependent variable around its mean that is explained by the model.
Here R2 = 0.888862 is quite high, which suggests that the model fits the data
well: 88.8862% of the sample variation in the dependent variable (the
logarithm of salepric) is explained by the independent variables (sqft,
garage and city).
2. Testing.
2.1. Testing hypothesis
2.1.1. Testing an individual regression coefficient
● Purpose: test the statistical significance of each coefficient, that is,
whether each independent variable has an effect on the dependent one.
Significance level: α = 0.05.
● Testing sqft:
Hypotheses: $H_0: \beta_2 = 0$ against $H_1: \beta_2 \neq 0$.
The p-value of sqft is < 0.0001 < 0.05 → Reject H0 → The coefficient β2
is statistically significant.
● Testing garage:
Hypotheses: $H_0: \beta_3 = 0$ against $H_1: \beta_3 \neq 0$.
The p-value of garage is < 0.0001 < 0.05 → Reject H0 → The coefficient
β3 is statistically significant.
● Testing city:
Hypotheses: $H_0: \beta_4 = 0$ against $H_1: \beta_4 \neq 0$.
The p-value of city is < 0.0001 < 0.05 → Reject H0 → The coefficient β4
is statistically significant.
2.1.2. Testing the overall significance.
● Purpose: test the null hypothesis that none of the explanatory variables
has an effect on the dependent variable. Significance level: α = 0.05.
Hypotheses: $H_0: \beta_2 = \beta_3 = \beta_4 = 0$ against
$H_1: \exists\, \beta_i \neq 0$ (i = 2, 3, 4).
P-value(F) = 1.2e-104 < α = 0.05 → Reject H0 → The slope coefficients are
not all equal to zero → At least one independent variable has an effect on
the dependent one.
→ The model is statistically significant overall.
2.2. Testing the model’s problems.
2.2.1. Testing for omitted variables.
● Hypotheses:
H0: the model has no omitted variables
H1: the model has omitted variables
● Ramsey’s RESET:
Auxiliary regression for RESET specification test
OLS, using observations 1-224
Dependent variable: l_salepric

             coefficient     std. error     t-ratio   p-value
  -----------------------------------------------------------
  const      −86.4806        28.9110        −2.991    0.0031   ***
  sqft        −0.00647575     0.00218279    −2.967    0.0033   ***
  garage      −3.68898        1.24248       −2.969    0.0033   ***
  city        −8.40197        2.80299       −2.998    0.0030   ***
  yhat^2       4.90549        1.53920        3.187    0.0016   ***
  yhat^3      −0.247302       0.0748301     −3.305    0.0011   ***

Test statistic: F = 11.345346,
with p-value = P(F(2,218) > 11.3453) = 2.05e-005

p-value = P(F(2,218) > 11.3453) = 2.05e-005 < α = 0.05 → Reject H0 → The
model omits relevant variables.
Remedy: because our research so far is limited, we will need to read more of
the literature to find out which variables are omitted.
2.2.2. Testing multicollinearity.
● We use the “VIF” command in Gretl, which reports the variance inflation
factor of each regressor. If the VIF of a variable is greater than 10, the
model may suffer from multicollinearity.
● Using the “VIF” command in Gretl, we have the following result:
Variance Inflation Factors
Minimum possible value = 1.0
Values > 10.0 may indicate a collinearity problem

        sqft    1.742
      garage    1.512
        city    1.224

VIF(j) = 1/(1 - R(j)^2), where R(j) is the multiple correlation coefficient
between variable j and the other independent variables

Belsley-Kuh-Welsch collinearity diagnostics:

                         --- variance proportions ---
    lambda      cond     const      sqft    garage      city
     3.591     1.000     0.002     0.004     0.001     0.022
     0.354     3.185     0.008     0.003     0.004     0.858
     0.044     9.020     0.169     0.787     0.013     0.120
     0.011    18.192     0.821     0.206     0.981     0.001

  lambda = eigenvalues of X'X, largest to smallest
  cond   = condition index
  note: variance proportions columns sum to 1.0

● We see: VIF(sqft) = 1.742 < 10
VIF(garage) = 1.512 < 10
VIF(city) = 1.224 < 10
→ The model does not show a serious multicollinearity problem.
2.2.3. Testing heteroskedasticity.
● Hypotheses:
H0: the error variance is constant (homoskedasticity)
H1: the error variance is not constant (heteroskedasticity)
● White’s test:
White's test for heteroskedasticity
OLS, using observations 1-224
Dependent variable: uhat^2

               coefficient       std. error      t-ratio     p-value
  ------------------------------------------------------------------
  const         0.0199661        0.0562010        0.3553     0.7227
  sqft         −1.76611e-05      1.40238e-05     −1.259      0.2093
  garage        0.0326536        0.0276360        1.182      0.2387
  city         −0.0776614        0.0555116       −1.399      0.1633
  sq_sqft      −2.70427e-012     8.80744e-010    −0.003070   0.9976
  X2_X3         6.84208e-06      3.88359e-06      1.762      0.0795   *
  X2_X4         7.65262e-06      9.38796e-06      0.8152     0.4159
  sq_garage    −0.0127315        0.00500544      −2.544      0.0117   **
  X3_X4         0.0170934        0.0166541        1.026      0.3059

Unadjusted R-squared = 0.216999
Test statistic: TR^2 = 48.607688,
with p-value = P(Chi-square(8) > 48.6077) = 7.55903e-008

p-value = P(Chi-square(8) > 48.6077) = 7.55903e-008 < α = 0.05
→ Reject H0 → The model has a heteroskedasticity problem.
● Remedy: re-estimate the model with heteroskedasticity-robust standard errors:
Model 4: OLS, using observations 1-224
Dependent variable: l_salepric
Heteroskedasticity-robust standard errors, variant HC1

             Coefficient     Std. Error     t-ratio    p-value
  const      5.01704         0.0615381      81.53      <0.0001   ***
  sqft       0.000207498     1.69491e-05    12.24      <0.0001   ***
  garage     0.117941        0.0247488       4.766     <0.0001   ***
  city       0.267482        0.0189490      14.12      <0.0001   ***

  Mean dependent var     6.365959     S.D. dependent var     0.403646
  Sum squared resid      4.038026     S.E. of regression     0.135479
  R-squared              0.888862     Adjusted R-squared     0.887346
  F(3, 220)              336.2140     P-value(F)             7.28e-82
  Log-likelihood         131.9375     Akaike criterion      −255.8749
  Schwarz criterion     −242.2283     Hannan-Quinn          −250.3665

→ The coefficient estimates are unchanged and all of them remain significant
at the 5% level under robust standard errors, so inference on them is valid.
However, the heteroskedasticity itself is still present, so the OLS estimator
is no longer BLUE (it is unbiased but not the most efficient).
2.2.4. Testing normality of residual
● Hypotheses:
H0: the residuals are normally distributed
H1: the residuals are not normally distributed
● Using the normality of residual test in Gretl:
Frequency distribution for uhat1, obs 1-224
number of bins = 15, mean = 3.17207e-017, sd = 0.135479

         interval              midpt      frequency    rel.      cum.

              <  -0.55308     -0.58704        1        0.45%     0.45%
   -0.55308   -  -0.48516     -0.51912        0        0.00%     0.45%
   -0.48516   -  -0.41724     -0.45120        0        0.00%     0.45%
   -0.41724   -  -0.34931     -0.38328        0        0.00%     0.45%
   -0.34931   -  -0.28139     -0.31535        1        0.45%     0.89%
   -0.28139   -  -0.21347     -0.24743        5        2.23%     3.12%
   -0.21347   -  -0.14555     -0.17951       15        6.70%     9.82%  **
   -0.14555   -  -0.077625    -0.11159       43       19.20%    29.02%  ******
   -0.077625  -  -0.0097031   -0.043664      53       23.66%    52.68%  ********
   -0.0097031 -   0.058219     0.024258      50       22.32%    75.00%  ********
    0.058219  -   0.12614      0.092180      21        9.38%    84.38%  ***
    0.12614   -   0.19406      0.16010       14        6.25%    90.62%  **
    0.19406   -   0.26199      0.22803        9        4.02%    94.64%  *
    0.26199   -   0.32991      0.29595        8        3.57%    98.21%  *
              >=  0.32991      0.36387        4        1.79%   100.00%

Test for null hypothesis of normal distribution:
Chi-square(2) = 16.779 with p-value 0.00023

Chi-square(2) = 16.779 with p-value 0.00023 < α = 0.05 → Reject
H0 → The residuals are not normally distributed.
● Remedy: increase the number of observations until n ≥ 384.
III. Log – log model.
1. Estimation
Model 5: OLS, using observations 1-224
Dependent variable: l_salepric

             Coefficient    Std. Error     t-ratio    p-value
  const      −3.35140       0.361870       −9.261     <0.0001   ***
  l_sqft      1.10258       0.0492278      22.40      <0.0001   ***
  l_garage    0.421886      0.0799229       5.279     <0.0001   ***
  city        0.235163      0.0207535      11.33      <0.0001   ***

  Mean dependent var     6.365959     S.D. dependent var     0.403646
  Sum squared resid      4.088755     S.E. of regression     0.136328
  R-squared              0.887465     Adjusted R-squared     0.885931
  F(3, 220)              578.3183     P-value(F)             4.9e-104
  Log-likelihood         130.5392     Akaike criterion      −253.0784
  Schwarz criterion     −239.4318     Hannan-Quinn          −247.5700

● Regression functions obtained from the estimation:
- The population regression function (city is a dummy variable and is not
logged):
$\ln(\text{salepric}_i) = \beta_1 + \beta_2 \ln(\text{sqft}_i) + \beta_3 \ln(\text{garage}_i) + \beta_4\,\text{city}_i + u_i$
- The sample regression function:
$\widehat{\ln(\text{salepric}_i)} = \hat{\beta}_1 + \hat{\beta}_2 \ln(\text{sqft}_i) + \hat{\beta}_3 \ln(\text{garage}_i) + \hat{\beta}_4\,\text{city}_i$
- Estimated regression equation:
$\widehat{\ln(\text{salepric})} = -3.35140 + 1.10258 \ln(\text{sqft}) + 0.421886 \ln(\text{garage}) + 0.235163\,\text{city}$
● Interpretation of the coefficients:
- $\hat{\beta}_2 = 1.10258$: When sqft increases by 1%, keeping garage and
city constant, the expected value of salepric increases by 1.10258%.
- $\hat{\beta}_3 = 0.421886$: When garage increases by 1%, holding sqft and
city constant, the expected value of salepric increases by 0.421886%.
- $\hat{\beta}_4 = 0.235163$: Holding sqft and garage constant, the expected
sale price of a house in Coto de Caza is approximately 23.5163% higher than
that of a house in Dove Canyon.
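In the log-log model the slope on ln(sqft) is an elasticity, so the "1% leads to 1.10258%" reading is exact only for small changes. For a larger change, the implied effect is obtained by raising the size ratio to the power of the elasticity. The sketch below illustrates this for a hypothetical 10% increase in living area, using the Model 5 coefficient; the function name is an assumption of this example.

```python
def pct_change_from_elasticity(elasticity: float, pct_change_x: float) -> float:
    """Exact % change in y when x changes by pct_change_x % in a log-log model."""
    ratio = 1.0 + pct_change_x / 100.0
    return 100.0 * (ratio ** elasticity - 1.0)


if __name__ == "__main__":
    beta_sqft = 1.10258  # elasticity of salepric with respect to sqft (Model 5)
    for pct in (1.0, 10.0):
        effect = pct_change_from_elasticity(beta_sqft, pct)
        print(f"sqft +{pct:.0f}%  ->  salepric {effect:+.3f}%")
```

Because the elasticity is slightly above one, the sale price grows a little more than proportionally with living area.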

● The coefficient of determination R2:
The coefficient of determination R2 measures the share of the variation of the
dependent variable around its mean that is explained by the model.
Here R2 = 0.887465 is quite high, which suggests that the model fits the data
well: 88.7465% of the sample variation in the dependent variable (the
logarithm of sale price) is explained by the independent variables (ln(sqft),
ln(garage) and city).
2. Testing
2.1. Testing hypothesis
2.1.1. Testing an individual regression coefficient
● Purpose: test the statistical significance of each coefficient, that is,
whether each independent variable has an effect on the dependent one.
Significance level: α = 0.05.
● Testing ln(sqft):
Hypotheses: $H_0: \beta_2 = 0$ against $H_1: \beta_2 \neq 0$.
The p-value of ln(sqft) is < 0.0001 < 0.05 → Reject H0 → The
coefficient β2 is statistically significant.
● Testing ln(garage):
Hypotheses: $H_0: \beta_3 = 0$ against $H_1: \beta_3 \neq 0$.
The p-value of ln(garage) is < 0.0001 < 0.05 → Reject H0 → The
coefficient β3 is statistically significant.
● Testing city:
Hypotheses: $H_0: \beta_4 = 0$ against $H_1: \beta_4 \neq 0$.
The p-value of city is < 0.0001 < 0.05 → Reject H0 → The
coefficient β4 is statistically significant.
2.1.2. Testing the overall significance.
● Purpose: test the null hypothesis that none of the explanatory variables
has an effect on the dependent variable. Significance level: α = 0.05.
Hypotheses: $H_0: \beta_2 = \beta_3 = \beta_4 = 0$ against
$H_1: \exists\, \beta_i \neq 0$ (i = 2, 3, 4).
P-value(F) = 4.9e-104 < α = 0.05 → Reject H0 → The slope coefficients are
not all equal to zero → At least one independent variable has an effect on
the dependent one.
→ The model is statistically significant overall.




2.2. Testing the model’s problems.

2.2.1. Testing for omitted variables
● Hypotheses:
H0: the model has no omitted variables
H1: the model has omitted variables
● Ramsey’s RESET:
Auxiliary regression for RESET specification test
OLS, using observations 1-224
Dependent variable: l_salepric

             coefficient    std. error    t-ratio   p-value
  ----------------------------------------------------------
  const      233.194        56.8671        4.101    5.82e-05   ***
  l_sqft     −45.2389       11.2886       −4.007    8.42e-05   ***
  l_garage   −17.3641        4.32094      −4.019    8.06e-05   ***
  city        −9.60170       2.41046      −3.983    9.26e-05   ***
  yhat^2       6.12875       1.54436       3.968    9.82e-05   ***
  yhat^3      −0.296543      0.0773986    −3.831    0.0002     ***

Test statistic: F = 17.529252,
with p-value = P(F(2,218) > 17.5293) = 8.72e-008

p-value = P(F(2,218) > 17.5293) = 8.72e-008 < α = 0.05 → Reject H0 → The model omits relevant variables.

Remedy: because our research so far is limited, we will need to read more of
the literature to find out which variables are omitted.

2.2.2. Testing multicollinearity.
● We use the “VIF” command in Gretl, which reports the variance inflation
factor of each regressor. If the VIF of a variable is greater than 10, the
model may suffer from multicollinearity.
● Using the “VIF” command in Gretl, we have the following result:
Variance Inflation Factors
Minimum possible value = 1.0
Values > 10.0 may indicate a collinearity problem

      l_sqft    1.799
    l_garage    1.476
        city    1.294

VIF(j) = 1/(1 - R(j)^2), where R(j) is the multiple correlation coefficient
between variable j and the other independent variables

Belsley-Kuh-Welsch collinearity diagnostics:

                         --- variance proportions ---
    lambda      cond     const    l_sqft  l_garage      city
     3.610     1.000     0.000     0.000     0.001     0.019
     0.381     3.076     0.000     0.000     0.001     0.774
     0.008    20.733     0.017     0.007     0.785     0.020
     0.000   113.322     0.983     0.993     0.213     0.186

  lambda = eigenvalues of X'X, largest to smallest
  cond   = condition index
  note: variance proportions columns sum to 1.0

● We see: VIF(l_sqft) = 1.799 < 10
VIF(l_garage) = 1.476 < 10
VIF(city) = 1.294 < 10

The model does not show a serious multicollinearity problem according to the VIFs.

2.2.3. Testing heteroskedasticity.
● Hypotheses:
H0: the error variance is constant (homoskedasticity)
H1: the error variance is not constant (heteroskedasticity)
● White’s test:
White's test for heteroskedasticity
OLS, using observations 1-224
Dependent variable: uhat^2

                coefficient    std. error    t-ratio   p-value
  ------------------------------------------------------------
  const          5.08440       1.56225        3.255    0.0013   ***
  l_sqft        −1.34004       0.400019      −3.350    0.0010   ***
  l_garage       1.14523       0.542902       2.109    0.0361   **
  city          −0.429135      0.265334      −1.617    0.1073
  sq_l_sqft      0.0948272     0.0267366      3.547    0.0005   ***
  X2_X3         −0.210319      0.0800606     −2.627    0.0092   ***
  X2_X4          0.0181291     0.0331792      0.5464   0.5854
  sq_l_garage    0.147245      0.0651323      2.261    0.0248   **