
FOREIGN TRADE UNIVERSITY
FACULTY OF INTERNATIONAL ECONOMICS
----------

GROUP REPORT
FACTORS AFFECTING HUMAN’S
LIFE EXPECTANCY

Group’s members          Student ID
Hoàng Vân Anh            1815520150
Nguyễn Hoàng Bách        1815520156
Vũ Minh Ngọc             1815520211

Class: KTEE218 (1-1920).1_LT
Course: Econometrics 1
Lecturer: Dr. Tu Thuy Anh
Dr. Chu Thi Mai Phuong
Hanoi, October 2019


ABSTRACT

Life expectancy at birth reflects the overall mortality level of a population. It
summarizes the mortality pattern that prevails across all age groups in a given year:
children and adolescents, adults and the elderly. According to the World Health
Organization, global life expectancy at birth in 2016 was 72.0 years (74.2 years for
females and 69.8 years for males), ranging from 61.2 years in the WHO African Region to
77.5 years in the WHO European Region, a ratio of 1.3 between the two regions. So what
has affected life expectancy in the world? As economics students interested in social
matters, we decided to research the topic "Factors affecting human’s life expectancy".
While reviewing the literature, we identified several prominent factors that affect
life expectancy: GDP per capita, GNI per capita, HE - Current Health expenditure (% GDP)
and air pollution. To better understand the influence of these 4 factors on life
expectancy, the team gathered data from 207 countries around the world in 2016 and
estimated the regression model using the OLS method. Life expectancy is the dependent
variable, with GDP per capita, GNI per capita, HE - Current Health expenditure (% GDP)
and air pollution as the 4 main determinants. The results showed that GDP per capita,
GNI per capita and HE - Current Health expenditure (% GDP) all have a positive relation
with life expectancy: a rise in any of them is associated with an increase in predicted
life duration. On the other hand, air pollution has a negative impact on average
longevity.

1. INTRODUCTION
Econometrics is the branch of the social sciences in which the tools of economic
theory, mathematics and statistical inference are applied to analyze economic problems.
It uses the methods of mathematical statistics to uncover the relationships in data and
to draw conclusions from the collected statistics that can support predictions about
economic phenomena.
Since its inception, econometrics has provided economists with a sharp instrument
for measuring economic relations. As economics students, we recognize the importance of
studying econometrics for logical reasoning and problem analysis. To better understand
how to apply econometrics effectively and correctly in practice, our group of three
members, Nguyen Hoang Bach, Hoang Van Anh and Vu Minh Ngoc, follows the econometric
methodology to analyze the given data set. Note that, because of the lack of information
on the data set, all interpretations of abbreviations and other details are based on
assumptions and our own research. We hope to have shown our logic and reasoning clearly.
Given the limits of purpose and resources, there are still deficiencies in this
report, but we look forward to providing readers with a decent overview of the data set
and of the knowledge that we have gained through Dr. Tu Thuy Anh’s Econometrics course.

2. LITERATURE REVIEW
2.1 Related research
People have sought ways to extend life expectancy since ancient times. When science
was not yet developed, people looked to mysterious medicines and even colorful
idealistic rituals in the hope of immortality and eternity. Nowadays, as science and
technology advance, people are expanding their understanding of the world and
explaining more natural and social phenomena. The issue of longevity is likewise
analyzed and explained more and more realistically, gradually moving away from
spiritual and mystical elements.
In publications and scientific research, many authors address questions related to
human life such as: what is the secret to improving longevity? What makes people age
quickly? The conclusions and recommendations of these publications mainly revolve
around genetics, diet, rest, work and entertainment. Such explanations are far too
simple and miss many important factors. Some other studies have also considered
higher-level macro variables such as education level, public services and average
income, but their data are incomplete or too old to explain the problem well in the
current world.
We have consulted many studies of life expectancy; some of the related research is
summarized here:
+ Bergh and Nilsson (2009) analyzed the relation between three dimensions of
globalization (economic, social and political) and life expectancy using a panel of 92
countries over the period 1970-2005. They found a very robust positive effect from
economic globalization on life expectancy, even when controlling for income, nutritional
intake, literacy, number of physicians and several other factors.
+ Mariani et al. (2008) determined the relationship between life expectancy and
environmental quality dynamics. The results showed environmental conditions affected
the life expectancy.
+ Yavari and Mehrnoosh (2006) analyzed the effects of socio-economic factors
on life expectancy using multiple regression analysis. This study showed that there is a
strong positive correlation between life expectancy, as the dependent variable, and per
capita income, health expenditures, literacy rate and daily calorie intake. It also
revealed a strong negative correlation between life expectancy and the number of
people per doctor in African countries.
+ Leung and Wang (2003) investigated the relationship between health care, life
expectancy and output using a modified neoclassical growth model. They showed income
and economic development factors have positive impacts on lifetime.
Summing up, the review of presented studies shows that the determinants of life
expectancy can be divided into the economic, social and environmental factors.
Accordingly, in this study, the impacts of these factors on life expectancy are estimated to
follow the existing literature.
2.2. Research orientation

2.2.1. Dependent variable

The dependent variable is life expectancy at birth (LEB), which differs from the
average age at death. Whereas the average age at death measures the mean age of those
who died in a given period, life expectancy at birth is the number of years a newborn
is expected to live if the factors that influence mortality at the time of birth remain
unchanged in the future. Life expectancy at birth therefore reflects the cumulative
effect of the relevant factors from the past to the present, and the impact of each
factor can change over time. Analyzing life expectancy at birth lets us assess more
accurately the impact of the related factors at a given time.
2.2.2. Independent variables

From these studies, we chose four variables whose influence on human life expectancy
we analyze: AP: Air pollution (µg/m3); GNIpc: Gross national income per capita (USD);
HE: Current health expenditure (% of GDP); PW: People using at least basic drinking
water services (% of population).
a/ Air pollution
A study indexed by the US National Library of Medicine, National Institutes of
Health, shows the influence of air quality on life expectancy in the United States from
2000 to 2007. Its main findings are as follows.
Table 2 of that study summarizes estimated regression coefficients for the
association between changes in PM2.5 and changes in life expectancy for 545 counties for
2000 to 2007 for selected regression models. When controlling for changes in all
available socioeconomic and demographic variables as well as smoking prevalence proxy
variables (Model 3), a 10 µg/m3 decrease in PM2.5 was associated with an estimated mean
increase in life expectancy of 0.35 years (SE = 0.16 years, p = 0.033). The estimated
effect of PM2.5 on life expectancy was consistent across models adjusting for various
patterns of potentially confounding variables (e.g. Models 2 – 4). Models 5 – 9 of Table
2 show the results for select stratified and weighted regressions. In counties with a
population density greater than 200 people per square mile, a 10 µg/m3 decrease in PM2.5
was associated with an increased life expectancy of 0.72 years (SE = 0.22 years,
p < 0.01) (Model 6), compared with −0.31 years (SE = 0.22 years, p = 0.165) in counties
with less than 200 people per square mile (P difference < 0.01). In counties whose
proportion of urban residences was greater than 90 percent, a 10 µg/m3 decrease in PM2.5
was associated with an increased life expectancy of 0.95 years (SE = 0.31 years,
p < 0.01) (Model 7), compared with −0.16 years (SE = 0.16 years, p = 0.299) in counties
with less than 90% urban residences (P difference < 0.01).
b/ Gross national income per capita
Higher income per capita (IPC) means better access to the health services provided
by the public and private sectors in a country. Good health services lower mortality
rates, which leads to a longer-living population with higher life expectancy at birth
(LEB) and to a healthy labour force with higher productivity. People are more
productive when they are in good health, so rising productivity and working hours in
turn raise IPC (economic growth). In traditional economic growth theory, the labour
force, as one of the factors in the production function, has an important effect on a
country’s economic growth. The study cited here investigates the two-way relationship
between LEB and IPC for 56 developing countries in North Africa, the Middle East and
South-East Asia, most of which are Islamic countries and members of The Organisation of
Islamic Cooperation (OIC).
According to the random and fixed effects estimates from its panel data analysis,
LEB is one of the determinants of IPC, and IPC is a main determinant of LEB, in the 56
developing countries. A Granger causality test applied to the panel data shows that IPC
Granger-causes LEB and vice versa. In the cross-section analysis, however, no
significant relationship between the two variables is found.
LEB and IPC relationship in 56 developing countries (2015)
The IMF Data. Available: Accessed: 12 October 2018

c/ Current health expenditure
The graph below shows the relationship between what a country spends on health
per person and life expectancy in that country between 1970 and 2015 for a number of
rich countries.
The US stands out as an outlier: it spends far more on health than any other
country, yet the life expectancy of the American population is not longer, but actually
shorter than in other countries that spend far less.
If we look at the time trend for each country, we first notice that all countries
have followed an upward trajectory—the population lives increasingly long lives as
health expenditure increases. But again, the US stands out by following a much flatter
trajectory: gains in life expectancy from additional health spending in the U.S. are much
smaller than in the other high-income countries, particularly since the mid-1980s.
This development has led to a large inequality between the US and other rich
countries. In the US health spending per capita is often more than three times higher
than in other rich countries, yet the populations of countries with much lower health
spending than the US enjoy considerably longer lives. In the most extreme case, we see
that Americans spend more than 5-times what Chileans spend, yet the population of Chile
actually lives longer than Americans.

d/ People using at least basic drinking water services
Infants and young children are the innocent victims of the worldwide failure to
make safe drinking water and basic sanitation services available to impoverished people
(see Figure 4). Their families’ poverty, lack of basic services and the resulting filthy
living environment mean that children under 5 years of age in particular are exposed to
a multitude of health threats, without the physical or economic means to combat them.
Malnutrition, particularly protein-energy malnutrition, stunts growth, impairs cognitive
development and, crucially, lowers the children’s resistance to a wide range of
infections, including the water-related diarrhoeal diseases and malaria (see Figure 5).
In developing countries, over 90% of all diarrhoeal deaths occur in children under 5
years of age (see Figure 3). In sub-Saharan Africa alone, some 769 000 children under 5
years of age died annually from diarrhoeal diseases in 2000–2003. That is more than 2000
children’s lives lost every day, in a region where just 36% of the population have
access to hygienic means of sanitation. South Asia has a similarly low sanitation
coverage. There too child mortality is very high. Some 683 000 children under 5 years of
age die each year from diarrhoeal disease. Compare that with the developed regions,
where most mothers and babies benefit from safe drinking water in quantities that make
hygiene behaviour easy, have access to safe, private sanitation, adequate nutrition, and
many other prerequisites to health. Of the 57 million children under 5 years old in the
developed regions, about 700 succumbed annually to diarrhoeal disease (according to
statistics for 2000–2003). That means that the sub-Saharan baby has almost 520 times the
chance of dying from diarrhoea compared with a baby born in Europe or the United States
of America.

3. Methodology of the study
3.1. Method used to collect secondary data

The team collected a sample of 215 observations for the year 2016, one for each of
215 countries worldwide, and the estimates in this report are based on these data. For
quantitative results, the number of outputs should be equal to the number of inputs,
which is the data collected by the statistical method.

3.2. Method used to analyze the data

Using the OLS method, we estimate the model, check the statistical significance of
the regression coefficients and the suitability of the model on the observed data, and
compare the results with previous research and similar studies to find the best results
to use for analysis.
During the course of the project, the team used knowledge of econometrics and
macroeconomics and quantitative methods, with the main support of GRETL software,
Microsoft Excel and Microsoft Word, to synthesize and complete this report.

3.3. Econometrics model

Based on established economic theory and practical experience, we have identified
the independent variables expected to affect life expectancy at birth as follows:
LEB = f(AP; GNI; HE; PW)
In which:
• LEB : Life Expectancy at Birth (year)
• AP : Air pollution (µg/m3)
• GNI : Gross national income per capita (USD)
• HE : Current health expenditure (% of GDP)
• PW : People using at least basic drinking water services (% of population)
To determine the influence of the 4 given factors on life expectancy at birth, based
on the theory presented above, we propose the following research model:
Population Regression model:
$LEB_i = \beta_1 + \beta_2 HE_i + \beta_3 AP_i + \beta_4 PW_i + \beta_5 GNI_i + u_i$
Sample Regression model:
$LEB_i = \hat{\beta}_1 + \hat{\beta}_2 HE_i + \hat{\beta}_3 AP_i + \hat{\beta}_4 PW_i + \hat{\beta}_5 GNI_i + \hat{u}_i$

Explain the variables

No   Variable   Unit                                   Meaning                                               Expected sign of regression coefficient
1    LEB        Years                                  Life expectancy at birth
2    AP         µg/m3 (microgram per cubic meter)      Air pollution PM2.5 (annual exposure)                 (-)
3    GNIpc      USD                                    Gross national income per capita                      (+)
4    HE         % of GDP                               Current health expenditure                            (+)
5    PW         % of population                        People using at least basic drinking water services   (+)

* Data source

Variable   Link
LEB        …downloadformat=excel
HE         …downloadformat=excel
GNI        …downloadformat=excel
AP         …downloadformat=excel
PW         …downloadformat=excel


4. Result estimation
4.1. Summary of all Variables

Summary Statistics, using the observations 1 - 215

Variable   Mean        Median      S.D.        Min     Max
LEB        71,8        73,2        7,49        51,6    84,0
AP         29,2        23,1        19,2        5,89    98,1
GNIpc      1,85e+004   1,29e+004   1,87e+004   740,    1,23e+005
HE         6,61        6,12        2,63        1,98    17,1
PW         87,0        93,7        15,8        38,9    100,

4.2. Correlation between dependent variable and each independent variable

The correlation among LEB, AP, GNIpc, HE and PW is checked by calculating the
correlation coefficients between these variables. The correlation coefficient r measures
the strength and direction of a linear relationship between two variables on a
scatterplot. Gretl reports the following correlation matrix:

Correlation coefficients, using the observations 1 - 215
5% critical value (two-tailed) = 0,1338 for n = 215

   LEB       AP        GNIpc     HE        PW
   1,0000   -0,4418    0,6903    0,4000    0,8394   LEB
             1,0000   -0,2686   -0,4621   -0,3510   AP
                       1,0000    0,3296    0,5689   GNIpc
                                 1,0000    0,2819   HE
                                           1,0000   PW

The correlation between the dependent variable and each independent variable:
+ The correlation between LEB and HE is 0,4000: a positive (uphill) relationship,
consistent with the theory but weak.
+ The correlation between LEB and AP is −0,4418: a negative (downhill) relationship,
consistent with the theory, of moderate strength.
+ The correlation between LEB and PW is 0,8394: a positive (uphill) relationship,
consistent with the theory, strong but not perfect.
+ The correlation between LEB and GNIpc is 0,6903: a positive (uphill) relationship,
consistent with the theory, of moderate strength.
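
The correlation coefficient and the 5% critical value reported by Gretl can be reproduced with a short sketch. The Python code below is only an illustration: the function names and the toy data are hypothetical, and the two-tailed critical t value of about 1.9712 for 213 degrees of freedom is an assumed tabulated value, not something computed here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def critical_r(n, t_crit):
    """Smallest |r| that is significant, given the critical t value for n - 2 df."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))

# Toy data (hypothetical, NOT the report's data set)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
r_toy = pearson_r(x, y)

# Assumed tabulated value: t(0.025; 213 df) is approximately 1.9712
r_crit = critical_r(215, 1.9712)
print(f"toy r = {r_toy:.4f}, critical r for n = 215: {r_crit:.4f}")
```

Every correlation with LEB in the matrix is larger in absolute value than this critical value, so each is significantly different from zero at the 5% level.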

4.3. Run the model with Gretl

Model 1: OLS, using observations 1-215
Dependent variable: LEB

          Coefficient    Std. Error     t-ratio   p-value
const     43,9735        1,82709        24,07     <0,0001   ***
AP        −0,0437552     0,0142760      −3,065    0,0025    ***
GNIpc     0,000113583    1,56374e-05    7,264     <0,0001   ***
HE        0,231884       0,103627       2,238     0,0263    **
PW        0,292728       0,0186636      15,68     <0,0001   ***

Mean dependent var   71,80154     S.D. dependent var   7,493114
Sum squared resid    2468,808     S.E. of regression   3,428736
R-squared            0,794530     Adjusted R-squared   0,790616
F(4, 210)            203,0115     P-value(F)           5,82e-71
Log-likelihood       −567,4634    Akaike criterion     1144,927
Schwarz criterion    1161,780     Hannan-Quinn         1151,736
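
As a quick consistency check on the output above, each t-ratio is the coefficient divided by its standard error. The Python sketch below recomputes them from the reported values; the critical value 1.971 is an assumed approximate two-tailed 5% value for 210 degrees of freedom, and the variable names are only illustrative.

```python
# Coefficients and standard errors copied from the Model 1 output
estimates = {
    "const": (43.9735, 1.82709),
    "AP": (-0.0437552, 0.0142760),
    "GNIpc": (0.000113583, 1.56374e-05),
    "HE": (0.231884, 0.103627),
    "PW": (0.292728, 0.0186636),
}

# Assumed approximate two-tailed 5% critical value of t with 210 df
T_CRIT = 1.971

def t_ratio(coef, se):
    """t statistic for H0: coefficient = 0."""
    return coef / se

t_ratios = {name: t_ratio(c, s) for name, (c, s) in estimates.items()}
significant = {name: abs(t) > T_CRIT for name, t in t_ratios.items()}

for name, t in t_ratios.items():
    print(f"{name:6s} t = {t:8.3f}  significant at 5%: {significant[name]}")
```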

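• Describe the basic content of the value when estimating the function:
- The Population regression function is set up:
$LEB_i = \beta_1 + \beta_2 HE_i + \beta_3 AP_i + \beta_4 PW_i + \beta_5 GNIpc_i + u_i$
- The Sample regression function is set up:
$LEB_i = \hat{\beta}_1 + \hat{\beta}_2 HE_i + \hat{\beta}_3 AP_i + \hat{\beta}_4 PW_i + \hat{\beta}_5 GNIpc_i + \hat{u}_i$
- Equation of regression:
$LEB_i = 43{,}9735 + 0{,}231884\,HE_i - 0{,}0437552\,AP_i + 0{,}292728\,PW_i + 0{,}000113583\,GNIpc_i + \hat{u}_i$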

Data explanation:

• Number of observations (Using observations = 1-215): There are 215 observations.
• F-statistic (F(4, 210) = 203,0115): The F-statistic is the Mean Square Model divided
by the Mean Square Residual, yielding F = 203,0115. The numbers in parentheses are the
Model and Residual degrees of freedom.
• Coefficient β2 = 0,231884: Holding other factors constant, when health expenditure
increases by 1 percentage point of GDP, life expectancy increases by 0,231884 years.
• Coefficient β3 = −0,0437552: Holding other factors constant, when air pollution
increases by 1 µg/m3, life expectancy decreases by 0,0437552 years.
• Coefficient β4 = 0,292728: Holding other factors constant, when the percentage of the
population using at least basic drinking water services increases by 1 percentage
point, life expectancy increases by 0,292728 years.
• Coefficient β5 = 0,000113583: Holding other factors constant, when GNI per capita
increases by 1 USD, life expectancy increases by 0,000113583 years.
• Constant β1 = 43,9735 (const): When all independent variables equal 0, the expected
value of life expectancy at birth is 43,9735 years.
• R-squared (R2 = 0,794530): R2 is quite high, which suggests that the model is a good
fit: 79,45% of the sample variation in the dependent variable (life expectancy at
birth, LEB) is explained by the independent variables (HE, PW, GNIpc, AP).
• Adjusted R-squared (adjusted R2 = 0,790616): A modified version of R-squared that has
been adjusted for the number of predictors in the model.
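
The adjusted R-squared and the standard error of the regression follow directly from the other reported statistics. The Python sketch below shows the arithmetic using the standard textbook formulas; the function names are illustrative, and k counts all five estimated coefficients including the constant.

```python
import math

n = 215          # number of observations
k = 5            # estimated coefficients, including the constant
r2 = 0.794530    # R-squared from the Gretl output
ssr = 2468.808   # sum of squared residuals from the Gretl output

def adjusted_r2(r2, n, k):
    """Adjusted R-squared: penalises R-squared for the number of regressors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

def se_of_regression(ssr, n, k):
    """Standard error of the regression: sqrt(SSR / (n - k))."""
    return math.sqrt(ssr / (n - k))

adj = adjusted_r2(r2, n, k)
ser = se_of_regression(ssr, n, k)
print(f"adjusted R-squared = {adj:.6f}, S.E. of regression = {ser:.6f}")
```

Both values agree with the Gretl output (0,790616 and 3,428736) up to rounding.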


4.4. Testing the regression coefficient

4.4.1. Testing an individual regression coefficient

• Purpose: Test the statistical significance of the effect of each independent variable
on the dependent one. We have: α = 0.05.
♦ Testing the variable of Health expenditure (HE):
• Given that the hypothesis is:
$H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$
• We see: P-value of HE is 0.0263 < 0.05
At the 5% level of significance, we have enough evidence to reject the null hypothesis
H0 → The coefficient β2 is statistically significant.
♦ Testing the variable of Air pollution (AP):
• Given that the hypothesis is:
$H_0: \beta_3 = 0$
$H_1: \beta_3 \neq 0$
• We see: P-value of AP is 0.0025 < 0.05
At the 5% level of significance, we have enough evidence to reject the null hypothesis
H0 → The coefficient β3 is statistically significant.
♦ Testing the variable of Pure water (PW):
• Given that the hypothesis is:
$H_0: \beta_4 = 0$
$H_1: \beta_4 \neq 0$
• We see: P-value of PW is < 0.0001 < 0.05
At the 5% level of significance, we have enough evidence to reject the null hypothesis
H0 → The coefficient β4 is statistically significant.
♦ Testing the variable of Gross national income per capita (GNIpc):
• Given that the hypothesis is:
$H_0: \beta_5 = 0$
$H_1: \beta_5 \neq 0$
• We see: P-value of GNIpc is < 0.0001 < 0.05
At the 5% level of significance, we have enough evidence to reject the null hypothesis
H0 → The coefficient β5 is statistically significant.
4.4.2. Testing the overall significance

• Purpose: Test the null hypothesis stating that none of the explanatory variables has
an effect on the dependent variable. We have: α = 0.05
• Given that the hypothesis is:
$H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
$H_1: \exists\, \beta_i \neq 0 \ (i = 2, 3, 4, 5)$
• We have: P-value(F) = 5.82e-71 < α = 0.05
As a result, at the 5% level of significance, there is enough evidence to reject the
null hypothesis and conclude that at least one independent variable in the subset (HE,
PW, AP, GNIpc) does have explanatory or predictive power on LEB, so we do not drop this
subset from the model.
◊ The model is statistically significant overall.
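
The F-statistic for overall significance can be recovered from R-squared alone. The Python sketch below uses the standard formula F = (R²/(k−1)) / ((1−R²)/(n−k)); the names are illustrative, and the 5% critical value of roughly 2.41 for F(4, 210) is an assumed approximate tabulated value.

```python
n = 215        # number of observations
k = 5          # estimated coefficients, including the constant
r2 = 0.794530  # R-squared from the Gretl output

def f_from_r2(r2, n, k):
    """Overall-significance F statistic with (k - 1, n - k) degrees of freedom."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

f_stat = f_from_r2(r2, n, k)

# Assumed approximate 5% critical value of F(4, 210)
F_CRIT = 2.41
reject_h0 = f_stat > F_CRIT
print(f"F(4, 210) = {f_stat:.4f}, reject H0 at 5%: {reject_h0}")
```

The small difference from the reported 203,0115 comes from R-squared being rounded to six decimals.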


4.5. Testing multicollinearity

Multicollinearity is a high degree of correlation among the explanatory variables.
It can make it difficult to separate the effects of the individual regressors: standard
errors may be inflated and t-values depressed. Multicollinearity can be detected by
examining the correlation matrix of the regressors and by carrying out auxiliary
regressions among them.
• We use the variance inflation factor (VIF) to examine multicollinearity. If a
variable has VIF > 10, the model may suffer from multicollinearity.
• Using the “VIF” command in Gretl, we have the following result:
Variance Inflation Factors
Minimum possible value = 1.0
Values > 10.0 may indicate a collinearity problem

   AP      1,363
   GNIpc   1,550
   HE      1,350
   PW      1,578

VIF(j) = 1/(1 - R(j)^2), where R(j) is the multiple correlation coefficient
between variable j and the other independent variables

Belsley-Kuh-Welsch collinearity diagnostics:

                    variance proportions
   lambda   cond     const   AP      GNIpc   HE      PW
   4,170    1,000    0,001   0,009   0,012   0,005   0,001
   0,533    2,796    0,001   0,180   0,319   0,003   0,000
   0,228    4,273    0,003   0,296   0,438   0,135   0,003
   0,058    8,461    0,041   0,308   0,021   0,788   0,107
   0,010    20,902   0,954   0,207   0,209   0,069   0,889

lambda = eigenvalues of inverse covariance matrix (smallest is 0,00954623)
cond = condition index
note: variance proportions columns sum to 1.0
According to BKW, cond >= 30 indicates "strong" near linear dependence,
and cond between 10 and 30 "moderately strong". Parameter estimates whose
variance is mostly associated with problematic cond values may themselves
be considered problematic.
Count of condition indices >= 30: 0
Count of condition indices >= 10: 1
Variance proportions >= 0.5 associated with cond >= 10:

   const   PW
   0,954   0,889

We see:
VIF (AP) = 1.363 < 10
VIF (GNIpc) = 1.550 < 10
VIF (HE) = 1.350 < 10
VIF (PW) = 1.578 < 10
→ No VIF exceeds 10, so the model shows no serious multicollinearity.
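
To make the VIF formula in the output concrete, the Python sketch below inverts VIF(j) = 1/(1 − R(j)²) to recover the R-squared of each auxiliary regression implied by the reported VIF values. The function names are illustrative; the auxiliary regressions themselves are not re-estimated here.

```python
def vif_from_r2(r2_aux):
    """Variance inflation factor from the auxiliary-regression R-squared."""
    return 1.0 / (1.0 - r2_aux)

def r2_from_vif(vif):
    """Auxiliary-regression R-squared implied by a given VIF."""
    return 1.0 - 1.0 / vif

# VIF values copied from the Gretl output
vifs = {"AP": 1.363, "GNIpc": 1.550, "HE": 1.350, "PW": 1.578}

implied_r2 = {name: r2_from_vif(v) for name, v in vifs.items()}
no_serious_collinearity = all(v < 10.0 for v in vifs.values())

for name, r2_aux in implied_r2.items():
    print(f"{name:6s} VIF = {vifs[name]:.3f}  implied auxiliary R2 = {r2_aux:.3f}")
```

A VIF of 10 corresponds to an auxiliary R-squared of 0.9, whereas here no regressor has more than about 37% of its variation explained by the others.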

4.6. Testing normality of residual

• Given that the hypothesis is:
$H_0$: The residuals are normally distributed
$H_1$: The residuals are not normally distributed
• Using normality of residual in Gretl:
Frequency distribution for uhat4, obs 1-215
number of bins = 15, mean = 2,44559e-015, sd = 3,42874

        interval            midpt     frequency   rel.      cum.

           <  -12,207      -13,044        1       0,47%     0,47%
  -12,207  -  -10,533      -11,370        1       0,47%     0,93%
  -10,533  -  -8,8600      -9,6967        4       1,86%     2,79%
  -8,8600  -  -7,1865      -8,0232        4       1,86%     4,65%
  -7,1865  -  -5,5130      -6,3497        3       1,40%     6,05%
  -5,5130  -  -3,8395      -4,6762       14       6,51%    12,56% **
  -3,8395  -  -2,1660      -3,0027       14       6,51%    19,07% **
  -2,1660  -  -0,49249     -1,3292       41      19,07%    38,14% ******
  -0,49249 -   1,1810       0,34426      54      25,12%    63,26% *********
   1,1810  -   2,8545       2,0178       42      19,53%    82,79% *******
   2,8545  -   4,5280       3,6913       25      11,63%    94,42% ****
   4,5280  -   6,2015       5,3648        8       3,72%    98,14% *
   6,2015  -   7,8750       7,0383        3       1,40%    99,53%
   7,8750  -   9,5485       8,7118        0       0,00%    99,53%
          >=   9,5485      10,385         1       0,47%   100,00%

Test for null hypothesis of normal distribution:
Chi-square(2) = 17,183 with p-value 0,00019

• We see: Chi-square(2) = 17,183 with p-value 0.0002 < α = 0.05
At the 5% level of significance, we have enough evidence to reject the null hypothesis
H0 → The residuals are not normally distributed.
• Remedy: Increase the number of observations until n ≥ 384.
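
The p-value attached to the normality statistic can be checked by hand, because the upper-tail probability of a chi-square variable with 2 degrees of freedom has the closed form exp(−x/2). The Python sketch below applies it to the reported statistic; it only verifies the p-value and does not reproduce the normality test statistic itself.

```python
import math

def chi2_sf_2df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 2 df."""
    return math.exp(-x / 2.0)

stat = 17.183   # Chi-square(2) statistic from the Gretl output
alpha = 0.05

p_value = chi2_sf_2df(stat)
reject_normality = p_value < alpha
print(f"p-value = {p_value:.5f}, reject H0 at 5%: {reject_normality}")
```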

4.7. Testing Heteroskedasticity

Heteroskedasticity means that the variance of the error term is not constant. In
that case the least squares estimates are no longer efficient, and the results of t
tests and F tests may be misleading. Heteroskedasticity can be detected by plotting the
residuals against each of the regressors or, most commonly, by White’s test. It can be
remedied by respecifying the model, for example by looking for missing variables.


In Gretl, White’s test is run on the estimated model (for example with the command
modtest --white).
• Given that the hypothesis is:
$H_0$: The model does not have heteroskedasticity
$H_1$: The model has heteroskedasticity
• White’s test:
White's test for heteroskedasticity
OLS, using observations 1-215
Dependent variable: uhat^2

             coefficient     std. error     t-ratio   p-value
  ---------------------------------------------------------
  const      91,9429         51,9750         1,769    0,0784  *
  AP         −1,06955        0,678559       −1,576    0,1166
  GNIpc      0,00411162      0,00258476      1,591    0,1133
  HE         2,66988         7,03961         0,3793   0,7049
  PW         −1,50364        1,37435        −1,094    0,2752
  sq_AP      0,00171615      0,00357148      0,4805   0,6314
  X2_X3      −2,73513e-06    4,09125e-06    −0,6685   0,5046
  X2_X4      0,0228975       0,0517823       0,4422   0,6588
  X2_X5      0,00810285      0,00611762      1,325    0,1868
  sq_GNIpc   1,41019e-09     4,54122e-09     0,3105   0,7565
  X3_X4      −4,17399e-05    4,56088e-05    −0,9152   0,3612
  X3_X5      −3,87740e-05    2,85745e-05    −1,357    0,1763
  sq_HE      0,312751        0,181513        1,723    0,0864  *
  X4_X5      −0,0689671      0,0641929      −1,074    0,2840
  sq_PW      0,00789122      0,00988041      0,7987   0,4254

Unadjusted R-squared = 0,208014
Test statistic: TR^2 = 44,722940,
with p-value = P(Chi-square(14) > 44,722940) = 0,000045

• We see:
p-value = P(Chi-square(14) > 44,722940) = 0,000045 < α = 0.05
→ At the 5% significance level, there is enough statistical evidence to reject the
null hypothesis H0 and conclude that the model exhibits heteroskedasticity.
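
White’s test statistic is the number of observations times the R-squared of the auxiliary regression, compared with a chi-square distribution whose degrees of freedom equal the number of auxiliary regressors (14 here). The Python sketch below reproduces the statistic and its p-value from the reported figures. The helper name is hypothetical, and the closed-form tail probability it uses holds only for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum of (x/2)^j / j! for j = 0 .. df/2 - 1
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

n = 215            # observations
r2_aux = 0.208014  # unadjusted R-squared of the auxiliary regression
df = 14            # regressors in the auxiliary regression, excluding the constant

lm_stat = n * r2_aux
p_value = chi2_sf_even_df(lm_stat, df)
print(f"TR^2 = {lm_stat:.4f}, p-value = {p_value:.6f}")
```

The result matches the reported TR^2 and p-value up to the rounding of R-squared.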
• Method: Using robust standard errors to fix the problem:
Model 5: OLS, using observations 1-215
Dependent variable: LEB
Heteroskedasticity-robust standard errors, variant HC1

          Coefficient    Std. Error     t-ratio   p-value
const     43,9735        2,41354        18,22     <0,0001   ***
AP        −0,0437552     0,0140952      −3,104    0,0022    ***
GNIpc     0,000113583    1,68835e-05    6,727     <0,0001   ***
HE        0,231884       0,148813       1,558     0,1207
PW        0,292728       0,0242668      12,06     <0,0001   ***

Mean dependent var   71,80154     S.D. dependent var   7,493114
Sum squared resid    2468,808     S.E. of regression   3,428736
R-squared            0,794530     Adjusted R-squared   0,790616
F(4, 210)            170,3188     P-value(F)           9,81e-65
Log-likelihood       −567,4634    Akaike criterion     1144,927
Schwarz criterion    1161,780     Hannan-Quinn         1151,736

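→ Robust standard errors do not remove heteroskedasticity itself, but they keep
statistical inference valid in its presence. The coefficient estimates are unchanged;
only the standard errors, t-ratios and p-values differ. With robust standard errors, HE
is no longer significant at the 5% level (p-value = 0,1207).
After using robust standard errors, the estimated model is:
$LEB_i = 43{,}9735 - 0{,}0437552\,AP_i + 0{,}000113583\,GNIpc_i + 0{,}231884\,HE_i + 0{,}292728\,PW_i + \hat{u}_i$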
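
To illustrate what the HC1 correction does, the Python sketch below computes classical and HC1 robust standard errors for the slope of a simple regression with one regressor. This is a simplified illustration on hypothetical toy data, not the five-coefficient model of this report: the HC0 variance of the slope is the sum of (x − x̄)²·û² divided by the squared sum of (x − x̄)², and HC1 scales it by n/(n − k), with k = 2 here.

```python
import math

def ols_simple_with_hc1(x, y):
    """Slope, intercept, classical SE and HC1 robust SE of the slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical SE: assumes a constant error variance
    s2 = sum(u ** 2 for u in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # HC1 SE: weights each squared residual by its own (x - mean)^2
    meat = sum(((xi - mean_x) ** 2) * (u ** 2) for xi, u in zip(x, resid))
    se_hc1 = math.sqrt(n / (n - 2) * meat / sxx ** 2)
    return slope, intercept, se_classical, se_hc1

# Toy data (hypothetical, NOT the report's data set)
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]
slope, intercept, se_classical, se_hc1 = ols_simple_with_hc1(x, y)
print(f"slope = {slope:.3f}, classical SE = {se_classical:.4f}, HC1 SE = {se_hc1:.4f}")
```

As in the Gretl output, the coefficient itself is identical under both approaches; only the standard error, and therefore the t-ratio and p-value, changes.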
