FOREIGN TRADE UNIVERSITY
FACULTY OF INTERNATIONAL ECONOMICS
…………..o0o…………..

ECONOMETRICS FINAL EXAM
TOPIC: THE DETERMINANTS IMPACTING ON
FEMALE LABOR FORCE PARTICIPATION

Class : 57 JIB

Lecturer : Dr. Tu Thuy Anh
Dr. Chu Thi Mai Phuong
Group 21 : Nguyễn Đặng Sơn- 1815520093
Nguyễn Thị Thương-1815520224
Chu Diệu Linh-1815520186

Ha Noi - 2019

TABLE OF CONTENTS

INTRODUCTION
LITERATURE REVIEW
    1. Definition
    2. Theoretical Framework
ANALYSIS
SECTION 1: METHODOLOGY, DESCRIBE THE VARIABLE, DATA, STATISTIC AND CORRELATION.
    I. METHODOLOGY OF RESEARCH
    II. DESCRIBE THE DATA
        1. Data overview
        2. Data description
SECTION 2: ESTIMATED MODEL AND STATISTICAL INFERENCES.
    I. LINEAR-LINEAR MODEL
        1. Estimation
        2. Testing
            2.1. Testing hypothesis
            2.2. Testing the model’s problems
    II. LOG-LINEAR MODEL
        1. Estimation
        2. Testing
            2.1. Testing hypothesis
            2.2. Testing the model’s problems
            2.3. Testing Heteroskedasticity
CONCLUSION
REFERENCES
APPENDIX


INTRODUCTION
Although women make up just over half of the adult population worldwide, they
are underrepresented in the workforce: women work at a lower rate than men in
nearly every country, and their contribution to measured economic activity,
economic growth, and well-being is far below its potential. According to the
World Bank (2013), women now represent around 40 percent of the global labor
force, but in most countries female labor force participation is much lower than
that of men. However, these gender differences have been narrowing
substantially, and in most countries the share of women in the labor force is
higher today than half a century ago.
Generally, high female participation in the labor market implies two things:
advancement in women’s economic and social position, and the empowerment of
women. Increasing female labor force participation rates creates an opportunity
for countries to increase the size of their workforce and achieve additional
economic growth. A clear understanding of the factors behind participation and
of their effect on women’s propensity to participate plays a very important role
in determining the prospective growth and development of countries. It might
help us come up with new ways to encourage female participation or to address
the problems that discourage females from participating in the labor market.
Although countless factors may affect FLFP (Female Labor Force Participation),
whether they really have an effect, and how large that effect is, remains an
open question. Moreover, there are currently few in-depth studies on this issue.
Therefore, our team decided to choose the topic: “The determinants impacting on
the Female Labor Force Participation”.
We use data set 4.5 from Ramanathan (Women’s labor force participation), which
is prepackaged with Gretl, to analyze factors such as unemployment, the share of
white females, education, and median earnings. With the help of the linear
regression model and the log-linear model, in combination with the OLS
estimation method, we attempt to examine the relationship between the above
factors and FLFP, and further ask which factors drive FLFP changes over time
within countries, and which factors account for differences in FLFP rates
between countries.


LITERATURE REVIEW
1. Definition
Female Labor Force Participation is defined as women’s decision to be part of
the economically active population (employed or unemployed) rather than part of
the economically inactive population (those neither working nor seeking work).
FLFP is an important indicator of women’s status and a benchmark of female
empowerment in society (Kapsos, Silberman and Bourmpoula 2014; ILO).
2. Theoretical Framework
The theoretical framework on FLFP reflects a woman’s decision to be an active
participant versus an inactive participant in the labor market. Economists have
tried to explain women’s propensity to choose one option over the other by
analyzing the impact of certain economic and demographic factors that they
believe affect women’s tendency to participate in or opt out of the labor
market. The main theories that have been used to analyze the labor supply of
women include the “Human Capital Theory” by Becker and the “Work-Leisure Choice
theory” by Mincer.
2.1. The Work-Leisure Choice theory
The simplest analysis of women’s choice goes back to Mincer (1962) and the
neoclassical microeconomic model known as the Work-Leisure Choice model. It
assumes that households, the suppliers of labor in an economy, are rational and
seek to maximize their utility by deciding how much time to devote to work and
how much to leisure. The theory was explained by Psacharopoulos and Tzannatos
(1989), who further added that since the choice is based on the remuneration
from work (the wage rate), the higher the wage rate, the less attractive leisure
becomes and the more attractive work becomes. This relation has two effects: a
substitution effect and an income effect. Firstly, for those who are not
working, a higher wage may encourage them to join the labor market because the
opportunity cost of not working is high; thus higher wages are said to stimulate
higher participation. Secondly, for those already working, a higher wage makes
work more attractive because it has a higher rate of return than leisure.
Encouraging participation or longer working hours as a result of an increase in
the wage rate is known as the substitution effect, as leisure time becomes more
costly. Individuals then tend to devote more time to work rather than leisure.
On the other hand, as the wage rate increases, an individual’s real income
rises, which leads to an increase in the consumption of normal goods. If, as
assumed, leisure is a normal good, the higher wage persuades individuals to
consume a larger quantity (time) of leisure and to reduce their hours of work;
this is known as the income effect of a wage increase (FRF 1979; Heckman 2014).
According to the textbook “Race, Class, and Gender”, “Women are at a higher
risk of financial disadvantage in modern-day society than men”. Statistical
findings suggest that women are paid less than men for similar jobs despite
having the same qualifications. The statistical data collected by the U.S.
Department of Labor suggest that women are discriminated against in the
workforce based on gender. The textbook reads, “Women’s wages are also more
volatile than men’s wages, and women face a much higher risk of seeing large
drops in income than do men” (Kennedy 2008).
2.2. Human Capital Theory
After the Work-Leisure Choice theory, Human Capital Theory was developed.
According to Becker (1975), human capital can be defined as the productive
investments embodied in individuals, including skills, abilities, knowledge,
habits, and social attributes, often resulting from expenditures on education,
on-the-job training programs, and medical care. Human capital theory was then
used to analyze the relationship between labor force participation and
education, specifically for married women. Economists argue that the
relationship may be U-shaped across educational attainment categories: the
participation rate was found to be high for illiterate women, lower for women
with primary and secondary education, and higher again for university graduates.
The positive relation between education and the wage rate can explain this
U-shaped relationship (Schultz 1961). Higher labor force participation at low
levels of education (illiteracy, and thus low wages) can be explained by the
need to earn some income for survival, that is, a subsistence wage. Furthermore,
the low level of participation of married women with primary and secondary
education might be explained by the fact that women with such levels of
education mostly seek job opportunities only in specific occupations such as
secretarial work. Thus, when there is a shortage of such jobs, these women tend
to stay home. Besides that, it is common in most developing countries for women
with lower levels of education to work in the household (household production)
or in the informal sector, which is excluded from the definition of the labor
force. Consequently, informal sector workers are not included in the labor force
and thus are not reflected in the FLFP rate (FLFPR), which therefore indicates a
low female participation rate (Cameron et al. 2001; Lincove 2005; Schultz 1961).

In particular, studies of female labor force participation suggest that the most
important personal variable influencing the FLFPR is education. The hypothesis
that education can generally be treated as an investment in human capital has
proved to be influential and helpful, and to be a key ingredient in studies of
the sources of economic development and the distribution of income all over the
world. Education is mostly regarded as a specialized form of human capital whose
contribution to economic growth is noteworthy. Human capital theory proposes
that just as physical capital (machines) enhances people’s economic efficiency,
so human capital acquired through education improves the productivity and
efficiency of individuals. Studies of the sources of economic growth confirm
that education plays a major role in increasing output per worker. Accordingly,
the new development theories in economics shed light on the importance of
education and human resource development for long-term economic growth.
Education is usually regarded as the catalyst or engine of growth and
development in the new world economy (Becker 1975; Psacharopoulos and Tzannatos
1989; Taubman and Wales 1975; OECD 1989).
2.3 Other Factors Influencing Female Labor Force Participation
2.3.1 Age Factor
Women in their twenties and thirties are more likely to participate in the
labor market than their counterparts in other age groups. On the one hand, a
study undertaken in Kuwait and Jordan found empirically that age negatively
affects FLFP. On the other hand, a study undertaken in Pakistan showed that the
effect of age on FLFP is positive only up to the age of 49, after which age
negatively affects women’s tendency to participate in the labor market. It was
then concluded that age could positively or negatively affect FLFP, depending on
the age group considered.
2.3.2. Urbanization factor
In urban areas there may be more paid employment opportunities than in rural
areas. Thus, the higher the proportion of the population living in urban areas,
the higher female labor force participation will be. However, many women in
rural areas participate in the labor force in agriculture as unpaid family
workers. Thus, if a province has a large rural population, female labor force
participation may be high. This implies a negative impact of the urban share of
a province on female labor force participation. The net effect of the urban
share can therefore only be determined empirically.
2.3.3. Unemployment factor
The effect of the unemployment rate on female labor force participation is
ambiguous, depending on the relative strengths of the “discouraged-worker
effect” and the “added-worker effect”. Unemployment affects the probability
that women entering the labor market will find a job. The higher the provincial
unemployment rate, the less likely it is that women will find a job. The
economic and psychological costs associated with job search are higher when the
local unemployment rate is high. The unemployment rate of women compared to men
suggests that single women are discriminated against based on gender. Anderson
writes, “All women are disproportionately at risk in the current foreclosure
crisis, since women are 32% more likely than men to have subprime mortgages
(One-third of women, compared to one-fourth of men, have subprime mortgages;
and, the disparity between women and men increases in higher income brackets)”
(Anderson 265). These statistics illustrate the large difference between men and
women with regard to finances. It can be inferred that men are favored in the
workforce over women. Women are discriminated against based on their gender and
thus are more likely to struggle financially because of discriminatory
employers. For these reasons, women may be discouraged from looking for a job
and drop out of the labor force. Therefore, the discouraged-worker hypothesis
implies a negative effect of local unemployment on female labor force
participation.



ANALYSIS
SECTION 1: METHODOLOGY, DESCRIBE THE VARIABLE, DATA, STATISTIC
AND CORRELATION.
I. METHODOLOGY OF RESEARCH

We use a quantitative method to determine the relationship between women’s
labor force participation and the relevant influential factors.
We implement two models, a linear-linear regression and a log-linear regression,
both estimated by ordinary least squares (OLS), to determine the direction of
the impact of the independent variables on the dependent variable and the values
of the regression coefficients.

II. DESCRIBE THE DATA
1. Data overview

- Data source: we use a data set prepackaged with Gretl.
- Structure of the economic data: cross-sectional data.

2. Data description
2.1 A brief description of each variable is given in Exhibit 1

Variable   Abbreviation   Meaning                                                  Unit
wlfp       Y              Persons ≥ 16 years: % in labor force who are female      %
yf         X1             Median earnings of females ≥ 15 years                    $000s
educ       X2             Females ≥ 25 years: % high school graduates or above     %
ue         X3             Civilian labor force: % unemployed                       %
urb        X4             Percent of population living in urban areas              %
wh         X5             Females ≥ 16 years: percent white                        %

(Exhibit 1. Description of each variable / Source: self-aggregated from Gretl)
2.2. Descriptive statistics of the variables

             wlfp      yf      educ    ue      urb     wh      l_wlfp
Mean         57.47     18.42   76.11   6.160   68.18   65.91   4.049
Median       57.75     18.08   77.10   6.150   68.80   69.13   4.056
Minimum      42.60     14.27   64.50   3.500   32.20   24.69   3.752
Maximum      66.40     25.62   86.10   9.600   92.60   77.73   4.196
Std. Dev.    4.249     2.703   5.736   1.364   14.67   9.379   0.07670

(Exhibit 2. Descriptive statistics of the variables / Source: self-synthesis
based on Gretl; l_wlfp is the natural logarithm of wlfp)

2.3 Describe the correlation between variables

Correlation matrix for the linear-linear model

Before running the regression model, we examine the degree of correlation
between the variables using Gretl’s correlation command.
Correlation Coefficients, using the observations 1 - 50
5% critical value (two-tailed) = 0.2787 for n = 50 observations

          wlfp      yf        educ      ue        urb       wh
wlfp      1.0000    0.5476    0.6582   -0.5887    0.2705   -0.1039
yf                  1.0000    0.3883   -0.0488    0.6178   -0.1264
educ                          1.0000   -0.3986    0.2340    0.2262
ue                                      1.0000   -0.1607   -0.0651
urb                                               1.0000   -0.2293
wh                                                          1.0000

(Exhibit 3. The correlation between variables / Source: self-aggregated from Gretl)
Looking at the correlation table, we draw some comments:
+ r(yf, Y) = 0.5476 > 0 => The variable yf is positively correlated with the
variable Y. On that basis, the regression coefficient of yf is expected to be
positive (+). The correlation between yf and Y is moderate (r = 0.5476).
+ r(educ, Y) = 0.6582 > 0 => The variable educ is positively correlated with the
variable Y. On that basis, the regression coefficient of educ is expected to be
positive (+). This is the strongest correlation with Y in the table
(r = 0.6582).
+ r(ue, Y) = -0.5887 < 0 => The variable ue is negatively correlated with the
variable Y. On that basis, the regression coefficient of ue is expected to be
negative (-).
+ r(urb, Y) = 0.2705 > 0 => The variable urb is positively correlated with the
variable Y. On that basis, the regression coefficient of urb is expected to be
positive (+). Living in an urban or rural area is related to female labor force
participation: urban areas offer more employment opportunities, which may bring
more women into the labor force. The correlation is weak (r = 0.2705) and lies
slightly below the 5% critical value of 0.2787.
+ r(wh, Y) = -0.1039 < 0 => The variable wh is negatively correlated with the
variable Y. On that basis, the regression coefficient of wh is expected to be
negative (-). The negative sign suggests that labor force participation tends to
be lower where the share of white women is higher, but the correlation is very
weak and well below the 5% critical value of 0.2787.
In general, the correlation between the independent variables is not high; the
highest correlation coefficient is only 0.6178, between urb and yf. Because no
correlation coefficient exceeds 0.8 in magnitude, we can expect that the model
has no multicollinearity problem when regressing.
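
The 5% critical value quoted with Exhibit 3 can be reproduced from the t distribution, which gives a quick way to see which correlations with wlfp are individually significant. The sketch below is illustrative only: the t critical value 2.0106 (two-tailed, 48 degrees of freedom) is an assumption taken from standard t tables, and the names `critical_r` and `corr_with_wlfp` are hypothetical.

```python
import math

# Assumption: two-tailed 5% critical value of Student's t with
# n - 2 = 48 degrees of freedom, taken from standard t tables.
T_CRIT_48 = 2.0106
N_OBS = 50


def critical_r(t_crit, n):
    """Smallest |r| that is significant, given the t critical value."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))


# Correlations with wlfp, copied from Exhibit 3.
corr_with_wlfp = {"yf": 0.5476, "educ": 0.6582, "ue": -0.5887,
                  "urb": 0.2705, "wh": -0.1039}

r_crit = critical_r(T_CRIT_48, N_OBS)
significant = {name: abs(r) > r_crit for name, r in corr_with_wlfp.items()}

print(f"critical |r| = {r_crit:.4f}")
for name, r in corr_with_wlfp.items():
    print(f"{name:>4}: r = {r:+.4f}  significant at 5%: {significant[name]}")
```

Under these assumptions, yf, educ and ue pass the threshold while urb and wh do not, which is worth keeping in mind when reading the expected signs above.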

SECTION 2: ESTIMATED MODEL AND STATISTICAL INFERENCES.
I. LINEAR-LINEAR MODEL
1. Estimation.

The estimated function is set up as follows:
- The population regression function (PRF):
wlfp = β0 + β1yf + β2educ + β3ue + β4urb + β5wh + ui

- The sample regression function (SRF):
wlfp = β̂0 + β̂1yf + β̂2educ + β̂3ue + β̂4urb + β̂5wh + ei (ei is the residual)
- The estimated regression equation:
wlfp = 41.5811 + 0.796960yf + 0.284961educ − 1.45164ue − 0.0744791urb
− 0.0978928wh + e

● Meaning of the coefficients

β̂0 = 41.5811 is the estimate of β0: when the independent variables yf, educ, ue,
urb and wh are all 0, the mean value of the dependent variable equals 41.5811.
ei is the residual (the estimate of ui).
β̂1, β̂2, β̂3, β̂4, β̂5 are the estimated slope coefficients: when one of the
variables yf, educ, ue, urb, wh changes by one unit (the remaining variables
held constant), the mean value of the dependent variable (wlfp) changes by β̂1,
β̂2, β̂3, β̂4, β̂5 respectively.
β̂1 = 0.796960: when yf increases by 1 (thousand dollars of median female
earnings), holding the other variables constant, the estimated value of wlfp
increases by 0.796960 percentage points.
β̂2 = 0.284961: when educ increases by 1 (percentage point of high school
graduates), holding the other variables constant, the estimated value of wlfp
increases by 0.284961 percentage points.
β̂3 = −1.45164: when ue increases by 1 (percentage point unemployed), holding
the other variables constant, the estimated value of wlfp decreases by 1.45164
percentage points.
β̂4 = −0.0744791: when urb increases by 1 (percentage point of the population
living in urban areas), holding the other variables constant, the estimated
value of wlfp decreases by 0.0744791 percentage points.
β̂5 = −0.0978928: when wh increases by 1 (percentage point of white females),
holding the other variables constant, the estimated value of wlfp decreases by
0.0978928 percentage points.
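
As a sanity check on these interpretations, the fitted equation can be evaluated directly. The following is a minimal sketch, not part of the original Gretl workflow: it plugs the sample means from Exhibit 2 into the estimated equation (an OLS fit with an intercept should reproduce the mean of wlfp, about 57.47) and then raises ue by one point. The helper `predict` and the scenario `higher_ue` are hypothetical names.

```python
# Estimated coefficients of the linear-linear model (from the regression equation).
coef = {"const": 41.5811, "yf": 0.796960, "educ": 0.284961,
        "ue": -1.45164, "urb": -0.0744791, "wh": -0.0978928}

# Sample means of the regressors (from Exhibit 2).
means = {"yf": 18.42, "educ": 76.11, "ue": 6.160, "urb": 68.18, "wh": 65.91}


def predict(coef, x):
    """Fitted wlfp for one observation given as a dict of regressor values."""
    return coef["const"] + sum(coef[name] * value for name, value in x.items())


wlfp_at_means = predict(coef, means)

# Hypothetical scenario: unemployment one percentage point above its mean,
# every other regressor held at its mean.
higher_ue = dict(means, ue=means["ue"] + 1.0)
effect_of_ue = predict(coef, higher_ue) - wlfp_at_means

print(f"fitted wlfp at the sample means: {wlfp_at_means:.2f}")
print(f"change when ue rises by one point: {effect_of_ue:.5f}")
```

The difference between the two predictions equals the ue coefficient, which is exactly what "holding the other variables constant" means.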
● The coefficient of determination R2

In the results, we can see R2, which indicates how much of the variability of
the response data around its mean is explained by the model.
R2 = 0.758038 is quite high, which suggests that the model is a good fit: it
means that 75.8038% of the sample variation in the dependent variable (women’s
labor force participation) is explained by the changes in the independent
variables (median earnings, education, unemployment, urban area and white
females). Other factors that are not included explain the remaining 24.1962% of
the variation in wlfp.
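
Because R2 never falls when regressors are added, it can help to look at the adjusted R2 as well. The sketch below computes it from the reported R2 with n = 50 observations and k = 5 regressors; the adjusted value is derived here from the standard formula and is not quoted from the Gretl output, and `adjusted_r2` is a hypothetical helper name.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with an intercept and k slope coefficients."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


r2_linear = 0.758038          # reported R2 of the linear-linear model
adj_r2_linear = adjusted_r2(r2_linear, n=50, k=5)

print(f"R2 = {r2_linear:.4f}, adjusted R2 = {adj_r2_linear:.4f}")
```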

2. Testing

2.1. Testing hypothesis

2.1.1. Testing an individual regression coefficient
Purpose: Test the statistical significance, that is, the effect, of each
independent variable on the dependent one. We have: α = 0.05.
● Testing the variable of median earnings of females (yf):
● Given that the hypothesis is:
H0: β1 = 0
H1: β1 ≠ 0
● We see: p-value of yf = 1.09e-05 < 0.05 → Reject H0 → The
coefficient β1 is statistically significant.

● Testing the variable educ:
● Given that the hypothesis is:
H0: β2 = 0
H1: β2 ≠ 0
● We see: p-value of educ < 0.0001 < 0.05 → Reject H0 → The
coefficient β2 is statistically significant.
● Testing the variable ue:
● Given that the hypothesis is:
H0: β3 = 0
H1: β3 ≠ 0
● We see: p-value of ue = 1.23e-06 < 0.05 → Reject H0 → The
coefficient β3 is statistically significant.
● Testing the variable urb:
● Given that the hypothesis is:
H0: β4 = 0
H1: β4 ≠ 0
● We see: p-value of urb = 0.0118 < 0.05 → Reject H0 → The
coefficient β4 is statistically significant.
● Testing the variable wh:
● Given that the hypothesis is:
H0: β5 = 0
H1: β5 ≠ 0
● We see: p-value of wh = 0.0098 < 0.05 → Reject H0 → The
coefficient β5 is statistically significant.
2.1.2. Testing the overall significance.
Purpose: Test the null hypothesis stating that none of the explanatory variables
has an effect on the dependent variable. We have: α = 0.05
Given that the hypothesis is:
H0: β1 = β2 = β3 = β4 = β5 = 0
H1: at least one βi ≠ 0 (i = 1, 2, 3, 4, 5)

We have: P-value(F) = 1.57e-12 < α = 0.05 → Reject H0 → The slope coefficients
are not all simultaneously equal to zero → At least one variable has an effect
on the dependent one.
The model is statistically significant overall.
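
The F statistic behind this p-value can be recovered from R2 alone. The sketch below uses the reported R2 = 0.758038 with n = 50 and k = 5; the 5% critical value of about 2.43 for F(5, 44) is an assumption read from standard F tables rather than from the Gretl output, and `overall_f` is a hypothetical helper name.

```python
def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Assumption: approximate 5% critical value of F(5, 44) from standard tables.
F_CRIT_5_44 = 2.43

f_stat = overall_f(0.758038, n=50, k=5)
reject_h0 = f_stat > F_CRIT_5_44

print(f"F(5, 44) = {f_stat:.2f}, reject H0 at 5%: {reject_h0}")
```

An F statistic this far above the critical value is consistent with the very small P-value(F) reported above.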
2.2. Testing the model’s problems
2.2.1. Testing multicollinearity
Multicollinearity is the phenomenon in which the independent variables in the
model depend on one another and can be expressed as a function of each other.

Multicollinearity has many causes, but three are the most common:
+ The data collected are limited and not comprehensive.
+ The independent variables are correlated by nature.
+ Some model specifications themselves produce multicollinearity.
✔ Consequences of multicollinearity:

+ The variances of the estimates are inflated, so the estimates become less
precise.
+ The t-statistics become smaller than they should be even when R2 is quite
high. The t-test and F-test become less reliable.
+ The estimates are sensitive to small changes in the data.
+ The estimates are likely to change when variables involved in the
multicollinearity are dropped or added.
To check whether the model has a multicollinearity problem, we use two methods:
the VIF (Variance Inflation Factors) and the pairwise correlations between the
variables.

a. VIF (Variance Inflation Factors)

We use the vif command after the regression to examine multicollinearity. It
reports the variance inflation factor of each variable; if a variable’s VIF is
greater than 10, the model may suffer from multicollinearity.
● Every variable has a VIF below 10.
● Mean VIF = 1.523 < 10
● Conclusion: No multicollinearity is found.
b. Correlation

According to the correlation table above (Exhibit 3), the correlation between
each pair of independent variables is lower than 0.8. Again, we have enough
statistical evidence to conclude that no multicollinearity is found in our
model.
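
The two checks are closely linked: the VIF of each regressor equals the corresponding diagonal element of the inverse of the regressors' correlation matrix. The sketch below is a rough illustration in plain Python, using the correlations from Exhibit 3 (rounded to four decimals, so the result is approximate); the Gauss-Jordan helper `invert` is hypothetical and is not how Gretl computes VIFs internally.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Correlations among the regressors only, copied from Exhibit 3.
names = ["yf", "educ", "ue", "urb", "wh"]
corr = [
    [1.0, 0.3883, -0.0488, 0.6178, -0.1264],
    [0.3883, 1.0, -0.3986, 0.2340, 0.2262],
    [-0.0488, -0.3986, 1.0, -0.1607, -0.0651],
    [0.6178, 0.2340, -0.1607, 1.0, -0.2293],
    [-0.1264, 0.2262, -0.0651, -0.2293, 1.0],
]

inverse = invert(corr)
vif = {name: inverse[i][i] for i, name in enumerate(names)}
mean_vif = sum(vif.values()) / len(vif)

for name, value in vif.items():
    print(f"VIF({name}) = {value:.3f}")
print(f"mean VIF = {mean_vif:.3f}")
```

With these inputs every VIF stays far below 10 and the mean comes out close to the value reported above.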
2.2.2. Testing Normality

Test for normality of residuals
Null hypothesis H0: the error is normally distributed
Alternative hypothesis H1: the error is not normally distributed
Test statistic: Chi-square(2) = 4.602 with p-value = 0.10018 > 0.05

● We do not have enough statistical evidence to reject the hypothesis H0 (the
error is normally distributed). In other words, the residuals of our model are
consistent with a normal distribution.
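
For a chi-square statistic with 2 degrees of freedom the upper-tail probability has the closed form exp(−x/2), so the reported p-value can be checked by hand. A minimal sketch follows; the function name `chi2_sf_2df` is hypothetical, and the closed form holds only for 2 degrees of freedom.

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 2 df."""
    return math.exp(-x / 2.0)


statistic = 4.602             # reported Chi-square(2) statistic
p_value = chi2_sf_2df(statistic)
reject_normality = p_value < 0.05

print(f"p-value = {p_value:.5f}, reject H0 at 5%: {reject_normality}")
```

The small difference from the reported 0.10018 comes from the test statistic being rounded to three decimals in the output.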

2.2.3. Testing Heteroskedasticity.
The homoskedasticity assumption states that the variance of the unobservable
error (u) is constant. Homoskedasticity fails whenever the variance of the
unobservable error changes across different segments of the population, where
the segments are determined by the different values of the explanatory
variables. This phenomenon is called heteroskedasticity.
We can use White’s test to examine heteroskedasticity. If the p-value is smaller
than 0.05, the model has heteroskedasticity.
White test

White's test for heteroskedasticity
Null hypothesis H0: heteroskedasticity not present
Alternative hypothesis H1: heteroskedasticity present
Test statistic: LM = 24.6526
with p-value = P(Chi-square(20) > 24.6526) = 0.215047 > 0.05
We do not have enough statistical evidence to reject the hypothesis H0. In
conclusion, we find no evidence of heteroskedasticity in our model.
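
The 20 degrees of freedom follow from the auxiliary regression in White's test, which regresses the squared residuals on the regressors, their squares and their cross products. The sketch below counts those terms for k = 5 and compares the LM statistic with the 5% critical value of chi-square(20); the value 31.41 is an assumption taken from standard chi-square tables, and `white_df` is a hypothetical helper name.

```python
def white_df(k):
    """Auxiliary regressors in White's test: k levels, k squares, k(k-1)/2 cross products."""
    return k + k + k * (k - 1) // 2


# Assumption: 5% critical value of chi-square with 20 df from standard tables.
CHI2_CRIT_20 = 31.41

n_obs = 50
lm_stat = 24.6526                     # reported LM statistic, LM = n * R2 (auxiliary)
df = white_df(5)
aux_r2 = lm_stat / n_obs              # implied R2 of the auxiliary regression
reject_homoskedasticity = lm_stat > CHI2_CRIT_20

print(f"df = {df}, implied auxiliary R2 = {aux_r2:.4f}")
print(f"reject H0 at 5%: {reject_homoskedasticity}")
```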
CONCLUSION:
Through the three tests for normality, multicollinearity and
heteroskedasticity, our linear-linear model has met the requirements of the
three assumptions. Hence, we find no evidence of problems in this model, and it
is statistically meaningful.

II. LOG-LINEAR MODEL.



1. Estimation

The estimated function is set up as follows:
_ The PRF:
lnwlfp = β0 + β1yf + β2educ + β3ue + β4urb + β5wh + ui
_ The SRF:
lnwlfp = β̂0 + β̂1yf + β̂2educ + β̂3ue + β̂4urb + β̂5wh + ei
_ The estimated regression equation:
lnwlfp = 3.77140 + 0.0134091yf + 0.00508077educ − 0.0269259ue − 0.00114536urb
− 0.00170770wh + e
● Meaning of the coefficients
β̂1 = 0.0134091: when the median earnings of females (yf) increase by 1 unit
(thousand dollars), keeping the other variables constant, the expected value of
wlfp increases by approximately 1.34091%.
β̂2 = 0.00508077: when educ increases by 1 (percentage point of high school
graduates), keeping the other variables constant, the expected value of wlfp
increases by approximately 0.508077%.
β̂3 = −0.0269259: when ue increases by 1 (percentage point unemployed), keeping
the other variables constant, the expected value of wlfp decreases by
approximately 2.69259%.
β̂4 = −0.00114536: when the percentage of the population living in urban areas
increases by 1 unit, keeping the other variables constant, the expected value of
wlfp decreases by approximately 0.114536%.
β̂5 = −0.00170770: when wh increases by 1 (percentage point of white females),
keeping the other variables constant, the expected value of wlfp decreases by
approximately 0.170770%.
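
In a log-linear model, multiplying a coefficient by 100 gives only an approximate percentage change; the exact change for a one-unit increase is 100·(exp(β) − 1). The sketch below compares the two for the estimated slopes. It is an illustration under the usual log-linear interpretation, and the dictionary names are hypothetical.

```python
import math

# Estimated slopes of the log-linear model (from the regression equation).
slopes = {"yf": 0.0134091, "educ": 0.00508077, "ue": -0.0269259,
          "urb": -0.00114536, "wh": -0.00170770}

# Approximate percentage change in wlfp for a one-unit increase: 100 * beta.
approx_pct = {name: 100.0 * b for name, b in slopes.items()}

# Exact percentage change implied by the model: 100 * (exp(beta) - 1).
exact_pct = {name: 100.0 * (math.exp(b) - 1.0) for name, b in slopes.items()}

for name in slopes:
    print(f"{name:>4}: approx {approx_pct[name]:+.4f}%  exact {exact_pct[name]:+.4f}%")
```

For coefficients this small the two figures nearly coincide, so the approximate reading is adequate here.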
● The coefficient of determination R2

In the results, we can see R2, which indicates how much of the variability of
the response data around its mean is explained by the model.
R2 = 0.749517 is quite high, which suggests that the model is a good fit: it
means that 74.9517% of the sample variation in the dependent variable (the log
of women’s labor force participation) is explained by the changes in the
independent variables (median earnings, education, unemployment, urban area and
white females). Other factors that are not included explain the remaining
25.0483% of the variation in the dependent variable.

2. Testing

2.1. Testing hypothesis

2.1.1. Testing an individual regression coefficient
Purpose: Test the statistical significance, that is, the effect, of each
independent variable on the dependent one. We have: α = 0.05.
● Testing the variable of median earnings of females (yf):
● Given that the hypothesis is:
H0: β1 = 0
H1: β1 ≠ 0
● We see: p-value of yf = 2.03e-06 < 0.05 → Reject H0 → The
coefficient β1 is statistically significant.
● Testing the variable educ:
● Given that the hypothesis is:
H0: β2 = 0
H1: β2 ≠ 0
● We see: p-value of educ < 0.0001 < 0.05 → Reject H0 → The
coefficient β2 is statistically significant.
● Testing the variable ue:
● Given that the hypothesis is:
H0: β3 = 0
H1: β3 ≠ 0
● We see: p-value of ue = 7.23e-06 < 0.05 → Reject H0 → The
coefficient β3 is statistically significant.
● Testing the variable urb:
● Given that the hypothesis is:
H0: β4 = 0
H1: β4 ≠ 0
● We see: p-value of urb = 0.0740 > 0.05 → Fail to reject H0 → The
coefficient β4 is not statistically significant at the 5% level (it is
significant only at the 10% level).
● Testing the variable wh:
● Given that the hypothesis is:
H0: β5 = 0
H1: β5 ≠ 0
● We see: p-value of wh = 0.0536 > 0.05 → Fail to reject H0 → The
coefficient β5 is not statistically significant at the 5% level (it is
significant only at the 10% level).
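
The decision rule used in these tests is mechanical: reject H0 when the p-value is below α. The short sketch below applies it to the reported p-values of the log-linear model at the 5% and 10% levels. The entry for educ uses 0.0001 as an upper bound because the output only states "< 0.0001", and `significant_at` is a hypothetical helper name.

```python
# Reported p-values of the log-linear model; educ is an upper bound (< 0.0001).
p_values = {"yf": 2.03e-06, "educ": 1e-04, "ue": 7.23e-06,
            "urb": 0.0740, "wh": 0.0536}


def significant_at(alpha, p_values):
    """Names of the variables whose coefficient rejects H0: beta = 0 at level alpha."""
    return [name for name, p in p_values.items() if p < alpha]


sig_5 = significant_at(0.05, p_values)
sig_10 = significant_at(0.10, p_values)

print("significant at 5%: ", sig_5)
print("significant at 10%:", sig_10)
```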
2.1.2. Testing the overall significance.
Purpose: Test the null hypothesis stating that none of the explanatory variables
has an effect on the dependent variable. We have: α = 0.05
Given that the hypothesis is:
H0: β1 = β2 = β3 = β4 = β5 = 0
H1: at least one βi ≠ 0 (i = 1, 2, 3, 4, 5)

We have: P-value(F) = 2.38e-11 < α = 0.05 → Reject H0 → The slope coefficients
are not all simultaneously equal to zero → At least one variable has an effect
on the dependent one.
The model is statistically significant overall.

2.2. Testing the model’s problems


2.2.1 Testing multicollinearity
To check whether the model has a multicollinearity problem, we use two methods:
the VIF (Variance Inflation Factors) and the pairwise correlations between the
variables.

● VIF (Variance Inflation Factors)

● We use the vif command after the regression to examine multicollinearity. It
reports the variance inflation factor of each variable; if a variable’s VIF is
greater than 10, the model may suffer from multicollinearity.
Every variable has a VIF below 10.
MeanVIF = 1.4326 < 10
=> Conclusion: No multicollinearity is found.

● Correlation

● According to the correlation table above (Exhibit 3), the correlation between
each pair of independent variables is lower than 0.8.
● Again, we have enough statistical evidence to conclude that no
multicollinearity is found in our model.
2.2.2. Testing Normality

Using the “normality of residual” test in Gretl:
Test for normality of residuals:
Null hypothesis H0: the error is normally distributed
Test statistic: Chi-square(2) = 9.04483 with p-value = 0.0108628 < 0.05
We have enough statistical evidence to reject the null hypothesis H0. This means
that the error is not normally distributed.

To fix this problem, we should add more observations so that the sample is large
enough to approach a normal distribution, because our data currently contain
only 50 observations for each variable.
Method: Increase the number of observations until n ≥ 384 to obtain the normal
distribution.
2.3. Testing Heteroskedasticity.
We can use White’s test to examine heteroskedasticity. If the p-value is smaller
than 0.05, the model has heteroskedasticity.

White's test for heteroskedasticity
Null hypothesis H0: heteroskedasticity not present
Test statistic: LM = 32.7672 with p-value = P(Chi-square(20) > 32.7672) =
0.0357784 < 0.05

We have enough statistical evidence to reject the null hypothesis H0. In
conclusion, our model exhibits heteroskedasticity.
Fixing the problem.
The log-linear model has a heteroskedasticity problem, so we use robust
standard errors to address its effect on inference, with the Gretl option
“Heteroskedasticity-robust standard errors, variant HC1”.
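
To illustrate what the HC1 variant does, the sketch below computes classical and HC1 standard errors for the slope of a one-regressor model in plain Python. It is a simplified illustration, not the five-regressor model of this paper: the data are hypothetical toy values whose spread grows with x, and the helper names are assumptions. HC1 rescales the basic White (HC0) variance by n/(n − k), with k = 2 estimated parameters here.

```python
import math


def slope_standard_errors(x, y):
    """OLS slope of y on x (with intercept), plus classical and HC1 standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical variance: a single error variance shared by all observations.
    classical_var = (sum(e ** 2 for e in resid) / (n - 2)) / sxx
    # HC0 (White) variance: each squared residual weighted by its own leverage term.
    hc0_var = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    # HC1: degrees-of-freedom correction n / (n - k) with k = 2 parameters.
    hc1_var = hc0_var * n / (n - 2)
    return slope, math.sqrt(classical_var), math.sqrt(hc1_var)


# Hypothetical toy data: the scatter around the line widens as x grows.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.7, 5.6, 5.1, 8.2, 6.9]

slope, se_classical, se_hc1 = slope_standard_errors(x, y)
print(f"slope = {slope:.4f}")
print(f"classical s.e. = {se_classical:.4f}, HC1 s.e. = {se_hc1:.4f}")
```

Robust standard errors leave the coefficient estimates unchanged; only the standard errors, and therefore the t-statistics and p-values, are recomputed.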
