RMIT International University Vietnam
ASSIGNMENT 3 PART A

Subject code: ECON1193B
Subject name: Business Statistics 1
Location and campus: RMIT Vietnam – South Saigon
Title of assignment: Team assignment report
Lecturer: Greeni Maheshwari
Assignment due date: 18th September, 2020
Team: Team 06
Number of pages: 12


SGS RMIT – Business Statistic 1 – ECON1193B – TEAM 06


TABLE OF CONTENTS
PART 1: DATA COLLECTION
PART 2: DESCRIPTIVE STATISTICS
PART 3: MULTIPLE REGRESSION
PART 4: TEAM REGRESSION CONCLUSION
PART 5: TIME SERIES
PART 6: TIME SERIES CONCLUSION
PART 7: OVERALL TEAM CONCLUSION
REFERENCES
APPENDIX

CONTRIBUTION
First name   Student ID   Parts contributed   Contribution   Signature
Khanh        S3811511     1, 3, 5, 7          100%
Kha          S3826384     1, 5, 6, 7          100%
Thinh        S3818172     2, 4, 6             100%
Lan          S3836374     2, 3, 7             100%
Hoang        S3826384     1, 3, 7             100%

PART 1: DATA COLLECTION
The data cover the total number of deaths due to COVID-19 between April 01 and July 31, 2020,
together with five other variables: average temperature (in Celsius) and average rainfall (in
mm), both based on available data from 1991 to 2016; medical doctors (per 10,000 people, latest
available); hospital beds (per 10,000 people, latest available); and population of the country
(in millions, latest available). Data were collected for 50 countries in Region A: Asia and 23
countries in Region B: North America. After the cleaning process, 46 countries remain in
Region A: Asia and 21 countries remain in Region B: North America. The datasets are presented
in the attached Excel file.

PART 2: DESCRIPTIVE STATISTICS


• Central Tendency Measurements
Central Tendency   Asia      North America
Mean               36.756    91.747
Median             10.031    27.09
Mode               0         0

Figure 1. Measures of central tendency of the total number of deaths due to COVID-19 between
April 01 and July 31, 2020, in Asia and North America.
The mode is 0 in both regions, so it does not help to compare them and is not considered
further. The mean is not used either, because both datasets contain outliers (see the
calculations in Appendix 1.1 and Appendix 1.2) and the mean is sensitive to them.
Consequently, the median is the most suitable measure for the comparison: 50 percent of the
values are greater than the median and the remaining 50 percent are lower than the median.
The medians of the two regions differ clearly. North America's median of 27.09 is about 2.7
times Asia's median of 10.031. Therefore, it can be concluded that North American countries
typically have more deaths related to COVID-19 than Asian countries.
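The reasoning above (prefer the median when the 1.5 x IQR rule flags outliers) can be sketched in Python. This is only an illustration: the sample values are hypothetical and are not the report's dataset, the function names are made up for this sketch, and the "inclusive" quartile method is an assumption that may differ from the method used in the Excel file.

```python
import statistics

def central_tendency(values):
    """Return the mean, median and mode(s) of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
    }

def iqr_fences(values):
    """Return the outlier fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical deaths-per-million values, NOT the report's data.
sample = [0, 0, 1, 3, 5, 8, 10, 12, 20, 250]

summary = central_tendency(sample)
lower, upper = iqr_fences(sample)
outliers = [v for v in sample if v < lower or v > upper]

print(summary)            # the mean is pulled far above the median
print(lower, upper)       # fences of the 1.5 x IQR rule
print(outliers)           # values outside the fences
```

In this hypothetical sample a single extreme value drags the mean well above the median, which is the situation in which the report falls back on the median.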
• Box and whisker plot

Figure 2. Box-and-whisker plots of total number of deaths due to COVID-19 in Asia and North
American countries.

As can be seen from the box-and-whisker plots above, the distributions for the Asia and North
America regions are both right-skewed. Moreover, in both regions the right whisker is longer
than the left whisker, which points to the presence of large outliers in the datasets. The
plots show that half of the countries in North America have more than 27 deaths per million
population, while half of the countries in Asia have more than only 10 deaths per million. In
addition, 25% of the countries in Asia recorded around 1 to 10 deaths, compared with 2 to 27
deaths in North America. This demonstrates that North American countries have a higher death
rate than Asian countries.
• Measurements of variation
Variation Measurements     Asia       North America
Range                      248.04     447.099
IQR                        50.501     99.125
Variance                   3170.34    17621.706
Standard Deviation         56.306     132.747
Coefficient of Variation   86.253     192.068
Figure 3. Measures of variation of total deaths in Asia and North America (unit: number of
deaths, except for the Coefficient of Variation).
In this scenario, the best measure of variation is the Interquartile Range (IQR), because it
is not affected by the outliers. The standard deviation is not suitable because it can be
heavily influenced by outliers, and the coefficient of variation is also a poor choice because
it is built from the mean and the standard deviation while the distributions above are highly
right-skewed. The Interquartile Range of the Asia region (50.501) is smaller than that of
North America (99.125), indicating that the middle 50% of the Asian data is less dispersed. In
other words, the total number of deaths due to COVID-19 is more consistent across countries in
Asia than in North America, which is in line with the pandemic having had less impact on the
Asia region than on North America.
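The claim that the IQR is more robust to outliers than the range or the standard deviation can be illustrated with a short Python sketch. The sample is hypothetical (not the report's data), the function name is invented for this sketch, and it assumes the sample (n - 1) formulas and the "inclusive" quartile method; the report does not state which variants were used.

```python
import statistics

def variation_measures(values):
    """Return range, IQR, variance, standard deviation and CV (%)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # assumption: sample standard deviation (n - 1)
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": statistics.variance(values),
        "std": std,
        "cv_percent": std / mean * 100,
    }

# Hypothetical deaths-per-million values, NOT the report's data.
sample = [0, 0, 1, 3, 5, 8, 10, 12, 20, 250]

with_outlier = variation_measures(sample)
without_outlier = variation_measures(sample[:-1])  # drop the extreme value

for name in ("range", "iqr", "std"):
    print(name, with_outlier[name], without_outlier[name])
```

Removing the single extreme value changes the range and the standard deviation by an order of magnitude, while the IQR barely moves.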

PART 3: MULTIPLE REGRESSION
1. Region A: Asian countries (FINAL)
After applying backward elimination, we find that only one variable, the average rainfall, is
significant at the 5% level of significance. The FINAL regression model for Asian countries
is given below.
a. Regression output

Figure 4: FINAL regression model of Region A: Asia
b. Regression equation: $\hat{Y} = b_0 + b_1 X_1$
$\widehat{\text{Total deaths}} = 61.01 - 0.286 \times \text{Average rainfall}$
c. Regression coefficient of the significant independent variable
The slope b1 = -0.286 indicates that the total number of deaths due to COVID-19 between
April 01 and July 31, 2020, decreases by an estimated 0.286 deaths for every additional mm of
average rainfall.
The intercept b0 = 61.01 is the estimated total number of deaths when there is no rainfall.
This makes sense, as deaths can occur whether or not there is rain. Put differently, over the
sample selected, the portion of the total number of deaths due to COVID-19 between April 01
and July 31, 2020, that is not explained by the average rainfall (in mm) of a country is 61.01
deaths.
d. The coefficient of determination
The coefficient of determination (R square = 16.3%) shows that 16.3% of the total variation
in the total number of deaths due to COVID-19 from April 01 to July 31, 2020, can be
explained by the variation in the amount of rainfall, while the remaining 83.7% is due to
factors not included in the model.
2. Region B: North American Countries (FINAL)
After applying backward elimination, we find that only one variable, Population (in
millions), is significant at the 5% level of significance. The FINAL regression model for
North American countries is given below.
a. Regression output

Figure 5: FINAL regression model of Region B: North America.
b. Regression equation: $\hat{Y} = b_0 + b_1 X_1$
$\widehat{\text{Total deaths}} = 52.98 + 1.399 \times \text{Population}$
c. The regression coefficient of the significant independent variable
The slope b1 = 1.399 indicates that the total number of deaths due to COVID-19 between
April 01 and July 31, 2020, increases by an estimated 1.399 deaths for every additional
million people in the population of the country.
The intercept b0 = 52.98 has no practical meaning on its own: when X1 = 0 there is no
population, so there cannot be any deaths. It simply indicates that, over the sample selected,
the portion of the total number of deaths due to COVID-19 between April 01 and July 31, 2020,
that is not explained by the population of the country is 52.98 deaths.
d. The coefficient of determination
The coefficient of determination (R Square = 61.1%) shows that 61.1% of the total variation
in the total number of deaths due to COVID-19 from April 1 to July 31, 2020, can be
explained by the variation in the population of the country, while the remaining 38.9% is due
to factors not included in the model.
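As a worked illustration of the two final regression equations reported above, the sketch below evaluates them for a few inputs. The coefficients are taken from this report; the function names and the input values (100 mm of rainfall, 10 million people) are hypothetical and chosen only to show how the slope and intercept are read.

```python
def predict_deaths_asia(average_rainfall_mm):
    """Region A final model: deaths = 61.01 - 0.286 * average rainfall (mm)."""
    return 61.01 - 0.286 * average_rainfall_mm

def predict_deaths_north_america(population_millions):
    """Region B final model: deaths = 52.98 + 1.399 * population (millions)."""
    return 52.98 + 1.399 * population_millions

# Hypothetical inputs, for illustration only.
print(predict_deaths_asia(0))              # the intercept b0
print(predict_deaths_asia(100))            # 100 mm of rain lowers the estimate by 28.6
print(predict_deaths_north_america(10))    # 10 million people raise it by 13.99
```

Because the Asian slope is negative, the fitted line eventually predicts negative deaths for very large rainfall values, so the equation should only be read within the range of the observed data.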

PART 4: TEAM REGRESSION CONCLUSION
According to the study in Part 3, the two regions have the same number of significant
independent variables (one each) but of different types. The five candidate variables were
average rainfall (in mm), hospital beds (per 10,000 population), medical doctors (per 10,000
population), average temperature (in Celsius) and population (in millions). In the Asia final
regression model, the significant independent variable is the average rainfall (in mm). In
the North America final regression model, it is Population (in millions). In comparison, the
North America region has remarkably more total deaths according to the findings in Part 2,
which means the region has been impacted more by the pandemic than the Asia region. Moreover,
from the study in Part 3, 61.1% of the total variation in total deaths in North America due to
COVID-19 can be explained by the population of the country (in millions), which shows that
the variation in population accounts for a major share of the variation in the total number of
deaths in the NA region. Meanwhile, in Asia, only 16.3% of the variation in the total number
of deaths can be explained by the variation in the average rainfall (in mm). The influence of
average rainfall on total deaths is therefore limited, and many other relevant factors are not
included in the study, which makes the Asian result less reliable than that of the North
America region.
To conclude, by building the regression models and comparing the descriptive statistics of
the two regions, this study indicates that the average rainfall can be used to forecast the
total number of deaths due to COVID-19 in Asia, while in North America the population of the
country is the independent variable that can be used to predict the total number of deaths.
Also, the North American countries have suffered a higher impact from the pandemic, with a
greater number of deaths than Asian countries.

PART 5. TIME SERIES
In Part 5, our group collected data for the total number of deaths per day in the two regions,
Asia and North America, from April 01 to July 31, 2020. Where the collected datasets record no
deaths on a particular day, we use 0.00005 instead of 0 when building the exponential trend
model, as log(0) cannot be calculated. The datasets are presented in the attached Excel file.
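The zero-substitution step and the log-linear fit behind an exponential trend model can be sketched as follows. This is a minimal pure-Python illustration, not the Excel procedure the team used: the helper names are invented for the sketch, the daily series is hypothetical, and a base-10 logarithm is assumed.

```python
import math

def fit_line(t, y):
    """Ordinary least squares for y = b0 + b1 * t; returns (b0, b1)."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b1 = sxy / sxx
    return y_mean - b1 * t_mean, b1

def fit_exponential(t, y, zero_substitute=0.00005):
    """Fit log10(y) = a0 + a1 * t and return (10**a0, 10**a1).

    Zero counts are replaced by `zero_substitute` because log(0) is undefined.
    """
    logs = [math.log10(v if v > 0 else zero_substitute) for v in y]
    a0, a1 = fit_line(t, logs)
    return 10 ** a0, 10 ** a1

# Hypothetical series: 100 deaths at T = 0, growing 2% per day.
days = list(range(1, 31))
deaths = [100 * 1.02 ** d for d in days]

level, growth_factor = fit_exponential(days, deaths)
print(level, growth_factor)  # close to 100 and 1.02
```

The second returned value is the daily growth factor, so (growth_factor - 1) x 100% is the compound daily growth rate.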
1. Build Linear, Quadratic and Exponential trend models.
1.1 Region A: Asia
After testing the hypotheses for the trend models in the Asia region (Appendix 3.1), the
findings indicate that the linear, quadratic and exponential trend models are all significant
for this region.
• Linear Trend Model
a. Regression output

Figure 6. Time Series outputs for Region A: Asia linear trend.
b. Formula: $\hat{Y} = 88.199 + 10.366\,T$
• The slope b1 = 10.366 indicates that the total number of deaths due to COVID-19 between
April 01 and July 31, 2020, increased by an estimated 10.366 deaths every day.
• b0 = 88.199 when T = 0, which illustrates that there were an estimated 88.199 deaths on
31 March, 2020.
• Quadratic Trend Model
a. Regression output

Figure 7. Time Series outputs for Region A: Asia quadratic trend.
b. Formula: $\hat{Y} = 338.607 - 1.75\,T + 0.0985\,T^2$
• The coefficient b2 = 0.0985 indicates that the total number of deaths due to COVID-19
between April 01 and July 31, 2020, increased by an estimated 0.0985 deaths for every unit
increase in $T^2$.
• b0 = 338.607 when T = 0, which illustrates that there were an estimated 338.607 deaths on
31 March, 2020.
• Exponential Trend Model
a. Regression output

Figure 8. Time Series outputs for Region A: Asia exponential trend.
b. Formula:
In linear format: $\log(\hat{Y}) = 2.383 + 0.00653\,T$
In non-linear format: $\hat{Y} = 241.546 \times (1.015)^T$
Interpretation: (b1 - 1) * 100% = 1.5% is the estimated daily compound growth rate in
percentage for the total number of deaths due to COVID-19 from April 01 to July 31, 2020 in
Asia.
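The step from the linear format to the non-linear format, and from there to the 1.5% growth rate, can be written out as a short derivation. The reported constants are consistent with a base-10 logarithm, which is assumed here.

```latex
\begin{align*}
\log_{10}\hat{Y} &= 2.383 + 0.00653\,T \\
\hat{Y} &= 10^{2.383} \times \left(10^{0.00653}\right)^{T}
         \approx 241.5 \times (1.015)^{T} \\
(b_1 - 1) \times 100\% &= (1.015 - 1) \times 100\% = 1.5\%
\end{align*}
```

Here $b_1 = 10^{0.00653} \approx 1.015$ is the daily growth factor, so each day's fitted value is about 1.5% higher than the previous day's.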
1.2 Region B: North America
After testing the hypotheses for the trend models in the North America region (Appendix 3.2),
the findings indicate that only the linear and exponential trend models are significant.
• Linear Trend Model
a. Regression output

Figure 9. Time Series outputs for Region B: North America linear trend.
b. Formula: $\hat{Y} = 2056.42 - 5.37\,T$
• The slope b1 = -5.37 indicates that the total number of deaths due to COVID-19
between April 01 and July 31, 2020, decreased by an estimated 5.37 deaths every day.
• b0 = 2056.42 when T = 0, which illustrates that there were an estimated 2056.42 deaths on
31 March, 2020.
• Exponential Trend Model
a. Regression output

Figure 10. Time Series outputs for Region B: North America exponential trend.
b. Formula:
In linear format: $\log(\hat{Y}) = 3.2840 - 0.00127\,T$
In non-linear format: $\hat{Y} = 1923.43 \times (0.997)^T$
Interpretation: (b1 - 1) x 100% = -0.3%, so 0.3% is the estimated daily compound rate of
decrease in percentage for the total number of deaths due to COVID-19 from April 01 to July
31, 2020 in North America.
2. Recommended Trend Models
The Coefficient of Determination (R Square) will be used to determine the most suitable
trend model from the regression outputs. The higher the coefficient of determination, the
more of the total variation in the number of deaths the model explains, and the better it is
for estimating the number of deaths due to COVID-19.
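The selection rule described above can be sketched in a few lines of Python. The `r_squared` helper shows the usual definition (one minus the ratio of unexplained to total variation), and `best_model` simply picks the largest value. The function names are hypothetical; the R Square values passed in at the end are the ones reported in this part.

```python
def r_squared(actual, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean = sum(actual) / len(actual)
    sst = sum((a - mean) ** 2 for a in actual)                 # total variation
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))    # unexplained variation
    return 1 - sse / sst

def best_model(r2_by_model):
    """Return the name of the model with the highest R Square."""
    return max(r2_by_model, key=r2_by_model.get)

# R Square values as reported for the two regions.
asia = {"linear": 0.6762, "quadratic": 0.7368, "exponential": 0.8060}
north_america = {"linear": 0.0790, "exponential": 0.0726}

print(best_model(asia))            # exponential
print(best_model(north_america))   # linear
```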
a. Region A: Asia

            Linear    Quadratic   Exponential
R Square    67.62%    73.68%      80.60%
Figure 11. Coefficient of determination of the linear, quadratic and exponential trend models
of Asia (%).
For Region A, the figure shows that the exponential trend has the highest coefficient of
determination, which means the exponential trend model is the most suitable for predicting
the total number of deaths due to COVID-19 in Region A, as it will produce fewer errors.
b. Region B: North America

            Linear    Exponential
R Square    7.90%     7.26%
Figure 12. Coefficient of determination of the linear and exponential trend models of NA (%).
For Region B, the linear trend model has a slightly higher coefficient of determination;
hence, it is the most suitable for predicting the total number of deaths due to COVID-19 in
Region B, as it will produce fewer errors than the exponential trend model.

3. Predict the number of deaths on September 28, September 29, and September 30.

a. Region A: Asia
As concluded above, the exponential trend is the best model for predicting the number of
deaths due to COVID-19 in Asia, with the formula:
$\hat{Y} = 241.546 \times (1.015)^T$

Date (T)              Forecasted number of deaths
September 28 (181)    3575.64
September 29 (182)    3629.27
September 30 (183)    3683.71
Figure 13. Forecasted number of deaths on September 28, 29 and 30 in Asia.
b. Region B: North America
As concluded above, the linear trend is the best model for predicting the number of deaths
due to COVID-19 in North America, with the formula:
$\hat{Y} = 2056.42 - 5.37\,T$

Date (T)              Forecasted number of deaths
September 28 (181)    1084.45
September 29 (182)    1079.08
September 30 (183)    1073.71
Figure 14. Forecasted number of deaths on September 28, 29 and 30 in North America.
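The forecasts above can be reproduced with a short Python sketch. It assumes, as the trend models in this part do, that T = 0 corresponds to 31 March 2020 (so 1 April is T = 1), and it uses the rounded growth factor 1.015 for Asia; the function names are illustrative only.

```python
from datetime import date

T_ZERO = date(2020, 3, 31)  # T = 0 in the trend models; 1 April 2020 is T = 1

def time_index(day):
    """Number of days between 31 March 2020 and `day`."""
    return (day - T_ZERO).days

def forecast_asia(t):
    """Exponential trend for Region A: 241.546 * 1.015 ** T."""
    return 241.546 * 1.015 ** t

def forecast_north_america(t):
    """Linear trend for Region B: 2056.42 - 5.37 * T."""
    return 2056.42 - 5.37 * t

for day in (date(2020, 9, 28), date(2020, 9, 29), date(2020, 9, 30)):
    t = time_index(day)
    print(day.isoformat(), t,
          round(forecast_asia(t), 2),
          round(forecast_north_america(t), 2))
```

The same `time_index` helper gives T = 214 for 31 October 2020, the value used for the world forecast later in the report.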

PART 6: TIME SERIES CONCLUSION
a. Line chart

Figure 15. Line graph of the daily total number of deaths due to COVID-19 in Asia and North
America from April 01 to July 31, 2020.
b. Explanation

The line graph above presents the daily total number of deaths in Asia and North America
due to COVID-19 from April 01 to July 31, 2020. It can be seen that the number of deaths in
Asia is more stable and significantly lower than in North America, although Asia is the
region from which the pandemic spread. The Asian series shows irregular components in two
periods, once on 15 April and once on 17 June, and it increased steadily from 24 June to
29 June. North America, on the other hand, fluctuated heavily: the number of deaths peaked on
15 April and then moved downward with a cyclical component of a 7-day period until the end of
the observation. The region also has an irregular component on 24 June, when the number of
deaths was higher than in any nearby period.
Relating to Part 5.3, Asia and North America do not follow the same trend model for
predicting the number of deaths due to COVID-19: the exponential trend model is used for
Asia and the linear trend model for North America.
To reach the conclusion, our team compared the Coefficients of Determination (R Square),
because the higher the Coefficient of Determination, the smaller the error and the more of
the total variation in the number of deaths can be explained. The R Square of the exponential
trend model is the highest for Asia (80.6%); similarly, the linear trend model has the higher
R Square for North America (7.9%). In conclusion, we choose the exponential trend model to
predict the total number of deaths in the world, since its R Square is larger than that of
the linear trend model in North America (80.6% > 7.9%), meaning that 80.6% of the variation in
the dependent variable (number of deaths due to COVID-19) can be explained by the exponential
trend model.

PART 7 : OVERALL TEAM CONCLUSION
7.1 Main factors impacting the total number of deaths
Based on the multiple regression analysis of the Asia region in Part 3, there is only one
significant independent variable that may affect the total number of deaths due to COVID-19,
which is the average rainfall (in mm), at the 95% level of confidence. From the regression
equation in Part 3 (Total number of deaths = 61.01 - 0.286 * Average rainfall), we can see
that the coefficient of rainfall is negative. Hence, the amount of rainfall has an inverse
relationship with the total number of deaths due to COVID-19: with every additional mm of
rainfall, the total number of deaths is estimated to decrease by 0.286 deaths. However, the
findings in Part 3 show that the coefficient of determination of the rainfall model is only
16.3%, from which it can be inferred that the influence of rainfall on the COVID-19 pandemic
is limited but should still be considered.
Similarly, in the multiple regression analysis of the North America region, the significant
variable that may affect the total number of deaths due to COVID-19 is population (in
millions), at the 95% level of confidence. From the regression equation for North America in
Part 3 (Total number of deaths = 52.98 + 1.399 * Population), we can conclude that population
(in millions) has a positive relationship with the total number of deaths due to COVID-19. To
be more specific, with every increase of one million people in the population of a country,
the total number of deaths due to COVID-19 is estimated to increase by 1.399 deaths, so
population should be given more weight in the research and prevention process for the
pandemic.
Each region has only one significant variable that affects the total number of deaths due to
COVID-19. However, as this study analyses only two specific regions (Asia and North America)
and five variables, there are clearly other variables, not included in this study, that may
affect the total number of deaths due to COVID-19.
7.2 Predicted number of deaths due to COVID-19 in the world on October 31
As mentioned in Part 6, our team preferred to use the best trend model to predict the number
of deaths due to COVID-19, which is the exponential trend model of the Asia region. To
predict the number of COVID-19 deaths in the world on October 31, we therefore use the
formula of that model with T equal to 214.
Formula: $\hat{Y} = 241.546 \times (1.015)^T$
$= 241.546 \times (1.015)^{214}$
$= 5844.305$

The forecast of about 5,844 deaths due to COVID-19 on October 31 is higher than the 3,684
deaths forecast for September 30, which suggests that the disease will become more serious in
the future. This is plausible in the real world, because the COVID-19 situation remains
complicated, with more infections day by day.
7.3 The number of deaths by COVID 19 by the end of year 2020
Based on the evaluations of the number of deaths in Asia and North America in Part 5, the
selected trend model shows a gradual rise in the number of deaths due to COVID-19. Besides,
the prediction of the total number of deaths due to COVID-19 in the world on October 31 is
also a positive number (5844.305), indicating that the pandemic will continue. According to
WHO, widespread vaccinations are not expected until the middle of next year, which means
there are still risks that the number of cases and deaths may increase (CNBC 2020). Social
distancing is one of the most efficient measures to help prevent COVID-19 (Robert Preidt
2020). Unfortunately, Asian countries are relaxing their earlier stringent measures as they
move towards reopening their economies, and they are facing an increase in the daily number
of cases and deaths, as stated by WHO (Salma Khalik 2020). The United States has also begun
to reopen its economy by relaxing social distancing restrictions, and some states have
witnessed a rise in new cases (USA Today 2020). To conclude, from our team's perspective, the
number of deaths due to COVID-19 will continue to increase through the end of 2020.

7.4 Two other variables impacting the total number of deaths
To explain more about the factors that affect the total number of deaths due to COVID-19,
aside from the five variables in the first part, our team recommends two further variables:
age, which is a numerical variable, and mask-wearing behaviour, which is a categorical
variable.
According to research by Clara et al. (2020), age has a positive relationship with the
mortality rate of COVID-19. To be more specific, mortality was less than 1% in patients aged
below 50 years; it increased significantly after that age, and the highest rate, 29.6%, was
observed in patients aged above 80 years. According to the report of the Centers for Disease
Control and Prevention (CDC) (2020), the death rate ratio rises with every 10-year age group
after age 29. Patients aged 30 to 39 and patients aged 40 to 49 have death rates 4 times and
10 times higher, respectively, than patients aged 18 to 29. The higher the age, the more
likely patients are to have comorbidities such as diabetes, hypertension and cardiovascular
disease, which have been associated with worse outcomes in COVID-19 and can increase the
total number of deaths in this pandemic (Clara et al. 2020).
Mask-wearing behaviour also affects the total number of deaths due to COVID-19. According to
the University of Cambridge (2020), wearing masks is a cheap and effective way to reduce the
transmission of the COVID-19 virus and to keep the coronavirus 'reproduction number' under
1.0. Even crude homemade masks can reduce disease spread by catching the wearer's virus
particles, breathed directly into the fabric, whereas inhaled air is often sucked in around
the exposed sides of the mask (University of Cambridge 2020). A study by Christopher Leffler
from Virginia Commonwealth University indicates that wearing masks can help to lower the
COVID-19 death rate not just by a few percent, but by up to a hundred times (Kate Marino
2020). In countries that recommended mask-wearing within 15 days and 30 days, the death rate
was far lower than in countries that waited longer or had no policy recommending masks (Kate
Marino 2020). Several countries in Asia began using masks very early and still have mortality
close to 1 in 1 million or less, while in the U.S. the mortality is 1 in 2,500 people in the
population (Kate Marino 2020). According to a forecast by the Institute for Health Metrics
and Evaluation (IHME), if 95% of the people in the US wore masks in public, the total number
of deaths would decrease from 295,011 by December to 228,271, a 49% drop, which means more
than 66,000 lives would be saved (IHME 2020). To conclude, wearing a mask is a cheap and
effective way to reduce the total number of deaths due to COVID-19. The more people wear
masks in public, the fewer total cases and total deaths there will be in this pandemic.
In conclusion, this report compares Asia and North America in terms of the total number of
deaths due to COVID-19 from 01 April to 31 July. The findings from the regression and time
series analyses can also be used to identify the significant factors that affect the number
of deaths and to forecast deaths in the two aforementioned regions, and even the world, in
the upcoming period. In detail, countries in North America suffered more than Asian
countries, with a significantly higher number of deaths due to the pandemic in the observed
period; however, by the end of September, the number of deaths per day in North America is
forecast to be only about 1074, while that of Asia is predicted to be more than 3 times
higher, at 3684 deaths on 30 September. Moreover, the world is forecast to reach
approximately 5844 deaths on 31 October. Therefore, as the second wave of the pandemic has
started and is spreading widely again, our team recommends that everyone strictly follow
preventive measures and practices, such as continuing social distancing restrictions, while
patiently waiting for a COVID-19 vaccine to be completed.
References:
CDC 2020, COVID-19 Hospitalization and Death by Age, CDC, viewed 10 September 2020.
Clara Bonanad et al. 2020, The Effect of Age on Mortality in Patients With COVID-19: A
Meta-Analysis With 611,583 Subjects, ScienceDirect, viewed 10 September 2020.
CNBC 2020, WHO says widespread coronavirus vaccinations are not expected until mid-2021,
CNBC, viewed 10 September 2020.
Coronavirus (COVID-19) deaths 2020, Total number of deaths daily due to COVID-19 from
01 April to 31 July, 2020, data file, Our World in Data, viewed 5 September 2020.
Hospital beds (per 10,000 population) 2020, Hospital beds (per 10,000 population), data file,
World Health Organization, viewed 28 August 2020.
IHME 2020, New IHME COVID-19 Forecasts See Nearly 300,000 Deaths by December 1,
However, Consistent Mask-Wearing Could Save about 70,000 Lives, viewed 10 September 2020.
Kate Marino 2020, Early face mask policies curbed COVID-19's spread, according to
198-country analysis, viewed 10 September 2020.
Medical doctors (per 10,000 population) 2020, Medical doctors (per 10,000 population), data
file, World Health Organization, viewed 28 August 2020.
Population, total 2019, Population, data file, The World Bank, viewed 28 August.
Rainfall n.d., Average Rainfall (in mm) from 1991 to 2016, data file, Climate Change
Knowledge Portal, viewed 28 August 2020.
Robert Preidt 2020, Benefits of Social Distancing Outweigh Economic Toll: Study, usnews,
viewed 10 September 2020.
Salma Khalik 2020, Pandemic has entered new phase in Asia-Pacific: WHO, Straitstimes,
viewed 10 September 2020.
Temperature n.d., Average Temperature (in Celsius) from 1991 to 2016, data file, Climate
Change Knowledge Portal, viewed 28 August 2020.
Total confirmed COVID-19 deaths 2020, Total number of deaths due to COVID-19 from 01
April to 31 July, 2020, data file, Our World in Data, viewed 28 August 2020.
University of Cambridge 2020, Widespread facemask use could shrink the 'R' number and
prevent a second COVID-19 wave, EurekAlert, viewed 10 September 2020.
USA Today 2020, Coronavirus reopening, usatoday.com, viewed 10 September 2020.

Appendix 1: Outlier calculation
Outlier calculation   Total deaths in Asia (per million population)
Q1 - 1.5 x IQR        13.635
Q3 + 1.5 x IQR        72.555
Min                   29.08
Max                   50.49
                      OUTLIER
Appendix 1.1: Asia's Outlier calculation

Outlier calculation   Total deaths in North America (per million population)
Q1 - 1.5 x IQR        -143.6575
Q3 + 1.5 x IQR        252.8425
Min                   0
Max                   248.04
                      OUTLIER
Appendix 1.2: North America's Outlier calculation

Appendix 2: Backward Elimination procedures at 5% level of significance

In this part, each independent variable is tested for significance at the 5% level of
significance by comparing its p-value with the significance level α = 0.05.
1. Region A: Asia
Step 1: Stating the null and alternative hypotheses
H0: βj = 0 (No variables have a relationship with the total number of deaths due to COVID-19)
H1: βj ≠ 0 (At least one variable has a relationship with the total number of deaths due to
COVID-19)
j = 1, 2, 3, 4, 5
Step 2: Full model with five variables
Variable                  P-value   Comparison   Decision
β1 Average rainfall       0.05      ≤ α          Reject H0
β2 Hospital beds          0.35      > α          Do not reject H0
β3 Medical doctors        0.34      > α          Do not reject H0
β4 Average temperature    0.54      > α          Do not reject H0
β5 Population             0.53      > α          Do not reject H0

Because the p-value of the average temperature variable is the largest (0.54) and greater
than α, we eliminate this variable and continue the test.
Step 3: Four variables
Variable                  P-value   Comparison   Decision
β1 Average rainfall       0.02      < α          Reject H0
β2 Hospital beds          0.46      > α          Do not reject H0
β3 Medical doctors        0.27      > α          Do not reject H0
β5 Population             0.58      > α          Do not reject H0

Because the p-value of the population variable is the largest (0.58) and greater than α, we
eliminate this variable and continue the test.

Step 4: Three variables
Variable                  P-value   Comparison   Decision
β1 Average rainfall       0.02      < α          Reject H0
β2 Hospital beds          0.45      > α          Do not reject H0
β3 Medical doctors        0.23      > α          Do not reject H0

Because the p-value of the hospital beds variable is the largest (0.45) and greater than α,
we eliminate this variable and continue the test.
Step 5: Two variables
Variable                  P-value   Comparison   Decision
β1 Average rainfall       0.01      < α          Reject H0
β3 Medical doctors        0.32      > α          Do not reject H0

Because the p-value of the medical doctors variable is the largest (0.32) and greater than α,
we eliminate this variable and continue the test.
Step 6: One variable
Variable                  P-value   Comparison   Decision
β1 Average rainfall       0.01      < α          Reject H0

The p-value of the average rainfall variable is 0.01, which is less than α, so we reject H0.
The backward elimination procedure above gives a final regression model for the Asia dataset
that includes only one significant variable at the 5% level of significance: Average rainfall
(in mm).
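The elimination loop followed in Steps 2 to 6 can be sketched in Python. In a real analysis each step would refit the regression and read the new p-values; here, as a loudly hypothetical stand-in, the p-values reported in the steps above are stored in a lookup table (`ASIA_PVALUES`) and replayed. The function and variable names are illustrative only.

```python
def backward_eliminate(variables, pvalues_for, alpha=0.05):
    """Repeatedly drop the variable with the largest p-value above alpha.

    `pvalues_for` takes a tuple of variable names and returns a dict of
    p-values for the model fitted with exactly those variables.
    """
    remaining = list(variables)
    dropped = []
    while remaining:
        pvalues = pvalues_for(tuple(remaining))
        worst = max(remaining, key=lambda name: pvalues[name])
        if pvalues[worst] <= alpha:
            break  # every remaining variable is significant
        remaining.remove(worst)
        dropped.append(worst)
    return remaining, dropped

# p-values copied from Steps 2-6 for Region A: Asia (a lookup, not a refit).
ASIA_PVALUES = {
    ("rainfall", "beds", "doctors", "temperature", "population"): {
        "rainfall": 0.05, "beds": 0.35, "doctors": 0.34,
        "temperature": 0.54, "population": 0.53,
    },
    ("rainfall", "beds", "doctors", "population"): {
        "rainfall": 0.02, "beds": 0.46, "doctors": 0.27, "population": 0.58,
    },
    ("rainfall", "beds", "doctors"): {
        "rainfall": 0.02, "beds": 0.45, "doctors": 0.23,
    },
    ("rainfall", "doctors"): {"rainfall": 0.01, "doctors": 0.32},
    ("rainfall",): {"rainfall": 0.01},
}

start = ("rainfall", "beds", "doctors", "temperature", "population")
kept, dropped = backward_eliminate(start, lambda names: ASIA_PVALUES[names])
print(kept)     # variables in the final model
print(dropped)  # variables in the order they were eliminated
```

Replaying the recorded p-values removes temperature, population, hospital beds and medical doctors in that order, leaving average rainfall as the only variable in the final model.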
2. Region B: North American countries
Step 1: Stating the null and alternative hypotheses
H0: βj = 0 (No variables have a relationship with the total number of deaths due to COVID-19)
H1: βj ≠ 0 (At least one variable has a relationship with the total number of deaths due to
COVID-19)
j = 1, 2, 3, 4, 5
Step 2: Full model with five variables
Variable                  P-value   Comparison   Decision
β1 Average rainfall       0.909     > α          Do not reject H0
β2 Hospital beds          0.536     > α          Do not reject H0
β3 Medical doctors        0.925     > α          Do not reject H0
β4 Average temperature    0.191     > α          Do not reject H0
β5 Population             0.002     < α          Reject H0

Because the p-value of the medical doctors variable is the largest (0.925) and greater than
α, we eliminate this variable and continue the test.
Step 3: Four variables
Variable                  P-value   Comparison   Decision
β1 Average rainfall       0.904     > α          Do not reject H0
β2 Hospital beds          0.403     > α          Do not reject H0
β4 Average temperature    0.177     > α          Do not reject H0
β5 Population             0.002     < α          Reject H0

Because the p-value of the average rainfall variable is the largest (0.904) and greater than
α, we eliminate this variable and continue the test.

Step 4: Three variables
Variable                  P-value   Comparison   Decision
β2 Hospital beds          0.351     > α          Do not reject H0
β4 Average temperature    0.128     > α          Do not reject H0
β5 Population             0.001     < α          Reject H0

Because the p-value of the hospital beds variable is the largest (0.351) and greater than α,
we eliminate this variable and continue the test.
Step 5: Two variables
Variable                  P-value   Comparison   Decision
β4 Average temperature    0.119     > α          Do not reject H0
β5 Population             0.0009    < α          Reject H0

Because the p-value of the average temperature variable is the largest (0.119) and greater
than α, we eliminate this variable and continue the test.
Step 6: One variable
Variable                  P-value     Comparison   Decision
β5 Population             0.0000284   < α          Reject H0

Because the p-value of the population variable is lower than α, we reject H0, which means the
population variable has a relationship with the total number of deaths due to COVID-19.
The backward elimination procedure above gives a final regression model for the North
American countries dataset that includes only one significant variable at the 5% level of
significance: Population (in millions).

Appendix 3: Hypothesis Testing for trend models at 5% significance level
1. Region 1: Asia
a. Linear
Step 1: H0: β1 = 0 (There is no linear trend)
H1: β1 ≠ 0 (There is a linear trend)
Step 2: p-value = 0.000 < α (0.05)
=> Reject H0
Step 3: As H0 is rejected, it can be concluded that there is a linear trend at the 5% level
of significance.
b. Quadratic
Step 1: H0: β2 = 0 (There is no quadratic trend)
H1: β2 ≠ 0 (There is a quadratic trend)
Step 2: p-value = 0.000 < α (0.05)
=> Reject H0
Step 3: As H0 is rejected, it can be concluded that there is a quadratic trend at the 5%
level of significance.
c. Exponential
Step 1: H0: β1 = 0 (There is no exponential trend)
H1: β1 ≠ 0 (There is an exponential trend)
Step 2: p-value = 0.000 < α (0.05)
=> Reject H0
Step 3: As H0 is rejected, it can be concluded that there is an exponential trend at the 5%
level of significance.
2. Region 2: North America
a. Linear
Step 1: H0: β1 = 0 (There is no linear trend)
H1: β1 ≠ 0 (There is a linear trend)
Step 2: p-value = 0.0017 < α (0.05)
=> Reject H0
Step 3: As H0 is rejected, it can be concluded that there is a linear trend at the 5% level
of significance.
b. Quadratic
Step 1: H0: β2 = 0 (There is no quadratic trend)
H1: β2 ≠ 0 (There is a quadratic trend)
Step 2: p-value = 0.2817 > α (0.05)
=> Do not reject H0
Step 3: As H0 is not rejected, it cannot be concluded that there is a quadratic trend at the
5% level of significance.
c. Exponential
Step 1: H0: β1 = 0 (There is no exponential trend)
H1: β1 ≠ 0 (There is an exponential trend)
Step 2: p-value = 0.00269 < α (0.05)
=> Reject H0
Step 3: As H0 is rejected, it can be concluded that there is an exponential trend at the 5%
level of significance.
