Business Statistics Assignment
Problem:
To study the rate of deaths from heart-related diseases, a research group at an American
university collected data from states across the U.S. on the number of deaths and some
related socio-economic variables. The data are given in the table below:
State  Number of deaths  Age 65  Income  Rate of color  Region
AL                307.1    13.0  23.471           26.0       1
AK                 90.9     5.7  30.064            3.5       1
AZ                226.0    13.0  25.578            3.1       1
AR                325.9    14.0  22.257           15.7       1
CA                217.0    10.6  32.275            6.7       1
CO                158.3     9.7  32.949            3.8       1
CT                278.1    13.8  40.640            9.1       1
DE                266.9    13.0  31.255           19.2       1
FL                340.4    17.6  28.145           14.6       1
GA                225.9     9.6  27.940           28.7       1
HI                203.3    13.3  28.221            1.8       1
ID                202.3    11.3  24.180            0.4       1
IL                275.3    12.1  32.259           15.1       1
IN                280.4    12.4  27.011            8.4       1
IA                303.2    14.9  26.723            2.1       1
KS                262.8    13.3  27.816            5.7       1
KY                305.4    12.5  24.294            7.3       1
LA                274.6    11.6  23.334           32.5       1
ME                272.8    14.4  25.623            0.5       1
MD                233.6    11.3  33.872           27.9       1
MA                257.0    13.5  37.992            5.4       1
MI                280.8    12.3  29.612           14.2       1
MN                199.6    12.1  32.101            3.5       1
MS                337.2    12.1  20.993           36.3       1
MO                328.7    13.5  27.445           11.2       1
MT                232.1    13.4  22.569            0.3       1
NE                269.9    13.6  27.829            4.0       2
NV                233.9    11.0  30.529            6.8       2
NH                229.0    12.0  33.332            0.7       2
NJ                288.5    13.2  36.983           13.6       2
NM                198.4    11.7  22.203            1.9       2
NY                324.2    12.9  34.547           15.9       2
NC                250.8    12.0  27.194           21.6       2
ND                289.3    14.7  25.068            0.6       2
OH                294.9    13.3  28.400           11.5       2
OK                335.4    13.2  23.517            7.6       2
OR                219.0    12.8  28.350            1.6       2
PA                347.7    15.6  29.539           10.0       2
RI                303.6    14.5  29.685            4.5       2
SC                256.9    12.1  24.321           29.5       2
SD                276.1    14.3  26.115            0.6       2
TN                296.9    12.4  26.239           16.4       2
TX                216.6     9.9  27.871           11.5       2
UT                130.8     8.5  23.907            0.8       2
VT                226.0    12.7  26.901            0.5       2
VA                223.0    11.2  31.162           19.6       2
WA                200.0    11.2  31.528            3.2       2
WV                377.5    15.3  21.915            3.2       2
WI                263.3    13.1  28.232            5.7       2
WY                210.4    11.7  27.230            0.8       2
Where:
- Number of deaths: the number of deaths related to cardiovascular disease per
100,000 population
- Age 65: percentage of the population aged 65 years and older
- Income: per capita income, in thousands of dollars
- Rate of color: percentage of the population who are people of color
- Region: the states are divided into two study regions, Zone 1 and Zone 2
Please use the above data to answer the following questions:
1. Use appropriate descriptive statistics to comment on the variables in the data.
2. Use appropriate graphs and correlation coefficients to comment on the relationship
between the number of deaths due to cardiovascular disease and each of the remaining
variables. From this, identify which of the remaining variables could affect the dependent
variable if a linear regression model is set up with the number of deaths as the dependent
variable (no need to distinguish by region).
3. Estimate the confidence interval for the average number of deaths for the states in
Region 1 and in Region 2.
4. Compare the average number of deaths for the states in Region 1 and Region 2 (using a
hypothesis test). Is the result similar when comparing income?
5. Estimate a linear regression model in which the dependent variable is the number of
deaths and the independent variables are the remaining variables (excluding region):
a. Explain the meaning of the regression coefficients and the R² coefficient.
b. Use an appropriate test to determine which independent variables do and do not affect
the dependent variable. From this, comment on the factors that can affect the rate of
deaths due to cardiovascular disease. Are there other factors that could affect this
mortality rate?
c. Use the F-test to check whether the model is significant. Explain the meaning of the
result obtained.
d. Predict the death rate in a state with the following values of the independent
variables:
15% aged 65 years or older, average income of 25,000 USD, and 4% black.
Explain the result obtained.
Answer:
To answer these questions, our group used the MegaStat software to analyze the data and
then used the software output to answer each question.
1. Use appropriate descriptive statistics to comment on the variables in the data.
1.1. The number of deaths related to cardiovascular disease.
Using MegaStat / Descriptive Statistics with the number-of-deaths data as input, we obtain
the following table.
Descriptive statistics
                                  Number of deaths
count                             50
mean                              258.954
sample variance                   3,191.835
sample standard deviation         56.496
minimum                           90.9
maximum                           377.5
range                             286.6
skewness                          -0.482
kurtosis                          0.681
coefficient of variation (CV)     21.82%
1st quartile                      223.725
median                            265.100
3rd quartile                      296.400
interquartile range               72.675
mode                              226.000
Comment:
From the above table we find:
- The number of states studied is 50.
- The average number of deaths related to cardiovascular disease in a U.S. state is about
259 per 100,000 population. The median is 265.1, so in 50% of the states studied the number
of deaths is below 265.1 and in the other 50% it is above 265.1. The mean is close to the
median, which suggests that the sample distribution is fairly symmetric.
- The sample standard deviation is 56.496, which measures the spread of the distribution
around the mean.
- Some states share the same number of deaths; the most common value (the mode) is 226 per
100,000 population. The lowest number of deaths in a state is 90.9 per 100,000 and the
highest is 377.5 per 100,000, giving a range of 286.6.
Frequency distribution of the number of deaths related to cardiovascular disease in a
U.S. state.
Using MegaStat / Frequency Distribution / Quantitative with the number-of-deaths data as
input, we obtain the following table:
Frequency Distribution - Quantitative
Number of deaths                                                  cumulative
lower       upper   midpoint   width   frequency   percent   frequency   percent
   50   <     100         75      50           1       2.0           1       2.0
  100   <     150        125      50           1       2.0           2       4.0
  150   <     200        175      50           3       6.0           5      10.0
  200   <     250        225      50          15      30.0          20      40.0
  250   <     300        275      50          18      36.0          38      76.0
  300   <     350        325      50          11      22.0          49      98.0
  350   <     400        375      50           1       2.0          50     100.0
Total                                         50     100.0
Based on the above frequency distribution table, we find that the number of deaths related
to cardiovascular disease in most U.S. states lies between 200 and 350 per 100,000
(accounting for 88% of the states).
[Chart: frequency distribution of the number of deaths]
The frequency distribution of the number of deaths is fairly balanced and concentrated in
the middle. However, the skewness is -0.482 < 0, which indicates that the distribution is
skewed to the left.
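The summary figures and the frequency table in this subsection can be cross-checked outside MegaStat. The following is a minimal sketch using only Python's standard library; the list `deaths` is transcribed from the data table in the problem statement, and the variable names and class layout (seven classes of width 50 starting at 50) are assumptions chosen to mirror the MegaStat output, not part of the original analysis.

```python
import statistics

# Deaths per 100,000 for the 50 states, in the order of the data table (AL ... WY).
deaths = [
    307.1, 90.9, 226.0, 325.9, 217.0, 158.3, 278.1, 266.9, 340.4, 225.9,
    203.3, 202.3, 275.3, 280.4, 303.2, 262.8, 305.4, 274.6, 272.8, 233.6,
    257.0, 280.8, 199.6, 337.2, 328.7, 232.1, 269.9, 233.9, 229.0, 288.5,
    198.4, 324.2, 250.8, 289.3, 294.9, 335.4, 219.0, 347.7, 303.6, 256.9,
    276.1, 296.9, 216.6, 130.8, 226.0, 223.0, 200.0, 377.5, 263.3, 210.4,
]

mean = statistics.mean(deaths)
median = statistics.median(deaths)
stdev = statistics.stdev(deaths)          # sample standard deviation (divides by n - 1)
cv = stdev / mean                         # coefficient of variation
value_range = max(deaths) - min(deaths)

# Frequency distribution with classes [50, 100), [100, 150), ..., [350, 400).
lower, width, n_classes = 50, 50, 7
freq = [0] * n_classes
for x in deaths:
    freq[int((x - lower) // width)] += 1

print(f"mean={mean:.3f} median={median:.3f} stdev={stdev:.3f} cv={cv:.2%}")
print(f"range={value_range:.1f} frequencies={freq}")
```

Quartiles and skewness are left out on purpose, because different packages use different conventions for them and the values may not match MegaStat exactly.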
1.2. Percentage of the population aged 65 years and older
Using MegaStat / Descriptive Statistics with the percentage of the population aged 65 years
and older as input, we obtain the following table:
Descriptive statistics
                                  Age 65
count                             50
mean                              12.538
sample variance                   3.628
sample standard deviation         1.905
minimum                           5.7
maximum                           17.6
range                             11.9
skewness                          -0.741
kurtosis                          3.078
coefficient of variation (CV)     15.19%
1st quartile                      11.700
median                            12.750
3rd quartile                      13.475
interquartile range               1.775
mode                              12.100
Comment:
From the above table we find:
- The number of states studied is 50.
- The average percentage of the population aged 65 years or older in a U.S. state is
12.538%. The median is 12.75%, so in 50% of the states studied the percentage is below
12.75% and in the other 50% it is above 12.75%. The mean is close to the median, which
suggests that the sample distribution is fairly symmetric.
- The sample standard deviation is 1.905 percentage points, which measures the spread of
the distribution around the mean.
- Some states share the same percentage; the most common value (the mode) is 12.1%. The
lowest percentage of the population aged 65 years or older in a state is 5.7% and the
highest is 17.6%, giving a range of 11.9 percentage points.
Frequency distribution of the percentage of the population aged 65 years or older in a
U.S. state:
Frequency Distribution - Quantitative
Age 65                                                            cumulative
lower       upper   midpoint   width   frequency   percent   frequency   percent
  5.0   <     6.0        5.5     1.0           1       2.0           1       2.0
  6.0   <     7.0        6.5     1.0           0       0.0           1       2.0
  7.0   <     8.0        7.5     1.0           0       0.0           1       2.0
  8.0   <     9.0        8.5     1.0           1       2.0           2       4.0
  9.0   <    10.0        9.5     1.0           3       6.0           5      10.0
 10.0   <    11.0       10.5     1.0           1       2.0           6      12.0
 11.0   <    12.0       11.5     1.0           8      16.0          14      28.0
 12.0   <    13.0       12.5     1.0          13      26.0          27      54.0
 13.0   <    14.0       13.5     1.0          14      28.0          41      82.0
 14.0   <    15.0       14.5     1.0           6      12.0          47      94.0
 15.0   <    16.0       15.5     1.0           2       4.0          49      98.0
 16.0   <    17.0       16.5     1.0           0       0.0          49      98.0
 17.0   <    18.0       17.5     1.0           1       2.0          50     100.0
Total                                         50     100.0
Based on the above frequency distribution table, we find that in most U.S. states the
percentage of the population aged 65 and older is between 11% and 15% (accounting for 82%
of the states).
Conclusion: Based on these results, the government may consider preferential policies for
the elderly, such as building more hospitals and nursing homes so that older people receive
good health care, or allocating welfare for the elderly in each state in line with its
current share of elderly residents.
[Chart: frequency distribution of Age 65]
The frequency distribution of the percentage of the population aged 65 years or older is
fairly balanced and concentrated in the center. However, the skewness is -0.741 < 0, which
indicates that the distribution is skewed to the left.
1.3. Per capita income, in thousands of dollars
Using MegaStat / Descriptive Statistics with the income data as input, we obtain the
following table.
Descriptive statistics
                                  Income
count                             50
mean                              28.82432
sample variance                   38.33179
sample standard deviation         6.19127
minimum                           20.993
maximum                           59.685
range                             38.692
skewness                          2.71490
kurtosis                          11.83641
coefficient of variation (CV)     21.48%
1st quartile                      25.19550
median                            27.85000
3rd quartile                      31.23175
interquartile range               6.03625
mode                              #N/A
From the above table we find:
- The number of states studied is 50.
- The average per capita income in a U.S. state is $28.824 thousand. The median is $27.85
thousand, so 50% of the states studied have a per capita income below $27.85 thousand and
the other 50% have a per capita income above $27.85 thousand. The mean is somewhat higher
than the median, which suggests that the sample distribution is not symmetric.
- The sample standard deviation is 6.19 (thousand dollars), which measures the spread of
the distribution around the mean.
- Per capita income is different in every state (there is no mode). The lowest per capita
income in a state is $20.993 thousand and the highest is $59.685 thousand, giving a range
of $38.692 thousand.
Frequency distribution of per capita income in a U.S. state:
Frequency Distribution - Quantitative
Income                                                            cumulative
lower       upper   midpoint   width   frequency   percent   frequency   percent
20.00   <   22.00      21.00    2.00           2       4.0           2       4.0
22.00   <   24.00      23.00    2.00           7      14.0           9      18.0
24.00   <   26.00      25.00    2.00           6      12.0          15      30.0
26.00   <   28.00      27.00    2.00          12      24.0          27      54.0
28.00   <   30.00      29.00    2.00           7      14.0          34      68.0
30.00   <   32.00      31.00    2.00           5      10.0          39      78.0
32.00   <   34.00      33.00    2.00           6      12.0          45      90.0
34.00   <   36.00      35.00    2.00           1       2.0          46      92.0
36.00   <   38.00      37.00    2.00           2       4.0          48      96.0
38.00   <   40.00      39.00    2.00           0       0.0          48      96.0
40.00   <   42.00      41.00    2.00           1       2.0          49      98.0
42.00   <   44.00      43.00    2.00           0       0.0          49      98.0
44.00   <   46.00      45.00    2.00           0       0.0          49      98.0
46.00   <   48.00      47.00    2.00           0       0.0          49      98.0
48.00   <   50.00      49.00    2.00           0       0.0          49      98.0
50.00   <   52.00      51.00    2.00           0       0.0          49      98.0
52.00   <   54.00      53.00    2.00           0       0.0          49      98.0
54.00   <   56.00      55.00    2.00           0       0.0          49      98.0
56.00   <   58.00      57.00    2.00           0       0.0          49      98.0
58.00   <   60.00      59.00    2.00           1       2.0          50     100.0
Total                                         50     100.0
Based on the above frequency distribution table, we find that per capita income in most
U.S. states lies between 22 and 34 thousand dollars (accounting for 86% of the states). In
particular, the 26 to 28 thousand dollar class has the highest share, at 24%.
Conclusion: Based on these results, the authorities may consider setting public service
fees in line with the average income in each state, or making policy on fees and other
costs related to medical treatment accordingly.
[Chart: frequency distribution of Income]
The frequency distribution of per capita income tends to be concentrated in the middle.
However, there is an outlying value with a much higher income than the rest, between
$56,000 and $60,000.
1.4. Percentage of the population who are people of color.
Using MegaStat / Descriptive Statistics with the rate-of-color data as input, we obtain the
following table:
Descriptive statistics
                                  Rate of color
count                             50
mean                              9.902
sample variance                   91.779
sample standard deviation         9.580
minimum                           0.3
maximum                           36.3
range                             36
skewness                          1.130
kurtosis                          0.453
coefficient of variation (CV)     96.75%
1st quartile                      2.350
median                            6.750
3rd quartile                      14.975
interquartile range               12.625
mode                              3.500
Comment:
From the above table we find:
- The number of states studied is 50.
- The average percentage of the population who are people of color in a U.S. state is 9.9%.
The median is 6.75%, so in 50% of the states studied the percentage is below 6.75% and in
the other 50% it is above 6.75%. The mean is greater than the median, which indicates that
the sample distribution is skewed to the right.
- The sample standard deviation is 9.58 percentage points, which measures the spread of the
distribution around the mean.
- Some states share the same percentage of people of color; the most common value (the
mode) is 3.5%. The lowest percentage in a state is 0.3% and the highest is 36.3%, giving a
range of 36 percentage points.
Frequency distribution of the percentage of the population who are people of color in a
U.S. state:
Frequency Distribution - Quantitative
Rate of color                                                     cumulative
lower       upper   midpoint   width   frequency   percent   frequency   percent
    0   <       5          3       5          21      42.0          21      42.0
    5   <      10          8       5           9      18.0          30      60.0
   10   <      15         13       5           7      14.0          37      74.0
   15   <      20         18       5           6      12.0          43      86.0
   20   <      25         23       5           1       2.0          44      88.0
   25   <      30         28       5           4       8.0          48      96.0
   30   <      35         33       5           1       2.0          49      98.0
   35   <      40         37       5           1       2.0          50     100.0
Total                                         50     100.0
Based on the above frequency distribution table, we find that in most U.S. states the
percentage of the population who are people of color is between 0% and 20% (accounting for
86% of the states). In particular, the 0 - 5% class has the highest share, at 42%.
Conclusion: Based on these results, the authorities may consider welfare policies that give
priority to people of color.
[Chart: frequency distribution of Rate of colored people]
The frequency distribution of the percentage of the population who are people of color is
skewed to the right (skewness 1.130 > 0).
2. Use graphs and correlation coefficients to examine the relationship between
the number of deaths due to cardiovascular disease and the remaining variables:
2.1. Graph of the relationship between the number of deaths due to cardiovascular
disease and the percentage of the population aged 65 years or older
Using MegaStat / Correlation / Regression / Scatter Plot with the number of deaths and the
percentage of the population aged 65 years or older as input, we obtain the following
chart.
[Scatter plot: Number of deaths against Age 65]
Based on the above chart, we find that the points follow a roughly linear pattern. Thus,
the number of deaths due to cardiovascular disease and the percentage of the population
aged 65 years and older are positively related.
2.2. Graph of the relationship between the number of deaths due to cardiovascular
disease and per capita income
Using MegaStat / Correlation / Regression / Scatter Plot with the number of deaths and
income as input, we obtain the following chart.
[Scatter plot: Number of deaths against Income]
Based on the above chart, we find that the points show no clear linear pattern. Thus,
there is no apparent linear relationship between the number of deaths due to
cardiovascular disease and per capita income.
2.3. Graph of the relationship between the number of deaths due to cardiovascular
disease and the percentage of the population who are people of color
Using MegaStat / Correlation / Regression / Scatter Plot with the number of deaths and the
percentage of the population who are people of color as input, we obtain the following
chart.
[Scatter plot: Number of deaths against Rate of colored people]
Based on the above chart, we find that the points follow a roughly linear pattern, so the
number of deaths from cardiovascular disease and the rate of color are related to each
other.
2.4. The correlation coefficients between the variables
Using MegaStat / Correlation / Regression / Correlation Matrix with all of the data on the
number of deaths, age 65, income and the rate of color as input, we obtain the following
results:
Correlation Matrix
                          Number of deaths   Age 65   Income   Rate of colored people
Number of deaths                     1.000
Age 65                                .788    1.000
Income                               -.044     .040    1.000
Rate of colored people                .312    -.095    -.093                    1.000
Based on the above table we find:
- The correlation between the number of deaths and the percentage of the population aged
65 years and older is 0.788.
- The correlation between the number of deaths and per capita income is -0.044.
- The correlation between the number of deaths and the percentage of the population who
are people of color is 0.312.
Thus, the percentage of the population aged 65 years and older has the strongest linear
association with the number of deaths related to cardiovascular disease, followed by the
percentage of the population who are people of color. Per capita income shows almost no
linear association with the number of deaths related to cardiovascular disease. Age 65 and
the rate of color are therefore the natural candidates for a linear regression model of the
number of deaths.
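For readers who want to see what a correlation matrix entry is, the sketch below computes a Pearson correlation coefficient by hand with Python's standard library. The function name `pearson_r` is a hypothetical helper, and the two lists are transcribed from the data table in the problem statement (deaths and Age 65, in state order AL ... WY); the result can be compared with the Number of deaths / Age 65 entry of the matrix.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Values transcribed from the data table, in state order AL ... WY.
deaths = [
    307.1, 90.9, 226.0, 325.9, 217.0, 158.3, 278.1, 266.9, 340.4, 225.9,
    203.3, 202.3, 275.3, 280.4, 303.2, 262.8, 305.4, 274.6, 272.8, 233.6,
    257.0, 280.8, 199.6, 337.2, 328.7, 232.1, 269.9, 233.9, 229.0, 288.5,
    198.4, 324.2, 250.8, 289.3, 294.9, 335.4, 219.0, 347.7, 303.6, 256.9,
    276.1, 296.9, 216.6, 130.8, 226.0, 223.0, 200.0, 377.5, 263.3, 210.4,
]
age65 = [
    13.0, 5.7, 13.0, 14.0, 10.6, 9.7, 13.8, 13.0, 17.6, 9.6,
    13.3, 11.3, 12.1, 12.4, 14.9, 13.3, 12.5, 11.6, 14.4, 11.3,
    13.5, 12.3, 12.1, 12.1, 13.5, 13.4, 13.6, 11.0, 12.0, 13.2,
    11.7, 12.9, 12.0, 14.7, 13.3, 13.2, 12.8, 15.6, 14.5, 12.1,
    14.3, 12.4, 9.9, 8.5, 12.7, 11.2, 11.2, 15.3, 13.1, 11.7,
]

r_deaths_age = pearson_r(deaths, age65)
print(f"r(deaths, age 65) = {r_deaths_age:.3f}")
```

The same function can be applied to any other pair of columns to fill in the rest of the matrix.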
3. Confidence interval estimate for the average number of deaths for the states in
Region 1 and Region 2
3.1. The average number of deaths for the states in Region 1
a. Basic descriptive statistics of the number of deaths in the states in Region 1
Using MegaStat / Descriptive Statistics with the number of deaths in the Region 1 states as
input, we obtain the following table:
Descriptive statistics
                                  Number of deaths
count                             26
mean                              257.138
sample variance                   3,361.783
sample standard deviation         57.981
minimum                           90.9
maximum                           340.4
range                             249.5
1st quartile                      225.925
median                            269.850
3rd quartile                      297.600
interquartile range               71.675
mode                              #N/A
b. Estimate the average number of deaths for the states in Region 1
Using the values in the table above as input to MegaStat / Confidence Interval - Mean, we
obtain the following table:
Confidence interval - mean
confidence level            95%
mean                        257.138
std. dev.                   57.981
n                           26
t (df = 25)                 2.060
half-width                  23.419
upper confidence limit      280.557
lower confidence limit      233.719
Based on the above results, the 95% confidence interval for the average number of deaths
due to cardiovascular disease in the states of Region 1 is (233.719; 280.557). In other
words, we can be 95% confident that the mean number of deaths due to cardiovascular disease
in Region 1 lies between 233.7 and 280.6 per 100,000 people.
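The interval above follows from the usual formula mean ± t × s / √n. As a minimal sketch, the calculation can be reproduced from the summary values alone; the function name `mean_confidence_interval` is hypothetical, and the critical value 2.060 is simply the rounded t (df = 25) printed in the MegaStat table, so the result agrees with the table only up to rounding.

```python
import math

def mean_confidence_interval(mean, std_dev, n, t_crit):
    """Return (lower, upper, half_width) of a t-based confidence interval for a mean."""
    half_width = t_crit * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width, half_width

# Region 1 summary values; t_crit is the rounded t (df = 25) from the table.
lower, upper, half_width = mean_confidence_interval(257.138, 57.981, 26, 2.060)
print(f"95% CI for the Region 1 mean: {lower:.1f} to {upper:.1f} "
      f"(half-width {half_width:.2f})")
```

Calling the same function with the Region 2 values (mean 260.921, standard deviation 56.019, n = 24, t = 2.069) gives the second interval.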
3.2. The average number of deaths for the states in Region 2
We proceed in the same way as in Section 3.1.
a. Basic descriptive statistics of the number of deaths in the states in Region 2.
Descriptive statistics
                                  Number of deaths
count                             24
mean                              260.921
sample variance                   3,138.122
sample standard deviation         56.019
minimum                           130.8
maximum                           377.5
range                             246.7
1st quartile                      222.000
median                            260.100
3rd quartile                      295.400
interquartile range               73.400
mode                              #N/A
b. Estimate the average number of deaths for the states in Region 2.
Confidence interval - mean
confidence level            95%
mean                        260.921
std. dev.                   56.019
n                           24
t (df = 23)                 2.069
half-width                  23.655
upper confidence limit      284.576
lower confidence limit      237.266
The 95% confidence interval for the average number of deaths due to cardiovascular disease
in the states of Region 2 is (237.266; 284.576). In other words, we can be 95% confident
that the mean number of deaths due to cardiovascular disease in Region 2 lies between 237.3
and 284.6 per 100,000 people.
4.1. Comparison of the average number of deaths for the states in Zone 1 and Zone 2
Using MegaStat / Hypothesis Tests / Compare Two Independent Groups with the number of
deaths in Zone 1 and Zone 2 as input, we obtain the following table:
Hypothesis Test: Independent Groups (t-test, pooled variance)
Number of deaths                    Group 1      Group 2
mean                                257.138      260.921
std. dev.                           57.981       56.019
n                                   26           24

df                                  48
difference (Group 1 - Group 2)      -3.7824
pooled variance                     3,254.6121
pooled std. dev.                    57.0492
standard error of difference        16.1489
hypothesized difference             0
t                                   -0.23
p-value (two-tailed)                .8158
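Based on the above results, we see that at a significance level of α = 5% there is no
statistically significant difference between the average number of deaths due to
cardiovascular disease in the states of Zone 1 and Zone 2 (p-value > α, so we cannot reject
the hypothesis H0 that the two means are equal).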
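The pooled-variance t statistic in the table can be recomputed from the two groups' means, standard deviations and sizes. The sketch below is a minimal illustration: the function name `pooled_t_test` is hypothetical, and the critical value 2.011 for a two-tailed 5% test with 48 degrees of freedom is an assumption taken from standard t tables rather than from the MegaStat output. The p-value itself is not computed, since that needs a t-distribution routine outside the standard library.

```python
import math

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (mean1 - mean2) / std_err
    return df, pooled_var, std_err, t_stat

# Number of deaths: Zone 1 versus Zone 2 summary values from the table.
df, pooled_var, std_err, t_stat = pooled_t_test(257.138, 57.981, 26,
                                                260.921, 56.019, 24)

T_CRIT = 2.011  # assumed two-tailed 5% critical value of t with 48 df
significant = abs(t_stat) > T_CRIT
print(f"df={df} pooled variance={pooled_var:.2f} "
      f"std. error={std_err:.3f} t={t_stat:.2f} significant={significant}")
```

Because |t| is far below the critical value, the sketch reaches the same decision as the p-value in the table.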
4.2. Comparing the average per capita income of the states in Zone 1 and Zone 2
Proceeding in the same way as in Section 4.1, we obtain the following results:
Hypothesis Test: Independent Groups (t-test, pooled variance)
Income                              Group 1      Group 2
mean                                28.40842     29.27488
std. dev.                           4.82352      7.48108
n                                   26           24

df                                  48
difference (Group 1 - Group 2)      -0.866452
pooled variance                     38.935179
pooled std. dev.                    6.239806
standard error of difference        1.766297
hypothesized difference             0
t                                   -0.49
p-value (two-tailed)                .6260
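Based on the above results, we see that at a significance level of α = 5% there is no
statistically significant difference between the average per capita income in the states of
Zone 1 and Zone 2 (p-value > α, so there is no basis to reject the hypothesis H0). The
result is therefore similar to the comparison of the number of deaths.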
5. Estimate a linear regression model in which the dependent variable is the number
of deaths and the independent variables are the remaining variables
Regression Analysis
R²             0.774        n           50
Adjusted R²    0.759        k           3
R              0.880        Dep. Var.   Number of deaths
Std. Error     27.731

ANOVA table
Source        SS              df    MS             F        p-value
Regression    121,024.7645     3    40,341.5882    52.46    6.92E-15
Residual       35,375.1597    46       769.0252
Total         156,399.9242    49

Regression output                                                        confidence interval
variables                 coefficients   std. error   t (df=46)   p-value     95% lower   95% upper
Intercept                     -60.1955      32.6430      -1.844     .0716     -125.9025      5.5114
Age 65                         24.5202       2.0904      11.730   2.01E-15      20.3124     28.7280
Income                         -0.3757       0.6430      -0.584     .5619       -1.6700      0.9186
Rate of colored people          2.2768       0.4171       5.459   1.86E-06       1.4373      3.1163
5.1. Explain the meaning of the regression coefficients and the R² coefficient
Based on the above table we find:
The model obtained is:
(Number of deaths) = -60.2 + 24.5 x Age 65 - 0.4 x Income + 2.3 x Rate of colored people
The meaning of the regression coefficients:
+ 24.5: If income and the rate of color are held constant and the percentage of the
population aged 65 and older increases by 1 percentage point, the number of deaths related
to cardiovascular disease increases by about 24.5 per 100,000 population.
+ (-0.4): If the percentage of the population aged 65 years and older and the rate of color
are held constant and per capita income increases by $1,000, the number of deaths related
to cardiovascular disease decreases by about 0.4 per 100,000 population.
+ 2.3: If the percentage of the population aged 65 years and older and income are held
constant and the percentage of people of color increases by 1 percentage point, the number
of deaths related to cardiovascular disease increases by about 2.3 per 100,000 population.
- The meaning of R² = 0.774: With three independent variables (the percentage of the
population aged 65 years and older, per capita income, and the percentage of the population
who are people of color), the model explains 77.4% of the variation in the number of deaths
related to cardiovascular disease.
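R² is the share of the total sum of squares that the regression accounts for, and adjusted R² penalizes it for the number of predictors. A minimal sketch of that arithmetic, using the sums of squares from the ANOVA table of the three-variable model (the variable names are illustrative only):

```python
# Sums of squares from the ANOVA table of the three-variable model.
ss_regression = 121_024.7645
ss_total = 156_399.9242
n, k = 50, 3  # number of observations and of independent variables

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
```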
5.2. Use an appropriate test to determine which independent variables do and do not
affect the dependent variable. From this, comment on the factors that can affect the
rate of deaths due to cardiovascular disease. Are there other factors that could
affect this mortality rate?
Based on the table above, to test which independent variables do and do not affect the
dependent variable, we set up the following three pairs of hypotheses:
+ Hypothesis pair 1:
H0: β1 = 0 (the percentage of the population aged 65 years and older does not affect the
number of deaths)
H1: β1 ≠ 0 (the percentage of the population aged 65 years and older affects the number of
deaths)
+ Hypothesis pair 2:
H0: β2 = 0 (per capita income does not affect the number of deaths)
H1: β2 ≠ 0 (per capita income affects the number of deaths)
+ Hypothesis pair 3:
H0: β3 = 0 (the percentage of the population who are people of color does not affect the
number of deaths)
H1: β3 ≠ 0 (the percentage of the population who are people of color affects the number of
deaths)
To test the three pairs of hypotheses, we examine the p-values in the regression output
above:
+ Hypothesis pair 1: p-value = 2.01 x 10^-15 < α = 0.05 -> reject the hypothesis H0 -> the
percentage of the population aged 65 years and older affects the number of deaths due to
cardiovascular disease.
+ Hypothesis pair 2: p-value = 0.56 > α = 0.05 -> fail to reject the hypothesis H0 -> there
is no evidence that per capita income affects the number of deaths due to cardiovascular
disease.
+ Hypothesis pair 3: p-value = 1.86 x 10^-6 < α = 0.05 -> reject the hypothesis H0 -> the
percentage of the population who are people of color affects the number of deaths due to
cardiovascular disease.
Conclusion: Of the three independent variables in this study, only two, the percentage of
the population aged 65 years or older and the percentage of the population who are people
of color, affect the number of deaths due to cardiovascular disease. There are likely other
factors that affect the number of deaths due to cardiovascular disease and need further
research, such as the number of doctors per 100,000 population, the male/female ratio of
the population, ...
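Each t value in the regression output is simply the coefficient divided by its standard error, and the decision can equivalently be made by comparing |t| with a critical value. The sketch below illustrates this with the figures from the three-variable regression output; the critical value 2.013 (two-tailed, 5%, 46 degrees of freedom) is an assumption taken from standard t tables, and the dictionary layout is illustrative only.

```python
# (coefficient, standard error) pairs from the three-variable regression output.
coefficients = {
    "Age 65": (24.5202, 2.0904),
    "Income": (-0.3757, 0.6430),
    "Rate of colored people": (2.2768, 0.4171),
}

T_CRIT = 2.013  # assumed two-tailed 5% critical value of t with 46 df

results = {}
for name, (coef, std_err) in coefficients.items():
    t_stat = coef / std_err
    results[name] = (t_stat, abs(t_stat) > T_CRIT)  # True means "reject H0"

for name, (t_stat, reject_h0) in results.items():
    print(f"{name}: t = {t_stat:.3f}, reject H0: {reject_h0}")
```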
5.3. Use the F-test to check whether the model is significant. Explain the meaning
of the results obtained.
Based on the test results above, we build a linear regression model of the relationship
between the number of deaths due to cardiovascular disease (dependent variable) and the
percentage of the population aged 65 years and older and the percentage of the population
who are people of color (independent variables).
Regression Analysis
R²             0.772        n           50
Adjusted R²    0.762        k           2
R              0.879        Dep. Var.   Number of deaths
Std. Error     27.536

ANOVA table
Source        SS              df    MS             F        p-value
Regression    120,762.1854     2    60,381.0927    79.63    8.04E-16
Residual       35,637.7388    47       758.2498
Total         156,399.9242    49

Regression output                                                        confidence interval
variables                 coefficients   std. error   t (df=47)   p-value     95% lower   95% upper
Intercept                     -70.7553      26.9931      -2.621     .0118     -125.0585    -16.4522
Age 65                         24.4814       2.0747      11.800   1.18E-15      20.3077     28.6551
Rate of colored people          2.2987       0.4125       5.573   1.18E-06       1.4689      3.1285
The model obtained is:
(Number of deaths) = -70.8 + 24.5 x Age 65 + 2.3 x Rate of colored people
To test whether the model is significant, we use the following pair of hypotheses:
H0: β1 = β2 = 0 (neither the percentage of the population aged 65 years and older nor the
rate of color affects the number of deaths)
H1: At least one coefficient β ≠ 0 (at least one of the two variables, the percentage of
the population aged 65 years or older or the rate of color, affects the number of deaths)
Based on the ANOVA table and the F-test above we have:
- The meaning of R² = 0.772: the model is useful in explaining the variation of the number
of deaths due to cardiovascular disease. With two independent variables (the percentage of
the population aged 65 years and older and the percentage of the population who are people
of color), the model explains 77.2% of the variation in the number of deaths related to
cardiovascular disease.
- Both slope coefficients are positive, so the dependent variable moves in the same
direction as each independent variable.
- The p-value of the F-test is 8.04 x 10^-16 < α = 0.05 -> reject the hypothesis H0 -> at
least one of the two variables, the percentage of the population aged 65 years or older or
the rate of color, affects the number of deaths.
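The F statistic in the ANOVA table is the ratio of the regression mean square to the residual mean square. A minimal sketch of that calculation for the two-variable model follows; the critical value of about 3.20 for F(2, 47) at the 5% level is an assumption taken from standard F tables, not from the MegaStat output.

```python
# Sums of squares and degrees of freedom from the two-variable ANOVA table.
ss_regression, df_regression = 120_762.1854, 2
ss_residual, df_residual = 35_637.7388, 47

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

F_CRIT = 3.20  # assumed approximate 5% critical value of F(2, 47)
model_significant = f_stat > F_CRIT

print(f"MSR={ms_regression:.4f} MSE={ms_residual:.4f} "
      f"F={f_stat:.2f} significant={model_significant}")
```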
Conclusion: To reduce the rate of deaths from cardiovascular disease, the government
should invest more in the health system, particularly in health care for the elderly (aged
65 and over).
5.4. Predict the death rate in a state with the following values of the independent
variables:
15% aged 65 years or older, average income of 25,000 USD, and 4% black.
Explain the result obtained.
Predicted values for: Deaths
                                                 95% Confidence Interval   95% Prediction Interval
Age 65   Rate of colored people   Predicted      lower       upper         lower       upper         Leverage
15       4                        305.6603       292.1914    319.1291      248.6504    362.6701      0.059
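If a state has 15% of its population aged 65 or older and 4% people of color, the predicted
number of deaths due to cardiovascular disease is about 305.7 per 100,000 people. The 95%
confidence interval for the mean number of deaths in such states is 292 to 319 per 100,000
people, and the 95% prediction interval for an individual state is 249 to 363 per 100,000
people. The income value (25,000 USD) is not used in the prediction because the model from
Section 5.3 does not include income.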
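The predicted value and both intervals can be approximated from the two-variable regression output. The sketch below is an illustration under stated assumptions: the function name `predict_deaths` is hypothetical, the critical value 2.012 (two-tailed, 5%, 47 degrees of freedom) comes from standard t tables, and the leverage 0.059 is the rounded figure from the MegaStat table, so the half-widths agree with the table only approximately.

```python
import math

# Coefficients and standard error of the two-variable model.
intercept, b_age65, b_color = -70.7553, 24.4814, 2.2987
std_error = 27.536   # standard error of the regression
leverage = 0.059     # leverage of the new point, as reported (rounded)
T_CRIT = 2.012       # assumed two-tailed 5% critical value of t with 47 df

def predict_deaths(age65, rate_of_color):
    """Predicted deaths per 100,000 from the fitted two-variable model."""
    return intercept + b_age65 * age65 + b_color * rate_of_color

predicted = predict_deaths(15, 4)

# Half-widths: the mean response uses sqrt(h), a single new state uses sqrt(1 + h).
ci_half = T_CRIT * std_error * math.sqrt(leverage)
pi_half = T_CRIT * std_error * math.sqrt(1 + leverage)

print(f"predicted = {predicted:.2f}")
print(f"95% CI for the mean: {predicted - ci_half:.1f} to {predicted + ci_half:.1f}")
print(f"95% PI for one state: {predicted - pi_half:.1f} to {predicted + pi_half:.1f}")
```

The prediction interval is much wider than the confidence interval because it also carries the state-to-state scatter around the regression line.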
References:
1. Curriculum Decision Management - Dr. Nguyen Manh The - PGSM