Exercises in Business Statistics
Problem:
To study the rate of deaths from heart-related diseases, a research group at an American
university collected data from the U.S. states on the number of deaths and several related
socio-economic variables. The data are given in the table below:
State   Number of deaths   Age 65   Income   Rate of color   Region
AL      307.1              13.0     23.471   26.0            1
AK       90.9               5.7     30.064    3.5            1
AZ      226.0              13.0     25.578    3.1            1
AR      325.9              14.0     22.257   15.7            1
CA      217.0              10.6     32.275    6.7            1
CO      158.3               9.7     32.949    3.8            1
CT      278.1              13.8     40.640    9.1            1
DE      266.9              13.0     31.255   19.2            1
FL      340.4              17.6     28.145   14.6            1
GA      225.9               9.6     27.940   28.7            1
HI      203.3              13.3     28.221    1.8            1
ID      202.3              11.3     24.180    0.4            1
IL      275.3              12.1     32.259   15.1            1
IN      280.4              12.4     27.011    8.4            1
IA      303.2              14.9     26.723    2.1            1
KS      262.8              13.3     27.816    5.7            1
KY      305.4              12.5     24.294    7.3            1
LA      274.6              11.6     23.334   32.5            1
ME      272.8              14.4     25.623    0.5            1
MD      233.6              11.3     33.872   27.9            1
MA      257.0              13.5     37.992    5.4            1
MI      280.8              12.3     29.612   14.2            1
MN      199.6              12.1     32.101    3.5            1
MS      337.2              12.1     20.993   36.3            1
MO      328.7              13.5     27.445   11.2            1
MT      232.1              13.4     22.569    0.3            1
NE      269.9              13.6     27.829    4.0            2
NV      233.9              11.0     30.529    6.8            2
NH      229.0              12.0     33.332    0.7            2
NJ      288.5              13.2     36.983   13.6            2
NM      198.4              11.7     22.203    1.9            2
NY      324.2              12.9     34.547   15.9            2
NC      250.8              12.0     27.194   21.6            2
ND      289.3              14.7     25.068    0.6            2
OH      294.9              13.3     28.400   11.5            2
OK      335.4              13.2     23.517    7.6            2
OR      219.0              12.8     28.350    1.6            2
PA      347.7              15.6     29.539   10.0            2
RI      303.6              14.5     29.685    4.5            2
SC      256.9              12.1     24.321   29.5            2
SD      276.1              14.3     26.115    0.6            2
TN      296.9              12.4     26.239   16.4            2
TX      216.6               9.9     27.871   11.5            2
UT      130.8               8.5     23.907    0.8            2
VT      226.0              12.7     26.901    0.5            2
VA      223.0              11.2     31.162   19.6            2
WA      200.0              11.2     31.528    3.2            2
WV      377.5              15.3     21.915    3.2            2
WI      263.3              13.1     28.232    5.7            2
WY      210.4              11.7     27.230    0.8            2

Where:
- Number of deaths: the number of deaths related to cardiovascular disease per 100,000
population
- Age 65: percentage of the population aged 65 years and older
- Income: per capita income, in thousands of dollars
- Rate of color: percentage of the population who are people of color
- Region: the states are divided into two study regions, Region 1 and Region 2
Use the data above to answer the following questions:
1. Use appropriate descriptive statistics to comment on the variables in the data.
2. Use appropriate graphs and correlation coefficients to comment on the relationship between
the number of deaths from cardiovascular disease and each of the remaining variables. From
this, determine which of the remaining variables could affect the dependent variable if a
linear regression model is built with the number of deaths as the dependent variable (no need
to distinguish by region).
3. Estimate a confidence interval for the mean number of deaths for the states in Region 1
and in Region 2.
4. Compare the mean number of deaths for the states in Region 1 and Region 2 (by hypothesis
testing). Is the result similar when income is compared?
5. Estimate a linear regression model with the number of deaths as the dependent variable and
the remaining variables (excluding region) as independent variables:
a. Explain the meaning of the regression coefficients and of R².
b. Use appropriate tests to determine which independent variables do and do not affect the
dependent variable. What can then be said about the factors that affect the death rate from
cardiovascular disease? Are there other factors that could affect this death rate?
c. Use the F test to check whether the model is significant. Explain the meaning of the
result obtained.
d. Predict the death rate in a state where the independent variables are, respectively:
15% aged 65 years or older, average income of 25,000 USD, and 4% people of color.
Explain the result obtained.
Answer:
To answer the questions, our group used the MegaStat software to analyze the data and then
used the software output to answer each question.
1. Use appropriate descriptive statistics to comment on the variables in the data.
1.1 The number of deaths related to cardiovascular disease.
Using MegaStat / Descriptive statistics with the number-of-deaths data, we obtain the
following table.


Descriptive statistics
                                The number of deaths
count                           50
mean                            258.954
sample variance                 3,191.835
sample standard deviation       56.496
minimum                         90.9
maximum                         377.5
range                           286.6
skewness                        -0.482
kurtosis                        0.681
coefficient of variation (CV)   21.82%
1st quartile                    223.725
median                          265.100
3rd quartile                    296.400
interquartile range             72.675
mode                            226.000

Comment:
From the table above we find:
- Number of states studied: 50.
- The mean number of deaths related to cardiovascular disease in a U.S. state is about 259
per 100,000 population. The median is 265.1: in half of the states the number of deaths is
below 265.1 per 100,000 and in the other half it is above 265.1. The mean is close to the
median, which suggests that the sample distribution is fairly symmetrical.
- The sample standard deviation is 56.496, which measures how widely the states are spread
around the mean.
- Some states share the same number of deaths; the most common value (the mode) is 226 per
100,000 population. The lowest value is 90.9 per 100,000 and the highest is 377.5 per
100,000, so the range is 286.6.
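
As a quick check on the table, the derived measures (range, interquartile range, coefficient of variation) follow directly from the reported summary values. The sketch below uses only the figures printed in the MegaStat output; the variable names are illustrative and not part of the original answer.

```python
# Summary values copied from the MegaStat output for the number of deaths.
mean = 258.954
std_dev = 56.496
minimum, maximum = 90.9, 377.5
q1, q3 = 223.725, 296.400

value_range = maximum - minimum      # spread between the two extreme states
iqr = q3 - q1                        # spread of the middle 50% of the states
cv_percent = std_dev / mean * 100    # standard deviation relative to the mean

print(f"range = {value_range:.1f}")
print(f"IQR   = {iqr:.3f}")
print(f"CV    = {cv_percent:.2f}%")
```

A coefficient of variation of roughly 22% says the typical deviation from the mean is about one fifth of the mean itself.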
Frequency distribution of the number of deaths related to cardiovascular disease in the
U.S. states.
Using MegaStat / Frequency Distribution / Quantitative with the number-of-deaths data, we
obtain the following table:
Frequency Distribution - Quantitative
The number of deaths
                                                         cumulative
lower     upper   midpoint   width   frequency   percent   frequency   percent
  50   <   100       75        50        1         2.0         1          2.0
 100   <   150      125        50        1         2.0         2          4.0
 150   <   200      175        50        3         6.0         5         10.0
 200   <   250      225        50       15        30.0        20         40.0
 250   <   300      275        50       18        36.0        38         76.0
 300   <   350      325        50       11        22.0        49         98.0
 350   <   400      375        50        1         2.0        50        100.0
                                        50       100.0

Based on the frequency distribution table above, in most U.S. states the number of deaths
related to cardiovascular disease lies between 200 and 350 per 100,000 population (88% of
the states).

[Figure: histogram of the number of deaths]

The histogram of the number of deaths is fairly balanced and concentrated in the middle.
However, the skewness is -0.482 < 0, which indicates that the distribution is skewed to
the left.
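
The frequency table for the number of deaths can be reproduced from the raw data by counting how many states fall into each class of width 50. The following is a minimal sketch in plain Python, assuming the 50 values are transcribed from the data table in state order; it is not the MegaStat procedure itself.

```python
# Number of deaths per 100,000 population for the 50 states (AL ... WY),
# transcribed from the data table.
deaths = [
    307.1, 90.9, 226.0, 325.9, 217.0, 158.3, 278.1, 266.9, 340.4, 225.9,
    203.3, 202.3, 275.3, 280.4, 303.2, 262.8, 305.4, 274.6, 272.8, 233.6,
    257.0, 280.8, 199.6, 337.2, 328.7, 232.1, 269.9, 233.9, 229.0, 288.5,
    198.4, 324.2, 250.8, 289.3, 294.9, 335.4, 219.0, 347.7, 303.6, 256.9,
    276.1, 296.9, 216.6, 130.8, 226.0, 223.0, 200.0, 377.5, 263.3, 210.4,
]

lower, width, n_bins = 50, 50, 7     # classes 50-<100, 100-<150, ..., 350-<400
counts = [0] * n_bins
for x in deaths:
    counts[int((x - lower) // width)] += 1   # a value on a boundary goes to the upper class

total = len(deaths)
rows = []
cumulative = 0
for i, freq in enumerate(counts):
    cumulative += freq
    lo = lower + i * width
    rows.append((lo, lo + width, freq, 100 * freq / total,
                 cumulative, 100 * cumulative / total))

for lo, hi, freq, pct, cum, cum_pct in rows:
    print(f"{lo:>4} < {hi:<4} {freq:>3} {pct:>6.1f} {cum:>3} {cum_pct:>6.1f}")
```

Summing the percentages of the three classes from 200 to 350 gives the 88% quoted in the comment.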
1.2. Percentage of the population aged 65 years and older
Using MegaStat / Descriptive statistics with the data on the percentage of the population
aged 65 years and older, we obtain the following table:
Descriptive statistics
                                Age 65
count                           50
mean                            12.538
sample variance                 3.628
sample standard deviation       1.905
minimum                         5.7
maximum                         17.6
range                           11.9
skewness                        -0.741
kurtosis                        3.078
coefficient of variation (CV)   15.19%
1st quartile                    11.700
median                          12.750
3rd quartile                    13.475
interquartile range             1.775
mode                            12.100

Comment:
From the table above we find:
- Number of states studied: 50.
- The mean percentage of the population aged 65 years or older in a U.S. state is 12.538%.
The median is 12.75%: in half of the states the percentage is below 12.75% and in the other
half it is above 12.75%. The mean is close to the median, which suggests that the sample
distribution is fairly symmetrical.
- The sample standard deviation is 1.905 percentage points, which measures how widely the
states are spread around the mean.
- Some states share the same percentage; the most common value (the mode) is 12.1%. The
lowest percentage is 5.7% and the highest is 17.6%, so the range is 11.9 percentage points.

Frequency distribution of the percentage of the population aged 65 years or older in the
U.S. states:

Frequency Distribution - Quantitative
Age 65
                                                         cumulative
lower     upper   midpoint   width   frequency   percent   frequency   percent
 5.0   <    6.0      5.5      1.0        1         2.0         1          2.0
 6.0   <    7.0      6.5      1.0        0         0.0         1          2.0
 7.0   <    8.0      7.5      1.0        0         0.0         1          2.0
 8.0   <    9.0      8.5      1.0        1         2.0         2          4.0
 9.0   <   10.0      9.5      1.0        3         6.0         5         10.0
10.0   <   11.0     10.5      1.0        1         2.0         6         12.0
11.0   <   12.0     11.5      1.0        8        16.0        14         28.0
12.0   <   13.0     12.5      1.0       13        26.0        27         54.0
13.0   <   14.0     13.5      1.0       14        28.0        41         82.0
14.0   <   15.0     14.5      1.0        6        12.0        47         94.0
15.0   <   16.0     15.5      1.0        2         4.0        49         98.0
16.0   <   17.0     16.5      1.0        0         0.0        49         98.0
17.0   <   18.0     17.5      1.0        1         2.0        50        100.0
                                        50       100.0

Based on the frequency distribution table above, in most U.S. states the percentage of the
population aged 65 and older is between 11% and 15% (82% of the states).
Conclusion: Based on these results, the government may consider preferential policies for
the elderly, such as building more hospitals and nursing homes so that older people receive
good health care, or allocating welfare for the elderly in each state in proportion to its
current share of elderly residents.

[Figure: histogram of Age 65]

The histogram of the percentage of the population aged 65 years or older is fairly
balanced and concentrated in the center. However, the skewness is -0.741 < 0, which
indicates that the distribution is skewed to the left.
1.3. Per capita income, in thousands of dollars
Using MegaStat / Descriptive statistics with the income data, we obtain the following
table.

Descriptive statistics
                                Income
count                           50
mean                            28.82432
sample variance                 38.33179
sample standard deviation       6.19127
minimum                         20.993
maximum                         59.685
range                           38.692
skewness                        2.71490
kurtosis                        11.83641
coefficient of variation (CV)   21.48%
1st quartile                    25.19550
median                          27.85000
3rd quartile                    31.23175
interquartile range             6.03625
mode                            #N/A

From the table above we find:
- Number of states studied: 50.
- The mean per capita income in a U.S. state is $28.824 thousand. The median is $27.85
thousand: half of the states have a per capita income below $27.85 thousand and half have
a per capita income above $27.85 thousand. The mean is higher than the median, which
indicates that the sample distribution is not symmetrical.
- The sample standard deviation is 6.19 (thousand dollars), which measures how widely the
states are spread around the mean.
- Per capita income differs in every state (there is no mode). The lowest value is $20.993
thousand and the highest is $59.685 thousand, so the range is $38.692 thousand.

Frequency distribution of per capita income in the U.S. states:

Frequency Distribution - Quantitative
Income
                                                           cumulative
lower      upper   midpoint   width   frequency   percent   frequency   percent
20.00   <  22.00    21.00      2.00       2         4.0         2          4.0
22.00   <  24.00    23.00      2.00       7        14.0         9         18.0
24.00   <  26.00    25.00      2.00       6        12.0        15         30.0
26.00   <  28.00    27.00      2.00      12        24.0        27         54.0
28.00   <  30.00    29.00      2.00       7        14.0        34         68.0
30.00   <  32.00    31.00      2.00       5        10.0        39         78.0
32.00   <  34.00    33.00      2.00       6        12.0        45         90.0
34.00   <  36.00    35.00      2.00       1         2.0        46         92.0
36.00   <  38.00    37.00      2.00       2         4.0        48         96.0
38.00   <  40.00    39.00      2.00       0         0.0        48         96.0
40.00   <  42.00    41.00      2.00       1         2.0        49         98.0
42.00   <  44.00    43.00      2.00       0         0.0        49         98.0
44.00   <  46.00    45.00      2.00       0         0.0        49         98.0
46.00   <  48.00    47.00      2.00       0         0.0        49         98.0
48.00   <  50.00    49.00      2.00       0         0.0        49         98.0
50.00   <  52.00    51.00      2.00       0         0.0        49         98.0
52.00   <  54.00    53.00      2.00       0         0.0        49         98.0
54.00   <  56.00    55.00      2.00       0         0.0        49         98.0
56.00   <  58.00    57.00      2.00       0         0.0        49         98.0
58.00   <  60.00    59.00      2.00       1         2.0        50        100.0
                                         50       100.0

Based on the frequency distribution table above, in most U.S. states per capita income is
between 22 and 34 thousand dollars (86% of the states). In particular, the class from 26
to 28 thousand dollars has the highest share, at 24%.
Conclusion: Based on these results, the authorities may consider setting public service
fees in line with the average income in each state, or adopting policies on fees and other
costs related to medical treatment that match income levels.

[Figure: histogram of Income]

The histogram of per capita income is concentrated in the middle. However, a few areas have
a much higher income than the others, in the range from $56,000 to $60,000.
1.4. Percentage of the population who are people of color

Using MegaStat / Descriptive statistics with the rate-of-color data, we obtain the
following table:

Descriptive statistics
                                Rate of color
count                           50
mean                            9.902
sample variance                 91.779
sample standard deviation       9.580
minimum                         0.3
maximum                         36.3
range                           36
skewness                        1.130
kurtosis                        0.453
coefficient of variation (CV)   96.75%
1st quartile                    2.350
median                          6.750
3rd quartile                    14.975
interquartile range             12.625
mode                            3.500

Comment:
From the table above we find:
- Number of states studied: 50.
- The mean percentage of the population who are people of color in a U.S. state is 9.9%.
The median is 6.75%: in half of the states the percentage is below 6.75% and in the other
half it is above 6.75%. The mean is greater than the median, which indicates that the
sample distribution is skewed to the right.
- The sample standard deviation is 9.58 percentage points, which measures how widely the
states are spread around the mean.
- Some states share the same percentage; the most common value (the mode) is 3.5%. The
lowest percentage is 0.3% and the highest is 36.3%, so the range is 36 percentage points.

Frequency distribution of the percentage of the population who are people of color in the
U.S. states:

Frequency Distribution - Quantitative
Rate of color
                                                        cumulative
lower    upper   midpoint   width   frequency   percent   frequency   percent
  0   <     5        3         5        21        42.0        21         42.0
  5   <    10        8         5         9        18.0        30         60.0
 10   <    15       13         5         7        14.0        37         74.0
 15   <    20       18         5         6        12.0        43         86.0
 20   <    25       23         5         1         2.0        44         88.0
 25   <    30       28         5         4         8.0        48         96.0
 30   <    35       33         5         1         2.0        49         98.0
 35   <    40       37         5         1         2.0        50        100.0
                                        50       100.0

Based on the frequency distribution table above, in most U.S. states the percentage of the
population who are people of color is between 0% and 20% (86% of the states). In
particular, the class from 0% to 5% has the highest share, at 42%.
Conclusion: Based on these results, the authorities may consider welfare policies that give
priority to people of color.

[Figure: histogram of Rate of color]

The histogram of the percentage of the population who are people of color is skewed to the
right (skewness 1.130 > 0).
2. Use graphs and correlation coefficients to examine the relationship between
the number of deaths from cardiovascular disease and each of the remaining
variables:

2.1. Relationship between the number of deaths from cardiovascular disease and the
percentage of the population aged 65 years or older
Using MegaStat / Correlation / Regression / Scatter Plot with the number of deaths and the
percentage of the population aged 65 years or older, we obtain the following chart.

[Figure: scatter plot of Number of deaths (vertical axis) against Age 65 (horizontal axis)]

Based on the chart above, the points are scattered along a straight line. Thus the number
of deaths from cardiovascular disease and the percentage of the population aged 65 years
and older have a positive linear relationship.
2.2. Relationship between the number of deaths from cardiovascular disease and per capita
income
Using MegaStat / Correlation / Regression / Scatter Plot with the number of deaths and the
income data, we obtain the following chart.

[Figure: scatter plot of Number of deaths (vertical axis) against Income (horizontal axis)]

Based on the chart above, the points do not follow a linear pattern. Thus the number of
deaths from cardiovascular disease and per capita income show no clear linear
relationship.

2.3. Relationship between the number of deaths from cardiovascular disease and the
percentage of the population who are people of color
Using MegaStat / Correlation / Regression / Scatter Plot with the number of deaths and the
percentage of the population who are people of color, we obtain the following chart.

[Figure: scatter plot of Number of deaths (vertical axis) against Rate of color (horizontal axis)]

Based on the chart above, the points are scattered in a roughly linear pattern. Thus the
number of deaths from cardiovascular disease and the rate of color are related to each
other.
The correlation coefficients between the variables

Using MegaStat / Correlation / Regression / Correlation Matrix with all the data on the
number of deaths, Age 65, income and the rate of color, we obtain the following results:
Correlation Matrix
                    Number of deaths   Age 65   Income   Rate of color
Number of deaths    1.000
Age 65              .788               1.000
Income              -.044              .040     1.000
Rate of color       .312               -.095    -.093    1.000

Based on the table above we find:
- The correlation between the number of deaths and the percentage of the population aged
65 years and older is 0.788.
- The correlation between the number of deaths and per capita income is -0.044.
- The correlation between the number of deaths and the percentage of the population who
are people of color is 0.312.
Thus the percentage of the population aged 65 years and older has the strongest
association with the number of deaths related to cardiovascular disease, followed by the
percentage of the population who are people of color. Per capita income shows almost no
linear association with the number of deaths related to cardiovascular disease.
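
The answer above ranks the correlations by size only. A common complementary check, which is not part of the original answer, is to ask whether each coefficient differs significantly from zero for a sample of n = 50. The sketch below assumes a two-tailed 5% critical value of Student's t with 48 degrees of freedom of about 2.0106 (taken from standard tables) and converts it into the smallest |r| that would be significant.

```python
import math

n = 50
t_crit = 2.0106   # assumed: two-tailed 5% critical value of Student's t, df = n - 2 = 48

# Smallest absolute correlation that is significant at the 5% level for n = 50.
r_crit = t_crit / math.sqrt(t_crit**2 + (n - 2))

# Correlations with the number of deaths, copied from the correlation matrix.
correlations = {"Age 65": 0.788, "Income": -0.044, "Rate of color": 0.312}

significant = {}
for name, r in correlations.items():
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)   # t statistic for H0: rho = 0
    significant[name] = abs(r) > r_crit
    print(f"{name:<15} r = {r:6.3f}  t = {t_stat:6.2f}  significant: {significant[name]}")

print(f"critical |r| = {r_crit:.3f}")
```

Under these assumptions the correlations with Age 65 and the rate of color exceed the critical value while the correlation with income does not, which agrees with the ranking given in the text.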
3. Confidence interval estimates for the mean number of deaths for the states in
Region 1 and Region 2
3.1. The mean number of deaths for the states in Region 1
a. Basic descriptive statistics of the number of deaths in the states in Region 1
Using MegaStat / Descriptive statistics with the number of deaths in the Region 1 states,
we obtain the following table:
Descriptive statistics
                                The number of deaths
count                           26
mean                            257.138
sample variance                 3,361.783
sample standard deviation       57.981
minimum                         90.9
maximum                         340.4
range                           249.5
1st quartile                    225.925
median                          269.850
3rd quartile                    297.600
interquartile range             71.675
mode                            #N/A

b. Estimate the mean number of deaths for the states in Region 1
With the values from the table above, we use MegaStat / Confidence interval - mean and
obtain the following table:
Confidence interval - mean
95%        confidence level
257.138    mean
57.981     std. dev.
26         n
2.060      t (df = 25)
23.419     half-width
280.557    upper confidence limit
233.719    lower confidence limit

Based on the results above, the 95% confidence interval for the mean number of deaths from
cardiovascular disease in the states of Region 1 is (233.719; 280.557). In other words, we
are 95% confident that the mean number of deaths in Region 1 lies between 233.7 and 280.6
per 100,000 people.

3.2. The mean number of deaths for the states in Region 2
We proceed in the same way as for Region 1.
a. Basic descriptive statistics of the number of deaths in the states in Region 2

Descriptive statistics
                                The number of deaths
count                           24
mean                            260.921
sample variance                 3,138.122
sample standard deviation       56.019
minimum                         130.8
maximum                         377.5
range                           246.7
1st quartile                    222.000
median                          260.100
3rd quartile                    295.400
interquartile range             73.400
mode                            #N/A

b. Estimate the mean number of deaths for the states in Region 2
Confidence interval - mean
95%        confidence level
260.921    mean
56.019     std. dev.
24         n
2.069      t (df = 23)
23.655     half-width
284.576    upper confidence limit
237.266    lower confidence limit

The 95% confidence interval for the mean number of deaths from cardiovascular disease in
the states of Region 2 is (237.266; 284.576). In other words, we are 95% confident that the
mean number of deaths in Region 2 lies between 237.3 and 284.6 per 100,000 people.
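
The half-widths reported for the two regions follow from the usual formula for a t-based interval, half-width = t x s / sqrt(n). The sketch below is a minimal illustration that takes the critical t values as printed in the MegaStat tables (2.060 and 2.069) rather than computing them; the function name is hypothetical.

```python
import math

def mean_confidence_interval(mean, std_dev, n, t_crit):
    """Return (lower, upper, half_width) of a t-based confidence interval for a mean."""
    half_width = t_crit * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width, half_width

# Inputs copied from the two "Confidence interval - mean" tables.
region1 = mean_confidence_interval(257.138, 57.981, 26, 2.060)
region2 = mean_confidence_interval(260.921, 56.019, 24, 2.069)

for label, (lower, upper, half) in (("Region 1", region1), ("Region 2", region2)):
    print(f"{label}: {lower:.1f} to {upper:.1f} (half-width {half:.2f})")
```

Because the critical values are rounded to three decimals, the results can differ from the MegaStat output in the third decimal place. The two intervals overlap almost completely, which already hints that the regional means are not clearly different.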
4.1. Comparing the mean number of deaths for the states in Region 1 and Region 2
Using MegaStat / Hypothesis tests / Compare Two Independent Groups with the number of
deaths in Region 1 and Region 2, we obtain the following table:
Hypothesis Test: Independent Groups (t-test, pooled variance)
The number of deaths
Group 1       Group 2
257.138       260.921      mean
57.981        56.019       std. dev.
26            24           n

                   48      df
              -3.7824      difference (Group 1 - Group 2)
           3,254.6121      pooled variance
              57.0492      pooled std. dev.
              16.1489      standard error of difference
                    0      hypothesized difference

                -0.23      t
                .8158      p-value (two-tailed)

Based on the results above, at the significance level α = 5% the p-value (0.8158) is
greater than α, so there is no basis to reject the hypothesis H0 of equal means: the data
show no significant difference between Region 1 and Region 2 in the mean number of deaths
from cardiovascular disease.
4.2. Comparing the mean per capita income of the states in Region 1 and Region 2
We proceed as in Section 4.1 and obtain the following results:
Hypothesis Test: Independent Groups (t-test, pooled variance)
Income
Group 1       Group 2
28.40842      29.27488     mean
4.82352       7.48108      std. dev.
26            24           n

                   48      df
            -0.866452      difference (Group 1 - Group 2)
            38.935179      pooled variance
             6.239806      pooled std. dev.
             1.766297      standard error of difference
                    0      hypothesized difference

                -0.49      t
                .6260      p-value (two-tailed)

Based on the results above, at the significance level α = 5% the p-value (0.6260) is
greater than α, so there is no basis to reject the hypothesis H0: the data show no
significant difference in mean per capita income between Region 1 and Region 2. The result
is therefore similar to the comparison of the number of deaths.
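
Both pooled-variance tests can be recomputed from the group means, standard deviations and sample sizes alone. The sketch below is a minimal illustration with a hypothetical helper function; it takes the two-tailed 5% critical value for 48 degrees of freedom as about 2.0106 (an assumed table value) instead of computing a p-value.

```python
import math

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance, computed from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (mean1 - mean2) / std_err
    return {"df": df, "pooled_var": pooled_var, "std_err": std_err, "t": t_stat}

# Group summaries copied from the two hypothesis-test tables (Group 1, then Group 2).
deaths = pooled_t_test(257.138, 57.981, 26, 260.921, 56.019, 24)
income = pooled_t_test(28.40842, 4.82352, 26, 29.27488, 7.48108, 24)

t_crit = 2.0106   # assumed: two-tailed 5% critical value of Student's t with 48 df
for label, result in (("deaths", deaths), ("income", income)):
    reject = abs(result["t"]) > t_crit
    print(f"{label}: t = {result['t']:.2f}, SE = {result['std_err']:.4f}, reject H0: {reject}")
```

In both cases |t| is far below the critical value, which matches the large p-values in the tables.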
5. Estimate a linear regression model with the number of deaths as the dependent
variable and the remaining variables as independent variables

R²            0.774        n           50
Adjusted R²   0.759        k           3
R             0.880        Dep. Var.   Number of deaths
Std. Error    27.731

ANOVA table
Source        SS              df    MS             F        p-value
Regression    121,024.7645     3    40,341.5882    52.46    6.92E-15
Residual       35,375.1597    46       769.0252
Total         156,399.9242    49

Regression output                                                    confidence interval
variables       coefficients   std. error   t (df=46)   p-value     95% lower    95% upper
Intercept       -60.1955       32.6430      -1.844      .0716       -125.9025    5.5114
Age 65          24.5202        2.0904       11.730      2.01E-15    20.3124      28.7280
Income          -0.3757        0.6430       -0.584      .5619       -1.6700      0.9186
Rate of color   2.2768         0.4171       5.459       1.86E-06    1.4373       3.1163

5.1. Explain the meaning of the regression coefficients and of R²
From the table above, the estimated model is:
(Number of deaths) = -60.2 + 24.5 x Age 65 - 0.4 x Income + 2.3 x Rate of color
Meaning of the regression coefficients:
+ 24.5: if income and the rate of color are held constant and the percentage of the
population aged 65 and older increases by 1 percentage point, the number of deaths related
to cardiovascular disease per 100,000 population increases by 24.5.
+ (-0.4): if the percentage of the population aged 65 years and older and the rate of
color are held constant and per capita income increases by $1,000, the number of deaths
related to cardiovascular disease per 100,000 population decreases by 0.4.
+ 2.3: if the percentage of the population aged 65 years and older and income are held
constant and the percentage of people of color increases by 1 percentage point, the number
of deaths related to cardiovascular disease per 100,000 population increases by 2.3.
- Meaning of R² = 0.774: with the three independent variables (percentage of the
population aged 65 years and older, per capita income, and percentage of the population
who are people of color), the model explains 77.4% of the variation in the number of
deaths related to cardiovascular disease.
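
The summary measures of the three-variable model can be recomputed from the sums of squares in its ANOVA table. The sketch below is a minimal check using only the printed values; the variable names are illustrative.

```python
n, k = 50, 3                     # observations and independent variables

# Sums of squares copied from the ANOVA table of the three-variable model.
ss_regression = 121_024.7645
ss_residual = 35_375.1597
ss_total = ss_regression + ss_residual

r_squared = ss_regression / ss_total                          # share of variation explained
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)   # penalised for the number of variables
ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_stat = ms_regression / ms_residual                          # overall F statistic
std_error = ms_residual ** 0.5                                # standard error of the estimate

print(f"R2 = {r_squared:.3f}, adjusted R2 = {adj_r_squared:.3f}")
print(f"F = {f_stat:.2f}, standard error = {std_error:.3f}")
```

The adjusted R² is slightly lower than R² because it accounts for the three estimated slopes.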
5.2. Use appropriate tests to determine which independent variables do and do not
affect the dependent variable. What can then be said about the factors that affect
the death rate from cardiovascular disease? Are there other factors that could
affect this death rate?
Based on the table above, to test which independent variables do and do not affect the
dependent variable, we set up the following three pairs of hypotheses:
+ Hypothesis pair 1:
H0: β1 = 0 (the percentage of the population aged 65 years and older does not affect the
number of deaths)
H1: β1 ≠ 0 (the percentage of the population aged 65 years and older affects the number of
deaths)
+ Hypothesis pair 2:
H0: β2 = 0 (per capita income does not affect the number of deaths)
H1: β2 ≠ 0 (per capita income affects the number of deaths)
+ Hypothesis pair 3:
H0: β3 = 0 (the percentage of the population who are people of color does not affect the
number of deaths)
H1: β3 ≠ 0 (the percentage of the population who are people of color affects the number of
deaths)
To test the three pairs of hypotheses, we look at the p-values in the regression output:
+ Hypothesis pair 1: p-value = 2.01 x 10^-15 < α = 0.05 -> reject H0 -> the percentage of
the population aged 65 years and older affects the number of deaths from cardiovascular
disease.
+ Hypothesis pair 2: p-value = 0.56 > α = 0.05 -> do not reject H0 -> there is no evidence
that per capita income affects the number of deaths from cardiovascular disease.
+ Hypothesis pair 3: p-value = 1.86 x 10^-6 < α = 0.05 -> reject H0 -> the percentage of
the population who are people of color affects the number of deaths from cardiovascular
disease.
Conclusion: Of the three independent variables in this study, only two, the percentage of
the population aged 65 years or older and the percentage of the population who are people
of color, affect the number of deaths from cardiovascular disease. There may also be other
factors that affect the number of deaths from cardiovascular disease and need further
study, such as the number of doctors per 100,000 population or the ratio of males to
females.
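
The same decisions can be reached from the t statistics, each of which is the coefficient divided by its standard error. The sketch below uses the coefficients and standard errors from the regression output and assumes a two-tailed 5% critical value of about 2.0129 for 46 degrees of freedom (a table value, consistent with the confidence limits printed in the output).

```python
t_crit = 2.0129   # assumed: two-tailed 5% critical value of Student's t with 46 df

# (coefficient, standard error) pairs copied from the regression output.
coefficients = {
    "Age 65": (24.5202, 2.0904),
    "Income": (-0.3757, 0.6430),
    "Rate of color": (2.2768, 0.4171),
}

results = {}
for name, (coef, std_err) in coefficients.items():
    t_stat = coef / std_err                  # test statistic for H0: beta = 0
    lower = coef - t_crit * std_err          # 95% confidence limits for beta
    upper = coef + t_crit * std_err
    results[name] = (t_stat, abs(t_stat) > t_crit)
    print(f"{name:<14} t = {t_stat:7.3f}  95% CI = ({lower:.4f}, {upper:.4f})  "
          f"reject H0: {results[name][1]}")
```

A coefficient is significant at the 5% level exactly when its 95% confidence interval excludes zero, which is why only the interval for Income spans zero.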

5.3. Use the F test to check whether the model is significant. Explain the meaning
of the result obtained.
Based on the test results above, we rebuild the linear regression model for the number of
deaths from cardiovascular disease (dependent variable) using only the percentage of the
population aged 65 years and older and the percentage of the population who are people of
color (independent variables).

Regression Analysis

R²            0.772        n           50
Adjusted R²   0.762        k           2
R             0.879        Dep. Var.   Number of deaths
Std. Error    27.536

ANOVA table
Source        SS              df    MS             F        p-value
Regression    120,762.1854     2    60,381.0927    79.63    8.04E-16
Residual       35,637.7388    47       758.2498
Total         156,399.9242    49

Regression output                                                    confidence interval
variables       coefficients   std. error   t (df=47)   p-value     95% lower    95% upper
Intercept       -70.7553       26.9931      -2.621      .0118       -125.0585    -16.4522
Age 65          24.4814        2.0747       11.800      1.18E-15    20.3077      28.6551
Rate of color   2.2987         0.4125       5.573       1.18E-06    1.4689       3.1285


The estimated model is:
(Number of deaths) = -70.8 + 24.5 x Age 65 + 2.3 x Rate of color
To test whether the model is significant, we use the following pair of hypotheses:
H0: β1 = β2 = 0 (neither the percentage of the population aged 65 years and older nor the
rate of color affects the number of deaths)
H1: at least one coefficient β ≠ 0 (at least one of the two variables, the percentage of
the population aged 65 years or older or the rate of color, affects the number of deaths)
Based on the ANOVA table and the F test above we have:
- Meaning of R² = 0.772: the model is useful for explaining the variation in the number of
deaths from cardiovascular disease. With the two independent variables (percentage of the
population aged 65 years and older and percentage of the population who are people of
color), the model explains 77.2% of the variation in the number of deaths related to
cardiovascular disease.
- Both slope coefficients are positive, so the dependent variable moves in the same
direction as each independent variable.
- The p-value of the F test is 8.04 x 10^-16 < α = 0.05 -> reject H0 -> at least one of
the two variables, the percentage of the population aged 65 years or older or the rate of
color, affects the number of deaths.
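
The overall F statistic of the two-variable model, and the effect of dropping Income, can both be checked from the residual and regression sums of squares reported in the two ANOVA tables. The sketch below is illustrative only; the partial F test for the dropped variable is a standard technique that the original answer does not carry out, and for a single dropped variable it should equal the square of that variable's t statistic.

```python
n = 50

# Three-variable model (Age 65, Income, Rate of color): residual sum of squares and df.
sse_full, df_full = 35_375.1597, 46

# Two-variable model (Age 65, Rate of color): values from its ANOVA table.
ssr_reduced, sse_reduced, k_reduced = 120_762.1854, 35_637.7388, 2
df_reduced = n - k_reduced - 1

# Overall F test of the two-variable model.
f_reduced = (ssr_reduced / k_reduced) / (sse_reduced / df_reduced)

# Partial F test: how much does the residual sum of squares grow when Income is dropped?
partial_f = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)

print(f"overall F (two-variable model) = {f_reduced:.2f}")
print(f"partial F for dropping Income  = {partial_f:.3f}")
```

The partial F is well below 1 and agrees with the squared t statistic of Income (about 0.584 squared), so removing Income costs almost no explanatory power.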

Conclusion: To reduce the rate of deaths from cardiovascular disease, the government needs
to invest more in the health system and pay more attention to health care for the elderly
(aged 65 and over).

5.4. Predict the death rate in a state where the independent variables are,
respectively: 15% aged 65 years or older, average income of 25,000 USD, and 4%
people of color. Explain the result obtained.

Predicted values for: Number of deaths
                                        95% Confidence Interval   95% Prediction Interval
Age 65   Rate of color   Predicted      lower       upper         lower       upper         Leverage
15       4               305.6603       292.1914    319.1291      248.6504    362.6701      0.059

The prediction uses the two-variable model from Section 5.3, so income does not enter the
calculation. If 15% of a state's population is aged 65 or older and 4% of its population
are people of color, the predicted number of deaths from cardiovascular disease is about
306 per 100,000 people. With 95% confidence, the mean number of deaths for states with
these characteristics is between 292 and 319 per 100,000 people, and the number of deaths
in one individual state is between 249 and 363 per 100,000 people.
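
The predicted value and the widths of the two intervals can be reproduced from the two-variable regression output. The sketch below assumes the standard formulas for a regression with leverage h (confidence half-width t x s x sqrt(h), prediction half-width t x s x sqrt(1 + h)) and a two-tailed 5% critical value of about 2.0117 for 47 degrees of freedom. The leverage is printed to three decimals only, so small rounding differences are expected.

```python
import math

# Coefficients of the two-variable model (Age 65 and Rate of color).
intercept, b_age65, b_color = -70.7553, 24.4814, 2.2987
age65, color = 15, 4                      # scenario from the question

predicted = intercept + b_age65 * age65 + b_color * color

std_error = 27.536    # standard error of the estimate, from the regression summary
leverage = 0.059      # leverage of the scenario, from the prediction table
t_crit = 2.0117       # assumed: two-tailed 5% critical value of Student's t with 47 df

ci_half = t_crit * std_error * math.sqrt(leverage)        # interval for the mean response
pi_half = t_crit * std_error * math.sqrt(1 + leverage)    # interval for a single state

print(f"predicted = {predicted:.2f}")
print(f"95% CI: {predicted - ci_half:.1f} to {predicted + ci_half:.1f}")
print(f"95% PI: {predicted - pi_half:.1f} to {predicted + pi_half:.1f}")
```

The prediction interval is much wider than the confidence interval because it includes the scatter of individual states around the regression line, not just the uncertainty in the fitted mean.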
References:
1. Curriculum Decision Management - Dr. Nguyen Manh The - PGSM
