Running head: DATA ANALYSIS SUN COAST PROJECT 1
Data Analysis Sun Coast Project
Nguyen Tien Thanh
ID: 280113
Columbia Southern University
Data Analysis: Correlation, Regression, t Test, and ANOVA
The Sun Coast Remediation data meet the assumptions of parametric statistical procedures and
are therefore appropriate for them. This assignment applies six analyses: correlation analysis,
simple regression analysis, multiple regression analysis, an independent samples t test, a paired
samples t test, and ANOVA. The results and conclusions of these analyses will support sound
decisions.
Correlation Analysis
The hypotheses:
H01: There is no relationship between particulate matter (PM) size and the number of employee sick days.
HA1: There is a relationship between particulate matter (PM) size and the number of employee sick days.
Data output results from Excel Toolpak:
Correlation
                                      Microns         Mean annual sick days per employee
Microns                               1
Mean annual sick days per employee    -0.715984185    1

Regression Statistics
Multiple R           0.715984185
R Square             0.512633354
Adjusted R Square    0.507807941
Standard Error       1.327783455
Observations         103

ANOVA
              df     SS             MS             F              Significance F
Regression    1      187.2953239    187.2953239    106.2361758    1.89059E-17
Residual      101    178.0638994    1.763008905
Total         102    365.3592233

             Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept    10.08144483     0.315156969       31.9886464      1.16929E-54    9.456258184     10.70663148     9.456258184     10.70663148
Microns      -0.522376554    0.050681267       -10.30709347    1.89059E-17    -0.622914554    -0.421838554    -0.622914554    -0.421838554
The Pearson correlation coefficient is r = -0.716, which means that particulate matter size,
as measured in microns, is strongly and negatively correlated with mean annual sick days per
employee. The coefficient of determination is r2 = 0.51, which means that 51% of the variability
in employee sick days is explained by particulate matter size.
The p value for microns is 1.89E-17, which is smaller than the alpha of 0.05. Because the p
value is smaller than alpha, the null hypothesis is rejected and the alternative hypothesis is
accepted: there is a statistically significant relationship between particulate matter size and
employee sick days.
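The figures in the Excel output are linked, and the link can be checked by hand. The following is a minimal sketch, not part of the original analysis, that assumes only the reported r and the number of observations. It recomputes r2, the t statistic for the slope, and the F statistic; the variable names are illustrative.

```python
import math

# Values copied from the Excel output in this section.
r = -0.715984185   # Pearson correlation between microns and sick days
n = 103            # number of observations

r_squared = r ** 2
df = n - 2         # degrees of freedom for a simple correlation test

# t statistic for testing the null hypothesis of zero correlation.
t_stat = r * math.sqrt(df) / math.sqrt(1 - r_squared)

# With a single predictor, the regression F statistic equals t squared.
f_stat = t_stat ** 2

print(f"r squared = {r_squared:.4f}")
print(f"t = {t_stat:.3f} with df = {df}")
print(f"F = {f_stat:.2f}")
```

The printed values should agree, up to rounding, with the R Square, t Stat, and F entries that Excel reports for this analysis.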
Simple Regression Analysis
Restate the hypotheses:
H02: There is no relationship between safety training expenditure and lost time hours.
HA2: There is a relationship between safety training expenditure and lost time hours.
Data output results from Excel Toolpak:
Regression Statistics
Multiple R           0.939559324
R Square             0.882771723
Adjusted R Square    0.882241279
Standard Error       24.61328875
Observations         223

ANOVA
              df     SS             MS             F              Significance F
Regression    1      1008202.105    1008202.105    1664.210687    7.6586E-105
Residual      221    133884.8903    605.8139831
Total         222    1142086.996

                               Coefficients    Standard Error    t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept                      273.449419      2.665261963       102.5975768     2.1412E-188     268.1968373     278.7020007     268.1968373     278.7020007
Safety training expenditure    -0.143367741    0.003514368       -40.79473848    7.6586E-105     -0.150293705    -0.136441778    -0.150293705    -0.136441778
The value of Multiple R is 0.939, close to 1, which means that there is a strong correlation
between safety training expenditure and lost time hours. The value of R square (R2) is 0.88,
which indicates that 88% of the variation in lost time hours is explained by the regression
model. This is a high R2.
The p value is 7.66E-105, which is smaller than the alpha of 0.05, so the null hypothesis is
rejected and the alternative hypothesis is accepted. There is a statistically significant
relationship between safety training expenditure and lost time hours.
The coefficient for safety training expenditure is -0.143, which indicates a negative
relationship: each additional unit of safety training expenditure is associated with a decrease
of 0.143 lost time hours. The model can be expressed as a predictive equation:
Y = a + bX
Lost time hours = 273.45 + (-0.143)(safety training expenditure).
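The predictive equation can be applied directly. The sketch below assumes the intercept and slope from the Excel output. The function name and the sample expenditure values are hypothetical, chosen only for illustration, and predictions outside the range of the observed data should be treated with caution. It also recomputes R2 from the ANOVA sums of squares as a consistency check.

```python
# Coefficients copied from the Excel regression output.
INTERCEPT = 273.449419
SLOPE = -0.143367741   # change in lost time hours per unit of expenditure


def predict_lost_time_hours(expenditure: float) -> float:
    """Predicted lost time hours for a given safety training expenditure."""
    return INTERCEPT + SLOPE * expenditure


# R squared recomputed as regression SS divided by total SS.
SS_REGRESSION = 1008202.105
SS_TOTAL = 1142086.996
r_squared = SS_REGRESSION / SS_TOTAL

# Hypothetical expenditure values, used only to illustrate the equation.
for x in (0, 500, 1000):
    print(x, round(predict_lost_time_hours(x), 2))
print("R squared:", round(r_squared, 4))
```

The recomputed ratio should match the R Square value in the Regression Statistics block, up to rounding.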
Multiple Regression Analysis
Restate the hypotheses:
H03: There is no relationship between frequency, angle in degrees, chord length, velocity,
displacement and decibel level.
HA3: There is a relationship between frequency, angle in degrees, chord length, velocity,
displacement and decibel level.
Data output results from Excel Toolpak:
Regression Statistics
Multiple R           0.601841822
R Square             0.362213579
Adjusted R Square    0.360083364
Standard Error       5.51856585
Observations         1503

ANOVA
              df      SS             MS             F              Significance F
Regression    5       25891.88784    5178.377569    170.0361467    2.1289E-143
Residual      1497    45590.48986    30.45456904
Total         1502    71482.3777

                                Coefficients    Standard Error    t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept                       126.8224555     0.623820253       203.2996763     0               125.5988009     128.0461101     125.5988009     128.0461101
Frequency (Hz)                  -0.0011169      4.7551E-05        -23.48846042    4.0652E-104     -0.001210174    -0.001023627    -0.001210174    -0.001023627
Angle in Degrees                0.047342353     0.037308069       1.268957462     0.204653501     -0.025839288    0.120523993     -0.025839288    0.120523993
Chord Length                    -5.495318335    2.927962181       -1.876840613    0.060734309     -11.23866234    0.248025671     -11.23866234    0.248025671
Velocity (Meters per Second)    0.083239634     0.009300188       8.950317436     1.02398E-18     0.064996851     0.101482417     0.064996851     0.101482417
Displacement                    -240.5059086    16.51902666       -14.55932686    5.20583E-45     -272.9088041    -208.103013     -272.9088041    -208.103013
The value of Multiple R is 0.60, which means that frequency, angle in degrees, chord length,
velocity, and displacement taken together are moderately correlated with decibel level. R square
(R2) is 0.36, which means that 36% of the variability in decibel level is explained by frequency,
angle in degrees, chord length, velocity, and displacement. This is a relatively low R2.
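With five predictors, Adjusted R Square is the more conservative figure because it penalizes the number of predictors. The sketch below uses the standard adjustment formula with the reported R2, the number of observations (n), and the number of predictors (k). The function name is illustrative.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R squared for n observations and k predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Values copied from the multiple regression output: R Square, Observations,
# and the five predictors in the model.
adj = adjusted_r_squared(0.362213579, n=1503, k=5)
print(f"Adjusted R squared = {adj:.6f}")
```

Because the sample is large relative to the number of predictors, the adjustment is small and the adjusted value stays close to 0.36.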
The p value of each predictor is compared with an alpha of 0.05:
For Frequency (Hz), p = 4.07E-104 < 0.05; therefore, frequency is a statistically
significant predictor of decibel level.
For Angle in Degrees, p = 0.20 > 0.05; therefore, angle in degrees is not a statistically
significant predictor of decibel level.
For Chord Length, p = 0.06 > 0.05; therefore, chord length is not a statistically significant
predictor of decibel level.
For Velocity (meters per second), p = 1.02E-18 < 0.05; therefore, velocity is a statistically
significant predictor of decibel level.
For Displacement, p = 5.21E-45 < 0.05; therefore, displacement is a statistically significant
predictor of decibel level.
In summary, frequency, velocity, and displacement each have a statistically significant
relationship with decibel level. The coefficients for frequency (-0.001) and displacement
(-240.5) are negative, which indicates that decibel level decreases as either one increases,
holding the other predictors constant. The coefficient for velocity (0.083) is positive, which
indicates that decibel level increases as velocity increases.
The predictive equation, restricted to the significant predictors, is expressed as follows:
Y = a + b1X1 + b2X2 +…+ bnXn
Decibel level = 126.8 + (-0.001)(Frequency (Hz)) + (-240.5)(Displacement) +
(0.083)(Velocity).
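The screening and prediction steps can be sketched as follows. The coefficients and p values are taken from the Excel output; the dictionary keys, the function name, and the sample input values are hypothetical. The sketch follows the paper in reusing the full-model coefficients after dropping the nonsignificant predictors. Refitting the model with only the retained predictors would give somewhat different coefficients.

```python
ALPHA = 0.05
INTERCEPT = 126.8224555

# (coefficient, p value) per predictor, copied from the Excel output.
predictors = {
    "frequency_hz": (-0.0011169, 4.0652e-104),
    "angle_degrees": (0.047342353, 0.204653501),
    "chord_length": (-5.495318335, 0.060734309),
    "velocity_mps": (0.083239634, 1.02398e-18),
    "displacement": (-240.5059086, 5.20583e-45),
}

# Keep only the predictors whose p value is below alpha.
significant = {
    name: coef for name, (coef, p_value) in predictors.items() if p_value < ALPHA
}


def predict_decibels(inputs: dict) -> float:
    """Predicted decibel level using only the significant predictors."""
    return INTERCEPT + sum(coef * inputs[name] for name, coef in significant.items())


# Hypothetical input values, used only to illustrate the equation.
sample = {"frequency_hz": 1000, "velocity_mps": 50, "displacement": 0.01}
print(sorted(significant))
print(round(predict_decibels(sample), 2))
```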
Independent Sample t Test
Restate the hypotheses:
H04: The revised new employee training is not more effective than the prior training.
HA4: The revised new employee training is more effective than the prior training.
Data output results from Excel Toolpak:
t-Test: Two-Sample Assuming Unequal Variances
                                Prior Training    Revised Training
Mean                            69.79032258       84.77419355
Variance                        122.004495        26.96456901
Observations                    62                62
Hypothesized Mean Difference    0
df                              87
t Stat                          -9.666557191
P(T<=t) one-tail                9.69914E-16
t Critical one-tail             1.662557349
P(T<=t) two-tail                1.93983E-15
t Critical two-tail             1.987608282
The mean of the Prior Training group (69.8) is lower than the mean of the Revised Training
group (84.8). In addition, the p value of 1.94E-15 (two-tail; 9.70E-16 one-tail) is smaller than
the alpha of 0.05. Thus, the null hypothesis is rejected and the alternative hypothesis is
accepted. The revised new employee training is more effective than the prior training.
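The t statistic and degrees of freedom in the unequal-variances output can be reproduced from the group means, variances, and counts alone. The following is a minimal sketch of the Welch calculation, assuming those summary values. The function name is illustrative.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch t statistic and approximate degrees of freedom from summary statistics."""
    se1 = var1 / n1   # squared standard error of the first mean
    se2 = var2 / n2   # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Prior Training versus Revised Training, values copied from the Excel output.
t_stat, df = welch_t(69.79032258, 122.004495, 62, 84.77419355, 26.96456901, 62)
print(f"t = {t_stat:.3f}, df = {df:.1f} (Excel reports the rounded value)")
```

The negative sign of t reflects the order of subtraction (prior minus revised), so a large negative value corresponds to higher scores after the revised training.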
Dependent Sample t Test
Restate the hypotheses:
H05: There is no increase in blood lead level from pre-exposure baseline measurements.
HA5: There is an increase in blood lead level from pre-exposure baseline measurements.
Data output results from Excel Toolpak:
t-Test: Paired Two Sample for Means
                                Pre-Exposure μg/dL    Post-Exposure μg/dL
Mean                            32.85714286           33.28571429
Variance                        150.4583333           155.5
Observations                    49                    49
Pearson Correlation             0.992236043
Hypothesized Mean Difference    0
df                              48
t Stat                          -1.929802563
P(T<=t) one-tail                0.029776357
t Critical one-tail             1.677224196
P(T<=t) two-tail                0.059552714
t Critical two-tail             2.010634758
The mean blood lead level rises very slightly, from 32.9 μg/dL pre-exposure to 33.3 μg/dL
post-exposure. The two-tail p value of 0.0596 is greater than the alpha of 0.05, so on a
two-tailed test the null hypothesis is not rejected and the alternative hypothesis is not
supported: there is no statistically significant difference in blood lead levels between the
pre-exposure and post-exposure measurements. The one-tail p value in the same output (0.0298)
is below 0.05, however, so a one-tailed test matching the directional hypothesis HA5 would
indicate a statistically significant increase.
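The paired t statistic can be reproduced from the summary values in the Excel output, because the variance of the paired differences follows from the two variances and the Pearson correlation between the measurements. The sketch below assumes only those reported values. The function name is illustrative.

```python
import math


def paired_t_from_summary(mean_pre, var_pre, mean_post, var_post, r, n):
    """Paired t statistic from means, variances, their correlation, and n pairs."""
    # Var(pre - post) = Var(pre) + Var(post) - 2 * r * SD(pre) * SD(post)
    var_diff = var_pre + var_post - 2 * r * math.sqrt(var_pre * var_post)
    standard_error = math.sqrt(var_diff / n)
    return (mean_pre - mean_post) / standard_error


# Values copied from the Excel output.
t_stat = paired_t_from_summary(
    32.85714286, 150.4583333, 33.28571429, 155.5, r=0.992236043, n=49
)
print(f"t = {t_stat:.3f} with df = {49 - 1}")
```

The very high correlation between the paired measurements is what makes such a small mean difference produce a t statistic close to the critical value.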
ANOVA
Restate the hypotheses:
H06: There are no differences in return-on-investment between air monitoring, soil remediation,
water reclamation, and health and safety training.
HA6: There are differences in return-on-investment between air monitoring, soil remediation,
water reclamation, and health and safety training.
Data output results from Excel Toolpak:
Anova: Single Factor

SUMMARY
Groups          Count    Sum    Average    Variance
A = Air         20       178    8.9        9.357894737
B = Soil        20       182    9.1        3.042105263
C = Water       20       140    7          6.631578947
D = Training    20       108    5.4        1.410526316

ANOVA
Source of Variation    SS       df    MS             F              P-value        F crit
Between Groups         182.8    3     60.93333333    11.92310333    1.75888E-06    2.72494392
Within Groups          388.4    76    5.110526316
Total                  571.2    79
The group averages differ: air monitoring (8.9), soil remediation (9.1), water reclamation
(7.0), and health and safety training (5.4). In addition, the ANOVA p value of 1.76E-06 is
smaller than the alpha of 0.05; therefore, the null hypothesis is rejected and the alternative
hypothesis is accepted: there are statistically significant differences in return-on-investment
between air monitoring, soil remediation, water reclamation, and health and safety training.
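The F statistic can be reproduced from the SUMMARY block alone. The following is a minimal sketch of the single-factor ANOVA arithmetic, assuming the group counts, averages, and variances from the Excel output. The function name is illustrative.

```python
def one_way_anova_f(groups):
    """groups: list of (count, mean, variance) tuples, one per group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n

    # Between-groups SS: spread of the group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups SS: pooled spread inside each group.
    ss_within = sum((n - 1) * var for n, _, var in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


# Air, Soil, Water, Training: values copied from the SUMMARY block.
summary = [
    (20, 8.9, 9.357894737),
    (20, 9.1, 3.042105263),
    (20, 7.0, 6.631578947),
    (20, 5.4, 1.410526316),
]
ss_between, ss_within, f_stat = one_way_anova_f(summary)
print(round(ss_between, 1), round(ss_within, 1), round(f_stat, 3))
```

A significant F only shows that at least one group mean differs from the others. Identifying which pairs differ would require a post hoc comparison, which the analysis above does not include.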