Chapter 16
Regression Analysis: Model Building
Learning Objectives
1. Learn how the general linear model can be used to model problems involving curvilinear
   relationships.
2. Understand the concept of interaction and how it can be accounted for in the general linear model.
3. Understand how an F test can be used to determine when to add or delete one or more variables.
4. Develop an appreciation for the complexities involved in solving larger regression analysis
   problems.
5. Understand how variable selection procedures can be used to choose a set of independent variables
   for an estimated regression equation.
6. Learn how analysis of variance and experimental design problems can be analyzed using a regression
   model.
7. Know how the Durbin-Watson test can be used to test for autocorrelation.
© 2010 Cengage Learning. All Rights Reserved.
May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Solutions:
1. a. The Minitab output is shown below:
The regression equation is
Y = - 6.8 + 1.23 X
Predictor      Coef  SE Coef      T      p
Constant      -6.77    14.17  -0.48  0.658
X            1.2296   0.4697   2.62  0.059
S = 7.269   R-sq = 63.1%   R-sq(adj) = 53.9%
Analysis of Variance
SOURCE           DF      SS      MS     F      p
Regression        1  362.13  362.13  6.85  0.059
Residual Error    4  211.37   52.84
Total             5  573.50
b. Since the p-value corresponding to F = 6.85 is .059 > α = .05, the relationship is not significant.
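The summary statistics in the part (a) output follow directly from the ANOVA sums of squares. The sketch below recomputes them from the printed values only; the function name `anova_summary` is an illustrative choice, not part of Minitab.

```python
import math

def anova_summary(ssr, sse, n, p):
    """Recompute fit statistics from ANOVA sums of squares.

    ssr, sse: regression and error sums of squares
    n: number of observations; p: number of independent variables
    """
    sst = ssr + sse
    mse = sse / (n - p - 1)   # residual mean square
    msr = ssr / p             # regression mean square
    return {
        "r_sq": ssr / sst,
        "r_sq_adj": 1 - mse / (sst / (n - 1)),
        "s": math.sqrt(mse),
        "f": msr / mse,
    }

# Values printed in the Exercise 1(a) output: SSR = 362.13, SSE = 211.37, n = 6, p = 1
stats = anova_summary(362.13, 211.37, n=6, p=1)
print({name: round(value, 3) for name, value in stats.items()})
```

Rounded to the precision Minitab prints, the results should agree with S, R-sq, R-sq(adj), and F in the output above.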
c. The scatter diagram suggests that a curvilinear relationship may be appropriate.
d. The Minitab output is shown below:
The regression equation is
Y = - 169 + 12.2 X - 0.177 XSQ
Predictor       Coef  SE Coef      T      p
Constant     -168.88    39.79  -4.24  0.024
X             12.187    2.663   4.58  0.020
XSQ         -0.17704  0.04290  -4.13  0.026
S = 3.248   R-sq = 94.5%   R-sq(adj) = 90.8%
Analysis of Variance
SOURCE           DF      SS      MS      F      p
Regression        2  541.85  270.92  25.68  0.013
Residual Error    3   31.65   10.55
Total             5  573.50
e. Since the p-value corresponding to F = 25.68 is .013 < α = .05, the relationship is significant.
f. ŷ = -168.88 + 12.187(25) - 0.17704(25)² = 25.145
2. a. The Minitab output is shown below:
The regression equation is
Y = 9.32 + 0.424 X
Predictor     Coef  SE Coef     T      p
Constant     9.315    4.196  2.22  0.113
X           0.4242   0.1944  2.18  0.117
S = 3.531   R-sq = 61.4%   R-sq(adj) = 48.5%
Analysis of Variance
SOURCE           DF     SS     MS     F      p
Regression        1  59.39  59.39  4.76  0.117
Residual Error    3  37.41  12.47
Total             4  96.80
The high p-value (.117) indicates a weak relationship; note that 61.4% of the variability in y has
been explained by x.
b. The Minitab output is shown below:
The regression equation is
Y = - 8.10 + 2.41 X - 0.0480 XSQ
Predictor       Coef  SE Coef      T      p
Constant      -8.101    4.104  -1.97  0.187
X             2.4127   0.4409   5.47  0.032
XSQ         -0.04797  0.01050  -4.57  0.045
S = 1.279   R-sq = 96.6%   R-sq(adj) = 93.2%
Analysis of Variance
SOURCE           DF      SS      MS      F      p
Regression        2  93.529  46.765  28.60  0.034
Residual Error    2   3.271   1.635
Total             4  96.800
At the .05 level of significance, the relationship is significant; the fit is excellent.
c. ŷ = -8.101 + 2.4127(20) - 0.04797(20)² = 20.965
3. a. The scatter diagram shows some evidence of a possible linear relationship.
b. The Minitab output is shown below:
The regression equation is
Y = 2.32 + 0.637 X
Predictor     Coef  SE Coef     T      p
Constant     2.322    1.887  1.23  0.258
X           0.6366   0.3044  2.09  0.075
S = 2.054   R-sq = 38.5%   R-sq(adj) = 29.7%
Analysis of Variance
SOURCE           DF      SS      MS     F      p
Regression        1  18.461  18.461  4.37  0.075
Residual Error    7  29.539   4.220
Total             8  48.000
c. The following standardized residual plot indicates that the constant variance assumption is
not satisfied.
[Standardized residual plot: Standardized Residual versus Fitted Value]
d. The logarithmic transformation does not appear to eliminate the wedge-shaped pattern in the
above residual plot. The reciprocal transformation does, however, remove the wedge-shaped
pattern. Neither transformation provides a good fit. The Minitab output for the reciprocal
transformation and the corresponding standardized residual plot are shown below.
The regression equation is
1/Y = 0.275 - 0.0152 X
Predictor        Coef   SE Coef      T      p
Constant      0.27498   0.04601   5.98  0.000
X           -0.015182  0.007421  -2.05  0.080
S = 0.05009   R-sq = 37.4%   R-sq(adj) = 28.5%
Analysis of Variance
SOURCE           DF        SS        MS     F      p
Regression        1  0.010501  0.010501  4.19  0.080
Residual Error    7  0.017563  0.002509
Total             8  0.028064
[Standardized residual plot for the reciprocal transformation: Standardized Residual versus Fitted Value]
4. a. The Minitab output is shown below:
The regression equation is
Y = 943 + 8.71 X
Predictor     Coef  SE Coef      T      p
Constant    943.05    59.38  15.88  0.000
X            8.714    1.544   5.64  0.005
S = 32.29   R-sq = 88.8%   R-sq(adj) = 86.1%
Analysis of Variance
SOURCE           DF     SS     MS      F      p
Regression        1  33223  33223  31.86  0.005
Residual Error    4   4172   1043
Total             5  37395
b. p-value = .005 < α = .01; reject H0
5. a. The Minitab output is shown below:
The regression equation is
Y = 433 + 37.4 X - 0.383 XSQ
Predictor      Coef  SE Coef      T      p
Constant      432.6    141.2   3.06  0.055
X            37.429    7.807   4.79  0.017
XSQ         -0.3829   0.1036  -3.70  0.034
S = 15.83   R-sq = 98.0%   R-sq(adj) = 96.7%
Analysis of Variance
SOURCE           DF     SS     MS      F      p
Regression        2  36643  18322  73.15  0.003
Residual Error    3    751    250
Total             5  37395
b. Since the linear relationship was significant (Exercise 4), this relationship must be significant. Note
also that since the p-value of .003 < α = .05, we can reject H0.
c. The fitted value is 1302.01, with a standard deviation of 9.93. The 95% confidence interval is
1270.41 to 1333.61; the 95% prediction interval is 1242.55 to 1361.47.
6. a. The scatter diagram is shown below:
b. No; the relationship appears to be curvilinear.
c. Several possible models can be fitted to these data, as shown below:
ŷ = 2.90 - 0.185x + .00351x²   (adjusted R² = .91)
ŷ = -0.0468 + 14.4(1/x)   (adjusted R² = .91)
7. a. The scatter diagram is shown below:
[Scatter diagram: Weekday Ridership versus Miles of Track]
A simple linear regression model does not appear to be appropriate. There appears to be a
curvilinear relationship between the two variables.
b. The Minitab output is shown below:
The regression equation is
Riders = - 7.9 + 2.14 Miles
Predictor     Coef  SE Coef      T      P
Constant     -7.95    19.00  -0.42  0.682
Miles       2.1362   0.5183   4.12  0.001
S = 41.45   R-Sq = 53.1%   R-Sq(adj) = 50.0%
Analysis of Variance
Source           DF     SS     MS      F      P
Regression        1  29187  29187  16.99  0.001
Residual Error   15  25769   1718
Total            16  54956
Unusual Observations
Obs  Miles  Riders    Fit  SE Fit  Residual  St Resid
  8   51.0   231.0  101.0    14.4     130.0     3.34R
R denotes an observation with a large standardized residual
The corresponding standardized residual plot is shown below:
[Standardized residual plot: Standardized Residual versus Fitted Value]
There is an unusual trend in the points. There is also some indication that the variance may not be
constant.
c. The Minitab output is shown below:
The regression equation is
lnRiders = 2.59 + 0.0357 Miles
Predictor       Coef   SE Coef      T      P
Constant      2.5864    0.2415  10.71  0.000
Miles       0.035670  0.006586   5.42  0.000
S = 0.5267   R-Sq = 66.2%   R-Sq(adj) = 63.9%
Analysis of Variance
Source           DF       SS      MS      F      P
Regression        1   8.1376  8.1376  29.33  0.000
Residual Error   15   4.1612  0.2774
Total            16  12.2988
Unusual Observations
Obs  Miles  lnRiders    Fit  SE Fit  Residual  St Resid
  8   51.0     5.442  4.406   0.183     1.037     2.10R
R denotes an observation with a large standardized residual
The corresponding standardized residual plot is shown below:
[Standardized residual plot: Standardized Residual versus Fitted Value]
The standardized residual plot indicates that the transformation has eliminated the problem
identified in the residual plot constructed in part (b).
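Because the model in part (c) is fitted to ln(Riders), a fitted value has to be exponentiated to return to the original ridership units. The sketch below uses only the coefficients printed in the part (c) output; the function names are illustrative.

```python
import math

B0, B1 = 2.5864, 0.035670   # coefficients from the lnRiders output in part (c)

def predict_ln_riders(miles):
    """Fitted value on the log scale."""
    return B0 + B1 * miles

def predict_riders(miles):
    """Back-transform the log-scale fit to the original ridership units."""
    return math.exp(predict_ln_riders(miles))

# Observation 8 in the output has Miles = 51.0 and a log-scale fit of 4.406
print(round(predict_ln_riders(51.0), 3))
print(round(predict_riders(51.0), 1))
```

The first printed value should reproduce the Fit column for observation 8; the second is the same prediction expressed in ridership units.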
d. The Minitab output is shown below:
The regression equation is
1/Riders = 0.0656 - 0.000998 Miles
Predictor         Coef    SE Coef      T      P
Constant       0.06564    0.01068   6.14  0.000
Miles       -0.0009976  0.0002914  -3.42  0.004
S = 0.02330   R-Sq = 43.9%   R-Sq(adj) = 40.1%
Analysis of Variance
Source           DF         SS         MS      F      P
Regression        1  0.0063645  0.0063645  11.72  0.004
Residual Error   15  0.0081456  0.0005430
Total            16  0.0145101
Unusual Observations
Obs  Miles  1/Riders      Fit   SE Fit  Residual  St Resid
 17    9.0   0.12500  0.05666  0.00857   0.06834     3.15R
R denotes an observation with a large standardized residual
The corresponding standardized residual plot is shown below:
[Standardized residual plot: Standardized Residual versus Fitted Value]
e. The standardized residual plot corresponding to the reciprocal transformation indicates an unusual
pattern that is not evident in the standardized residual plot for the logarithmic transformation. The
estimated regression equation for the logarithmic transformation also provides a better fit. We
recommend using the logarithmic transformation.
8. a. The scatter diagram is shown below:
A simple linear regression model does not appear to be appropriate. There appears to be a
curvilinear relationship between the two variables.
b. The Minitab output is shown below:
The regression equation is
Price = 33829 - 4571 Rating + 154 RatingSq
Predictor     Coef  SE Coef      T      P
Constant     33829    13657   2.48  0.029
Rating       -4571     1688  -2.71  0.019
RatingSq    153.55    51.72   2.97  0.012
S = 668.312   R-Sq = 70.2%   R-Sq(adj) = 65.3%
Analysis of Variance
Source           DF        SS       MS      F      P
Regression        2  12643604  6321802  14.15  0.001
Residual Error   12   5359686   446641
Total            14  18003290
c. The Minitab output is shown below:
The regression equation is
logPrice = - 10.2 + 10.4 logRating
Predictor      Coef  SE Coef      T      P
Constant    -10.152    1.890  -5.37  0.000
logRating    10.422    1.544   6.75  0.000
S = 0.283438   R-Sq = 77.8%   R-Sq(adj) = 76.1%
Analysis of Variance
Source           DF      SS      MS      F      P
Regression        1  3.6595  3.6595  45.55  0.000
Residual Error   13  1.0444  0.0803
Total            14  4.7038
d. The model in part (c) is preferred because it provides a better fit.
9. a. A simple linear regression model appears to be appropriate.
b. Note the line drawn through the data. This line indicates a possible curvilinear relationship
between these two variables.
c. In the Minitab output that follows, IndexSq denotes the square of the Cost-of-Living Index.
The regression equation is
Creative Class (%) = 49.2 - 0.673 Cost-of-Living Index + 0.00282 IndexSq + 0.404 Income
Predictor                 Coef   SE Coef      T      P
Constant                 49.24     17.25   2.85  0.006
Cost-of-Living Index   -0.6725    0.2888  -2.33  0.024
IndexSq               0.002821  0.001223   2.31  0.026
Income                 0.40418   0.06772   5.97  0.000
S = 2.70934   R-Sq = 64.4%   R-Sq(adj) = 62.0%
Analysis of Variance
Source           DF      SS      MS      F      P
Regression        3  609.96  203.32  27.70  0.000
Residual Error   46  337.67    7.34
Total            49  947.62
At the .05 level of significance, the overall relationship is significant, and each of the three
independent variables (Cost-of-Living Index, IndexSq, and Income) is individually significant.
d. Cost-of-Living Index = 99, IndexSq = 9801, and Income = 42.984
Estimate = 49.24 - .6725(99) + .002821(9801) + .40418(42.984) = 27.7%
The primary concern in using this estimate is that the estimated regression equation was developed for
metropolitan areas with a population of 1,000,000 or more. However, the population of Tucson is so close
to 1,000,000 that the estimated regression equation should still provide a good estimate. Note to
Instructor: the actual value of the percentage of the workforce in creative fields for Tucson reported by
Kiplinger was 31.1%.
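The arithmetic in part (d) can be checked with a few lines of code. This is a minimal sketch that hard-codes the coefficients printed in the part (c) output; the function name `creative_class_estimate` is illustrative.

```python
def creative_class_estimate(index, income):
    """Evaluate the estimated regression equation from part (c).

    index: Cost-of-Living Index (its square plays the role of IndexSq)
    income: the Income value used in the exercise
    """
    return 49.24 - 0.6725 * index + 0.002821 * index**2 + 0.40418 * income

# Values used in part (d): Cost-of-Living Index = 99, Income = 42.984
estimate = creative_class_estimate(99, 42.984)
print(round(estimate, 1))
```

Computing IndexSq inside the function keeps the squared term consistent with the index value supplied.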
10. a. SSR = SST - SSE = 1030
MSR = 1030/1 = 1030
MSE = 520/25 = 20.8
F = 1030/20.8 = 49.52
Using Excel or Minitab, the p-value corresponding to F = 49.52 is .000.
Because p-value ≤ α, x1 is significant.
b. F = [(520 - 100)/2] / (100/23) = 48.3
Using Excel or Minitab, the p-value corresponding to F = 48.3 is .000.
Because p-value ≤ α, the addition of variables x2 and x3 is significant.
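The F statistic in part (b) compares a reduced model with a full model that contains extra terms. A small sketch of that computation follows; the function and argument names are illustrative, and the inputs are the values used in part (b).

```python
def partial_f(sse_reduced, sse_full, extra_terms, df_error_full):
    """F statistic for testing whether added terms improve a nested model.

    sse_reduced: SSE of the model without the extra terms
    sse_full: SSE of the model with the extra terms
    extra_terms: number of terms added
    df_error_full: error degrees of freedom of the full model (n - p - 1)
    """
    numerator = (sse_reduced - sse_full) / extra_terms
    denominator = sse_full / df_error_full   # MSE of the full model
    return numerator / denominator

# Part (b): SSE drops from 520 to 100 when x2 and x3 are added; 23 error df
f_stat = partial_f(520, 100, extra_terms=2, df_error_full=23)
print(round(f_stat, 1))
```

The same function applies to the later exercises that add two variables to a one-variable model, with the SSE values taken from the corresponding ANOVA tables.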
11. a. SSE = SST - SSR = 1805 - 1760 = 45
MSR = 1760/4 = 440
MSE = 45/25 = 1.8
F = 440/1.8 = 244.44
Using Excel or Minitab, the p-value corresponding to F = 244.44 is .000.
Because p-value ≤ α, the overall relationship is significant.
b. SSE(x1, x2, x3, x4) = 45
c. SSE(x2, x3) = 1805 - 1705 = 100
d. F = [(100 - 45)/2] / 1.8 = 15.28
Using Excel or Minitab, the p-value corresponding to F = 15.28 is .000.
Because p-value ≤ α, x1 and x4 contribute significantly to the model.
12. a. A portion of the Minitab output follows:
The regression equation is
Scoring Avg. = 46.3 + 14.1 Putting Avg.
Predictor       Coef  SE Coef     T      P
Constant      46.277    6.026  7.68  0.000
Putting Avg.  14.103    3.356  4.20  0.000
S = 0.510596   R-Sq = 38.7%   R-Sq(adj) = 36.5%
Analysis of Variance
Source           DF       SS      MS      F      P
Regression        1   4.6036  4.6036  17.66  0.000
Residual Error   28   7.2998  0.2607
Total            29  11.9035
b. A portion of the Minitab output follows:
The regression equation is
Scoring Avg. = 59.0 - 10.3 Greens in Reg. + 11.4 Putting Avg. - 1.81 Sand Saves
Predictor          Coef  SE Coef      T      P
Constant         59.022    5.774  10.22  0.000
Greens in Reg.  -10.281    2.877  -3.57  0.001
Putting Avg.     11.413    2.760   4.14  0.000
Sand Saves      -1.8130   0.9210  -1.97  0.060
S = 0.407808   R-Sq = 63.7%   R-Sq(adj) = 59.5%
Analysis of Variance
Source           DF       SS      MS      F      P
Regression        3   7.5795  2.5265  15.19  0.000
Residual Error   26   4.3240  0.1663
Total            29  11.9035
c. SSE(reduced) = 7.2998
SSE(full) = 4.3240
MSE(full) = .1663
F = [(SSE(reduced) - SSE(full)) / number of extra terms] / MSE(full)
  = [(7.2998 - 4.3240)/2] / .1663 = 8.95
The p-value associated with F = 8.95 (2 numerator and 26 denominator degrees of freedom) is .001.
With a p-value < α = .05, the addition of the two independent variables is statistically
significant.
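The p-value quoted in part (c) can be checked without statistical software, because an F statistic with exactly 2 numerator degrees of freedom has a closed-form upper tail: P(F > f) = (1 + 2f/d)^(-d/2), where d is the denominator degrees of freedom. The sketch below relies on that special case only; the function name is illustrative, and for other numerator degrees of freedom Excel, Minitab, or a statistics library is still needed.

```python
def f_pvalue_two_numerator_df(f_stat, df_denominator):
    """Upper-tail p-value for an F statistic with 2 numerator degrees of freedom.

    Uses the closed form P(F > f) = (1 + 2f/d)^(-d/2), which holds only
    when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_stat / df_denominator) ** (-df_denominator / 2)

# Part (c): F = 8.95 with 2 numerator and 26 denominator degrees of freedom
p_value = f_pvalue_two_numerator_df(8.95, 26)
print(round(p_value, 3))
```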
13. a. A portion of the Minitab output follows:
The regression equation is
Earnings ($1000) = 14528 - 7640 Putting Avg.
Predictor      Coef  SE Coef      T      P
Constant      14528     4410   3.29  0.003
Putting Avg.  -7640     2456  -3.11  0.004
S = 373.671   R-Sq = 25.7%   R-Sq(adj) = 23.0%
Analysis of Variance
Source           DF       SS       MS     F      P
Regression        1  1350901  1350901  9.67  0.004
Residual Error   28  3909645   139630
Total            29  5260546
b. A portion of the Minitab output follows:
The regression equation is
Earnings ($1000) = 5214 + 6873 Greens in Reg. - 5623 Putting Avg. + 2217 Sand Saves
Predictor         Coef  SE Coef      T      P
Constant          5214     3757   1.39  0.177
Greens in Reg.    6873     1871   3.67  0.001
Putting Avg.     -5623     1795  -3.13  0.004
Sand Saves      2216.6    599.2   3.70  0.001
S = 265.305   R-Sq = 65.2%   R-Sq(adj) = 61.2%
Analysis of Variance
Source           DF       SS       MS      F      P
Regression        3  3430493  1143498  16.25  0.000
Residual Error   26  1830053    70387
Total            29  5260546
c. SSE(reduced) = 3,909,645
SSE(full) = 1,830,053
MSE(full) = 70,387
F = [(SSE(reduced) - SSE(full)) / number of extra terms] / MSE(full)
  = [(3,909,645 - 1,830,053)/2] / 70,387 = 14.773
The p-value associated with F = 14.773 (2 numerator and 26 denominator degrees of freedom) is .000.
With a p-value < α = .05, the addition of the two independent variables is statistically
significant.
d. A portion of the Minitab output follows:
The regression equation is
Earnings ($1000) = 36697 - 501 Scoring Avg.
Predictor        Coef  SE Coef      T      P
Constant        36697     5909   6.21  0.000
Scoring Avg.  -501.20    82.53  -6.07  0.000
S = 284.751   R-Sq = 56.8%   R-Sq(adj) = 55.3%
Analysis of Variance
Source           DF       SS       MS      F      P
Regression        1  2990221  2990221  36.88  0.000
Residual Error   28  2270325    81083
Total            29  5260546
Because the equation developed in part (b) provides a better fit, it is preferred over the equation
developed in part (d).
14. a. The Minitab output is shown below:
Risk = - 111 + 1.32 Age + 0.296 Pressure
Predictor      Coef  SE Coef      T      P
Constant    -110.94    16.47  -6.74  0.000
Age          1.3150   0.1733   7.59  0.000
Pressure    0.29640  0.05107   5.80  0.000
S = 6.908   R-Sq = 80.6%   R-Sq(adj) = 78.4%
Analysis of Variance
Source           DF      SS      MS      F      P
Regression        2  3379.6  1689.8  35.41  0.000
Residual Error   17   811.3    47.7
Total            19  4190.9
Source     DF  Seq SS
Age         1  1772.0
Pressure    1  1607.7
Unusual Observations
Obs   Age  Risk    Fit  SE Fit  Residual  St Resid
 17  66.0  8.00  25.05    1.67    -17.05    -2.54R
R denotes an observation with a large standardized residual
b. The Minitab output is shown below:
Risk = - 123 + 1.51 Age + 0.448 Pressure + 8.87 Smoker - 0.00276 AgePress
Predictor        Coef   SE Coef      T      P
Constant      -123.16     56.94  -2.16  0.047
Age            1.5130    0.7796   1.94  0.071
Pressure       0.4483    0.3457   1.30  0.214
Smoker          8.866     3.074   2.88  0.011
AgePress    -0.002756  0.004807  -0.57  0.575
S = 5.881   R-Sq = 87.6%   R-Sq(adj) = 84.3%
Analysis of Variance
Source           DF       SS      MS      F      P
Regression        4  3672.11  918.03  26.54  0.000
Residual Error   15   518.84   34.59
Total            19  4190.95
Source     DF   Seq SS
Age         1  1771.98
Pressure    1  1607.66
Smoker      1   281.10
AgePress    1    11.37
Unusual Observations
Obs   Age  Risk    Fit  SE Fit  Residual  St Resid
 17  66.0  8.00  20.91    2.01    -12.91    -2.34R
R denotes an observation with a large standardized residual
c. F = [(SSE(reduced) - SSE(full)) / # extra terms] / MSE(full)
  = [(811.3 - 518.84)/2] / 34.59 = 4.23
The p-value associated with F = 4.23 (2 numerator and 15 denominator DF) is .035.
Because p-value ≤ α = .05, the addition of the two terms is significant.
15. a. A portion of the Minitab output follows:
The regression equation is
ERA = - 0.253 + 0.453 H/9
Predictor      Coef  SE Coef      T      P
Constant    -0.2535   0.7351  -0.34  0.732
H/9         0.45271  0.08347   5.42  0.000
S = 0.466619   R-Sq = 38.0%   R-Sq(adj) = 36.7%
Analysis of Variance
Source           DF       SS      MS      F      P
Regression        1   6.4044  6.4044  29.41  0.000
Residual Error   48  10.4512  0.2177
Total            49  16.8556
b. A portion of the Minitab output follows:
The regression equation is
ERA = - 2.56 + 0.512 H/9 + 0.980 HR/9 + 0.340 BB/9
Predictor      Coef  SE Coef      T      P
Constant    -2.5639   0.5383  -4.76  0.000
H/9         0.51213  0.05506   9.30  0.000
HR/9         0.9799   0.1657   5.91  0.000
BB/9        0.34000  0.05067   6.71  0.000
S = 0.285210   R-Sq = 77.8%   R-Sq(adj) = 76.4%
Analysis of Variance
Source           DF       SS      MS      F      P
Regression        3  13.1137  4.3712  53.74  0.000
Residual Error   46   3.7419  0.0813
Total            49  16.8556
c. SSE(reduced) = 10.4512
SSE(full) = 3.7419
MSE(full) = .0813
F = [(SSE(reduced) - SSE(full)) / number of extra terms] / MSE(full)
  = [(10.4512 - 3.7419)/2] / .0813 = 41.26
The p-value associated with F = 41.26 (2 numerator and 46 denominator degrees of freedom) is .000.
With a p-value < α = .05, the addition of the two independent variables is statistically
significant.
16. a. The sample correlation coefficients are as follows:
          Weeks     Age    Educ  Married    Head  Tenure  Manager
Age       0.577
          0.000
Educ      0.007   0.100
          0.962   0.490
Married  -0.130  -0.209  -0.151
          0.370   0.145   0.296
Head     -0.205   0.027  -0.156   -0.449
          0.153   0.854   0.280    0.001
Tenure    0.398   0.459   0.174   -0.057  -0.046
          0.004   0.001   0.228    0.692   0.750
Manager  -0.198   0.097   0.160    0.073  -0.200  -0.113
          0.167   0.504   0.266    0.616   0.164   0.435
Sales    -0.134   0.137   0.124   -0.148  -0.013   0.097   -0.156
          0.354   0.343   0.393    0.306   0.926   0.504    0.279
Cell Contents: Pearson correlation
               P-Value
The independent variable most correlated with Weeks is Age. The Minitab output corresponding to
using Age as the independent variable is shown below:
The regression equation is
Weeks = - 8.9 + 1.51 Age
Predictor     Coef  SE Coef      T      P
Constant     -8.86    11.01  -0.80  0.425
Age         1.5092   0.3080   4.90  0.000
S = 19.5342   R-Sq = 33.3%   R-Sq(adj) = 32.0%
Analysis of Variance
Source           DF       SS      MS      F      P
Regression        1   9161.4  9161.4  24.01  0.000
Residual Error   48  18316.1   381.6
Total            49  27477.5
b. The Minitab Stepwise Regression output is shown below.
Alpha-to-Enter: 0.05   Alpha-to-Remove: 0.05
Response is Weeks on 7 predictors, with N = 50
Step                1         2         3         4
Constant     -8.86002  -9.09741  -0.10922  -0.06890
Age              1.51      1.57      1.61      1.73
T-Value          4.90      5.30      5.74      6.51
P-Value         0.000     0.000     0.000     0.000
Manager                   -20.1     -24.6     -28.7
T-Value                   -2.26     -2.88     -3.53
P-Value                   0.029     0.006     0.001
Head                                -14.3     -15.1
T-Value                             -2.61     -2.95
P-Value                             0.012     0.005
Sales                                         -17.4
T-Value                                       -2.79
P-Value                                       0.008
S                19.5      18.7      17.7      16.5
R-Sq            33.34     39.87     47.64     55.38
R-Sq(adj)       31.95     37.31     44.22     51.41
Mallows C-p      22.5      17.8      11.8       5.9
The results suggest a model using four independent variables: Age, Manager, Head, and Sales. The
corresponding Minitab output is shown below:
The regression equation is
Weeks = - 0.07 + 1.73 Age - 28.7 Manager - 15.1 Head - 17.4 Sales
Predictor      Coef  SE Coef      T      P
Constant     -0.069    9.843  -0.01  0.994
Age          1.7252   0.2651   6.51  0.000
Manager     -28.672    8.117  -3.53  0.001
Head        -15.086    5.121  -2.95  0.005
Sales       -17.421    6.236  -2.79  0.008
S = 16.5069   R-Sq = 55.4%   R-Sq(adj) = 51.4%
Analysis of Variance
Source           DF       SS      MS      F      P
Regression        4  15216.0  3804.0  13.96  0.000
Residual Error   45  12261.5   272.5
Total            49  27477.5
c. The results using Minitab’s Forward Selection procedure are the same as the results using Minitab’s
Stepwise procedure in part (b).
d. The results using Minitab’s Backward Elimination procedure are shown below:
Backward elimination.   Alpha-to-Remove: 0.05
Response is Weeks on 7 predictors, with N = 50
Step                1         2         3         4
Constant     22.85070  13.62308  13.06817  -0.06890
Age              1.51      1.52      1.64      1.73
T-Value          4.96      5.04      6.18      6.51
P-Value         0.000     0.000     0.000     0.000
Educ            -0.61
T-Value         -0.66
P-Value         0.516
Married         -10.7      -9.9      -9.8
T-Value         -1.79     -1.69     -1.69
P-Value         0.081     0.098     0.099
Head            -19.8     -19.0     -19.4     -15.1
T-Value         -3.39     -3.35     -3.44     -2.95
P-Value         0.002     0.002     0.001     0.005
Tenure           0.43      0.37
T-Value          0.91      0.82
P-Value         0.366     0.418
Manager         -26.7     -27.7     -29.0     -28.7
T-Value         -3.21     -3.40     -3.64     -3.53
P-Value         0.003     0.001     0.001     0.001
Sales           -18.6     -19.0     -19.0     -17.4
T-Value         -2.96     -3.06     -3.07     -2.79
P-Value         0.005     0.004     0.004     0.008
S                16.3      16.2      16.2      16.5
R-Sq            59.14     58.72     58.08     55.38
R-Sq(adj)       52.33     52.96     53.32     51.41
Mallows C-p       8.0       6.4       5.1       5.9
These results also suggest using the model with four independent variables: Age, Head, Manager, and
Sales.
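The Mallows C-p values in the stepwise and backward elimination output can be checked by hand. The sketch below assumes the usual definition C-p = SSE_p/MSE_full - n + 2(p + 1), where p is the number of independent variables in the subset model and MSE_full comes from the model containing all seven predictors. The SSE values are taken from the ANOVA tables for the four- and five-variable models in this exercise, and S = 16.350 for the seven-variable model is taken from the Best Subsets output in part (e); the function name is illustrative.

```python
def mallows_cp(sse_subset, mse_full, n, num_predictors):
    """Mallows C-p for a subset model with `num_predictors` independent variables."""
    return sse_subset / mse_full - n + 2 * (num_predictors + 1)

N = 50                   # observations in the Weeks data set
MSE_FULL = 16.350 ** 2   # square of S for the model with all seven predictors

cp_four = mallows_cp(12261.5, MSE_FULL, N, 4)   # Age, Manager, Head, Sales
cp_five = mallows_cp(11518.0, MSE_FULL, N, 5)   # adds Married
print(round(cp_four, 1), round(cp_five, 1))
```

Values of C-p close to p + 1 suggest that a subset model has little bias, which is one reason both the four- and five-variable models are reasonable candidates.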
e. The results using Minitab’s Best Subsets procedure are shown below:
Vars  R-Sq  R-Sq(adj)  Mallows C-p       S
   1  33.3       32.0         22.5  19.534
   1  15.8       14.0         40.6  21.954
   2  39.9       37.3         17.8  18.749
   2  38.2       35.6         19.5  19.005
   3  47.6       44.2         11.8  17.686
   3  46.8       43.3         12.7  17.831
   4  55.4       51.4          5.9  16.507
   4  49.1       44.6         12.3  17.628
   5  58.1       53.3          5.1  16.179
   5  56.0       51.0          7.3  16.582
   6  58.7       53.0          6.4  16.241
   6  58.3       52.5          6.8  16.318
   7  59.1       52.3          8.0  16.350
[Variable-inclusion columns (Age, Educ, Married, Head, Tenure, Manager, Sales): X marks not recoverable]
The results suggest a model using five independent variables: Age, Married, Head, Manager, and Sales.
The corresponding Minitab output is shown below:
The regression equation is
Weeks = 13.1 + 1.64 Age - 9.76 Married - 19.4 Head - 29.0 Manager - 19.0 Sales
Predictor      Coef  SE Coef      T      P
Constant      13.07    12.40   1.05  0.298
Age          1.6369   0.2651   6.18  0.000
Married      -9.764    5.794  -1.69  0.099
Head        -19.405    5.636  -3.44  0.001
Manager     -28.986    7.958  -3.64  0.001
Sales       -18.967    6.181  -3.07  0.004
S = 16.1794   R-Sq = 58.1%   R-Sq(adj) = 53.3%
Analysis of Variance
Source           DF       SS      MS      F      P
Regression        5  15959.5  3191.9  12.19  0.000
Residual Error   44  11518.0   261.8
Total            49  27477.5
17. The output obtained using Minitab’s Best Subset Regression is shown below:
Response is Scoring Avg.
Vars  R-Sq  R-Sq(adj)  Mallows C-p        S
   1  38.7       36.5         28.3  0.51060
   1  33.0       30.7         33.3  0.53350
   2  58.3       55.2         12.9  0.42897
   2  53.9       50.5         16.8  0.45059
   3  63.7       59.5         10.2  0.40781
   3  60.3       55.7         13.2  0.42659
   4  72.0       67.5          4.8  0.36514
   4  64.7       59.0         11.3  0.41015
   5  72.9       67.2          6.0  0.36672
[Variable-inclusion columns (Drive Average, Greens in Reg., Putting Avg., Sand Saves, DriveGreens): X marks not recoverable]
The Best Subset Regression output indicates that a model using four independent variables, Drive
Average, Greens in Reg., Putting Average, and DriveGreens, may be a good choice. The Minitab
output for this model is shown below:
The regression equation is
Scoring Avg. = - 88.1 + 0.591 Drive Average + 209 Greens in Reg.
+ 9.74 Putting Avg. - 0.868 DriveGreens
Predictor          Coef  SE Coef      T      P
Constant         -88.10    42.20  -2.09  0.047
Drive Average    0.5907   0.1692   3.49  0.002
Greens in Reg.   209.19    62.85   3.33  0.003
Putting Avg.      9.736    2.575   3.78  0.001
DriveGreens     -0.8677   0.2478  -3.50  0.002
S = 0.365139   R-Sq = 72.0%   R-Sq(adj) = 67.5%
Analysis of Variance
Source           DF       SS      MS      F      P
Regression        4   8.5703  2.1426  16.07  0.000
Residual Error   25   3.3332  0.1333
Total            29  11.9035
18. a. Because the independent variable most highly correlated with RPG is OBP, it
will provide the best one-variable estimated regression equation. The Minitab
output using OBP to predict RPG is shown below:
The regression equation is
RPG = - 4.05 + 27.6 OBP
Predictor     Coef  SE Coef      T      P
Constant    -4.049    1.006  -4.02  0.001
OBP         27.555    3.103   8.88  0.000
S = 0.956308   R-Sq = 81.4%   R-Sq(adj) = 80.4%
Analysis of Variance
Source           DF      SS      MS      F      P
Regression        1  72.108  72.108  78.85  0.000
Residual Error   18  16.461   0.915
Total            19  88.569
b. The output using Minitab’s Stepwise Regression procedure using Alpha-to-Enter = 0.05 and
Alpha-to-Remove = 0.05 is shown below:
Alpha-to-Enter: 0.05   Alpha-to-Remove: 0.05
Response is RPG on 12 predictors, with N = 20
Step               1        2        3
Constant     -4.0491  -1.5951  -0.9808
OBP             27.6     17.2     25.1
T-Value         8.88     5.10     6.88
P-Value        0.000    0.000    0.000
HR                      0.071    0.069
T-Value                  4.16     5.06
P-Value                 0.001    0.000
AVG                              -12.6
T-Value                          -3.23
P-Value                          0.005
S              0.956    0.693    0.556
R-Sq           81.41    90.78    94.43
R-Sq(adj)      80.38    89.70    93.38
Mallows C-p     66.6     27.0     12.8
Using larger (less stringent) values for Alpha-to-Enter and Alpha-to-Remove will provide a model with
additional independent variables. For example, the output using Minitab’s Stepwise Regression
procedure using Alpha-to-Enter = 0.10 and Alpha-to-Remove = 0.10 is shown below:
Alpha-to-Enter: 0.1   Alpha-to-Remove: 0.1
Response is RPG on 12 predictors, with N = 20
Step               1        2        3        4        5
Constant     -4.0491  -1.5951  -0.9808  -0.6161  -0.9088
OBP             27.6     17.2     25.1     26.6     32.2
T-Value         8.88     5.10     6.88     7.64     9.40
P-Value        0.000    0.000    0.000    0.000    0.000
HR                      0.071    0.069    0.068    0.109
T-Value                  4.16     5.06     5.34     6.26
P-Value                 0.001    0.000    0.000    0.000
AVG                              -12.6    -16.5    -21.5
T-Value                          -3.23    -3.96    -5.65
P-Value                          0.005    0.001    0.000
3B                                        0.182    0.244
T-Value                                    1.88     2.99
P-Value                                   0.079    0.010
BB                                               -0.0223
T-Value                                            -2.92
P-Value                                            0.011
S              0.956    0.693    0.556    0.516    0.421
R-Sq           81.41    90.78    94.43    95.49    97.20
R-Sq(adj)      80.38    89.70    93.38    94.29    96.20
Mallows C-p     66.6     27.0     12.8     10.0      4.5
The following output using Minitab’s Best Subset procedure also confirms that a variety of
models will provide a good fit.
Vars  R-Sq  R-Sq(adj)  Mallows C-p        S
   1  81.4       80.4         66.6  0.95631
   1  78.9       77.7         77.9   1.0192
   2  90.8       89.7         27.0  0.69299
   2  88.4       87.0         37.8  0.77872
   3  94.4       93.4         12.8  0.55552
   3  94.4       93.3         13.0  0.55820
   4  95.8       94.6          8.8  0.50014
   4  95.5       94.3         10.0  0.51589
   5  97.2       96.2          4.5  0.42096
   5  97.2       96.2          4.6  0.42336
   6  97.6       96.6          4.5  0.40042
   6  97.5       96.4          5.1  0.41198
   7  98.2       97.2          3.9  0.36245
   7  98.2       97.1          4.1  0.36664
   8  98.3       97.1          5.3  0.36471
   8  98.3       97.0          5.7  0.37269
   9  98.4       97.0          7.1  0.37506
   9  98.4       96.9          7.3  0.38077
  10  98.4       96.7          9.0  0.39477
  10  98.4       96.7          9.0  0.39496
  11  98.4       96.3         11.0  0.41758
  11  98.4       96.2         11.0  0.41848
  12  98.4       95.7         13.0  0.44629
[Variable-inclusion columns (H, 2B, 3B, HR, RBI, BB, SO, SB, CS, OBP, SLG, AVG): X marks not recoverable]
Given these results, it would be hard to argue that there is a single best model. The five-variable
model identified using Minitab’s Stepwise Regression procedure with Alpha-to-Enter = 0.10 and
Alpha-to-Remove = 0.10 seems like a reasonable choice. The Minitab regression output corresponding
to this model is shown below:
The regression equation is
RPG = - 0.909 + 32.2 OBP + 0.109 HR - 21.5 AVG + 0.244 3B - 0.0223 BB
Predictor        Coef   SE Coef      T      P
Constant      -0.9088    0.6169  -1.47  0.163
OBP            32.184     3.423   9.40  0.000
HR            0.10877   0.01739   6.26  0.000
AVG           -21.511     3.810  -5.65  0.000
3B            0.24388   0.08168   2.99  0.010
BB          -0.022306  0.007638  -2.92  0.011
S = 0.420960   R-Sq = 97.2%   R-Sq(adj) = 96.2%
Analysis of Variance
Source           DF      SS      MS      F      P
Regression        5  86.088  17.218  97.16  0.000
Residual Error   14   2.481   0.177