Statistics for Business and Economics chapter 16 Regression Analysis: Model Building


Chapter 16
Regression Analysis: Model Building

Learning Objectives
1.  Learn how the general linear model can be used to model problems involving curvilinear relationships.

2.  Understand the concept of interaction and how it can be accounted for in the general linear model.

3.  Understand how an F test can be used to determine when to add or delete one or more variables.

4.  Develop an appreciation for the complexities involved in solving larger regression analysis problems.

5.  Understand how variable selection procedures can be used to choose a set of independent variables for an estimated regression equation.

6.  Learn how analysis of variance and experimental design problems can be analyzed using a regression model.

7.  Know how the Durbin-Watson test can be used to test for autocorrelation.


© 2010 Cengage Learning. All Rights Reserved.
May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

Solutions:
1.

a.

b.

The Minitab output is shown below:
The regression equation is
Y = -6.8 + 1.23 X

Predictor       Coef     SE Coef          T        p
Constant       -6.77       14.17      -0.48    0.658
X             1.2296      0.4697       2.62    0.059

S = 7.269       R-Sq = 63.1%     R-Sq(adj) = 53.9%

Analysis of Variance

SOURCE        DF          SS          MS         F        p
Regression     1      362.13      362.13      6.85    0.059
Residual Error 4      211.37       52.84
Total          5      573.50

Since the p-value corresponding to F = 6.85 is 0.059 > α = .05, the relationship is not significant.
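
The F statistic and R-Sq reported above follow directly from the sums of squares in the Analysis of Variance table. The following is a minimal Python sketch of that arithmetic; the variable names are illustrative and are not part of the Minitab output.

```python
# Values copied from the Analysis of Variance table above.
ssr, sse = 362.13, 211.37      # regression and residual sums of squares
df_reg, df_err = 1, 4          # degrees of freedom

msr = ssr / df_reg             # mean square regression
mse = sse / df_err             # mean square error
f_stat = msr / mse             # F = MSR / MSE
r_sq = ssr / (ssr + sse)       # R-Sq = SSR / SST

print(f"F = {f_stat:.2f}, R-Sq = {r_sq:.1%}")
```

The same two ratios apply to every Analysis of Variance table in these solutions.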

c.

The scatter diagram suggests that a curvilinear relationship may be appropriate.
d.

The Minitab output is shown below:

The regression equation is
Y = -169 + 12.2 X - 0.177 XSQ

Predictor       Coef     SE Coef          T        p
Constant     -168.88       39.79      -4.24    0.024
X             12.187       2.663       4.58    0.020
XSQ         -0.17704     0.04290      -4.13    0.026

S = 3.248       R-Sq = 94.5%     R-Sq(adj) = 90.8%

Analysis of Variance

SOURCE        DF          SS          MS         F        p
Regression     2      541.85      270.92     25.68    0.013
Residual Error 3       31.65       10.55
Total          5      573.50

e.   Since the p-value corresponding to F = 25.68 is .013 < α = .05, the relationship is significant.

f.   ŷ = -168.88 + 12.187(25) - 0.17704(25)^2 = 25.145

2.

a.

The Minitab output is shown below:
The regression equation is
Y = 9.32 + 0.424 X

Predictor       Coef     SE Coef          T        p
Constant       9.315       4.196       2.22    0.113
X             0.4242      0.1944       2.18    0.117

S = 3.531       R-Sq = 61.4%     R-Sq(adj) = 48.5%

Analysis of Variance

SOURCE        DF          SS          MS         F        p
Regression     1       59.39       59.39      4.76    0.117
Residual Error 3       37.41       12.47
Total          4       96.80

The high p-value (.117) indicates a weak relationship; note that 61.4% of the variability in y has
been explained by x.

b.

The Minitab output is shown below:
The regression equation is
Y = -8.10 + 2.41 X - 0.0480 XSQ

Predictor       Coef     SE Coef          T        p
Constant      -8.101       4.104      -1.97    0.187
X             2.4127      0.4409       5.47    0.032
XSQ         -0.04797     0.01050      -4.57    0.045

S = 1.279       R-Sq = 96.6%     R-Sq(adj) = 93.2%

Analysis of Variance

SOURCE        DF          SS          MS         F        p
Regression     2      93.529      46.765     28.60    0.034
Residual Error 2       3.271       1.635
Total          4      96.800

At the .05 level of significance, the relationship is significant; the fit is excellent.

c.   ŷ = -8.101 + 2.4127(20) - 0.04797(20)^2 = 20.965

3.

a.

The scatter diagram shows some evidence of a possible linear relationship.

b.

The Minitab output is shown below:

The regression equation is
Y = 2.32 + 0.637 X

Predictor       Coef     SE Coef          T        p
Constant       2.322       1.887       1.23    0.258
X             0.6366      0.3044       2.09    0.075

S = 2.054       R-Sq = 38.5%     R-Sq(adj) = 29.7%

Analysis of Variance

SOURCE        DF          SS          MS         F        p
Regression     1      18.461      18.461      4.37    0.075
Residual Error 7      29.539       4.220
Total          8      48.000

c.   The following standardized residual plot indicates that the constant variance assumption is
not satisfied.

[Standardized residual plot: Standardized Residual (vertical axis) versus Fitted Value (horizontal axis)]

d.

The logarithmic transformation does not appear to eliminate the wedge-shaped pattern in the
above residual plot.  The reciprocal transformation does, however, remove the wedge-shaped
pattern.  Neither transformation provides a good fit.  The Minitab output for the reciprocal
transformation and the corresponding standardized residual plot are shown below.
The regression equation is
1/Y = 0.275 - 0.0152 X

Predictor       Coef     SE Coef          T        p
Constant     0.27498     0.04601       5.98    0.000
X          -0.015182    0.007421      -2.05    0.080

S = 0.05009     R-Sq = 37.4%     R-Sq(adj) = 28.5%

Analysis of Variance

SOURCE        DF          SS          MS         F        p

Regression     1    0.010501    0.010501      4.19    0.080
Residual Error 7    0.017563    0.002509
Total          8    0.028064

[Standardized residual plot: Standardized Residual (vertical axis) versus Fitted Value (horizontal axis)]

4.

a.

The Minitab output is shown below:
The regression equation is
Y = 943 + 8.71 X

Predictor       Coef     SE Coef          T        p
Constant      943.05       59.38      15.88    0.000
X              8.714       1.544       5.64    0.005

S = 32.29       R-Sq = 88.8%     R-Sq(adj) = 86.1%

Analysis of Variance

SOURCE        DF          SS          MS         F        p
Regression     1       33223       33223     31.86    0.005
Residual Error 4        4172        1043
Total          5       37395

b.   p-value = .005 < α = .01; reject H0

5.

a.

The Minitab output is shown below:
The regression equation is
Y = 433 + 37.4 X - 0.383 XSQ

Predictor       Coef     SE Coef          T        p
Constant       432.6       141.2       3.06    0.055
X             37.429       7.807       4.79    0.017
XSQ          -0.3829      0.1036      -3.70    0.034

S = 15.83       R-Sq = 98.0%     R-Sq(adj) = 96.7%

Analysis of Variance

SOURCE        DF          SS          MS         F        p
Regression     2       36643       18322     73.15    0.003
Residual Error 3         751         250
Total          5       37395

b.   Since the linear relationship was significant (Exercise 4), this relationship must be significant.  Note
also that since the p-value of .003 < α = .05, we can reject H0.

c.

The fitted value is 1302.01, with a standard deviation of 9.93.  The 95% confidence interval is
1270.41 to 1333.61; the 95% prediction interval is 1242.55 to 1361.47.

6.

a.

The scatter diagram is shown below:

b.

No; the relationship appears to be curvilinear.

c.

Several possible models can be fitted to these data, as shown below:
ŷ = 2.90 - 0.185x + .00351x^2        Ra^2 = .91

ŷ = 0.0468 + 14.4(1/x)               Ra^2 = .91

7.

a.

The scatter diagram is shown below:

[Scatter diagram: Weekday Ridership (vertical axis) versus Miles of Track (horizontal axis)]

A simple linear regression model does not appear to be appropriate. There appears to be a
curvilinear relationship between the two variables.
b.

The Minitab output is shown below:
The regression equation is
Riders = - 7.9 + 2.14 Miles

Predictor      Coef   SE Coef       T       P
Constant      -7.95     19.00   -0.42   0.682
Miles        2.1362    0.5183    4.12   0.001

S = 41.45     R-Sq = 53.1%     R-Sq(adj) = 50.0%

Analysis of Variance

Source            DF       SS       MS       F       P
Regression         1    29187    29187   16.99   0.001
Residual Error    15    25769     1718
Total             16    54956

Unusual Observations
Obs   Miles   Riders     Fit   SE Fit   Residual   St Resid
  8    51.0    231.0   101.0     14.4      130.0      3.34R

R denotes an observation with a large standardized residual

The corresponding standardized residual plot is shown below:

[Standardized residual plot: Standardized Residual (vertical axis) versus Fitted Value (horizontal axis)]

There is an unusual trend in the points. There is also some indication that the variance may not be
constant.
c.

The Minitab output is shown below:
The regression equation is
lnRiders = 2.59 + 0.0357 Miles

Predictor        Coef    SE Coef       T       P
Constant       2.5864     0.2415   10.71   0.000
Miles        0.035670   0.006586    5.42   0.000

S = 0.5267     R-Sq = 66.2%     R-Sq(adj) = 63.9%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         1    8.1376   8.1376   29.33   0.000
Residual Error    15    4.1612   0.2774
Total             16   12.2988

Unusual Observations
Obs   Miles   lnRiders     Fit   SE Fit   Residual   St Resid
  8    51.0      5.442   4.406    0.183      1.037      2.10R

R denotes an observation with a large standardized residual

The corresponding standardized residual plot is shown below:

[Standardized residual plot: Standardized Residual (vertical axis) versus Fitted Value (horizontal axis)]

The standardized residual plot indicates that the transformation has eliminated the problem
identified in the residual plot constructed in part (b).
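
Because the model in part (c) is fitted to ln(Riders), a fitted value has to be exponentiated to return to the original ridership units. The sketch below uses the coefficients from the lnRiders output; the function name is illustrative, and the plain exp() back-transform is an assumption made here (it applies no retransformation bias correction).

```python
import math

B0, B1 = 2.5864, 0.035670   # coefficients from the lnRiders output

def fitted_ln_riders(miles):
    """Fitted value on the log scale for a given number of track miles."""
    return B0 + B1 * miles

ln_fit = fitted_ln_riders(51.0)     # observation 8 in the output
riders_fit = math.exp(ln_fit)       # back-transform to the ridership scale
print(f"ln fit = {ln_fit:.3f}, riders = {riders_fit:.1f}")
```

The log-scale value agrees with the Fit of 4.406 that Minitab reports for observation 8.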
d.

The Minitab output is shown below:
The regression equation is
1/Riders = 0.0656 - 0.000998 Miles

Predictor          Coef     SE Coef       T       P
Constant        0.06564     0.01068    6.14   0.000
Miles        -0.0009976   0.0002914   -3.42   0.004

S = 0.02330     R-Sq = 43.9%     R-Sq(adj) = 40.1%

Analysis of Variance

Source            DF          SS          MS       F       P
Regression         1   0.0063645   0.0063645   11.72   0.004
Residual Error    15   0.0081456   0.0005430
Total             16   0.0145101

Unusual Observations
Obs   Miles   1/Riders       Fit    SE Fit   Residual   St Resid
 17     9.0    0.12500   0.05666   0.00857    0.06834      3.15R

R denotes an observation with a large standardized residual

The corresponding standardized residual plot is shown below:

[Standardized residual plot: Standardized Residual (vertical axis) versus Fitted Value (horizontal axis)]

e.

The standardized residual plot corresponding to the reciprocal transformation indicates an unusual
pattern that is not evident in the standardized residual plot for the logarithmic transformation. The
estimated regression equation for the logarithmic transformation also provides a better fit. We
recommend using the logarithmic transformation.

8.

a.

The scatter diagram is shown below:

A simple linear regression model does not appear to be appropriate. There appears to be a
curvilinear relationship between the two variables.

b.

The Minitab output is shown below:
The regression equation is
Price = 33829 - 4571 Rating + 154 RatingSq

Predictor      Coef   SE Coef       T       P
Constant      33829     13657    2.48   0.029
Rating        -4571      1688   -2.71   0.019
RatingSq     153.55     51.72    2.97   0.012

S = 668.312     R-Sq = 70.2%     R-Sq(adj) = 65.3%

Analysis of Variance

Source            DF         SS        MS       F       P
Regression         2   12643604   6321802   14.15   0.001
Residual Error    12    5359686    446641
Total             14   18003290

c.

The Minitab output is shown below:
The regression equation is
logPrice = - 10.2 + 10.4 logRating

Predictor       Coef   SE Coef       T       P
Constant     -10.152     1.890   -5.37   0.000
logRating     10.422     1.544    6.75   0.000

S = 0.283438     R-Sq = 77.8%     R-Sq(adj) = 76.1%

Analysis of Variance

Source            DF       SS       MS       F       P
Regression         1   3.6595   3.6595   45.55   0.000
Residual Error    13   1.0444   0.0803
Total             14   4.7038

d.

The model in part (c) is preferred because it provides a better fit.

9.

a.

A simple linear regression model appears to be appropriate.
b.

Note the line drawn through the data. This line indicates a possible curvilinear relationship
between these two variables.
c.

In the Minitab output that follows, IndexSq denotes the square of the Cost-of-Living Index.
The regression equation is
Creative Class (%) = 49.2 - 0.673 Cost-of-Living Index + 0.00282 IndexSq
+ 0.404 Income

Predictor                  Coef    SE Coef       T       P
Constant                  49.24      17.25    2.85   0.006
Cost-of-Living Index    -0.6725     0.2888   -2.33   0.024
IndexSq                0.002821   0.001223    2.31   0.026
Income                  0.40418    0.06772    5.97   0.000

S = 2.70934     R-Sq = 64.4%     R-Sq(adj) = 62.0%

Analysis of Variance

Source            DF       SS       MS       F       P
Regression         3   609.96   203.32   27.70   0.000
Residual Error    46   337.67     7.34
Total             49   947.62

At the .05 level of significance, the overall relationship is significant, and each of the three
independent variables (Cost-of-Living Index, IndexSq, and Income) is individually significant.
d.

Cost-of-Living Index = 99, IndexSq = 9801, and Income = 42.984
Estimate = 49.24 - .6725(99) + .002821(9801) + .40418(42.984) = 27.7%
The primary concern with using this estimate is that the estimated regression equation was developed for
metropolitan areas with a population of 1,000,000 or more. However, the population of Tucson is so close to
1,000,000 that the estimated regression equation should still provide a good estimate. Note to Instructor: the
actual value of the percentage of the workforce in creative fields for Tucson reported by Kiplinger was
31.1%.
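
The point estimate in part (d) is a direct evaluation of the estimated regression equation from part (c). A short Python sketch of that evaluation follows; the function and variable names are illustrative, and the coefficients are the ones reported in the Minitab output.

```python
# Coefficients from the Minitab output in part (c).
b0, b_index, b_index_sq, b_income = 49.24, -0.6725, 0.002821, 0.40418

def creative_class_pct(index, income):
    """Estimated Creative Class (%); IndexSq is computed as index squared."""
    return b0 + b_index * index + b_index_sq * index**2 + b_income * income

estimate = creative_class_pct(99, 42.984)
print(round(estimate, 1))   # 27.7
```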

10. a.

SSR = SST - SSE = 1030
MSR = 1030

MSE = 520/25 = 20.8

F = 1030/20.8 = 49.52

Using Excel or Minitab, the p-value corresponding to F = 49.52 is .000.
Because p-value ≤ α, x1 is significant.

b.

F = [(520 - 100)/2] / (100/23) = 48.3

Using Excel or Minitab, the p-value corresponding to F = 48.3 is .000.
Because p-value ≤ α, the addition of variables x2 and x3 is significant.
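
The test in part (b) compares a reduced model with a full model through their error sums of squares. The sketch below is one way to write that calculation in Python; `partial_f` and its parameter names are hypothetical helpers introduced here, not Excel or Minitab functions.

```python
def partial_f(sse_reduced, sse_full, extra_terms, df_error_full):
    """F statistic for testing whether the extra terms in the full model are needed."""
    numerator = (sse_reduced - sse_full) / extra_terms   # drop in SSE per added term
    mse_full = sse_full / df_error_full                  # MSE of the full model
    return numerator / mse_full

# Exercise 10(b): SSE(x1) = 520, SSE(x1, x2, x3) = 100, 23 error degrees of freedom.
f_b = partial_f(sse_reduced=520, sse_full=100, extra_terms=2, df_error_full=23)
print(round(f_b, 1))   # 48.3
```

The same function reproduces the F statistics for the added-variable tests in the later exercises when the corresponding SSE values are supplied.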
11. a.

SSE = SST - SSR = 1805 - 1760 = 45
MSR = 1760/4 = 440

MSE = 45/25 = 1.8

F = 440/1.8 = 244.44

Using Excel or Minitab, the p-value corresponding to F = 244.44 is .000.
Because p-value ≤ α, the overall relationship is significant.

b.

SSE(x1, x2, x3, x4) = 45

c.

SSE(x2, x3) = 1805 - 1705 = 100

d.

F = [(100 - 45)/2] / 1.8 = 15.28

Using Excel or Minitab, the p-value corresponding to F = 15.28 is .000.
Because p-value ≤ α, x1 and x4 contribute significantly to the model.
12. a.

A portion of the Minitab output follows:
The regression equation is
Scoring Avg. = 46.3 + 14.1 Putting Avg.

Predictor        Coef   SE Coef      T       P
Constant       46.277     6.026   7.68   0.000
Putting Avg.   14.103     3.356   4.20   0.000

S = 0.510596     R-Sq = 38.7%     R-Sq(adj) = 36.5%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         1    4.6036   4.6036   17.66   0.000
Residual Error    28    7.2998   0.2607
Total             29   11.9035

b.

A portion of the Minitab output follows:
The regression equation is
Scoring Avg. = 59.0 - 10.3 Greens in Reg. + 11.4 Putting Avg. - 1.81 Sand Saves

Predictor          Coef   SE Coef       T       P
Constant         59.022     5.774   10.22   0.000
Greens in Reg.  -10.281     2.877   -3.57   0.001
Putting Avg.     11.413     2.760    4.14   0.000
Sand Saves      -1.8130    0.9210   -1.97   0.060

S = 0.407808     R-Sq = 63.7%     R-Sq(adj) = 59.5%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         3    7.5795   2.5265   15.19   0.000
Residual Error    26    4.3240   0.1663
Total             29   11.9035

c.

SSE(reduced) = 7.2998
SSE(full) = 4.3240
MSE(full) = .1663

F = [(SSE(reduced) - SSE(full)) / number of extra terms] / MSE(full)
  = [(7.2998 - 4.3240) / 2] / .1663 = 8.95

The p-value associated with F = 8.95 (2 degrees of freedom numerator and 26 denominator) is .001.
With a p-value < α = .05, the addition of the two independent variables is statistically
significant.
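
When the numerator has exactly 2 degrees of freedom, the upper-tail area of the F distribution has the closed form (1 + 2F/d2)^(-d2/2), where d2 is the denominator degrees of freedom, so the p-value in part (c) can be checked without statistical software. The sketch below relies on that special case only; the function name is illustrative, and for other numerator degrees of freedom a statistics package would be needed.

```python
def f_upper_tail_2df(f_stat, df_denominator):
    """P(F > f_stat) for an F distribution with 2 numerator degrees of freedom."""
    return (1 + 2 * f_stat / df_denominator) ** (-df_denominator / 2)

# Partial F statistic from part (c): two extra terms, MSE(full) = .1663.
f_stat = ((7.2998 - 4.3240) / 2) / 0.1663
p_value = f_upper_tail_2df(f_stat, 26)
print(f"F = {f_stat:.2f}, p-value = {p_value:.3f}")
```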
13. a.

A portion of the Minitab output follows:
The regression equation is
Earnings ($1000) = 14528 - 7640 Putting Avg.

Predictor       Coef   SE Coef       T       P
Constant       14528      4410    3.29   0.003
Putting Avg.   -7640      2456   -3.11   0.004

S = 373.671     R-Sq = 25.7%     R-Sq(adj) = 23.0%

Analysis of Variance

Source            DF        SS        MS      F       P
Regression         1   1350901   1350901   9.67   0.004
Residual Error    28   3909645    139630
Total             29   5260546

b.

A portion of the Minitab output follows:
The regression equation is
Earnings ($1000) = 5214 + 6873 Greens in Reg. - 5623 Putting Avg.
+ 2217 Sand Saves

Predictor          Coef   SE Coef       T       P
Constant           5214      3757    1.39   0.177
Greens in Reg.     6873      1871    3.67   0.001
Putting Avg.      -5623      1795   -3.13   0.004
Sand Saves       2216.6     599.2    3.70   0.001

S = 265.305     R-Sq = 65.2%     R-Sq(adj) = 61.2%

Analysis of Variance

Source            DF        SS        MS       F       P
Regression         3   3430493   1143498   16.25   0.000
Residual Error    26   1830053     70387
Total             29   5260546

c.

SSE(reduced) = 3,909,645
SSE(full) = 1,830,053
MSE(full) = 70,387

F = [(SSE(reduced) - SSE(full)) / number of extra terms] / MSE(full)
  = [(3,909,645 - 1,830,053) / 2] / 70,387 = 14.773

The p-value associated with F = 14.773 (2 degrees of freedom numerator and 26 denominator) is .000.
With a p-value < α = .05, the addition of the two independent variables is statistically
significant.

d.

A portion of the Minitab output follows:
The regression equation is
Earnings ($1000) = 36697 - 501 Scoring Avg.

Predictor         Coef   SE Coef       T       P
Constant         36697      5909    6.21   0.000
Scoring Avg.   -501.20     82.53   -6.07   0.000

S = 284.751     R-Sq = 56.8%     R-Sq(adj) = 55.3%

Analysis of Variance

Source            DF        SS        MS       F       P
Regression         1   2990221   2990221   36.88   0.000
Residual Error    28   2270325     81083
Total             29   5260546

Because the equation developed in part (b) provides a better fit, it is preferred over the equation
developed in part (d).

14. a.

The Minitab output is shown below:
Risk = - 111 + 1.32 Age + 0.296 Pressure

Predictor       Coef   SE Coef       T       P
Constant     -110.94     16.47   -6.74   0.000
Age           1.3150    0.1733    7.59   0.000
Pressure     0.29640   0.05107    5.80   0.000

S = 6.908     R-Sq = 80.6%     R-Sq(adj) = 78.4%

Analysis of Variance

Source            DF       SS       MS       F       P
Regression         2   3379.6   1689.8   35.41   0.000
Residual Error    17    811.3     47.7
Total             19   4190.9

Source      DF   Seq SS
Age          1   1772.0
Pressure     1   1607.7

Unusual Observations
Obs    Age   Risk     Fit   SE Fit   Residual   St Resid
 17   66.0   8.00   25.05     1.67     -17.05     -2.54R

R denotes an observation with a large standardized residual
b.

The Minitab output is shown below:
Risk = - 123 + 1.51 Age + 0.448 Pressure + 8.87 Smoker - 0.00276 AgePress

Predictor        Coef    SE Coef       T       P
Constant      -123.16      56.94   -2.16   0.047
Age            1.5130     0.7796    1.94   0.071
Pressure       0.4483     0.3457    1.30   0.214
Smoker          8.866      3.074    2.88   0.011
AgePress    -0.002756   0.004807   -0.57   0.575

S = 5.881     R-Sq = 87.6%     R-Sq(adj) = 84.3%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         4   3672.11   918.03   26.54   0.000
Residual Error    15    518.84    34.59
Total             19   4190.95

Source      DF    Seq SS
Age          1   1771.98
Pressure     1   1607.66
Smoker       1    281.10
AgePress     1     11.37

Unusual Observations
Obs    Age   Risk     Fit   SE Fit   Residual   St Resid
 17   66.0   8.00   20.91     2.01     -12.91     -2.34R

R denotes an observation with a large standardized residual

c.

F = [(SSE(reduced) - SSE(full)) / # extra terms] / MSE(full)
  = [(811.3 - 518.84) / 2] / 34.59 = 4.23

The p-value associated with F = 4.23 (2 numerator and 15 denominator DF) is .035.
Because p-value ≤ α = .05, the addition of the two terms is significant.

15. a.

A portion of the Minitab output follows:
The regression equation is
ERA = - 0.253 + 0.453 H/9

Predictor       Coef   SE Coef       T       P
Constant     -0.2535    0.7351   -0.34   0.732
H/9          0.45271   0.08347    5.42   0.000

S = 0.466619     R-Sq = 38.0%     R-Sq(adj) = 36.7%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         1    6.4044   6.4044   29.41   0.000
Residual Error    48   10.4512   0.2177
Total             49   16.8556

b.

A portion of the Minitab output follows:
The regression equation is
ERA = - 2.56 + 0.512 H/9 + 0.980 HR/9 + 0.340 BB/9

Predictor       Coef   SE Coef       T       P
Constant     -2.5639    0.5383   -4.76   0.000
H/9          0.51213   0.05506    9.30   0.000
HR/9          0.9799    0.1657    5.91   0.000
BB/9         0.34000   0.05067    6.71   0.000

S = 0.285210     R-Sq = 77.8%     R-Sq(adj) = 76.4%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         3   13.1137   4.3712   53.74   0.000
Residual Error    46    3.7419   0.0813
Total             49   16.8556

c.

SSE(reduced) = 10.4512
SSE(full) = 3.7419
MSE(full) = .0813

F = [(SSE(reduced) - SSE(full)) / number of extra terms] / MSE(full)
  = [(10.4512 - 3.7419) / 2] / .0813 = 41.26

The p-value associated with F = 41.26 (2 degrees of freedom numerator and 46 denominator) is .000.
With a p-value < α = .05, the addition of the two independent variables is statistically
significant.

16. a.

The sample correlation coefficients are as follows:
           Weeks      Age     Educ  Married     Head   Tenure  Manager
Age        0.577
           0.000

Educ       0.007    0.100
           0.962    0.490

Married   -0.130   -0.209   -0.151
           0.370    0.145    0.296

Head      -0.205    0.027   -0.156   -0.449
           0.153    0.854    0.280    0.001

Tenure     0.398    0.459    0.174   -0.057   -0.046
           0.004    0.001    0.228    0.692    0.750

Manager   -0.198    0.097    0.160    0.073   -0.200   -0.113
           0.167    0.504    0.266    0.616    0.164    0.435

Sales     -0.134    0.137    0.124   -0.148   -0.013    0.097   -0.156
           0.354    0.343    0.393    0.306    0.926    0.504    0.279

Cell Contents: Pearson correlation
               P-Value

The independent variable most correlated with Weeks is Age. The Minitab output corresponding to
using Age as the independent variable is shown below:
The regression equation is
Weeks = - 8.9 + 1.51 Age

Predictor      Coef   SE Coef       T       P
Constant      -8.86     11.01   -0.80   0.425
Age          1.5092    0.3080    4.90   0.000

S = 19.5342     R-Sq = 33.3%     R-Sq(adj) = 32.0%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         1    9161.4   9161.4   24.01   0.000
Residual Error    48   18316.1    381.6
Total             49   27477.5

b.

The Minitab Stepwise Regression output is shown below.
Alpha-to-Enter: 0.05     Alpha-to-Remove: 0.05

Response is Weeks on 7 predictors, with N = 50

Step                 1          2          3          4
Constant      -8.86002   -9.09741   -0.10922   -0.06890

Age               1.51       1.57       1.61       1.73
T-Value           4.90       5.30       5.74       6.51
P-Value          0.000      0.000      0.000      0.000

Manager                     -20.1      -24.6      -28.7
T-Value                     -2.26      -2.88      -3.53
P-Value                     0.029      0.006      0.001

Head                                   -14.3      -15.1
T-Value                                -2.61      -2.95
P-Value                                0.012      0.005

Sales                                             -17.4
T-Value                                           -2.79
P-Value                                           0.008

S                 19.5       18.7       17.7       16.5
R-Sq             33.34      39.87      47.64      55.38
R-Sq(adj)        31.95      37.31      44.22      51.41
Mallows C-p       22.5       17.8       11.8        5.9

The results suggest a model using four independent variables: Age, Manager, Head, and Sales. The
corresponding Minitab output is shown below:
The regression equation is
Weeks = - 0.07 + 1.73 Age - 28.7 Manager - 15.1 Head - 17.4 Sales

Predictor       Coef   SE Coef       T       P
Constant      -0.069     9.843   -0.01   0.994
Age           1.7252    0.2651    6.51   0.000
Manager      -28.672     8.117   -3.53   0.001
Head         -15.086     5.121   -2.95   0.005
Sales        -17.421     6.236   -2.79   0.008

S = 16.5069     R-Sq = 55.4%     R-Sq(adj) = 51.4%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         4   15216.0   3804.0   13.96   0.000
Residual Error    45   12261.5    272.5
Total             49   27477.5
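
The R-Sq(adj) value for the four-variable model can be reproduced from its Analysis of Variance table. The sketch below assumes the usual definition of adjusted R-Sq, 1 - (1 - R-Sq)(n - 1)/(n - p - 1), with p independent variables; the function name is illustrative.

```python
def adjusted_r_sq(ssr, sst, n, p):
    """Adjusted R-Sq for a model with p independent variables and n observations."""
    r_sq = ssr / sst
    return 1 - (1 - r_sq) * (n - 1) / (n - p - 1)

# Four-variable model: SSR = 15216.0, SST = 27477.5, n = 50 observations.
r_sq_adj = adjusted_r_sq(ssr=15216.0, sst=27477.5, n=50, p=4)
print(f"{r_sq_adj:.1%}")   # 51.4%
```

Unlike R-Sq, this measure can fall when a variable is added, which is why it is reported at every step of the variable selection output.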

c.

The results using Minitab’s Forward Selection procedure are the same as the results using Minitab’s
Stepwise procedure in part (b).

d.

The results using Minitab’s Backward Elimination procedure are shown below:
Backward elimination.     Alpha-to-Remove: 0.05

Response is Weeks on 7 predictors, with N = 50

Step                 1          2          3          4
Constant      22.85070   13.62308   13.06817   -0.06890

Age               1.51       1.52       1.64       1.73
T-Value           4.96       5.04       6.18       6.51
P-Value          0.000      0.000      0.000      0.000

Educ             -0.61
T-Value          -0.66
P-Value          0.516

Married          -10.7       -9.9       -9.8
T-Value          -1.79      -1.69      -1.69
P-Value          0.081      0.098      0.099

Head             -19.8      -19.0      -19.4      -15.1
T-Value          -3.39      -3.35      -3.44      -2.95
P-Value          0.002      0.002      0.001      0.005

Tenure            0.43       0.37
T-Value           0.91       0.82
P-Value          0.366      0.418

Manager          -26.7      -27.7      -29.0      -28.7
T-Value          -3.21      -3.40      -3.64      -3.53
P-Value          0.003      0.001      0.001      0.001

Sales            -18.6      -19.0      -19.0      -17.4
T-Value          -2.96      -3.06      -3.07      -2.79
P-Value          0.005      0.004      0.004      0.008

S                 16.3       16.2       16.2       16.5
R-Sq             59.14      58.72      58.08      55.38
R-Sq(adj)        52.33      52.96      53.32      51.41
Mallows C-p        8.0        6.4        5.1        5.9

These results also suggest using the model with four independent variables: Age, Head, Manager, and
Sales.

e.

The results using Minitab’s Best Subsets procedure are shown below:

                            Mallows
Vars   R-Sq   R-Sq(adj)       C-p        S
   1   33.3        32.0      22.5   19.534
   1   15.8        14.0      40.6   21.954
   2   39.9        37.3      17.8   18.749
   2   38.2        35.6      19.5   19.005
   3   47.6        44.2      11.8   17.686
   3   46.8        43.3      12.7   17.831
   4   55.4        51.4       5.9   16.507
   4   49.1        44.6      12.3   17.628
   5   58.1        53.3       5.1   16.179
   5   56.0        51.0       7.3   16.582
   6   58.7        53.0       6.4   16.241
   6   58.3        52.5       6.8   16.318
   7   59.1        52.3       8.0   16.350

[The X columns marking which of Age, Educ, Married, Head, Tenure, Manager, and Sales enter each
model are not recoverable from the extracted output.]

The results suggest a model using five independent variables: Age, Married, Head, Manager, and Sales.
The corresponding Minitab output is shown below:
The regression equation is
Weeks = 13.1 + 1.64 Age - 9.76 Married - 19.4 Head - 29.0 Manager - 19.0
Sales

Predictor       Coef   SE Coef       T       P
Constant       13.07     12.40    1.05   0.298
Age           1.6369    0.2651    6.18   0.000
Married       -9.764     5.794   -1.69   0.099
Head         -19.405     5.636   -3.44   0.001
Manager      -28.986     7.958   -3.64   0.001
Sales        -18.967     6.181   -3.07   0.004

S = 16.1794     R-Sq = 58.1%     R-Sq(adj) = 53.3%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         5   15959.5   3191.9   12.19   0.000
Residual Error    44   11518.0    261.8
Total             49   27477.5

17.

The output obtained using Minitab’s Best Subset Regression is shown below:
Response is Scoring Avg.

                            Mallows
Vars   R-Sq   R-Sq(adj)       C-p         S
   1   38.7        36.5      28.3   0.51060
   1   33.0        30.7      33.3   0.53350
   2   58.3        55.2      12.9   0.42897
   2   53.9        50.5      16.8   0.45059
   3   63.7        59.5      10.2   0.40781
   3   60.3        55.7      13.2   0.42659
   4   72.0        67.5       4.8   0.36514
   4   64.7        59.0      11.3   0.41015
   5   72.9        67.2       6.0   0.36672

[The X columns marking which of Drive Average, Greens in Reg., Putting Avg., Sand Saves, and
DriveGreens enter each model are not recoverable from the extracted output.]

The Best Subset Regression output indicates that a model using four independent variables, Drive
Average, Greens in Reg., Putting Average, and DriveGreens, may be a good choice. The Minitab
output for this model is shown below:
The regression equation is
Scoring Avg. = - 88.1 + 0.591 Drive Average + 209 Greens in Reg.
+ 9.74 Putting Avg. - 0.868 DriveGreens

Predictor          Coef   SE Coef       T       P
Constant         -88.10     42.20   -2.09   0.047
Drive Average    0.5907    0.1692    3.49   0.002
Greens in Reg.   209.19     62.85    3.33   0.003
Putting Avg.      9.736     2.575    3.78   0.001
DriveGreens     -0.8677    0.2478   -3.50   0.002

S = 0.365139     R-Sq = 72.0%     R-Sq(adj) = 67.5%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         4    8.5703   2.1426   16.07   0.000
Residual Error    25    3.3332   0.1333
Total             29   11.9035

18. a.

Because the independent variable most highly correlated with RPG is OBP, it
will provide the best one-variable estimated regression equation. The Minitab
output using OBP to predict RPG is shown below:
The regression equation is
RPG = - 4.05 + 27.6 OBP

Predictor      Coef   SE Coef       T       P
Constant     -4.049     1.006   -4.02   0.001
OBP          27.555     3.103    8.88   0.000

S = 0.956308     R-Sq = 81.4%     R-Sq(adj) = 80.4%

Analysis of Variance

Source            DF       SS       MS       F       P
Regression         1   72.108   72.108   78.85   0.000
Residual Error    18   16.461    0.915
Total             19   88.569

b.

The output using Minitab’s Stepwise Regression procedure using Alpha-to-Enter = 0.05 and
Alpha-to-Remove = 0.05 is shown below:
Alpha-to-Enter: 0.05     Alpha-to-Remove: 0.05

Response is RPG on 12 predictors, with N = 20

Step                1         2         3
Constant      -4.0491   -1.5951   -0.9808

OBP              27.6      17.2      25.1
T-Value          8.88      5.10      6.88
P-Value         0.000     0.000     0.000

HR                        0.071     0.069
T-Value                    4.16      5.06
P-Value                   0.001     0.000

AVG                                 -12.6
T-Value                             -3.23
P-Value                             0.005

S               0.956     0.693     0.556
R-Sq            81.41     90.78     94.43
R-Sq(adj)       80.38     89.70     93.38
Mallows C-p      66.6      27.0      12.8

Using less stringent (larger) values for Alpha-to-Enter and Alpha-to-Remove will provide a model with
additional independent variables. For example, the output using Minitab’s Stepwise Regression
procedure using Alpha-to-Enter = 0.10 and Alpha-to-Remove = 0.10 is shown below:
Alpha-to-Enter: 0.1     Alpha-to-Remove: 0.1

Response is RPG on 12 predictors, with N = 20

Step                1         2         3         4         5
Constant      -4.0491   -1.5951   -0.9808   -0.6161   -0.9088

OBP              27.6      17.2      25.1      26.6      32.2
T-Value          8.88      5.10      6.88      7.64      9.40
P-Value         0.000     0.000     0.000     0.000     0.000

HR                        0.071     0.069     0.068     0.109
T-Value                    4.16      5.06      5.34      6.26
P-Value                   0.001     0.000     0.000     0.000

AVG                                 -12.6     -16.5     -21.5
T-Value                             -3.23     -3.96     -5.65
P-Value                             0.005     0.001     0.000

3B                                            0.182     0.244
T-Value                                        1.88      2.99
P-Value                                       0.079     0.010

BB                                                    -0.0223
T-Value                                                 -2.92
P-Value                                                 0.011

S               0.956     0.693     0.556     0.516     0.421
R-Sq            81.41     90.78     94.43     95.49     97.20
R-Sq(adj)       80.38     89.70     93.38     94.29     96.20
Mallows C-p      66.6      27.0      12.8      10.0       4.5

The following output using Minitab’s Best Subset procedure also confirms that a variety of
models will provide a good fit.

                            Mallows
Vars   R-Sq   R-Sq(adj)       C-p         S
   1   81.4        80.4      66.6   0.95631
   1   78.9        77.7      77.9    1.0192
   2   90.8        89.7      27.0   0.69299
   2   88.4        87.0      37.8   0.77872
   3   94.4        93.4      12.8   0.55552
   3   94.4        93.3      13.0   0.55820
   4   95.8        94.6       8.8   0.50014
   4   95.5        94.3      10.0   0.51589
   5   97.2        96.2       4.5   0.42096
   5   97.2        96.2       4.6   0.42336
   6   97.6        96.6       4.5   0.40042
   6   97.5        96.4       5.1   0.41198
   7   98.2        97.2       3.9   0.36245
   7   98.2        97.1       4.1   0.36664
   8   98.3        97.1       5.3   0.36471
   8   98.3        97.0       5.7   0.37269
   9   98.4        97.0       7.1   0.37506
   9   98.4        96.9       7.3   0.38077
  10   98.4        96.7       9.0   0.39477
  10   98.4        96.7       9.0   0.39496
  11   98.4        96.3      11.0   0.41758
  11   98.4        96.2      11.0   0.41848
  12   98.4        95.7      13.0   0.44629

[The X columns marking which of H, 2B, 3B, HR, RBI, BB, SO, SB, CS, OBP, SLG, and AVG enter each
model are not recoverable from the extracted output.]

It would be hard to argue that there is one best model given these results. The five-variable
model identified using Minitab’s Stepwise Regression procedure with Alpha-to-Enter =
0.10 and Alpha-to-Remove = 0.10 seems like a reasonable choice. The Minitab regression output
corresponding to this model is shown below:
The regression equation is
RPG = - 0.909 + 32.2 OBP + 0.109 HR - 21.5 AVG + 0.244 3B - 0.0223 BB

Predictor         Coef    SE Coef       T       P
Constant       -0.9088     0.6169   -1.47   0.163
OBP             32.184      3.423    9.40   0.000
HR             0.10877    0.01739    6.26   0.000
AVG            -21.511      3.810   -5.65   0.000
3B             0.24388    0.08168    2.99   0.010
BB           -0.022306   0.007638   -2.92   0.011

S = 0.420960     R-Sq = 97.2%     R-Sq(adj) = 96.2%

Analysis of Variance

Source            DF       SS       MS       F       P
Regression         5   86.088   17.218   97.16
Residual Error    14    2.481    0.177
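
The Mallows C-p value of 4.5 reported for the five-variable model can be checked from its residual sum of squares. The sketch below assumes the common definition C-p = SSE(subset)/MSE(full) - (n - 2(p + 1)) and takes the full-model standard error, S = 0.44629, from the 12-variable row of the Best Subset output; the function name is illustrative.

```python
def mallows_cp(sse_subset, s_full, n, p):
    """Mallows C-p for a subset model with p independent variables plus an intercept."""
    mse_full = s_full ** 2          # error variance estimate from the full model
    return sse_subset / mse_full - (n - 2 * (p + 1))

# Five-variable model: SSE = 2.481, n = 20 teams, full-model S = 0.44629.
cp = mallows_cp(sse_subset=2.481, s_full=0.44629, n=20, p=5)
print(round(cp, 1))   # 4.5
```

A C-p value close to p + 1 (here 6) or below it suggests that the subset model loses little relative to the full model.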
