
A First Course in
Design and Analysis
of Experiments




Gary W. Oehlert
University of Minnesota


Cover design by Victoria Tomaselli
Cover illustration by Peter Hamlin

Minitab is a registered trademark of Minitab, Inc.
SAS is a registered trademark of SAS Institute, Inc.
S-Plus is a registered trademark of Mathsoft, Inc.
Design-Expert is a registered trademark of Stat-Ease, Inc.

Library of Congress Cataloging-in-Publication Data.

Oehlert, Gary W.
A first course in design and analysis of experiments / Gary W. Oehlert.
p. cm.
Includes bibliographical references and index.
ISBN 0-7167-3510-5
1. Experimental Design
I. Title


QA279.O34 2000
519.5—dc21
99-059934

Copyright © 2010 Gary W. Oehlert. All rights reserved.
This work is licensed under a “Creative Commons” license. Briefly, you are free to
copy, distribute, and transmit this work provided the following conditions are met:
1. You must properly attribute the work.
2. You may not use this work for commercial purposes.
3. You may not alter, transform, or build upon this work.
A complete description of the license may be found on the Creative Commons web site.

For Becky
who helped me all the way through

and for Christie and Erica
who put up with a lot while it was getting done



Contents

Preface

1 Introduction
  1.1 Why Experiment?
  1.2 Components of an Experiment
  1.3 Terms and Concepts
  1.4 Outline
  1.5 More About Experimental Units
  1.6 More About Responses

2 Randomization and Design
  2.1 Randomization Against Confounding
  2.2 Randomizing Other Things
  2.3 Performing a Randomization
  2.4 Randomization for Inference
    2.4.1 The paired t-test
    2.4.2 Two-sample t-test
    2.4.3 Randomization inference and standard inference
  2.5 Further Reading and Extensions
  2.6 Problems

3 Completely Randomized Designs
  3.1 Structure of a CRD
  3.2 Preliminary Exploratory Analysis
  3.3 Models and Parameters
  3.4 Estimating Parameters
  3.5 Comparing Models: The Analysis of Variance
  3.6 Mechanics of ANOVA
  3.7 Why ANOVA Works
  3.8 Back to Model Comparison
  3.9 Side-by-Side Plots
  3.10 Dose-Response Modeling
  3.11 Further Reading and Extensions
  3.12 Problems

4 Looking for Specific Differences—Contrasts
  4.1 Contrast Basics
  4.2 Inference for Contrasts
  4.3 Orthogonal Contrasts
  4.4 Polynomial Contrasts
  4.5 Further Reading and Extensions
  4.6 Problems

5 Multiple Comparisons
  5.1 Error Rates
  5.2 Bonferroni-Based Methods
  5.3 The Scheffé Method for All Contrasts
  5.4 Pairwise Comparisons
    5.4.1 Displaying the results
    5.4.2 The Studentized range
    5.4.3 Simultaneous confidence intervals
    5.4.4 Strong familywise error rate
    5.4.5 False discovery rate
    5.4.6 Experimentwise error rate
    5.4.7 Comparisonwise error rate
    5.4.8 Pairwise testing reprise
    5.4.9 Pairwise comparisons methods that do not control combined Type I error rates
    5.4.10 Confident directions
  5.5 Comparison with Control or the Best
    5.5.1 Comparison with a control
    5.5.2 Comparison with the best
  5.6 Reality Check on Coverage Rates
  5.7 A Warning About Conditioning
  5.8 Some Controversy
  5.9 Further Reading and Extensions
  5.10 Problems

6 Checking Assumptions
  6.1 Assumptions
  6.2 Transformations
  6.3 Assessing Violations of Assumptions
    6.3.1 Assessing nonnormality
    6.3.2 Assessing nonconstant variance
    6.3.3 Assessing dependence
  6.4 Fixing Problems
    6.4.1 Accommodating nonnormality
    6.4.2 Accommodating nonconstant variance
    6.4.3 Accommodating dependence
  6.5 Effects of Incorrect Assumptions
    6.5.1 Effects of nonnormality
    6.5.2 Effects of nonconstant variance
    6.5.3 Effects of dependence
  6.6 Implications for Design
  6.7 Further Reading and Extensions
  6.8 Problems

7 Power and Sample Size
  7.1 Approaches to Sample Size Selection
  7.2 Sample Size for Confidence Intervals
  7.3 Power and Sample Size for ANOVA
  7.4 Power and Sample Size for a Contrast
  7.5 More about Units and Measurement Units
  7.6 Allocation of Units for Two Special Cases
  7.7 Further Reading and Extensions
  7.8 Problems

8 Factorial Treatment Structure
  8.1 Factorial Structure
  8.2 Factorial Analysis: Main Effect and Interaction
  8.3 Advantages of Factorials
  8.4 Visualizing Interaction
  8.5 Models with Parameters
  8.6 The Analysis of Variance for Balanced Factorials
  8.7 General Factorial Models
  8.8 Assumptions and Transformations
  8.9 Single Replicates
  8.10 Pooling Terms into Error
  8.11 Hierarchy
  8.12 Problems

9 A Closer Look at Factorial Data
  9.1 Contrasts for Factorial Data
  9.2 Modeling Interaction
    9.2.1 Interaction plots
    9.2.2 One-cell interaction
    9.2.3 Quantitative factors
    9.2.4 Tukey one-degree-of-freedom for nonadditivity
  9.3 Further Reading and Extensions
  9.4 Problems

10 Further Topics in Factorials
  10.1 Unbalanced Data
    10.1.1 Sums of squares in unbalanced data
    10.1.2 Building models
    10.1.3 Testing hypotheses
    10.1.4 Empty cells
  10.2 Multiple Comparisons
  10.3 Power and Sample Size
  10.4 Two-Series Factorials
    10.4.1 Contrasts
    10.4.2 Single replicates
  10.5 Further Reading and Extensions
  10.6 Problems

11 Random Effects
  11.1 Models for Random Effects
  11.2 Why Use Random Effects?
  11.3 ANOVA for Random Effects
  11.4 Approximate Tests
  11.5 Point Estimates of Variance Components
  11.6 Confidence Intervals for Variance Components
  11.7 Assumptions
  11.8 Power
  11.9 Further Reading and Extensions
  11.10 Problems

12 Nesting, Mixed Effects, and Expected Mean Squares
  12.1 Nesting Versus Crossing
  12.2 Why Nesting?
  12.3 Crossed and Nested Factors
  12.4 Mixed Effects
  12.5 Choosing a Model
  12.6 Hasse Diagrams and Expected Mean Squares
    12.6.1 Test denominators
    12.6.2 Expected mean squares
    12.6.3 Constructing a Hasse diagram
  12.7 Variances of Means and Contrasts
  12.8 Unbalanced Data and Random Effects
  12.9 Staggered Nested Designs
  12.10 Problems

13 Complete Block Designs
  13.1 Blocking
  13.2 The Randomized Complete Block Design
    13.2.1 Why and when to use the RCB
    13.2.2 Analysis for the RCB
    13.2.3 How well did the blocking work?
    13.2.4 Balance and missing data
  13.3 Latin Squares and Related Row/Column Designs
    13.3.1 The crossover design
    13.3.2 Randomizing the LS design
    13.3.3 Analysis for the LS design
    13.3.4 Replicating Latin Squares
    13.3.5 Efficiency of Latin Squares
    13.3.6 Designs balanced for residual effects
  13.4 Graeco-Latin Squares
  13.5 Further Reading and Extensions
  13.6 Problems

14 Incomplete Block Designs
  14.1 Balanced Incomplete Block Designs
    14.1.1 Intrablock analysis of the BIBD
    14.1.2 Interblock information
  14.2 Row and Column Incomplete Blocks
  14.3 Partially Balanced Incomplete Blocks
  14.4 Cyclic Designs
  14.5 Square, Cubic, and Rectangular Lattices
  14.6 Alpha Designs
  14.7 Further Reading and Extensions
  14.8 Problems

15 Factorials in Incomplete Blocks—Confounding
  15.1 Confounding the Two-Series Factorial
    15.1.1 Two blocks
    15.1.2 Four or more blocks
    15.1.3 Analysis of an unreplicated confounded two-series
    15.1.4 Replicating a confounded two-series
    15.1.5 Double confounding
  15.2 Confounding the Three-Series Factorial
    15.2.1 Building the design
    15.2.2 Confounded effects
    15.2.3 Analysis of confounded three-series
  15.3 Further Reading and Extensions
  15.4 Problems

16 Split-Plot Designs
  16.1 What Is a Split Plot?
  16.2 Fancier Split Plots
  16.3 Analysis of a Split Plot
  16.4 Split-Split Plots
  16.5 Other Generalizations of Split Plots
  16.6 Repeated Measures
  16.7 Crossover Designs
  16.8 Further Reading and Extensions
  16.9 Problems

17 Designs with Covariates
  17.1 The Basic Covariate Model
  17.2 When Treatments Change Covariates
  17.3 Other Covariate Models
  17.4 Further Reading and Extensions
  17.5 Problems

18 Fractional Factorials
  18.1 Why Fraction?
  18.2 Fractioning the Two-Series
  18.3 Analyzing a 2^(k-q)
  18.4 Resolution and Projection
  18.5 Confounding a Fractional Factorial
  18.6 De-aliasing
  18.7 Fold-Over
  18.8 Sequences of Fractions
  18.9 Fractioning the Three-Series
  18.10 Problems with Fractional Factorials
  18.11 Using Fractional Factorials in Off-Line Quality Control
    18.11.1 Designing an off-line quality experiment
    18.11.2 Analysis of off-line quality experiments
  18.12 Further Reading and Extensions
  18.13 Problems

19 Response Surface Designs
  19.1 Visualizing the Response
  19.2 First-Order Models
  19.3 First-Order Designs
  19.4 Analyzing First-Order Data
  19.5 Second-Order Models
  19.6 Second-Order Designs
  19.7 Second-Order Analysis
  19.8 Mixture Experiments
    19.8.1 Designs for mixtures
    19.8.2 Models for mixture designs
  19.9 Further Reading and Extensions
  19.10 Problems

20 On Your Own
  20.1 Experimental Context
  20.2 Experiments by the Numbers
  20.3 Final Project

Bibliography

A Linear Models for Fixed Effects
  A.1 Models
  A.2 Least Squares
  A.3 Comparison of Models
  A.4 Projections
  A.5 Random Variation
  A.6 Estimable Functions
  A.7 Contrasts
  A.8 The Scheffé Method
  A.9 Problems

B Notation

C Experimental Design Plans
  C.1 Latin Squares
    C.1.1 Standard Latin Squares
    C.1.2 Orthogonal Latin Squares
  C.2 Balanced Incomplete Block Designs
  C.3 Efficient Cyclic Designs
  C.4 Alpha Designs
  C.5 Two-Series Confounding and Fractioning Plans

D Tables

Index



Preface
This text covers the basic topics in experimental design and analysis and
is intended for graduate students and advanced undergraduates. Students
should have had an introductory statistical methods course at about the level
of Moore and McCabe’s Introduction to the Practice of Statistics (Moore and
McCabe 1999) and be familiar with t-tests, p-values, confidence intervals,
and the basics of regression and ANOVA. Most of the text soft-pedals theory
and mathematics, but Chapter 19 on response surfaces is a little tougher
sledding (eigenvectors and eigenvalues creep in through canonical analysis), and
Appendix A is an introduction to the theory of linear models. I use the text
in a service course for non-statisticians and in a course for first-year Masters
students in statistics. The non-statisticians come from departments scattered
all around the university including agronomy, ecology, educational psychology, engineering, food science, pharmacy, sociology, and wildlife.
I wrote this book for the same reason that many textbooks get written:
there was no existing book that did things the way I thought was best. I start
with single-factor, fixed-effects, completely randomized designs and cover
them thoroughly, including analysis, checking assumptions, and power. I
then add factorial treatment structure and random effects to the mix. At this
stage, we have a single randomization scheme, a lot of different models for
data, and essentially all the analysis techniques we need. I next add blocking designs for reducing variability, covering complete blocks, incomplete
blocks, and confounding in factorials. After this I introduce split plots, which
can be considered incomplete block designs but really introduce the broader
subject of unit structures. Covariate models round out the discussion of variance reduction. I finish with special treatment structures, including fractional
factorials and response surface/mixture designs.
This outline is similar in content to a dozen other design texts; how is this
book different?
• I include many exercises where the student is required to choose an
appropriate experimental design for a given situation, or recognize the
design that was used. Many of the designs in question are from earlier
chapters, not the chapter where the question is given. These are important skills that often receive short shrift. See examples on pages 500
and 502.


• I use Hasse diagrams to illustrate models, find test denominators, and
compute expected mean squares. I feel that the diagrams provide a
much easier and more understandable approach to these problems than
the classic approach with tables of subscripts and live and dead indices.
I believe that Hasse diagrams should see wider application.
• I spend time trying to sort out the issues with multiple comparisons
procedures. These confuse many students, and most texts seem to just
present a laundry list of methods and no guidance.
• I try to get students to look beyond saying main effects and/or interactions are significant and to understand the relationships in the data. I
want them to learn that understanding what the data have to say is the
goal. ANOVA is a tool we use at the beginning of an analysis; it is not
the end.
• I describe the difference in philosophy between hierarchical model
building and parameter testing in factorials, and discuss how this becomes crucial for unbalanced data. This is important because the different philosophies can lead to different conclusions, and many texts
avoid the issue entirely.
• There are three kinds of “problems” in this text, which I have denoted
exercises, problems, and questions. Exercises are intended to be simpler than problems, with exercises being more drill on mechanics and
problems being more integrative. Not everyone will agree with my
classification. Questions are not necessarily more difficult than problems, but they cover more theoretical or mathematical material.

Data files for the examples and problems can be downloaded from the
Freeman web site. A second resource is Appendix B, which documents the notation used in the text.
This text contains many formulae, but I try to use formulae only when I
think that they will increase a reader’s understanding of the ideas. In several
settings where closed-form expressions for sums of squares or estimates exist, I do not present them because I do not believe that they help (for example,
the Analysis of Covariance). Similarly, presentations of normal equations do
not appear. Instead, I approach ANOVA as a comparison of models fit by
least squares, and let the computing software take care of the details of fitting. Future statisticians will need to learn the process in more detail, and
Appendix A gets them started with the theory behind fixed effects.
Speaking of computing, examples in this text use one of four packages:
MacAnova, Minitab, SAS, and S-Plus. MacAnova is a homegrown package
that we use here at Minnesota because we can distribute it freely; it runs
on Macintosh, Windows, and Unix; and it does everything we need. You can
download MacAnova (any version and documentation, even the source) from
its web site. Minitab and SAS
are widely used commercial packages. I hadn’t used Minitab in twelve years
when I started using it for examples; I found it incredibly easy to use. The
menu/dialog/spreadsheet interface was very intuitive. In fact, I only opened
the manual once, and that was when I was trying to figure out how to do
general contrasts (which I was never able to figure out). SAS is far and away
the market leader in statistical software. You can do practically every kind of
analysis in SAS, but as a novice I spent many hours with the manuals trying
to get SAS to do any kind of analysis. In summary, many people swear by
SAS, but I found I mostly swore at SAS. I use S-Plus extensively in research;
here I’ve just used it for a couple of graphics.
I need to acknowledge many people who helped me get this job done.
First are the students and TA’s in the courses where I used preliminary versions. Many of you made suggestions and pointed out mistakes; in particular
I thank John Corbett, Alexandre Varbanov, and Jorge de la Vega Gongora.
Many others of you contributed data; your footprints are scattered throughout
the examples and exercises. Next I have benefited from helpful discussions
with my colleagues here in Minnesota, particularly Kit Bingham, Kathryn
Chaloner, Sandy Weisberg, and Frank Martin. I thank Sharon Lohr for introducing me to Hasse diagrams, and I received much helpful criticism from
reviewers, including Larry Ringer (Texas A&M), Morris Southward (New
Mexico State), Robert Price (East Tennessee State), Andrew Schaffner (Cal
Poly—San Luis Obispo), Hiroshi Yamauchi (Hawaii—Manoa), and William
Notz (Ohio State). My editor Patrick Farace and others at Freeman were a
great help. Finally, I thank my family and parents, who supported me in this
for years (even if my father did say it looked like a foreign language!).
They say you should never let the camel’s nose into the tent, because
once the nose is in, there’s no stopping the rest of the camel. In a similar
vein, student requests for copies of lecture notes lead to student requests for
typed lecture notes, which lead to student requests for more complete typed
lecture notes, which lead . . . well, in my case it leads to a textbook on design and analysis of experiments, which you are reading now. Over the years
my students have preferred various more primitive incarnations of this text to
other texts; I hope you find this text worthwhile too.

Gary W. Oehlert



Chapter 1

Introduction
Researchers use experiments to answer questions. Typical questions might
be:
• Is a drug a safe, effective cure for a disease? This could be a test of
how AZT affects the progress of AIDS.
• Which combination of protein and carbohydrate sources provides the
best nutrition for growing lambs?
• How will long-distance telephone usage change if our company offers
a different rate structure to our customers?
• Will an ice cream manufactured with a new kind of stabilizer be as
palatable as our current ice cream?
• Does short-term incarceration of spouse abusers deter future assaults?


• Under what conditions should I operate my chemical refinery, given
this month’s grade of raw material?

This book is meant to help decision makers and researchers design good
experiments, analyze them properly, and answer their questions.

1.1 Why Experiment?
Consider the spousal assault example mentioned above. Justice officials need
to know how they can reduce or delay the recurrence of spousal assault. They
are investigating three different actions in response to spousal assaults. The
assailant could be warned, sent to counseling but not booked on charges,
or arrested for assault. Which of these actions works best? How can they
compare the effects of the three actions?
This book deals with comparative experiments. We wish to compare
some treatments. For the spousal assault example, the treatments are the three
actions by the police. We compare treatments by using them and comparing
the outcomes. Specifically, we apply the treatments to experimental units
and then measure one or more responses. In our example, individuals who
assault their spouses could be the experimental units, and the response could
be the length of time until recurrence of assault. We compare treatments by
comparing the responses obtained from the experimental units in the different
treatment groups. This could tell us if there are any differences in responses
between the treatments, what the estimated sizes of those differences are,
which treatment has the greatest estimated delay until recurrence, and so on.
An experiment is characterized by the treatments and experimental units to
be used, the way treatments are assigned to units, and the responses that are
measured.
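The comparative recipe described above (randomly assign treatments to experimental units, measure a response on each unit, then compare the responses across treatment groups) can be sketched in a few lines of code. The sketch below is purely illustrative: the function names, case labels, and response values are hypothetical and not from the text.

```python
import random


def assign_treatments(units, treatments, seed=None):
    """Assign each experimental unit one treatment completely at random,
    keeping the treatment group sizes as equal as possible."""
    rng = random.Random(seed)
    labels = (treatments * len(units))[:len(units)]  # repeat, then truncate
    rng.shuffle(labels)
    return dict(zip(units, labels))


def group_means(assignment, responses):
    """Average the measured response within each treatment group."""
    totals, counts = {}, {}
    for unit, treatment in assignment.items():
        totals[treatment] = totals.get(treatment, 0.0) + responses[unit]
        counts[treatment] = counts.get(treatment, 0) + 1
    return {t: totals[t] / counts[t] for t in totals}


# Hypothetical data: nine assault cases, the three police actions from the
# text, and a made-up response (months until recurrence of assault).
units = [f"case{i}" for i in range(1, 10)]
assignment = assign_treatments(units, ["warn", "counsel", "arrest"], seed=1)
responses = dict(zip(units, [3.0, 8.0, 5.0, 12.0, 7.0, 9.0, 4.0, 11.0, 6.0]))
print(group_means(assignment, responses))
```

With nine units and three treatments, each action lands on three randomly chosen cases, so a difference in group means reflects the treatments rather than how the cases happened to be grouped.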


Experiments help us answer questions, but there are also nonexperimental techniques. What is so special about experiments? Consider that:
1. Experiments allow us to set up a direct comparison between the treatments of interest.
2. We can design experiments to minimize any bias in the comparison.
3. We can design experiments so that the error in the comparison is small.
4. Most important, we are in control of experiments, and having that control allows us to make stronger inferences about the nature of differences that we see in the experiment. Specifically, we may make inferences about causation.


This last point distinguishes an experiment from an observational study. An
observational study also has treatments, units, and responses. However, in
the observational study we merely observe which units are in which treatment
groups; we don’t get to control that assignment.


Example 1.1: Does spanking hurt?
Let’s contrast an experiment with an observational study described in Straus,
Sugarman, and Giles-Sims (1997). A large survey of women aged 14 to 21
years was begun in 1979; by 1988 these same women had 1239 children
between the ages of 6 and 9 years. The women and children were interviewed and tested in 1988 and again in 1990. Two of the items measured
were the level of antisocial behavior in the children and the frequency of
spanking. Results showed that children who were spanked more frequently
in 1988 showed larger increases in antisocial behavior in 1990 than those who
were spanked less frequently. Does spanking cause antisocial behavior? Perhaps it does, but there are other possible explanations. Perhaps children who
were becoming more troublesome in 1988 may have been spanked more frequently, while children who were becoming less troublesome may have been
spanked less frequently in 1988.
The drawback of observational studies is that the grouping into “treatments” is not under the control of the experimenter and its mechanism is
usually unknown. Thus observed differences in responses between treatment
groups could very well be due to these other hidden mechanisms, rather than
the treatments themselves.
It is important to say that while experiments have some advantages, observational studies are also useful and can produce important results. For example, studies of smoking and human health are observational, but the link
that they have established is one of the most important public health issues
today. Similarly, observational studies established an association between
heart valve disease and the diet drug fen-phen that led to the withdrawal
of the drugs fenfluramine and dexfenfluramine from the market (Connolly
et al. 1997 and US FDA 1997).
Mosteller and Tukey (1977) list three concepts associated with causation
and state that two or three are needed to support a causal relationship:

• Consistency
• Responsiveness
• Mechanism.

Consistency means that, all other things being equal, the relationship between two variables is consistent across populations in direction and maybe
in amount. Responsiveness means that we can go into a system, change the
causal variable, and watch the response variable change accordingly. Mechanism means that we have a step-by-step mechanism leading from cause to
effect.
In an experiment, we are in control, so we can achieve responsiveness.
Thus, if we see a consistent difference in observed response between the
various treatments, we can infer that the treatments caused the differences
in response. We don’t need to know the mechanism—we can demonstrate
causation by experiment. (This is not to say that we shouldn’t try to learn
mechanisms—we should. It’s just that we don’t need mechanism to infer
causation.)
We should note that there are times when experiments are not feasible,
even when the knowledge gained would be extremely valuable. For example,
we can’t perform an experiment proving once and for all that smoking causes
cancer in humans. We can observe that smoking is associated with cancer in
humans; we have mechanisms for this and can thus infer causation. But we
cannot demonstrate responsiveness, since that would involve making some
people smoke, and making others not smoke. It is simply unethical.

1.2 Components of an Experiment
An experiment has treatments, experimental units, responses, and a method
to assign treatments to units.
Treatments, units, and assignment method specify the experimental design.
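As a concrete (and entirely hypothetical) illustration of an assignment method, the sketch below makes a completely randomized assignment of three treatments to twelve units; randomization itself, and why it matters, is the subject of Chapter 2. The unit and treatment labels are made up for the example.

```python
import random

def randomize(units, treatments, seed=None):
    """Completely randomized assignment: shuffle the units, then deal
    them out to the treatments in equal-sized groups.  Assumes the
    number of units is a multiple of the number of treatments."""
    rng = random.Random(seed)
    units = list(units)
    rng.shuffle(units)
    n = len(units) // len(treatments)  # units per treatment
    return {trt: units[i * n:(i + 1) * n]
            for i, trt in enumerate(treatments)}

# Twelve units, three treatments, four units per treatment.
assignment = randomize(range(1, 13), ["A", "B", "C"], seed=1)
```

Every unit ends up in exactly one treatment group, and which group it lands in is determined by the random shuffle, not by any characteristic of the unit.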

Some authors make a distinction between the selection of treatments to be
used, called “treatment design,” and the selection of units and assignment of
treatments, called “experiment design.”
Note that there is no mention of a method for analyzing the results.

Strictly speaking, the analysis is not part of the design, though a wise experimenter will consider the analysis when planning an experiment. Whereas
the design determines the proper analysis to a great extent, we will see that
two experiments with similar designs may be analyzed differently, and two
experiments with different designs may be analyzed similarly. Proper analysis depends on the design and the kinds of statistical model assumptions we
believe are correct and are willing to assume.
Not all experimental designs are created equal. A good experimental
design must
• Avoid systematic error
• Be precise
• Allow estimation of error
• Have broad validity.

We consider these in turn.



Comparative experiments estimate differences in response between treatments. If our experiment has systematic error, then our comparisons will be
biased, no matter how precise our measurements are or how many experimental units we use. For example, if responses for units receiving treatment
one are measured with instrument A, and responses for treatment two are
measured with instrument B, then we don’t know if any observed differences
are due to treatment effects or instrument miscalibrations. Randomization, as
will be discussed in Chapter 2, is our main tool to combat systematic error.
Even without systematic error, there will be random error in the responses,
and this will lead to random error in the treatment comparisons. Experiments
are precise when this random error in treatment comparisons is small. Precision depends on the size of the random errors in the responses, the number of
units used, and the experimental design used. Several chapters of this book
deal with designs to improve precision.
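To see how precision depends on the number of units, consider the standard error of a difference of two treatment means. Under the usual assumption of independent errors with common standard deviation σ and n units per treatment, that standard error is σ√(2/n); the short sketch below (with a hypothetical σ) shows it halving each time the number of units is quadrupled.

```python
# Standard error of the difference of two treatment means when errors
# are independent with common standard deviation sigma and there are
# n units in each treatment group: se = sigma * sqrt(2 / n).
sigma = 1.0  # hypothetical error standard deviation
se = {n: sigma * (2 / n) ** 0.5 for n in (4, 16, 64)}
# Quadrupling the number of units per treatment halves the standard
# error of the comparison.
```

The same calculation also shows why precision is expensive: each halving of the random error in a comparison costs four times as many units.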
Experiments must be designed so that we have an estimate of the size
of random error. This permits statistical inference: for example, confidence
intervals or tests of significance. We cannot do inference without an estimate
of error. Sadly, experiments that cannot estimate error continue to be run.
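As a hedged illustration (the response values below are made up), here is how an estimate of error turns into a confidence interval for a treatment difference: the pooled variance s² estimates the size of the random error, and without it no interval could be computed.

```python
from statistics import mean, variance

# Hypothetical responses for two treatments, five units each.
y1 = [12.1, 11.4, 12.8, 11.9, 12.3]
y2 = [10.2, 10.9, 10.5, 11.1, 10.4]

diff = mean(y1) - mean(y2)  # observed treatment difference

# Pooled estimate of the error variance (assumes equal variances).
n1, n2 = len(y1), len(y2)
s2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
se = (s2 * (1 / n1 + 1 / n2)) ** 0.5  # standard error of diff

t_crit = 2.306  # t table value, 0.975 quantile, n1 + n2 - 2 = 8 df
ci = (diff - t_crit * se, diff + t_crit * se)
# Here the interval lies entirely above zero, so the difference between
# the two treatments is distinguishable from random error.
```

If the design gave us no estimate of s², we could still compute diff, but we would have no way to say whether it reflects a treatment effect or mere noise.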
The conclusions we draw from an experiment are applicable to the experimental units we used in the experiment. If the units are actually a statistical
sample from some population of units, then the conclusions are also valid
for the population. Beyond this, we are extrapolating, and the extrapolation
might or might not be successful. For example, suppose we compare two
different drugs for treating attention deficit disorder. Our subjects are preadolescent boys from our clinic. We might have a fair case that our results
would hold for preadolescent boys elsewhere, but even that might not be true
if our clinic’s population of subjects is unusual in some way. The results are
even less compelling for older boys or for girls. Thus if we wish to have
wide validity—for example, broad age range and both genders—then our experimental units should reflect the population about which we wish to draw
inference.
We need to realize that some compromise will probably be needed between these goals. For example, broadening the scope of validity by using a
variety of experimental units may decrease the precision of the responses.

1.3 Terms and Concepts
Let’s define some of the important terms and concepts in design of experiments. We have already seen the terms treatment, experimental unit, and
response, but we define them again here for completeness.
