Analysis of messy data volume III analysis of covariance


Analysis
of Messy Data
VOLUME III:
ANALYSIS OF COVARIANCE

George A. Milliken
Dallas E. Johnson

CHAPMAN & HALL/CRC
A CRC Press Company
Boca Raton London New York Washington, D.C.


Library of Congress Cataloging-in-Publication Data
Milliken, George A., 1943–
Analysis of messy data / George A. Milliken, Dallas E. Johnson.
2 v. : ill. ; 24 cm.
Includes bibliographies and indexes.
Contents: v. 1. Designed experiments -- v. 2. Nonreplicated
experiments.
Vol. 2 has imprint: New York : Van Nostrand Reinhold.
ISBN 0-534-02713-X (v. 1) : $44.00 -- ISBN 0-442-24408-8 (v. 2)
1. Analysis of variance. 2. Experimental design. 3. Sampling
(Statistics) I. Johnson, Dallas E., 1938– . II. Title.
QA279 .M48 1984
519.5′352--dc19

84-000839

This book contains information obtained from authentic and highly regarded sources. Reprinted material
is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable
efforts have been made to publish reliable data and information, but the author and the publisher cannot
assume responsibility for the validity of all materials or for the consequences of their use.
Apart from any fair dealing for the purpose of research or private study, or criticism or review, as permitted
under the UK Copyright Designs and Patents Act, 1988, this publication may not be reproduced, stored
or transmitted, in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without the prior permission
in writing of the publishers, or in the case of reprographic reproduction only in accordance with the
terms of the licenses issued by the Copyright Licensing Agency in the UK, or in accordance with the
terms of the license issued by the appropriate Reproduction Rights Organization outside the UK.
The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for
creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC
for such copying.
Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com
© 2002 by Chapman & Hall/CRC
No claim to original U.S. Government works
International Standard Book Number 1-584-88083-X
Library of Congress Card Number 84-000839
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper


Table of Contents
Chapter 1 Introduction to the Analysis of Covariance
1.1 Introduction
1.2 The Covariate Adjustment Process
1.3 A General AOC Model and the Basic Philosophy
References
Chapter 2 One-Way Analysis of Covariance — One Covariate in a Completely Randomized Design Structure
2.1 The Model
2.2 Estimation
2.3 Strategy for Determining the Form of the Model
2.4 Comparing the Treatments or Regression Lines
2.4.1 Equal Slopes Model
2.4.2 Unequal Slopes Model-Covariate by Treatment Interaction
2.5 Confidence Bands about the Difference of Two Treatments
2.6 Summary of Strategies
2.7 Analysis of Covariance Computations via the SAS® System
2.7.1 Using PROC GLM and PROC MIXED
2.7.2 Using JMP®
2.8 Conclusions
References
Exercise
Chapter 3 Examples: One-Way Analysis of Covariance — One Covariate in a Completely Randomized Design Structure
3.1 Introduction
3.2 Chocolate Candy — Equal Slopes
3.2.1 Analysis Using PROC GLM
3.2.2 Analysis Using PROC MIXED
3.2.3 Analysis Using JMP®
3.3 Exercise Programs and Initial Resting Heart Rate — Unequal Slopes
3.4 Effect of Diet on Cholesterol Level: An Exception to the Basic Analysis of Covariance Strategy
3.5 Change from Base Line Analysis Using Effect of Diet on Cholesterol Level Data
3.6 Shoe Tread Design Data for Exception to the Basic Strategy

© 2002 by CRC Press LLC


3.7 Equal Slopes within Groups of Treatments and Unequal Slopes between Groups
3.8 Unequal Slopes and Equal Intercepts — Part 1
3.9 Unequal Slopes and Equal Intercepts — Part 2
References
Exercises
Chapter 4 Multiple Covariates in a One-Way Treatment Structure in a Completely Randomized Design Structure
4.1 Introduction
4.2 The Model
4.3 Estimation
4.4 Example: Driving A Golf Ball with Different Shafts
4.5 Example: Effect of Herbicides on the Yield of Soybeans — Three Covariates
4.6 Example: Models That Are Quadratic Functions of the Covariate
4.7 Example: Comparing Response Surface Models
Reference
Exercises
Chapter 5 Two-Way Treatment Structure and Analysis of Covariance in a Completely Randomized Design Structure
5.1 Introduction
5.2 The Model
5.3 Using the SAS® System
5.3.1 Using PROC GLM and PROC MIXED
5.3.2 Using JMP®
5.4 Example: Average Daily Gains and Birth Weight — Common Slope
5.5 Example: Energy from Wood of Different Types of Trees — Some Unequal Slopes
5.6 Missing Treatment Combinations
5.7 Example: Two-Way Treatment Structure with Missing Cells
5.8 Extensions
Reference
Exercises
Chapter 6 Beta-Hat Models
6.1 Introduction
6.2 The Beta-Hat Model and Analysis
6.3 Testing Equality of Parameters
6.4 Complex Treatment Structures
6.5 Example: One-Way Treatment Structure
6.6 Example: Two-Way Treatment Structure
6.7 Summary
Exercises

Chapter 7 Variable Selection in the Analysis of Covariance Model
7.1 Introduction
7.2 Procedure for Equal Slopes
7.3 Example: One-Way Treatment Structure with Equal Slopes Model
7.4 Some Theory
7.5 When Slopes are Possibly Unequal
References
Exercises
Chapter 8 Comparing Models for Several Treatments
8.1 Introduction
8.2 Testing Equality of Models for a One-Way Treatment Structure
8.3 Comparing Models for a Two-Way Treatment Structure
8.4 Example: One-Way Treatment Structure with One Covariate
8.5 Example: One-Way Treatment Structure with Three Covariates
8.6 Example: Two-Way Treatment Structure with One Covariate
8.7 Discussion
References
Exercises
Chapter 9 Two Treatments in a Randomized Complete Block Design Structure
9.1 Introduction
9.2 Complete Block Designs
9.3 Within Block Analysis
9.4 Between Block Analysis
9.5 Combining Within Block and Between Block Information
9.6 Determining the Form of the Model
9.7 Common Slope Model
9.8 Comparing the Treatments
9.8.1 Equal Slopes Models
9.8.2 Unequal Slopes Model
9.9 Confidence Intervals about Differences of Two Regression Lines
9.9.1 Within Block Analysis
9.9.2 Combined Within Block and Between Block Analysis
9.10 Computations for Model 9.1 Using the SAS® System
9.11 Example: Effect of Drugs on Heart Rate
9.12 Summary
References
Exercises


Chapter 10 More Than Two Treatments in a Blocked Design Structure
10.1 Introduction
10.2 RCB Design Structure — Within and Between Block Information
10.3 Incomplete Block Design Structure — Within and Between Block Information
10.4 Combining Between Block and Within Block Information
10.5 Example: Five Treatments in RCB Design Structure
10.6 Example: Balanced Incomplete Block Design Structure with Four Treatments
10.7 Example: Balanced Incomplete Block Design Structure with Four Treatments Using JMP®
10.8 Summary
References
Exercises
Chapter 11 Covariate Measured on the Block in RCB and Incomplete Block Design Structures
11.1 Introduction
11.2 The Within Block Model
11.3 The Between Block Model
11.4 Combining Within Block and Between Block Information
11.5 Common Slope Model
11.6 Adjusted Means and Comparing Treatments
11.6.1 Common Slope Model
11.6.2 Non-Parallel Lines Model
11.7 Example: Two Treatments
11.8 Example: Four Treatments in RCB
11.9 Example: Four Treatments in BIB
11.10 Summary
References
Exercises
Chapter 12 Random Effects Models with Covariates
12.1 Introduction
12.2 The Model
12.3 Estimation of the Variance Components
12.4 Changing Location of the Covariate Changes the Estimates of the Variance Components
12.5 Example: Balanced One-Way Treatment Structure
12.6 Example: Unbalanced One-Way Treatment Structure
12.7 Example: Two-Way Treatment Structure
12.8 Summary
References
Exercises


Chapter 13 Mixed Models
13.1 Introduction
13.2 The Matrix Form of the Mixed Model
13.3 Fixed Effects Treatment Structure
13.4 Estimation of Fixed Effects and Some Small Sample Size Approximations
13.5 Fixed Treatments and Locations Random
13.6 Example: Two-Way Mixed Effects Treatment Structure in a CRD
13.7 Example: Treatments are Fixed and Locations are Random with a RCB at Each Location
References
Exercises
Chapter 14 Analysis of Covariance Models with Heterogeneous Errors
14.1 Introduction
14.2 The Unequal Variance Model
14.3 Tests for Homogeneity of Variances
14.3.1 Levene’s Test for Equal Variances
14.3.2 Hartley’s F-Max Test for Equal Variances
14.3.3 Bartlett’s Test for Equal Variances
14.3.4 Likelihood Ratio Test for Equal Variances
14.4 Estimating the Parameters of the Regression Model
14.4.1 Least Squares Estimation
14.4.2 Maximum Likelihood Methods
14.5 Determining the Form of the Model
14.6 Comparing the Models
14.6.1 Comparing the Nonparallel Lines Models
14.6.2 Comparing the Parallel Lines Models
14.7 Computational Issues
14.8 Example: One-Way Treatment Structure with Unequal Variances
14.9 Example: Two-Way Treatment Structure with Unequal Variances
14.10 Example: Treatments in Multi-location Trial
14.11 Summary
References
Exercises
Chapter 15 Analysis of Covariance for Split-Plot and Strip-Plot Design Structures
15.1 Introduction
15.2 Some Concepts
15.3 Covariate Measured on the Whole Plot or Large Size of Experimental Unit
15.4 Covariate is Measured on the Small Size of Experimental Unit
15.5 Covariate is Measured on the Large Size of Experimental Unit and a Covariate is Measured on the Small Size of Experimental Unit
15.6 General Representation of the Covariate Part of the Model
15.6.1 Covariate Measured on Large Size of Experimental Unit
15.6.2 Covariate Measured on the Small Size of Experimental Units
15.6.3 Summary of General Representation
15.7 Example: Flour Milling Experiment — Covariate Measured on the Whole Plot
15.8 Example: Cookie Baking
15.9 Example: Teaching Methods with One Covariate Measured on the Large Size Experimental Unit and One Covariate Measured on the Small Size Experimental Unit
15.10 Example: Comfort Study in a Strip-Plot Design with Three Sizes of Experimental Units and Three Covariates
15.11 Conclusions
References
Exercises
Chapter 16 Analysis of Covariance for Repeated Measures Designs
16.1 Introduction
16.2 The Covariance Part of the Model — Selecting R
16.3 Covariance Structure of the Data
16.4 Specifying the Random and Repeated Statements for PROC MIXED of the SAS® System
16.5 Selecting an Adequate Covariance Structure
16.6 Example: Systolic Blood Pressure Study with Covariate Measured on the Large Size Experimental Unit
16.7 Example: Oxide Layer Development Experiment with Three Sizes of Experimental Units Where the Repeated Measure is at the Middle Size of Experimental Unit and the Covariate is Measured on the Small Size Experimental Unit
16.8 Conclusions
References
Exercises

Chapter 17 Analysis of Covariance for Nonreplicated Experiments
17.1 Introduction
17.2 Experiments with A Single Covariate
17.3 Experiments with Multiple Covariates
17.4 Selecting Non-null and Null Partitions
17.5 Estimating the Parameters
17.6 Example: Milling Flour Using Three Factors Each at Two Levels
17.7 Example: Baking Bread Using Four Factors Each at Two Levels
17.8 Example: Hamburger Patties with Four Factors Each at Two Levels
17.9 Example: Strength of Composite Material Coupons with Two Covariates
17.10 Example: Effectiveness of Paint on Bricks with Unequal Slopes
17.11 Summary
References
Exercises
Chapter 18 Special Applications of Analysis of Covariance
18.1 Introduction
18.2 Blocking and Analysis of Covariance
18.3 Treatments Have Different Ranges of the Covariate
18.4 Nonparametric Analysis of Covariance
18.4.1 Heart Rate Data from Exercise Programs
18.4.2 Average Daily Gain Data from a Two-Way Treatment Structure
18.5 Crossover Design with Covariates
18.6 Nonlinear Analysis of Covariance
18.7 Effect of Outliers
References
Exercises


Preface
Analysis of covariance is a statistical procedure that enables one to incorporate
information about concomitant variables into the analysis of a response variable.
Sometimes this is done in an attempt to reduce experimental error. Other times it is
done to better understand the phenomenon being studied. The approach used in this
book is that the analysis of covariance model is described as a method of comparing
a series of regression models — one for each of the levels of a factor or combinations
of levels of factors being studied. Since covariance models are regression models,
analysts can use all of the methods of regression analysis to deal with problems
such as lack of fit, outliers, etc. The strategies described in this book will enable the
reader to appropriately formulate and analyze various kinds of covariance models.
When covariates are measured and incorporated into the analysis of a response
variable, the main objective of analysis of covariance is to compare treatments or
treatment combinations at common values of the covariates. This is particularly true
when the experimental units assigned to each of the treatment combinations may
have differing values of the covariates. Comparing treatments is dependent on the
form of the covariance model and thus care must be taken so that mistakes are not
made when drawing conclusions.
The goal of this book is to present the structure and philosophy for using the
analysis of covariance by including descriptions of methodologies, illustrating the
methodologies by analyzing numerous data sets, and occasionally furnishing some
theory when required. Our aim is to provide data analysts with tools for analyzing
data with covariates and to enable them to appropriately interpret the results.
Some of the methods and techniques described in this book are not available in
other books, but two issues of Biometrics (1957, Volume 13, Number 3, and 1982,
Volume 38, Number 3) were dedicated to the topic of analysis of covariance. The
topics presented are among those that we, as consulting statisticians, have found to
be most helpful in analyzing data when covariates are available for possible inclusion
in the analysis.
Readers of this book will learn how to:
• Formulate appropriate analysis of covariance models
• Simplify analysis of covariance models
• Compare levels of a factor or of levels of combinations of factors when
the model involves covariates

• Construct and analyze a model with two or more factors in the treatment
structure
• Analyze two-way treatment structures with missing cells
• Compare models using the beta-hat model
• Perform variable selection within the analysis of covariance model

• Analyze models with blocking in the design structure and use combined
intra-block and inter-block information about the slopes of the regression
models
• Use random statements in PROC MIXED to specify random coefficient
regression models
• Carry out the analysis of covariance in a mixed model framework
• Incorporate unequal treatment variances into the analysis
• Specify the analysis of covariance models for split-plot, strip-plot and
repeated measures designs both in terms of the regression models and the
covariance structures of the repeated measures
• Incorporate covariates into the analysis of nonreplicated experiments, thus
extending some of the results in Analysis of Messy Data, Volume II
The last chapter consists of a collection of examples that deal with (1) using the
covariate to form blocks, (2) crossover designs, (3) nonparametric analysis of covariance, (4) using a nonlinear model for the covariate model, and (5) the process of
examining mixed analysis of covariance models for possible outliers.
The approach used in this book is similar to that used in the first two volumes.
Each topic is covered from a practical viewpoint, emphasizing the implementation
of the methods much more than the theory behind the methods. Some theory has
been presented for some of the newer methodologies. The book uses the procedures of the SAS® system and JMP® software packages to carry out the computations,
and few computing formulae are presented. Either SAS® system code or JMP® menus
are presented for the analysis of the data sets in the examples. The data in the
examples (except for those using chocolate chips) were generated to simulate real
world applications that we have encountered in our consulting experiences.
This book is intended for everyone who analyzes data. The reader should have
a knowledge of analysis of variance and regression analysis as well as basic statistical
ideas including randomization, confidence intervals, and hypothesis testing. The first
four chapters contain the information needed to form a basic philosophy for using
the analysis of covariance with a one-way treatment structure and should be read
by everyone. As one progresses through the book, the topics become more complex
by going from designs with blocking to split-plot and repeated measures designs.
Before reading about a particular topic in the later chapters, read the first four
chapters. Knowledge of Chapters 13 and 14 from Analysis of Messy Data, Volume I:
Designed Experiments would be useful for understanding the part of Chapter 5
involving missing cells. The information in Chapters 4 through 9 of Analysis of
Messy Data, Volume II: Nonreplicated Experiments is useful for comprehending the
topics discussed in Chapter 17.
This book is the culmination of more than 25 years of writing. The earlier
editions of this manuscript were slanted toward providing an appropriate analysis
of split-plot type designs by using fixed effects software such as PROC GLM of the
SAS® system. With the development of mixed models software, such as PROC
MIXED of the SAS® system and JMP®, the complications of the analysis of split-plot
type designs disappeared and thus enabled the manuscript to be completed
without including the difficult computations that are required when using fixed
effects software. Over the years, several colleagues made important contributions.
Discussions with Shie-Shien Yang were invaluable for the development of the variable
selection process described in Chapter 7. Vicki Landcaster and Marie Loughin
read some of the earlier versions and provided important feedback. Discussions with
James Schwenke, Kate Ash, Brian Fergen, Kevin Chartier, Veronica Taylor, and
Mike Butine were important for improving the chapters involving combining intra- and
inter-block information and the strategy for the analysis of repeated measures
designs. Finally, we cannot express enough our thanks to Jane Cox who typed many
of the initial versions of the chapters. If it were not for Jane’s skills with the word
processor, the task of finishing this book would have been much more difficult.
We dedicate this volume to all who have made important contributions to our
personal and professional lives. This includes our wives, Janet and Erma Jean, our
children, Scott and April and Kelly and Mark, and our parents and parents-in-law
who made it possible for us to pursue our careers as statisticians. We were both
fortunate to study with Franklin Graybill and we thank him for making sure that we
were headed in the right direction when our careers began.


Chapter 1 Introduction to the Analysis of Covariance

1.1 INTRODUCTION
The statistical procedure termed analysis of covariance has been used in several
contexts. The most common description of analysis of covariance is to adjust the
analysis for variables that could not be controlled by the experimenter. For example,
if a researcher wishes to compare the effect that ten different chemical weed control
treatments have on yield of a specific wheat variety, the researcher may wish to
control for the differential effects of a fertility trend occurring in the field and for
the number of wheat plants per plot that happen to emerge after planting. The
differential effects of a fertility trend can possibly be removed by using a randomized
complete block design structure, but it may not be possible to control the number
of wheat plants per plot (unless the seeds are sown thickly and then the emerging
plants are thinned to a given number of plants per plot). The researcher wishes to
compare the treatments as if each treatment were grown on plots with the same
average fertility level and as if every plot had the same number of wheat plants. The
use of a randomized complete block design structure in which the blocks are constructed such that the fertility levels of plots within a block are very similar will
enable the treatments to be compared by averaging over the fertility levels, but the
analysis of covariance is a procedure which can compare treatment means after first
adjusting for the differential number of wheat plants per plot. The adjustment
procedure involves constructing a model that describes the relationship between
yield and the number of wheat plants per plot for each treatment, which is in the
form of a regression model. The regression models, one for each level of the
treatment, are then compared at a predetermined common number of wheat plants
per plot.

1.2 THE COVARIATE ADJUSTMENT PROCESS
To demonstrate the type of adjustment process that is being carried out when the
analysis of covariance methodology is applied, the set of data in Table 1.1 is used
in which there are two treatments and five plots per treatment in a completely
randomized design structure. Treatment 1 is a chemical application to control the
growth of weeds and Treatment 2 is a control without any chemicals to control the
weeds. The data in Table 1.1 consist of the yield of wheat plants of a specific variety
from plots of identical size along with the number of wheat plants that emerged
after planting per plot. The researcher wants to compare the yields of the two
treatments for the condition when there are 125 plants per plot.

TABLE 1.1
Yield and Plants per Plot Data for the Example in Section 1.2

          Treatment 1                          Treatment 2
Yield per plot   Plants per plot     Yield per plot   Plants per plot
      951              126                 930              135
      957              128                 790              119
      776              107                 764              110
     1033              142                 989              140
      840              120                 740              102

FIGURE 1.1 Plot of the data for the two treatments, with the “X” denoting the respective
means. (Horizontal axis: treatment number; vertical axis: yield per plot.)

Figure 1.1 is a graphical display of the plot yields for each of the treatments
where the circles represent the data points for Treatment 1 and the boxes represent
the data points for Treatment 2. An “X” is used to mark the means of each of the
treatments.
If the researcher uses the two-sample t-test or one-way analysis of variance to
compare the two treatments without taking into account information about the
number of plants per plot, a t statistic of 1.02 or an F statistic of 1.05 is obtained,
indicating the two treatment means are not significantly different (p = 0.3361). The
results of the analysis are in Table 1.2, in which the estimated standard error of the
difference of the two treatment means is 67.23.

TABLE 1.2
Analysis of Variance Table and Means for Comparing the Yields of the Two Treatments
Where No Information about the Number of Plants per Plot is Used

Source             df          SS          MS    FValue     ProbF
Model               1    11833.60    11833.60      1.05    0.3361
Error               8    90408.40    11301.05
Corrected total     9   102242.00

Source             df   SS (type III)      MS    FValue     ProbF
TRT                 1    11833.60    11833.60      1.05    0.3361

Parameter        Estimate    StdErr   t Value     Probt
Trt 1 – Trt 2        68.8     67.23      1.02    0.3361

TRT     LSMean   ProbtDiff
1       911.40      0.3361
2       842.60

FIGURE 1.2 Plot of the data and the estimated regression models for the two treatments.
(Horizontal axis: number of plants per plot; vertical axis: yield per plot.)
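
The unadjusted comparison reported in Table 1.2 can be checked by hand. The following is a minimal sketch in Python, not the SAS code used in the book; it assumes a pooled-variance two-sample t-test on the yields in Table 1.1, and the function and variable names (`pooled_t`, `yield_trt1`, `yield_trt2`) are illustrative only.

```python
import math

# Yields per plot from Table 1.1.
yield_trt1 = [951, 957, 776, 1033, 840]
yield_trt2 = [930, 790, 764, 989, 740]


def mean(values):
    return sum(values) / len(values)


def pooled_t(sample_a, sample_b):
    """Pooled-variance two-sample t-test.

    Returns (difference of means, standard error, t statistic, error df).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = mean(sample_a), mean(sample_b)
    # Error sum of squares: deviations of each yield from its own treatment mean.
    ss_error = (sum((y - mean_a) ** 2 for y in sample_a)
                + sum((y - mean_b) ** 2 for y in sample_b))
    df_error = n_a + n_b - 2
    ms_error = ss_error / df_error
    std_err = math.sqrt(ms_error * (1 / n_a + 1 / n_b))
    diff = mean_a - mean_b
    return diff, std_err, diff / std_err, df_error


diff, std_err, t_stat, df_error = pooled_t(yield_trt1, yield_trt2)
f_stat = t_stat ** 2  # with two treatments, the one-way ANOVA F equals t squared

print(f"difference = {diff:.1f}, std err = {std_err:.2f}, "
      f"t = {t_stat:.2f}, F = {f_stat:.2f}, error df = {df_error}")
```

The printed difference, standard error, t, and F values should agree with the corresponding entries of Table 1.2 up to rounding.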

The next step is to investigate the relationship between the yield per plot and
the number of plants per plot. Figure 1.2 is a display of the data where the number
of plants is on the horizontal axis and the yield is on the vertical axis. The circles
denote the data for Treatment 1 and the boxes denote the data for Treatment 2. The
two lines on the graph, denoted by Treatment 1 model and Treatment 2 model, were
computed from the data by fitting the model $y_{ij} = \alpha_i + \beta x_{ij} + \varepsilon_{ij}$, $i = 1, 2$ and $j = 1, 2, \ldots, 5$,
a model with different intercepts and common or equal slopes. The results
are included in Table 1.3.

TABLE 1.3
Analysis of Covariance to Provide the Estimates of the Slope and Intercepts
to be Used in Adjusting the Data

Source           df            SS            MS     FValue     ProbF
Model             3    7787794.74    2595931.58    3167.28    0.0000
Error             7       5737.26        819.61
Uncorr Total     10    7793532.00

Source           df   SS(Type III)           MS     FValue     ProbF
TRT               2       4964.18       2482.09       3.03    0.1128
Plants            1      84671.14      84671.14     103.31    0.0000

Parameter        Estimate    StdErr    tValue     Probt
Trt 1 – Trt 2       44.73     18.26      2.45    0.0441

Parameter        Estimate    StdErr    tValue     Probt
TRT 1              29.453    87.711      0.34    0.7469
TRT 2             –15.281    85.369     –0.18    0.8630
Plants              7.078     0.696     10.16    0.0000
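
As a rough check on Table 1.3, the equal-slopes model can be fitted directly from the data in Table 1.1. The sketch below is an assumption-laden illustration rather than the book's SAS analysis: it uses the standard least squares result that the common slope is the pooled within-treatment cross-product divided by the pooled within-treatment sum of squares of the covariate, and all names (`plants`, `yields`, `se_diff`, and so on) are hypothetical.

```python
import math

# Data from Table 1.1, keyed by treatment number.
plants = {1: [126, 128, 107, 142, 120], 2: [135, 119, 110, 140, 102]}
yields = {1: [951, 957, 776, 1033, 840], 2: [930, 790, 764, 989, 740]}


def mean(values):
    return sum(values) / len(values)


# Pooled within-treatment sum of squares of x and cross-products of x and y.
sxx = 0.0
sxy = 0.0
for trt in plants:
    x_bar, y_bar = mean(plants[trt]), mean(yields[trt])
    sxx += sum((x - x_bar) ** 2 for x in plants[trt])
    sxy += sum((x - x_bar) * (y - y_bar)
               for x, y in zip(plants[trt], yields[trt]))

slope = sxy / sxx  # common slope estimate
intercepts = {trt: mean(yields[trt]) - slope * mean(plants[trt])
              for trt in plants}

# Residual sum of squares; three parameters (two intercepts, one slope).
n_total = sum(len(v) for v in yields.values())
ss_error = sum((y - intercepts[trt] - slope * x) ** 2
               for trt in plants
               for x, y in zip(plants[trt], yields[trt]))
df_error = n_total - 3
ms_error = ss_error / df_error

# With a common slope, the treatment difference is the same at every x.
diff = intercepts[1] - intercepts[2]
x_bar_diff = mean(plants[1]) - mean(plants[2])
se_diff = math.sqrt(ms_error * (1 / len(plants[1]) + 1 / len(plants[2])
                                + x_bar_diff ** 2 / sxx))
t_stat = diff / se_diff

print(f"slope = {slope:.3f}, intercepts = "
      f"{intercepts[1]:.3f}, {intercepts[2]:.3f}")
print(f"SSE = {ss_error:.2f} on {df_error} df, "
      f"difference = {diff:.2f}, std err = {se_diff:.2f}, t = {t_stat:.2f}")
```

The extra term `x_bar_diff ** 2 / sxx` in the standard error reflects the uncertainty in the estimated slope; it is the reason the analysis of covariance standard error is slightly larger than one computed from adjusted data alone.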
Now analysis of covariance is used to compare the two treatments when there
are 125 plants per plot. The process of the analysis of covariance is to slide or move
the observations from a given treatment along the estimated regression model (parallel to the model) to intersect the vertical line at 125 plants per plot. This sliding
is demonstrated in Figure 1.3 where the solid circles represent the adjusted data for
Treatment 1 and the solid boxes represent the adjusted data for Treatment 2.
The lines join the open circles to the solid circles and join the open boxes to
the solid boxes. The lines indicate that the respective data points slid to the vertical
line at which there are 125 plants per plot.
The adjusted data are computed by

$$y_{Aij} = y_{ij} - (\hat{\alpha}_i + \hat{\beta} x_{ij}) + (\hat{\alpha}_i + \hat{\beta} \cdot 125) = y_{ij} + \hat{\beta}(125 - x_{ij}).$$

The terms $y_{ij} - (\hat{\alpha}_i + \hat{\beta} x_{ij})$, $i = 1, 2$ and $j = 1, 2, \ldots, 5$, are the residuals or deviations of
the observations from the estimated regression models. The preliminary computations of the adjusted yields are in Table 1.4. These adjusted yields are the predicted
yields of the plots as if each plot had 125 plants.
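
The sliding adjustment can be sketched in a few lines of Python. This is an illustrative computation under the equal-slopes model, not the book's code: the slope is re-estimated from the Table 1.1 data so the block is self-contained, and the names `TARGET_PLANTS`, `pooled_slope`, and `adjusted` are hypothetical.

```python
TARGET_PLANTS = 125  # common covariate value at which treatments are compared

# Data from Table 1.1, keyed by treatment number.
plants = {1: [126, 128, 107, 142, 120], 2: [135, 119, 110, 140, 102]}
yields = {1: [951, 957, 776, 1033, 840], 2: [930, 790, 764, 989, 740]}


def mean(values):
    return sum(values) / len(values)


def pooled_slope(x_by_trt, y_by_trt):
    """Common slope: pooled within-treatment cross-products over pooled Sxx."""
    sxx = 0.0
    sxy = 0.0
    for trt in x_by_trt:
        x_bar, y_bar = mean(x_by_trt[trt]), mean(y_by_trt[trt])
        sxx += sum((x - x_bar) ** 2 for x in x_by_trt[trt])
        sxy += sum((x - x_bar) * (y - y_bar)
                   for x, y in zip(x_by_trt[trt], y_by_trt[trt]))
    return sxy / sxx


slope = pooled_slope(plants, yields)

# Slide each observation parallel to the fitted line to x = 125:
# adjusted yield = y + slope * (125 - x).
adjusted = {
    trt: [y + slope * (TARGET_PLANTS - x)
          for x, y in zip(plants[trt], yields[trt])]
    for trt in plants
}
adjusted_means = {trt: mean(values) for trt, values in adjusted.items()}

for trt in sorted(adjusted):
    formatted = ", ".join(f"{value:.3f}" for value in adjusted[trt])
    print(f"Treatment {trt}: {formatted} (mean {adjusted_means[trt]:.3f})")
```

Because the intercept cancels in the adjustment formula, only the slope estimate is needed; the adjusted values should match the last column of Table 1.4 up to rounding.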
The next step is to compare the two treatments through the adjusted yield values
by computing a two-sample t statistic or the F statistic from a one-way analysis of
variance. The results of these analyses are in Table 1.5.
FIGURE 1.3 Plot of the data and estimated regression models showing how to compute
adjusted yield values at 125 plants per plot. (Horizontal axis: number of plants per plot;
vertical axis: yield per plot. Observations are slid parallel to the regression line to meet
the line of 125 plants per plot.)

TABLE 1.4
Preliminary Computations Used in Computing Adjusted Data for Each Treatment
as If All Plots Had 125 Plants per Plot

Treatment   Yield Per Plot   Plants Per Plot    Residual   Adjusted Yield
    1             951              126           29.6905       943.922
    1             957              128           21.534        935.765
    1             776              107          –10.8232       903.408
    1            1033              142           –1.5611       912.67
    1             840              120          –38.8402       875.391
    2             930              135          –10.2795       859.218
    2             790              119          –37.0279       832.469
    2             764              110            0.6761       870.173
    2             989              140           13.3294       882.827
    2             740              102           33.3019       902.799

A problem with this analysis is that it treats the adjusted data as if they were unadjusted
observations, and so there is no reduction in the degrees of freedom for error due to estimating
the slope of the regression lines. Hence the final step is to recalculate the statistics
by changing the degrees of freedom for error in Table 1.5 from 8 to 7 (the cost of
estimating the slope). The sum of squares error is identical for both Tables 1.3 and
1.5, but the error sum of squares from Table 1.5 is based on 8 degrees of freedom
instead of 7. To account for this change in degrees of freedom in Table 1.5, the
estimated standard error for comparing the two treatments needs to be multiplied
by $\sqrt{8/7}$, the t statistic needs to be multiplied by $\sqrt{7/8}$, and the F statistic needs to
be multiplied by 7/8. The recalculated statistics are presented in Table 1.6. Here the
estimated standard error of the difference between the two means is 18.11, a 3.7-fold
reduction over the analysis that ignores the information from the covariate. Thus,

© 2002 by CRC Press LLC

C0317c01 frame Page 6 Sunday, June 24, 2001 1:46 PM

6

Analysis of Messy Data, Volume III: Analysis of Covariance

TABLE 1.5
Analysis of the Adjusted Yields (Too Many Degrees
of Freedom for Error)
Source
Model
Error
Corrected Total

df
1
8
9

SS
5002.83
5737.26
10740.09

MS
5002.83

717.16

FValue
6.98

ProbF
0.0297

Source
TRT

df
1

SS (Type III)
5002.83

MS
5002.83

FValue
6.98

ProbF
0.0297

Parameter
Trt 1 – Trt 2

Estimate

44.734

StdErr
16.937

t Value
2.641197

Probt
0.0297

TRT
1
2

LSMean
914.231
869.497

ProbtDiff
0.0297

TABLE 1.6
Recalculated Statistics to Reflect the Loss of
Error Degrees of Freedom Due to Estimating
the Slope before Computing the Adjusted Yields
Recalculated
Recalculated
Recalculated
Recalculated

estimated standard error
t-statistic
F-statistic
significance level

18.11
2.47
6.10
0.0428

by taking into account the linear relationship between the yield of the plot and the
number of plants in that plot, there is a tremendous reduction in the variability of
the data. In fact, the analysis of the adjusted data shows there is a significant
difference between the yields of the two treatments when adjusting for the unequal
number of plants per plot (p = 0.0428), whereas the analysis of variance in Table 1.2
did not indicate a significant difference between the treatments (p = 0.3361).
The final issue is that, because this analysis of the adjusted data overlooks the fact that the
slope has been estimated, the estimated standard error of the difference of two means
is slightly smaller than the estimated standard error obtained from the analysis
of covariance. The estimated standard error of the difference of the two means as
computed from the analysis of covariance in Table 1.3 is 18.26, compared to 18.11
for the analysis of the adjusted data. Thus the two analyses are not quite identical.
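
The degrees-of-freedom correction and the remaining gap between 18.11 and 18.26 can be illustrated numerically. The sketch below is a hedged illustration, not the book's procedure as coded in SAS: it starts from the error sum of squares and treatment difference reported in Table 1.5, applies the factors $\sqrt{8/7}$, $\sqrt{7/8}$, and 7/8, and then, as an assumption based on the usual equal-slopes analysis of covariance formula, adds the term for the estimated slope. All variable names are hypothetical.

```python
import math

ss_error = 5737.26   # error sum of squares (same in Tables 1.3 and 1.5)
diff = 44.734        # Trt 1 - Trt 2 difference of adjusted means (Table 1.5)
n_per_trt = 5
naive_df = 8         # error df used by the analysis of the adjusted yields
correct_df = 7       # error df after charging one df for the estimated slope

# Statistics as computed from the adjusted yields (Table 1.5).
naive_se = math.sqrt(ss_error / naive_df * (2 / n_per_trt))
naive_t = diff / naive_se
naive_f = naive_t ** 2

# Recalculated statistics (Table 1.6).
recalc_se = naive_se * math.sqrt(naive_df / correct_df)
recalc_t = naive_t * math.sqrt(correct_df / naive_df)
recalc_f = naive_f * correct_df / naive_df

# Full analysis of covariance standard error: adds the slope-uncertainty term
# (difference of covariate means) squared over the pooled within-treatment Sxx.
plants_trt1 = [126, 128, 107, 142, 120]
plants_trt2 = [135, 119, 110, 140, 102]
x_bar1 = sum(plants_trt1) / n_per_trt
x_bar2 = sum(plants_trt2) / n_per_trt
sxx = (sum((x - x_bar1) ** 2 for x in plants_trt1)
       + sum((x - x_bar2) ** 2 for x in plants_trt2))
ancova_se = math.sqrt(ss_error / correct_df
                      * (2 / n_per_trt + (x_bar1 - x_bar2) ** 2 / sxx))

print(f"naive:        SE = {naive_se:.3f}, t = {naive_t:.3f}, F = {naive_f:.2f}")
print(f"recalculated: SE = {recalc_se:.2f}, t = {recalc_t:.2f}, F = {recalc_f:.2f}")
print(f"analysis of covariance SE = {ancova_se:.2f}")
```

The recalculated values should agree with Table 1.6, and the last line with the standard error in Table 1.3; the significance levels are not recomputed here because that would require a t distribution function outside the standard library.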
This example shows the power of being able to use information about covariates
or independent variables to make decisions about the treatments being included in
the study. The analysis of covariance uses a model to adjust the data as if all the
observations are from experimental units with identical values of the covariates.
A typical discussion of analysis of covariance indicates that the analyst should
include the number of plants as a term in the model so that term accounts for
variability in the observed yields, i.e., the variance of the model is reduced. If
including the number of plants in the model reduces the variability enough, then it
is used to adjust the data before the variety means are compared. It is important to
remember that there is a model being assumed when the covariate or covariates are
included in a model.

1.3 A GENERAL AOC MODEL AND THE BASIC PHILOSOPHY
In this text, the analysis of covariance is described in more generality than that of
adjusting for variation due to uncontrollable variables. The analysis of covariance
is defined as a method for comparing several regression surfaces or lines, one for
each treatment or treatment combination, where a different regression surface is
possibly used to describe the data for each treatment or treatment combination.
A one-way treatment structure with t treatments in a completely randomized
design structure (Milliken and Johnson, 1992) is used as a basis for setting up the
definitions for the analysis of covariance model. The experimental situation involves
selecting N experimental units from a population of experimental units and measuring k characteristics x1ij, x2ij, …, xkij on each experimental unit. The variables x1ij,
x2ij, …, xkij are called covariates or independent variables or concomitant variables.
It is important to measure the values of the covariates before the treatments are
applied to the experimental units so that the levels of the treatments do not affect
the values of the covariates. At a minimum, the values of the covariates should not
be affected by the applied levels of the treatments. In the chemical weed treatment
experiment, the number of plants per plot is observed only after a particular treatment
has been applied to a plot, so the value of the covariate (number of plants per plot) could not be
determined before the treatments were applied to the plots. If the germination rate is
affected by the applied treatments, then the number of plants per plot cannot be used
as a covariate in the conventional manner (see Chapter 2 for further discussion). After
the set of experimental units is selected and the values of the covariates are determined
(when possible), randomly assign ni experimental units to treatment i, where
$N = \sum_{i=1}^{t} n_i$. One generally assigns equal numbers of experimental units to the levels
of the treatment, but equal numbers of experimental units per level of the treatment
are not necessary. After an experimental unit is subjected to its specific level of the
treatment, measure the response or dependent variable, which is denoted by yij.
Thus the variables used in the discussions are summarized as:
yij     is the dependent measure
x1ij    is the first independent variable or covariate
x2ij    is the second independent variable or covariate
xkij    is the kth independent variable or covariate


At this point, the experimental design is a one-way treatment structure with
t treatments in a completely randomized design structure with k covariates. If there
is a linear relationship between the mean of y for the ith treatment and the k covariates
or independent variables, an analysis of covariance model can be expressed as:

$$y_{ij} = \beta_{0i} + \beta_{1i} x_{1ij} + \beta_{2i} x_{2ij} + \cdots + \beta_{ki} x_{kij} + \varepsilon_{ij} \tag{1.1}$$

for i = 1, 2, …, t, and j = 1, 2, …, ni, where εij ~ iid N(0, σ²), i.e., the εij are
independent and identically distributed normal random variables with mean 0 and
variance σ². The important thing to note about this model is that the mean of the
y values from a given treatment depends on the values of the x’s as well as on the
treatment applied to the experimental units.
The analysis of covariance is a strategy for making decisions about the form of
the covariance model through testing a series of hypotheses and then making treatment comparisons by comparing the estimated responses from the final regression
models. Two important hypotheses that help simplify the regression models are
$H_{01}\colon \beta_{h1} = \beta_{h2} = \cdots = \beta_{ht} = 0$ vs. $H_{a1}$: (not $H_{01}$), that is, all the treatments’
slopes for the hth covariate are zero, h = 1, 2, …, k, or
$H_{02}\colon \beta_{h1} = \beta_{h2} = \cdots = \beta_{ht}$ vs. $H_{a2}$: (not $H_{02}$), that is, the slopes for the hth
covariate are equal across the treatments, meaning the surfaces are parallel
in the direction of the hth covariate, h = 1, 2, …, k.
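
For the special case of a single covariate (k = 1), the equal-slopes hypothesis H02 can be examined by comparing the residual sum of squares of the model with a separate slope for each treatment against that of the model with one common slope. The sketch below is one way to compute the resulting F statistic using only the Python standard library. The function names and the small data set are invented for illustration, and the significance level is omitted because it requires an F-distribution routine.

```python
def _centered_sums(x, y):
    """Return (Sxx, Sxy, Syy), the sums of squares and products about the group means."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    syy = sum((b - ybar) ** 2 for b in y)
    return sxx, sxy, syy

def equal_slopes_f(groups):
    """F statistic for equal slopes with one covariate.

    groups: list of (x_values, y_values) pairs, one pair per treatment.
    Full model: a separate intercept and slope for each treatment.
    Reduced model: separate intercepts with one common slope.
    """
    t = len(groups)
    n_total = sum(len(x) for x, _ in groups)
    sums = [_centered_sums(x, y) for x, y in groups]
    # Residual SS of the full model: pool the per-treatment residual SS.
    ss_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)
    # Residual SS of the reduced model uses the pooled within-treatment sums.
    pooled_sxx = sum(s[0] for s in sums)
    pooled_sxy = sum(s[1] for s in sums)
    pooled_syy = sum(s[2] for s in sums)
    ss_reduced = pooled_syy - pooled_sxy ** 2 / pooled_sxx
    df_num = t - 1            # slopes removed by the hypothesis
    df_den = n_total - 2 * t  # error df of the full model
    f_stat = ((ss_reduced - ss_full) / df_num) / (ss_full / df_den)
    return f_stat, df_num, df_den

# Hypothetical data: two treatments, four experimental units each.
example = [
    ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]),
    ([1.0, 2.0, 3.0, 4.0], [1.0, 1.6, 1.9, 2.5]),
]
f_stat, df_num, df_den = equal_slopes_f(example)
print(f"F = {f_stat:.2f} on {df_num} and {df_den} df")
```

A large F statistic relative to the F distribution with the stated degrees of freedom is evidence against parallel lines, in which case separate slopes would be retained in the covariate part of the model.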
The analysis of covariance model in Equation 1.1 is a combination of an analysis
of variance model and a regression model. The analysis of covariance model is part
of an analysis of variance model since the intercepts and slopes are functions of the
levels of the treatments. The analysis of covariance model is also part of a regression
model since the model for each treatment is a regression model.
An experiment is designed to purchase a certain number of degrees of freedom
for error (generally without the covariates) and the experimenter is willing to sell
some of those degrees of freedom for good or effective covariates which will help
reduce the magnitude of the error variance. The philosophy in this book is to select
the simplest possible expression for the covariate part of the model before making
treatment comparisons.
This process of model building to determine the simplest adequate form of the
regression models follows the principle of parsimony and helps guard against foolishly selling degrees of freedom for error to retain unnecessary covariate terms in
the model. Thus the strategy for analysis of covariance begins with testing hypotheses
such as H01 and H02 to make decisions about the form of the covariate or regression
part of the model. Once the form of the covariate part of the model is finalized, the
treatments are compared by comparing the regression surfaces at predetermined
values of the covariates.
The structure of the following chapters leads one through the forest of analysis
of covariance by starting with the simple model with one covariate and building
through the complex process involving analysis of covariance in split-plot and
repeated measures designs. Other topics discussed are multiple covariates, experiments involving blocks, and graphical methods for comparing the models for the
various treatments.
Chapter 2 discusses the simple analysis of covariance model involving a one-way treatment structure in a completely randomized design structure with one
covariate and Chapter 3 contains several examples demonstrating the strategies for
situations involving one covariate. Chapter 4 presents a discussion of the analysis
of covariance models involving more than one covariate which includes polynomial
regression models. Models involving two-way treatment structures, both balanced
and unbalanced, are discussed in Chapter 5. A method of comparing parameters via
beta-hat models is described in Chapter 6. Chapter 7 describes a method for variable
selection in the analysis of covariance where many possible covariates were measured. Chapter 8 discusses methods for testing the equality of several regression
models.
The next set of chapters (9 through 11) discusses analysis of covariance in the
randomized complete block and incomplete block design structures. The analysis
of data where the values of a characteristic are used to construct blocks is described,
i.e., where the value of the covariate is the same for all experimental units in a block.
In the analysis of covariance context, inter-block (between-block) information about the
intercepts and slopes is required to extract all available information about the regression lines or surfaces from the data. Usual analysis methods extract only the intra-block information from the data. A mixed models analysis involving method of
moments and maximum likelihood estimation of the variance components provides
combined estimates of the parameters and should be used for blocked experiments.
Chapter 12 describes models where the levels of the treatments are random effects
(Littell et al., 1996). The models in Chapter 12 include random coefficient models.
Chapter 13 provides a discussion of mixed models with covariates and Chapter 14
presents a discussion of unequal variance models.
Chapters 15 and 16 discuss problems with applying the analysis of covariance
to experiments involving repeated measures and split-plot design structures. One
has to consider the size of experimental unit on which the covariate is measured.
Cases are discussed where the covariate is measured on the large size of an experimental unit and when the covariate is measured on the small size of an experimental
unit. Several examples of split-plot and repeated measures designs are presented. A
process of selecting the simplest covariance structure for the repeated measures part
of the model and the simplest covariate (regression model) part of the model is
described. The analysis of covariance in the nonreplicated experiment is discussed
in Chapter 17. The half-normal plot methodology (Milliken and Johnson, 1989) is
used to determine the form of the covariate part of the model and to determine which
effects are to be included in the intercept part of the model.
Finally, several special applications of analysis of covariance are presented in
Chapter 18, including using the covariate to construct blocks, crossover designs,
nonlinear models, nonparametric analysis of covariance, and a process for examining mixed models for possible outliers in the data set.
The procedures of the SAS® system (1989, 1996, and 1997) and JMP® (2000)
are used to demonstrate how to use software to carry out the analysis of covariance
computations. Analysis of covariance has been the subject of two issues
of Biometrics: Volume 13, Number 3, in 1957 and Volume 38, Number 3, in 1982.
The collection of papers in these two issues presents discussions of widely diverse
applications of analysis of covariance.

REFERENCES

Littell, R. C., Milliken, G. A., Stroup, W. W., and Wolfinger, R. D. (1996) SAS® System for
Mixed Models, SAS Institute Inc., Cary, NC.
Milliken, G. A. and Johnson, D. E. (1989) Analysis of Messy Data, Volume II: Nonreplicated
Experiments, Chapman & Hall, London.
Milliken, G. A. and Johnson, D. E. (1992) Analysis of Messy Data, Volume I: Designed
Experiments, Chapman & Hall, London.
SAS Institute Inc. (1989) SAS/STAT® User’s Guide, Version 6, Fourth Edition, Volume 2,
Cary, NC.
SAS Institute Inc. (1996) SAS/STAT® Software: Changes and Enhancements Through Release
6.11, Cary, NC.
SAS Institute Inc. (1997) SAS/STAT® Software: Changes and Enhancements Through Release
6.12, Cary, NC.
SAS Institute Inc. (2000) JMP® Statistics and Graphics Guide, Version 4, Cary, NC.


2 One-Way Analysis of Covariance: One Covariate in a Completely Randomized Design Structure

2.1 THE MODEL
Suppose you have N homogeneous experimental units and you randomly divide
them into t groups of ni units each where $\sum_{i=1}^{t} n_i = N$. Each of the t treatments of a
one-way treatment structure is randomly assigned to one group of experimental
units, providing a one-way treatment structure in a completely randomized design
structure. It is assumed that the experimental units are subjected to their assigned
treatments independently of each other. Let yij (dependent variable) denote the jth
observation from the ith treatment and xij denote the covariate (independent variable)
corresponding to the (i,j)th experimental unit. As in Chapter 1, the values of the
covariate are not to be influenced by the levels of the treatment. The best case is
where the values of the covariate are determined before the treatments are assigned.
In any case, it is a good strategy to use the analysis of variance to check to see if
there are differences among the treatment covariate means (see Chapter 18).
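
The check suggested above amounts to a one-way analysis of variance in which the covariate, rather than the response, is the dependent variable. A minimal sketch of the F statistic for that check follows. The function name and the example values are hypothetical, and the significance level is left out because it requires an F-distribution routine.

```python
def covariate_anova_f(x_groups):
    """One-way ANOVA F statistic for differences among treatment covariate means.

    x_groups: list of lists, the covariate values for each treatment.
    """
    t = len(x_groups)
    n_total = sum(len(g) for g in x_groups)
    grand_mean = sum(sum(g) for g in x_groups) / n_total
    means = [sum(g) / len(g) for g in x_groups]
    # Between-treatment and within-treatment sums of squares of the covariate.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(x_groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(x_groups, means))
    df_between = t - 1
    df_within = n_total - t
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical covariate values for two treatments.
f_stat, df_between, df_within = covariate_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
print(f_stat, df_between, df_within)
```

A large F statistic here would suggest that the covariate means differ among the treatments, which is a warning that the treatments may have influenced the covariate.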
Assume that the mean of yij can be expressed as a linear function of the covariate,
xij, with possibly a different linear function being required for each treatment. It is
important to note that the mean of an observation from the ith treatment group depends
on the value of the covariate as well as the treatment. In analysis of variance, the
mean of an observation from the ith treatment group depends only on the treatment.
The analysis of covariance model for a one-way treatment structure with one
covariate in a completely randomized design structure is

$$y_{ij} = \alpha_i + \beta_i X_{ij} + \varepsilon_{ij}, \quad i = 1, 2, \ldots, t, \quad j = 1, 2, \ldots, n_i \tag{2.1}$$

where the mean of yi for a given value of X is $\mu_{Y_i \mid X} = \alpha_i + \beta_i X$. For making inferences,
it is assumed that εij ~ iid N(0, σ²). Model 2.1 has t intercepts (α1, …, αt), t slopes
(β1, …, βt), and one variance σ², i.e., the model represents a collection of simple
linear regression models with a different model for each level of the treatment.


Before analyzing this model, make sure that the data from each treatment can
in fact be described by a simple linear regression model. Various regression diagnostics should be run on the data before continuing. The equal variance assumption
should also be checked (see Chapter 14). If the simple linear regression model is
not adequate to describe the data for each treatment, then another model must be
selected before continuing with the analysis of covariance.
The analysis of covariance is a process of comparing the regression models and
then making decisions about the various parameters of the models. The process
involves comparing the t slopes, comparing the distances between the regression lines
(surfaces) at preselected values of X, and possibly comparing the t intercepts. The
analysis of covariance computations are typically presented in summation notation
with little emphasis on interpretations. In this and the following chapters, the various
covariance models are expressed in terms of matrices (see Chapter 6 of Milliken and
Johnson, 1992) and their interpretations are discussed. Software is used as the mode
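of doing the analysis of covariance computations. The matrix form of Model 2.1 is

$$
\begin{bmatrix}
y_{11} \\ \vdots \\ y_{1n_1} \\ y_{21} \\ \vdots \\ y_{2n_2} \\ \vdots \\ y_{t1} \\ \vdots \\ y_{tn_t}
\end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & x_{1n_1} & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & x_{21} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 1 & x_{2n_2} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & x_{t1} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & x_{tn_t}
\end{bmatrix}
\begin{bmatrix}
\alpha_1 \\ \beta_1 \\ \alpha_2 \\ \beta_2 \\ \vdots \\ \alpha_t \\ \beta_t
\end{bmatrix}
+ \varepsilon \tag{2.2}
$$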

which is expressed in the form of a linear model as y = Xβ + ε. The vector y denotes
the observations ordered by observation within each treatment, the 2t × 1 vector β
denotes the collection of slopes and intercepts, the matrix X is the design matrix,
and the vector ε represents the random errors.
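
The block structure of the design matrix in Equation 2.2 can be made concrete with a short sketch. The function below builds X as a list of rows from the covariate values of each treatment. The function name and the example covariate values are illustrative assumptions, not part of the original text.

```python
def build_design_matrix(x_groups):
    """Return the N x 2t design matrix of Equation 2.2 as a list of rows.

    x_groups: list of lists, the covariate values for each treatment,
    ordered by observation within treatment.
    """
    t = len(x_groups)
    rows = []
    for i, xs in enumerate(x_groups):
        for x in xs:
            row = [0.0] * (2 * t)
            row[2 * i] = 1.0     # column multiplying the intercept alpha_i
            row[2 * i + 1] = x   # column multiplying the slope beta_i
            rows.append(row)
    return rows

# Hypothetical covariates: treatment 1 has two units, treatment 2 has three.
for row in build_design_matrix([[3.0, 5.0], [2.0, 4.0, 6.0]]):
    print(row)
```

Each row has nonzero entries only in the pair of columns belonging to its own treatment, which is why the least squares fit separates into one simple linear regression per treatment.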

2.2 ESTIMATION
The least squares estimator of the parameter vector β is $\hat{\beta} = (X'X)^{-1}X'y$, but the least
squares estimator of β can also be obtained by fitting the simple linear regression
model to the data from each treatment and computing the least squares estimator of
each pair of parameters (αi, βi). For data from the ith treatment, fit the model

$$
\begin{bmatrix} y_{i1} \\ \vdots \\ y_{in_i} \end{bmatrix}
=
\begin{bmatrix} 1 & x_{i1} \\ \vdots & \vdots \\ 1 & x_{in_i} \end{bmatrix}
\begin{bmatrix} \alpha_i \\ \beta_i \end{bmatrix}
+ \varepsilon_i, \tag{2.3}
$$


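which is expressed as $y_i = X_i \boldsymbol{\beta}_i + \varepsilon_i$, where $\boldsymbol{\beta}_i = (\alpha_i, \beta_i)'$. The least squares estimator of $\boldsymbol{\beta}_i$ is
$\hat{\boldsymbol{\beta}}_i = (X_i'X_i)^{-1}X_i'y_i$, the same as the estimator obtained for a simple linear regression model.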
The estimates of βi and αi in summation notation are
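
$$\hat{\beta}_i = \frac{\sum_{j=1}^{n_i} x_{ij} y_{ij} - n_i \bar{x}_{i.} \bar{y}_{i.}}{\sum_{j=1}^{n_i} x_{ij}^2 - n_i \bar{x}_{i.}^2}$$

and

$$\hat{\alpha}_i = \bar{y}_{i.} - \hat{\beta}_i \bar{x}_{i.}.$$

The residual sum of squares for the ith model is

$$\mathrm{SSRes}_i = \sum_{j=1}^{n_i} \left( y_{ij} - \hat{\alpha}_i - \hat{\beta}_i x_{ij} \right)^2.$$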
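
The summation formulas for the slope, the intercept, and the residual sum of squares of a single treatment translate directly into code. The sketch below is a minimal illustration. The function name and the example data are hypothetical, and the formulas are applied without the numerical safeguards a production routine would include.

```python
def fit_treatment_line(x, y):
    """Least squares intercept, slope, and residual SS for one treatment.

    x, y: covariate and response values for the ith treatment.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Numerator and denominator of the slope estimate in summation notation.
    sxy = sum(a * b for a, b in zip(x, y)) - n * xbar * ybar
    sxx = sum(a * a for a in x) - n * xbar ** 2
    beta_hat = sxy / sxx
    alpha_hat = ybar - beta_hat * xbar
    ss_res = sum((b - alpha_hat - beta_hat * a) ** 2 for a, b in zip(x, y))
    return alpha_hat, beta_hat, ss_res

# Hypothetical data lying exactly on the line y = 1 + 2x.
alpha_hat, beta_hat, ss_res = fit_treatment_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(alpha_hat, beta_hat, ss_res)
```

Because the example data fall exactly on a line, the fit recovers an intercept of 1 and a slope of 2 with a residual sum of squares of zero. Applying the function to each treatment in turn gives the full set of estimates for Model 2.1.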

There are ni – 2 degrees of freedom associated with SSResi since the ith model
involves two parameters. After testing the equality of the treatment variances (see
Chapter 14) and deciding there is not enough evidence to conclude the variances
are unequal, the residual sum of squares for Model 2.1 can be obtained by pooling
the residual sums of squares for each of the t models, i.e., sum the SSResi together
to obtain

$$\mathrm{SSRes} = \sum_{i=1}^{t} \mathrm{SSRes}_i. \tag{2.4}$$

The pooled residual sum of squares, SSRes, is based on the pooled degrees of
freedom, computed and denoted by

$$\mathrm{d.f.}_{\mathrm{SSRes}} = \sum_{i=1}^{t} (n_i - 2) = \sum_{i=1}^{t} n_i - 2t = N - 2t.$$

The best estimate of the variance of the experimental units is $\hat{\sigma}^2 = \mathrm{SSRes}/(N - 2t)$.
The sampling distribution of $(N - 2t)\hat{\sigma}^2/\sigma^2$ is central chi-square with (N – 2t) degrees
of freedom. The sampling distribution of the least squares estimator,
$\hat{\beta}' = (\hat{\alpha}_1, \hat{\beta}_1, \ldots, \hat{\alpha}_t, \hat{\beta}_t)$, is normal with mean
$\beta' = (\alpha_1, \beta_1, \ldots, \alpha_t, \beta_t)$ and variance-covariance
matrix $\sigma^2 (X'X)^{-1}$, which can be written as

$$
\sigma^2 (X'X)^{-1} = \sigma^2
\begin{bmatrix}
(X_1'X_1)^{-1} & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & (X_t'X_t)^{-1}
\end{bmatrix} \tag{2.5}
$$
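
The pooling of residual sums of squares and the block-diagonal form of the variance-covariance matrix can be illustrated together. The sketch below pools the per-treatment residual sums of squares to estimate the variance and then computes an estimated standard error for each slope. It relies on the standard simple linear regression result, not stated in the text above, that the slope element of $(X_i'X_i)^{-1}$ equals the reciprocal of the corrected sum of squares of the covariate for treatment i. The function name and the data are hypothetical.

```python
import math

def pooled_variance_and_slope_se(groups):
    """Pooled variance estimate, its degrees of freedom, and slope standard errors.

    groups: list of (x_values, y_values) pairs, one pair per treatment.
    """
    t = len(groups)
    n_total = sum(len(x) for x, _ in groups)
    ss_res = 0.0
    sxx_list = []
    for x, y in groups:
        n = len(x)
        xbar = sum(x) / n
        ybar = sum(y) / n
        sxx = sum((a - xbar) ** 2 for a in x)
        sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
        beta_hat = sxy / sxx
        alpha_hat = ybar - beta_hat * xbar
        # Add this treatment's residual SS to the pooled total.
        ss_res += sum((b - alpha_hat - beta_hat * a) ** 2 for a, b in zip(x, y))
        sxx_list.append(sxx)
    df = n_total - 2 * t
    sigma2_hat = ss_res / df
    # Each diagonal block is separate, so each slope SE uses only its own Sxx.
    slope_se = [math.sqrt(sigma2_hat / sxx) for sxx in sxx_list]
    return sigma2_hat, df, slope_se

# Hypothetical data: two treatments, four experimental units each.
example = [
    ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]),
    ([1.0, 2.0, 3.0, 4.0], [1.0, 1.6, 1.9, 2.5]),
]
sigma2_hat, df, slope_se = pooled_variance_and_slope_se(example)
print(sigma2_hat, df, slope_se)
```

Because the off-diagonal blocks are zero, the estimates from different treatments are uncorrelated, and the only quantity shared across treatments is the pooled variance estimate.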
