Methods of Multivariate Analysis
Second Edition




ALVIN C. RENCHER
Brigham Young University

A JOHN WILEY & SONS, INC. PUBLICATION


This book is printed on acid-free paper.



Copyright © 2002 by John Wiley & Sons, Inc. All rights reserved.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as
permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior
written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to
the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978)
750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department,
John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212)
850-6008. E-Mail:
For ordering and customer service, call 1-800-CALL-WILEY.
Library of Congress Cataloging-in-Publication Data


Rencher, Alvin C., 1934–
Methods of multivariate analysis / Alvin C. Rencher.—2nd ed.
p. cm. — (Wiley series in probability and mathematical statistics)
“A Wiley-Interscience publication.”
Includes bibliographical references and index.
ISBN 0-471-41889-7 (cloth)
1. Multivariate analysis. I. Title. II. Series.
QA278 .R45 2001
519.5′35—dc21
2001046735
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1


Contents

1. Introduction
1.1 Why Multivariate Analysis?
1.2 Prerequisites
1.3 Objectives
1.4 Basic Types of Data and Analysis

2. Matrix Algebra
2.1 Introduction
2.2 Notation and Basic Definitions
2.2.1 Matrices, Vectors, and Scalars
2.2.2 Equality of Vectors and Matrices
2.2.3 Transpose and Symmetric Matrices
2.2.4 Special Matrices
2.3 Operations
2.3.1 Summation and Product Notation
2.3.2 Addition of Matrices and Vectors
2.3.3 Multiplication of Matrices and Vectors
2.4 Partitioned Matrices
2.5 Rank
2.6 Inverse
2.7 Positive Definite Matrices
2.8 Determinants
2.9 Trace
2.10 Orthogonal Vectors and Matrices
2.11 Eigenvalues and Eigenvectors
2.11.1 Definition
2.11.2 I + A and I − A
2.11.3 tr(A) and |A|
2.11.4 Positive Definite and Semidefinite Matrices
2.11.5 The Product AB
2.11.6 Symmetric Matrix
2.11.7 Spectral Decomposition
2.11.8 Square Root Matrix
2.11.9 Square Matrices and Inverse Matrices
2.11.10 Singular Value Decomposition

3. Characterizing and Displaying Multivariate Data
3.1 Mean and Variance of a Univariate Random Variable
3.2 Covariance and Correlation of Bivariate Random Variables
3.2.1 Covariance
3.2.2 Correlation
3.3 Scatter Plots of Bivariate Samples
3.4 Graphical Displays for Multivariate Samples
3.5 Mean Vectors
3.6 Covariance Matrices
3.7 Correlation Matrices
3.8 Mean Vectors and Covariance Matrices for Subsets of Variables
3.8.1 Two Subsets
3.8.2 Three or More Subsets
3.9 Linear Combinations of Variables
3.9.1 Sample Properties
3.9.2 Population Properties
3.10 Measures of Overall Variability
3.11 Estimation of Missing Values
3.12 Distance between Vectors

4. The Multivariate Normal Distribution
4.1 Multivariate Normal Density Function
4.1.1 Univariate Normal Density
4.1.2 Multivariate Normal Density
4.1.3 Generalized Population Variance
4.1.4 Diversity of Applications of the Multivariate Normal
4.2 Properties of Multivariate Normal Random Variables
4.3 Estimation in the Multivariate Normal
4.3.1 Maximum Likelihood Estimation
4.3.2 Distribution of ȳ and S
4.4 Assessing Multivariate Normality
4.4.1 Investigating Univariate Normality
4.4.2 Investigating Multivariate Normality
4.5 Outliers
4.5.1 Outliers in Univariate Samples
4.5.2 Outliers in Multivariate Samples

5. Tests on One or Two Mean Vectors
5.1 Multivariate versus Univariate Tests
5.2 Tests on µ with Σ Known
5.2.1 Review of Univariate Test for H0: µ = µ0 with σ Known
5.2.2 Multivariate Test for H0: µ = µ0 with Σ Known
5.3 Tests on µ When Σ Is Unknown
5.3.1 Review of Univariate t-Test for H0: µ = µ0 with σ Unknown
5.3.2 Hotelling’s T²-Test for H0: µ = µ0 with Σ Unknown
5.4 Comparing Two Mean Vectors
5.4.1 Review of Univariate Two-Sample t-Test
5.4.2 Multivariate Two-Sample T²-Test
5.4.3 Likelihood Ratio Tests
5.5 Tests on Individual Variables Conditional on Rejection of H0 by the T²-Test
5.6 Computation of T²
5.6.1 Obtaining T² from a MANOVA Program
5.6.2 Obtaining T² from Multiple Regression
5.7 Paired Observations Test
5.7.1 Univariate Case
5.7.2 Multivariate Case
5.8 Test for Additional Information
5.9 Profile Analysis
5.9.1 One-Sample Profile Analysis
5.9.2 Two-Sample Profile Analysis

6. Multivariate Analysis of Variance
6.1 One-Way Models
6.1.1 Univariate One-Way Analysis of Variance (ANOVA)
6.1.2 Multivariate One-Way Analysis of Variance Model (MANOVA)
6.1.3 Wilks’ Test Statistic
6.1.4 Roy’s Test
6.1.5 Pillai and Lawley–Hotelling Tests
6.1.6 Unbalanced One-Way MANOVA
6.1.7 Summary of the Four Tests and Relationship to T²
6.1.8 Measures of Multivariate Association
6.2 Comparison of the Four MANOVA Test Statistics
6.3 Contrasts
6.3.1 Univariate Contrasts
6.3.2 Multivariate Contrasts
6.4 Tests on Individual Variables Following Rejection of H0 by the Overall MANOVA Test
6.5 Two-Way Classification
6.5.1 Review of Univariate Two-Way ANOVA
6.5.2 Multivariate Two-Way MANOVA
6.6 Other Models
6.6.1 Higher Order Fixed Effects
6.6.2 Mixed Models
6.7 Checking on the Assumptions
6.8 Profile Analysis
6.9 Repeated Measures Designs
6.9.1 Multivariate vs. Univariate Approach
6.9.2 One-Sample Repeated Measures Model
6.9.3 k-Sample Repeated Measures Model
6.9.4 Computation of Repeated Measures Tests
6.9.5 Repeated Measures with Two Within-Subjects Factors and One Between-Subjects Factor
6.9.6 Repeated Measures with Two Within-Subjects Factors and Two Between-Subjects Factors
6.9.7 Additional Topics
6.10 Growth Curves
6.10.1 Growth Curve for One Sample
6.10.2 Growth Curves for Several Samples
6.10.3 Additional Topics
6.11 Tests on a Subvector
6.11.1 Test for Additional Information
6.11.2 Stepwise Selection of Variables

7. Tests on Covariance Matrices
7.1 Introduction
7.2 Testing a Specified Pattern for Σ
7.2.1 Testing H0: Σ = Σ0
7.2.2 Testing Sphericity
7.2.3 Testing H0: Σ = σ²[(1 − ρ)I + ρJ]
7.3 Tests Comparing Covariance Matrices
7.3.1 Univariate Tests of Equality of Variances
7.3.2 Multivariate Tests of Equality of Covariance Matrices
7.4 Tests of Independence
7.4.1 Independence of Two Subvectors
7.4.2 Independence of Several Subvectors
7.4.3 Test for Independence of All Variables

8. Discriminant Analysis: Description of Group Separation
8.1 Introduction
8.2 The Discriminant Function for Two Groups
8.3 Relationship between Two-Group Discriminant Analysis and Multiple Regression
8.4 Discriminant Analysis for Several Groups
8.4.1 Discriminant Functions
8.4.2 A Measure of Association for Discriminant Functions
8.5 Standardized Discriminant Functions
8.6 Tests of Significance
8.6.1 Tests for the Two-Group Case
8.6.2 Tests for the Several-Group Case
8.7 Interpretation of Discriminant Functions
8.7.1 Standardized Coefficients
8.7.2 Partial F-Values
8.7.3 Correlations between Variables and Discriminant Functions
8.7.4 Rotation
8.8 Scatter Plots
8.9 Stepwise Selection of Variables

9. Classification Analysis: Allocation of Observations to Groups
9.1 Introduction
9.2 Classification into Two Groups
9.3 Classification into Several Groups
9.3.1 Equal Population Covariance Matrices: Linear Classification Functions
9.3.2 Unequal Population Covariance Matrices: Quadratic Classification Functions
9.4 Estimating Misclassification Rates
9.5 Improved Estimates of Error Rates
9.5.1 Partitioning the Sample
9.5.2 Holdout Method
9.6 Subset Selection
9.7 Nonparametric Procedures
9.7.1 Multinomial Data
9.7.2 Classification Based on Density Estimators
9.7.3 Nearest Neighbor Classification Rule

10. Multivariate Regression
10.1 Introduction
10.2 Multiple Regression: Fixed x’s
10.2.1 Model for Fixed x’s
10.2.2 Least Squares Estimation in the Fixed-x Model
10.2.3 An Estimator for σ²
10.2.4 The Model Corrected for Means
10.2.5 Hypothesis Tests
10.2.6 R² in Fixed-x Regression
10.2.7 Subset Selection
10.3 Multiple Regression: Random x’s
10.4 Multivariate Multiple Regression: Estimation
10.4.1 The Multivariate Linear Model
10.4.2 Least Squares Estimation in the Multivariate Model
10.4.3 Properties of Least Squares Estimators B̂
10.4.4 An Estimator for Σ
10.4.5 Model Corrected for Means
10.5 Multivariate Multiple Regression: Hypothesis Tests
10.5.1 Test of Overall Regression
10.5.2 Test on a Subset of the x’s
10.6 Measures of Association between the y’s and the x’s
10.7 Subset Selection
10.7.1 Stepwise Procedures
10.7.2 All Possible Subsets
10.8 Multivariate Regression: Random x’s

11. Canonical Correlation
11.1 Introduction
11.2 Canonical Correlations and Canonical Variates
11.3 Properties of Canonical Correlations
11.4 Tests of Significance
11.4.1 Tests of No Relationship between the y’s and the x’s
11.4.2 Test of Significance of Succeeding Canonical Correlations after the First
11.5 Interpretation
11.5.1 Standardized Coefficients
11.5.2 Correlations between Variables and Canonical Variates
11.5.3 Rotation
11.5.4 Redundancy Analysis
11.6 Relationships of Canonical Correlation Analysis to Other Multivariate Techniques
11.6.1 Regression
11.6.2 MANOVA and Discriminant Analysis

12. Principal Component Analysis
12.1 Introduction
12.2 Geometric and Algebraic Bases of Principal Components
12.2.1 Geometric Approach
12.2.2 Algebraic Approach
12.3 Principal Components and Perpendicular Regression
12.4 Plotting of Principal Components
12.5 Principal Components from the Correlation Matrix
12.6 Deciding How Many Components to Retain
12.7 Information in the Last Few Principal Components
12.8 Interpretation of Principal Components
12.8.1 Special Patterns in S or R
12.8.2 Rotation
12.8.3 Correlations between Variables and Principal Components
12.9 Selection of Variables

13. Factor Analysis
13.1 Introduction
13.2 Orthogonal Factor Model
13.2.1 Model Definition and Assumptions
13.2.2 Nonuniqueness of Factor Loadings
13.3 Estimation of Loadings and Communalities
13.3.1 Principal Component Method
13.3.2 Principal Factor Method
13.3.3 Iterated Principal Factor Method
13.3.4 Maximum Likelihood Method
13.4 Choosing the Number of Factors, m
13.5 Rotation
13.5.1 Introduction
13.5.2 Orthogonal Rotation
13.5.3 Oblique Rotation
13.5.4 Interpretation
13.6 Factor Scores
13.7 Validity of the Factor Analysis Model
13.8 The Relationship of Factor Analysis to Principal Component Analysis

14. Cluster Analysis
14.1 Introduction
14.2 Measures of Similarity or Dissimilarity
14.3 Hierarchical Clustering
14.3.1 Introduction
14.3.2 Single Linkage (Nearest Neighbor)
14.3.3 Complete Linkage (Farthest Neighbor)
14.3.4 Average Linkage
14.3.5 Centroid
14.3.6 Median
14.3.7 Ward’s Method
14.3.8 Flexible Beta Method
14.3.9 Properties of Hierarchical Methods
14.3.10 Divisive Methods
14.4 Nonhierarchical Methods
14.4.1 Partitioning
14.4.2 Other Methods
14.5 Choosing the Number of Clusters
14.6 Cluster Validity
14.7 Clustering Variables

15. Graphical Procedures
15.1 Multidimensional Scaling
15.1.1 Introduction
15.1.2 Metric Multidimensional Scaling
15.1.3 Nonmetric Multidimensional Scaling
15.2 Correspondence Analysis
15.2.1 Introduction
15.2.2 Row and Column Profiles
15.2.3 Testing Independence
15.2.4 Coordinates for Plotting Row and Column Profiles
15.2.5 Multiple Correspondence Analysis
15.3 Biplots
15.3.1 Introduction
15.3.2 Principal Component Plots
15.3.3 Singular Value Decomposition Plots
15.3.4 Coordinates
15.3.5 Other Methods

A. Tables
B. Answers and Hints to Problems
C. Data Sets and SAS Files
References
Index



Preface

I have long been fascinated by the interplay of variables in multivariate data and by
the challenge of unraveling the effect of each variable. My continuing objective in
the second edition has been to present the power and utility of multivariate analysis
in a highly readable format.
Practitioners and researchers in all applied disciplines often measure several variables on each subject or experimental unit. In some cases, it may be productive to
isolate each variable in a system and study it separately. Typically, however, the variables are not only correlated with each other, but each variable is influenced by the
other variables as it affects a test statistic or descriptive statistic. Thus, in many
instances, the variables are intertwined in such a way that when analyzed individually they yield little information about the system. Using multivariate analysis, the
variables can be examined simultaneously in order to access the key features of the
process that produced them. The multivariate approach enables us to (1) explore
the joint performance of the variables and (2) determine the effect of each variable
in the presence of the others.

Multivariate analysis provides both descriptive and inferential procedures—we
can search for patterns in the data or test hypotheses about patterns of a priori interest. With multivariate descriptive techniques, we can peer beneath the tangled web of
variables on the surface and extract the essence of the system. Multivariate inferential
procedures include hypothesis tests that (1) process any number of variables without
inflating the Type I error rate and (2) allow for whatever intercorrelations the variables possess. A wide variety of multivariate descriptive and inferential procedures
is readily accessible in statistical software packages.
My selection of topics for this volume reflects many years of consulting with
researchers in many fields of inquiry. A brief overview of multivariate analysis is
given in Chapter 1. Chapter 2 reviews the fundamentals of matrix algebra. Chapters
3 and 4 give an introduction to sampling from multivariate populations. Chapters 5,
6, 7, 10, and 11 extend univariate procedures with one dependent variable (including
t-tests, analysis of variance, tests on variances, multiple regression, and multiple correlation) to analogous multivariate techniques involving several dependent variables.
A review of each univariate procedure is presented before covering the multivariate
counterpart. These reviews may provide key insights the student missed in previous
courses.
Chapters 8, 9, 12, 13, 14, and 15 describe multivariate techniques that are not
extensions of univariate procedures. In Chapters 8 and 9, we find functions of the
variables that discriminate among groups in the data. In Chapters 12 and 13, we
find functions of the variables that reveal the basic dimensionality and characteristic
patterns of the data, and we discuss procedures for finding the underlying latent
variables of a system. In Chapters 14 and 15 (new in the second edition), we give
methods for searching for groups in the data, and we provide plotting techniques that
show relationships in a reduced dimensionality for various kinds of data.

In Appendix A, tables are provided for many multivariate distributions and tests.
These enable the reader to conduct an exact test in many cases for which software
packages provide only approximate tests. Appendix B gives answers and hints for
most of the problems in the book.
Appendix C describes an ftp site that contains (1) all data sets and (2) SAS command files for all examples in the text. These command files can be adapted for use
in working problems or in analyzing data sets encountered in applications.
To illustrate multivariate applications, I have provided many examples and exercises based on 59 real data sets from a wide variety of disciplines. A practitioner
or consultant in multivariate analysis gains insights and acumen from long experience in working with data. It is not expected that a student can achieve this kind of
seasoning in a one-semester class. However, the examples provide a good start, and
further development is gained by working problems with the data sets. For example,
in Chapters 12 and 13, the exercises cover several typical patterns in the covariance
or correlation matrix. The student’s intuition is expanded by associating these covariance patterns with the resulting configuration of the principal components or factors.
Although this is a methods book, I have included a few derivations. For some
readers, an occasional proof provides insights obtainable in no other way. I hope that
instructors who do not wish to use proofs will not be deterred by their presence. The
proofs can be disregarded easily when reading the book.
My objective has been to make the book accessible to readers who have taken as
few as two statistical methods courses. The students in my classes in multivariate
analysis include majors in statistics and majors from other departments. With the
applied researcher in mind, I have provided careful intuitive explanations of the concepts and have included many insights typically available only in journal articles or
in the minds of practitioners.
My overriding goal in preparation of this book has been clarity of exposition. I
hope that students and instructors alike will find this multivariate text more comfortable than most. In the final stages of development of both the first and second
editions, I asked my students for written reports on their initial reaction as they read
each day’s assignment. They made many comments that led to improvements in the
manuscript. I will be very grateful if readers will take the time to notify me of errors
or of other suggestions they might have for improvements.
I have tried to use standard mathematical and statistical notation as far as possible and to maintain consistency of notation throughout the book. I have refrained
from the use of abbreviations and mnemonic devices. These save space when one
is reading a book page by page, but they are annoying to those using a book as a
reference.
Equations are numbered sequentially throughout a chapter; for example, (3.75)
indicates the 75th numbered equation in Chapter 3. Tables and figures are also numbered
sequentially throughout a chapter in the form “Table 3.8” or “Figure 3.1.”
Examples are not numbered sequentially; each example is identified by the same
number as the section in which it appears and is placed at the end of the section.
When citing references in the text, I have used the standard format involving the
year of publication. For a journal article, the year alone suffices, for example, Fisher
(1936). But for books, I have usually included a page number, as in Seber (1984,
p. 216).
This is the first volume of a two-volume set on multivariate analysis. The second
volume is entitled Multivariate Statistical Inference and Applications (Wiley, 1998).
The two volumes are not necessarily sequential; they can be read independently. I
adopted the two-volume format in order to (1) provide broader coverage than would
be possible in a single volume and (2) offer the reader a choice of approach.
The second volume includes proofs of many techniques covered in the first 13
chapters of the present volume and also introduces additional topics. The present
volume includes many examples and problems using actual data sets, and there are
fewer algebraic problems. The second volume emphasizes derivations of the results
and contains fewer examples and problems with real data. The present volume has
fewer references to the literature than the other volume, which includes a careful
review of the latest developments and a more comprehensive bibliography. In this
second edition, I have occasionally referred the reader to Rencher (1998) to note that
added coverage of a certain subject is available in the second volume.

I am indebted to many individuals in the preparation of the first edition. My initial exposure to multivariate analysis came in courses taught by Rolf Bargmann at
the University of Georgia and D. R. Jensen at Virginia Tech. Additional impetus to
probe the subtleties of this field came from research conducted with Bruce Brown
at BYU. I wish to thank Bruce Brown, Deane Branstetter, Del Scott, Robert Smidt,
and Ingram Olkin for reading various versions of the manuscript and making valuable suggestions. I am grateful to the following students at BYU who helped with
computations and typing: Mitchell Tolland, Tawnia Newton, Marianne Matis Mohr,
Gregg Littlefield, Suzanne Kimball, Wendy Nielsen, Tiffany Nordgren, David Whiting, Karla Wasden, and Rachel Jones.

SECOND EDITION
For the second edition, I have added Chapters 14 and 15, covering cluster analysis,
multidimensional scaling, correspondence analysis, and biplots. I also made numerous corrections and revisions (almost every page) in the first 13 chapters, in an effort
to improve composition, readability, and clarity. Many of the first 13 chapters now
have additional problems.
I have listed the data sets and SAS files on the Wiley ftp site rather than on a
diskette, as in the first edition. I have made improvements in labeling of these files.
I am grateful to the many readers who have pointed out errors or made suggestions
for improvements. The book is better for their caring and their efforts.



I thank Lonette Stoddard and Candace B. McNaughton for typing and J. D.
Williams for computer support. As with my other books, I dedicate this volume to
my wife, LaRue, who has supplied much needed support and encouragement.
ALVIN C. RENCHER


Acknowledgments


I thank the authors, editors, and owners of copyrights for permission to reproduce
the following materials:


Figure 3.8 and Table 3.2, Kleiner and Hartigan (1981), Reprinted by permission
of Journal of the American Statistical Association



Table 3.3, Kramer and Jensen (1969a), Reprinted by permission of Journal of
Quality Technology



Table 3.4, Reaven and Miller (1979), Reprinted by permission of Diabetologia



Table 3.5, Timm (1975), Reprinted by permission of Elsevier North-Holland
Publishing Company



Table 3.6, Elston and Grizzle (1962), Reprinted by permission of Biometrics



Table 3.7, Frets (1921), Reprinted by permission of Genetica




Table 3.8, O’Sullivan and Mahan (1966), Reprinted by permission of American
Journal of Clinical Nutrition



Table 4.3, Royston (1983), Reprinted by permission of Applied Statistics



Table 5.1, Beall (1945), Reprinted by permission of Psychometrika



Table 5.2, Hummel and Sligo (1971), Reprinted by permission of Psychological
Bulletin



Table 5.3, Kramer and Jensen (1969b), Reprinted by permission of Journal of
Quality Technology



Table 5.5, Lubischew (1962), Reprinted by permission of Biometrics



Table 5.6, Travers (1939), Reprinted by permission of Psychometrika




Table 5.7, Andrews and Herzberg (1985), Reprinted by permission of Springer-Verlag



Table 5.8, Tintner (1946), Reprinted by permission of Journal of the American
Statistical Association



Table 5.9, Kramer (1972), Reprinted by permission of the author



Table 5.10, Cameron and Pauling (1978), Reprinted by permission of National
Academy of Science


Table 6.2, Andrews and Herzberg (1985), Reprinted by permission of Springer-Verlag




Table 6.3, Rencher and Scott (1990), Reprinted by permission of Communications in Statistics: Simulation and Computation



Table 6.6, Posten (1962), Reprinted by permission of the author



Table 6.8, Crowder and Hand (1990, pp. 21–29), Reprinted by permission of
Routledge Chapman and Hall



Table 6.12, Cochran and Cox (1957), Timm (1980), Reprinted by permission
of John Wiley and Sons and Elsevier North-Holland Publishing Company



Table 6.14, Timm (1980), Reprinted by permission of Elsevier North-Holland
Publishing Company



Table 6.16, Potthoff and Roy (1964), Reprinted by permission of Biometrika
Trustees



Table 6.17, Baten, Tack, and Baeder (1958), Reprinted by permission of Quality
Progress




Table 6.18, Keuls et al. (1984), Reprinted by permission of Scientia Horticulturae



Table 6.19, Burdick (1979), Reprinted by permission of the author



Table 6.20, Box (1950), Reprinted by permission of Biometrics



Table 6.21, Rao (1948), Reprinted by permission of Biometrika Trustees



Table 6.22, Cameron and Pauling (1978), Reprinted by permission of National
Academy of Science



Table 6.23, Williams and Izenman (1989), Reprinted by permission of Colorado
State University



Table 6.24, Beauchamp and Hoel (1974), Reprinted by permission of Journal
of Statistical Computation and Simulation



Table 6.25, Box (1950), Reprinted by permission of Biometrics



Table 6.26, Grizzle and Allen (1969), Reprinted by permission of Biometrics



Table 6.27, Crepeau et al. (1985), Reprinted by permission of Biometrics



Table 6.28, Zerbe (1979a), Reprinted by permission of Journal of the American
Statistical Association



Table 6.29, Timm (1980), Reprinted by permission of Elsevier North-Holland
Publishing Company



Table 7.1, Siotani et al. (1963), Reprinted by permission of the Institute of Statistical Mathematics





Table 7.2, Reprinted by permission of R. J. Freund



Table 8.1, Kramer and Jensen (1969a), Reprinted by permission of Journal of
Quality Technology



Table 8.3, Reprinted by permission of G. R. Bryce and R. M. Barker



Table 10.1, Box and Youle (1955), Reprinted by permission of Biometrics



Tables 12.2, 12.3, and 12.4, Jeffers (1967), Reprinted by permission of Applied
Statistics



Table 13.1, Brown et al. (1984), Reprinted by permission of the Journal of
Pascal, Ada, and Modula




Correlation matrix in Example 13.6, Brown, Strong, and Rencher (1973),
Reprinted by permission of The Journal of the Acoustical Society of America



Table 14.1, Hartigan (1975), Reprinted by permission of John Wiley and Sons



Table 14.3, Dawkins (1989), Reprinted by permission of The American Statistician



Table 14.7, Hand et al. (1994), Reprinted by permission of D. J. Hand



Table 14.12, Sokol and Rohlf (1981), Reprinted by permission of W. H. Freeman and Co.



Table 14.13, Hand et al. (1994), Reprinted by permission of D. J. Hand



Table 15.1, Kruskal and Wish (1978), Reprinted by permission of Sage Publications




Tables 15.2 and 15.5, Hand et al. (1994), Reprinted by permission of D. J. Hand



Table 15.13, Edwards and Kreiner (1983), Reprinted by permission of Biometrika



Table 15.15, Hand et al. (1994), Reprinted by permission of D. J. Hand



Table 15.16, Everitt (1987), Reprinted by permission of the author



Table 15.17, Andrews and Herzberg (1985), Reprinted by permission of
Springer-Verlag



Table 15.18, Clausen (1988), Reprinted by permission of Sage Publications



Table 15.19, Andrews and Herzberg (1985), Reprinted by permission of
Springer-Verlag




Table A.1, Mulholland (1977), Reprinted by permission of Biometrika Trustees



Table A.2, D’Agostino and Pearson (1973), Reprinted by permission of
Biometrika Trustees



Table A.3, D’Agostino and Tietjen (1971), Reprinted by permission of Biometrika
Trustees





Table A.4, D’Agostino (1972), Reprinted by permission of Biometrika Trustees



Table A.5, Mardia (1970, 1974), Reprinted by permission of Biometrika
Trustees



Table A.6, Barnett and Lewis (1978), Reprinted by permission of John Wiley
and Sons



Table A.7, Kramer and Jensen (1969a), Reprinted by permission of Journal of
Quality Technology



Table A.8, Bailey (1977), Reprinted by permission of Journal of the American
Statistical Association



Table A.9, Wall (1967), Reprinted by permission of the author, Albuquerque,
NM



Table A.10, Pearson and Hartley (1972) and Pillai (1964, 1965), Reprinted by
permission of Biometrika Trustees



Table A.11, Schuurmann et al. (1975), Reprinted by permission of Journal of
Statistical Computation and Simulation



Table A.12, Davis (1970a,b, 1980), Reprinted by permission of Biometrika
Trustees



Table A.13, Kleinbaum, Kupper, and Muller (1988), Reprinted by permission
of PWS-KENT Publishing Company



Table A.14, Lee et al. (1977), Reprinted by permission of Elsevier North-Holland Publishing Company



Table A.15, Mathai and Katiyar (1979), Reprinted by permission of Biometrika
Trustees


CHAPTER 1

Introduction

1.1 WHY MULTIVARIATE ANALYSIS?
Multivariate analysis consists of a collection of methods that can be used when several measurements are made on each individual or object in one or more samples. We
will refer to the measurements as variables and to the individuals or objects as units
(research units, sampling units, or experimental units) or observations. In practice,
multivariate data sets are common, although they are not always analyzed as such.
But the exclusive use of univariate procedures with such data is no longer excusable,
given the availability of multivariate techniques and inexpensive computing power
to carry them out.
Historically, the bulk of applications of multivariate techniques have been in the
behavioral and biological sciences. However, interest in multivariate methods has
now spread to numerous other fields of investigation. For example, I have collaborated on multivariate problems with researchers in education, chemistry, physics,
geology, engineering, law, business, literature, religion, public broadcasting, nursing, mining, linguistics, biology, psychology, and many other fields. Table 1.1 shows
some examples of multivariate observations.
The reader will notice that in some cases all the variables are measured in the same
scale (see 1 and 2 in Table 1.1). In other cases, measurements are in different scales
(see 3 in Table 1.1). In a few techniques, such as profile analysis (Sections 5.9 and
6.8), the variables must be commensurate, that is, similar in scale of measurement;
however, most multivariate methods do not require this.
Ordinarily the variables are measured simultaneously on each sampling unit. Typically, these variables are correlated. If this were not so, there would be little use for
many of the techniques of multivariate analysis. We need to untangle the overlapping
information provided by correlated variables and peer beneath the surface to see the
underlying structure. Thus the goal of many multivariate approaches is simplification. We seek to express what is going on in terms of a reduced set of dimensions.
Such multivariate techniques are exploratory; they essentially generate hypotheses
rather than test them.
Table 1.1. Examples of Multivariate Data

Units                         Variables
1. Students                   Several exam scores in a single course
2. Students                   Grades in mathematics, history, music, art, physics
3. People                     Height, weight, percentage of body fat, resting heart rate
4. Skulls                     Length, width, cranial capacity
5. Companies                  Expenditures for advertising, labor, raw materials
6. Manufactured items         Various measurements to check on compliance with specifications
7. Applicants for bank loans  Income, education level, length of residence, savings account, current debt load
8. Segments of literature     Sentence length, frequency of usage of certain words and of style characteristics
9. Human hairs                Composition of various elements
10. Birds                     Lengths of various bones

On the other hand, if our goal is a formal hypothesis test, we need a technique that
will (1) allow several variables to be tested and still preserve the significance level
and (2) do this for any intercorrelation structure of the variables. Many such tests are
available.
As the two preceding paragraphs imply, multivariate analysis is concerned generally with two areas, descriptive and inferential statistics. In the descriptive realm, we
often obtain optimal linear combinations of variables. The optimality criterion varies
from one technique to another, depending on the goal in each case. Although linear
combinations may seem too simple to reveal the underlying structure, we use them
for two obvious reasons: (1) they have mathematical tractability (linear approximations are used throughout all science for the same reason) and (2) they often perform
well in practice. These linear functions may also be useful as a follow-up to inferential procedures. When we have a statistically significant test result that compares
several groups, for example, we can find the linear combination (or combinations)
of variables that led to rejection of the hypothesis. Then the contribution of each
variable to these linear combinations is of interest.
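One familiar optimality criterion can be sketched in a few lines of Python. The example is illustrative only: the data are simulated, the covariance values are made up, and the names Y, S, a, and z are assumptions of the sketch rather than notation fixed by the text. It finds the unit-length coefficient vector a for which the linear combination z = a'y has the largest sample variance, which is the criterion used for principal components (Chapter 12).

```python
import numpy as np

# Hypothetical data: 200 units measured on 3 correlated variables.
# The values are simulated for illustration and do not come from the book.
rng = np.random.default_rng(0)
cov = np.array([[4.0, 2.0, 1.0],
                [2.0, 3.0, 1.5],
                [1.0, 1.5, 2.0]])
Y = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=200)

# Sample covariance matrix (rows are units, columns are variables).
S = np.cov(Y, rowvar=False)

# eigh handles symmetric matrices and returns eigenvalues in ascending order,
# so the last column of eigvecs belongs to the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(S)
a = eigvecs[:, -1]

# Linear combination z = a'y for every unit; its sample variance is a'Sa,
# which equals the largest eigenvalue of S.
z = Y @ a
var_z = z.var(ddof=1)

print("coefficients:", np.round(a, 3))
print("variance of z:", round(float(var_z), 3))
```

Other techniques replace the variance criterion with a different one, for example separation of groups in discriminant analysis (Chapter 8), but the pattern of choosing coefficients to optimize a stated criterion is the same.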
In the inferential area, many multivariate techniques are extensions of univariate
procedures. In such cases, we review the univariate procedure before presenting the
analogous multivariate approach.
Multivariate inference is especially useful in curbing the researcher’s natural tendency to read too much into the data. Total control is provided for experimentwise
error rate; that is, no matter how many variables are tested simultaneously, the value
of α (the significance level) remains at the level set by the researcher.
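A small calculation suggests why this control matters. The sketch below is not from the text. It assumes p independent univariate tests, each carried out at level α, in which case the probability of at least one false rejection is 1 − (1 − α)^p. The function name is hypothetical. Correlated variables, the usual case in practice, violate the independence assumption, so the figures are indicative only.

```python
# Illustrative arithmetic only; assumes the p univariate tests are independent.

def experimentwise_error_rate(alpha: float, p: int) -> float:
    """Probability of at least one false rejection among p independent
    tests, each carried out at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** p


if __name__ == "__main__":
    alpha = 0.05
    for p in (1, 2, 5, 10, 20):
        rate = experimentwise_error_rate(alpha, p)
        print(f"p = {p:2d}  experimentwise rate = {rate:.3f}")
```

Under these assumptions, ten separate tests at α = 0.05 give an experimentwise rate of about 0.40, whereas a single multivariate test on all ten variables is carried out at the 0.05 level.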
Some authors warn against applying the common multivariate techniques to data
for which the measurement scale is not interval or ratio. It has been found, however,
that many multivariate techniques give reliable results when applied to ordinal data.
For many years the applications lagged behind the theory because the computations were beyond the power of the available desktop calculators. However, with
modern computers, virtually any analysis one desires, no matter how many variables

