
An Introduction to
Multivariate Statistical Analysis
Third Edition

T. W. ANDERSON
Stanford University
Department of Statistics
Stanford, CA

WILEY-INTERSCIENCE
A JOHN WILEY & SONS, INC., PUBLICATION


Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise,
except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
either the prior written permission of the Publisher, or authorization through payment of the
appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive,
Danvers, MA 01923, 978-750-8400, fax 978-750-4470, or on the web at www.copyright.com.
Requests to the Publisher for permission should be addressed to the Permissions Department,
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201)
748-6008, e-mail:
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
efforts in preparing this book, they make no representations or warranties with respect to
the accuracy or completeness of the contents of this book and specifically disclaim any
implied warranties of merchantability or fitness for a particular purpose. No warranty may be
created or extended by sales representatives or written sales materials. The advice and
strategies contained herein may not be suitable for your situation. You should consult with
a professional where appropriate. Neither the publisher nor author shall be liable for any
loss of profit or any other commercial damages, including but not limited to special,
incidental, consequential, or other damages.
For general information on our other products and services please contact our Customer
Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or
fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears
in print, however, may not be available in electronic format.
Library of Congress Cataloging-in-Publication Data
Anderson, T. W. (Theodore Wilbur), 1918-
  An introduction to multivariate statistical analysis / Theodore W. Anderson.-- 3rd ed.
    p. cm.-- (Wiley series in probability and mathematical statistics)
  Includes bibliographical references and index.
  ISBN 0-471-36091-0 (cloth : acid-free paper)
  1. Multivariate analysis. I. Title. II. Series.
QA278.A516 2003
519.5'35--dc21    2002034317
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1


To

DOROTHY



Contents

Preface to the Third Edition, xv

Preface to the Second Edition, xvii

Preface to the First Edition, xix

1  Introduction, 1
   1.1. Multivariate Statistical Analysis, 1
   1.2. The Multivariate Normal Distribution, 3

2  The Multivariate Normal Distribution, 6
   2.1. Introduction, 6
   2.2. Notions of Multivariate Distributions, 7
   2.3. The Multivariate Normal Distribution, 13
   2.4. The Distribution of Linear Combinations of Normally Distributed Variates; Independence of Variates; Marginal Distributions, 23
   2.5. Conditional Distributions and Multiple Correlation Coefficient, 33
   2.6. The Characteristic Function; Moments, 41
   2.7. Elliptically Contoured Distributions, 47
   Problems, 56

3  Estimation of the Mean Vector and the Covariance Matrix, 66
   3.1. Introduction, 66
   3.2. The Maximum Likelihood Estimators of the Mean Vector and the Covariance Matrix, 67
   3.3. The Distribution of the Sample Mean Vector; Inference Concerning the Mean When the Covariance Matrix Is Known, 74
   3.4. Theoretical Properties of Estimators of the Mean Vector, 83
   3.5. Improved Estimation of the Mean, 91
   3.6. Elliptically Contoured Distributions, 101
   Problems, 108

4  The Distributions and Uses of Sample Correlation Coefficients, 115
   4.1. Introduction, 115
   4.2. Correlation Coefficient of a Bivariate Sample, 116
   4.3. Partial Correlation Coefficients; Conditional Distributions, 136
   4.4. The Multiple Correlation Coefficient, 144
   4.5. Elliptically Contoured Distributions, 158
   Problems, 163

5  The Generalized T²-Statistic, 170
   5.1. Introduction, 170
   5.2. Derivation of the Generalized T²-Statistic and Its Distribution, 171
   5.3. Uses of the T²-Statistic, 177
   5.4. The Distribution of T² under Alternative Hypotheses; The Power Function, 185
   5.5. The Two-Sample Problem with Unequal Covariance Matrices, 187
   5.6. Some Optimal Properties of the T²-Test, 190
   5.7. Elliptically Contoured Distributions, 199
   Problems, 201

6  Classification of Observations, 207
   6.1. The Problem of Classification, 207
   6.2. Standards of Good Classification, 208
   6.3. Procedures of Classification into One of Two Populations with Known Probability Distributions, 211
   6.4. Classification into One of Two Known Multivariate Normal Populations, 215
   6.5. Classification into One of Two Multivariate Normal Populations When the Parameters Are Estimated, 219
   6.6. Probabilities of Misclassification, 227
   6.7. Classification into One of Several Populations, 233
   6.8. Classification into One of Several Multivariate Normal Populations, 237
   6.9. An Example of Classification into One of Several Multivariate Normal Populations, 240
   6.10. Classification into One of Two Known Multivariate Normal Populations with Unequal Covariance Matrices, 242
   Problems, 248

7  The Distribution of the Sample Covariance Matrix and the Sample Generalized Variance, 251
   7.1. Introduction, 251
   7.2. The Wishart Distribution, 252
   7.3. Some Properties of the Wishart Distribution, 258
   7.4. Cochran's Theorem, 262
   7.5. The Generalized Variance, 264
   7.6. Distribution of the Set of Correlation Coefficients When the Population Covariance Matrix Is Diagonal, 270
   7.7. The Inverted Wishart Distribution and Bayes Estimation of the Covariance Matrix, 272
   7.8. Improved Estimation of the Covariance Matrix, 276
   7.9. Elliptically Contoured Distributions, 282
   Problems, 285

8  Testing the General Linear Hypothesis; Multivariate Analysis of Variance, 291
   8.1. Introduction, 291
   8.2. Estimators of Parameters in Multivariate Linear Regression, 292
   8.3. Likelihood Ratio Criteria for Testing Linear Hypotheses about Regression Coefficients, 298
   8.4. The Distribution of the Likelihood Ratio Criterion When the Hypothesis Is True, 304
   8.5. An Asymptotic Expansion of the Distribution of the Likelihood Ratio Criterion, 316
   8.6. Other Criteria for Testing the Linear Hypothesis, 326
   8.7. Tests of Hypotheses about Matrices of Regression Coefficients and Confidence Regions, 337
   8.8. Testing Equality of Means of Several Normal Distributions with Common Covariance Matrix, 342
   8.9. Multivariate Analysis of Variance, 346
   8.10. Some Optimal Properties of Tests, 353
   8.11. Elliptically Contoured Distributions, 370
   Problems, 374

9  Testing Independence of Sets of Variates, 381
   9.1. Introduction, 381
   9.2. The Likelihood Ratio Criterion for Testing Independence of Sets of Variates, 381
   9.3. The Distribution of the Likelihood Ratio Criterion When the Null Hypothesis Is True, 386
   9.4. An Asymptotic Expansion of the Distribution of the Likelihood Ratio Criterion, 390
   9.5. Other Criteria, 391
   9.6. Step-Down Procedures, 393
   9.7. An Example, 396
   9.8. The Case of Two Sets of Variates, 397
   9.9. Admissibility of the Likelihood Ratio Test, 401
   9.10. Monotonicity of Power Functions of Tests of Independence of Sets, 402
   9.11. Elliptically Contoured Distributions, 404
   Problems, 408

10 Testing Hypotheses of Equality of Covariance Matrices and Equality of Mean Vectors and Covariance Matrices, 411
   10.1. Introduction, 411
   10.2. Criteria for Testing Equality of Several Covariance Matrices, 412
   10.3. Criteria for Testing That Several Normal Distributions Are Identical, 415
   10.4. Distributions of the Criteria, 417
   10.5. Asymptotic Expansions of the Distributions of the Criteria, 424
   10.6. The Case of Two Populations, 427
   10.7. Testing the Hypothesis That a Covariance Matrix Is Proportional to a Given Matrix; The Sphericity Test, 431
   10.8. Testing the Hypothesis That a Covariance Matrix Is Equal to a Given Matrix, 438
   10.9. Testing the Hypothesis That a Mean Vector and a Covariance Matrix Are Equal to a Given Vector and Matrix, 444
   10.10. Admissibility of Tests, 446
   10.11. Elliptically Contoured Distributions, 449
   Problems, 454

11 Principal Components, 459
   11.1. Introduction, 459
   11.2. Definition of Principal Components in the Population, 460
   11.3. Maximum Likelihood Estimators of the Principal Components and Their Variances, 467
   11.4. Computation of the Maximum Likelihood Estimates of the Principal Components, 469
   11.5. An Example, 471
   11.6. Statistical Inference, 473
   11.7. Testing Hypotheses about the Characteristic Roots of a Covariance Matrix, 478
   11.8. Elliptically Contoured Distributions, 482
   Problems, 483

12 Canonical Correlations and Canonical Variables, 487
   12.1. Introduction, 487
   12.2. Canonical Correlations and Variates in the Population, 488
   12.3. Estimation of Canonical Correlations and Variates, 498
   12.4. Statistical Inference, 503
   12.5. An Example, 505
   12.6. Linearly Related Expected Values, 508
   12.7. Reduced Rank Regression, 514
   12.8. Simultaneous Equations Models, 515
   Problems, 526

13 The Distributions of Characteristic Roots and Vectors, 528
   13.1. Introduction, 528
   13.2. The Case of Two Wishart Matrices, 529
   13.3. The Case of One Nonsingular Wishart Matrix, 538
   13.4. Canonical Correlations, 543
   13.5. Asymptotic Distributions in the Case of One Wishart Matrix, 545
   13.6. Asymptotic Distributions in the Case of Two Wishart Matrices, 549
   13.7. Asymptotic Distribution in a Regression Model, 555
   13.8. Elliptically Contoured Distributions, 563
   Problems, 567

14 Factor Analysis, 569
   14.1. Introduction, 569
   14.2. The Model, 570
   14.3. Maximum Likelihood Estimators for Random Orthogonal Factors, 576
   14.4. Estimation for Fixed Factors, 586
   14.5. Factor Interpretation and Transformation, 587
   14.6. Estimation for Identification by Specified Zeros, 590
   14.7. Estimation of Factor Scores, 591
   Problems, 593

15 Patterns of Dependence; Graphical Models, 595
   15.1. Introduction, 595
   15.2. Undirected Graphs, 596
   15.3. Directed Graphs, 604
   15.4. Chain Graphs, 610
   15.5. Statistical Inference, 613

Appendix A  Matrix Theory, 624
   A.1. Definition of a Matrix and Operations on Matrices, 624
   A.2. Characteristic Roots and Vectors, 631
   A.3. Partitioned Vectors and Matrices, 635
   A.4. Some Miscellaneous Results, 639
   A.5. Gram-Schmidt Orthogonalization and the Solution of Linear Equations, 647

Appendix B  Tables, 651
   B.1. Wilks' Likelihood Criterion: Factors C(p, m, M) to Adjust to χ²_{p,m}, where M = n - p + 1, 651
   B.2. Tables of Significance Points for the Lawley-Hotelling Trace Test, 657
   B.3. Tables of Significance Points for the Bartlett-Nanda-Pillai Trace Test, 673
   B.4. Tables of Significance Points for the Roy Maximum Root Test, 677
   B.5. Significance Points for the Modified Likelihood Ratio Test of Equality of Covariance Matrices Based on Equal Sample Sizes, 681
   B.6. Correction Factors for Significance Points for the Sphericity Test, 683
   B.7. Significance Points for the Modified Likelihood Ratio Test Σ = Σ₀, 685

References, 687

Index, 713



Preface to the Third Edition

For some forty years the first and second editions of this book have been
used by students to acquire a basic knowledge of the theory and methods of
multivariate statistical analysis. The book has also served a wider community
of statisticians in furthering their understanding and proficiency in this field.
Since the second edition was published, multivariate analysis has been
developed and extended in many directions. Rather than attempting to cover,
or even survey, the enlarged scope, I have elected to elucidate several aspects
that are particularly interesting and useful for methodology and comprehension.
Earlier editions included some methods that could be carried out on an
adding machine! In the twenty-first century, however, computational techniques have become so highly developed and improvements come so rapidly
that it is impossible to include all of the relevant methods in a volume on the
general mathematical theory. Some aspects of statistics exploit computational
power such as the resampling technologies; these are not covered here.
The definition of multivariate statistics implies the treatment of variables
that are interrelated. Several chapters are devoted to measures of correlation
and tests of independence. A new chapter, "Patterns of Dependence; Graphical
Models," has been added. A so-called graphical model is a set of vertices
or nodes identifying observed variables together with a set of edges
suggesting dependences between variables. The algebra of such graphs is an
outgrowth and development of path analysis and the study of causal chains.
A graph may represent a sequence in time or logic and may suggest causation
of one set of variables by another set.
Another new topic systematically presented in the third edition is that of
elliptically contoured distributions. The multivariate normal distribution,
which is characterized by the mean vector and covariance matrix, has a
limitation that the fourth-order moments of the variables are determined by
the first- and second-order moments. The class of elliptically contoured
distributions relaxes this restriction. A density in this class has contours of
equal density which are ellipsoids as does a normal density, but the set of
fourth-order moments has one further degree of freedom. This topic is
expounded by the addition of sections to appropriate chapters.
Reduced rank regression developed in Chapters 12 and 13 provides a
method of reducing the number of regression coefficients to be estimated in
the regression of one set of variables on another. This approach includes the
limited-information maximum-likelihood estimator of an equation in a simultaneous equations model.
The preparation of the third edition has been benefited by advice and
comments of readers of the first and second editions as well as by reviewers
of the current revision. In addition to readers of the earlier editions listed in
those prefaces I want to thank Michael Perlman and Kathy Richards for their
assistance in getting this manuscript ready.


T. W. ANDERSON

Stanford, California
February 2003


Preface to the Second Edition

Twenty-six years have passed since the first edition of this book was published. During that time great advances have been made in multivariate
statistical analysis-particularly in the areas treated in that volume. This new
edition purports to bring the original edition up to date by substantial
revision, rewriting, and additions. The basic approach has been maintained,
namely, a mathematically rigorous development of statistical methods for
observations consisting of several measurements or characteristics of each
subject and a study of their properties. The general outline of topics has been
retained.
The method of maximum likelihood has been augmented by other considerations. In point estimation of the mean vector and covariance matrix
alternatives to the maximum likelihood estimators that are better with
respect to certain loss functions, such as Stein and Bayes estimators, have
been introduced. In testing hypotheses likelihood ratio tests have been
supplemented by other invariant procedures. New results on distributions
and asymptotic distributions are given; some significant points are tabulated.
Properties of these procedures, such as power functions, admissibility, unbiasedness, and monotonicity of power functions, are studied. Simultaneous
confidence intervals for means and covariances are developed. A chapter on
factor analysis replaces the chapter sketching miscellaneous results in the
first edition. Some new topics, including simultaneous equations models and
linear functional relationships, are introduced. Additional problems present
further results.

It is impossible to cover all relevant material in this book; what seems
most important has been included. For a comprehensive listing of papers
until 1966 and books until 1970 the reader is referred to A Bibliography of
Multivariate Statistical Analysis by Anderson, Das Gupta, and Styan (1972).
Further references can be found in Multivariate Analysis: A Selected and



Abstracted Bibliography, 1957-1972 by Subrahmaniam and Subrahmaniam
(1973).
I am in debt to many students, colleagues, and friends for their suggestions
and assistance; they include Yasuo Amemiya, James Berger, Byoung-Seon
Choi, Arthur Cohen, Margery Cruise, Somesh Das Gupta, Kai-Tai Fang,
Gene Golub, Aaron Han, Takeshi Hayakawa, Jogi Henna, Huang Hsu, Fred
Huffer, Mituaki Huzii, Jack Kiefer, Mark Knowles, Sue Leurgans, Alex
McMillan, Masashi No, Ingram Olkin, Kartik Patel, Michael Perlman, Allen
Sampson, Ashis Sen Gupta, Andrew Siegel, Charles Stein, Patrick Strout,
Akimichi Takemura, Joe Verducci, Marlos Viana, and Y. Yajima. I was
helped in preparing the manuscript by Dorothy Anderson, Alice Lundin,
Amy Schwartz, and Pat Struse. Special thanks go to Johanne Thiffault and
George P. H. Styan for their precise attention. Support was contributed by
the Army Research Office, the National Science Foundation, the Office of
Naval Research, and IBM Systems Research Institute.
Seven tables of significance points are given in Appendix B to facilitate
carrying out test procedures. Tables 1, 5, and 7 are Tables 47, 50, and 53,
respectively, of Biometrika Tables for Statisticians, Vol. 2, by E. S. Pearson
and H. O. Hartley; permission of the Biometrika Trustees is hereby acknowledged. Table 2 is made up from three tables prepared by A. W. Davis and
published in Biometrika (1970a), Annals of the Institute of Statistical Mathematics (1970b), and Communications in Statistics, B. Simulation and Computation (1980). Tables 3 and 4 are Tables 6.3 and 6.4, respectively, of Concise
Statistical Tables, edited by Ziro Yamauti (1977) and published by the
Japanese Standards Association; this book is a concise version of Statistical
Tables and Formulas with Computer Applications, JSA-1972. Table 6 is Table 3
of The Distribution of the Sphericity Test Criterion, ARL 72-0154, by B. N.
Nagarsenker and K. C. S. Pillai, Aerospace Research Laboratories (1972).
The author is indebted to the authors and publishers listed above for
permission to reproduce these tables.

T. W. ANDERSON

Stanford, California
June 1984


Preface to the First Edition

This book has been designed primarily as a text for a two-semester course in
multivariate statistics. It is hoped that the book will also serve as an
introduction to many topics in this area to statisticians who are not students
and will be used as a reference by other statisticians.
For several years the book in the form of dittoed notes has been used in a
two-semester sequence of graduate courses at Columbia University; the first
six chapters constituted the text for the first semester, emphasizing correlation theory. It is assumed that the reader is familiar with the usual theory of
univariate statistics, particularly methods based on the univariate normal
distribution. A knowledge of matrix algebra is also a prerequisite; however,
an appendix on this topic has been included.
It is hoped that the more basic and important topics are treated here,

though to some extent the coverage is a matter of taste. Some of the more
recent and advanced developments are only briefly touched on in the later
chapters.
The method of maximum likelihood is used to a large extent. This leads to
reasonable procedures; in some cases it can be proved that they are optimal.
In many situations, however, the theory of desirable or optimum procedures
is lacking.
Over the years this manuscript has been developed, a number of students
and colleagues have been of considerable assistance. Allan Birnbaum, Harold
Hotelling, Jacob Horowitz, Howard Levene, Ingram Olkin, Gobind Seth,
Charles Stein, and Henry Teicher are to be mentioned particularly. Acknowledgements are also due to other members of the Graduate Mathematical


Statistics Society at Columbia University for aid in the preparation of the
manuscript in dittoed form. The preparation of this manuscript was supported in part by the Office of Naval Research.

T. W. ANDERSON

Center for Advanced Study
in the Behavioral Sciences
Stanford, California
December 1957



CHAPTER 1

Introduction

1.1. MULTIVARIATE STATISTICAL ANALYSIS
Multivariate statistical analysis is concerned with data that consist of sets of
measurements on a number of individuals or objects. The sample data may
be heights and weights of some individuals drawn randomly from a population of school children in a given city, or the statistical treatment may be
made on a collection of measurements, such as lengths and widths of petals
and lengths and widths of sepals of iris plants taken from two species, or one
may study the scores on batteries of mental tests administered to a number of
students.
The measurements made on a single individual can be assembled into a
column vector. We think of the entire vector as an observation from a
multivariate population or distribution. When the individual is drawn randomly, we consider the vector as a random vector with a distribution or
probability law describing that population. The set of observations on all
individuals in a sample constitutes a sample of vectors, and the vectors set
side by side make up the matrix of observations.† The data to be analyzed
then are thought of as displayed in a matrix or in several matrices.
We shall see that it is helpful in visualizing the data and understanding the
methods to think of each observation vector as constituting a point in a
Euclidean space, each coordinate corresponding to a measurement or variable. Indeed, an early step in the statistical analysis is plotting the data; since
†When data are listed on paper by individual, it is natural to print the measurements on one
individual as a row of the table; then one individual corresponds to a row vector. Since we prefer
to operate algebraically with column vectors, we have chosen to treat observations in terms of
column vectors. (In practice, the basic data set may well be on cards, tapes, or disks.)




most statisticians are limited to two-dimensional plots, two coordinates of the
observation are plotted in turn.
Characteristics of a univariate distribution of essential interest are the
mean as a measure of location and the standard deviation as a measure of
variability; similarly the mean and standard deviation of a univariate sample
are important summary measures. In multivariate analysis, the means and
variances of the separate measurements-for distributions and for samples
-have corresponding relevance. An essential aspect, however, of multivariate analysis is the dependence between the different variables. The dependence between two variables may involve the covariance between them, that
is, the average products of their deviations from their respective means. The
covariance standardized by the corresponding standard deviations is the
correlation coefficient; it serves as a measure of degree of dependence. A set
of summary statistics is the mean vector (consisting of the univariate means)
and the covariance matrix (consisting of the univariate variances and bivariate covariances). An alternative set of summary statistics with the same
information is the mean vector, the set of standard deviations, and the
correlation matrix. Similar parameter quantities describe location, variability,
and dependence in the population or for a probability distribution. The
multivariate normal distribution is completely determined by its mean vector
and covariance matrix, and the sample mean vector and covariance matrix
constitute a sufficient set of statistics.
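
As a concrete sketch (not from the text), the two equivalent summary sets just described can be computed as follows with NumPy; the 3 × 50 data matrix is simulated and purely hypothetical, and observations are column vectors per the convention above.

```python
import numpy as np

# Hypothetical data matrix X: p rows (variables) by N columns (observations),
# following the book's convention of observation column vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))

x_bar = X.mean(axis=1)       # mean vector of the p univariate means
S = np.cov(X)                # p x p sample covariance matrix (divisor N - 1);
                             # np.cov treats rows as variables by default
d = np.sqrt(np.diag(S))      # univariate standard deviations
R = S / np.outer(d, d)       # correlation matrix: r_ij = s_ij / (s_i * s_j)
```

The pair (x_bar, S) carries the same information as the triple (x_bar, d, R), matching the two alternative sets of summary statistics described above.
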
The measurement and analysis of dependence between variables, between
sets of variables, and between variables and sets of variables are fundamental
to multivariate analysis. The multiple correlation coefficient is an extension
of the notion of correlation to the relationship of one variable to a set of
variables. The partial correlation coefficient is a measure of dependence

between two variables when the effects of other correlated variables have
been removed. The various correlation coefficients computed from samples
are used to estimate corresponding correlation coefficients of distributions.
In this book tests of hypotheses of independence are developed. The properties of the estimators and test procedures are studied for sampling from the
multivariate normal distribution.
A number of statistical problems arising in multivariate populations are
straightforward analogs of problems arising in univariate populations; the
suitable methods for handling these problems are similarly related. For
example, in the univariate case we may wish to test the hypothesis that the
mean of a variable is zero; in the multivariate case we may wish to test the
hypothesis that the vector of the means of several variables is the zero vector.
The analog of the Student t-test for the first hypothesis is the generalized
T²-test. The analysis of variance of a single variable is adapted to vector



observations; in regression analysis, the dependent quantity may be a vector
variable. A comparison of variances is generalized into a comparison of
covariance matrices.
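
As an illustration of the t-test analogy (a minimal sketch, not the book's own code; the function name, SciPy dependency, and row-wise data layout are assumptions), the one-sample generalized T²-statistic and its p-value can be computed as follows, using the standard null relation T²(N - p)/(p(N - 1)) ~ F with p and N - p degrees of freedom, which the book derives in Chapter 5.

```python
import numpy as np
from scipy import stats

def hotelling_t2(X, mu0):
    """One-sample Hotelling T^2 test of H0: population mean vector = mu0.

    X: N x p array with rows as observations (hypothetical input layout).
    Returns the statistic T^2 and its p-value under multivariate normality.
    """
    N, p = X.shape
    x_bar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)               # sample covariance, divisor N - 1
    diff = x_bar - np.asarray(mu0)
    T2 = N * diff @ np.linalg.solve(S, diff)  # N (xbar - mu0)' S^{-1} (xbar - mu0)
    # Under H0, T^2 (N - p) / (p (N - 1)) follows an F(p, N - p) distribution.
    F_stat = T2 * (N - p) / (p * (N - 1))
    return T2, stats.f.sf(F_stat, p, N - p)
```

With p = 1 this reduces to the square of the usual Student t statistic, which is exactly the univariate analogy described above.
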
The test procedures of univariate statistics are generalized to the multivariate case in such ways that the dependence between variables is taken into
account. These methods may not depend on the coordinate system; that is,
the procedures may be invariant with respect to linear transformations that
leave the null hypothesis invariant. In some problems there may be families
of tests that are invariant; then choices must be made. Optimal properties of
the tests are considered.
For some other purposes, however, it may be important to select a
coordinate system so that the variates have desired statistical properties. One

might say that they involve characterizations of inherent properties of normal
distributions and of samples. These are closely related to the algebraic
problems of canonical forms of matrices. An example is finding the normalized linear combination of variables with maximum or minimum variance
(finding principal components); this amounts to finding a rotation of axes
that carries the covariance matrix to diagonal form. Another example is
characterizing the dependence between two sets of variates (finding canonical correlations). These problems involve the characteristic roots and vectors
of various matrices. The statistical properties of the corresponding sample
quantities are treated.
Some statistical problems arise in models in which means and covariances
are restricted. Factor analysis may be based on a model with a (population)
covariance matrix that is the sum of a positive definite diagonal matrix and a
positive semidefinite matrix of low rank; linear structural relationships may
have a similar formulation. The simultaneous equations system of econometrics is another example of a special model.

1.2. THE MULTIVARIATE NORMAL DISTRIBUTION
The statistical methods treated in this book can be developed and evaluated
in the context of the multivariate normal distribution, though many of the
procedures are useful and effective when the distribution sampled is not
normal. A major reason for basing statistical analysis on the normal distribution is that this probabilistic model approximates well the distribution of
continuous measurements in many sampled populations. In fact, most of the
methods and theory have been developed to serve statistical analysis of data.
Mathematicians such as Adrian (1808), Laplace (1811), Plana (1813), Gauss



(1823), and Bravais (1846) studied the bivariate normal density. Francis


Galton, the geneticist, introduced the ideas of correlation, regression, and
homoscedasticity in the study of pairs of measurements, one made on a
parent and one on an offspring. [See, e.g., Galton (1889).] He enunciated the
theory of the multivariate normal distribution as a generalization of observed
properties of samples.
Karl Pearson and others carried on the development of the theory and use
of different kinds of correlation coefficients† for studying problems in genetics, biology, and other fields. R. A. Fisher further developed methods for
agriculture, botany, and anthropology, including the discriminant function for
classification problems. In another direction, analysis of scores on mental
tests led to a theory, including factor analysis, the sampling theory of which is
based on the normal distribution. In these cases, as well as in agricultural
experiments, in engineering problems, in certain economic problems, and in
other fields, the multivariate normal distributions have been found to be
sufficiently close approximations to the populations so that statistical analyses based on these models are justified.
The univariate normal distribution arises frequently because the effect
studied is the sum of many independent random effects. Similarly, the
multivariate normal distribution often occurs because the multiple measurements are sums of small independent effects. Just as the central limit
theorem leads to the univariate normal distribution for single variables, so
does the general central limit theorem for several variables lead to the
multivariate normal distribution.
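
A small simulation illustrates this effect (illustrative only, not from the text; the uniform shocks and the mixing matrix are arbitrary assumptions): summing many independent non-normal random vectors produces a vector whose distribution is approximately multivariate normal.

```python
import numpy as np

rng = np.random.default_rng(1)
n_effects, n_obs = 200, 10_000

# Each observation is the scaled sum of 200 independent uniform 2-vectors,
# mixed so that the two coordinates of the sum are correlated.
shocks = rng.uniform(-1.0, 1.0, size=(n_obs, n_effects, 2))
mix = np.array([[1.0, 0.4],
                [0.0, 1.0]])                  # hypothetical mixing matrix
Y = shocks.sum(axis=1) @ mix / np.sqrt(n_effects)

# The sample covariance approaches (1/3) mix' mix, since Var U(-1, 1) = 1/3,
# and a histogram of each coordinate of Y is very close to a normal curve.
print(np.cov(Y, rowvar=False))
```
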
Statistical theory based on the normal distribution has the advantage that
the multivariate methods based on it are extensively developed and can be
studied in an organized and systematic way. This is due not only to the need
for such methods because they are of practical use, but also to the fact that
normal theory is amenable to exact mathematical treatment. The suitable
methods of analysis are mainly based on standard operations of matrix
algebra; the distributions of many statistics involved can be obtained exactly
or at least characterized; and in many cases optimum properties of procedures can be deduced.
The point of view in this book is to state problems of inference in terms of
the multivariate normal distributions, develop efficient and often optimum

methods in this context, and evaluate significance and confidence levels in
these terms. This approach gives coherence and rigor to the exposition, but,
by its very nature, cannot exhaust consideration of multivariate statistical
analysis. The procedures are appropriate to many nonnormal distributions,
†For a detailed study of the development of the ideas of correlation, see Walker (1931).



but their adequacy may be open to question. Roughly speaking, inferences
about means are robust because of the operation of the central limit
theorem, but inferences about covariances are sensitive to normality, the
variability of sample covariances depending on fourth-order moments.
This inflexibility of normal methods with respect to moments of order
greater than two can be reduced by including a larger class of elliptically
contoured distributions. In the univariate case the normal distribution is
determined by the mean and variance; higher-order moments and properties
such as peakedness and long tails are functions of the mean and variance.
Similarly, in the multivariate case the means and covariances or the means,
variances, and correlations determine all of the properties of the distribution.
That limitation is alleviated in one respect by consideration of a broad class
of elliptically contoured distributions. That class maintains the dependence
structure, but permits more general peakedness and long tails. This study
leads to more robust methods.
The development of computer technology has revolutionized multivariate
statistics in several respects. As in univariate statistics, modern computers

permit the evaluation of observed variability and significance of results by
resampling methods, such as the bootstrap and cross-validation. Such
methodology reduces the reliance on tables of significance points as well as
eliminates some restrictions of the normal distribution.
Nonparametric techniques are available when nothing is known about the
underlying distributions. Space does not permit inclusion of these topics as
well as other considerations of data analysis, such as treatment of outliers
and transformations of variables to approximate normality and homoscedasticity.

The availability of modern computer facilities makes possible the analysis
of large data sets and that ability permits the application of multivariate
methods to new areas, such as image analysis, and more effective analysis of
data, such as meteorological. Moreover, new problems of statistical analysis
arise, such as sparseness of parameter or data matrices. Because hardware
and software development is so explosive and programs require specialized
knowledge, we are content to make a few remarks here and there about
computation. Packages of statistical programs are available for most of the
methods.


CHAPTER 2

The Multivariate
Normal Distribution

2.1. INTRODUCTION
In this chapter we discuss the multivariate normal distribution and some of
its properties. In Section 2.2 are considered the fundamental notions of
multivariate distributions: the definition by means of multivariate density
functions, marginal distributions, conditional distributions, expected values,

and moments. In Section 2.3 the multivariate normal distribution is defined;
the parameters are shown to be the means, variances, and covariances or the
means, variances, and correlations of the components of the random vector.
In Section 2.4 it is shown that linear combinations of normal variables are
normally distributed and hence that marginal distributions are normal. In
Section 2.5 we see that conditional distributions are also normal with means
that are linear functions of the conditioning variables; the coefficients are
regression coefficients. The variances, covariances, and correlations-called
partial correlations-are constants. The multiple correlation coefficient is
the maximum correlation between a scalar random variable and linear
combination of other random variables; it is a measure of association between one variable and a set of others. The fact that marginal and conditional distributions of normal distributions are normal makes the treatment
of this family of distributions coherent. In Section 2.6 the characteristic
function, moments, and cumulants are discussed. In Section 2.7 elliptically
contoured distributions are defined; the properties of the normal distribution
are extended to this larger class of distributions.





2.2. NOTIONS OF MULTIVARIATE DISTRIBUTIONS
2.2.1. Joint Distributions
In this section we shall consider the notions of joint distributions of several
variables, derived marginal distributions of subsets of variables, and derived
conditional distributions. First consider the case of two (real) random
variables† X and Y. Probabilities of events defined in terms of these variables
can be obtained by operations involving the cumulative distribution function
(abbreviated as cdf),

(1)    F(x, y) = \Pr\{X \le x,\ Y \le y\},

defined for every pair of real numbers (x, y). We are interested in cases
where F(x, y) is absolutely continuous; this means that the following partial
derivative exists almost everywhere:
(2)    \frac{\partial^2 F(x, y)}{\partial x \, \partial y} = f(x, y),
and

(3)    F(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(u, v) \, du \, dv.

The nonnegative function f(x, y) is called the density of X and Y. The pair
of random variables (X, Y) defines a random point in a plane. The probability that (X, Y) falls in a rectangle is
(4)    \Pr\{x \le X \le x + \Delta x,\ y \le Y \le y + \Delta y\}
           = F(x + \Delta x, y + \Delta y) - F(x + \Delta x, y) - F(x, y + \Delta y) + F(x, y)
           = \int_{y}^{y + \Delta y} \int_{x}^{x + \Delta x} f(u, v) \, du \, dv
(\Delta x > 0,\ \Delta y > 0). The probability of the random point (X, Y) falling in any
set E for which the following integral is defined (that is, any measurable set
E) is


(5)    \Pr\{(X, Y) \in E\} = \iint_{E} f(x, y) \, dx \, dy.

†In Chapter 2 we shall distinguish between random variables and running variables by use of
capital and lowercase letters, respectively. In later chapters we may be unable to hold to this
convention because of other complications of notation.
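
As a numerical check of identities (4) and (5) (a sketch, not from the text; it uses a bivariate normal density, which is only introduced in Section 2.3, and the distribution parameters, rectangle corners, and grid size are all arbitrary assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical absolutely continuous bivariate distribution, with cdf F and
# density f provided by scipy's frozen multivariate normal object.
dist = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]])
x, y, dx, dy = 0.2, -0.3, 0.7, 0.5

# Rectangle probability via the inclusion-exclusion identity (4) in F.
rect = (dist.cdf([x + dx, y + dy]) - dist.cdf([x + dx, y])
        - dist.cdf([x, y + dy]) + dist.cdf([x, y]))

# The same probability as the integral (5) of the density f over the
# rectangle E, approximated by a Riemann sum on a 200 x 200 grid.
u = np.linspace(x, x + dx, 200)
v = np.linspace(y, y + dy, 200)
U, V = np.meshgrid(u, v)
riemann = dist.pdf(np.dstack([U, V])).sum() * (u[1] - u[0]) * (v[1] - v[0])

print(rect, riemann)   # the two estimates agree closely
```
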

