

APPLIED MULTIVARIATE STATISTICS
FOR THE SOCIAL SCIENCES

Now in its 6th edition, the authoritative textbook Applied Multivariate Statistics for
the Social Sciences continues to provide advanced students with a practical and conceptual understanding of statistical procedures through examples and data sets from
actual research studies. With the added expertise of co-author Keenan Pituch (University of Texas at Austin), this 6th edition retains many key features of the previous editions, including its breadth and depth of coverage, a review chapter on matrix algebra,
applied coverage of MANOVA, and emphasis on statistical power. In this new edition,
the authors continue to provide practical guidelines for checking the data, assessing
assumptions, and interpreting and reporting results to help students analyze data from
their own research confidently and professionally.
Features new to this edition include:
• NEW chapter on Logistic Regression (Ch. 11) that helps readers understand and
use this very flexible and widely used procedure
• NEW chapter on Multivariate Multilevel Modeling (Ch. 14) that helps readers
understand the benefits of this “newer” procedure and how it can be used in conventional and multilevel settings
• NEW Example Results Section write-ups that illustrate how results should be presented in research papers and journal articles
• NEW coverage of missing data (Ch. 1) to help students understand and address
problems associated with incomplete data
• Completely rewritten chapters on Exploratory Factor Analysis (Ch. 9), Hierarchical Linear Modeling (Ch. 13), and Structural Equation Modeling (Ch. 16) with
increased focus on understanding models and interpreting results
• NEW analysis summaries, inclusion of more syntax explanations, and reduction
in the number of SPSS/SAS dialogue boxes to guide students through data analysis in a more streamlined and direct approach
• Updated syntax to reflect the newest versions of IBM SPSS (21)/SAS (9.3)
• A free online resources site, www.routledge.com/9780415836661, with data sets
and syntax from the text, additional data sets, and instructor’s resources (including
PowerPoint lecture slides for select chapters, a conversion guide for 5th edition
adopters, and answers to exercises).
Ideal for advanced graduate-level courses in education, psychology, and other social
sciences in which multivariate statistics, advanced statistics, or quantitative techniques
are taught, this book also appeals to practicing researchers as a valuable reference. Prerequisites include a course on factorial ANOVA and ANCOVA; however, a
working knowledge of matrix algebra is not assumed.
Keenan Pituch is Associate Professor in the Quantitative Methods Area of the Department of Educational Psychology at the University of Texas at Austin.
James P. Stevens is Professor Emeritus at the University of Cincinnati.


APPLIED MULTIVARIATE
STATISTICS FOR THE
SOCIAL SCIENCES
Analyses with SAS and
IBM’s SPSS
Sixth edition

Keenan A. Pituch and James P. Stevens


Sixth edition published 2016

by Routledge
711 Third Avenue, New York, NY 10017
and by Routledge
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2016 Taylor & Francis

The right of Keenan A. Pituch and James P. Stevens to be identified as authors of this work has
been asserted by them in accordance with sections 77 and 78 of the Copyright, Designs and Patents
Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form

or by any electronic, mechanical, or other means, now known or hereafter invented, including
photocopying and recording, or in any information storage or retrieval system, without permission
in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Fifth edition published by Routledge 2009
Library of Congress Cataloging-in-Publication Data
Pituch, Keenan A.
  Applied multivariate statistics for the social sciences / Keenan A. Pituch and James
P. Stevens –– 6th edition.
    pages cm
  Previous edition by James P. Stevens.
  Includes index.
  1. Multivariate analysis. 2. Social sciences––Statistical methods. I. Stevens, James (James
Paul) II. Title.
  QA278.S74 2015
  519.5'350243––dc23
  2015017536
ISBN 13: 978-0-415-83666-1 (pbk)
ISBN 13: 978-0-415-83665-4 (hbk)
ISBN 13: 978-1-315-81491-9 (ebk)
Typeset in Times New Roman
by Apex CoVantage, LLC
Commissioning Editor: Debra Riegert
Textbook Development Manager: Rebecca Pearce
Project Manager: Sheri Sipka
Production Editor: Alf Symons
Cover Design: Nigel Turner
Companion Website Manager: Natalya Dyer
Copyeditor: Apex CoVantage, LLC



Keenan would like to dedicate this:
To his Wife: Elizabeth and
To his Children: Joseph and Alexis
Jim would like to dedicate this:
To his Grandsons: Henry and Killian and
To his Granddaughter: Fallon




CONTENTS

Preface  xv

1. Introduction  1
   1.1 Introduction  1
   1.2 Type I Error, Type II Error, and Power  3
   1.3 Multiple Statistical Tests and the Probability of Spurious Results  6
   1.4 Statistical Significance Versus Practical Importance  10
   1.5 Outliers  12
   1.6 Missing Data  18
   1.7 Unit or Participant Nonresponse  31
   1.8 Research Examples for Some Analyses Considered in This Text  32
   1.9 The SAS and SPSS Statistical Packages  35
   1.10 SAS and SPSS Syntax  35
   1.11 SAS and SPSS Syntax and Data Sets on the Internet  36
   1.12 Some Issues Unique to Multivariate Analysis  36
   1.13 Data Collection and Integrity  37
   1.14 Internal and External Validity  39
   1.15 Conflict of Interest  40
   1.16 Summary  40
   1.17 Exercises  41

2. Matrix Algebra  44
   2.1 Introduction  44
   2.2 Addition, Subtraction, and Multiplication of a Matrix by a Scalar  47
   2.3 Obtaining the Matrix of Variances and Covariances  50
   2.4 Determinant of a Matrix  52
   2.5 Inverse of a Matrix  55
   2.6 SPSS Matrix Procedure  58
   2.7 SAS IML Procedure  60
   2.8 Summary  61
   2.9 Exercises  61

3. Multiple Regression for Prediction  65
   3.1 Introduction  65
   3.2 Simple Regression  67
   3.3 Multiple Regression for Two Predictors: Matrix Formulation  69
   3.4 Mathematical Maximization Nature of Least Squares Regression  72
   3.5 Breakdown of Sum of Squares and F Test for Multiple Correlation  73
   3.6 Relationship of Simple Correlations to Multiple Correlation  75
   3.7 Multicollinearity  75
   3.8 Model Selection  77
   3.9 Two Computer Examples  82
   3.10 Checking Assumptions for the Regression Model  93
   3.11 Model Validation  96
   3.12 Importance of the Order of the Predictors  101
   3.13 Other Important Issues  104
   3.14 Outliers and Influential Data Points  107
   3.15 Further Discussion of the Two Computer Examples  116
   3.16 Sample Size Determination for a Reliable Prediction Equation  121
   3.17 Other Types of Regression Analysis  124
   3.18 Multivariate Regression  124
   3.19 Summary  128
   3.20 Exercises  129

4. Two-Group Multivariate Analysis of Variance  142
   4.1 Introduction  142
   4.2 Four Statistical Reasons for Preferring a Multivariate Analysis  143
   4.3 The Multivariate Test Statistic as a Generalization of the Univariate t Test  144
   4.4 Numerical Calculations for a Two-Group Problem  146
   4.5 Three Post Hoc Procedures  150
   4.6 SAS and SPSS Control Lines for Sample Problem and Selected Output  152
   4.7 Multivariate Significance but No Univariate Significance  156
   4.8 Multivariate Regression Analysis for the Sample Problem  156
   4.9 Power Analysis  161
   4.10 Ways of Improving Power  163
   4.11 A Priori Power Estimation for a Two-Group MANOVA  165
   4.12 Summary  169
   4.13 Exercises  170

5. K-Group MANOVA: A Priori and Post Hoc Procedures  175
   5.1 Introduction  175
   5.2 Multivariate Regression Analysis for a Sample Problem  176
   5.3 Traditional Multivariate Analysis of Variance  177
   5.4 Multivariate Analysis of Variance for Sample Data  179
   5.5 Post Hoc Procedures  184
   5.6 The Tukey Procedure  187
   5.7 Planned Comparisons  193
   5.8 Test Statistics for Planned Comparisons  196
   5.9 Multivariate Planned Comparisons on SPSS MANOVA  198
   5.10 Correlated Contrasts  204
   5.11 Studies Using Multivariate Planned Comparisons  208
   5.12 Other Multivariate Test Statistics  210
   5.13 How Many Dependent Variables for a MANOVA?  211
   5.14 Power Analysis—A Priori Determination of Sample Size  211
   5.15 Summary  213
   5.16 Exercises  214

6. Assumptions in MANOVA  219
   6.1 Introduction  219
   6.2 ANOVA and MANOVA Assumptions  220
   6.3 Independence Assumption  220
   6.4 What Should Be Done With Correlated Observations?  222
   6.5 Normality Assumption  224
   6.6 Multivariate Normality  225
   6.7 Assessing the Normality Assumption  226
   6.8 Homogeneity of Variance Assumption  232
   6.9 Homogeneity of the Covariance Matrices  233
   6.10 Summary  240
   6.11 Complete Three-Group MANOVA Example  242
   6.12 Example Results Section for One-Way MANOVA  249
   6.13 Analysis Summary  250
   Appendix 6.1 Analyzing Correlated Observations  255
   Appendix 6.2 Multivariate Test Statistics for Unequal Covariance Matrices  259
   6.14 Exercises  262

7. Factorial ANOVA and MANOVA  265
   7.1 Introduction  265
   7.2 Advantages of a Two-Way Design  266
   7.3 Univariate Factorial Analysis  268
   7.4 Factorial Multivariate Analysis of Variance  277
   7.5 Weighting of the Cell Means  280
   7.6 Analysis Procedures for Two-Way MANOVA  280
   7.7 Factorial MANOVA With SeniorWISE Data  281
   7.8 Example Results Section for Factorial MANOVA With SeniorWISE Data  290
   7.9 Three-Way MANOVA  292
   7.10 Factorial Descriptive Discriminant Analysis  294
   7.11 Summary  298
   7.12 Exercises  299

8. Analysis of Covariance  301
   8.1 Introduction  301
   8.2 Purposes of ANCOVA  302
   8.3 Adjustment of Posttest Means and Reduction of Error Variance  303
   8.4 Choice of Covariates  307
   8.5 Assumptions in Analysis of Covariance  308
   8.6 Use of ANCOVA With Intact Groups  311
   8.7 Alternative Analyses for Pretest–Posttest Designs  312
   8.8 Error Reduction and Adjustment of Posttest Means for Several Covariates  314
   8.9 MANCOVA—Several Dependent Variables and Several Covariates  315
   8.10 Testing the Assumption of Homogeneous Hyperplanes on SPSS  316
   8.11 Effect Size Measures for Group Comparisons in MANCOVA/ANCOVA  317
   8.12 Two Computer Examples  318
   8.13 Note on Post Hoc Procedures  329
   8.14 Note on the Use of MVMM  330
   8.15 Example Results Section for MANCOVA  330
   8.16 Summary  332
   8.17 Analysis Summary  333
   8.18 Exercises  335

9. Exploratory Factor Analysis  339
   9.1 Introduction  339
   9.2 The Principal Components Method  340
   9.3 Criteria for Determining How Many Factors to Retain Using Principal Components Extraction  342
   9.4 Increasing Interpretability of Factors by Rotation  344
   9.5 What Coefficients Should Be Used for Interpretation?  346
   9.6 Sample Size and Reliable Factors  347
   9.7 Some Simple Factor Analyses Using Principal Components Extraction  347
   9.8 The Communality Issue  359
   9.9 The Factor Analysis Model  360
   9.10 Assumptions for Common Factor Analysis  362
   9.11 Determining How Many Factors Are Present With Principal Axis Factoring  364
   9.12 Exploratory Factor Analysis Example With Principal Axis Factoring  365
   9.13 Factor Scores  373
   9.14 Using SPSS in Factor Analysis  376
   9.15 Using SAS in Factor Analysis  378
   9.16 Exploratory and Confirmatory Factor Analysis  382
   9.17 Example Results Section for EFA of Reactions-to-Tests Scale  383
   9.18 Summary  385
   9.19 Exercises  387

10. Discriminant Analysis  391
   10.1 Introduction  391
   10.2 Descriptive Discriminant Analysis  392
   10.3 Dimension Reduction Analysis  393
   10.4 Interpreting the Discriminant Functions  395
   10.5 Minimum Sample Size  396
   10.6 Graphing the Groups in the Discriminant Plane  397
   10.7 Example With SeniorWISE Data  398
   10.8 National Merit Scholar Example  409
   10.9 Rotation of the Discriminant Functions  415
   10.10 Stepwise Discriminant Analysis  415
   10.11 The Classification Problem  416
   10.12 Linear Versus Quadratic Classification Rule  425
   10.13 Characteristics of a Good Classification Procedure  425
   10.14 Analysis Summary of Descriptive Discriminant Analysis  426
   10.15 Example Results Section for Discriminant Analysis of the National Merit Scholar Example  427
   10.16 Summary  429
   10.17 Exercises  429

11. Binary Logistic Regression  434
   11.1 Introduction  434
   11.2 The Research Example  435
   11.3 Problems With Linear Regression Analysis  436
   11.4 Transformations and the Odds Ratio With a Dichotomous Explanatory Variable  438
   11.5 The Logistic Regression Equation With a Single Dichotomous Explanatory Variable  442
   11.6 The Logistic Regression Equation With a Single Continuous Explanatory Variable  443
   11.7 Logistic Regression as a Generalized Linear Model  444
   11.8 Parameter Estimation  445
   11.9 Significance Test for the Entire Model and Sets of Variables  447
   11.10 McFadden’s Pseudo R-Square for Strength of Association  448
   11.11 Significance Tests and Confidence Intervals for Single Variables  450
   11.12 Preliminary Analysis  451
   11.13 Residuals and Influence  451
   11.14 Assumptions  453
   11.15 Other Data Issues  457
   11.16 Classification  458
   11.17 Using SAS and SPSS for Multiple Logistic Regression  461
   11.18 Using SAS and SPSS to Implement the Box–Tidwell Procedure  463
   11.19 Example Results Section for Logistic Regression With Diabetes Prevention Study  465
   11.20 Analysis Summary  466
   11.21 Exercises  468

12. Repeated-Measures Analysis  471
   12.1 Introduction  471
   12.2 Single-Group Repeated Measures  475
   12.3 The Multivariate Test Statistic for Repeated Measures  477
   12.4 Assumptions in Repeated-Measures Analysis  480
   12.5 Computer Analysis of the Drug Data  482
   12.6 Post Hoc Procedures in Repeated-Measures Analysis  487
   12.7 Should We Use the Univariate or Multivariate Approach?  488
   12.8 One-Way Repeated Measures—A Trend Analysis  489
   12.9 Sample Size for Power = .80 in Single-Sample Case  494
   12.10 Multivariate Matched-Pairs Analysis  496
   12.11 One-Between and One-Within Design  497
   12.12 Post Hoc Procedures for the One-Between and One-Within Design  505
   12.13 One-Between and Two-Within Factors  511
   12.14 Two-Between and One-Within Factors  515
   12.15 Two-Between and Two-Within Factors  517
   12.16 Totally Within Designs  518
   12.17 Planned Comparisons in Repeated-Measures Designs  520
   12.18 Profile Analysis  524
   12.19 Doubly Multivariate Repeated-Measures Designs  528
   12.20 Summary  529
   12.21 Exercises  530

13. Hierarchical Linear Modeling  537
   13.1 Introduction  537
   13.2 Problems Using Single-Level Analyses of Multilevel Data  539
   13.3 Formulation of the Multilevel Model  541
   13.4 Two-Level Model—General Formation  541
   13.5 Example 1: Examining School Differences in Mathematics  545
   13.6 Centering Predictor Variables  563
   13.7 Sample Size  568
   13.8 Example 2: Evaluating the Efficacy of a Treatment  569
   13.9 Summary  576

14. Multivariate Multilevel Modeling  578
   14.1 Introduction  578
   14.2 Benefits of Conducting a Multivariate Multilevel Analysis  579
   14.3 Research Example  580
   14.4 Preparing a Data Set for MVMM Using SAS and SPSS  581
   14.5 Incorporating Multiple Outcomes in the Level-1 Model  584
   14.6 Example 1: Using SAS and SPSS to Conduct Two-Level Multivariate Analysis  585
   14.7 Example 2: Using SAS and SPSS to Conduct Three-Level Multivariate Analysis  595
   14.8 Summary  614
   14.9 SAS and SPSS Commands Used to Estimate All Models in the Chapter  615

15. Canonical Correlation  618
   15.1 Introduction  618
   15.2 The Nature of Canonical Correlation  619
   15.3 Significance Tests  620
   15.4 Interpreting the Canonical Variates  621
   15.5 Computer Example Using SAS CANCORR  623
   15.6 A Study That Used Canonical Correlation  625
   15.7 Using SAS for Canonical Correlation on Two Sets of Factor Scores  628
   15.8 The Redundancy Index of Stewart and Love  630
   15.9 Rotation of Canonical Variates  631
   15.10 Obtaining More Reliable Canonical Variates  632
   15.11 Summary  632
   15.12 Exercises  634

16. Structural Equation Modeling  639
   16.1 Introduction  639
   16.2 Notation, Terminology, and Software  639
   16.3 Causal Inference  642
   16.4 Fundamental Topics in SEM  643
   16.5 Three Principal SEM Techniques  663
   16.6 Observed Variable Path Analysis  663
   16.7 Observed Variable Path Analysis With the Mueller Study  668
   16.8 Confirmatory Factor Analysis  689
   16.9 CFA With Reactions-to-Tests Data  691
   16.10 Latent Variable Path Analysis  707
   16.11 Latent Variable Path Analysis With Exercise Behavior Study  711
   16.12 SEM Considerations  719
   16.13 Additional Models in SEM  724
   16.14 Final Thoughts  726
   Appendix 16.1 Abbreviated SAS Output for Final Observed Variable Path Model  734
   Appendix 16.2 Abbreviated SAS Output for the Final Latent Variable Path Model for Exercise Behavior  736

Appendix A: Statistical Tables  747

Appendix B: Obtaining Nonorthogonal Contrasts in Repeated Measures Designs  763

Detailed Answers  771

Index  785


PREFACE

The first five editions of this text have been received warmly, and we are grateful for
that.
This edition, like previous editions, is written for those who use, rather than develop,
advanced statistical methods. The focus is on conceptual understanding rather than
proving results. The narrative and many examples are there to promote understanding,
and a chapter on matrix algebra is included for those who need the extra help. Throughout the book, you will find output from SPSS (version 21) and SAS (version 9.3) with
interpretations. These interpretations are intended to demonstrate what analysis results
mean in the context of a research example and to help you interpret analysis results
properly. In addition to demonstrating how to use the statistical programs effectively,
our goal is to show you the importance of examining data, assessing statistical assumptions, and attending to sample size issues so that the results are generalizable. The
text also includes end-of-chapter exercises for many chapters, which are intended to
promote better understanding of concepts and to give you additional practice in
conducting analyses and interpreting results. Detailed answers to the odd-numbered
exercises are included in the back of the book so you can check your work.
NEW TO THIS EDITION
Many changes were made in this edition, including a new lead author. In 2012,
Dr. Keenan Pituch of the University of Texas at Austin, along with Dr. James Stevens,
developed a plan to revise this edition and began work. The goals in revising the text
were to provide more guidance on practical matters related to data analysis, update
the text in terms of the statistical procedures used, and firmly align those procedures
with findings from methodological research.
Key changes to this edition are:
• Inclusion of analysis summaries and example results sections
• Focus on just two software programs (SPSS version 21 and SAS version 9.3)


xvi

↜渀屮

↜渀屮 Preface

• New chapters on Binary Logistic Regression (Chapter 11) and Multivariate Multilevel Modeling (Chapter 14)
• Completely rewritten chapters on structural equation modeling (SEM), exploratory factor analysis, and hierarchical linear modeling.
ANALYSIS SUMMARIES AND EXAMPLE RESULTS SECTIONS
The analysis summaries provide a convenient guide for the analysis activities we generally recommend you use when conducting data analysis. Of course, to carry out these
activities in a meaningful way, you have to understand the underlying statistical concepts—something that we continue to promote in this edition. The analysis summaries and example results sections will also help you tie together the analysis activities
involved for a given procedure and illustrate how you may effectively communicate
analysis results.
The analysis summaries and example results sections are provided for several techniques.
Specifically, they are provided and applied to examples for the following procedures:
one-way MANOVA (sections 6.11–6.13), two-way MANOVA (sections 7.6–7.8), one-way
MANCOVA (example 8.4 and sections 8.15 and 8.17), exploratory factor analysis
(sections 9.12, 9.17, and 9.18), discriminant analysis (sections 10.7.1, 10.7.2, 10.8,
10.14, and 10.15), and binary logistic regression (sections 11.19 and 11.20).
FOCUS ON SPSS AND SAS
Another change that has been implemented throughout the text is to focus the use of
software on two programs: SPSS (version 21) and SAS (version 9.3). Previous editions of this text, particularly for hierarchical linear modeling (HLM) and structural
equation modeling applications, have introduced additional programs for these purposes. However, in this edition, we use only SPSS and SAS because these programs
have improved capability to model data from more complex designs, and reviewers
of this edition expressed a preference for maintaining software continuity throughout
the text. This continuity essentially eliminates the need to learn (and/or teach) additional software programs (although we note there are many other excellent programs
available). Note, though, that for the structural equation modeling chapter SAS is used
exclusively, as SPSS requires users to obtain a separate add-on module (AMOS) for
such analyses. In addition, SPSS and SAS syntax and output have been updated
as needed throughout the text.
NEW CHAPTERS
Chapter 11 on binary logistic regression is new to this edition. We included the chapter
on logistic regression, a technique that Alan Agresti has called the “most important
model for categorical response data,” due to the widespread use of this procedure in
the social sciences, given its ability to readily incorporate categorical and continuous predictors in modeling a categorical response. Logistic regression can be used for
explanation and classification, with each of these uses illustrated in the chapter. With
the inclusion of this new chapter, the former chapter on Categorical Data Analysis: The
Log Linear Model has been moved to the website for this text.
Chapter 14 on multivariate multilevel modeling is another new chapter for this edition. This chapter is included because this modeling procedure has several advantages over the traditional MANOVA procedures that appear in Chapters 4–6 and
provides another alternative for analyzing data from a design that has a grouping
variable and several continuous outcomes (with discriminant analysis providing yet
another alternative). The advantages of multivariate multilevel modeling are presented in Chapter 14, where we also show that the newer modeling procedure can
replicate the results of traditional MANOVA. Given that we introduce this additional
and flexible modeling procedure for examining multivariate group differences, we
have eliminated the chapter on stepdown analysis from the text, but make it available
on the web.
REWRITTEN AND IMPROVED CHAPTERS
In addition, the chapter on structural equation modeling has been completely rewritten
by Dr. Tiffany Whittaker of the University of Texas at Austin. Dr. Whittaker has taught
a structural equation modeling course for many years and is an active methodological
researcher in this area. In this chapter, she presents the three major applications of
SEM: observed variable path analysis, confirmatory factor analysis, and latent variable path analysis. Note that the placement of confirmatory factor analysis in the SEM
chapter is new to this edition and was done to allow for more extensive coverage of
the common factor model in Chapter 9 and because confirmatory factor analysis is
inherently a SEM technique.
Chapter 9 is one of two chapters that have been extensively revised (along with Chapter 13). The major changes to Chapter 9 include the addition of parallel analysis to
help determine the number of factors present, an updated section on sample size, an
overall focus on the common factor model, a section (9.7) providing a student- and
teacher-friendly introduction to factor analysis, a new section on creating factor
scores, and the new example results and analysis summary sections. The research
examples used here are also new for exploratory factor analysis, and recall that
coverage of confirmatory factor analysis is now found in Chapter 16.
Major revisions have been made to Chapter 13, Hierarchical Linear Modeling. Section 13.1 has been revised to provide discussion of fixed and random factors to help
you recognize when hierarchical linear modeling may be needed. Section 13.2 uses
a different example than presented in the fifth edition and describes three types of
widely used models. Given the use of SPSS and SAS for HLM included in this
edition and a new example used in section 13.5, the remainder of the chapter is
essentially new material. Section 13.7 provides updated information on sample size,
and we would especially like to draw your attention to section 13.6, which is a new
section on the centering of predictor variables, a critical concern for this form of
modeling.
KEY CHAPTER-BY-CHAPTER REVISIONS
There are also many new sections and important revisions in this edition. Here, we
discuss the major changes by chapter.


• Chapter 1 (section 1.6) now includes a discussion of issues related to missing data.
Included here are missing data mechanisms, missing data treatments, and illustrative analyses showing how you can select and implement a missing data treatment.
• The post hoc procedures have been revised for Chapters 4 and 5, which largely
reflect prevailing practices in applied research.
• Chapter 6 adds more information on the use of skewness and kurtosis to evaluate
the normality assumption, as well as the new example results and analysis summary sections for one-way MANOVA. In Chapter 6, we also include a new
data set (which we call the SeniorWISE data set, modeled after an applied study)
that appears in several chapters in the text.
• Chapter 7 has been retitled (somewhat), and in addition to including the example
results and analysis summary sections for two-way MANOVA, includes a new
section on factorial descriptive discriminant analysis.
• Chapter 8, in addition to the example results and analysis summary sections, includes a new section on effect size measures for group comparisons in ANCOVA/
MANCOVA, revised post hoc procedures for MANCOVA, and a new section that
briefly describes a benefit of using multivariate multilevel modeling that is particularly relevant for MANCOVA.
• The introduction to Chapter 10 is revised, and recommendations are updated in
section 10.4 for the use of coefficients to interpret discriminant functions. Section 10.7 includes a new research example for discriminant analysis, and section 10.7.5 is particularly important in that we provide recommendations for
selecting among traditional MANOVA, discriminant analysis, and multivariate
multilevel modeling procedures. This chapter includes the new example results
and analysis summary sections for descriptive discriminant analysis and applies
these procedures in sections 10.7 and 10.8.
• In Chapter 12, the major changes include an update of the post hoc procedures
(section 12.6), a new section on one-way trend analysis (section 12.8), and a
revised example and a more extensive discussion of post hoc procedures for
the one-between and one-within subjects factors design (sections 12.11 and
12.12).



ONLINE RESOURCES FOR TEXT
The book’s website www.routledge.com/9780415836661 contains the data sets from
the text, SPSS and SAS syntax from the text, and additional data sets (in SPSS and
SAS) that can be used for assignments and extra practice. For instructors, the site hosts
a conversion guide for users of the previous editions, PowerPoint lecture slides providing a detailed walk-through for key examples from the text, detailed answers for all
exercises from the text, and downloadable PDFs of chapters 10 and 14 from the 5th
edition of the text for instructors who wish to continue assigning this content.
INTENDED AUDIENCE
As in previous editions, this book is intended for courses on multivariate statistics
found in psychology, social science, education, and business departments, but the
book also appeals to practicing researchers with little or no training in multivariate
methods.
A word on the prerequisites students should have before using this book: they should
have a minimum of two quarter courses in statistics (covering factorial ANOVA and
ANCOVA). A two-semester sequence of courses in statistics is preferable, as is prior
exposure to multiple regression. The book does not assume a working knowledge of
matrix algebra.
In closing, we hope you find this edition interesting to read and informative, and that
it provides useful guidance when you analyze data for your research projects.
ACKNOWLEDGMENTS
We wish to thank Dr. Tiffany Whittaker of the University of Texas at Austin for her
valuable contribution to this edition. We would also like to thank Dr. Wanchen Chang,
formerly a graduate student at the University of Texas at Austin and now a faculty
member at Boise State University, for assisting us with the SPSS and SAS syntax
that is included in Chapter 14. Dr. Pituch would also like to thank his major professor, Dr. Richard Tate, for his useful advice throughout the years and his exemplary
approach to teaching statistics courses.
Also, we would like to say a big thanks to the many reviewers (anonymous and otherwise) who provided many helpful suggestions for this text: Debbie Hahs-Vaughn
(University of Central Florida), Dennis Jackson (University of Windsor), Karin
Schermelleh-Engel (Goethe University), Robert Triscari (Florida Gulf Coast University), Dale Berger (Claremont Graduate University–Claremont McKenna College),
Namok Choi (University of Louisville), Joseph Wu (City University of Hong Kong),
Jorge Tendeiro (Groningen University), Ralph Rippe (Leiden University), and Philip
Schatz (Saint Joseph’s University). We attended to these suggestions whenever
possible.
Dr. Pituch also wishes to thank commissioning editor Debra Riegert and Dr. Stevens
for inviting him to work on this edition and for their patience as he worked through the
revisions. We would also like to thank development editor Rebecca Pearce for assisting
us in many ways with this text. Finally, we thank the production staff at Routledge
for bringing this edition to completion.


Chapter 1

INTRODUCTION

1.1 INTRODUCTION
Studies in the social sciences comparing two or more groups very often measure their
participants on several criterion variables. The following are some examples:
1. A researcher is comparing two methods of teaching second-grade reading. On a
posttest the researcher measures the participants on the following basic elements
related to reading: syllabication, blending, sound discrimination, reading rate, and
comprehension.
2. A social psychologist is testing the relative efficacy of three treatments on
self-concept, and measures participants on academic, emotional, and social
aspects of self-concept.
3. Two different approaches to stress management are being compared. The
investigator employs a couple of paper-and-pencil measures of anxiety (say,
the State-Trait Scale and the Subjective Stress Scale) and some physiological
measures.
4. A researcher is comparing two types of counseling (Rogerian and Adlerian) on
client satisfaction and client self-acceptance.
A major part of this book involves the statistical analysis of several groups on a set of
criterion measures simultaneously, that is, multivariate analysis of variance, the multivariate referring to the multiple dependent variables.
Cronbach and Snow (1977), writing on aptitude–treatment interaction research, echoed the need for multiple criterion measures:
Learning is multivariate, however. Within any one task a person’s performance
at a point in time can be represented by a set of scores describing aspects of the
performance . . . even in laboratory research on rote learning, performance can
be assessed by multiple indices: errors, latencies and resistance to extinction, for
example. These are only moderately correlated, and do not necessarily develop at
the same rate. In the paired-associates task, subskills have to be acquired: discriminating among and becoming familiar with the stimulus terms, being able to
produce the response terms, and tying response to stimulus. If these attainments
were separately measured, each would generate a learning curve, and there is no
reason to think that the curves would echo each other. (p. 116)
There are three good reasons that the use of multiple criterion measures in a study
comparing treatments (such as teaching methods, counseling methods, types of reinforcement, diets, etc.) is very sensible:
1. Any worthwhile treatment will affect the participants in more than one way.
Hence, the problem for the investigator is to determine in which specific ways the
participants will be affected, and then find sensitive measurement techniques for
those variables.
2. Through the use of multiple criterion measures we can obtain a more complete and detailed description of the phenomenon under investigation, whether it is teaching method effectiveness, counselor effectiveness, diet effectiveness, stress management technique effectiveness, and so on.
3. Treatments can be expensive to implement, while the cost of obtaining data on several dependent variables is relatively small and maximizes information gain.
Because we define a multivariate study as one with several dependent variables, multiple regression (where there is only one dependent variable) and principal components analysis would not be considered multivariate techniques. However, our distinction is more semantic than substantive. Therefore, because regression and component analysis are so important and frequently used in social science research, we include them in this text.
We have five major objectives for the remainder of this chapter:

1. To review some basic concepts (e.g., type I error and power) and some issues associated with univariate analysis that are equally important in multivariate analysis.
2. To discuss the importance of identifying outliers, that is, points that split off from the rest of the data, and deciding what to do about them. We give some examples to show the considerable impact outliers can have on the results in univariate analysis.
3. To discuss the issue of missing data and describe some recommended missing data treatments.
4. To give research examples of some of the multivariate analyses to be covered later in the text and to indicate how these analyses involve generalizations of what the student has previously learned.
5. To briefly introduce the Statistical Analysis System (SAS) and the IBM Statistical Package for the Social Sciences (SPSS), whose outputs are discussed throughout the text.



1.2 TYPE I ERROR, TYPE II ERROR, AND POWER
Suppose we have randomly assigned 15 participants to a treatment group and another 15 participants to a control group, and we are comparing them on a single measure of task performance (a univariate study, because there is a single dependent variable). You may recall that the t test for independent samples is appropriate here. We wish to determine whether the difference in the sample means is large enough, given sampling error, to suggest that the underlying population means are different. Because the sample means estimate the population means, they will generally be in error (i.e., they will not hit the population values right "on the nose"), and this is called sampling error. We wish to test the null hypothesis (H0) that the population means are equal:

H0: μ1 = μ2

It is called the null hypothesis because saying the population means are equal is equivalent to saying that the difference in the means is 0, that is, μ1 − μ2 = 0, or that the difference is null.
Now, statisticians have determined that, given the assumptions of the procedure are satisfied, if we had populations with equal means and drew samples of size 15 repeatedly and computed a t statistic each time, then 95% of the time we would obtain t values in the range −2.048 to 2.048. The so-called sampling distribution of t under H0 would look like this:

t (under H0)

95% of the t values

–2.048

0

2.048
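This claim can be checked with a short simulation. The sketch below (Python with NumPy and SciPy; not part of the original text) repeatedly draws two samples of size 15 from the same normal population, so that H0 is true, and computes the independent-samples t statistic each time; roughly 95% of the resulting t values fall between −2.048 and 2.048.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 15, 10_000

# Draw repeated pairs of samples from the SAME population (H0 is true)
# and compute the independent-samples t statistic each time.
t_values = np.array([
    stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).statistic
    for _ in range(reps)
])

crit = stats.t.ppf(0.975, df=2 * n - 2)  # about 2.048 for df = 28
inside = np.mean(np.abs(t_values) <= crit)
print(f"critical value: {crit:.3f}, proportion inside: {inside:.3f}")
```

Running this, the proportion of t values inside the interval comes out very close to .95, as the sampling distribution implies.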

This sampling distribution is extremely important, for it gives us a frame of reference for judging what is a large value of t. Thus, if our t value was 2.56, it would be very plausible to reject the H0, since obtaining such a large t value is very unlikely when H0 is true. Note, however, that if we do so there is a chance we have made an error, because it is possible (although very improbable) to obtain such a large value for t, even when the population means are equal. In practice, one must decide how much of a risk of making this type of error (called a type I error) one wishes to take. Of course, one would want that risk to be small, and many have decided a 5% risk is small. This is formalized in hypothesis testing by saying that we set our level of significance (α) at the .05 level. That is, we are willing to take a 5% chance of making a type I error. In other words, type I error (level of significance) is the probability of rejecting the null hypothesis when it is true.



Recall that the formula for degrees of freedom for the t test is (n1 + n2 − 2); hence, for this problem df = 28. If we had set α = .05, then reference to Appendix A.2 of this book shows that the critical values are −2.048 and 2.048. They are called critical values because they are critical to the decision we will make on H0. These critical values define critical regions in the sampling distribution. If the value of t falls in the critical region we reject H0; otherwise we fail to reject:

[Figure: the t distribution under H0 for df = 28, with rejection regions below −2.048 and above 2.048]
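The decision rule is easy to carry out with software. The sketch below (Python with SciPy; the score values are hypothetical, invented purely for illustration) computes t for two groups of 15, obtains the df = 28 critical value, and rejects H0 only if |t| falls in the critical region.

```python
from scipy import stats

# Hypothetical task-performance scores for two groups of 15 (illustrative only).
treatment = [12, 15, 14, 16, 13, 17, 15, 14, 16, 18, 13, 15, 17, 14, 16]
control   = [11, 13, 12, 14, 12, 13, 11, 15, 12, 14, 13, 12, 11, 14, 13]

t_stat, p_value = stats.ttest_ind(treatment, control)
df = len(treatment) + len(control) - 2   # n1 + n2 - 2 = 28
crit = stats.t.ppf(0.975, df)            # two-tailed critical value for alpha = .05

# Equivalent decisions: compare |t| to the critical value, or p to alpha.
reject = abs(t_stat) > crit
print(f"t = {t_stat:.3f}, df = {df}, critical values = ±{crit:.3f}, reject H0: {reject}")
```

Note that comparing |t| to the critical value and comparing the p value to α always lead to the same decision; they are two views of the same test.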

Type I error is equivalent to saying the groups differ when in fact they do not. The α level set by the investigator is a subjective decision, but is usually set at .05 or .01 by most researchers. There are situations, however, when it makes sense to use α levels other than .05 or .01. For example, if making a type I error will not have serious substantive consequences, or if sample size is small, setting α = .10 or .15 is quite reasonable. Why this is reasonable for small sample size will be made clear shortly. On the other hand, suppose we are in a medical situation where the null hypothesis is equivalent to saying a drug is unsafe, and the alternative is that the drug is safe. Here, making a type I error could be quite serious, for we would be declaring the drug safe when it is not safe. This could cause some people to be permanently damaged or perhaps even killed. In this case it would make sense to use a very small α, perhaps .001.
Another type of error that can be made in conducting a statistical test is called a type II error. The type II error rate, denoted by β, is the probability of accepting H0 when it is false. Thus, a type II error, in this case, is saying the groups don't differ when they do. Now, not only can either type of error occur, but in addition, they are inversely related (when other factors, e.g., sample size and effect size, affecting these probabilities are held constant). Thus, holding these factors constant, as we control type I error, type II error increases. This is illustrated here for a two-group problem with 30 participants per group where the population effect size d (defined later) is .5:

α      β      1 − β
.10    .37    .63
.05    .52    .48
.01    .78    .22
