
CRC Standard Probability and Statistics Tables and Formulae

DANIEL ZWILLINGER
Rensselaer Polytechnic Institute
Troy, New York

STEPHEN KOKOSKA
Bloomsburg University
Bloomsburg, Pennsylvania

CHAPMAN & HALL/CRC
Boca Raton London New York Washington, D.C.

This book contains information obtained from authentic and highly regarded sources. Reprinted material
is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable
efforts have been made to publish reliable data and information, but the author and the publisher cannot
assume responsibility for the validity of all materials or for the consequences of their use.
Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying, microfilming, and recording, or by any information storage or
retrieval system, without prior permission in writing from the publisher.
The consent of CRC Press LLC does not extend to copying for general distribution, for
promotion, for creating new works, or for resale. Specific permission must be obtained
in writing from CRC Press LLC for such copying.
Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2000 by Chapman & Hall/CRC
No claim to original U.S. Government works
International Standard Book Number 1-58488-059-7
Library of Congress Card Number 99-045786
Printed in the United States of America 2 3 4 5 6 7 8 9 0
Printed on acid-free paper

Library of Congress Cataloging-in-Publication Data

Zwillinger, Daniel, 1957-
CRC standard probability and statistics tables and formulae / Daniel Zwillinger, Stephen Kokoska.
p. cm.
Includes bibliographical references and index.
ISBN 1-58488-059-7 (alk. paper)
1. Probabilities—Tables. 2. Mathematical statistics—Tables. I. Kokoska, Stephen.
II. Title.
QA273.3 .Z95 1999
519.2'02'1—dc21 99-045786

Preface
It has long been the established policy of CRC Press to publish, in handbook
form, the most up-to-date, authoritative, logically arranged, and readily us-
able reference material available. This book fills the need in probability and
statistics.
Prior to the preparation of this book the contents of similar books were con-
sidered. It is easy to fill a statistics reference book with many hundred pages
of tables—indeed, some large books contain statistical tables for only a single
test. The authors of this book focused on the basic principles of statistics.
We have tried to ensure that each topic had an understandable textual in-
troduction as well as easily understood examples. There are more than 80
examples; they usually follow the same format: start with a word problem,
interpret the words as a statistical problem, find the solution, interpret the
solution in words.
We have organized this reference in an efficient and useful format. We believe
both students and researchers will find this reference easy to read and under-
stand. Material is presented in a multi-sectional format, with each section
containing a valuable collection of fundamental reference material—tabular
and expository. This Handbook serves as a guide for determining appropriate
statistical procedures and interpretation of results. We have assembled the
most important concepts in probability and statistics, as experienced through
our own teaching, research, and work in industry.

For most topics, concise yet useful tables were created. In most cases, the
tables were re-generated and verified against existing tables. Even very mod-
est statistical software can generate many of the tables in the book—often to
more decimal places and for more values of the parameters. The values in
this book are designed to illustrate the range of possible values and act as a
handy reference for the most commonly needed values.
This book also contains many useful topics from more advanced areas of statis-
tics, but these topics have fewer examples. Also included are a large collection
of short topics containing many classical results and puzzles. Finally, a section
on notation used in the book and a comprehensive index are also included.
In line with the established policy of CRC Press, this Handbook will be kept
as current and timely as is possible. Revisions and anticipated uses of newer
materials and tables will be introduced as the need arises. Suggestions for the
inclusion of new material in subsequent editions and comments concerning
the accuracy of stated information are welcomed.
If any errata are discovered for this book, they will be posted on the book's Web site.
Many people have helped in the preparation of this manuscript. The authors
are especially grateful to our families who have remained lighthearted and
cheerful throughout the process. A special thanks to Janet and Kent, and to
Joan, Mark, and Jen.
Daniel Zwillinger

Stephen Kokoska

ACKNOWLEDGMENTS
Plans 6.1–6.6, 6A.1–6A.6, and 13.1–13.5 (appearing on pages 331–337) originally appeared
on pages 234–237, 276–279, and 522–523 of W. G. Cochran and G. M. Cox, Experimental
Designs, Second Edition, John Wiley & Sons, Inc., New York, 1957. Reprinted by permission
of John Wiley & Sons, Inc.
The tables of Bartlett’s critical values (in section 10.6.2) are from D. D. Dyer and J. P.
Keating, “On the Determination of Critical Values for Bartlett’s Test”, JASA, Volume 75,
1980, pages 313–319. Reprinted with permission from the Journal of American Statistical
Association. Copyright 1980 by the American Statistical Association. All rights reserved.
The tables of Cochran's critical values (in section 10.7.1) are from C. Eisenhart, M. W.
Hastay, and W. A. Wallis, Techniques of Statistical Analysis, McGraw-Hill Book Company,
1947, Tables 15.1 and 15.2 (pages 390–391). Reprinted courtesy of The McGraw-Hill
Companies.
The tables of Dunnett’s critical values (in section 12.1.4.5) are from C. W. Dunnett, “A
Multiple Comparison Procedure for Comparing Several Treatments with a Control”, JASA,
Volume 50, 1955, pages 1096–1121. Reprinted with permission from the Journal of Amer-
ican Statistical Association. Copyright 1980 by the American Statistical Association. All
rights reserved.
The tables of Duncan’s critical values (in section 12.1.4.3) are from L. Hunter, “Critical
Values for Duncan’s New Multiple Range Test”, Biometrics, 1960, Volume 16, pages 671–
685. Reprinted with permission from the Journal of American Statistical Association.
Copyright 1960 by the American Statistical Association. All rights reserved.
Table 15.1 is reproduced, by permission, from ASTM Manual on Quality Control of Materials,
American Society for Testing and Materials, Philadelphia, PA, 1951.
The table in section 15.1.2 and much of Chapter 18 originally appeared in D. Zwillinger,
Standard Mathematical Tables and Formulae, 30th edition, CRC Press, Boca Raton, FL,
1995. Reprinted courtesy of CRC Press, LLC.
Much of section 17.17 is taken from an online resource. Permission courtesy of John C. Pezzullo.
Contents

1 Introduction
1.1 Background
1.2 Data sets
1.3 References

2 Summarizing Data
2.1 Tabular and graphical procedures
2.2 Numerical summary measures

3 Probability
3.1 Algebra of sets
3.2 Combinatorial methods
3.3 Probability
3.4 Random variables
3.5 Mathematical expectation
3.6 Multivariate distributions
3.7 Inequalities

4 Functions of Random Variables
4.1 Finding the probability distribution
4.2 Sums of random variables
4.3 Sampling distributions
4.4 Finite population
4.5 Theorems
4.6 Order statistics
4.7 Range and studentized range

5 Discrete Probability Distributions
5.1 Bernoulli distribution
5.2 Beta binomial distribution
5.3 Beta Pascal distribution
5.4 Binomial distribution
5.5 Geometric distribution
5.6 Hypergeometric distribution
5.7 Multinomial distribution
5.8 Negative binomial distribution
5.9 Poisson distribution
5.10 Rectangular (discrete uniform) distribution

6 Continuous Probability Distributions
6.1 Arcsin distribution
6.2 Beta distribution
6.3 Cauchy distribution
6.4 Chi-square distribution
6.5 Erlang distribution
6.6 Exponential distribution
6.7 Extreme-value distribution
6.8 F distribution
6.9 Gamma distribution
6.10 Half-normal distribution
6.11 Inverse Gaussian (Wald) distribution
6.12 Laplace distribution
6.13 Logistic distribution
6.14 Lognormal distribution
6.15 Noncentral chi-square distribution
6.16 Noncentral F distribution
6.17 Noncentral t distribution
6.18 Normal distribution
6.19 Normal distribution: multivariate
6.20 Pareto distribution
6.21 Power function distribution
6.22 Rayleigh distribution
6.23 t distribution
6.24 Triangular distribution
6.25 Uniform distribution
6.26 Weibull distribution
6.27 Relationships among distributions

7 Standard Normal Distribution
7.1 Density function and related functions
7.2 Critical values
7.3 Tolerance factors for normal distributions
7.4 Operating characteristic curves
7.5 Multivariate normal distribution
7.6 Distribution of the correlation coefficient
7.7 Circular normal probabilities
7.8 Circular error probabilities

8 Estimation
8.1 Definitions
8.2 Cramér–Rao inequality
8.3 Theorems
8.4 The method of moments
8.5 The likelihood function
8.6 The method of maximum likelihood
8.7 Invariance property of MLEs
8.8 Different estimators
8.9 Estimators for small samples
8.10 Estimators for large samples

9 Confidence Intervals
9.1 Definitions
9.2 Common critical values
9.3 Sample size calculations
9.4 Summary of common confidence intervals
9.5 Confidence intervals: one sample
9.6 Confidence intervals: two samples
9.7 Finite population correction factor

10 Hypothesis Testing
10.1 Introduction
10.2 The Neyman–Pearson lemma
10.3 Likelihood ratio tests
10.4 Goodness of fit test
10.5 Contingency tables
10.6 Bartlett's test
10.7 Cochran's test
10.8 Number of observations required
10.9 Critical values for testing outliers
10.10 Significance test in 2×2 contingency tables
10.11 Determining values in Bernoulli trials

11 Regression Analysis
11.1 Simple linear regression
11.2 Multiple linear regression
11.3 Orthogonal polynomials

12 Analysis of Variance
12.1 One-way anova
12.2 Two-way anova
12.3 Three-factor experiments
12.4 Manova
12.5 Factor analysis
12.6 Latin square design

13 Experimental Design
13.1 Latin squares
13.2 Graeco–Latin squares
13.3 Block designs
13.4 Factorial experimentation: 2 factors
13.5 2^r factorial experiments
13.6 Confounding in 2^n factorial experiments
13.7 Tables for design of experiments
13.8 References

14 Nonparametric Statistics
14.1 Friedman test for randomized block design
14.2 Kendall's rank correlation coefficient
14.3 Kolmogorov–Smirnoff tests
14.4 Kruskal–Wallis test
14.5 The runs test
14.6 The sign test
14.7 Spearman's rank correlation coefficient
14.8 Wilcoxon matched-pairs signed-ranks test
14.9 Wilcoxon rank-sum (Mann–Whitney) test
14.10 Wilcoxon signed-rank test

15 Quality Control and Risk Analysis
15.1 Quality assurance
15.2 Acceptance sampling
15.3 Reliability
15.4 Risk analysis and decision rules

16 General Linear Models
16.1 Notation
16.2 The general linear model
16.3 Summary of rules for matrix operations
16.4 Quadratic forms
16.5 General linear hypothesis of full rank
16.6 General linear model of less than full rank

17 Miscellaneous Topics
17.1 Geometric probability
17.2 Information and communication theory
17.3 Kalman filtering
17.4 Large deviations (theory of rare events)
17.5 Markov chains
17.6 Martingales
17.7 Measure theoretical probability
17.8 Monte Carlo integration techniques
17.9 Queuing theory
17.10 Random matrix eigenvalues
17.11 Random number generation
17.12 Resampling methods
17.13 Self-similar processes
17.14 Signal processing
17.15 Stochastic calculus
17.16 Classic and interesting problems
17.17 Electronic resources
17.18 Tables

18 Special Functions
18.1 Bessel functions
18.2 Beta function
18.3 Ceiling and floor functions
18.4 Delta function
18.5 Error functions
18.6 Exponential function
18.7 Factorials and Pochhammer's symbol
18.8 Gamma function
18.9 Hypergeometric functions
18.10 Logarithmic functions
18.11 Partitions
18.12 Signum function
18.13 Stirling numbers
18.14 Sums of powers of integers
18.15 Tables of orthogonal polynomials
18.16 References

Notation
CHAPTER 1
Introduction

Contents
1.1 Background
1.2 Data sets
1.3 References
1.1 BACKGROUND
The purpose of this book is to provide a modern set of tables and a com-
prehensive list of definitions, concepts, theorems, and formulae in probability
and statistics. While the numbers in these tables have not changed since they
were first computed (in some cases, several hundred years ago), the presenta-
tion format here is modernized. In addition, nearly all table values have been
re-computed to ensure accuracy.
Almost every table is presented along with a textual description and at least
one example using a value from the table. Most concepts are illustrated with
examples and step-by-step solutions. Several data sets are described in this
chapter; they are used in this book in order for users to be able to check
algorithms.
The emphasis of this book is on what is often called basic statistics. Most
real-world statistics users will be able to refer to this book in order to quickly
verify a formula, definition, or theorem. In addition, the set of tables here
should make this a complete statistics reference tool. Some more advanced,
useful, and current topics, such as Brownian motion and decision theory, are
also included.
1.2 DATA SETS

We have established a few data sets which we have used in examples throughout
this book. With these, a user can check a local statistics program by
verifying that it returns the same values as given in this book. For example,
the correlation coefficient between the first 100 elements of the sequence
of integers {1, 2, 3, ...} and the first 100 elements of the sequence of squares
{1, 4, 9, ...} is 0.96885. Using this value is an easy way to check for correct
computation of a computer program. These data sets may be obtained from the
book's Web site.
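The correlation check described above can be reproduced in a few lines of Python. The sketch below uses the standard sample Pearson correlation formula on the integers 1–100 and their squares; nothing here is specific to this book beyond the quoted value 0.96885.

```python
import math

# First 100 integers and their squares.
x = list(range(1, 101))
y = [v * v for v in x]

def pearson_r(a, b):
    """Sample Pearson correlation coefficient."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    sa = math.sqrt(sum((ai - ma) ** 2 for ai in a))
    sb = math.sqrt(sum((bi - mb) ** 2 for bi in b))
    return cov / (sa * sb)

r = pearson_r(x, y)
print(round(r, 5))  # 0.96885, the value quoted in the text
```

A statistics package that returns a different value on this input is mis-computing the correlation coefficient.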
Ticket data: Forty random speeding tickets were selected from the courthouse
records in Columbia County. The speed indicated on each ticket is given in
the table below.
58 72 64 65 67 92 55 51 69 73
64 59 65 55 75 56 89 60 84 68
74 67 55 68 74 43 67 71 72 66
62 63 83 64 51 63 49 78 65 75
Swimming pool data: Water samples from 35 randomly selected pools in Beverly
Hills were tested for acidity. The following table lists the pH for each
sample.
6.4 6.6 6.2 7.2 6.2 8.1 7.0
7.0 5.9 5.7 7.0 7.4 6.5 6.8

7.0 7.0 6.0 6.3 5.6 6.3 5.8
5.9 7.2 7.3 7.7 6.8 5.2 5.2
6.4 6.3 6.2 7.5 6.7 6.4 7.8
Soda pop data: A new soda machine placed in the Mathematics Building on
campus recorded the following sales data for one week in April.
Soda Number of cans
Pepsi 72
Wild Cherry Pepsi 60
Diet Pepsi 85
Seven Up 54
Mountain Dew 32
Lipton Ice Tea 64
1.3 REFERENCES
Gathered here are some of the books referenced in later sections; each has a
broad coverage of the topics it addresses.
1. W. G. Cochran and G. M. Cox, Experimental Designs, Second Edition,
John Wiley & Sons, Inc., New York, 1957.
2. C. J. Colbourn and J. H. Dinitz, CRC Handbook of Combinatorial De-
signs, CRC Press, Boca Raton, FL, 1996.
3. L. Devroye, Non-Uniform Random Variate Generation, Springer–Verlag,
New York, 1986.
4. W. Feller, An Introduction to Probability Theory and Its Applications,
Volumes 1 and 2, John Wiley & Sons, New York, 1968.
5. C. W. Gardiner, Handbook of Stochastic Methods, Second edition, Springer–
Verlag, New York, 1985.
6. D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical
Procedures, CRC Press LLC, Boca Raton, FL, 1997.
CHAPTER 2

Summarizing Data
Contents
2.1 Tabular and graphical procedures
2.1.1 Stem-and-leaf plot
2.1.2 Frequency distribution
2.1.3 Histogram
2.1.4 Frequency polygons
2.1.5 Chernoff faces
2.2 Numerical summary measures
2.2.1 (Arithmetic) mean
2.2.2 Weighted (arithmetic) mean
2.2.3 Geometric mean
2.2.4 Harmonic mean
2.2.5 Mode
2.2.6 Median
2.2.7 p% trimmed mean
2.2.8 Quartiles
2.2.9 Deciles
2.2.10 Percentiles
2.2.11 Mean deviation
2.2.12 Variance
2.2.13 Standard deviation
2.2.14 Standard errors
2.2.15 Root mean square
2.2.16 Range
2.2.17 Interquartile range
2.2.18 Quartile deviation
2.2.19 Box plots
2.2.20 Coefficient of variation
2.2.21 Coefficient of quartile variation
2.2.22 Z score
2.2.23 Moments
2.2.24 Measures of skewness
2.2.25 Measures of kurtosis
2.2.26 Data transformations
2.2.27 Sheppard's corrections for grouping
Numerical descriptive statistics and graphical techniques may be used to sum-
marize information about central tendency and/or variability.
2.1 TABULAR AND GRAPHICAL PROCEDURES
2.1.1 Stem-and-leaf plot
A stem-and-leaf plot is a graphical summary used to describe a set of ob-
servations (as symmetric, skewed, etc.). Each observation is displayed on the
graph and should have at least two digits. Split each observation (at the same
point) into a stem (one or more of the leading digit(s)) and a leaf (remaining
digits). Select the split point so that there are 5–20 total stems. List the
stems in a column to the left, and write each leaf in the corresponding stem
row.
Example 2.1: Construct a stem-and-leaf plot for the Ticket Data (page 2).
Solution:

Stem  Leaf
  4   39
  5   11555689
  6   02334445556777889
  7   122344558
  8   349
  9   2

Stem = 10, Leaf = 1

Figure 2.1: Stem-and-leaf plot for Ticket Data.
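The splitting rule above is easy to mechanize. As a sketch, the snippet below divides each speed in the ticket data into a tens stem and a units leaf and regenerates the rows of Figure 2.1.

```python
tickets = [58, 72, 64, 65, 67, 92, 55, 51, 69, 73,
           64, 59, 65, 55, 75, 56, 89, 60, 84, 68,
           74, 67, 55, 68, 74, 43, 67, 71, 72, 66,
           62, 63, 83, 64, 51, 63, 49, 78, 65, 75]

# Stem = tens digit, leaf = units digit; leaves sorted within each stem.
stems = {}
for speed in sorted(tickets):
    stems.setdefault(speed // 10, []).append(str(speed % 10))

for stem in sorted(stems):
    print(stem, "".join(stems[stem]))
```

With these data the split point gives six stems, within the 5–20 range the text recommends.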
2.1.2 Frequency distribution
A frequency distribution is a tabular method for summarizing continuous or
discrete numerical data or categorical data.
(1) Partition the measurement axis into 5–20 (usually equal) reasonable
subintervals called classes, or class intervals. Thus, each observation
falls into exactly one class.
(2) Record, or tally, the number of observations in each class, called the
frequency of each class.
(3) Compute the proportion of observations in each class, called the relative
frequency.
(4) Compute the proportion of observations in each class and all preceding
classes, called the cumulative relative frequency.
Example 2.2: Construct a frequency distribution for the Ticket Data (page 2).
Solution:
(S1) Determine the classes. It seems reasonable to use 40 to less than 50, 50 to less
than 60, ..., 90 to less than 100.
Note: For continuous data, one end of each class must be open. This ensures
that each observation will fall into only one class. The open end of each class
may be either the left or right, but should be consistent.
(S2) Record the number of observations in each class.
(S3) Compute the relative frequency and cumulative relative frequency for each class.
(S4) The resulting frequency distribution is in Figure 2.2.

                                       Cumulative
                          Relative     relative
Class        Frequency    frequency    frequency
[40, 50)          2         0.050        0.050
[50, 60)          8         0.200        0.250
[60, 70)         17         0.425        0.625
[70, 80)          9         0.225        0.900
[80, 90)          3         0.075        0.975
[90, 100)         1         0.025        1.000

Figure 2.2: Frequency distribution for Ticket Data.
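The four steps can be checked mechanically. The sketch below tallies the ticket data into the classes [40, 50), ..., [90, 100) and reproduces the frequency, relative frequency, and cumulative relative frequency columns of Figure 2.2.

```python
tickets = [58, 72, 64, 65, 67, 92, 55, 51, 69, 73,
           64, 59, 65, 55, 75, 56, 89, 60, 84, 68,
           74, 67, 55, 68, 74, 43, 67, 71, 72, 66,
           62, 63, 83, 64, 51, 63, 49, 78, 65, 75]
n = len(tickets)

# Count observations in each left-closed class [lo, lo + 10).
freq = [sum(lo <= x < lo + 10 for x in tickets) for lo in range(40, 100, 10)]

cum = 0.0
for lo, f in zip(range(40, 100, 10), freq):
    rel = f / n              # relative frequency
    cum += rel               # cumulative relative frequency
    print(f"[{lo}, {lo + 10})  {f:2d}  {rel:.3f}  {cum:.3f}")
```

Each observation falls into exactly one class because the right endpoint of each class is open, as the note above requires.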
2.1.3 Histogram
A histogram is a graphical representation of a frequency distribution. A (rela-
tive) frequency histogram is a plot of (relative) frequency versus class interval.
Rectangles are constructed over each class with height proportional (usually
equal) to the class (relative) frequency. A frequency histogram and a relative
frequency histogram have the same shape, but different scales on the vertical axis.
Example 2.3: Construct a frequency histogram for the Ticket Data (page 2).
Solution:
(S1) Using the frequency distribution in Figure 2.2, construct rectangles above each
class, with height equal to class frequency.
(S2) The resulting histogram is in Figure 2.3.
Note: A probability histogram is constructed so that the area of each rectangle
equals the relative frequency. If the class widths are unequal, this histogram
presents a more accurate description of the distribution.
2.1.4 Frequency polygons
A frequency polygon is a line plot of points with x coordinate being class
midpoint and y coordinate being class frequency. Often the graph extends to

Figure 2.3: Frequency histogram for Ticket Data.
an additional empty class on both ends. The relative frequency may be used
in place of frequency.
Example 2.4: Construct a frequency polygon for the Ticket Data (page 2).
Solution:
(S1) Using the frequency distribution in Figure 2.2, plot each point and connect the
graph.
(S2) The resulting frequency polygon is in Figure 2.4.

Figure 2.4: Frequency polygon for Ticket Data.

An ogive, or cumulative frequency polygon, is a plot of cumulative frequency
versus the upper class limit. Figure 2.5 is an ogive for the Ticket Data
(page 2).
Another type of frequency polygon is a more-than cumulative frequency poly-
gon. For each class this plots the number of observations in that class and
every class above versus the lower class limit.
Figure 2.5: Ogive for Ticket Data.
A bar chart is often used to graphically summarize discrete or categorical
data. A rectangle is drawn over each bin with height proportional to frequency.
The chart may be drawn with horizontal rectangles, in three dimensions, and
may be used to compare two or more sets of observations. Figure 2.6 is a bar
chart for the Soda Pop Data (page 2).

Figure 2.6: Bar chart for Soda Pop Data.

A pie chart is used to illustrate parts of the total. A circle is divided into
slices proportional to the bin frequency. Figure 2.7 is a pie chart for the Soda
Pop Data (page 2).
2.1.5 Chernoff faces
Chernoff faces are used to illustrate trends in multidimensional data. They
are effective because people are used to differentiating between facial features.
Chernoff faces have been used for cluster, discriminant, and time-series anal-
yses. Facial features that might be controllable by the data include:
Figure 2.7: Pie chart for Soda Pop Data.

(a) ear: level, radius
(b) eyebrow: height, slope, length
(c) eyes: height, size, separation, eccentricity, pupil position or size
(d) face: width, half-face height, lower or upper eccentricity
(e) mouth: position of center, curvature, length, openness
(f) nose: width, length
The Chernoff faces in Figure 2.8 come from data about this book. For the
even chapters:
(a) eye size is proportional to the approximate number of pages
(b) mouth size is proportional to the approximate number of words
(c) face shape is proportional to the approximate number of occurrences of
the word "the"
The data are as follows:

Chapter                 2     4      6     8     10     12     14     16     18
Number of pages        18    30     56     8     36     40     40     26     23
Number of words      4514  5426  12234  2392   9948  18418   8179  11739   5186
Occurrences of "the"  159   147    159    47    153    118    264    223     82

An interactive program for creating Chernoff faces is available at http://
www.hesketh.com/schampeo/projects/Faces/interactive.shtml. See H.
Chernoff, "The use of faces to represent points in a K-dimensional space
graphically," Journal of the American Statistical Association, Vol. 68, No. 342,
1973, pages 361–368.
Figure 2.8: Chernoff faces for chapter data.

2.2 NUMERICAL SUMMARY MEASURES

The following conventions will be used in the definitions and formulas in this
section.

(C1) Ungrouped data: Let $x_1, x_2, x_3, \ldots, x_n$ be a set of observations.

(C2) Grouped data: Let $x_1, x_2, x_3, \ldots, x_k$ be a set of class marks from a fre-
quency distribution, or a representative set of observations, with corre-
sponding frequencies $f_1, f_2, f_3, \ldots, f_k$. The total number of observations
is $n = \sum_{i=1}^{k} f_i$. Let $c$ denote the (constant) width of each bin and $x_o$ one
of the class marks selected to be the computing origin. Each class mark,
$x_i$, may be coded by $u_i = (x_i - x_o)/c$. Each $u_i$ will be an integer and
the bin mark taken as the computing origin will be coded as 0.
2.2.1 (Arithmetic) mean

The (arithmetic) mean of a set of observations is the sum of the observations
divided by the total number of observations.

(1) Ungrouped data:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \qquad (2.1)$$

(2) Grouped data:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{k} f_i x_i = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_k x_k}{n} \qquad (2.2)$$

(3) Coded data:
$$\bar{x} = x_o + c \cdot \frac{\sum_{i=1}^{k} f_i u_i}{n} \qquad (2.3)$$
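As a sketch of how (2.2) and (2.3) relate, the snippet below applies both to the ticket-data frequency distribution of Figure 2.2, using class marks 45, 55, ..., 95, computing origin $x_o = 65$, and width $c = 10$; the two formulas give the same grouped mean.

```python
marks = [45, 55, 65, 75, 85, 95]          # class marks from Figure 2.2
freq  = [2, 8, 17, 9, 3, 1]               # class frequencies from Figure 2.2
n = sum(freq)

# Grouped mean, equation (2.2)
mean_grouped = sum(f * x for f, x in zip(freq, marks)) / n

# Coded mean, equation (2.3), with computing origin x_o = 65 and width c = 10
x_o, c = 65, 10
u = [(x - x_o) // c for x in marks]       # coded class marks: -2, -1, 0, 1, 2, 3
mean_coded = x_o + c * sum(f * ui for f, ui in zip(freq, u)) / n

print(mean_grouped, mean_coded)  # both 66.5
```

The coded form was designed for hand computation: the $u_i$ are small integers, so the only multiplication by large numbers happens once, at the end.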
2.2.2 Weighted (arithmetic) mean

Let $w_i \geq 0$ be the weight associated with observation $x_i$. The total weight is
given by $\sum_{i=1}^{n} w_i$ and the weighted mean is
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n}. \qquad (2.4)$$
2.2.3 Geometric mean

For ungrouped data such that $x_i > 0$, the geometric mean is the $n$th root of
the product of the observations:
$$\mathrm{GM} = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n}. \qquad (2.5)$$

In logarithmic form:
$$\log(\mathrm{GM}) = \frac{1}{n} \sum_{i=1}^{n} \log x_i = \frac{\log x_1 + \log x_2 + \log x_3 + \cdots + \log x_n}{n}. \qquad (2.6)$$
For grouped data with each class mark $x_i > 0$:
$$\mathrm{GM} = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdot x_3^{f_3} \cdots x_k^{f_k}}. \qquad (2.7)$$

In logarithmic form:
$$\log(\mathrm{GM}) = \frac{1}{n} \sum_{i=1}^{k} f_i \log(x_i)
= \frac{f_1 \log(x_1) + f_2 \log(x_2) + f_3 \log(x_3) + \cdots + f_k \log(x_k)}{n}. \qquad (2.8)$$
2.2.4 Harmonic mean

For ungrouped data the harmonic mean is given by
$$\mathrm{HM} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}
= \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}}. \qquad (2.9)$$

For grouped data:
HM =
n
k

i=1
f
i
x
i
=
n
f
1
x
1
+
f
2
x
2
+
f
3
x
3
+ ···+
f
k
x

k
. (2.10)
Note: The equation involving the arithmetic, geometric, and harmonic mean
is
HM ≤ GM ≤
x. (2.11)
Equality holds if all n observations are equal.
2.2.5 Mode

For ungrouped data, the mode, $M_o$, is the value that occurs most often, or with
the greatest frequency. A mode may not exist, for example, if all observations
occur with the same frequency. If the mode does exist, it may not be unique,
for example, if two observations occur with the greatest frequency.

For grouped data, select the class containing the largest frequency, called
the modal class. Let $L$ be the lower boundary of the modal class, $d_L$ the
difference in frequencies between the modal class and the class immediately
below, and $d_H$ the difference in frequencies between the modal class and the
class immediately above. The mode may be approximated by
$$M_o \approx L + c \cdot \frac{d_L}{d_L + d_H}. \qquad (2.12)$$
2.2.6 Median

The median, $\tilde{x}$, is another measure of central tendency, resistant to outliers.
For ungrouped data, arrange the observations in order from smallest to largest.
If $n$ is odd, the median is the middle value. If $n$ is even, the median is the
mean of the two middle values.

For grouped data, select the class containing the median (median class). Let
$L$ be the lower boundary of the median class, $f_m$ the frequency of the median
class, and $CF$ the sum of frequencies for all classes below the median class (a
cumulative frequency). The median may be approximated by
$$\tilde{x} \approx L + c \cdot \frac{\frac{n}{2} - CF}{f_m}. \qquad (2.13)$$

Note: If $\bar{x} > \tilde{x}$ the distribution is positively skewed. If $\bar{x} < \tilde{x}$ the distribution
is negatively skewed. If $\bar{x} \approx \tilde{x}$ the distribution is approximately symmetric.
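Applied to the ticket data grouped as in Figure 2.2, equation (2.13) gives a quick sanity check: the median class is [60, 70), with $c = 10$, $CF = 10$ (the frequencies 2 + 8 below it), and $f_m = 17$.

```python
# Grouped-median approximation, equation (2.13), for the ticket data
n, L, c = 40, 60, 10     # sample size, lower boundary of median class, class width
CF, f_m = 10, 17         # cumulative frequency below the median class; its frequency

median_approx = L + c * (n / 2 - CF) / f_m
print(round(median_approx, 2))  # 65.88
```

This approximation (65.88) is close to the exact ungrouped median of these data, as expected when the class widths are modest relative to the spread.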
2.2.7 p% trimmed mean

A trimmed mean is a measure of central tendency and a compromise between
a mean and a median. The mean is more sensitive to outliers, and the median
is less sensitive to outliers. Order the observations from smallest to largest.
Delete the smallest $p\%$ and the largest $p\%$ of the observations. The $p\%$
trimmed mean, $\bar{x}_{\mathrm{tr}(p)}$, is the arithmetic mean of the remaining observations.

Note: If $p\%$ of $n$ (observations) is not an integer, several (computer) algo-
rithms exist for interpolating at each end of the distribution and for deter-
mining $\bar{x}_{\mathrm{tr}(p)}$.
Example 2.5: Using the Swimming Pool data (page 2) find the mean, median, and
mode. Compute the geometric mean and the harmonic mean, and verify the relationship
between these three measures.
Solution:
(S1) $\bar{x} = \frac{1}{35}(6.4 + 6.6 + 6.2 + \cdots + 7.8) = 6.5886$
(S2) $\tilde{x} = 6.5$, the middle value when the observations are arranged in order from
smallest to largest.
(S3) $M_o = 7.0$, the observation that occurs most often.
(S4) $\mathrm{GM} = \sqrt[35]{(6.4)(6.6)(6.2) \cdots (7.8)} = 6.5513$
(S5) $\mathrm{HM} = \dfrac{35}{(1/6.4) + (1/6.6) + (1/6.2) + \cdots + (1/7.8)} = 6.5137$
(S6) To verify the inequality: $\underbrace{6.5137}_{\mathrm{HM}} \leq \underbrace{6.5513}_{\mathrm{GM}} \leq \underbrace{6.5886}_{\bar{x}}$
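Example 2.5 can be replayed in a few lines. The sketch below recomputes the five measures for the swimming pool data (using logs for the geometric mean, as in (2.6)) and confirms the inequality $\mathrm{HM} \leq \mathrm{GM} \leq \bar{x}$.

```python
import math
from collections import Counter

ph = [6.4, 6.6, 6.2, 7.2, 6.2, 8.1, 7.0,
      7.0, 5.9, 5.7, 7.0, 7.4, 6.5, 6.8,
      7.0, 7.0, 6.0, 6.3, 5.6, 6.3, 5.8,
      5.9, 7.2, 7.3, 7.7, 6.8, 5.2, 5.2,
      6.4, 6.3, 6.2, 7.5, 6.7, 6.4, 7.8]
n = len(ph)

mean = sum(ph) / n
median = sorted(ph)[n // 2]                       # n = 35 is odd: middle value
mode = Counter(ph).most_common(1)[0][0]
gm = math.exp(sum(math.log(x) for x in ph) / n)   # geometric mean via logs, (2.6)
hm = n / sum(1 / x for x in ph)                   # harmonic mean, (2.9)

print(round(mean, 4), round(median, 1), mode)     # 6.5886 6.5 7.0
assert hm <= gm <= mean                           # inequality (2.11)
```

Computing the geometric mean through logarithms, rather than multiplying 35 numbers and taking a 35th root, avoids overflow and loss of precision for larger samples.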
2.2.8 Quartiles

Quartiles split the data into four parts. For ungrouped data, arrange the
observations in order from smallest to largest.
(1) The second quartile is the median: $Q_2 = \tilde{x}$.
(2) If $n$ is even:
The first quartile, $Q_1$, is the median of the smallest $n/2$ observations;
and the third quartile, $Q_3$, is the median of the largest $n/2$ observations.
(3) If $n$ is odd:
The first quartile, $Q_1$, is the median of the smallest $(n+1)/2$ observa-
tions; and the third quartile, $Q_3$, is the median of the largest $(n+1)/2$
observations.

For grouped data, the quartiles are computed by applying equation (2.13) for
the median. Compute the following:
$L_1$ = the lower boundary of the class containing $Q_1$.
$L_3$ = the lower boundary of the class containing $Q_3$.
$f_1$ = the frequency of the class containing the first quartile.
$f_3$ = the frequency of the class containing the third quartile.
$CF_1$ = cumulative frequency for classes below the one containing $Q_1$.
$CF_3$ = cumulative frequency for classes below the one containing $Q_3$.

The (approximate) quartiles are given by
$$Q_1 = L_1 + c \cdot \frac{\frac{n}{4} - CF_1}{f_1}
\qquad
Q_3 = L_3 + c \cdot \frac{\frac{3n}{4} - CF_3}{f_3}. \qquad (2.14)$$
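For the ticket data ($n = 40$, even), the ungrouped rules above make $Q_1$ the median of the smallest 20 observations and $Q_3$ the median of the largest 20. A sketch:

```python
tickets = sorted([58, 72, 64, 65, 67, 92, 55, 51, 69, 73,
                  64, 59, 65, 55, 75, 56, 89, 60, 84, 68,
                  74, 67, 55, 68, 74, 43, 67, 71, 72, 66,
                  62, 63, 83, 64, 51, 63, 49, 78, 65, 75])
n = len(tickets)

def median(vals):
    """Median of an already-sorted list."""
    m = len(vals)
    mid = m // 2
    return vals[mid] if m % 2 else (vals[mid - 1] + vals[mid]) / 2

q2 = median(tickets)            # the overall median
q1 = median(tickets[:n // 2])   # median of the smallest 20 observations
q3 = median(tickets[n // 2:])   # median of the largest 20 observations
print(q1, q2, q3)               # 59.5 65.5 72.5
```

Note that statistical packages use several different quartile conventions, so a local program may return slightly different values from the rule given here.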

2.2.9 Deciles

Deciles split the data into 10 parts.
(1) For ungrouped data, arrange the observations in order from smallest to
largest. The $i$th decile, $D_i$ (for $i = 1, 2, \ldots, 9$), is the $i(n+1)/10$th ob-
servation. It may be necessary to interpolate between successive values.
(2) For grouped data, apply equation (2.13) (as in equation (2.14)) for the
median to find the approximate deciles. $D_i$ is in the class containing
the $in/10$th largest observation.
2.2.10 Percentiles

Percentiles split the data into 100 parts.
(1) For ungrouped data, arrange the observations in order from smallest to
largest. The $i$th percentile, $P_i$ (for $i = 1, 2, \ldots, 99$), is the $i(n+1)/100$th
observation. It may be necessary to interpolate between successive val-
ues.
(2) For grouped data, apply equation (2.13) (as in equation (2.14)) for the
median to find the approximate percentiles. $P_i$ is in the class containing
the $in/100$th largest observation.
2.2.11 Mean deviation

The mean deviation is a measure of variability based on the absolute value of
the deviations about the mean or median.
(1) For ungrouped data:
$$\mathrm{MD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|
\quad \text{or} \quad
\mathrm{MD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \tilde{x}|. \qquad (2.15)$$
(2) For grouped data:
$$\mathrm{MD} = \frac{1}{n} \sum_{i=1}^{k} f_i |x_i - \bar{x}|
\quad \text{or} \quad
\mathrm{MD} = \frac{1}{n} \sum_{i=1}^{k} f_i |x_i - \tilde{x}|. \qquad (2.16)$$
2.2.12 Variance

The variance is a measure of variability based on the squared deviations about
the mean.
(1) For ungrouped data:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2. \qquad (2.17)$$
The computational formula for $s^2$:
$$s^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2 \right]
= \frac{1}{n-1} \left[ \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right]. \qquad (2.18)$$
(2) For grouped data:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{k} f_i (x_i - \bar{x})^2. \qquad (2.19)$$
The computational formula for $s^2$:
$$s^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{k} f_i x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{k} f_i x_i \right)^2 \right]
= \frac{1}{n-1} \left[ \sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2 \right]. \qquad (2.20)$$
(3) For coded data:
$$s^2 = \frac{c^2}{n-1} \left[ \sum_{i=1}^{k} f_i u_i^2 - \frac{1}{n} \left( \sum_{i=1}^{k} f_i u_i \right)^2 \right]. \qquad (2.21)$$
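As a quick numerical check that the definitional form (2.17) and the computational form (2.18) agree (the same algebraic identity underlies (2.19)–(2.21)), the sketch below evaluates both on the ticket data.

```python
import math

tickets = [58, 72, 64, 65, 67, 92, 55, 51, 69, 73,
           64, 59, 65, 55, 75, 56, 89, 60, 84, 68,
           74, 67, 55, 68, 74, 43, 67, 71, 72, 66,
           62, 63, 83, 64, 51, 63, 49, 78, 65, 75]
n = len(tickets)
xbar = sum(tickets) / n

# Definitional form, equation (2.17)
s2_def = sum((x - xbar) ** 2 for x in tickets) / (n - 1)

# Computational (shortcut) form, equation (2.18)
s2_comp = (sum(x * x for x in tickets) - n * xbar ** 2) / (n - 1)

assert math.isclose(s2_def, s2_comp)
print(round(s2_def, 4))
```

The shortcut form requires only a single pass over the data, but it subtracts two large, nearly equal quantities; for large samples with small variance, the definitional form is the numerically safer choice.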
2.2.13 Standard deviation

The standard deviation is the positive square root of the variance: $s = \sqrt{s^2}$.
The probable error is 0.6745 times the standard deviation.
2.2.14 Standard errors

The standard error of a statistic is the standard deviation of the sampling dis-
tribution of that statistic. The standard error of a statistic is often designated
by $\sigma$ with a subscript indicating the statistic.

2.2.14.1 Standard error of the mean

The standard error of the mean is used in hypothesis testing and is an indi-
cation of the accuracy of the estimate $\bar{x}$:
$$\mathrm{SEM} = s/\sqrt{n}. \qquad (2.22)$$