Statistics for
Business and Economics
7th Edition
Chapter 14
Analysis of Categorical Data
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Chapter Goals
After completing this chapter, you should be able to:
Use the chi-square goodness-of-fit test to determine
whether data fits specified probabilities
Perform tests for the Poisson and Normal distributions
Set up a contingency analysis table and perform a chi-square test of association
Use the sign test for paired or matched samples
Recognize when and how to use the Wilcoxon signed
rank test for paired or matched samples
Chapter Goals
(continued)
After completing this chapter, you should be able to:
Use a sign test for a single population median
Apply a normal approximation for the Wilcoxon signed
rank test
Know when and how to perform a Mann-Whitney U-test
Explain Spearman rank correlation and perform a test
for association
Nonparametric Statistics
Fewer restrictive assumptions about data
levels and underlying probability distributions
Population distributions may be skewed
The level of data measurement may only be
ordinal or nominal
Goodness-of-Fit Tests
14.1
Does sample data conform to a hypothesized
distribution?
Examples:
Do sample results conform to specified expected
probabilities?
Are technical support calls equal across all days of
the week? (i.e., do calls follow a uniform
distribution?)
Do measurements from a production process
follow a normal distribution?
Chi-Square Goodness-of-Fit Test
(continued)
Are technical support calls equal across all days of the
week? (i.e., do calls follow a uniform distribution?)
Sample data for 10 days per day of week:
Day          Sum of calls for this day
Monday       290
Tuesday      250
Wednesday    238
Thursday     257
Friday       265
Saturday     230
Sunday       192
Total        1722
Logic of Goodness-of-Fit Test
If calls are uniformly distributed, the 1722 calls
would be expected to be equally divided across
the 7 days:
$\frac{1722}{7} = 246$ expected calls per day if uniform
Chi-Square Goodness-of-Fit Test: test to see if
the sample results are consistent with the
expected results
Observed vs. Expected Frequencies
Day          Observed Oi   Expected Ei
Monday       290           246
Tuesday      250           246
Wednesday    238           246
Thursday     257           246
Friday       265           246
Saturday     230           246
Sunday       192           246
TOTAL        1722          1722
Chi-Square Test Statistic
H0: The distribution of calls is uniform
over days of the week
H1: The distribution of calls is not uniform
The test statistic is
$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$ (where d.f. = K - 1)
where:
K = number of categories
Oi = observed frequency for category i
Ei = expected frequency for category i
The Rejection Region
H0: The distribution of calls is uniform
over days of the week
H1: The distribution of calls is not uniform
$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$
Reject H0 if $\chi^2 > \chi^2_{\alpha}$ (with K - 1 degrees of freedom)
[Figure: chi-square density. Do not reject H0 for values of χ² below the critical value χ²α; reject H0 in the upper tail beyond it.]
Chi-Square Test Statistic
H0: The distribution of calls is uniform
over days of the week
H1: The distribution of calls is not uniform
$\chi^2 = \frac{(290 - 246)^2}{246} + \frac{(250 - 246)^2}{246} + \dots + \frac{(192 - 246)^2}{246} = 23.05$
k – 1 = 6 (7 days of the week) so
use 6 degrees of freedom:
$\chi^2_{.05} = 12.5916$
Conclusion:
$\chi^2 = 23.05 > \chi^2_{\alpha} = 12.5916$ so
reject H0 and conclude that the
distribution is not uniform
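The calculation above can be reproduced with a minimal Python sketch. It uses only the observed totals and the critical value 12.5916 quoted on the slide; the variable names are illustrative and not part of the original material.

```python
# Observed call totals from the slide (10 days sampled per day of week).
observed = {
    "Monday": 290, "Tuesday": 250, "Wednesday": 238, "Thursday": 257,
    "Friday": 265, "Saturday": 230, "Sunday": 192,
}

total = sum(observed.values())   # 1722 calls in all
k = len(observed)                # K = 7 categories
expected = total / k             # 246 calls per day under H0 (uniform)

# Chi-square statistic: sum over categories of (O_i - E_i)^2 / E_i.
chi_sq = sum((o - expected) ** 2 / expected for o in observed.values())
df = k - 1                       # K - 1 degrees of freedom

# Critical value for alpha = .05 with 6 d.f., copied from the slide.
critical_value = 12.5916
reject_h0 = chi_sq > critical_value

print(f"chi-square = {chi_sq:.2f}, d.f. = {df}, reject H0: {reject_h0}")
# prints: chi-square = 23.05, d.f. = 6, reject H0: True
```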
[Figure: chi-square density, α = .05. Do not reject H0 below the critical value χ².05 = 12.5916; reject H0 above it.]
14.2
Goodness-of-Fit Tests, Population
Parameters Unknown
Idea:
Test whether data follow a specified distribution
(such as binomial, Poisson, or normal) . . .
. . . without assuming the parameters of the
distribution are known
Use sample data to estimate the unknown
population parameters
Goodness-of-Fit Tests, Population
Parameters Unknown
(continued)
Suppose that a null hypothesis specifies category
probabilities that depend on the estimation (from the
data) of m unknown population parameters
The appropriate goodness-of-fit test is the same as in
the previous section . . .
$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$
. . . except that the number of degrees of freedom for
the chi-square random variable is
Degrees of Freedom = (K - m - 1)
Where K is the number of categories
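As a rough illustration of how the degrees of freedom change, the sketch below wraps the statistic in a small helper. The function name `chi_square_gof` and its parameter `m` are hypothetical, introduced only for this example; the data are the technical support call totals from Section 14.1, where no parameters were estimated (m = 0).

```python
def chi_square_gof(observed, expected, m=0):
    """Return (statistic, degrees of freedom) for a goodness-of-fit test.

    m is the number of population parameters that were estimated from
    the data before the expected frequencies were computed.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    k = len(observed)            # K = number of categories
    df = k - m - 1               # degrees of freedom: K - m - 1
    if df < 1:
        raise ValueError("K - m - 1 must be at least 1")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, df

# Fully specified probabilities (m = 0): call totals from Section 14.1.
calls = [290, 250, 238, 257, 265, 230, 192]
stat, df = chi_square_gof(calls, [246] * 7)
print(round(stat, 2), df)
# prints: 23.05 6
```

With, say, K = 6 categories and a Poisson mean estimated from the sample (m = 1), the same helper would report 6 - 1 - 1 = 4 degrees of freedom.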
14.3
Test of Normality
The assumption that data follow a normal
distribution is common in statistics
Normality was assessed in prior chapters (for
example, with Normal probability plots in
Chapter 5)
Here, a chi-square test is developed
Test of Normality
(continued)
Two population parameters can be estimated using
sample data:
Skewness = $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$
Kurtosis = $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4}$
For a normal distribution,
Skewness = 0
Kurtosis = 3
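The two estimates can be computed directly from a sample, as in the sketch below. It assumes that s in the formulas is the sample standard deviation with an n - 1 divisor (the slide does not say which divisor is meant), and the five data values are made up purely for illustration.

```python
from statistics import mean, stdev

def skewness_kurtosis(xs):
    """Sample skewness and kurtosis as defined on the slide."""
    n = len(xs)
    x_bar = mean(xs)
    # ASSUMPTION: s is the sample standard deviation (n - 1 divisor).
    s = stdev(xs)
    skew = sum((x - x_bar) ** 3 for x in xs) / (n * s ** 3)
    kurt = sum((x - x_bar) ** 4 for x in xs) / (n * s ** 4)
    return skew, kurt

# Hypothetical, perfectly symmetric sample: skewness should be 0.
sample = [1, 2, 3, 4, 5]
skew, kurt = skewness_kurtosis(sample)
print(f"skewness = {skew:.3f}, kurtosis = {kurt:.3f}")
```

For a sample this small the kurtosis estimate falls far from 3, which is one reason the test that follows is stated for large samples.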
Jarque-Bera
Test for Normality
Consider the null hypothesis that the population
distribution is normal
The Jarque-Bera Test for Normality is based on the closeness of the
sample skewness to 0 and the sample kurtosis to 3
The test statistic is
$JB = n \left[ \frac{(\text{Skewness})^2}{6} + \frac{(\text{Kurtosis} - 3)^2}{24} \right]$
As the number of sample observations becomes very large, this
statistic has a chi-square distribution with 2 degrees of freedom
The null hypothesis is rejected for large values of the test statistic
Jarque-Bera
Test for Normality
(continued)
The chi-square approximation is close only for very
large sample sizes
If the sample size is not very large, the Bowman-Shelton test statistic is compared to significance points
from text Table 14.9
Sample size N   10% point   5% point
20              2.13        3.26
30              2.49        3.71
40              2.70        3.99
50              2.90        4.26
75              3.09        4.27
100             3.14        4.29
125             3.31        4.34
150             3.43        4.39
200             3.48        4.43
250             3.54        4.61
300             3.68        4.60
400             3.76        4.74
500             3.91        4.82
800             4.32        5.46
∞               4.61        5.99
Example: Jarque-Bera
Test for Normality
The average daily temperature has been recorded for
200 randomly selected days, with sample skewness
0.232 and kurtosis 3.319
Test the null hypothesis that the true distribution is
normal
$JB = n \left[ \frac{(\text{Skewness})^2}{6} + \frac{(\text{Kurtosis} - 3)^2}{24} \right] = 200 \left[ \frac{(0.232)^2}{6} + \frac{(3.319 - 3)^2}{24} \right] = 2.642$
From Table 14.9 the 10% critical value for n = 200 is
3.48. Since 2.642 < 3.48, there is not sufficient evidence to reject
the null hypothesis that the population is normal
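The temperature example can be checked with a short sketch. The function name `jarque_bera` is illustrative; the sample size, skewness, kurtosis and the 10% significance point are the values given above.

```python
def jarque_bera(n, skewness, kurtosis):
    """JB = n * [ skewness^2 / 6 + (kurtosis - 3)^2 / 24 ]."""
    return n * (skewness ** 2 / 6 + (kurtosis - 3) ** 2 / 24)

# Values from the temperature example: n = 200 days.
jb = jarque_bera(200, 0.232, 3.319)

# 10% significance point for n = 200, copied from Table 14.9.
critical_10pct = 3.48
reject_h0 = jb > critical_10pct

print(f"JB = {jb:.3f}, reject H0 at the 10% level: {reject_h0}")
# prints: JB = 2.642, reject H0 at the 10% level: False
```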
14.3
Contingency Tables
Used to classify sample observations according
to a pair of attributes
Also called a cross-classification or cross-tabulation table
Assume r categories for attribute A and c
categories for attribute B
Then there are (r x c) possible cross-classifications
r x c Contingency Table
                         Attribute B
Attribute A    1      2      ...    c      Totals
1              O11    O12    ...    O1c    R1
2              O21    O22    ...    O2c    R2
.              .      .             .      .
.              .      .             .      .
.              .      .             .      .
r              Or1    Or2    ...    Orc    Rr
Totals         C1     C2     ...    Cc     n
Test for Association
Consider n observations tabulated in an r x c
contingency table
Denote by Oij the number of observations in
the cell that is in the ith row and the jth column
The null hypothesis is
H0 : No association exists
between the two attributes in the population
The appropriate test is a chi-square test with
(r-1)(c-1) degrees of freedom
Test for Association
(continued)
Let Ri and Cj be the row and column totals
The expected number of observations in cell row i and
column j, given that H0 is true, is
$E_{ij} = \frac{R_i C_j}{n}$
A test of association at significance level α is based
on the chi-square distribution and the following decision
rule
Reject H0 if $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} > \chi^2_{(r-1)(c-1),\alpha}$
Contingency Table Example
Left-Handed vs. Gender
Dominant Hand: Left vs. Right
Gender: Male vs. Female
H0: There is no association between
hand preference and gender
H1: Hand preference is not independent of gender
Contingency Table Example
(continued)
Sample results organized in a contingency table:
sample size = n = 300:
120 Females, 12 were left handed
180 Males, 24 were left handed
               Hand Preference
Gender     Left    Right    Total
Female     12      108      120
Male       24      156      180
Total      36      264      300
Logic of the Test
H0: There is no association between
hand preference and gender
H1: Hand preference is not independent of gender
If H0 is true, then the proportion of left-handed females
should be the same as the proportion of left-handed
males
The two proportions above should be the same as the
proportion of left-handed people overall
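The comparison of proportions can be sketched in Python using the sample counts from the table. The variable names are illustrative, and the expected counts follow the rule Eij = Ri Cj / n stated earlier; no decision is drawn here because no critical value has been quoted for this example.

```python
# Observed counts: rows = (Female, Male), columns = (Left, Right).
observed = [[12, 108],
            [24, 156]]

row_totals = [sum(row) for row in observed]        # [120, 180]
col_totals = [sum(col) for col in zip(*observed)]  # [36, 264]
n = sum(row_totals)                                # 300

# Proportion left-handed within each gender, and overall.
left_by_gender = [row[0] / r for row, r in zip(observed, row_totals)]
left_overall = col_totals[0] / n
print(left_by_gender, left_overall)   # about 0.10 and 0.133, versus 0.12

# Expected counts under H0: E_ij = R_i * C_j / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic and its (r - 1)(c - 1) degrees of freedom.
chi_sq = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
print(f"chi-square = {chi_sq:.4f}, d.f. = {df}")
```

Under H0 the expected number of left-handed females is 120 × 36 / 300 = 14.4, compared with 12 observed.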