Sensory Discrimination Tests and Measurements- Statistical Principles, Procedures and Tables 2006

Sensory Discrimination
Tests and Measurements
Statistical Principles, Procedures and Tables


Jian Bi
Sensometrics Research and Service
Richmond, Virginia, USA


Jian Bi is a Senior Statistician and the President of
Sensometrics Research and Service, Richmond,
Virginia.
© 2006 Jian Bi
All rights reserved

Blackwell Publishing Professional
2121 State Avenue, Ames, Iowa 50014, USA
Orders: 1-800-862-6657
Office: 1-515-292-0140
Fax: 1-515-292-3348
Web site: www.blackwellprofessional.com
Blackwell Publishing Ltd
9600 Garsington Road, Oxford OX4 2DQ, UK
Tel.: +44 (0)1865 776868
Blackwell Publishing Asia
550 Swanston Street, Carlton, Victoria 3053, Australia
Tel.: +61 (0)3 8359 1011
Authorization to photocopy items for internal or
personal use, or the internal or personal use of specific
clients, is granted by Jian Bi, provided that the base
fee of $.10 per copy is paid directly to the Copyright
Clearance Center, 222 Rosewood Drive, Danvers,
MA 01923. For those organizations that have been
granted a photocopy license by CCC, a separate
system of payments has been arranged. The fee codes
for users of the Transactional Reporting Service are
ISBN-13: 978-0-8138-1111-6; ISBN-10:
0-8138-1111-2/2005 $.10.
First edition, 2006
Library of Congress Cataloging-in-Publication Data
Bi, Jian, 1949–
Sensory discrimination tests and measurements :
statistical principles, procedures, and tables / Jian Bi.–
1st ed.
p. cm.
Includes bibliographical references.
ISBN-13: 978-0-8138-1111-6 (alk. paper)
ISBN-10: 0-8138-1111-2
1. Agriculture—Statistical methods. 2. Sensory
discrimination—Statistical methods. I. Title.
S566.55B55 2006
630 .72 7—dc22
2005017101
The last digit is the print number: 9 8 7 6 5 4 3 2 1



To Yulin


Contents

Preface

1 Introduction
1.1 A Brief Review of Sensory Analysis Methodologies
1.2 Method, Test, and Measurement
1.3 Standard Discrimination Methods
1.4 Classification of Sensory Discrimination Methods
References

2 Standard Discrimination Tests
2.1 Binomial Model for Discrimination Testing
2.2 Discrimination Tests Using Forced-Choice Methods
2.3 Discrimination Tests Using the Methods with Response Bias
References

3 Statistical Power Analysis for Standard Discrimination Tests
3.1 Introduction
3.2 Power and Sample Size for Forced-Choice Methods
3.3 Power and Sample Size for the Methods with Response Bias
3.4 Efficiency Comparisons of Discrimination Tests
References

4 Modified Discrimination Tests
4.1 The Modified Triangle Test
4.2 The Degree of Difference Test
4.3 The Double Discrimination Test
4.4 The Preference Test with “No Preference” Option
References

5 Multiple-Sample Discrimination Tests
5.1 Multiple-Sample Comparison Based on Proportions
5.2 Multiple-Sample Comparison Based on Ranks
5.3 Multiple-Sample Comparison Based on Categorical Scales
References

6 Replicated Discrimination Tests: Beta-Binomial Model
6.1 Introduction
6.2 The Beta-Binomial Distribution
6.3 Estimation of Parameters of Beta-Binomial Model
6.4 Applications of Beta-Binomial Model in Replicated Tests
6.5 Testing Power and Sample Size for Beta-Binomial Tests
References
Appendix 6A

7 Replicated Discrimination Tests: Corrected Beta-Binomial Model
7.1 Introduction
7.2 The Corrected Beta-Binomial Distribution
7.3 Estimation of Parameters of Corrected Beta-Binomial Model
7.4 Statistical Testing for Parameters in Corrected Beta-Binomial Model
7.5 Testing Power and Sample Size
References
Appendix 7A

8 Replicated Discrimination Tests: Dirichlet-Multinomial Model
8.1 The Dirichlet-Multinomial Distribution
8.2 Estimation of Parameters of Dirichlet-Multinomial Model
8.3 Applications of DM model in Replicated Tests
8.4 Testing Power for Dirichlet-Multinomial Model
References

9 Measurements of Sensory Difference: Thurstonian Model
9.1 Introduction
9.2 Thurstonian δ
9.3 Variance of d′
9.4 Tables for d′ and Variance of d′
References

10 Statistical Analysis for d′ Data
10.1 Estimates of Population or Group d′
10.2 Statistical Inference for d′ Data
References

11 Similarity Testing
11.1 Introduction
11.2 Similarity Testing for Preference
11.3 Similarity Testing Using Forced-Choice Methods
11.4 Similarity Testing Using the A–Not A and the Same–Different Methods
References
Appendix 11A

Appendix A List of S-Plus Codes
Author Index
Subject Index


Preface

Discriminative analysis, including discrimination tests and measurements, is the most fundamental type of methodology in sensory science. The validation of the methodology depends to some extent on sound statistical models. The objective of this book is to deal with
statistical aspects of the methodology and to provide the reader with statistical principles,
procedures and tables for some methods. The book attempts to give a unified picture of
the state of the subject and to reflect some features of advanced sensory discriminative
analysis.
This book consists of eleven chapters. It is organized as follows:
Chapter 1 briefly reviews sensory methodologies with emphasis on six standard, widely
used discrimination methods: the 2-AFC, 3-AFC, Duo–Trio, Triangle, A–Not A, and the
Same–Different methods.
Chapters 2 to 5 discuss discrimination testing, including standard discrimination tests
(Chapters 2–3), modified discrimination tests (Chapter 4), and multiple-sample discrimination tests (Chapter 5), under the conventional assumptions that the consumer population is
composed of “discriminators” and “non-discriminators” and that the panelists of a laboratory panel
have the same discrimination ability.
Chapters 6 to 8 present a unified approach to replicated discrimination tests using a
beta-binomial framework under the assumption that discrimination ability or preference
for each individual consumer and panelist is not a constant but a random variable. The
assumptions under discrimination testing discussed in Chapters 2 to 5 and Chapters 6 to 8
are philosophically different.
Chapters 9 to 10 are devoted to a discussion of sensory measurement using the Thurstonian
discriminal distance δ (or d′).
Chapter 11, the last chapter, discusses similarity testing, which is practically and theoretically important but often confusing.
The book is intended for researchers and practitioners in the sensory and consumer field
and has been written keeping both the statistical and non-statistical readers in mind. It is
not difficult to apply most of the methods by following the numerical examples using the
corresponding formulas and tables provided in the book. For some of the methods involving complicated calculations, computer programs are needed. Thanks to modern computer
technology, calculations are much easier than before. The extent of computational complication involved in a method should not be regarded as a major concern in the selection
of methods. For some statistical considerations behind the methodology and some mathematical derivations in the book, readers with a more statistical background will understand
them without major difficulty.
Some S-PLUS codes, which appear in the book and are listed in Appendix A, are available
from the author on request. The author may be contacted via e-mail at


Acknowledgments

I am greatly indebted to the Series Editor, Dr. Max Gacula, who encouraged me to write
this book, reviewed the manuscript, and provided insightful comments.
I wish to express my gratitude to Professor Edgar Chambers, Dr. Morten Meilgaard,
Professor Michael O’Mahony, and Dr. Daniel Ennis for their valuable support and help over
the past years.
I would like to thank the publisher and my editors Mark Barrett, Dede Pedersen, Susan
Borts, and Judi Brown at Blackwell Publishing and Suditi Srivastava at TechBooks for
publishing my book and bringing the project to completion.
Finally, I wish to thank deeply my wife, Yulin, for her patience, understanding, and
encouragement during the preparation of this book.
Jian Bi


Sensory Discrimination Tests and Measurements: Statistical Principles, Procedures and Tables
Jian Bi
Copyright © 2006 by Jian Bi

1 Introduction

1.1 A brief review of sensory analysis methodologies

Conducting valid tests and providing reliable sensory measurements are the main functions of
sensory analysis. Statistical inference is the theoretical basis of sensory tests. Psychometrics,
which provides invariant indices that are independent of the methods used, is the theoretical basis
of sensory measurements.
Sensory analysis can be divided into two parts: laboratory sensory analysis and consumer
sensory analysis. In the laboratory sensory analysis, a trained panel is used as an analytical
instrument to measure sensory properties of products. In the consumer sensory analysis,
a sample of specified consumer population is used to test and predict consumer responses
for products. The two types of sensory analysis have different goals and functions, but they
share some of the same methodologies.
Discriminative analysis and descriptive analysis are the main classes of methodology
for both laboratory and consumer sensory analyses. Discriminative analysis includes
discrimination tests and measurements. Discrimination tests are used to determine, usually
using a 2-point scale or a ranking scale, whether a difference exists between treatments
for confusable sensory properties of products. Discrimination measurements are used to
measure, using an index, the extent of the difference. There are two sources of sensory
differences: intensity and preference. Discriminative analysis is referred to as difference testing
when it tests a difference of intensity and as preference testing
when it tests a difference of preference. Descriptive analysis serves two purposes: to determine, using a rating scale, how much of a specific characteristic difference exists among products, which is
quantitative descriptive analysis, and to characterize a product’s sensory attributes, which
is qualitative descriptive analysis. Quantitative descriptive analysis for preference is also
called acceptance testing.
Acceptance or preference testing with a laboratory panel is of very limited value
(Amerine et al. 1965). However, consumer discriminative and descriptive analyses for
both intensity and preference are valuable. Laboratory difference testing, using a trained
panel under controlled conditions, has been called Sensory Evaluation I, whereas
consumer difference testing, using a sample of untrained consumers under ordinary conditions of use
(eating), has been called Sensory Evaluation II (O’Mahony 1995). They are
different types of difference testing. Confusing the two types will lead
to misleading conclusions. The controversy over whether consumers can be used for
difference testing may ignore the fact that laboratory and consumer difference tests
have different goals and functions.
The distinction between discriminative analysis and quantitative descriptive analysis is not absolute from the viewpoint of modern sensory analysis. The Thurstonian model
that will be discussed in Chapters 9–10 of this book can be used for both discriminative
analysis and quantitative descriptive analysis. The Thurstonian δ (or d′), which is a measure
of sensory difference, can be obtained from any kind of scale used in discriminative and
descriptive analyses. In addition, the rating scale, which is typically used in descriptive analysis,
is also used in some modified discrimination tests.
Besides discriminative analysis and descriptive analysis, there are other classes of
sensory methodologies, i.e., sensitivity analysis, time-intensity (TI) analysis, and similarity
testing. Sensitivity analysis is to determine sensory thresholds, including individual and
population thresholds. Threshold is a statistical concept. It is an intensity that produces a
response with a 0.5 probability. There are many specific statistical methods for estimating
and testing thresholds (for review, see, e.g., Bi and Ennis 1998). Time-intensity analysis or
shelf-life analysis is used to determine the relationship between sensory intensity and time.
Survival analysis, which is a well-developed field, provides sound statistical methodology
for TI analysis. Time-intensity analysis is conventionally included in the descriptive analysis. Considering the specifications of the methodology, it seems that TI analysis should
be separated from the conventional descriptive analysis. Similarity testing is relatively new
and is not well developed in the sensory field. Unlike discrimination testing, the objective of
similarity testing is to demonstrate similarity rather than difference. Similarity testing uses
the same sensory analysis methods for discrimination tests, but different statistical models.
This book is primarily concerned with methodology, mainly in statistical aspects, of
sensory discriminative analysis including laboratory and consumer discriminative analyses.
Similarity testing is briefly discussed in Chapter 11.

1.2 Method, test, and measurement

In this book, a distinction is made among the three terms: “sensory discrimination method,”
“sensory discrimination test,” and “sensory discrimination measurement.”
In sensory discriminative analysis, some procedures are used for experiments. The procedures are called discrimination methods, e.g., the Duo–Trio method, the Triangular method.
When the discrimination procedures are used for statistical hypothesis testing, or in other
words, when statistical testing is conducted for the data from a discrimination procedure,
the procedure is called discrimination testing, e.g., the Duo–Trio test, the Triangular test.
When the discrimination procedures are used for measurement, or in other words, when an
index, e.g., Thurstonian δ (or d′), is produced using the data from a discrimination procedure, the procedure is called discrimination measurement, e.g., the Duo–Trio measurement,
the Triangular measurement.

1.3 Standard discrimination methods

Six standard and basic discrimination methods are the focus of this book. They are:
(a) The 2-Alternative Forced-Choice method (2-AFC) (Green and Swets 1966): This
method is also called the paired comparison method (Dawson and Harris 1951,
Peryam 1958). In this method, the panelist receives a pair of coded samples, A
and B, for comparison on the basis of some specified sensory characteristic. The
possible pairs are AB and BA. The panelist is asked to select the sample with the
strongest (or the weakest) sensory characteristic. The panelist has to select one even
if he or she cannot detect the difference.
(b) The 3-Alternative Forced-Choice method (3-AFC) (Green and Swets 1966): Three
samples of two products A and B are presented to each panelist. Two of them are the
same. The possible sets of samples are AAB, ABA, BAA; or ABB, BAB, BBA. The
panelist is asked to select the sample with the strongest or the weakest characteristic.
The panelist has to select one sample even if he or she cannot identify the one with
the strongest or the weakest sensory characteristic.
(c) The Duo–Trio method (Dawson and Harris 1951, Peryam 1958): Three samples of
two products A and B are presented to each panelist. Two of them are the same.
The possible sets of samples are AAB, ABA, ABB, BAA, BAB, and BBA. The
first one is labeled as the “control.” The panelist is asked which one of the two test
samples is the same as the control sample. The panelist has to select one sample to
match the “control” sample even if he or she cannot identify which one is the same
as the control sample.
(d) The Triangular (Triangle) method (Dawson and Harris 1951, Peryam 1958): Three
samples of two products A and B are presented to each panelist. Two of them are
the same. The possible sets of samples are AAB, ABA, BAA, ABB, BAB, and
BBA. The panelist is asked to select the odd sample. The panelist has to select one
sample even if he or she cannot identify the odd one.
(e) The A–Not A method (Peryam 1958): The panelists are first familiarized with the samples
“A” and “Not A.” One sample, which is either “A” or “Not A,” is presented to each
panelist. The panelist is asked if the sample is “A” or “Not A.”
(f) The Same–Different method (see, e.g., Pfaffmann 1954, Amerine et al. 1965,
Macmillan and Kaplan 1977, Meilgaard et al. 1991, among others, for the method
under different names): A pair of samples is presented to each panelist. The pair is one
of the four possible sample pairs: AA, BB, AB, and BA, where A and B are the
two products for comparison. The panelist is asked if the sample pair that he or she
received is the same or different.

1.4 Classification of sensory discrimination methods

Sensory discrimination methods are typically classified according to the number of samples presented for evaluation, i.e., single-sample (stimulus), two-sample, three-sample, and multiple-sample methods. This classification is natural, but it does not reflect the
inherent characteristics of the methods. In this book, the discrimination methods are classified according to the decision rules and cognitive strategies involved in the methods. This
kind of classification may be more reasonable and profound. In the following chapters, we
will see how the methods in the same class correspond to the same type of statistical models
and decision rules.

1.4.1 Methods requiring and not requiring the nature of difference

There are two different types of instructions in discrimination methods. One type of
instruction asks the panelists to indicate the nature of the difference between the products under
evaluation, e.g., “Which sample is sweeter?” (the 2-AFC and 3-AFC methods); “Is the
sample A or Not A?” (the A–Not A method). The other type of instruction relates to
the comparison of the distance of difference, e.g., “Which of the two test samples is the same as
the control sample?” (the Duo–Trio method); “Which sample is the odd one in the three
samples?” (the Triangular method); “Are the two samples the ‘same’ or ‘different’?” (the
Same–Different method). The two types of instructions involve different cognitive strategies
and result in different proportions of correct responses. Hence the discrimination methods
can be divided into these two types: the methods using the “skimming” strategy and the
methods using the “comparison of distance” strategy (O’Mahony et al. 1994).

1.4.2 Methods with and without response bias

Response bias is a basic problem with sensory discrimination methods. Many authors, e.g.,
Torgerson (1958), Green and Swets (1966), Macmillan and Creelman (1991), O’Mahony
(1989, 1992, 1995), addressed this problem. Sensory discrimination methods are designed
for detection and measurement of confusable sensory differences. There is no response bias
if the difference is large enough. However, response bias may occur when the difference
between two products is so small that a panelist makes an unsure judgment. In this situation,
the decision criterion of how large a difference must be to be judged a difference may play a
role in the decision process. Criterion variation, i.e., strictness or laxness of the criterion, causes
response bias. A response bias is a psychological tendency to favor one side of a criterion.
Response bias is independent of sensitivity. This is why the methods with response bias
(e.g., the A–Not A and the Same–Different methods) can also be used for difference testing.
However, response bias affects test power. The influence of response bias on difference
testing will be discussed in Chapter 3.
Forced-choice procedures can be used to stabilize the decision criterion. Hence most sensory
discrimination methods are designed as forced-choice procedures. A forced-choice procedure must have at least three characteristics: (1) Two sides of a criterion must be presented
in a forced-choice procedure. The two sides of a criterion may be “strong” and “weak,” if
the criterion is about the nature of the difference between products. The two sides of a criterion
may be “same” and “different,” if the criterion is about the distance of the difference between
products. Because a single sample, or a set of samples of the same type, cannot contain two sides of a
criterion, evaluating a single sample or samples of the same type is not a forced-choice procedure. Likewise, because a single pair of samples, or a set of sample pairs of the same type, cannot contain two
sides of a criterion about the distance of a difference, evaluating a single sample pair or
sample pairs of the same type is not a forced-choice procedure either. (2) A panelist should
be instructed that the samples presented for evaluation contain the two sides of a criterion.
(3) A response must be given in terms of one clearly defined category. The “don’t know”
response is not allowed.
In the six standard and basic sensory discrimination methods, the 2-AFC, 3-AFC, Triangular, and Duo–Trio methods are the forced-choice methods. In the 2-AFC and 3-AFC
methods, the criterion is about the nature of the difference between products. Two samples (2-AFC) or three
samples (3-AFC) containing the two products are presented to a panelist, who is instructed accordingly.
The panelist is asked to select the sample with the “strong” or the “weak” sensory property,
even if the panelist cannot detect the difference. In the Duo–Trio and Triangular methods,
the criterion is about comparison of distance of difference. The samples presented in these
methods make up a “same” sample pair and an “odd” sample. A panelist is asked
to select the odd sample, even if he or she cannot find the odd sample.

Table 1.1 A two-way classification of six standard and basic sensory discrimination methods

                               Requiring the nature    Comparing distance
                               of difference           of difference
Without response bias          2-AFC                   Duo–Trio
(Forced-choice procedure)      3-AFC                   Triangular
With response bias             A–Not A                 Same–Different

In the six standard and basic sensory discrimination methods, the A–Not A method and
the Same–Different method are the methods with response bias, because only one sample,
either sample A or Not A, is presented to a panelist in the A–Not A method; and only one
sample pair, either a concordant sample pair or a discordant sample pair, is presented to a
panelist in the Same–Different method.
The six standard and basic sensory discrimination methods are classified based on response bias and strategies for determination of difference. Table 1.1 gives a two-way classification for the methods.
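
The two-way classification of Table 1.1 can also be written down as a small lookup structure. The following Python sketch is only an illustration and is not part of the book's S-PLUS code; the names `MethodClass`, `CLASSIFICATION`, and `forced_choice_methods` are hypothetical, and the method names use plain hyphens for convenience.

```python
from typing import NamedTuple


class MethodClass(NamedTuple):
    """Position of a method in the two-way classification (hypothetical type)."""
    response_bias: bool  # True: method with response bias; False: forced-choice
    strategy: str        # "nature" (nature of difference) or "distance" (distance of difference)


# Entries transcribed from Table 1.1.
CLASSIFICATION = {
    "2-AFC": MethodClass(False, "nature"),
    "3-AFC": MethodClass(False, "nature"),
    "A-Not A": MethodClass(True, "nature"),
    "Duo-Trio": MethodClass(False, "distance"),
    "Triangular": MethodClass(False, "distance"),
    "Same-Different": MethodClass(True, "distance"),
}


def forced_choice_methods() -> list:
    """Return the names of the methods without response bias, sorted alphabetically."""
    return sorted(name for name, c in CLASSIFICATION.items() if not c.response_bias)


print(forced_choice_methods())
```

Such a table-driven layout makes it easy to select all methods in one class, which mirrors the way later chapters treat methods of the same class with the same type of statistical model.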

References

Amerine, M. A., Pangborn, R. M. and Roessler, E. B. 1965. Principles of Sensory Evaluation of Food. Academic
Press, New York, NY.
Bi, J. and Ennis, D. M. 1998. Sensory threshold: Concepts and methods. Journal of Sensory Studies 13, 133–148.
Dawson, E. H. and Harris, B. L. 1951. Sensory methods for measuring differences in food quality. Agriculture
Information Bulletin 34, US Department of Agriculture, Washington, DC.
Green, D. M. and Swets, J. A. 1966. Signal Detection – Theory and Psychophysics. John Wiley, New York.
Macmillan, N. A. and Kaplan, H. L. 1977. The psychophysics of categorical perception. Psychological Review
84, 452–471.
Macmillan, N. A. and Creelman, C. D. 1991. Detection Theory: A User’s Guide. Cambridge University Press,
New York.
Meilgaard, M., Civille, G. V. and Carr, B. T. 1991. Sensory Evaluation Techniques (2nd ed.), CRC Press, Boca
Raton, FL.
O’Mahony, M. 1989. Cognitive aspects of difference testing and descriptive analysis: Criterion variation and
concept formation. In Psychological Basis of Sensory Evaluation, eds R. L. McBride and H. J. H. MacFie.
Elsevier Applied Science, New York, pp. 177–139.
O’Mahony, M. 1992. Understanding discrimination tests: A user-friendly treatment of response bias, rating and
ranking R-index tests and their relationship to signal detection. Journal of Sensory Studies 7, 1–47.
O’Mahony, M. 1995. Sensory measurement in food science: Fitting methods to goals. Food Technology 49, 72–82.
O’Mahony, M., Susumu, M. and Ishii, R. 1994. A theoretical note on difference tests: Methods, paradoxes and
cognitive strategies. Journal of Sensory Studies 9, 247–272.
Peryam, D. R. 1958. Sensory difference tests. Food Technology 12, 231–236.
Pfaffmann, C. 1954. Variables affecting difference tests. In Food Acceptance Testing Methodology, A Symposium.
National Academy of Science and National Research Council, Washington, DC, pp. 4–20.
Torgerson, W. S. 1958. Theory and Methods of Scaling. John Wiley, New York.



2 Standard discrimination tests

Discrimination testing is one of the main functions of discriminative analysis. It includes
difference testing and preference testing. In this chapter, the standard discrimination tests,
i.e., the discrimination testing using six standard discrimination methods under conventional
conditions will be discussed. All the six methods can be used for difference testing. Of these
six, only the paired comparison method (2-Alternative Forced-Choice method) can be used
for both difference testing and preference testing.

2.1 Binomial model for discrimination testing

A discrimination test is assumed to constitute a binomial experiment. The number of
correct responses in a discrimination test is assumed to be a binomial variable following
a binomial distribution. In this section, the validity of using the binomial model for
discrimination testing will be discussed.
Binomial experiment. A binomial experiment possesses the following properties:
(a) The experiment consists of n trials.
(b) Each response is a binary variable that may be classified as a success or a failure.
(c) The trials are independent.
(d) The probability of success, denoted by p, remains constant from trial to trial.

Binomial variable. The number of successes in n trials of a binomial experiment is called
a binomial variable, X, which follows a binomial distribution.
Binomial distribution. The probability that there are exactly x successes in n independent
trials in a binomial experiment is given by the probability function

$$P(x; p, n) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n. \tag{2.1.1}$$

The cumulative distribution function is given by

$$F(x) = \sum_{k=0}^{x} \binom{n}{k} p^{k} (1 - p)^{n - k}. \tag{2.1.2}$$

The parameters of the binomial distribution are n and p. The mean is E(X) = np and the
variance is Var(X) = np(1 − p).
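
The probability function (2.1.1) and the cumulative distribution function (2.1.2) are easy to evaluate directly. The sketch below is a minimal Python illustration using only the standard library; it is not the author's S-PLUS code, the function names `binom_pmf` and `binom_cdf` are hypothetical, and the values n = 30 and p = 1/3 are chosen only as an example.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n trials, following (2.1.1)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x: int, n: int, p: float) -> float:
    """Probability of at most x successes in n trials, following (2.1.2)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


# Hypothetical example: 30 panelists, success probability 1/3.
n, p = 30, 1 / 3
mean = n * p                # E(X) = np
variance = n * p * (1 - p)  # Var(X) = np(1 - p)

print(binom_pmf(10, n, p), binom_cdf(10, n, p), mean, variance)
```

Summing the probability function over x = 0, ..., n returns 1, which is a quick sanity check on any implementation.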
In a standard discrimination test, n responses (trials) are obtained from n panelists.
Each panelist gives only one response, so the n responses can be regarded as independent
of each other. The response of each panelist is a binary variable because each response
results in one of two possible outcomes and the “no difference” response is not allowed
in the tests. The first three properties of a binomial experiment are therefore satisfied in a
standard discrimination test. The question that often arises is how to understand the fourth
property, that is, in what sense each panelist has the same probability of a correct response. The conventional
assumption for consumer discrimination testing is that a consumer panel is a representative
sample of a specific consumer population. Consumers in that population are divided
into discriminators and non-discriminators for the products compared. Because each panelist
has the same probability of being a discriminator, each panelist also
has the same probability of a correct response. For a laboratory panel, which is regarded
as an instrument rather than a sample of a consumer population, the underlying
assumption is that the panelists have the same discrimination ability. Hence each panelist
can be assumed to have the same probability of a correct response.
The conventional sensory difference and preference tests are based on statistical hypothesis testing for proportions. For the forced-choice methods, the testing involves comparison
of a proportion with a specified value. For the methods with response bias, the testing mainly
involves comparison of two proportions.

2.2 Discrimination tests using forced-choice methods

2.2.1 Guessing model

2.2.1.1 Guessing model for difference tests. There is a guessing model for a difference test
using a forced-choice method. The guessing model describes the relationship among three
quantities: the probability of a correct response or preference, $p_c$; the probability of a correct guess,
$p_0$; and the proportion of discriminators (for consumer discrimination testing) or the probability of
discrimination (for laboratory discrimination testing), $p_d$:

$$p_c = p_d + p_0 (1 - p_d). \tag{2.2.1}$$

If the two products are the same, the probability of a correct response for each panelist
should be the chance probability ($p_0$) in a forced-choice method. Otherwise, if the two products
are different, a discriminator gives a correct response with a probability of 1, whereas a
non-discriminator gives a correct response with the chance probability $p_0$. A consumer panelist
is a discriminator with probability $p_d$ and a non-discriminator with probability
$1 - p_d$. According to the
theorem on total probabilities,¹ the probability of a correct response or preference for each
consumer panelist should be as given in (2.2.1). The situation is similar for a laboratory
panelist. For each trained panelist, the probabilities of discrimination and non-discrimination
are $p_d$ and $1 - p_d$, respectively. If the panelist can discriminate the products, the probability
of a correct response is 1, whereas if the panelist cannot discriminate the products, the
probability of a correct response is the guessing probability. Hence the probability of a
correct response for each trained panelist should also be as given in (2.2.1) according to the
theorem on total probabilities.
¹ Theorem on total probabilities: If an arbitrary event E intersects the mutually exclusive and collectively exhaustive
events $A_i$, then the probability of event E is $P(E) = \sum_i P(A_i) P(E \mid A_i)$, where $P(E \mid A_i)$ is the conditional
probability of E given $A_i$ (see, e.g., Sachs 1982).
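
The guessing model (2.2.1) can be evaluated in either direction. The following Python sketch is an illustration under stated assumptions rather than the book's own code: the function names are hypothetical, the inverse is simply the algebraic rearrangement of (2.2.1), and truncating negative estimates at zero is a convention assumed here, not something stated in the text.

```python
def prob_correct(pd: float, p0: float) -> float:
    """Probability of a correct response, p_c = p_d + p_0 (1 - p_d), as in (2.2.1)."""
    return pd + p0 * (1.0 - pd)


def prop_discriminators(pc: float, p0: float) -> float:
    """Solve (2.2.1) for p_d: p_d = (p_c - p_0) / (1 - p_0).

    Values below zero (p_c < p_0) are truncated at 0; this truncation is an
    assumption of this sketch.
    """
    return max(0.0, (pc - p0) / (1.0 - p0))


# Hypothetical example for a method with chance probability 1/3 (3-AFC or Triangular):
# 25% discriminators give a 50% probability of a correct response.
pc = prob_correct(0.25, 1 / 3)
print(pc, prop_discriminators(pc, 1 / 3))
```

The example shows why the observed proportion of correct responses overstates discrimination: half of the responses are correct although only a quarter of the panelists discriminate.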



2.2.1.2 Guessing model for preference testing. The guessing model for consumer preference testing is different from that for difference testing. There are two independent
proportions, $p_a$ and $p_b$, which denote the proportions of consumers preferring product
A and B, respectively, in a consumer population. It is assumed that $p_a + p_b \le 1$ and that
$p_n = 1 - p_a - p_b$ is the proportion of consumers with no preference. When the “No preference”
option is not allowed in a test, a consumer panelist
gives response “A” with probability 1 if he or she prefers A, with probability 0
if he or she prefers B, and with probability
0.5 if he or she genuinely has no preference.
Hence the total probability of preferring A in a preference test should be

$$P_A = p_a + p_n / 2 = (1 + p_a - p_b) / 2. \tag{2.2.2}$$

The total probability of preferring B in a preference test should be

$$P_B = 1 - P_A = (1 - p_a + p_b) / 2. \tag{2.2.3}$$

It should be noted that (2.2.2) and (2.2.3) are not independent of each other, because $P_A + P_B = 1$.
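
As a numerical illustration of (2.2.2) and (2.2.3), the Python sketch below computes the response probabilities from the population proportions. The function name `preference_probs` and the example proportions are hypothetical and serve only to make the arithmetic concrete.

```python
def preference_probs(pa: float, pb: float) -> tuple:
    """Return (P_A, P_B) for a preference test without a "No preference" option.

    pa, pb: proportions of consumers who prefer A and B; the remainder
    pn = 1 - pa - pb has no preference and answers "A" with probability 0.5.
    """
    if pa < 0 or pb < 0 or pa + pb > 1:
        raise ValueError("pa and pb must be non-negative with pa + pb <= 1")
    pn = 1.0 - pa - pb
    p_a_total = pa + pn / 2.0    # (2.2.2)
    p_b_total = 1.0 - p_a_total  # (2.2.3)
    return p_a_total, p_b_total


# Hypothetical population: 50% prefer A, 30% prefer B, 20% have no preference.
print(preference_probs(0.5, 0.3))
```

With these hypothetical proportions, the response probabilities are 0.6 for A and 0.4 for B, which agrees with (1 + 0.5 − 0.3)/2 and (1 − 0.5 + 0.3)/2.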
2.2.2 Hypothesis test for discrimination
2.2.2.1 Null and alternative hypotheses Testing whether there is a difference between
two products is the same as testing if pd = 0 or pc = p0 . Hence discrimination tests using
a forced-choice method involve comparison of one proportion with a fixed value, i.e., p0
= 0.5 for the 2-AFC and the Duo–Trio methods and p0 = 1/3 for the 3-AFC and the
triangular methods. The null hypothesis is H0 : pc = p0 and the alternative hypothesis is
H1 : pc > p0 for a one-sided test or H1 : pc ≠ p0 for a two-sided test.
Testing whether there are different preferences for two products is the same as testing if
pa = pb or PA = 0.5 (or PB = 0.5).
In discrimination testing, the objective is to reject the null hypothesis. If the null hypothesis is not rejected, it is inappropriate to conclude that the null hypothesis is proved or
established regardless of the sample size.
2.2.2.2 One-sided and two-sided tests Only the one-sided test applies to the 3-AFC, the Duo–Trio, and the triangular tests because only pc > p0 is possible and of concern when the null hypothesis is rejected. However, there are both one-sided and two-sided
testing situations for the preference and nondirectional 2-AFC tests. The choice depends on
the purpose of the experiment. For example, in a test for sweetness of two products (current
product and a new product), we know in advance that the new product contains more sugar
than the current product. In this situation, the one-sided test should be selected because only
one direction of possible difference is of interest. Or, for example, in a preference test for
two products, wherein we do not know in advance which one is more popular, the two-sided
test should be selected. The decision to use a one-sided or a two-sided test should be made
before the experiment.
2.2.2.3 Type I and type II errors In hypothesis testing two types of errors may be involved. A type I error has been committed if we reject the null hypothesis when it is true. The probability of this error is denoted as α and is also called the significance level; α = 0.1, 0.05, and 0.01 are conventionally selected. A type II error has been committed if we accept the null hypothesis when it is false. The probability of this error is denoted as β; β = 0.2 and 0.1 are conventionally selected.
2.2.2.4 Test statistic and critical value The test statistic based on the binomial distribution in (2.1.2) is the number of correct responses, X. The critical values for one-sided and two-sided tests are given in Table 2.1 according to

$$\sum_{k=0}^{c} \binom{n}{k} p_0^{k} (1 - p_0)^{n-k} \ge 1 - \alpha \tag{2.2.4}$$

and

$$\sum_{k=0}^{c} \binom{n}{k} p_0^{k} (1 - p_0)^{n-k} \ge 1 - \alpha/2, \tag{2.2.5}$$

where α is the significance level and c is the critical value. Table 2.1 gives the minimum number of correct responses needed for significance (c + 1) for sample sizes n from 10 to 110 and α = 0.01, 0.05, and 0.1 for the preference and nondirectional 2-AFC, directional 2-AFC and Duo–Trio, and one-sided 3-AFC and triangular tests, respectively. If the observed number of correct responses or preference is equal to or larger than the corresponding value in the table, a conclusion of significant difference between the products for comparison can be made.
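The tabled values can be reproduced with an exact binomial tail sum. The sketch below is illustrative only: the function names are hypothetical, and it assumes that a tabled entry is the smallest count x whose upper-tail probability P(X ≥ x) does not exceed α for a one-sided test or α/2 for a two-sided test.

```python
from math import comb


def upper_tail(x: int, n: int, p0: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p0); an empty sum (x > n) gives 0."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x, n + 1))


def min_correct(n: int, p0: float, alpha: float, two_sided: bool = False) -> int:
    """Smallest count x with P(X >= x) <= alpha (alpha / 2 if two-sided).

    Returns n + 1 if no attainable count is significant at this level.
    """
    tail = alpha / 2 if two_sided else alpha
    for x in range(n + 2):
        if upper_tail(x, n, p0) <= tail:
            return x
    return n + 1  # not reached: the tail probability at x = n + 1 is 0


print(min_correct(100, 0.5, 0.05, two_sided=True))  # preference test, n = 100
print(min_correct(10, 0.5, 0.05))                   # Duo-Trio, n = 10
print(min_correct(10, 1 / 3, 0.05))                 # triangular, n = 10
```

Under this reading, the three calls give 61, 9, and 7, which agree with the corresponding entries of Table 2.1.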
If the sample size is outside the range of values given in Table 2.1, an approximation of the binomial distribution by the normal distribution can be used. The test statistic is as given in (2.2.6), which follows approximately a standard normal distribution:

$$Z = \frac{X - np_0 - 0.5}{\sqrt{np_0(1 - p_0)}}. \tag{2.2.6}$$

The critical values at α = 0.01, 0.05, and 0.1 are 2.33, 1.65, and 1.28, respectively, for the one-sided test and are 2.58, 1.96, and 1.65, respectively, for the two-sided test.
Example 2.2.1 For illustration of the procedures in this section, a numerical example is given below. In order to determine if there is a detectable difference between a current product and an improved product for preference, 100 consumer panelists were drawn from a consumer population of heavy users of the product and a significance level α = 0.05 was selected. The test is two-sided because either of the two products can be preferred. The null hypothesis is H0 : pc = 0.5 and the alternative hypothesis is H1 : pc ≠ 0.5. The observed numbers of preference for the new product and the current product are 62 and 38, respectively. Because the larger number (62) of the two numbers (62 and 38) is larger than the corresponding critical value (61) in Table 2.1, a conclusion is drawn that there is a significant difference between the two products for preference in the specific consumer population at a 0.05 significance level. The consumers have a preference for the new product.
If the normal approximation is used, from (2.2.6)

$$Z = \frac{62 - 100 \times 0.5 - 0.5}{\sqrt{100 \times 0.5 \times (1 - 0.5)}} = 2.3 > 1.96.$$

Hence the same conclusion can be drawn.
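The normal approximation in (2.2.6) is easy to check by machine. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the two-sided p-value is an addition for illustration that is not quoted in the example.

```python
from math import erfc, sqrt


def z_statistic(x: int, n: int, p0: float) -> float:
    """Continuity-corrected Z of (2.2.6) for x correct responses out of n."""
    return (x - n * p0 - 0.5) / sqrt(n * p0 * (1 - p0))


def normal_upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))


z = z_statistic(62, 100, 0.5)
p_two_sided = 2 * normal_upper_tail(z)
print(round(z, 2), round(p_two_sided, 4))
```

For the data of Example 2.2.1 this gives Z = 2.3 and a two-sided p-value of about 0.02, which is below 0.05.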



Table 2.1 Minimum number of correct responses for difference and preference tests using forced-choice methods

     2-AFC                   2-AFC and Duo–Trio      3-AFC and Triangular
     (Two-sided)             (One-sided)             (One-sided)
  N  α=0.01  α=0.05   α=0.1  α=0.01  α=0.05   α=0.1  α=0.01  α=0.05   α=0.1
 10      10       9       9      10       9       8       8       7       6
 11      11      10       9      10       9       9       8       7       7
 12      11      10      10      11      10       9       9       8       7
 13      12      11      10      12      10      10       9       8       7
 14      13      12      11      12      11      10      10       9       8
 15      13      12      12      13      12      11      10       9       8
 16      14      13      12      14      12      12      11       9       9
 17      15      13      13      14      13      12      11      10       9
 18      15      14      13      15      13      13      12      10      10
 19      16      15      14      15      14      13      12      11      10
 20      17      15      15      16      15      14      13      11      10
 21      17      16      15      17      15      14      13      12      11
 22      18      17      16      17      16      15      14      12      11
 23      19      17      16      18      16      16      14      12      12
 24      19      18      17      19      17      16      14      13      12
 25      20      18      18      19      18      17      15      13      12
 26      20      19      18      20      18      17      15      14      13
 27      21      20      19      20      19      18      16      14      13
 28      22      20      19      21      19      18      16      14      13
 29      22      21      20      22      20      19      17      15      14
 30      23      21      20      22      20      20      17      15      14
 31      24      22      21      23      21      20      17      16      15
 32      24      23      22      24      22      21      18      16      15
 33      25      23      22      24      22      21      18      16      15
 34      25      24      23      25      23      22      19      17      16
 35      26      24      23      25      23      22      19      17      16
 36      27      25      24      26      24      23      20      18      17
 37      27      25      24      27      24      23      20      18      17
 38      28      26      25      27      25      24      20      18      17
 39      28      27      26      28      26      24      21      19      18
 40      29      27      26      28      26      25      21      19      18
 41      30      28      27      29      27      26      22      20      18
 42      30      28      27      29      27      26      22      20      19
 43      31      29      28      30      28      27      23      20      19
 44      31      29      28      31      28      27      23      21      20
 45      32      30      29      31      29      28      23      21      20
 46      33      31      30      32      30      28      24      22      20
 47      33      31      30      32      30      29      24      22      21
 48      34      32      31      33      31      29      25      22      21
 49      34      32      31      34      31      30      25      23      21
 50      35      33      32      34      32      31      25      23      22
 51      36      33      32      35      32      31      26      23      22
 52      36      34      33      35      33      32      26      24      23
 53      37      35      33      36      33      32      27      24      23
 54      37      35      34      36      34      33      27      25      23
 55      38      36      35      37      35      33      27      25      24
 56      39      36      35      38      35      34      28      25      24
 57      39      37      36      38      36      34      28      26      24
 58      40      37      36      39      36      35      29      26      25
 59      40      38      37      39      37      35      29      26      25

Table 2.1 (continued)

     2-AFC                   2-AFC and Duo–Trio      3-AFC and Triangular
     (Two-sided)             (One-sided)             (One-sided)
  N  α=0.01  α=0.05   α=0.1  α=0.01  α=0.05   α=0.1  α=0.01  α=0.05   α=0.1
 60      41      39      37      40      37      36      29      27      25
 61      41      39      38      41      38      37      30      27      26
 62      42      40      38      41      38      37      30      28      26
 63      43      40      39      42      39      38      31      28      27
 64      43      41      40      42      40      38      31      28      27
 65      44      41      40      43      40      39      31      29      27
 66      44      42      41      43      41      39      32      29      28
 67      45      42      41      44      41      40      32      30      28
 68      46      43      42      45      42      40      33      30      28
 69      46      44      42      45      42      41      33      30      29
 70      47      44      43      46      43      41      33      31      29
 71      47      45      43      46      43      42      34      31      30
 72      48      45      44      47      44      42      34      31      30
 73      48      46      45      47      45      43      35      32      30
 74      49      46      45      48      45      44      35      32      31
 75      50      47      46      49      46      44      35      33      31
 76      50      48      46      49      46      45      36      33      31
 77      51      48      47      50      47      45      36      33      32
 78      51      49      47      50      47      46      37      34      32
 79      52      49      48      51      48      46      37      34      32
 80      52      50      48      51      48      47      37      34      33
 81      53      50      49      52      49      47      38      35      33
 82      54      51      49      52      49      48      38      35      34
 83      54      51      50      53      50      48      39      36      34
 84      55      52      51      54      51      49      39      36      34
 85      55      53      51      54      51      49      39      36      35
 86      56      53      52      55      52      50      40      37      35
 87      56      54      52      55      52      50      40      37      35
 88      57      54      53      56      53      51      41      37      36
 89      58      55      53      56      53      52      41      38      36
 90      58      55      54      57      54      52      41      38      36
 91      59      56      54      58      54      53      42      38      37
 92      59      56      55      58      55      53      42      39      37
 93      60      57      55      59      55      54      42      39      38
 94      60      57      56      59      56      54      43      40      38
 95      61      58      57      60      57      55      43      40      38
 96      62      59      57      60      57      55      44      40      39
 97      62      59      58      61      58      56      44      41      39
 98      63      60      58      61      58      56      44      41      39
 99      63      60      59      62      59      57      45      41      40
100      64      61      59      63      59      57      45      42      40
101      64      61      60      63      60      58      46      42      40
102      65      62      60      64      60      58      46      43      41
103      66      62      61      64      61      59      46      43      41
104      66      63      61      65      61      60      47      43      41
105      67      64      62      65      62      60      47      44      42
106      67      64      62      66      62      61      47      44      42
107      68      65      63      66      63      61      48      44      43
108      68      65      64      67      64      62      48      45      43
109      69      66      64      68      64      62      49      45      43
110      69      66      65      68      65      63      49      45      44



2.2.3 Parameter estimate
2.2.3.1 Estimate of proportion of discriminators or probability of discrimination Once we have concluded that the two products for comparison are significantly different, we can estimate the proportion of discriminators for the products in a specific consumer population or the probability of discrimination for the products in a trained panel. We can get the estimate of pd from

$$\hat{p}_d = \frac{\hat{p}_c - p_0}{1 - p_0}, \tag{2.2.7}$$

where $\hat{p}_c$ is the observed proportion of correct responses or preference, $\hat{p}_c = x/N$. An approximate 95% confidence interval for pd is given by

$$\hat{p}_d \pm 1.96\sqrt{V(\hat{p}_d)}, \tag{2.2.8}$$

where $V(\hat{p}_d)$ is the estimate of the variance of $\hat{p}_d$. According to the Taylor series, $\hat{p}_d = f(\hat{p}_c) \approx f(\hat{p}_{c0}) + f'(\hat{p}_{c0})(\hat{p}_c - \hat{p}_{c0})$, where $\hat{p}_{c0}$ denotes an observation of $\hat{p}_c$ and $f'(\hat{p}_{c0})$ denotes the first derivative with respect to $\hat{p}_c$ evaluated at $\hat{p}_{c0}$. Hence $\mathrm{Var}(\hat{p}_d) = [f'(\hat{p}_{c0})]^2\,\mathrm{Var}(\hat{p}_c)$, i.e.,

$$V(\hat{p}_d) = \frac{1}{(1 - p_0)^2}\,\frac{\hat{p}_c(1 - \hat{p}_c)}{N}. \tag{2.2.9}$$



Example 2.2.2 For Example 2.2.1, $\hat{p}_d = \frac{0.62 - 0.5}{1 - 0.5} = 0.24$, $\sqrt{V(\hat{p}_d)} = \frac{1}{1 - 0.5}\sqrt{0.62 \times (1 - 0.62)/100} = 0.097$, and $\hat{p}_d \pm 1.96\sqrt{V(\hat{p}_d)} = 0.24 \pm 1.96 \times 0.097 = (0.05, 0.43)$. This means that the estimated proportion of discriminators for the two products is 0.24 and the 95% confidence interval for the proportion is (0.05, 0.43).
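The estimate in (2.2.7), its variance in (2.2.9), and the interval in (2.2.8) can be computed together. The function below is a minimal sketch with a hypothetical name; the default multiplier 1.96 corresponds to the approximate 95% interval used above.

```python
from math import sqrt


def discriminator_estimate(x: int, n: int, p0: float, z: float = 1.96):
    """Return (pd_hat, variance, (lower, upper)) from x correct responses out of n."""
    pc_hat = x / n                                        # observed proportion correct
    pd_hat = (pc_hat - p0) / (1 - p0)                     # (2.2.7)
    var_pd = pc_hat * (1 - pc_hat) / (n * (1 - p0) ** 2)  # (2.2.9)
    half_width = z * sqrt(var_pd)                         # (2.2.8)
    return pd_hat, var_pd, (pd_hat - half_width, pd_hat + half_width)


pd_hat, var_pd, (lower, upper) = discriminator_estimate(62, 100, 0.5)
print(round(pd_hat, 2), round(sqrt(var_pd), 3), round(lower, 2), round(upper, 2))
```

For the data of Example 2.2.1 the rounded results are 0.24, 0.097, 0.05, and 0.43, in agreement with Example 2.2.2.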


We should interpret and use the estimate of pd with caution. The only difference between
pc and pd is that the guessing effect is included in pc and excluded in pd . The quantity pd
is the proportion of correct responses above chance. However, pd is still dependent on the
method used. It is not a pure index of difference or discrimination. We will discuss this
problem further in Chapter 9.
2.2.3.2 Estimate of proportions of preference It is often required to estimate the proportions of preference, pa and pb, from a preference test. However, it is clearly impossible to do this with equation (2.2.2) or (2.2.3) for a conventional preference test: there are two independent parameters, but only one independent equation. In order to estimate pa and pb, a replicated test is needed. See Section 4.3 or 4.4 of Chapter 4 for estimates of pa and pb using the data from double preference testing without a “No preference” option or from the two-visit method with a “No preference” option.

2.3 Discrimination tests using the methods with response bias

In the methods with response bias, there is no guessing probability such as p0 in a forced-choice method. This is the main distinction between the two types of methods. The data for discrimination tests using the A–Not A or the same–different method can be set out in a fourfold table. However, there are different probability structures for the 2 × 2 tables corresponding to different designs for the tests. The different designs for the A–Not A method will be illustrated below. The situation is similar for the same–different method when sample A is defined as a pair of matched samples and Not A as a pair of unmatched samples.
2.3.1 Hypothesis test for the data from a monadic design

In the monadic design, each panelist receives only one sample, either A or Not A. The total numbers of the panelists who receive sample A and sample Not A are fixed in advance of an experiment. The 2 × 2 table for the data is given as Table 2.2.
Table 2.2 The 2 × 2 table for data from a monadic A–Not A test

                 Sample
Response     A        Not A     Total
“A”          n11      n12       n1.
“Not A”      n21      n22       N − n1.
Total        N1       N2        N

In the monadic design, the purpose is to test if the proportion of “A” responses of the
panelists who receive sample A is the same as the proportion of “A” responses of the
panelists who receive sample Not A. This is a statistical comparison of two independent
proportions for two populations with sample sizes N1 and N2 , respectively.
The null hypothesis is
H0 : pA = pN = $\bar{p}$,
i.e., the proportion of “A” responses for sample A is equal to the proportion of “A” responses
for sample Not A. The alternative hypothesis is
H1 : pA > pN ,
i.e., the proportion of “A” responses for sample A is larger than the proportion of “A”
responses for sample Not A. This means that the two products are significantly different.
There are several test statistics that can be used for comparison of two independent
proportions.
2.3.1.1 Chi-square test for homogeneity Pearson’s chi-square statistic is

$$\chi_P^2 = \sum_{j=1}^{2}\sum_{i=1}^{2} \frac{(n_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}}, \tag{2.3.1}$$

where $n_{ij}$ and $\hat{E}_{ij}$ denote the observed frequencies and the estimates of the expected frequencies in the cells of a 2 × 2 contingency table. This is a test of homogeneity when the sample sizes for the A and Not A samples are specified in advance. Under the null hypothesis H0 : pA = pN = $\bar{p}$, the best estimate of the probability of response “A” is $\hat{\bar{p}} = (n_{11} + n_{12})/N$, while the best estimate of the probability of response “Not A” is $1 - \hat{\bar{p}} = (n_{21} + n_{22})/N$. Hence the best estimates of the frequencies for the responses in the four cells are

$$\hat{E}_{11} = N_1(n_{11} + n_{12})/N, \qquad \hat{E}_{12} = N_2(n_{11} + n_{12})/N,$$
$$\hat{E}_{21} = N_1(n_{21} + n_{22})/N, \qquad \hat{E}_{22} = N_2(n_{21} + n_{22})/N,$$

where N1 is the total number of responses for the A sample and N2 is the total number of responses for the Not A sample. Pearson’s chi-square statistic, $\chi_P^2$, follows asymptotically a $\chi^2$ distribution with 1 degree of freedom. For the one-sided test at significance levels α = 0.01, 0.05, and 0.1, the corresponding critical values for a $\chi^2$ distribution with 1 degree of freedom are $\chi^2_{0.98} = 5.4$, $\chi^2_{0.9} = 2.7$, and $\chi^2_{0.8} = 1.64$, respectively.

Yates’ continuity correction is often used for the data in a 2 × 2 contingency table. In this case (2.3.1) becomes

$$\chi_P^2 = \sum_{j=1}^{2}\sum_{i=1}^{2} \frac{(|n_{ij} - \hat{E}_{ij}| - 0.5)^2}{\hat{E}_{ij}}. \tag{2.3.2}$$

Example 2.3.1 For the data in Table 2.3, $\hat{E}_{11} = \hat{E}_{12} = 100 \times (62 + 44)/200 = 53$ and $\hat{E}_{21} = \hat{E}_{22} = 100 \times (38 + 56)/200 = 47$; hence, according to (2.3.1),

$$\chi_P^2 = \frac{(62 - 53)^2}{53} + \frac{(44 - 53)^2}{53} + \frac{(38 - 47)^2}{47} + \frac{(56 - 47)^2}{47} = 6.5 > 2.7.$$

The p-value for the one-sided test is half of the probability that a chi-square variable with 1 degree of freedom falls between 6.5 and infinity. It is 0.01079/2 = 0.0054. The conclusion is that there is a significant difference between the two products at a 0.05 significance level.
Using equation (2.3.2) with continuity correction, $\chi^2 = 5.8$. The corresponding p-value is 0.008.
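The calculation in Example 2.3.1 can be reproduced with the standard library alone. The sketch below is illustrative (function names are hypothetical); it uses the identity that, for 1 degree of freedom, the upper-tail probability of a chi-square variable at x equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt


def pearson_chi2_2x2(table, yates: bool = False) -> float:
    """Pearson chi-square of (2.3.1), or (2.3.2) if yates=True.

    table[i][j] is the count for response i ("A", "Not A") and sample j (A, Not A).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(table[i][j] - expected)
            if yates:
                deviation -= 0.5  # continuity correction as written in (2.3.2)
            stat += deviation**2 / expected
    return stat


def chi2_upper_tail_1df(x: float) -> float:
    """P(chi-square with 1 df > x)."""
    return erfc(sqrt(x / 2))


data = [[62, 44], [38, 56]]  # Table 2.3
for yates in (False, True):
    stat = pearson_chi2_2x2(data, yates=yates)
    print(round(stat, 2), round(chi2_upper_tail_1df(stat) / 2, 4))
```

The one-sided p-value is obtained by halving the chi-square tail probability, as in the example.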
Table 2.3 Data for Example 2.3.1

                 Sample
Response     A        Not A     Total
“A”          62       44        106
“Not A”      38       56        94
Total        100      100       200

2.3.1.2 Z-test for difference of two proportions The second test statistic for testing whether or not the two proportions, pA and pN, in the two populations from which we have samples are equal is the Z statistic in (2.3.3), or in (2.3.4) with continuity correction:

$$Z = \frac{\hat{p}_A - \hat{p}_N}{\sqrt{\hat{\bar{p}}(1 - \hat{\bar{p}})(1/N_A + 1/N_N)}} \tag{2.3.3}$$

$$Z = \frac{\hat{p}_A - \hat{p}_N - 0.5 \times (1/N_A + 1/N_N)}{\sqrt{\hat{\bar{p}}(1 - \hat{\bar{p}})(1/N_A + 1/N_N)}}. \tag{2.3.4}$$

It can be proved that $Z^2$ in (2.3.3) or (2.3.4) is equal to the quantity in (2.3.1) or (2.3.2), provided that the estimate of the population parameter, $\bar{p}$, under the null hypothesis is the weighted mean of $\hat{p}_A$ and $\hat{p}_N$, i.e., $\hat{\bar{p}} = \hat{p}_A \frac{N_A}{N} + \hat{p}_N \frac{N_N}{N} = \frac{n_{11} + n_{12}}{N}$.
Example 2.3.2 For the data in Table 2.3, $\hat{\bar{p}} = \frac{62 + 44}{200} = 0.53$. According to (2.3.3),

$$Z = \frac{0.62 - 0.44}{\sqrt{0.53 \times (1 - 0.53) \times (0.01 + 0.01)}} = 2.55 > z_{0.95} = 1.64.$$

The p-value is 0.0054. The same conclusion as for Example 2.3.1 can be drawn at a significance level α = 0.05. Using (2.3.4) with continuity correction, Z = 2.41 and the p-value is 0.008.
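A short sketch of (2.3.3) and (2.3.4) follows; the function name is hypothetical. It also checks numerically that the square of the uncorrected Z equals the Pearson chi-square value of about 6.5 obtained for the same data.

```python
from math import erfc, sqrt


def two_proportion_z(x_a: int, n_a: int, x_n: int, n_n: int,
                     continuity: bool = False) -> float:
    """Z statistic for H0: pA = pN, with the pooled estimate of the common proportion.

    x_a, x_n: numbers of "A" responses for samples A and Not A;
    n_a, n_n: numbers of panelists receiving samples A and Not A.
    """
    p_bar = (x_a + x_n) / (n_a + n_n)          # weighted mean of the two proportions
    diff = x_a / n_a - x_n / n_n
    if continuity:
        diff -= 0.5 * (1 / n_a + 1 / n_n)      # correction term of (2.3.4)
    return diff / sqrt(p_bar * (1 - p_bar) * (1 / n_a + 1 / n_n))


z = two_proportion_z(62, 100, 44, 100)
z_corrected = two_proportion_z(62, 100, 44, 100, continuity=True)
print(round(z, 2), round(z**2, 2), round(0.5 * erfc(z / sqrt(2)), 4))
print(round(z_corrected, 2), round(0.5 * erfc(z_corrected / sqrt(2)), 3))
```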
2.3.1.3 Fisher’s exact test The third method for comparison of two independent proportions is Fisher’s exact test, which is also referred to as the Fisher–Irwin test. It is noted that for the data from a monadic design, i.e., for the data in a 2 × 2 table with fixed column totals, both the chi-square distribution with 1 degree of freedom and the standard normal distribution of the Z statistic are approximations. When the sample size is not large enough, the approximation is not satisfactory. Fisher (1934) and Irwin (1935) developed a test based on the exact hypergeometric distribution.

For given row and column marginal totals, the value in any one cell in a 2 × 2 table determines the other three cell counts. The hypergeometric distribution expresses the probability for the four cell counts in terms of the count in one cell alone, e.g., the cell (1, 2), which is response “A” for the Not A sample. Under the null hypothesis, H0 : pA = pN, the probability of a particular value x for the count in that cell equals

$$P(x) = \frac{\binom{N_A}{n_{1.} - x}\binom{N_N}{x}}{\binom{N}{n_{1.}}}. \tag{2.3.5}$$

The binomial coefficients are $\binom{a}{b} = \frac{a!}{b!(a - b)!}$, e.g.,

$$\binom{5}{2} = \frac{5!}{2! \times (5 - 2)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{2 \times 1 \times 3 \times 2 \times 1} = 10.$$

To test H0 : pA = pN against H1 : pA > pN, the p-value is the sum of the hypergeometric probabilities for outcomes that have the same marginal totals and are at least as favorable to H1 as the observed table, i.e.,

$$p = \sum_{x = x_0}^{n_{12}} P(x), \tag{2.3.6}$$

where x0 is the minimum possible value in the cell (1, 2): x0 = 0 if n1. − NA < 0; otherwise, x0 = n1. − NA. If the p-value calculated from (2.3.6) is smaller than the specified significance level, the null hypothesis can be rejected.




Example 2.3.3 For the data in Table 2.3, n1. − NA = 106 − 100 = 6 > 0; hence x0 = 6. According to (2.3.6), the p-value is

$$p = \sum_{x=6}^{44} P(x) = \frac{\binom{100}{106 - 6}\binom{100}{6}}{\binom{200}{106}} + \cdots + \frac{\binom{100}{106 - 44}\binom{100}{44}}{\binom{200}{106}} = 0.008.$$

The p-value obtained from Fisher’s exact test is the same as the results from the chi-square test and the Z-test with the continuity correction because the sample size in the example is large enough.
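The sum in (2.3.6) can be evaluated exactly with integer binomial coefficients. The function below is a minimal sketch with a hypothetical name and argument order; it follows the cell labeling of Table 2.2 and sums over the count in cell (1, 2).

```python
from math import comb


def fisher_one_sided(n11: int, n12: int, n21: int, n22: int) -> float:
    """One-sided p-value of (2.3.6) for H1: pA > pN.

    n11, n12: "A" responses for samples A and Not A;
    n21, n22: "Not A" responses for samples A and Not A.
    """
    n_a = n11 + n21            # panelists receiving sample A
    n_n = n12 + n22            # panelists receiving sample Not A
    n1 = n11 + n12             # total number of "A" responses
    x0 = max(0, n1 - n_a)      # smallest possible count in cell (1, 2)
    numerator = sum(comb(n_a, n1 - x) * comb(n_n, x) for x in range(x0, n12 + 1))
    return numerator / comb(n_a + n_n, n1)


print(round(fisher_one_sided(62, 44, 38, 56), 3))
```

Python integers have arbitrary precision, so the large coefficients such as the one with N = 200 and n1. = 106 are handled without overflow.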
2.3.2 Hypothesis test for the data from a mixed design

The main difference between the mixed design and the monadic design is that in the latter, the numbers of panelists who receive sample A and sample Not A are fixed in advance. In the mixed design, only the total number of panelists is fixed in advance. Each panelist randomly draws a sample (either A or Not A) from a sample pool. The number of samples in the sample pool should be much larger than the number of panelists. We do not know in advance how many panelists will receive sample A and sample Not A. In this design, both sample and response are random variables. The 2 × 2 table for the data is given as Table 2.4.
Table 2.4 The 2 × 2 table for data from a mixed A–Not A test

                 Sample
Response     A              Not A          Total
“A”          n11            n12            n11 + n12
“Not A”      n21            n22            n21 + n22
Total        n11 + n21      n12 + n22      n

In the mixed design, the purpose is to test if the “A” or “Not A” response is associated
with the presentation of sample A or Not A. The statistical test is of independence for
two variables, X and Y , for one population with sample size N . Each variable has two
categories (0, 1). X = 1 means “A” response and X = 0 means “Not A” response. Y = 1
means sample A and Y = 0 means sample Not A. Each of the N panelists falls into one of
the four categories: (1, 1), (1, 0), (0, 1), and (0, 0), i.e., (“A”, A), (“A”, Not A), (“Not A”,
A), and (“Not A”, Not A).
The null hypothesis is that the two variables, i.e., the responses and the samples, are independent of each other. It means that $H_0: p_{ij} = p_{i.}p_{.j}$, i.e., each cell probability equals the product of its respective row and column probabilities. The alternative hypothesis is $H_1: p_{ij} \ne p_{i.}p_{.j}$, i.e., there is some relationship between the samples and the responses. The larger the differences $\{\hat{p}_{ij} - \hat{p}_{i.}\hat{p}_{.j}\}$ or $\{n_{ij} - \hat{E}_{ij}\}$, the stronger the evidence against H0. If the null hypothesis is rejected and the alternative hypothesis is accepted, it suggests that the responses are not independent of the presentation of the samples. Hence we can conclude that sample A and Not A are significantly different.
The test statistic is numerically the same as (2.3.1) and (2.3.2). However, the statistical interpretation and the derivation of the test statistic for the independence test in a mixed design and for the test of homogeneity in a monadic design are quite different. In addition, in the test for homogeneity, the one-sided test is always used because pA < pN is not reasonable, whereas in the test for independence, the two-sided test is always used. The one-sided test for homogeneity means that the critical value $\chi^2_{1-2\alpha}$ should be selected for an α significance level. The two-sided test for independence means that the critical value $\chi^2_{1-\alpha}$ should be selected for an α significance level.
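The two kinds of critical value can be compared numerically. The sketch below is illustrative and relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its q-quantile is the square of the normal (1 + q)/2 quantile; the function name is hypothetical.

```python
from statistics import NormalDist


def chi2_quantile_1df(q: float) -> float:
    """q-quantile of the chi-square distribution with 1 degree of freedom."""
    return NormalDist().inv_cdf((1 + q) / 2) ** 2


for alpha in (0.01, 0.05, 0.1):
    one_sided = chi2_quantile_1df(1 - 2 * alpha)  # test for homogeneity
    two_sided = chi2_quantile_1df(1 - alpha)      # test for independence
    print(alpha, round(one_sided, 2), round(two_sided, 2))
```

At α = 0.05, for instance, the one-sided critical value is about 2.71 and the two-sided critical value is about 3.84.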
Example 2.3.4 Two hundred panelists participated in an A–Not A test. A mixed design was used. The results are displayed in Table 2.5. The chi-square test for independence, using the same statistic as (2.3.1), shows that

$$\chi^2 = \frac{(46 - 88 \times 81/200)^2}{88 \times 81/200} + \frac{(42 - 88 \times 119/200)^2}{88 \times 119/200} + \frac{(35 - 81 \times 112/200)^2}{81 \times 112/200} + \frac{(77 - 119 \times 112/200)^2}{119 \times 112/200} = 3.01 + 2.05 + 2.37 + 1.61 = 9.04.$$

The associated p-value is 0.003. Thus we can conclude that at any reasonable significance level the responses of the panelists and the samples are dependent. In other words, the two products are significantly different.
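The independence test can also be written directly in terms of the estimated cell, row, and column proportions. The sketch below is illustrative (the function name is hypothetical); it sums N times the squared difference between each cell proportion and the product of its row and column proportions, divided by that product, which is algebraically the same quantity as (2.3.1).

```python
from math import erfc, sqrt


def independence_chi2(table):
    """Return (chi-square, two-sided p-value) for a 2 x 2 table of counts.

    table[i][j] is the count for response i ("A", "Not A") and sample j (A, Not A).
    """
    n = sum(sum(row) for row in table)
    p = [[count / n for count in row] for row in table]  # estimated cell proportions
    p_row = [sum(row) for row in p]                      # response margins
    p_col = [sum(col) for col in zip(*p)]                # sample margins
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = p_row[i] * p_col[j]               # cell proportion under H0
            stat += n * (p[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(stat / 2))                       # upper tail, 1 degree of freedom
    return stat, p_value


stat, p_value = independence_chi2([[46, 35], [42, 77]])  # data of Example 2.3.4
print(round(stat, 2), round(p_value, 3))
```

The p-value is not halved here because the test for independence is two-sided.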
Table 2.5 Data for Example 2.3.4

                 Sample
Response     A        Not A     Total
“A”          46       35        81
“Not A”      42       77        119
Total        88       112       200
2.3.3 Hypothesis test for the data from a paired design

In a paired design, each of the N panelists evaluates both sample A and sample Not
A, but the panelist should not be told that the samples evaluated are one sample A and one
Not A. The data can be summarized as in Table 2.6. The purpose of the test is to compare
the two proportions: the proportion of response “A” for sample A and the proportion of
response “A” for sample Not A. Because each panelist evaluates both sample A and sample
Not A, the two proportions are not independent.
Table 2.6 The 2 × 2 table for data from a paired A–Not A test

                 Sample Not A
Sample A     “A”      “Not A”     Total
“A”          a        b
“Not A”      c        d
Total                             N
