

New Handbook of Mathematical Psychology
The field of mathematical psychology began in the 1950s and includes
both psychological theorizing in which mathematics plays a key role, and
applied mathematics motivated by substantive problems in psychology.
Central to its success was the publication of the first Handbook of Mathematical Psychology in the 1960s. The psychological sciences have since
expanded to include new areas of research, and significant advances have
been made in both traditional psychological domains and in the applications of the computational sciences to psychology. Upholding the rigor of
the original Handbook, the New Handbook of Mathematical Psychology
reflects the current state of the field by exploring the mathematical and
computational foundations of new developments over the last half century. The first volume focuses on select mathematical ideas, theories, and
modeling approaches to form a foundational treatment of mathematical
psychology.
William H. Batchelder is Professor of Cognitive Sciences at the University of California, Irvine.

Hans Colonius is Professor of Psychology at Oldenburg University, Germany.

Ehtibar N. Dzhafarov is Professor of Psychological Sciences at Purdue University.

Jay Myung is Professor of Psychology at Ohio State University.



New Handbook of
Mathematical Psychology
Volume 1. Foundations and Methodology

Edited by

William H. Batchelder
Hans Colonius


Ehtibar N. Dzhafarov
Jay Myung


University Printing House, Cambridge CB2 8BS, United Kingdom
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781107029088
© Cambridge University Press 2017
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2017
Printed in the United Kingdom by TJ International Ltd, Padstow, Cornwall
A catalogue record for this publication is available from the British Library
ISBN 978-1-107-02908-8 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.


Contents

List of contributors  page vii
Preface  ix
William H. Batchelder, Hans Colonius, Ehtibar N. Dzhafarov, and Jay Myung
1 Selected concepts from probability  1
Hans Colonius
2 Probability, random variables, and selectivity  85
Ehtibar N. Dzhafarov and Janne Kujala
3 Functional equations  151
Che Tat Ng
4 Network analysis  194
John P. Boyd and William H. Batchelder
5 Knowledge spaces and learning spaces  274
Jean-Paul Doignon and Jean-Claude Falmagne
6 Evolutionary game theory  322
J. McKenzie Alexander
7 Choice, preference, and utility: probabilistic and deterministic representations  374
A. A. J. Marley and Michel Regenwetter
8 Discrete state models of cognition  454
William H. Batchelder
9 Bayesian hierarchical models of cognition  504
Jeffrey N. Rouder, Richard D. Morey, and Michael S. Pratte
10 Model evaluation and selection  552
Jay Myung, Daniel R. Cavagnaro, and Mark A. Pitt
Index  599



Contributors

J. McKenzie Alexander, London School of Economics (UK)
William H. Batchelder, University of California at Irvine (USA)
John P. Boyd, Institute for Mathematical Behavioral Sciences, University of California at Irvine (USA)
Daniel R. Cavagnaro, Mihaylo College of Business and Economics, California State University at Fullerton (USA)
Hans Colonius, Oldenburg University (Germany)
Jean-Paul Doignon, Département de Mathématique, Université Libre de Bruxelles (Belgium)
Ehtibar N. Dzhafarov, Purdue University (USA)
Jean-Claude Falmagne, Department of Cognitive Sciences, University of California at Irvine (USA)
Janne V. Kujala, University of Jyväskylä (Finland)
Anthony A. J. Marley, Department of Psychology, University of Victoria (Canada)
Richard D. Morey, University of Groningen (The Netherlands)
Jay Myung, Ohio State University (USA)
Che Tat Ng, Department of Pure Mathematics, University of Waterloo (Canada)
Mark A. Pitt, Department of Psychology, Ohio State University (USA)
Michael S. Pratte, Department of Psychology, Vanderbilt University (USA)
Michel Regenwetter, Department of Psychology, University of Illinois at Urbana-Champaign (USA)
Jeffrey N. Rouder, Department of Psychological Sciences, University of Missouri (USA)



Preface

About mathematical psychology
There are three fuzzy and interrelated understandings of what mathematical psychology is: part of mathematics, part of psychology, and analytic methodology. We call them “fuzzy” because we do not offer a rigorous way of defining
them. As a rule, a work in mathematical psychology, including the chapters of this
handbook, can always be argued to conform to more than one if not all three of
these understandings (hence our calling them “interrelated”). Therefore, it seems
safer to think of them as three constituents of mathematical psychology that may
be differently expressed in any given line of work.

1. Part of mathematics
Mathematical psychology can be understood as a collection of mathematical developments inspired and motivated by problems in psychology (or at least those traditionally related to psychology). A good example of this is the algebraic theory of semiorders proposed by R. Duncan Luce (1956). In algebra and unidimensional topology there are many structures that can be called orders. The simplest one is the total, or linear, order (S, ⪯), characterized by the following properties: for any a, b, c ∈ S,

(O1) a ⪯ b or b ⪯ a;
(O2) if a ⪯ b and b ⪯ c then a ⪯ c;
(O3) if a ⪯ b and b ⪯ a then a = b.

The ordering relation ⪯ here has the intuitive meaning of "not greater than." One can, of course, think of many other kinds of order. For instance, if we replace the property (O1) with

(O4) a ⪯ a,

we obtain a weaker (less restrictive) structure, called a partial order. If we add to the properties (O1–O3) the requirement that every nonempty subset X of S possesses an element aX such that aX ⪯ a for any a ∈ X, then we obtain a stronger (more restrictive) structure called a well-order. Clearly, one needs motivation for
introducing and studying various types of order, and for one of them, semiorders,
it comes from psychology.1
Luce (1956) introduces the issue by the following example:
Find a subject who prefers a cup of coffee with one cube of sugar to one with five cubes (this should not be difficult). Now prepare 401 cups of coffee with (1 + i/100)x grams of sugar, i = 0, 1, . . . , 400, where x is the weight of one cube of sugar. It is evident that he will be indifferent between cup i and cup i + 1, for any i, but by choice he is not indifferent between i = 0 and i = 400. (p. 179)

This example involves several idealizations, e.g., Luce ignores here the probabilistic nature of a person’s judgments of sweetness/bitterness, treating the issue as if
a given pair of cups of coffee was always judged in one and the same way. However, this idealized example leads to the idea that there may be an interesting order
such that a ≺ b only if a and b are “sufficiently far apart”; otherwise a and b are
“too similar” to be ordered (a ∼ b). Luce formalizes this idea by the following four
properties of the structure (S, ≺, ∼): for any a, b, c, d ∈ S,
(SO1) exactly one of three possibilities obtains: either a ≺ b, or b ≺ a, or else a ∼ b;
(SO2) a ∼ a;
(SO3) if a ≺ b, b ∼ c, and c ≺ d, then a ≺ d;
(SO4) if a ≺ b, b ≺ c, and b ∼ d, then either a ∼ d or c ∼ d does not hold.

There are more compact ways of characterizing semiorders, but Luce’s seems most
intuitive.
Are there familiar mathematical entities that satisfy the requirements (SO1–

SO4)? Consider a set of reals and suppose that A is a set of half-open intervals
[x, y) with the following property: if [x1 , y1 ) and [x2 , y2 ) belong to A, then x1 ≤ x2
holds if and only if y1 ≤ y2 . Let’s call the intervals in A monotonically ordered.
Define [x, y) ≺ [v, w) to mean y ≤ v. Define [x, y) ∼ [v, w) to mean that the two
intervals overlap. It is easy to check then that the system of monotonically ordered
intervals with the ≺ and ∼ relations just defined forms a semiorder.
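This check can be mechanized. Below is a minimal sketch (ours, in Python; the interval family is made up for illustration) that encodes the relations ≺ and ∼ just defined and verifies the semiorder properties (SO1)–(SO4) by brute force:

```python
from itertools import product

# A monotonically ordered family of half-open intervals [x, y):
# left endpoints and right endpoints increase together.
intervals = [(0.0, 1.0), (0.5, 1.4), (1.1, 2.0), (1.8, 2.5), (2.6, 3.0)]

def precedes(a, b):   # a ≺ b: interval a ends no later than b starts
    return a[1] <= b[0]

def similar(a, b):    # a ∼ b: the intervals overlap
    return not precedes(a, b) and not precedes(b, a)

# (SO1): exactly one of a ≺ b, b ≺ a, a ∼ b holds; (SO2): a ∼ a.
for a, b in product(intervals, repeat=2):
    assert [precedes(a, b), precedes(b, a), similar(a, b)].count(True) == 1
for a in intervals:
    assert similar(a, a)

# (SO3): a ≺ b, b ∼ c, c ≺ d imply a ≺ d.
# (SO4): a ≺ b, b ≺ c, b ∼ d imply that a ∼ d and c ∼ d do not both hold.
for a, b, c, d in product(intervals, repeat=4):
    if precedes(a, b) and similar(b, c) and precedes(c, d):
        assert precedes(a, d)
    if precedes(a, b) and precedes(b, c) and similar(b, d):
        assert not (similar(a, d) and similar(c, d))

print("(SO1)-(SO4) hold for this family of monotonically ordered intervals.")
```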
As it turns out, under certain constraints imposed on S, the converse of this statement is also true. To simplify the mathematics, let us assume that S can be mapped one-to-one onto an interval (finite or infinite) of real numbers. Thus, in Luce's example with cups of coffee we can assume that each cup is uniquely characterized by the weight of sugar in it. Then all possible cups of coffee form a set S that is bijectively mapped onto an interval of reals between 1 and 5 cubes of sugar (ignoring discreteness due to the molecular structure of sugar). Under this assumption, it follows from a theorem proved by Peter Fishburn (1973) that the semiorder
(S, ≺, ∼) has a monotonically ordered representation. The latter means that there
is a monotonically ordered set A of real intervals and a function f : S → A such
1 The real history, as often happens, is more complicated, and psychology was the main but not the
only source of motivation here (see Fishburn and Monjardet, 1992).


Preface

that, for all a, b ∈ S,

a ≺ b if and only if f(a) = [x1, y1), f(b) = [x2, y2), and y1 ≤ x2;   (0.1)

a ∼ b if and only if f(a) = [x1, y1), f(b) = [x2, y2), and [x1, y1) ∩ [x2, y2) ≠ ∅.   (0.2)
Although as a source of inspiration for abstract mathematics psychology cannot compete with physics, it has motivated several lines of mathematical development. Thus, a highly sophisticated study of m-point-homogeneous and n-point-unique monotonic homeomorphisms (mappings that are continuous together with
their inverses) of conventionally ordered real numbers launched by Louis Narens
(1985) and Theodore M. Alper (1987) was motivated by a well-known classification of measurements by Stanley Smith Stevens (1946). In turn, this classification
was inspired by the variety of measurement procedures used in psychology, some
of them clearly different from those used in physics. Psychology has inspired and

continues to inspire abstract foundational studies in the representational theory of
measurement (essentially an area of abstract algebra with elements of topology),
probability theory, geometries based on nonmetric dissimilarity measures, topological and pre-topological structures, etc. Finally and prominently, the modern
developments in the area of functional equations, beginning with the highly influential work of János Aczél (1966), have been heavily influenced by problems in or
closely related to psychology.

2. Part of psychology
According to this understanding, mathematical psychology is simply psychological
theorizing and model-building in which mathematics plays a central role (but does
not necessarily evolve into new mathematical developments). A classical example
of work that falls within this category is Gustav Theodor Fechner's derivation of his celebrated logarithmic law in the Elemente der Psychophysik (1860, Ch. 17).2
From this book (and this law) many date the beginnings of scientific psychology.
The problem Fechner faced was how to relate “magnitude of physical stimulus” to
“magnitude of psychological sensation,” and he came up with a principle: equal
ratios of stimulus magnitudes correspond to equal differences of sensation magnitudes. This means that for any stimulus values x1, x2 (real numbers at or above some positive threshold value x0) we have

ψ(x2) − ψ(x1) = F(x2/x1),   (0.3)

where ψ is the hypothetical psychophysical function (mapping stimulus magnitudes into positive reals representing sensation magnitudes), and F is some
unknown function.
2 The account that follows is not a reconstruction but Fechner’s factual derivation (pp. 34–36 of vol. 2
of the Elemente). It has been largely overlooked in favor of the less general and less clearly presented

derivation of the logarithmic law in Chapter 16 (see Dzhafarov and Colonius, 2012).


Once the equation was written, Fechner investigated it as a purely mathematical object. First, he observed its consequence: for any three suprathreshold stimuli x1, x2, x3,

F(x3/x1) = F(x3/x2) + F(x2/x1).   (0.4)


Second, he observed that u = x2/x1 and v = x3/x2 can be any positive reals, and x3/x1 is the product of the two. We have therefore, for any u > 0 and v > 0,

F(uv) = F(u) + F(v).   (0.5)

This is an example of a simple functional equation: the function is unknown, but it
is constrained by an identity that holds over a certain domain (positive reals).
Functional equations were introduced in pure mathematics only 40 years before
Fechner’s publication, by Augustin-Louis Cauchy, in his famous Cours d’analyse
(1821). Cauchy showed there that the only continuous solution for Equation (0.5)
is the logarithmic function
F(x) ≡ k log x,  x > 0,   (0.6)

where k is a constant. The functional equations of this kind were later called the
Cauchy functional equations. We know now that one need not even assume that F
is continuous. Thus, it is clear from (0.3) that F must be positive on at least some
interval of values for x2/x1 : if x2 is much larger than x1 , it is empirically plausible
to assume that ψ (x2 ) > ψ (x1 ). This alone is sufficient to derive (0.6) as the only
possible solution for (0.5), and to conclude that k is a positive constant.
The rest of the work for Fechner was also purely mathematical, but more elementary. Putting in (0.3) x2 = x (an arbitrary value) and x1 = x0 (the threshold value), and setting ψ(x0) = 0, one obtains

ψ(x) − ψ(x0) = ψ(x) = k log(x/x0),   (0.7)

which is the logarithmic law of psychophysics. Fechner thus used sophisticated (by
standards of his time) mathematical work by Cauchy to derive the first justifiable
quantitative relation in the history of psychology. The value of Fechner’s reasoning
is entirely in psychology, bringing nothing new to mathematics, but the reasoning
itself is entirely mathematical.
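The derivation is easy to confirm numerically. The following sketch (ours; the constants k and x0 are arbitrary illustrative values) checks that F(x) = k log x satisfies the Cauchy equation (0.5) and that the resulting ψ of (0.7) reproduces the ratio principle (0.3):

```python
import math
import random

k, x0 = 2.5, 10.0  # illustrative constants: scale factor k, threshold x0

def F(x):
    # the continuous solution (0.6) of the Cauchy equation
    return k * math.log(x)

def psi(x):
    # the psychophysical function (0.7): psi(x) = k log(x / x0)
    return k * math.log(x / x0)

random.seed(1)
for _ in range(1000):
    u, v = random.uniform(0.01, 100.0), random.uniform(0.01, 100.0)
    # (0.5): F(uv) = F(u) + F(v), up to floating-point error
    assert math.isclose(F(u * v), F(u) + F(v), rel_tol=1e-9, abs_tol=1e-9)

# (0.3): equal stimulus ratios correspond to equal sensation differences
x1, x2 = 20.0, 40.0
assert math.isclose(psi(x2) - psi(x1), F(x2 / x1))
print("Cauchy equation and logarithmic law verified numerically.")
```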
There are many other problems and areas in psychology whose analysis falls within the considered category because it essentially consists of purely mathematical reasoning. Thus, the analysis of response times by means of distribution or quantile functions is one such area, and so are some areas of psychophysics (especially the theory of detection and discrimination), certain paradigms of decision making, memory and learning, etc.


Preface

3. Analytic methodology
A third way one can think of mathematical psychology is as an applied, or service, field: a set of methodological principles and techniques of experimental design, data
analysis, and model assessment developed for use by psychologists. The spectrum
of examples here extends from purely statistical research to methodology based on
substantive theoretical constructs falling within the scope of the first of our three
understandings of mathematical psychology.
A simple but representative example of the latter category is H. Richard Blackwell's (1953) correction-for-guessing formula and recommended experimental design. Blackwell considered a simple detection experiment: an observer is shown a stimulus that may have a certain property and asked whether she is aware of this property being present (Yes or No). Thus, the property may be an increment of intensity ΔB in the center of a larger field of some fixed intensity B. Depending on the value of ΔB, the observer responds Yes with some probability p. Blackwell found that this probability p(ΔB) was not zero even at ΔB = 0. It was obvious to Blackwell (but not to the detection theorists later on) that this indicated that the observer was "guessing" that the signal was there, with probability pg = p(0). It is clear, however, that the observer cannot distinguish the situation in which ΔB = 0 (and therefore, according to Blackwell, she could not perceive an intensity increment) from one in which ΔB > 0 but she failed to detect it. Assuming that ΔB is detected with probability pd(ΔB), we have the following tree of possibilities:
[Tree of possibilities: with probability pd(ΔB) the increment ΔB is detected, and the observer responds Yes; with probability 1 − pd(ΔB) it is not detected, in which case the observer guesses Yes with probability pg or responds No with probability 1 − pg.]

We can now express the probability p(ΔB) of the observer responding Yes to ΔB through the probability pd(ΔB) that she detects ΔB and the probability pg that she says Yes even though she has not detected ΔB:

p(ΔB) = pd(ΔB) + (1 − pd(ΔB))pg.   (0.8)

The value of pd(ΔB) decreases with decreasing ΔB, reaching zero at ΔB = 0. At this value the formula therefore turns into

p(0) = pg,   (0.9)
as it should. That is, pg is directly observable (more precisely, can be estimated from data): it is the probability with which the observer says Yes to "catch" or "empty" stimuli, those with ΔB = 0. Blackwell therefore insisted that catch trials be an integral part of experimental design in any Yes/No detection experiment. Once pg = p(0) is known (estimated), one can "correct" the observed

(estimated) probability p(ΔB) for any nonzero ΔB into the true probability of detection:

pd(ΔB) = (p(ΔB) − p(0)) / (1 − p(0)).   (0.10)

Thus, we end up with a strong recommendation on experimental design (which is universally followed by experimenters) and a formula for finding true detection probabilities (which is by now all but abandoned). Blackwell's
work is an example of a methodological development to be used in experimental
design and data analysis. At the same time, however, it is also a substantive model
of sensory detection, and as such falls within the category of work in psychology
in which mathematics plays a central role. The mathematics here is technically
simple but ingeniously applied.
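In code, the whole procedure amounts to a few lines. The sketch below (ours; the function name and the Yes rates are hypothetical) estimates pg from catch trials and applies the correction (0.10):

```python
def correct_for_guessing(p_obs, p_catch):
    """Blackwell's correction (0.10): map an observed Yes probability
    p_obs = p(ΔB) into the detection probability pd(ΔB), given the
    catch-trial Yes rate p_catch = p(0) = pg."""
    if not 0.0 <= p_catch < 1.0:
        raise ValueError("catch-trial rate must lie in [0, 1)")
    return (p_obs - p_catch) / (1.0 - p_catch)

# Illustrative (made-up) Yes rates; p_catch comes from trials with ΔB = 0.
p_catch = 0.10
for p_obs in (0.10, 0.28, 0.55, 0.91):
    pd = correct_for_guessing(p_obs, p_catch)
    print(f"p(ΔB) = {p_obs:.2f}  ->  pd(ΔB) = {pd:.3f}")
```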
The list of methodological developments based on substantive psychological ideas is long. Other classical examples include Louis Leon Thurstone's (1927) analysis of pairwise comparisons and Georg Rasch's (1960) analysis of item difficulty and responder aptitude.
At the other pole of the spectrum we find methodological developments of a purely data-analytic character, whose relation to psychology is determined by historical tradition rather than by the internal logic of these developments. For instance, nowadays we see a rapid growth of sophisticated Bayesian data-analytic and model-comparison procedures, as well as of procedures based on resampling and permutation techniques. Some psychologists prefer to consider all these applied-statistical developments part of psychometrics rather than of mathematical psychology. The relationship between the two disciplines is complex, but they are traditionally separated, with different societies and journals.


About this handbook
The New Handbook of Mathematical Psychology (NHMP) is about all
three of the meanings of mathematical psychology outlined above. The title of the
handbook stems from a very important series of three volumes called the Handbook of Mathematical Psychology (HMP), edited by R. Duncan Luce, Robert R.
Bush, and Eugene Galanter (1963a; 1963b; 1965). These three volumes played
an essential role in defining the character of a new field called mathematical psychology that had begun only 10 years earlier. The 21 chapters of the HMP, totaling 1800 pages, were written by scholars who had ingeniously employed serious
mathematics in their work, such as information theory, automata theory, probability theory (including stochastic processes), logic, modern algebra, and set theory. The HMP sparked a great deal of research eventually leading, among other
things, to the founding of the European Mathematical Psychology Group, the Society for Mathematical Psychology, the Journal of Mathematical Psychology, and a


Preface

number of special graduate programs within psychology departments in Europe
and the USA. In our view, the main feature of the HMP was that it focused on
foundational issues and emphasized mathematical ideas. These two foci were central to the philosophy of the editors of the HMP, who believed that the foundations of any serious science had to be mathematical. It is in this sense that our
concept of the NHMP derives from the HMP. We realize, however, that we are attempting to fill very big shoes. We are also facing more complex circumstances than those faced by the editors and authors of the HMP. In the early 1960s there were fewer
topics to cover, and there was less material to cover in each topic: the chapters
therefore could very well be both conveyors of major mathematical themes and
surveyors of empirical studies. We have to be more selective to make our task
manageable.
One could see it as a success of mathematical psychology that almost every area
of psychology nowadays employs a variety of formal models and analytic methods,
some of them quite sophisticated. It seems also to be the case, however, that the task of
constructing new formal models in an area has to some extent displaced mathematical foundational work. Thus, in our modern age of computation, it is possible to
use formal probabilistic models and estimate them with standard statistical packages without a deep understanding of the probabilistic and mathematical underpinnings of the models’ assumptions. We hope the NHMP will serve to counteract
such tendencies.
Our goal in this and subsequent volumes of the NHMP is to focus on foundational issues, on mathematical themes, ideas, theories, and approaches rather than
on empirical facts and specific competing models. Empirical studies are reviewed

in the NHMP primarily to motivate or illustrate a class of models or a mathematical
formulation. Rather than briefly touching on a large number of pertinent topics in
an attempt to provide a comprehensive overview, each chapter discusses in depth
and with relevant mathematical explanations just a few topics chosen for being
fundamental or highly representative of the field.
In relation to our “three fuzzy and interrelated understandings” of mathematical psychology, the first four chapters of the present volume can be classed into
the category “part of mathematics,” as they deal primarily with broad mathematical themes. Chapter 1, by Hans Colonius, discusses the important notions
of probabilistic couplings and probabilistic copulas, as well as other foundational
notions of probabilistic analysis, such as Fréchet–Hoeffding inequalities and different forms of stochastic dependence and stochastic ordering. The theme of foundations of probability with a prominent role of probabilistic couplings continues in
Chapter 2, by Ehtibar Dzhafarov and Janne Kujala. It deals with systems of random
variables recorded under variable conditions and adds the notion of selectiveness
(in the dependence of the random variables on these conditions) to the conceptual
framework of probability theory. Chapter 3, by Che Tat Ng, takes on the traditional topic of functional equations. As we have seen, their use in mathematical
psychology dates back to Gustav Theodor Fechner. Chapter 4, by John Boyd and


William Batchelder, takes on the field of network analysis, focusing on discrete
networks representable by graphs and digraphs. The chapter presents algebraic
(matrix) methods of network analysis, as well as probabilistic networks, such as
Markov random fields.
Chapters 5–8 can be classed into the category “part of psychology,” as they
primarily deal with substantive theories and classes of models. In Chapter 5,
Jean-Paul Doignon and Jean-Claude Falmagne describe a theory of knowledge
and learning spaces, which are highly abstract pre- (or proto-) topological constructs that nevertheless have considerable applied value in assessment and guidance of knowledge acquisition. Chapter 6, by McKenzie Alexander, is about interdisciplinary applications of classical game theory to dynamic systems, such as

behavior of animals, cultural norms, or linguistic conventions, and about how
these systems evolve into evolutionarily stable structures within a Darwinian concept of adaptability. The classical topic of choice, preference, and utility models is taken on in Chapter 7, by Anthony A. J. Marley and Michel Regenwetter.
The chapter focuses primarily on probabilistic models, treating deterministic representations as their special case. Chapter 8, by William Batchelder, deals with
another classical topic, that of modeling cognitive processes by discrete state models representable as a special class of parameterized full binary trees. Such models range from discrete state models of signal detection to Markov chain models
of learning and memory to the large class of multinomial processing tree (MPT)
models.
The last two chapters of the handbook deal primarily with the relation between
psychological models and empirical data. They can therefore be classed into the
category of “analytic methodology.” Chapter 9, by Jeffrey Rouder, Richard Morey,
and Michael Pratte, deals with data structures where several participants each give
responses to several classes of similar experimental items. The chapter describes
how Bayesian hierarchical models can specify both subject and item parameter
distributions. In Chapter 10, Jay Myung, Daniel Cavagnaro, and Mark Pitt discuss
statistical techniques, both Bayesian and frequentist, of evaluating and comparing
parametric probabilistic models applied to a given body of data, as well as ways to
optimally select a sequence of experimental conditions in data gathering to maximally differentiate the competing models.
There is no particular order in which the chapters in the NHMP should be read:
they are independent of each other. We strove to ensure that each chapter is self-contained, requiring no prior knowledge of the material except for a certain level
of mathematical maturity (ability to read mathematics) and some knowledge of
basic mathematics. The latter includes calculus, elementary probability theory, and
elementary set theory, say, within the scope of one- or two-semester introductory
courses at mathematics departments. The intended readership of the handbook are
behavioral and social scientists, mathematicians, computer scientists, and analytic
philosophers – ranging from graduate students, or even advanced undergraduates,
to experts in one of these fields.



References

Aczél, J. (1966). Lectures on Functional Equations and Their Applications. (Mathematics in Science and Engineering 19.) New York, NY: Academic Press.
Alper, T. M. (1987). A classification of all order-preserving homeomorphism groups of the reals that satisfy finite uniqueness. Journal of Mathematical Psychology, 31: 135–154.
Blackwell, H. R. (1953). Psychological Thresholds: Experimental Studies of Methods of Measurement (Bulletin No. 36). Ann Arbor, MI: University of Michigan, Engineering Research Institute.
Cauchy, A.-L. (1821). Cours d'analyse de l'École royale polytechnique. Paris: Imprimerie royale.
Dzhafarov, E. N., and Colonius, H. (2012). The Fechnerian idea. American Journal of Psychology, 124: 127–140.
Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf & Härtel.
Fishburn, P. (1973). Interval representations for interval orders and semiorders. Journal of Mathematical Psychology, 10: 91–105.
Fishburn, P., and Monjardet, B. (1992). Norbert Wiener on the theory of measurement (1914, 1915, 1921). Journal of Mathematical Psychology, 36: 165–184.
Luce, R. D. (1956). Semiorders and a theory of utility discrimination. Econometrica, 24: 178–191.
Luce, R. D., Bush, R. R., and Galanter, E. (1963a). Handbook of Mathematical Psychology, vol. 1. New York, NY: Wiley.
Luce, R. D., Bush, R. R., and Galanter, E. (1963b). Handbook of Mathematical Psychology, vol. 2. New York, NY: Wiley.
Luce, R. D., Bush, R. R., and Galanter, E. (1965). Handbook of Mathematical Psychology, vol. 3. New York, NY: Wiley.
Narens, L. (1985). Abstract Measurement Theory. Cambridge, MA: MIT Press.
Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Danmarks Paedagogiske Institut.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103: 677–680.
Thurstone, L. L. (1927). Psychophysical analysis. American Journal of Psychology, 38: 368–389.





1 Selected concepts from probability

Hans Colonius

1.1 Introduction  2
1.1.1 Goal of this chapter  2
1.1.2 Overview  6
1.2 Basics  7
1.2.1 σ-Algebra, probability space, independence, random variable, and distribution function  7
1.2.2 Random vectors, marginal and conditional distribution  13
1.2.3 Expectation, other moments, and tail probabilities  16
1.2.4 Product spaces and convolution  19
1.2.5 Stochastic processes  21
  The Poisson process  22
  The non-homogeneous Poisson process  23
1.3 Specific topics  24
1.3.1 Exchangeability  24
1.3.2 Quantile functions  25
1.3.3 Survival analysis  27
  Survival function and hazard function  27
  Hazard quantile function  29
  Competing risks models: hazard-based approach  30
  Competing risks models: latent-failure times approach  32
  Some non-identifiability results  33
1.3.4 Order statistics, extreme values, and records  34
  Order statistics  34
  Extreme value statistics  37
  Record values  40
1.3.5 Coupling  44
  Coupling event inequality and maximal coupling  48
1.3.6 Fréchet–Hoeffding bounds and Fréchet distribution classes  50
  Fréchet–Hoeffding bounds for n = 2  50
  Fréchet–Hoeffding bounds for n ≥ 3  52
  Fréchet distribution classes with given higher-order marginals  53
  Best bounds on the distribution of sums  55
1.3.7 Copula theory  56
  Definition, examples, and Sklar's theorem  56
  Copula density and pair copula constructions (vines)  58
  Survival copula, dual and co-copula  59
  Copulas with singular components  60
  Archimedean copulas  60
  Example: Clayton copula and the copula of max and min order statistics  62
  Operations on distributions not derivable from operations on random variables  62
1.3.8 Concepts of dependence  63
  Positive dependence  63
  Negative dependence  66
  Measuring dependence  67
1.3.9 Stochastic orders  68
  Univariate stochastic orders  68
  Univariate variability orders  71
  Multivariate stochastic orders  75
  Positive dependence orders  76
1.4 Bibliographic references  78
1.4.1 Monographs  78
1.4.2 Selected applications in mathematical psychology  78
1.5 Acknowledgments  79
References  79

1.1 Introduction
1.1.1 Goal of this chapter
Since the early beginnings of mathematical psychology, concepts from probability
theory have always played a major role in developing and testing formal models of
behavior and in providing tools for data-analytic methods. Moreover, fundamental measurement theory, an area where such concepts have not been mainstream,
has been diagnosed as wanting a sound probabilistic base by founders of the
field (see Luce, 1997). This chapter is neither a treatise on the role of probability
in mathematical psychology nor does it give an overview of its most successful
applications. The goal is to present, in a coherent fashion, a number of probabilistic concepts that, in my view, have not always found appropriate consideration in
mathematical psychology. Most of these concepts have been around in mathematics for several decades, like coupling, order statistics, records, and copulas; some
of them, like the latter, have seen a surge of interest in recent years, with copula
theory providing a new means of modeling dependence in high-dimensional data


Selected concepts from probability

(see Joe, 2015). A brief description of the different concepts and their interrelations
follows in the second part of this introduction.
The following three examples illustrate the type of concepts addressed in this
chapter. It is no coincidence that they all relate, in different ways, to the measurement of reaction time (RT), which may be considered a prototypical example
of a random variable in the field. Since the time of Dutch physiologist Franciscus C. Donders (Donders, 1868/1969), mathematical psychologists have developed
increasingly sophisticated models and methods for the analysis of RTs.1 Nevertheless, the probabilistic concepts selected for this chapter are, in principle, applicable
in any context where some form of randomness has been defined.

Example 1.1 (Random variables vs. distribution functions) Assume that the time
to respond to a stimulus depends on the attentional state of the individual; the
response may be the realization of a random variable with distribution function
FH in the high-attention state and FL in the low-attention state. The distribution of
observed RTs could then be modeled as a mixture distribution,
F(t) = pFH(t) + (1 − p)FL(t),

for all t ≥ 0, where 0 ≤ p ≤ 1 is the probability of responding in a state of high attention.
Alternatively, models of RT are often defined directly in terms of operations
on random variables. Consider, for example, Donders’ method of subtraction in
the detection task; if two experimental conditions differ by an additional decision
stage, D, total response time may be conceived of as the sum of two random variables, D + R, where R is the time for responding to a high-intensity stimulus.
In the case of a mixture distribution, one may wonder whether it might also be
possible to represent the observed RTs as the sum of two random variables H and
L, say, or, more generally, if the observed RTs follow the distribution function of
some Z(H, L), where Z is a measurable two-place function of H and L. In fact,
the answer is negative and follows as a classic result from the theory of copulas
(Nelsen, 2006), to be treated later in this chapter.
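The distinction between the two modeling routes is easy to see in simulation. In the sketch below (ours; the lognormal components and all parameter values are made up for illustration), sampling from the mixture means first drawing the latent attentional state and only then an RT from the corresponding component distribution:

```python
import random

random.seed(42)
p = 0.7  # probability of responding in the high-attention state

def sample_mixture_rt():
    # First draw the latent state, then an RT from FH or FL
    # (lognormals with made-up parameters stand in for the components).
    if random.random() < p:
        return random.lognormvariate(5.5, 0.2)  # high attention: faster RTs
    return random.lognormvariate(6.2, 0.3)      # low attention: slower RTs

rts = [sample_mixture_rt() for _ in range(10_000)]
print(f"mean simulated RT: {sum(rts) / len(rts):.1f} ms")
```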
Example 1.2 (Coupling for audiovisual interaction) In a classic study of intersensory facilitation, Hershenson (1962) compared reaction time to a moderately
intense visual or acoustic stimulus to the RT when both stimuli were presented
more or less simultaneously. Mean RT of a well-practiced subject to the sound
(RTA , say) was approximately 120 ms, mean RT to the light (RTV ) about 160 ms.
When both stimuli were presented synchronously, mean RT was still about 120 ms.
Hershenson reasoned that intersensory facilitation could only occur if the “neural
events” triggered by the visual and acoustic stimuli occurred simultaneously somewhere in the processing. That is, “physiological synchrony,” rather than “physical
(stimulus) synchrony” was required. Thus, he presented bimodal stimuli with light
leading sound, giving the slower system a kind of "head start."

1 For monographs, see Townsend and Ashby (1983), Luce (1986), Schweickert et al. (2012).

[Figure 1.1 Bimodal (mean) reaction time to light and sound with interstimulus interval (ISI) and sound following light, RTV = 160 ms, RTA = 120 ms. Upper graph: prediction in absence of interaction; lower graph: observed mean RTs; data from Diederich and Colonius (1987).]

In the absence of
interaction, reaction time to the bimodal stimulus with presentation of the acoustic stimulus delayed by τ ms, denoted RTVτA, is expected to increase linearly until the sound is delivered 40 ms after the light (the upper graph in Figure 1.1). Actual results, however, looked more like the lower graph in Figure 1.1, where maximal facilitation occurs at about physiological synchrony. Raab (1962) suggested an explanation in terms of a probability summation (or race) mechanism: response time to the bimodal stimulus, RTVτA, is considered to be the winner of a race between the processing times for the unimodal stimuli, i.e., RTVτA ≡ min{RTV, RTA + τ}. It then follows for the expected values (mean RTs) that

E[RTVτA] = E[min{RTV, RTA + τ}] ≤ min{E[RTV], E[RTA + τ]},
a prediction that is consistent with the observed facilitation. It has later been shown
that this prediction is not sufficient for explaining the observed amount of facilitation, and the discussion of how the effect should be modeled is ongoing, attracting
a lot of attention in both psychology and neuroscience.
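A small simulation makes the race prediction concrete. In this sketch (ours; the gaussian RT distributions and their parameters are merely illustrative, chosen to roughly match Hershenson's unimodal means), the two unimodal RTs are sampled independently, which, as discussed next, is only one of the possible ways of coupling them:

```python
import random

random.seed(7)
N, tau = 100_000, 40  # number of simulated trials; sound lags light by 40 ms

def rt_visual():      # illustrative unimodal RT distributions
    return random.gauss(160.0, 25.0)

def rt_auditory():
    return random.gauss(120.0, 20.0)

def mean(xs):
    return sum(xs) / len(xs)

# Race model under an independent coupling of the two processing times.
v = [rt_visual() for _ in range(N)]
a = [rt_auditory() for _ in range(N)]
vta = [min(x, y + tau) for x, y in zip(v, a)]

print(f"E[RTV] = {mean(v):.1f} ms, E[RTA + tau] = {mean(a) + tau:.1f} ms")
print(f"E[min(RTV, RTA + tau)] = {mean(vta):.1f} ms  (below both, as predicted)")
```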
However, as already observed by Luce (1986, p. 130), the above inequality only
makes sense if one adds the assumption that the three random variables RTVτA, RTV, and RTA are jointly distributed. The existence of a joint distribution is not
automatic because each variable relates to a different underlying probability space
defined by the experimental condition: visual, auditory, or bimodal stimulus presentation. From the theory of coupling (Thorisson, 2000), constructing such a joint
distribution is always possible by assuming stochastic independence of the random
variables. However – and this is the main point of this example – independence is
not the only coupling possibility, and alternative assumptions yielding distributions


with certain dependency properties may be more appropriate to describe empirical data.

[Figure 1.2 Inverse gaussian (dashed line) and gamma densities with identical mean (60 ms) and standard deviation (35 ms).]
Example 1.3 (Characterizing RT distributions: hazard function) Sometimes, a
stochastic model can be shown to predict a specific parametric distribution, e.g.,
drawing on some asymptotic limit argument (central limit theorem or convergence
to extreme-value distributions). It is often notoriously difficult to tell apart two densities when only a histogram estimate from a finite sample is available. Figure 1.2
provides an example of two theoretically important distributions, the gamma and
the inverse gaussian densities with identical means and standard deviations, where
the rather similar shapes make it difficult to distinguish them on the basis of a
histogram.
An alternative, but equivalent, representation of these distributions is in terms of their hazard functions (see Section 1.3.3). The hazard function hX of a random variable X with distribution function FX(x) and density fX(x) is defined as

hX(x) = fX(x) / (1 − FX(x)).

As Figure 1.3 illustrates, the gamma hazard function is increasing with decreasing slope, whereas the inverse gaussian is first increasing and then decreasing.
Although estimating hazard functions also has its intricacies (Kalbfleisch and
Prentice, 2002), especially at the right tail, there is a better chance to tell the
distributions apart based on estimates of the hazard function than on the density
or distribution function. Still other methods to distinguish classes of distribution
functions are based on the concept of quantile function (see Section 1.3.2),
among them the method of delta plots, which has recently drawn the attention of
researchers in RT modeling (Schwarz and Miller, 2012). Moreover, an underlying
theme of this chapter is to provide tools for a model builder that do not depend on
committing oneself to a particular parametric distribution assumption.

[Figure 1.3 Hazard functions of the inverse gaussian (dashed line) and gamma distributions corresponding to the densities of Figure 1.2.]
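Readers who want to reproduce Figures 1.2 and 1.3 can do so along the following lines (a sketch of ours using SciPy; the parameter values are solved from the stated mean of 60 ms and standard deviation of 35 ms):

```python
import numpy as np
from scipy import stats

m, s = 60.0, 35.0  # common mean and standard deviation in ms

# Gamma with mean m and SD s: shape (m/s)^2, scale s^2/m.
gam = stats.gamma(a=(m / s) ** 2, scale=s ** 2 / m)

# SciPy's invgauss(mu, scale) has mean mu*scale and variance mu^3*scale^2,
# so matching (m, s) gives mu = s^2/m^2 and scale = m^3/s^2.
ig = stats.invgauss(mu=s ** 2 / m ** 2, scale=m ** 3 / s ** 2)

t = np.linspace(1.0, 250.0, 500)
for name, d in [("gamma", gam), ("inverse gaussian", ig)]:
    hazard = d.pdf(t) / d.sf(t)  # h(t) = f(t) / (1 - F(t))
    i = int(np.argmin(np.abs(t - 50.0)))
    print(f"{name}: hazard at t = 50 ms is approximately {hazard[i]:.4f}")
```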

We hope to convey in this chapter that even seemingly simple situations, like
the one described in Example 1.2, may require some careful consideration of the
underlying probabilistic concepts.

1.1.2 Overview
In trying to keep the chapter somewhat self-contained, the first part presents
basic concepts of probability and stochastic processes, including some elementary notions of measure theory. Because of space limitations, some relevant topics
had to be omitted (e.g., random walks, Markov chains), or are only mentioned in
passing (e.g., martingale theory). For the same reason, statistical aspects are considered only when suggested by the context.2 Choosing what material to cover was
guided by the specific requirements of the topics in the second, main part of the
chapter.
The second part begins with a brief introduction to the notion of exchangeability
(with a reference to an application in vision) and its role in the celebrated “theorem
of de Finetti.” An up-to-date presentation of quantile (density) functions follows,
a notion that emerges in many areas including survival analysis. The latter topic,
while central to RT analysis, has also found applications in diverse areas, like decision making and memory, and is treated next at some length, covering an important
non-identifiability result. Next follow three related topics: order statistics, extreme

values, and the theory of records. Whereas the first and, to a lesser degree, the
second of these topics have become frequent tools in modeling psychological processes, the third has not yet found the role that it arguably deserves.
The method of coupling, briefly mentioned in introductory Example 1.2, is
a classic tool of probability theory concerned with the construction of a joint
2 For statistical issues of reaction time analysis, see the competent treatments by Van Zandt (2000,
2002); and Ulrich and Miller (1994) for a discussion of the effects of truncation.

