11 Probability and statistics: preliminaries to random vibrations

11.1 Introduction
This chapter covers some fundamental aspects of probability theory and
serves the purpose of providing the necessary tools for the treatment of
random vibrations, which will be discussed in the next chapter. Most readers
probably already have some familiarity with the subject, because
probability theory and statistics—directly or indirectly—pervade almost all
aspects of human activities, and, in particular, many branches of all scientific
disciplines. Nonetheless, in the philosophy of this text (as in Chapters 2 and
3 and in the appendices), the idea is to introduce and discuss some basic
concepts with the intention of following a continuous line of reasoning from
simple to more complex topics and the hope of giving the reader a useful
source of reference for a clear understanding of this text in the first place,
but of other more specialized books as well.
11.2 The concept of probability
In everyday conversation, probability is a loosely defined term employed to
indicate the measure of one’s belief in the occurrence of a future event when
this event may or may not occur. Moreover, we use this word by indirectly
making some common assumptions: (1) probabilities near 1 (100%) indicate
that the event is extremely likely to occur, (2) probabilities near zero indicate
that the event is very unlikely to occur and (3) probabilities near 0.5 (50%)
indicate a ‘fair chance’, i.e. that the event is just as likely to occur as not.
If we try to be more specific, we can consider the way in which we assign
probabilities to events and note that, historically, three main approaches
have developed through the centuries. We can call them the personal
approach, the relative frequency approach and the classical approach. The
personal approach reflects a personal opinion and, as such, is always
applicable because anyone can have a personal opinion about anything.
However, it is not very fruitful for our purposes. The relative frequency
approach is more objective and pertains to cases in which an ‘experiment’
can be repeated many times and the results observed; P[A], the probability
of occurrence of event A, is given as

P[A] = \frac{n_A}{n}    (11.1)

where n_A is the number of times that event A occurred and n is the total
number of times that the experiment was run. This approach is surely useful
in itself but, obviously, cannot deal with a one-shot situation and, in any case,
is a definition of an a posteriori probability (i.e. we must perform the
experiment to determine P[A]). The idea behind this definition is that the
ratio on the r.h.s. of eq (11.1) is almost constant for sufficiently large values of n.
Finally, the classical approach can be used when it can be reasonably
assumed that the possible outcomes of the experiment are equally likely; then

P[A] = \frac{n(A)}{n(S)}    (11.2)

where n(A) is the number of ways in which outcome A can occur and n(S)
is the number of ways in which the experiment can proceed. Note that in
this case we do not really need to perform the experiment because eq (11.2)
defines an a priori probability. A typical example is the tossing of a fair coin;
without an experiment we can say that n(S) = 2 (head or tail) and the
probability of, say, a head is P[head] = 1/2. Pictorially (and also for
historical reasons), we may view eq (11.2) as the 'gambler's definition' of
probability.
However, consider the following simple and classical ‘meeting problem’:
two people decide to meet at a given place anytime between noon and 1
p.m. The one who arrives first is obliged to wait 20 min and then leave. If
their arrival times are independent, what is the probability that they actually
meet? The answer is 5/9 (as the reader is invited to verify) but the point is
that this problem cannot be tackled with the definitions of probability given
above.
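As a quick numerical check of this result (an added illustration, not part of the original text), the following Python sketch estimates the meeting probability by simulation, assuming, as stated in the problem, independent arrival times uniformly distributed over the hour and a 20-minute waiting time.

```python
import random

def estimate_meeting_probability(trials=1_000_000, wait=20.0, window=60.0):
    """Estimate P[meeting] when the two arrival times are independent and
    uniform on [0, window] and the first to arrive waits 'wait' minutes."""
    meetings = 0
    for _ in range(trials):
        t1 = random.uniform(0.0, window)
        t2 = random.uniform(0.0, window)
        if abs(t1 - t2) <= wait:
            meetings += 1
    return meetings / trials

if __name__ == "__main__":
    print("simulated  :", estimate_meeting_probability())
    print("analytical :", 1.0 - (40.0 / 60.0) ** 2)   # 5/9, about 0.556
```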
We will not pursue the subject here, but it is evident that the definitions
above cannot deal with a large number of problems of great interest. As a
matter of fact, a detailed analysis of both definitions (11.1) and (11.2)—
because of their intrinsic limitations, logical flaws and lack of stringency—
shows that they are inadequate to form a solid basis for a more rigorous
mathematical theory of probability. Also, the von Mises definition, which
extends the relative frequency approach by writing

P[A] = \lim_{n \to \infty} \frac{n_A}{n}    (11.3)

suffers serious limitations and runs into insurmountable logical difficulties.
The solution to these difficulties was given by the axiomatic theory of
probability introduced by Kolmogorov. Before introducing this theory,
however, it is worth considering some basic ideas which may be useful as
guidelines for Kolmogorov’s abstract formulation.
Let us consider eq (11.2); we note that, in order to determine what is
‘probable’, we must first determine what is ‘possible’; this means that we
have to make a list of possibilities for the experiment. Some common
definitions are as follows: a possible outcome of our experiment is called an
event and we can distinguish between simple events, which can happen only
in one way, and compound events, which can happen in more than one distinct
way. In the rolling of a die, for example, a simple event is the observation of
a 6, whereas a compound event is the observation of an even number (2, 4 or
6). In other words, simple events cannot be decomposed and are also called
sample points. The set of all possible sample points is called a sample space.
Now, adopting the notation of elementary set theory, we view the sample
space as a set \Omega whose elements E_j are the sample points. If the sample
space is discrete, i.e. contains a finite or countable number of sample points,
any compound event A is a subset of \Omega and can be viewed as a collection
of two or more sample points, i.e. as the 'union' of two or more sample
points. In the die-rolling experiment above, for example, we can write

A = E_2 \cup E_4 \cup E_6
where we call A the event 'observation of an even number', E_2 the sample
point 'observation of a 2' and so on. In this case, it is evident that
P[A] = P[E_2 \cup E_4 \cup E_6] and, since E_2, E_4 and E_6 are mutually exclusive,

P[A] = P[E_2] + P[E_4] + P[E_6]    (11.4a)

The natural extension of eq (11.4a) is

P[E_1 \cup E_2 \cup \cdots \cup E_k] = \sum_{j=1}^{k} P[E_j]    (11.4b)
Moreover, if we denote by \bar{A} the complement of set A (i.e. the set of all
sample points of \Omega not belonging to A), we have also

P[A] + P[\bar{A}] = 1    (11.4c)
and if we consider two events, say B and C, which are not mutually exclusive, then

P[B \cup C] = P[B] + P[C] - P[B \cap C]    (11.4d)
where the intersection symbol \cap is well known from set theory and P[B \cap C] is
often called the compound probability, i.e. the probability that events B and
C occur simultaneously. (Note that one often finds also the symbols A + B for
A \cup B and AB for A \cap B.) Again, in the rolling of a fair die, for example, the
reader can easily check eq (11.4d) by choosing any two events B and C which
are not mutually exclusive.
For three non-mutually exclusive sets, it is not difficult to extend eq
(11.4d) to

P[B \cup C \cup D] = P[B] + P[C] + P[D] - P[B \cap C] - P[B \cap D] - P[C \cap D] + P[B \cap C \cap D]    (11.4e)

as the reader is invited to verify.
Incidentally, it is evident that the method that we are following requires
counting; for example, the counting of sample points and/or a complete
itemization of equiprobable sets of sample points. For large sample spaces
this may not be an easy task. Fortunately, aid comes from combinatorial
analysis from which we know that the number of permutations (arrangements
of objects in a definite order) of n distinct objects taken r at a time is given by
P_{n,r} = \frac{n!}{(n-r)!}    (11.5)
while the number of combinations (arrangements of objects without regard
to order) of n distinct objects taken r at a time is
C_{n,r} = \frac{n!}{r!\,(n-r)!}    (11.6)
For example, if n=3 (objects a, b and c) and r=2, the fact that the number of
combinations is less than the number of permutations is evident if one thinks
that in a permutation the arrangement of objects {a, b} is considered different
from the arrangement {b, a}, whereas in a combination they count as one
single arrangement.
These tools simplify the counting considerably. For example, suppose that
a big company has hired 15 new engineers for the same job in different
plants. If a particular plant has four vacancies, in how many ways can they
fill these positions? The answer is now straightforward and is given by
C_{15,4} = 1365. Moreover, note also that the calculation of factorials can often
be made easier by using Stirling's formula, i.e.

n! \cong \sqrt{2\pi n}\; n^n e^{-n}

which results in errors smaller than 1% for n \geq 10.
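As an added illustration, eqs (11.5) and (11.6) and Stirling's formula are easily checked with a few lines of Python (the function names below are, of course, arbitrary).

```python
import math

def permutations(n, r):
    """Number of ordered arrangements of n objects taken r at a time, eq (11.5)."""
    return math.factorial(n) // math.factorial(n - r)

def combinations(n, r):
    """Number of unordered selections of n objects taken r at a time, eq (11.6)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

def stirling(n):
    """Stirling's approximation to n!."""
    return math.sqrt(2.0 * math.pi * n) * n ** n * math.exp(-n)

if __name__ == "__main__":
    print("P_{3,2}  =", permutations(3, 2))     # 6 ordered pairs
    print("C_{3,2}  =", combinations(3, 2))     # 3 unordered pairs
    print("C_{15,4} =", combinations(15, 4))    # 1365, the hiring example
    n = 10
    err = abs(stirling(n) - math.factorial(n)) / math.factorial(n)
    print(f"Stirling relative error at n = {n}: {err:.3%}")   # below 1%
```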
Returning now to our main discussion, we can make a final comment
before introducing the axiomatic theory of probability: the fact that two
events B and C are mutually exclusive is formalized in the language of sets
as B \cap C = Ø, where Ø is the empty set. So, we need to include this event in
the sample space and require that P[Ø] = 0. By so doing, we obtain the
expected result that eq (11.4d) reduces to the sum P[B]+P[C] whenever events
B and C are mutually exclusive. In probability terminology, Ø is called the
‘impossible event’.
11.2.1 Probability—axiomatic formulation and some
fundamental results
We define a probability space as a triplet (\Omega, \mathcal{F}, P) where:
1. \Omega is a set whose elements are called elementary events.
2. \mathcal{F} is a \sigma-algebra of subsets of \Omega which are called events.
3. P is a probability function, i.e. a real-valued function with domain \mathcal{F} such that:
(a) 0 \leq P[A] \leq 1 for every A \in \mathcal{F};
(b) P[\Omega] = 1; and
(c) P\left[\bigcup_j A_j\right] = \sum_j P[A_j] when A_j \in \mathcal{F}, if the A_j are mutually disjoint events, i.e. A_i \cap A_j = Ø for i \neq j.
For completeness, we recall here the definition of \sigma-algebra: a collection \mathcal{F} of
subsets of a given set \Omega is a \sigma-algebra if
1. \Omega \in \mathcal{F};
2. if A \in \mathcal{F}, then its complement \bar{A} \in \mathcal{F};
3. if A_j \in \mathcal{F} for every index j = 1, 2, 3, \ldots, then \bigcup_j A_j \in \mathcal{F} and \bigcap_j A_j \in \mathcal{F}.
Two observations can be made immediately. First—although it may not seem
obvious—the axiomatic definition includes as particular cases both the
classical and the relative frequency definitions of probability without suffering
their limitations; second, this definition does not tell us what value of
probability to assign to a given event A \in \mathcal{F}. This is in no way a limitation
of this definition but simply means that we will have to model our experiment
in some way in order to obtain values for the probability of events. In fact,
many problems of interest deal with sets of identical events which are not
equally likely (for example, the rolling of a biased die).
Let us introduce now two other definitions of practical importance:
conditional probability and the independence of events. Intuitively, we can
argue that the probability of an event can vary depending upon the occurrence
or nonoccurrence of one or more related events: in fact, it is different to ask
in the die-rolling experiment ‘What is the probability of a 6?’ or ‘What is the
probability of a 6 given that an even number has fallen?’. The answer to the
first question is 1/6 while the answer to the second question is 1/3. This is
the concept of conditional probability, i.e. the probability of an event A
given that an event B has already occurred. The symbol for conditional
probability is P[A|B] and its definition is

P[A|B] = \frac{P[A \cap B]}{P[B]}    (11.7)

provided that P[B] \neq 0. It is not difficult to see that, for a given probability
space (\Omega, \mathcal{F}, P), the function P[\,\cdot\,|B] satisfies the three axioms above and is a probability
function in its own right. Equation (11.7) yields immediately the
multiplication rule for probabilities, i.e.

P[A \cap B] = P[A|B]\, P[B]    (11.8a)

which can be generalized to a number of events A_1, A_2, \ldots, A_n as follows:

P[A_1 \cap A_2 \cap \cdots \cap A_n] = P[A_1]\, P[A_2|A_1]\, P[A_3|A_1 \cap A_2] \cdots P[A_n|A_1 \cap A_2 \cap \cdots \cap A_{n-1}]    (11.8b)
If the occurrence of event B has no effect on the probability assigned to
an event A, then A and B are said to be independent and we can express this
fact in terms of conditional probability as

P[A|B] = P[A]    (11.9a)

or, equivalently,

P[B|A] = P[B]    (11.9b)

Clearly, two mutually exclusive events are not independent because, from
eq (11.7), we have P[A|B] = 0 when A \cap B = Ø (which differs from P[A]
whenever P[A] \neq 0). Also, if A and B are two independent events, we get
from eq (11.7)

P[A \cap B] = P[A]\, P[B]    (11.10a)

which is referred to as the multiplication theorem for independent events.
(Note that some authors give eq (11.10a) as the definition of independent
events.) For n mutually (or collectively) independent events eq (11.8b) yields

P[A_1 \cap A_2 \cap \cdots \cap A_n] = P[A_1]\, P[A_2] \cdots P[A_n]    (11.10b)
A word of caution is necessary at this point: three (or more) random
events can be independent in pairs without being mutually independent. This
is illustrated by the example that follows.
Example 11.1. Consider a lottery with eight numbers (1-8) and let
E_1, E_2, \ldots, E_8, respectively, be the simple events of extraction of 1, extraction
of 2, etc. Consider three compound events, each the union of four simple events
and each pair sharing exactly the two simple events E_1 and E_2, for example
A = E_1 \cup E_2 \cup E_3 \cup E_4, B = E_1 \cup E_2 \cup E_5 \cup E_6 and C = E_1 \cup E_2 \cup E_7 \cup E_8. Now,
P[A] = P[B] = P[C] = 1/2 and P[A \cap B] = P[A \cap C] = P[B \cap C] = 1/4. It is then easy to verify
that P[A \cap B] = P[A]P[B], P[A \cap C] = P[A]P[C] and P[B \cap C] = P[B]P[C], which means that the events are pairwise
independent. However, P[A \cap B \cap C] = 1/4 \neq 1/8 = P[A]\,P[B]\,P[C],
meaning that the three events are not mutually, or collectively, independent.
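The probabilities of Example 11.1 can be verified by direct enumeration under the classical definition (11.2); the short script below is an added illustration using the example sets chosen above.

```python
from fractions import Fraction

# Eight equally likely outcomes: the lottery numbers 1-8.
omega = set(range(1, 9))
A = {1, 2, 3, 4}
B = {1, 2, 5, 6}
C = {1, 2, 7, 8}

def P(event):
    """Classical probability, eq (11.2): favourable outcomes over total outcomes."""
    return Fraction(len(event), len(omega))

print("P[A], P[B], P[C]:", P(A), P(B), P(C))                       # 1/2 each
print("pairwise independent:",
      P(A & B) == P(A) * P(B) and
      P(A & C) == P(A) * P(C) and
      P(B & C) == P(B) * P(C))                                      # True
print("mutually independent:", P(A & B & C) == P(A) * P(B) * P(C))  # False
```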
Another important result is known as the total probability formula. Let
A_1, A_2, \ldots, A_n be n mutually exclusive events such that A_1 \cup A_2 \cup \cdots \cup A_n = \Omega, where
\Omega is the sample space. Then, a generic event B can be expressed as

B = B \cap \Omega = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_n)    (11.11)

where the n events B \cap A_j are mutually exclusive. Owing to the third axiom
of probability, this implies

P[B] = \sum_{j=1}^{n} P[B \cap A_j]

so that, by using the multiplication theorem, we get the total probability
formula

P[B] = \sum_{j=1}^{n} P[B|A_j]\, P[A_j]    (11.12)

which remains true for n \to \infty.
With the same assumptions as above on the events A_1, A_2, \ldots, A_n, let
us now consider a particular event A_k; the definition of conditional probability
yields

P[A_k|B] = \frac{P[A_k \cap B]}{P[B]} = \frac{P[A_k \cap B]}{\sum_{j=1}^{n} P[B|A_j]\, P[A_j]}    (11.13)
where eq (11.12) has been taken into account. Also, by virtue of eq (11.8a) we
can write P[A_k \cap B] = P[B|A_k]\, P[A_k], so that substituting in eq (11.13) we get

P[A_k|B] = \frac{P[B|A_k]\, P[A_k]}{\sum_{j=1}^{n} P[B|A_j]\, P[A_j]}    (11.14)
which is known as Bayes' formula and deserves some comments. First, the
formula is true if P[B] \neq 0. Second, eq (11.14) is particularly useful for experiments
consisting of stages. Typically, the Ajs are events defined in terms of a first stage
(or, otherwise, the P[Aj] are known for some reason), while B is an event defined
in terms of the whole experiment including a second stage; asking for P[Ak|B]
is then, in a sense, 'backward': we ask for the probability of an event defined at
the first stage conditioned by what happens in a later stage. In Bayes’ formula
this probability is given in terms of the ‘natural’ conditioning, i.e. conditioning
on what happens at the first stage of the experiment. This is why the P[Aj] are
called the a priori (or prior) probabilities, whereas P[Ak|B] is called a posteriori
(posterior or inverse) probability. The advantage of this approach is to be able
to modify the original predictions by incorporating new data. Obviously, the
initial hypotheses play an important role in this case; if the initial assumptions
are based on an insufficient knowledge of the mechanism of the process, the
prior probabilities are no better than reasonable guesses.
Example 11.2. Among voters in a certain area, 40% support party 1 and
60% support party 2. Additional research indicates that a certain election
issue is favoured by 30% of supporters of party 1 and by 70% of supporters
of party 2. One person at random from that area—when asked—says that
he/she favours the issue in question. What is the probability that he/she is a
supporter of party 2? Now, let
• A_1 be the event that a person supports party 1, so that P[A_1] = 0.4;
• A_2 be the event that a person supports party 2, so that P[A_2] = 0.6;
• B be the event that a person at random in the area favours the issue in
question.
Prior knowledge (the results of the research) indicate that P[B|A1]=0.3 and
P[B|A2]=0.7. The problem asks for the a posteriori probability P[A2|B], i.e. the
probability that the person who was asked supports party 2 given the fact that
he/she favours that specific election issue. From Bayes' formula we get

P[A_2|B] = \frac{P[B|A_2]\, P[A_2]}{P[B|A_1]\, P[A_1] + P[B|A_2]\, P[A_2]} = \frac{0.7 \times 0.6}{0.3 \times 0.4 + 0.7 \times 0.6} = \frac{0.42}{0.54} \cong 0.78

Then, obviously, we can also infer that P[A_1|B] = 1 - P[A_2|B] \cong 0.22.
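The arithmetic of Bayes' formula (11.14), and of Example 11.2 in particular, is easily reproduced numerically; the following sketch is an added illustration (the function and argument names are arbitrary).

```python
def bayes_posterior(priors, likelihoods):
    """Posterior probabilities P[A_k|B] from eq (11.14), given the priors P[A_k]
    and the likelihoods P[B|A_k]."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)                # P[B], the total probability formula (11.12)
    return [j / total for j in joint]

# Example 11.2: party support (priors) and approval of the issue (likelihoods).
posterior = bayes_posterior(priors=[0.4, 0.6], likelihoods=[0.3, 0.7])
print("P[A1|B] =", round(posterior[0], 3))   # about 0.222
print("P[A2|B] =", round(posterior[1], 3))   # about 0.778
```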
11.3 Random variables, probability distribution functions and probability density functions
Events of major interest in science and engineering are those identified by
numbers. Moreover—since we assume that the reader is already familiar with
the term ‘variable’—we can state that a random variable is a real variable
whose observed values are determined by chance or by a number of causes
beyond our control which defy any attempt at a deterministic description. In
this regard, it is important to note that the engineer’s and applied scientist’s
approach is not so much to ask whether a certain quantity is a random variable
or not (which is often debatable), but to ask whether that quantity can be
modelled as a random variable and if this approach leads to meaningful results.
In mathematical terms, let x be any real number; then a random variable
on the probability space (\Omega, \mathcal{F}, P) is a function X: \Omega \to \mathbb{R} (\mathbb{R} is the set of
real numbers) such that the sets B_x = \{w : X(w) \leq x\} are events, i.e.
B_x \in \mathcal{F}. In words, let X be a real-valued function defined on
\Omega; given a real number x, we call B_x the set of all elementary events w for
which X(w) \leq x. If, for every x, the sets B_x belong to the \sigma-algebra \mathcal{F}, then X
is a (one-dimensional) random variable.
The above definition may seem a bit intricate at first glance, but a little
thought will show that it provides us precisely with what we need. In fact,
we can now assign a definite meaning to the expression P[B_x], i.e. the probability
that the random variable X corresponding to a given experiment will assume
a value less than or equal to x. It is then straightforward, for a given random
variable X, to define the function F_X(x) as

F_X(x) = P[B_x] = P[X \leq x]    (11.15)
which is called the cumulative distribution function (cdf, or the distribution
function) of the random variable X. From the definition, the following
properties can be easily proved:
0 \leq F_X(x) \leq 1, \qquad F_X(-\infty) = 0, \qquad F_X(+\infty) = 1, \qquad P[x_1 < X \leq x_2] = F_X(x_2) - F_X(x_1)    (11.16)

where x_1 and x_2 are any two real numbers such that x_1 \leq x_2. In other words,
distribution functions are monotonically non-decreasing functions which start
at zero for x \to -\infty and increase to unity for x \to +\infty. It should be noted that
every random variable uniquely defines its distribution function, but a given
distribution function corresponds to an arbitrary number of different random
variables. Moreover, the probabilistic properties of a random variable can
be completely characterized by its distribution function.
Among all possible random variables, an important distinction can be
made between discrete and continuous random variables. The term discrete
means that the random variable can assume only a finite or countably infinite
number of distinct possible values x_1, x_2, x_3, \ldots. Then, a complete description
can be obtained by knowing the probabilities p_k = P[X = x_k] for
k = 1, 2, 3, \ldots and by defining the distribution function as

F_X(x) = \sum_k p_k\, \theta(x - x_k)    (11.17)

where we use the symbol \theta for the Heaviside function (which we already
encountered in Chapters 2 and 5), i.e.

\theta(x - x_k) = \begin{cases} 0, & x < x_k \\ 1, & x \geq x_k \end{cases}    (11.18)
The distribution function of a discrete random variable is defined over
the entire real line and is a ‘step’ function with a number of jumps or
discontinuities occurring at the points x_k. A typical and simple example is
provided by the die-rolling experiment where X is the numerical value
observed in the rolling of the die. In this case, x_1 = 1, x_2 = 2, etc. and p_k = 1/6
for every k = 1, 2, \ldots, 6. Then F_X(x) = 0 for x < 1, F_X(x) = k/6 for
k \leq x < k + 1 (k = 1, 2, \ldots, 5) and F_X(x) = 1 for x \geq 6.
A continuous random variable, on the other hand, can assume any value
in some interval of the real line. For a large and important class of random
variables there exists a certain non-negative function p_X(x) which satisfies
the relationship p_X(x) = dF_X(x)/dx, so that

F_X(x) = \int_{-\infty}^{x} p_X(\eta)\, d\eta \qquad \text{for every } x    (11.19)

where p_X(x) is called the probability density function (pdf) and \eta is a dummy
variable of integration. The main properties of p_X(x) can be summarized as
follows:

p_X(x) \geq 0, \qquad \int_{-\infty}^{+\infty} p_X(x)\, dx = 1    (11.20)

The second property is often called the normalization condition and is
equivalent to F_X(+\infty) = 1. Also, it is important to notice a fundamental
difference with respect to discrete random variables: the probability that the
continuous random variable X assumes a specific value x is zero, and
probabilities must be defined over an interval. Specifically, if p_X(x) is
continuous at x we have

P[x < X \leq x + dx] = p_X(x)\, dx    (11.21a)

and, obviously,

P[a < X \leq b] = \int_{a}^{b} p_X(x)\, dx = F_X(b) - F_X(a)    (11.21b)
Example 11.3. Discrete random variables—binomial, Poisson and geometric
distributions. Let us consider a fixed number (n) of typical ‘Bernoulli trials’.
A ‘Bernoulli trial’ is an experiment with only two possible outcomes which
are usually called ‘success’ and ‘failure’. Furthermore, the probability of
success is p and does not change from trial to trial, the probability of failure
is q = 1 - p, and the trials are independent. The discrete random variable of
interest X is the number of successes during the n trials. It is shown in every
book on statistics that the probability of having x successes is given by

p_X(x) = \binom{n}{x} p^x (1-p)^{n-x}    (11.22)

where x = 0, 1, 2, \ldots, n and 0 < p < 1. The random variable X is said to have a
binomial distribution with parameters n and p when its density function is
given by eq (11.22).
Now, suppose that p is very small and that n becomes very large
in such a way that the product np is equal to a constant \lambda. In mathematical
terms, provided that np = \lambda, we can let n \to \infty and p \to 0; then

\binom{n}{x} p^x (1-p)^{n-x} = \frac{n(n-1)\cdots(n-x+1)}{x!}\,\frac{\lambda^x}{n^x}\left(1 - \frac{\lambda}{n}\right)^{n-x} \longrightarrow \frac{\lambda^x e^{-\lambda}}{x!}
because \left(1 - \lambda/n\right)^n \to e^{-\lambda} as n \to \infty. A random variable X with a pdf given by

p_X(x) = \frac{\lambda^x e^{-\lambda}}{x!} \qquad (x = 0, 1, 2, \ldots)    (11.23)

is said to have a Poisson distribution with parameter \lambda. Equation (11.23)
is a good approximation to the binomial distribution of eq (11.22) when n is
large and p is small. Poisson-distributed random variables
arise in a number of situations, the most common of which concern ‘rare’
events, i.e. events with a small probability of occurrence. The parameter \lambda
then represents the average number of occurrences of the event per
measurement unit (i.e. a unit of time, length, area, space, etc.). For example,
knowing that at a certain intersection we have on average 1.7 car accidents
per month, the probability of zero accidents in a month is given by
P[X = 0] = e^{-1.7} \cong 0.18. The fact that the number of accidents follows a
Poisson distribution can be roughly established as follows. Divide a month
into n intervals, each of which is so small that at most one accident can
occur with a small probability p. Then, during each interval (if the occurrence
of accidents can be considered as independent from interval to interval)
we have a Bernoulli trial where the probability of 'success' p is relatively
small if n is large and np = \lambda = 1.7. Note that we do not need to know the
values of n and/or p (which can be, to a certain extent, arbitrary), but it is
sufficient to verify that the underlying assumptions of the Poisson
distribution hold.
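As an added numerical illustration of these remarks, the sketch below compares the binomial pmf (11.22) with its Poisson approximation (11.23) for the accident example; the split n = 200, p = 1.7/200 is an arbitrary choice of 'large n, small p'.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of x successes in n Bernoulli trials, eq (11.22)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson probability with parameter lambda, eq (11.23)."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 1.7                    # average number of accidents per month
n, p = 200, lam / 200        # illustrative 'large n, small p' decomposition

print("x   binomial   Poisson")
for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))
# For x = 0 both values are close to exp(-1.7), about 0.18, as quoted above.
```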
If now, in a series of Bernoulli trials we consider X to be the number of
trials before the first success occurs we are, broadly speaking, dealing with
the same problem as in the first case but we are asking a different question
(the number of trials is not fixed in this case). It is not difficult to show that
this circumstance leads to the geometric distribution, which is written

p_X(x) = p\,(1-p)^{x-1}    (11.24)

where x = 1, 2, 3, \ldots and 0 < p < 1. The random variable X is said to
have a geometric distribution with parameter p when its pdf is given by eq
(11.24).
Example 11.4. Continuous random variable—the normal or Gaussian
distribution. The most important and widely used continuous probability
distribution was first described by de Moivre in the second half of the
eighteenth century but was implemented as a useful practical tool only half
a century later by Gauss and Laplace. Its importance is due to the central
limit theorem, which we will discuss in a later section. A random variable X
is said to have a Gaussian (or normal) distribution with parameters \mu and \sigma
if its pdf is given by

p_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]    (11.25)
For practical use, it is convenient to cast eq (11.25) in a standardized
format which can be more easily expressed in tabular form. It is not difficult
to see that, by defining the new random variable Z as

Z = \frac{X - \mu}{\sigma}    (11.26)

we obtain the standard form

p_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}    (11.27)

whose cdf is given by

F_Z(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\eta^2/2}\, d\eta = \frac{1}{2} + \Phi(z)    (11.28)

since \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} e^{-\eta^2/2}\, d\eta = \frac{1}{2} and we defined \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{0}^{z} e^{-\eta^2/2}\, d\eta.
Equation (11.28) has been given because either F_Z(z) or \Phi(z) are commonly
found in statistical tables.
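Nowadays the standardized cdf of eq (11.28) is more often computed than read from tables; a minimal added sketch using the error function of Python's standard library is given below (here Phi denotes the integral from 0 to z, as defined above).

```python
import math

def standard_normal_cdf(z):
    """F_Z(z) of eq (11.28), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def Phi(z):
    """Tabulated function Phi(z) = F_Z(z) - 1/2 (integral from 0 to z)."""
    return standard_normal_cdf(z) - 0.5

def gaussian_cdf(x, mu, sigma):
    """cdf of a Gaussian variable with parameters mu and sigma,
    via the standardization of eq (11.26)."""
    return standard_normal_cdf((x - mu) / sigma)

if __name__ == "__main__":
    print(standard_normal_cdf(1.96))               # about 0.975
    print(Phi(1.0))                                 # about 0.3413
    print(gaussian_cdf(12.0, mu=10.0, sigma=2.0))   # P[X <= mu + sigma], about 0.841
```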
Also, it can be shown (local Laplace-de Moivre theorem, see for example
Gnedenko [1]) that when n and np are both large we have

\binom{n}{x} p^x (1-p)^{n-x} \cong \frac{1}{\sqrt{2\pi n p(1-p)}} \exp\left[-\frac{(x - np)^2}{2np(1-p)}\right]    (11.29)

meaning that the binomial distribution can be approximated by a Gaussian
distribution. The r.h.s. of eq (11.29) is called the Gaussian approximation to
the binomial distribution.
Example 11.5. For purposes of illustration, let us take a probabilistic approach
to a deterministic problem. Consider the sinusoidal deterministic signal
x(t) = x_0 \sin(\omega t). We ask, for any given value of amplitude x (with |x| < x_0): what is the
probability that the amplitude of our signal lies between x and x + dx?
From our previous discussion it is evident that we are asking for the pdf of
the 'random' variable X, i.e. the amplitude of our signal. This can be obtained
by calculating the time that the signal amplitude spends between x and x + dx
during an entire period T = 2\pi/\omega. Now, from x = x_0 \sin(\omega t) we get
dx = x_0 \omega \cos(\omega t)\, dt, which yields

dt = \frac{dx}{x_0 \omega \cos(\omega t)} = \frac{dx}{\omega\sqrt{x_0^2 - x^2}}    (11.30)
Within a period T the amplitude passes twice through the interval from x to x + dx,
so that the total amount of time that it spends in such an interval is
2\,dt; hence

\frac{2\,dt}{T} = \frac{2\,dx}{T\omega\sqrt{x_0^2 - x^2}} = \frac{dx}{\pi\sqrt{x_0^2 - x^2}}    (11.31)

where the last expression holds because T = 2\pi/\omega. But, noting that 2\,dt/T is
exactly p_X(x)\,dx, i.e. the probability that, within a period, the amplitude lies
between x and x + dx, we get

p_X(x) = \frac{1}{\pi\sqrt{x_0^2 - x^2}} \qquad (|x| < x_0)    (11.32)
Fig. 11.1 Amplitude PDF of sinusoidal signal.
which is shown in Fig. 11.1 for x0=1. From this graph it can be noted that
a sinusoidal wave spends more time near its peak values than it does near its
abscissa axis (i.e. its mean value).
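The result (11.32) can also be checked by sampling the sinusoid at uniformly distributed time instants and comparing a histogram of the sampled amplitudes with the analytical pdf; the following NumPy sketch (an added illustration with x0 = 1, as in Fig. 11.1) does exactly this.

```python
import numpy as np

x0 = 1.0                                                   # signal amplitude
t = np.random.uniform(0.0, 2.0 * np.pi, size=1_000_000)   # uniform sampling over one period
x = x0 * np.sin(t)                                         # sampled signal values

# Empirical density from a normalized histogram (end points excluded,
# where the analytical pdf diverges).
counts, edges = np.histogram(x, bins=50, range=(-0.99 * x0, 0.99 * x0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Analytical pdf of eq (11.32).
pdf = 1.0 / (np.pi * np.sqrt(x0 ** 2 - centers ** 2))

for c, emp, th in zip(centers[::10], counts[::10], pdf[::10]):
    print(f"x = {c:+.2f}   histogram = {emp:.3f}   eq (11.32) = {th:.3f}")
```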
11.4 Descriptors of random variable behaviour
From the discussion of preceding sections, it is evident that the complete
description of the behaviour of a random variable is provided by its
distribution function. However, a certain degree of information—although
not complete in many cases—can be obtained by well-known descriptors
such as the mean value, the standard deviation etc. These familiar concepts
are special cases of a series of descriptors called moments of a random
variable. For a continuous random variable X, we define the first moment
of X, indicated by E[X], as

E[X] = \int_{-\infty}^{+\infty} x\, p_X(x)\, dx    (11.33a)

or, for a discrete random variable,

E[X] = \sum_k x_k\, p_k    (11.33b)
Equations (11.33a) and (11.33b) define what is usually called in engineering
terms the mean (or also the 'expected value') of X, which is indicated by the
symbol \mu_X. Similarly, the second moment is the expected value of X^2, i.e.
E[X^2], and has a special name, the mean squared value of X, which for a
continuous random variable is written as

E[X^2] = \int_{-\infty}^{+\infty} x^2\, p_X(x)\, dx    (11.34)

The square root of the second moment, \sqrt{E[X^2]},
is called the root-mean-square value or rms of X.
By analogy, we define the mth moment (m = 1, 2, 3, \ldots) of X as

E[X^m] = \int_{-\infty}^{+\infty} x^m\, p_X(x)\, dx    (11.35a)

and in general, for a function f(X) of the random variable, we have

E[f(X)] = \int_{-\infty}^{+\infty} f(x)\, p_X(x)\, dx    (11.35b)

Equations (11.33a) to (11.35a) are just particular cases of eq (11.35b).
When we first subtract its mean from the random variable and then
calculate the expected values, we speak of central moments, i.e. the mth
central moment is given by

E[(X - \mu_X)^m] = \int_{-\infty}^{+\infty} (x - \mu_X)^m\, p_X(x)\, dx    (11.36)
In particular, the second central moment E[(X - \mu_X)^2] is well known and
has a special name: the variance, usually indicated with the symbols \sigma_X^2 or
Var[X]. Note that the variance can also be evaluated by

\sigma_X^2 = E[X^2] - \mu_X^2    (11.37)

which is just a particular case of the fact that central moments can be
evaluated in terms of ordinary (noncentral) moments by virtue of the binomial
theorem. In formulas we have

E[(X - \mu_X)^m] = \sum_{k=0}^{m} \binom{m}{k} (-\mu_X)^{m-k}\, E[X^k]    (11.38)
The square root of the variance, i.e. \sigma_X = \sqrt{E[(X - \mu_X)^2]}, is called the standard
deviation and we commonly find the symbols \sigma_X or SD[X].
Example 11.6. Let us consider some of the pdfs introduced in previous sections
and calculate their mean and variance. For the binomial distribution, for
example, we can show that

E[X] = np, \qquad \sigma_X^2 = np(1-p)    (11.39)

The first of eqs (11.39) can be obtained as follows:

E[X] = \sum_{x=0}^{n} x \binom{n}{x} p^x (1-p)^{n-x} = np \sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1} (1-p)^{n-x} = np \sum_{k=0}^{n-1} \binom{n-1}{k} p^k (1-p)^{n-1-k} = np

where the last equality holds because the summation represents the sum of
all the ordinates of the binomial distribution and must be equal to 1 for the
normalization condition. For the second of eqs (11.39) we can use eq (11.37)
so that we only need the term E[X^2]. This is given by

E[X^2] = E[X(X-1)] + E[X] = \sum_{x=0}^{n} x(x-1) \binom{n}{x} p^x (1-p)^{n-x} + np = n(n-1)p^2 + np

so that

\sigma_X^2 = E[X^2] - (np)^2 = n(n-1)p^2 + np - n^2 p^2 = np(1-p)
We leave to the reader the proof that for a Poisson-distributed random
variable we have

E[X] = \lambda, \qquad \sigma_X^2 = \lambda    (11.40)

while for a geometric pdf we get

E[X] = \frac{1}{p}, \qquad \sigma_X^2 = \frac{1-p}{p^2}    (11.41)
If now we turn our attention to the continuous Gaussian pdf of eq (11.25)
we can calculate the mean using eq (11.33a) with the change of variable
z = (x - \mu)/\sigma (so that x = \mu + \sigma z and dx = \sigma\, dz) and we get

E[X] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} (\mu + \sigma z)\, e^{-z^2/2}\, dz = \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} z\, e^{-z^2/2}\, dz + \frac{\mu}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-z^2/2}\, dz = \mu    (11.42)

because the first integral is equal to zero, while the second is equal to \sqrt{2\pi}. An
analogous line of reasoning leads to

\sigma_X^2 = E[(X - \mu)^2] = \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} z^2 e^{-z^2/2}\, dz = \sigma^2    (11.43)
so that the parameters µ and σ of the Gaussian pdf are precisely the mean
and the standard deviation of the random variable X.
Often, it is convenient to use a standardized form of the random variable
in question. Given a random variable X, we can define the random variable
Z as Z = (X - \mu_X)/\sigma_X, which is linearly related to X and always has zero
mean and unit variance. The third and fourth moments of this standardized
random variable are also given special names and are called skewness and
kurtosis. Indicating these descriptors with the symbols \alpha_3 and \alpha_4,
respectively, we have

\alpha_3 = E[Z^3] = \frac{E[(X - \mu_X)^3]}{\sigma_X^3}, \qquad \alpha_4 = E[Z^4] = \frac{E[(X - \mu_X)^4]}{\sigma_X^4}    (11.44)
The meanings of the quantities above are generally well known by every
engineer or scientist. The mean (together with the median and the mode) is
a measure of location, while the variance and the standard deviation are
measures of scatter (dispersion) of the random variable about its mean.
Skewness and kurtosis, in turn, have to do with the shape of the probability
density function. More specifically, the skewness will be zero for a pdf
symmetrical about the mean, positive for a pdf with a longer tail to the right
and negative when the tail to the left is more prominent. Kurtosis, on the
other hand describes the ‘peakedness’ of the pdf. For example, a Gaussian
pdf has
and a pdf with, say,
has a high, narrow peak and
fatter tails (with respect to a pdf with smaller kurtosis) far away from the
mean.
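In practice these descriptors are estimated from measured data; the following added sketch computes the sample mean, variance, skewness and kurtosis of a set of Gaussian samples, for which the theoretical values are mu, sigma^2, 0 and 3 respectively.

```python
import numpy as np

def descriptors(samples):
    """Sample estimates of the mean, variance, skewness and kurtosis, i.e. of
    eqs (11.33), (11.37) and (11.44) with expectations replaced by averages."""
    x = np.asarray(samples, dtype=float)
    mu = x.mean()
    var = ((x - mu) ** 2).mean()
    z = (x - mu) / np.sqrt(var)          # standardized values
    return mu, var, (z ** 3).mean(), (z ** 4).mean()

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=200_000)   # Gaussian: alpha_3 = 0, alpha_4 = 3
mean, var, skew, kurt = descriptors(data)
print(round(mean, 3), round(var, 3), round(skew, 3), round(kurt, 3))
```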
For the measures of location or of central tendency, we can briefly say
that the mean is the abscissa of the ‘centre of gravity’ of the area under the
pdf curve, the median M_2 is the value of the random variable for which

F_X(M_2) = \frac{1}{2}    (11.45)

while the mode M_3 is the abscissa of the maximum of the pdf.
At this point, it is important to note that there exist many other distribution
functions and we have only considered a few of the most frequently
encountered. The interested reader is referred to specific texts for more
information.
The usefulness of moment analysis is due to the fact that, in many
problems, we do not know exactly the pdf but we only have an idea of
which one it might be. Then, a knowledge of the first few moments—which,
in turn, are estimated from the experimental data—can allow the evaluation
of the parameters of the (unknown) pdf, in order to accept or reject our
initial hypothesis. In other words, the lowest order moments of a random
variable constitute a first step towards a description of the distribution
function underlying a random process. The information they provide is
incomplete (all the moments would be needed to have a complete description)
but is very useful for many practical purposes.
11.4.1 Characteristic function of a random variable
We introduce here the concept of characteristic function of a random variable
which, besides the cdf and the pdf, provides an alternative way of completely
characterizing a random variable. The characteristic function of a random
variable X is denoted \phi_X(\omega) and is defined as

\phi_X(\omega) = E[e^{i\omega X}] = \int_{-\infty}^{+\infty} p_X(x)\, e^{i\omega x}\, dx    (11.46a)

where \omega is a real variable. It is evident that p_X(x) and \phi_X(\omega) bear a close
resemblance to a Fourier transform pair. Furthermore, we know that the pdf
has a Fourier transform because, owing to the normalization condition, the
integral (11.46a) verifies the Dirichlet condition \int_{-\infty}^{+\infty} |p_X(x)|\, dx < \infty. Also

p_X(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \phi_X(\omega)\, e^{-i\omega x}\, d\omega    (11.46b)
A principal use of the characteristic function has to do with its moment-generating
property. If we differentiate eq (11.46a) with respect to \omega we obtain

\frac{d\phi_X(\omega)}{d\omega} = i \int_{-\infty}^{+\infty} x\, p_X(x)\, e^{i\omega x}\, dx

then, letting \omega = 0 in the above expression, we get

\left. \frac{d\phi_X(\omega)}{d\omega} \right|_{\omega=0} = i\, E[X]    (11.47)

Continuing this process and differentiating m times, if the mth moment of
X is finite, we have

E[X^m] = \frac{1}{i^m} \left. \frac{d^m \phi_X(\omega)}{d\omega^m} \right|_{\omega=0}    (11.48)
meaning that if we know the characteristic function we can find the moments
of the random variable in question by simply differentiating that function
and then evaluating the derivative at \omega = 0. Of course, if we know the pdf
we always have to perform the integration of eq (11.46a), but if more than
one moment is needed this is one integration only, rather than one for each
moment to be calculated.
Thus, if all the moments of X exist we can expand the function \phi_X(\omega) in a
Taylor series about the origin to get

\phi_X(\omega) = \sum_{m=0}^{\infty} \frac{E[X^m]}{m!}\, (i\omega)^m    (11.49)
For example, for a Gaussian distributed random variable X we can once again
make use of the standardized random variable Z = (X - \mu)/\sigma, so that
X = \mu + \sigma Z and

\phi_X(\omega) = E\left[e^{i\omega(\mu + \sigma Z)}\right] = e^{i\mu\omega}\, \phi_Z(\sigma\omega)
Furthermore,

\phi_Z(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{i\omega z}\, e^{-z^2/2}\, dz = e^{-\omega^2/2}

and finally

\phi_X(\omega) = \exp\left(i\mu\omega - \frac{\sigma^2 \omega^2}{2}\right)    (11.50)
From eq (11.50) it is easy to determine that \left. d\phi_X/d\omega \right|_{\omega=0} = i\mu so that, as
expected (eq (11.47)), E[X] = \mu. It is left to the reader to verify that
\left. d^2\phi_X/d\omega^2 \right|_{\omega=0} = -(\mu^2 + \sigma^2). Then, by virtue of eqs (11.48) and (11.37) we
get \sigma_X^2 = E[X^2] - \mu^2 = \sigma^2, which is the same result as eq (11.43). It must be noted that for a Gaussian
distribution all moments are functions of the two parameters \mu and \sigma only,
meaning that the normal distribution is completely characterized by its mean
and variance.
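The moment-generating property (11.48) lends itself to symbolic computation; as an added illustration (assuming the SymPy library is available), the sketch below differentiates the Gaussian characteristic function (11.50) and recovers the mean and the variance.

```python
import sympy as sp

w = sp.Symbol('omega', real=True)
mu = sp.Symbol('mu', real=True)
sigma = sp.Symbol('sigma', positive=True)

# Gaussian characteristic function, eq (11.50).
phi = sp.exp(sp.I * mu * w - sigma**2 * w**2 / 2)

def moment(m):
    """mth moment via eq (11.48): (1/i^m) d^m(phi)/d(omega)^m evaluated at omega = 0."""
    return sp.simplify(sp.diff(phi, w, m).subs(w, 0) / sp.I**m)

EX, EX2 = moment(1), moment(2)
print("E[X]   =", EX)                          # mu
print("E[X^2] =", EX2)                         # mu**2 + sigma**2
print("Var[X] =", sp.simplify(EX2 - EX**2))    # sigma**2, in agreement with eq (11.37)
```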
Finally, it may be worth mentioning the fact that the so-called log-characteristic
function is also convenient in some circumstances. This function
is defined as the natural logarithm of \phi_X(\omega), i.e. \ln \phi_X(\omega).
11.5 More than one random variable
All the concepts introduced in the previous sections can be extended to the case
of two or more random variables. Consider a probability space (\Omega, \mathcal{F}, P) and
let X_1, X_2, \ldots, X_n be n random variables according to the
definition of Section 11.3. Then we can consider n real numbers x_j and introduce
the joint cumulative distribution function F_{X_1 X_2 \cdots X_n}(x_1, x_2, \ldots, x_n) as

F_{X_1 X_2 \cdots X_n}(x_1, x_2, \ldots, x_n) = P[X_1 \leq x_1, X_2 \leq x_2, \ldots, X_n \leq x_n]    (11.51)

If and whenever convenient, both the X_j and the x_j can be written as
column vectors, i.e. \mathbf{X} = [X_1, X_2, \ldots, X_n]^T and \mathbf{x} = [x_1, x_2, \ldots, x_n]^T, so that
the joint distribution function is written simply F_{\mathbf{X}}(\mathbf{x}). Equation (11.51) in
words means that the joint cdf expresses the probability that all the
inequalities X_j \leq x_j (j = 1, 2, \ldots, n) take place simultaneously.
If now, for simplicity, we consider the case of two random variables X
and Y (the ‘bivariate’ case) it is not difficult to see that the following properties
hold:

F_{XY}(-\infty, y) = F_{XY}(x, -\infty) = 0, \qquad F_{XY}(+\infty, +\infty) = 1    (11.52)
If there exists a function p_{XY}(x, y) such that for every x and y

F_{XY}(x, y) = \int_{-\infty}^{x}\!\int_{-\infty}^{y} p_{XY}(\xi, \eta)\, d\eta\, d\xi    (11.53)

this function is called the joint probability density function of X and Y. This
joint pdf can be obtained from F_{XY}(x, y) by differentiation, i.e.

p_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y}    (11.54)
The one-dimensional functions

F_X(x) = F_{XY}(x, +\infty), \qquad F_Y(y) = F_{XY}(+\infty, y)    (11.55)

are called marginal distributions of the random variables X and Y, respectively.
Also, we have the following properties for the corresponding densities:

p_X(x) = \int_{-\infty}^{+\infty} p_{XY}(x, y)\, dy, \qquad p_Y(y) = \int_{-\infty}^{+\infty} p_{XY}(x, y)\, dx    (11.56)
and the one-dimensional functions p_X(x) and p_Y(y) are called marginal density
functions: p_X(x)\,dx is the probability that x < X \leq x + dx while Y can assume
any value within its range of definition. Similarly, p_Y(y)\,dy is the probability
that y < Y \leq y + dy when X can assume any value between -\infty and +\infty. These
concepts can be extended to the case of n random variables.
In Section 11.2.1 we introduced the concept of conditional probability.
Following the definition given by eq (11.7) we can define the conditional cdf F_X(x|y) as

F_X(x|y) = P[X \leq x \mid Y \leq y] = \frac{F_{XY}(x, y)}{F_Y(y)}    (11.57)
and similarly for F_Y(y|x). In terms of probability density functions, the
conditional pdf that X = x given that Y = y can be expressed as

p_X(x|y) = \frac{p_{XY}(x, y)}{p_Y(y)}    (11.58)

provided that p_Y(y) \neq 0. From eq (11.58) it follows that

p_{XY}(x, y) = p_X(x|y)\, p_Y(y)    (11.59a)

where p_Y(y) is the marginal pdf of Y. Similarly,

p_{XY}(x, y) = p_Y(y|x)\, p_X(x)    (11.59b)

so that

p_{XY}(x, y)\, dx\, dy = [p_X(x|y)\, dx]\,[p_Y(y)\, dy] = [p_Y(y|x)\, dy]\,[p_X(x)\, dx]    (11.60)
which is the multiplication rule for infinitesimal probabilities, i.e. the
counterpart of eq (11.8a). The key idea in this case is that a conditional pdf
is truly a probability density function, meaning that, for example, we can
calculate the expected value of X given that Y = y from the expression

E[X|y] = \int_{-\infty}^{+\infty} x\, p_X(x|y)\, dx    (11.61)
In this regard we may note that E[X|y] is a function of y, i.e. different
conditional expected values are obtained for different values of y. If now we
let Y range over all its possible values we obtain a function of the random
variable Y, i.e. E[X|Y], and we can calculate its expected value as
(taking eqs (11.35b), (11.61) and (11.60) into account)

E\big[E[X|Y]\big] = \int_{-\infty}^{+\infty} E[X|y]\, p_Y(y)\, dy = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} x\, p_X(x|y)\, p_Y(y)\, dx\, dy = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} x\, p_{XY}(x, y)\, dx\, dy = E[X]

which expresses the interesting result

E[X] = E\big[E[X|Y]\big]    (11.62)

Similarly, E[Y] = E\big[E[Y|X]\big]. These formulas often provide a more efficient way for calculating
the expected values E[X] or E[Y].
Proceeding in our discussion, we can now consider the important concept
of independence. In terms of random variables, independence has to do with
the fact that knowledge of, say, X gives no information whatsoever on Y and
vice versa. This occurrence is expressed mathematically by the fact that the
joint distribution function can be written as a product of the individual
marginal distribution functions, i.e. the random variables X and Y are
independent if and only if

F_{XY}(x, y) = F_X(x)\, F_Y(y)    (11.63)

or, equivalently,

p_{XY}(x, y) = p_X(x)\, p_Y(y)    (11.64)
If now we consider the descriptors of two or more random variables, we
can define the joint moments of X and Y by the expression

E[X^m Y^k] = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} x^m y^k\, p_{XY}(x, y)\, dx\, dy    (11.65)

or the central moments

E[(X - \mu_X)^m (Y - \mu_Y)^k] = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} (x - \mu_X)^m (y - \mu_Y)^k\, p_{XY}(x, y)\, dx\, dy    (11.66)

where \mu_X = E[X] and \mu_Y = E[Y].
Particularly important is the
second-order central moment, which is called the covariance of X and Y
(K_{XY}, Cov[X, Y] or \Gamma_{XY} are all widely adopted symbols), i.e.

K_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y    (11.67a)

which is often expressed in nondimensional form by introducing the
correlation coefficient \rho_{XY}:

\rho_{XY} = \frac{K_{XY}}{\sigma_X \sigma_Y}    (11.67b)
For two independent variables eq (11.64) holds; this means
E[XY] = E[X]E[Y] = \mu_X \mu_Y and, consequently, K_{XY} = 0, so that if the two standard
deviations \sigma_X and \sigma_Y are not equal to zero we have

\rho_{XY} = 0    (11.68)
Equation (11.68) expresses the fact that the two random variables are
uncorrelated. It must be noted that two independent variables are uncorrelated
but the reverse is not necessarily true, i.e. if eq (11.68) or K_{XY} = 0 holds, it
does not necessarily mean that X and Y are independent. However, this
statement is true for normally (Gaussian) distributed random variables.
The correlation coefficient satisfies the inequalities -1 \leq \rho_{XY} \leq 1 and is a
measure of how closely the two random variables are linearly related. In the
two extreme cases \rho_{XY} = 1 or \rho_{XY} = -1 there is a perfect linear relationship
between X and Y.
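As an added numerical illustration of eqs (11.67) and (11.68), the script below estimates the covariance and the correlation coefficient of two linearly related variables and of two independent ones (the model y = 2x + noise is an arbitrary choice).

```python
import numpy as np

def covariance_and_correlation(x, y):
    """Sample estimates of K_XY and rho_XY, eqs (11.67a) and (11.67b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k_xy = ((x - x.mean()) * (y - y.mean())).mean()
    return k_xy, k_xy / (x.std() * y.std())

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y_related = 2.0 * x + 0.5 * rng.normal(size=x.size)   # linearly related to x
y_indep = rng.normal(size=x.size)                      # independent of x

print("related    :", [round(v, 3) for v in covariance_and_correlation(x, y_related)])  # rho near +1
print("independent:", [round(v, 3) for v in covariance_and_correlation(x, y_indep)])    # rho near 0
```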
In the case of n random variables X_1, X_2, \ldots, X_n, matrix notation proves
to be convenient; one can form the n \times n matrix of products \mathbf{X}\mathbf{X}^T and
introduce the covariance and correlation matrices \mathbf{K} and \boldsymbol{\rho}. The latter,
for example, is given by

\boldsymbol{\rho} = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{bmatrix}    (11.69)

For n mutually independent random variables \boldsymbol{\rho} = \mathbf{I}, where \mathbf{I} is the n \times n
identity matrix.