Part III
EXTENSIONS AND RELATED METHODS
Independent Component Analysis. Aapo Hyvärinen, Juha Karhunen, Erkki Oja
Copyright © 2001 John Wiley & Sons, Inc.
ISBNs: 0-471-40540-X (Hardback); 0-471-22131-7 (Electronic)
15 Noisy ICA
In real life, there is always some kind of noise present in the observations. Noise
can correspond to actual physical noise in the measuring devices, or to inaccuracies
of the model used. Therefore, it has been proposed that the independent component
analysis (ICA) model should include a noise term as well. In this chapter, we consider
different methods for estimating the ICA model when noise is present.
However, estimation of the mixing matrix seems to be quite difficult when noise
is present. It could be argued that in practice, a better approach could often be to
reduce noise in the data before performing ICA. For example, simple filtering of
time-signals is often very useful in this respect, and so is dimension reduction by
principal component analysis (PCA); see Sections 13.1.2 and 13.2.2.
In noisy ICA, we also encounter a new problem: estimation of the noise-free
realizations of the independent components (ICs). The noisy model is not invertible,
and therefore estimation of the noise-free components requires new methods. This
problem leads to some interesting forms of denoising.
15.1 DEFINITION
Here we extend the basic ICA model to the situation where noise is present. The
noise is assumed to be additive. This is a rather realistic assumption, standard in
factor analysis and signal processing, and allows for a simple formulation of the noisy
model. Thus, the noisy ICA model can be expressed as
$$\mathbf{x} = \mathbf{A}\mathbf{s} + \mathbf{n} \tag{15.1}$$
where $\mathbf{n} = (n_1, \ldots, n_n)$ is the noise vector. Some further assumptions on the noise
are usually made. In particular, it is assumed that
1. The noise is independent from the independent components.
2. The noise is gaussian.
The covariance matrix of the noise, say $\boldsymbol{\Sigma}$, is often assumed to be of the form
$\sigma^2 \mathbf{I}$, but this may be too restrictive in some cases. In any case, the noise
covariance is assumed to be known. Little work on estimation of an unknown noise covariance
has been conducted; see [310, 215, 19].
The identifiability of the mixing matrix in the noisy ICA model is guaranteed
under the same restrictions that are sufficient in the basic case,¹ basically meaning
independence and nongaussianity. In contrast, the realizations of the independent
components $s_i$ can no longer be identified, because they cannot be completely separated
from noise.
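The model (15.1), together with the two noise assumptions above, can be simulated in a few lines. The following Python/NumPy sketch is only illustrative: the function name `simulate_noisy_ica`, the uniform source distribution, the mixing matrix, and the noise level are all assumptions chosen for the example, not part of the model definition.

```python
import numpy as np

def simulate_noisy_ica(A, n_samples, sigma, rng):
    """Draw samples from x = A s + n with gaussian noise of covariance sigma^2 I.

    The sources are uniform with zero mean and unit variance; this source
    distribution is an illustrative assumption.
    """
    n_mixtures, n_sources = A.shape
    # Independent nongaussian sources, one row per source.
    s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n_sources, n_samples))
    # Gaussian noise, drawn independently of the sources.
    n = sigma * rng.standard_normal((n_mixtures, n_samples))
    x = A @ s + n
    return x, s, n

rng = np.random.default_rng(0)
sigma = 0.5
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x, s, n = simulate_noisy_ica(A, n_samples=200_000, sigma=sigma, rng=rng)

# With unit-variance sources, the covariance of x is A A^T + sigma^2 I.
C_sample = np.cov(x)
C_model = A @ A.T + sigma**2 * np.eye(2)
```

Comparing `C_sample` with `C_model` shows how the noise covariance adds to the covariance of the noise-free mixtures, a fact used later when whitening noisy data.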
15.2 SENSOR NOISE VS. SOURCE NOISE
In the typical case where the noise covariance is assumed to be of the form $\sigma^2 \mathbf{I}$, the
noise in Eq. (15.1) could be considered as “sensor” noise. This is because the noise
variables are separately added on each sensor, i.e., observed variable $x_i$. This is in
contrast to “source” noise, in which the noise is added to the independent components
(sources). Source noise can be modeled with an equation slightly different from the
preceding, given by
$$\mathbf{x} = \mathbf{A}(\mathbf{s} + \mathbf{n}) \tag{15.2}$$
where again the covariance of the noise is diagonal. In fact, we could consider the
noisy independent components, given by $\tilde{s}_i = s_i + n_i$, and rewrite the model as
$$\mathbf{x} = \mathbf{A}\tilde{\mathbf{s}} \tag{15.3}$$
We see that this is just the basic ICA model, with modified independent components.
What is important is that the assumptions of the basic ICA model are still valid: the
components of $\tilde{\mathbf{s}}$ are nongaussian and independent. Thus we can estimate the model
in (15.3) by any method for basic ICA. This gives us a perfectly suitable estimator
for the noisy ICA model. This way we can estimate the mixing matrix and the noisy
independent components. The estimation of the original independent components
from the noisy ones is an additional problem, though; see below.
This idea is, in fact, more general. Assume that the noise covariance has the form
$$\boldsymbol{\Sigma} = \mathbf{A}\mathbf{A}^T \sigma^2 \tag{15.4}$$
¹ This seems to be admitted by the vast majority of ICA researchers. We are not aware of any rigorous
proofs of this property, though.
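Then the noise vector can be transformed into another one $\tilde{\mathbf{n}} = \mathbf{A}^{-1}\mathbf{n}$, which can be
called equivalent source noise. Then the equation (15.1) becomes
$$\mathbf{x} = \mathbf{A}\mathbf{s} + \mathbf{A}\tilde{\mathbf{n}} = \mathbf{A}(\mathbf{s} + \tilde{\mathbf{n}}) \tag{15.5}$$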
The point is that the covariance of $\tilde{\mathbf{n}}$ is $\sigma^2 \mathbf{I}$, and thus the transformed components in
$\mathbf{s} + \tilde{\mathbf{n}}$ are independent. Thus, we see again that the mixing matrix $\mathbf{A}$ can be estimated
by basic ICA methods.
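The covariance claim for the equivalent source noise can be checked numerically. The sketch below (Python/NumPy) draws noise with covariance $\sigma^2 \mathbf{A}\mathbf{A}^T$, transforms it by $\mathbf{A}^{-1}$, and compares sample covariances. The particular matrix `A`, the value of `sigma`, and the sample size are arbitrary illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.5
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Noise with covariance sigma^2 A A^T, as in (15.4): n = sigma * A g,
# where g is standard gaussian with identity covariance.
g = rng.standard_normal((2, 200_000))
n = sigma * (A @ g)

# Equivalent source noise n_tilde = A^{-1} n (solve avoids forming the inverse).
n_tilde = np.linalg.solve(A, n)

cov_n = np.cov(n)              # close to sigma^2 A A^T, which is not diagonal
cov_n_tilde = np.cov(n_tilde)  # close to sigma^2 I
```

The off-diagonal entries of `cov_n` are clearly nonzero, whereas `cov_n_tilde` is close to $\sigma^2 \mathbf{I}$. Since the noise is gaussian, uncorrelated components are also independent, which is what makes $\mathbf{s} + \tilde{\mathbf{n}}$ a valid vector of independent components.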
To recapitulate: if the noise is added to the independent components and not to the
observed mixtures, or has a particular covariance structure, the mixing matrix can be
estimated by ordinary ICA methods. The denoising of the independent components
is another problem, though; it will be treated in Section 15.5 below.
15.3 FEW NOISE SOURCES
Another special case that reduces to the basic ICA model arises when the total
number of noise components and independent components is not very large. In
particular, if their total number is not larger than the number of mixtures, we again
have an ordinary ICA model, in which some of the components are gaussian noise and
others are the real independent components. Such a model could still be estimated
by basic ICA methods, using one-unit algorithms with fewer units than the dimension
of the data.
In other words, we could define the vector of the independent components as
$\tilde{\mathbf{s}} = (s_1, \ldots, s_k, n_1, \ldots, n_l)^T$, where the $s_i$, $i = 1, \ldots, k$, are the “real” independent
components and the $n_i$, $i = 1, \ldots, l$, are the noise variables. Assume that the number
of mixtures equals $k + l$, that is, the number of real ICs plus the number of noise
variables. In this case, the ordinary ICA model holds with $\mathbf{x} = \mathbf{A}\tilde{\mathbf{s}}$, where $\mathbf{A}$ is
a matrix that incorporates the mixing of the real ICs and the covariance structure
of the noise, and the number of the independent components in $\tilde{\mathbf{s}}$ is equal to the
number of observed mixtures. Therefore, finding the $k$ most nongaussian directions,
we can estimate the real independent components. We cannot estimate the remaining
dummy independent components that are actually noise variables, but we did not
want to estimate them in the first place.
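As a rough illustration of finding the $k$ most nongaussian directions, the following Python/NumPy sketch mixes two nongaussian sources and one gaussian noise variable into three observed mixtures and estimates only $k = 2$ projections. It is a sketch under stated assumptions, not a definitive implementation: the update rule is a kurtosis-based fixed-point iteration with deflation, in the spirit of the one-unit algorithms mentioned above, and the function names, the source distributions, the mixing matrix, and the use of several random restarts per unit (to discard runs that end near the gaussian direction) are all choices made for this example.

```python
import numpy as np

def kurt(y):
    """Fourth-order cumulant of a sample y (mean removed first)."""
    y = y - y.mean()
    return np.mean(y**4) - 3.0 * np.mean(y**2) ** 2

def whiten(x):
    """Center the rows of x (variables x samples) and whiten them."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    return eigvec @ np.diag(eigval ** -0.5) @ eigvec.T @ x

def one_unit(z, W_prev, rng, n_iter=200, tol=1e-10):
    """One run of a kurtosis-based fixed-point iteration on whitened data z,
    kept orthogonal to the (orthonormal) rows of W_prev."""
    w = rng.standard_normal(z.shape[0])
    w -= W_prev.T @ (W_prev @ w)
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        w_new = (z * (w @ z) ** 3).mean(axis=1) - 3.0 * w
        w_new -= W_prev.T @ (W_prev @ w_new)   # deflation step
        w_new /= np.linalg.norm(w_new)
        done = 1.0 - abs(w_new @ w) < tol      # converged up to sign
        w = w_new
        if done:
            break
    return w

def most_nongaussian_directions(z, k, rng, n_restarts=5):
    """Estimate k orthonormal projections, one unit at a time. For each unit,
    keep the restart whose projection has the largest absolute kurtosis."""
    W = np.zeros((0, z.shape[0]))
    for _ in range(k):
        candidates = [one_unit(z, W, rng) for _ in range(n_restarts)]
        best = max(candidates, key=lambda w: abs(kurt(w @ z)))
        W = np.vstack([W, best])
    return W

rng = np.random.default_rng(2)
N = 50_000
s_real = np.vstack([
    rng.uniform(-np.sqrt(3), np.sqrt(3), N),        # subgaussian source
    rng.laplace(scale=1 / np.sqrt(2), size=N),      # supergaussian source
])
noise = rng.standard_normal((1, N))                 # one gaussian noise variable
s_tilde = np.vstack([s_real, noise])                # k + l = 3 components
A = np.array([[1.0, 0.5, 0.3],
              [0.4, 1.0, 0.6],
              [0.2, 0.7, 1.0]])
x = A @ s_tilde                                     # three observed mixtures

z = whiten(x)
W = most_nongaussian_directions(z, k=2, rng=rng)
y = W @ z
# Absolute correlations between the estimates (rows) and the real ICs (columns).
corr = np.abs(np.corrcoef(y, s_real)[:2, 2:])
```

Each row of `corr` should contain one entry close to 1, indicating that each estimated component matches one of the real independent components up to sign, while the gaussian noise variable is left unestimated.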
The applicability of this idea is quite limited, though, since in most cases we want
to assume that the noise is added on each mixture, in which case $k + l$, the number
of real ICs plus the number of noise variables, is necessarily larger than the number
of mixtures, and the basic ICA model does not hold for $\tilde{\mathbf{s}}$.
15.4 ESTIMATION OF THE MIXING MATRIX
Not many methods for noisy ICA estimation exist in the general case. The estimation
of the noiseless model seems to be a challenging task in itself, and thus the noise is
usually neglected in order to obtain tractable and simple results. Moreover, it may
be unrealistic in many cases to assume that the data could be divided into signals and
noise in any meaningful way.
Here we treat first the problem of estimating the mixing matrix. Estimation of the
independent components will be treated below.
15.4.1 Bias removal techniques
Perhaps the most promising approach to noisy ICA is given by bias removal techniques.
This means that ordinary (noise-free) ICA methods are modified so that the
bias due to noise is removed, or at least reduced.
Let us denote the noise-free data in the following by
$$\mathbf{v} = \mathbf{A}\mathbf{s} \tag{15.6}$$
We can now use the basic idea of finding projections, say $\mathbf{w}^T\mathbf{v}$, in which nongaussianity
is locally maximized for whitened data, with constraint $\|\mathbf{w}\| = 1$. As shown
in Chapter 8, projections in such directions give consistent estimates of the independent
components, if the measure of nongaussianity is well chosen. This approach
could be used for noisy ICA as well, if only we had measures of nongaussianity
which are immune to gaussian noise, or at least, whose values for the original data
can be easily estimated from noisy observations. We have $\mathbf{w}^T\mathbf{x} = \mathbf{w}^T\mathbf{v} + \mathbf{w}^T\mathbf{n}$,
and thus the point is to measure the nongaussianity of $\mathbf{w}^T\mathbf{v}$ from the observed $\mathbf{w}^T\mathbf{x}$
so that the measure is not affected by the noise $\mathbf{w}^T\mathbf{n}$.
Bias removal for kurtosis
If the measure of nongaussianity is kurtosis (the
fourth-order cumulant), it is almost trivial to construct one-unit methods for noisy
ICA, because kurtosis is immune to gaussian noise. This is because the kurtosis of
$\mathbf{w}^T\mathbf{x}$ equals the kurtosis of $\mathbf{w}^T\mathbf{v}$, as can be easily proven by the basic properties of
kurtosis: it is additive over independent variables, and the kurtosis of a gaussian
variable is zero.
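The immunity of kurtosis to gaussian noise can be illustrated numerically. The sketch below (Python/NumPy) compares the sample fourth-order cumulant of a projection of noise-free data with that of the same projection of noisy data. The helper name `kurt`, the uniform sources, the mixing matrix, the noise level, and the projection vector `w` are illustrative assumptions; the agreement holds only up to sampling error.

```python
import numpy as np

def kurt(y):
    """Kurtosis as the fourth-order cumulant of y (mean removed first)."""
    y = y - y.mean()
    return np.mean(y**4) - 3.0 * np.mean(y**2) ** 2

rng = np.random.default_rng(3)
N = 1_000_000
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, N))  # unit-variance sources
n = 1.0 * rng.standard_normal((2, N))                  # gaussian noise, sigma = 1
v = A @ s          # noise-free data, as in (15.6)
x = v + n          # noisy data, as in (15.1)

w = np.array([0.6, 0.8])   # an arbitrary unit-norm projection vector

kurt_clean = kurt(w @ v)   # kurtosis of w^T v
kurt_noisy = kurt(w @ x)   # kurtosis of w^T x, nearly the same value
var_clean = np.var(w @ v)  # variance of w^T v
var_noisy = np.var(w @ x)  # variance of w^T x, larger by about sigma^2
```

The two kurtosis values agree closely, whereas the variance increases by roughly the noise variance. This contrast is why second-order statistics, and hence the whitening step, still need to be corrected for noise even when kurtosis is used.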
It must be noted, however, that in the preliminary whitening, the effect of noise
must be taken into account; this is quite simple if the noise covariance matrix is
known. Denoting by $\mathbf{C} = E\{\mathbf{x}\mathbf{x}^T\}$ the covariance matrix of the observed noisy
data, the ordinary whitening should be replaced by the operation
$$\tilde{\mathbf{x}} = (\mathbf{C} - \boldsymbol{\Sigma})^{-1/2}\mathbf{x} \tag{15.7}$$
In other words, the covariance matrix $\mathbf{C} - \boldsymbol{\Sigma}$ of the noise-free data should be used in
whitening instead of the covariance matrix $\mathbf{C}$ of the noisy data. In the following, we
call this operation “quasiwhitening”. After this operation, the quasiwhitened data $\tilde{\mathbf{x}}$
follows a noisy ICA model as well:
$$\tilde{\mathbf{x}} = \mathbf{B}\mathbf{s} + \tilde{\mathbf{n}} \tag{15.8}$$
where $\mathbf{B}$ is orthogonal, and $\tilde{\mathbf{n}}$ is a linear transform of the original noise in (15.1).
Thus, the theorem in Chapter 8 is valid for $\tilde{\mathbf{x}}$, and finding local maxima of the absolute
value of kurtosis is a valid method for estimating the independent components.
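A minimal sketch of quasiwhitening as in (15.7) is given below in Python/NumPy, assuming the noise covariance is known and that the sample estimate of $\mathbf{C} - \boldsymbol{\Sigma}$ is positive definite. The function name `quasiwhiten`, the eigendecomposition route to the inverse square root, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def quasiwhiten(x, Sigma):
    """Apply x_tilde = (C - Sigma)^(-1/2) x, with C the sample covariance of x.

    Sigma is the noise covariance, assumed known. Returns the quasiwhitened
    data and the quasiwhitening matrix.
    """
    x = x - x.mean(axis=1, keepdims=True)
    C = np.cov(x)
    eigval, eigvec = np.linalg.eigh(C - Sigma)
    if eigval.min() <= 0:
        raise ValueError("estimated C - Sigma is not positive definite")
    # Symmetric inverse square root of C - Sigma.
    Q = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return Q @ x, Q

rng = np.random.default_rng(4)
N = 200_000
sigma = 0.5
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, N))  # unit-variance sources
x = A @ s + sigma * rng.standard_normal((2, N))        # noisy mixtures (15.1)

x_tilde, Q = quasiwhiten(x, sigma**2 * np.eye(2))
B = Q @ A                      # mixing matrix of the quasiwhitened model (15.8)
BBt = B @ B.T                  # close to the identity: B is nearly orthogonal
C_tilde = np.cov(x_tilde)      # not the identity: the noise adds extra variance
```

In this example `BBt` is close to the identity matrix, while the covariance of the quasiwhitened data itself exceeds the identity by the transformed noise covariance. Ordinary whitening would instead force `C_tilde` to the identity and leave `B` non-orthogonal.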