13
Practical Considerations
In the preceding chapters, we presented several approaches for the estimation of
the independent component analysis (ICA) model. In particular, several algorithms
were proposed for the estimation of the basic version of the model, which has a
square mixing matrix and no noise. Now we are, in principle, ready to apply those
algorithms on real data sets. Many such applications will be discussed in Part IV.
However, when applying the ICA algorithms to real data, some practical considerations arise and need to be taken into account. In this chapter, we discuss
different problems that may arise, in particular, overlearning and noise in the data.
We also propose some preprocessing techniques (dimension reduction by principal
component analysis, time filtering) that may be useful and even necessary before the
application of the ICA algorithms in practice.
13.1 PREPROCESSING BY TIME FILTERING
The success of ICA for a given data set may depend crucially on performing some
application-dependent preprocessing steps. In the basic methods discussed in the
previous chapters, we always used centering in preprocessing, and often whitening
was done as well. Here we discuss further preprocessing methods that are not
necessary in theory, but are often very useful in practice.
13.1.1 Why time filtering is possible
In many cases, the observed random variables are, in fact, time signals or time series,
which means that they describe the time course of some phenomenon or system.
Thus the sample index $t$ in $x_i(t)$ is a time index. In such a case, it may be very useful to filter the signals. In other words, this means taking moving averages of the time series. Of course, in the ICA model no time structure is assumed, so filtering is not always possible: if the sample points $\mathbf{x}(t)$ cannot be ordered in any meaningful way with respect to $t$, filtering is not meaningful, either.
For time series, any linear filtering of the signals is allowed, since it does not
change the ICA model. In fact, if we filter the observed signals $x_i(t)$ linearly to obtain new signals, say $x_i^*(t)$, the ICA model still holds for $x_i^*(t)$, with the same mixing matrix. This can be seen as follows. Denote by $\mathbf{X}$ the matrix that contains the observations $\mathbf{x}(1), \ldots, \mathbf{x}(T)$ as its columns, and similarly for $\mathbf{S}$. Then the ICA model can be expressed as
$$\mathbf{X} = \mathbf{A}\mathbf{S} \tag{13.1}$$
Now, time filtering of $\mathbf{X}$ corresponds to multiplying $\mathbf{X}$ from the right by a matrix, let us call it $\mathbf{M}$. This gives
$$\mathbf{X}^* = \mathbf{X}\mathbf{M} = \mathbf{A}\mathbf{S}\mathbf{M} = \mathbf{A}\mathbf{S}^* \tag{13.2}$$
which shows that the ICA model still remains valid: the independent components are filtered by the same filtering that was applied to the mixtures. They are not mixed with each other in $\mathbf{S}^* = \mathbf{S}\mathbf{M}$, because multiplication by $\mathbf{M}$ from the right is by definition a component-wise filtering: it operates on each row of $\mathbf{S}$ (each component's time course) separately.
Since the mixing matrix remains unchanged, we can use the filtered data in the ICA estimation method only. After estimating the mixing matrix from the filtered data, we can apply the corresponding unmixing (inverse) matrix to the original data to obtain the independent components.
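To make this concrete, here is a minimal sketch of the procedure, assuming NumPy and scikit-learn's FastICA implementation are available (the toy data, filter length, and variable names are ours, not from the text): the mixing matrix is estimated from low-pass filtered mixtures, and the resulting unmixing matrix is then applied to the original, unfiltered data.

```python
import numpy as np
from sklearn.decomposition import FastICA   # assumption: scikit-learn is installed

rng = np.random.default_rng(0)

# Toy data: two nongaussian sources mixed by a random square matrix.
T = 5000
S = np.vstack([np.sign(rng.standard_normal(T)) * rng.exponential(1.0, T),
               rng.laplace(size=T)])                     # shape (n_components, T)
A_true = rng.standard_normal((2, 2))
X = A_true @ S + 0.1 * rng.standard_normal((2, T))       # noisy mixtures, shape (n, T)

# Preprocess by low-pass filtering: 3-point moving average along the time axis.
kernel = np.ones(3) / 3.0
X_filt = np.array([np.convolve(x, kernel, mode="same") for x in X])

# Estimate the mixing/unmixing matrices from the *filtered* data ...
ica = FastICA(n_components=2, random_state=0)
ica.fit(X_filt.T)                      # scikit-learn expects (samples, features)
W = np.linalg.pinv(ica.mixing_)        # unmixing matrix estimated on filtered data

# ... and apply them to the *original* data to recover unfiltered components.
S_est = W @ (X - X.mean(axis=1, keepdims=True))
```

Any of the ICA estimation methods of the previous chapters could be used in place of FastICA here; the only point illustrated is that estimation uses the filtered data, while the components are finally computed from the original data.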
The question then arises of what kind of filtering could be useful. In the following, we consider three different kinds of filtering: low-pass and high-pass filtering, as well as a compromise between the two.
13.1.2 Low-pass filtering
Basically, low-pass filtering means that every sample point is replaced by a weighted average of that point and the points immediately before it. (To have a causal filter, points after the current point may be left out of the averaging.) This is a form of smoothing the data. Then the matrix $\mathbf{M}$ in (13.2) would be something like
$$
\mathbf{M} = \frac{1}{3}
\begin{pmatrix}
\ddots & & & & & & & & & \\
\cdots & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & \cdots \\
 & & & & & & & & & \ddots
\end{pmatrix}
\tag{13.3}
$$
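As a rough sketch of how a moving-average matrix like (13.3) can be formed and applied from the right as in (13.2), the following NumPy fragment may help; in practice one would use a convolution routine rather than building the full $T \times T$ matrix, and the boundary handling here is an arbitrary choice of ours.

```python
import numpy as np

def moving_average_matrix(T, width=3):
    """T x T banded matrix in the spirit of (13.3): sample t of the filtered
    signal is the average of `width` consecutive original samples ending at t.
    Boundary columns simply average whatever samples are available."""
    M = np.zeros((T, T))
    for t in range(T):
        idx = np.arange(max(0, t - width + 1), t + 1)   # causal window ending at t
        M[idx, t] = 1.0 / len(idx)
    return M

# One mixture per row, one time point per column, so filtering is X @ M.
T = 8
X = np.arange(2 * T, dtype=float).reshape(2, T)
X_filt = X @ moving_average_matrix(T)
```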
Low-pass filtering is often used because it tends to reduce noise. This is a well-
known property in signal processing that is explained in most basic signal processing
textbooks.
In the basic ICA model, the effect of noise is more or less neglected; see Chapter 15 for a detailed discussion. Basic ICA methods therefore work much better with data that does not contain much noise, so reducing noise is useful and sometimes even necessary.
A possible problem with low-pass filtering is that it reduces the information in the
data, since the fast-changing, high-frequency features of the data are lost. It often
happens that this leads to a reduction of independence as well (see next section).
13.1.3 High-pass filtering and innovations
High-pass filtering is the opposite of low-pass filtering. The point is to remove slowly
changing trends from the data. Thus a low-pass filtered version is subtracted from
the signal. A classic way of doing high-pass filtering is differencing, which means
replacing every sample point by the difference between the value at that point and
the value at the preceding point. Thus, the matrix $\mathbf{M}$ in (13.2) would be
$$
\mathbf{M} =
\begin{pmatrix}
\ddots & & & & & & & & \\
\cdots & 1 & -1 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 1 & -1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & -1 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 1 & -1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & -1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 1 & -1 & \cdots \\
 & & & & & & & & \ddots
\end{pmatrix}
\tag{13.4}
$$
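In code, the same differencing can be done directly along the time axis instead of forming the matrix in (13.4); a minimal NumPy sketch with toy data of our own:

```python
import numpy as np

# High-pass filtering by differencing: x*(t) = x(t) - x(t-1) for each mixture.
# np.diff drops the first sample; prepending x(0) keeps the length unchanged.
X = np.cumsum(np.random.default_rng(0).standard_normal((2, 1000)), axis=1)  # slow "trends"
X_hp = np.diff(X, axis=1, prepend=X[:, :1])
```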
High-pass filtering may be useful in ICA because in certain cases it increases the independence of the components. It often happens in practice that the components have slowly changing trends or fluctuations, in which case they are not very independent. If these slow fluctuations are removed by high-pass filtering, the filtered components are often much more independent. A more principled approach to high-pass filtering is to consider it in the light of innovation processes.
Innovation processes
Given a stochastic process $s(t)$, we define its innovation process $\tilde{s}(t)$ as the error of the best prediction of $s(t)$, given its past. Such a best prediction is given by the conditional expectation of $s(t)$ given its past, because it is the expected value of the conditional distribution of $s(t)$ given its past. Thus the innovation process $\tilde{s}(t)$ is defined by
$$\tilde{s}(t) = s(t) - E\{s(t) \mid s(t-1), s(t-2), \ldots\} \tag{13.5}$$
The expression "innovation" describes the fact that $\tilde{s}(t)$ contains all the new information about the process that can be obtained at time $t$ by observing $s(t)$.
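The conditional expectation in (13.5) is generally not available in closed form. A common practical surrogate, used here purely as an illustration and not prescribed by the text, is a linear autoregressive predictor fitted by least squares; its residual then approximates the innovation process:

```python
import numpy as np

def ar_innovation(s, order=5):
    """Approximate the innovation (13.5) by the residual of a linear AR(order)
    predictor of s(t) from s(t-1), ..., s(t-order), fitted by least squares."""
    s = np.asarray(s, dtype=float)
    # Regression matrix of lagged values: column k holds s(t-k-1) for each target t.
    lags = np.column_stack([s[order - k - 1 : len(s) - k - 1] for k in range(order)])
    target = s[order:]
    coef, *_ = np.linalg.lstsq(lags, target, rcond=None)
    return target - lags @ coef        # innovation estimate, length len(s) - order
```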
The concept of innovations can be utilized in the estimation of the ICA model due
to the following property:
Theorem 13.1  If $\mathbf{x}(t)$ and $\mathbf{s}(t)$ follow the basic ICA model, then the innovation processes $\tilde{\mathbf{x}}(t)$ and $\tilde{\mathbf{s}}(t)$ follow the ICA model as well. In particular, the components $\tilde{s}_i(t)$ are independent from each other.
On the other hand, independence of the innovations does not imply independence of the $s_i(t)$. Thus, the innovations are more often independent from each other than the original processes. Moreover, one could argue that the innovations are usually more nongaussian than the original processes. This is because each $s_i(t)$ is a kind of moving average of its innovation process, and sums tend to be more gaussian than the original variables. Together, these observations mean that the innovation processes are more likely to be independent and nongaussian, and thus to fulfill the basic assumptions of ICA.
Innovation processes were discussed in more detail in [194], where it was also
shown that using innovations, it is possible to separate signals (images of faces) that
are otherwise strongly correlated and very difficult to separate.
The connection between innovations and ordinary filtering techniques is that the
computation of the innovation process is often rather similar to high-pass filtering.
Thus, the arguments in favor of using innovation processes apply at least partly in
favor of high-pass filtering.
A possible problem with high-pass filtering, however, is that it may increase noise
for the same reasons that low-pass filtering decreases noise.
13.1.4 Optimal filtering
Both of the preceding types of filtering have their pros and cons. The optimum would
be to find a filter that increases the independence of the components while reducing
PREPROCESSING BY PCA
267
noise. To achieve this, some compromise between high- and low-pass filtering may
be the best solution. This leads to band-pass filtering, in which the highest and the
lowest frequencies are filtered out, leaving a suitable frequency band in between.
What this band should be depends on the data, and no general answer can be given.
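As one possible realization of such a compromise, a zero-phase Butterworth band-pass filter from SciPy could be applied to each mixture; the sampling rate and cut-off frequencies below are placeholders of ours that would have to be chosen for the data at hand.

```python
import numpy as np
from scipy.signal import butter, filtfilt   # assumption: SciPy is available

def bandpass(X, low_hz, high_hz, fs, order=4):
    """Zero-phase band-pass filtering of each row of X (mixtures x time)."""
    b, a = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs)
    return filtfilt(b, a, X, axis=1)

# Example: keep roughly 1-40 Hz of signals sampled at 200 Hz (placeholder values).
X = np.random.default_rng(0).standard_normal((3, 2000))
X_bp = bandpass(X, low_hz=1.0, high_hz=40.0, fs=200.0)
```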
In addition to simple low-pass/high-pass filtering, one might also use more sophisticated techniques. For example, one might take the (1-D) wavelet transforms of the data [102, 290, 17]. Other time-frequency decompositions could be used as well.
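For instance, if the PyWavelets package is available, a 1-D multilevel wavelet decomposition of each mixture could be computed and crudely denoised by suppressing the finest-scale coefficients; the wavelet, level, and thresholding choice below are arbitrary assumptions of ours, not recommendations from the text.

```python
import numpy as np
import pywt   # assumption: the PyWavelets package is installed

x = np.random.default_rng(0).standard_normal(1024)      # one mixture as a toy signal
coeffs = pywt.wavedec(x, wavelet="db4", level=3)         # multilevel 1-D wavelet transform
coeffs[-1] = np.zeros_like(coeffs[-1])                   # crude denoising: drop finest details
x_den = pywt.waverec(coeffs, wavelet="db4")              # back to the time domain
```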
13.2 PREPROCESSING BY PCA
A common preprocessing technique for multidimensional data is to reduce its dimension by principal component analysis (PCA). PCA was explained in more detail in
Chapter 6. Basically, the data is projected linearly onto a subspace
$$\tilde{\mathbf{x}} = \mathbf{E}_n \mathbf{x} \tag{13.6}$$
so that the maximum amount of information (in the least-squares sense) is preserved.
Reducing dimension in this way has several benefits which we discuss in the next
subsections.
13.2.1 Making the mixing matrix square
First, let us consider the case where the number of independent components $n$ is smaller than the number of mixtures, say $m$. Performing ICA directly on the mixtures can cause big problems in such a case, since the basic ICA model does not hold anymore. Using PCA, we can reduce the dimension of the data to $n$. After such a reduction, the numbers of mixtures and ICs are equal, the mixing matrix is square, and the basic ICA model holds.
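A minimal NumPy sketch of this dimension reduction, under the assumption that the number of ICs $n$ is known (the toy dimensions and data are ours): the $m$-dimensional mixtures are projected onto the $n$ dominant principal components, after which the effective mixing matrix is square.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 3, 10, 5000                       # 3 ICs observed through 10 mixtures

S = rng.laplace(size=(n, T))                # nongaussian sources
A = rng.standard_normal((m, n))             # tall (m x n) mixing matrix
X = A @ S                                   # m-dimensional data lying in an n-dim subspace

Xc = X - X.mean(axis=1, keepdims=True)
eigval, eigvec = np.linalg.eigh(np.cov(Xc))         # PCA via the covariance matrix
# In this noise-free example only n eigenvalues are (essentially) nonzero.
E_n = eigvec[:, np.argsort(eigval)[::-1][:n]].T     # n dominant eigenvectors as rows

X_red = E_n @ Xc    # n-dimensional data; the effective mixing matrix E_n A is square
```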
The question is whether PCA is able to find the subspace correctly, so that the $n$ ICs can be estimated from the reduced mixtures. This does not hold in general, but it does in an important special case. If the data consists of $n$ ICs only, with no noise added, the whole data is contained in an $n$-dimensional subspace. Using PCA for dimension reduction clearly finds this $n$-dimensional subspace, since the eigenvalues corresponding to that subspace, and only those eigenvalues, are nonzero. Thus reducing the dimension with PCA works correctly. In practice, the data is usually not exactly contained in the subspace, due to noise and other factors, but if the noise level is low, PCA still finds approximately the right subspace; see Section 6.1.3. In the general case, some "weak" ICs may be lost in the dimension reduction, but PCA may still be a good idea for optimal estimation of the "strong" ICs [313].
Performing first PCA and then ICA has an interesting interpretation in terms of
factor analysis. In factor analysis, it is conventional that after finding the factor
subspace, the actual basis vectors for that subspace are determined by some criteria