
19
Convolutive Mixtures and
Blind Deconvolution
This chapter deals with blind deconvolution and blind separation of convolutive
mixtures.
Blind deconvolution is a signal processing problem that is closely related to basic
independent component analysis (ICA) and blind source separation (BSS). In com-
munications and related areas, blind deconvolution is often called blind equalization.
In blind deconvolution, we have only one observed signal (output) and one source
signal (input). The observed signal consists of an unknown source signal mixed with
itself at different time delays. The task is to estimate the source signal from the
observed signal only, without knowing the convolving system, that is, the time
delays and mixing coefficients.
Blind separation of convolutive mixtures considers the combined blind deconvolu-
tion and instantaneous blind source separation problem. This estimation task appears
under many different names in the literature: ICA with convolutive mixtures, mul-
tichannel blind deconvolution or identification, convolutive signal separation, and
blind identification of multiple-input-multiple-output (MIMO) systems. In blind
separation of convolutive mixtures, there are several source (input) signals and sev-
eral observed (output) signals just like in the instantaneous ICA problem. However,
the source signals have different time delays in each observed signal due to the finite
propagation speed in the medium. Each observed signal may also contain time-
delayed versions of the same source due to multipath propagation caused typically
by reverberations from some obstacles. Figure 23.3 in Chapter 23 shows an example
of multipath propagation in mobile communications.
In the following, we first consider the simpler blind deconvolution problem, and
after that separation of convolutive mixtures. Many techniques for convolutive
mixtures have in fact been developed by extending methods designed originally for
either the blind deconvolution or standard ICA/BSS problems. In the appendix,
certain basic concepts of discrete-time filters needed in this chapter are briefly
reviewed.

Independent Component Analysis. Aapo Hyvärinen, Juha Karhunen, Erkki Oja
Copyright © 2001 John Wiley & Sons, Inc.
ISBNs: 0-471-40540-X (Hardback); 0-471-22131-7 (Electronic)
19.1 BLIND DECONVOLUTION
19.1.1 Problem definition
In blind deconvolution [170, 171, 174, 315], it is assumed that the observed discrete-time signal x(t) is generated from an unknown source signal s(t) by the convolution model

x(t) = \sum_{k=-\infty}^{\infty} a_k\, s(t-k)    (19.1)
Thus, delayed versions of the source signal are mixed together. This situation appears
in many practical applications, for example, in communications and geophysics.
In blind deconvolution, both the source signal s(t) and the convolution coefficients a_k are unknown. Observing x(t) only, we want to estimate the source signal s(t). In other words, we want to find a deconvolution filter

y(t) = \sum_{k=-\infty}^{\infty} h_k\, x(t-k)    (19.2)

which provides a good estimate of the source signal s(t) at each time instant. This is achieved by choosing the coefficients h_k of the deconvolution filter suitably. In practice, the deconvolving finite impulse response (FIR) filter (see the Appendix for a definition) in Eq. (19.2) is assumed to be of sufficient but finite length. Other structures are possible, but this one is the standard choice.
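To make the convolution model concrete, the following minimal sketch simulates (19.1) with an assumed short, finite channel and a binary i.i.d. source; the channel taps and the signal length are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite-length channel coefficients a_k (unknown in a real
# blind deconvolution problem); the source is i.i.d. and nongaussian.
a = np.array([1.0, 0.5, -0.2])
s = np.sign(rng.standard_normal(5000))        # binary (+-1) source s(t)

# Observed signal x(t) = sum_k a_k s(t-k), truncated to the source length.
x = np.convolve(s, a, mode="full")[: len(s)]
```

In a real blind deconvolution problem only x would be available; s and a are generated here purely so that a known ground truth exists.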
To estimate the deconvolving filter, certain assumptions on the source signal s(t) must be made. Usually it is assumed that the source signal values s(t) at different times t are nongaussian, statistically independent, and identically distributed (i.i.d.). The probability distribution of the source signal s(t) may be known or unknown. The indeterminacies remaining in the blind deconvolution problem are that the estimated source signal may have an arbitrary scaling (and sign) and time shift compared with the true one. This situation is similar to the permutation and sign indeterminacies encountered in ICA; the two models are, in fact, intimately related, as will be explained in Section 19.1.4.
Of course, the preceding ideal model usually does not hold exactly in practice. There is often additive noise present, though we have omitted noise from the model (19.1) for simplicity. The source signal sequence may not satisfy the i.i.d. condition, and its distribution is often unknown, or we may only know that the source signal is subgaussian or supergaussian. Hence blind deconvolution is often a difficult signal processing task that in practice can be solved only approximately.
If the linear time-invariant system (19.1) is minimum phase (see the Appendix), the blind deconvolution problem can be solved in a straightforward way. Under the above assumptions, the deconvolving filter is simply a whitening filter that temporally whitens the observed signal sequence \{x(t)\} [171, 174]. However, in many applications, for example, in telecommunications, the system is typically nonminimum phase [174], and this simple solution cannot be used.
We shall next discuss some popular approaches to blind deconvolution. Blind deconvolution is frequently needed in communications applications, where it is convenient to use complex-valued data. Therefore we present most methods for this general case. The respective algorithms for real data are obtained as special cases. Methods for estimating the ICA model with complex-valued data are discussed later in Section 20.3.
19.1.2 Bussgang methods
Bussgang methods [39, 171, 174, 315] include some of the earliest algorithms [152, 392] proposed for blind deconvolution, but they are still widely used. In Bussgang methods, a noncausal FIR filter structure

y(t) = \sum_{k=-L}^{L} w_k^*(t)\, x(t-k)    (19.3)

of length 2L+1 is used. Here * denotes the complex conjugate. The weights w_k(t) of the FIR filter depend on the time t, and they are adapted using the least-mean-square (LMS) type algorithm [171]

w_k(t+1) = w_k(t) + \mu\, x(t-k)\, e^*(t), \quad k = -L, \ldots, L    (19.4)

where the error signal is defined by

e(t) = g(y(t)) - y(t)    (19.5)

In these equations, \mu is a positive learning parameter, y(t) is given by (19.3), and g(\cdot) is a suitably chosen nonlinearity. It is applied separately to the real and imaginary components of y(t). The algorithm is initialized by setting w_0(0) = 1 and w_k(0) = 0 for k \neq 0.
Assume that the filter length 2L+1 is large enough and the learning algorithm has converged. It can be shown that the following condition then holds for the output y(t) of the FIR filter (19.3):

E\{y(t)\,y(t-k)\} = E\{y(t)\,g(y(t-k))\}    (19.6)

A stochastic process that satisfies the condition (19.6) is called a Bussgang process.
The nonlinearity g(\cdot) can be chosen in several ways, leading to different Bussgang-type algorithms [39, 171]. The Godard algorithm [152] is the best-performing Bussgang algorithm in the sense that it is robust and has the smallest mean-square error after convergence; see [171] for details. The Godard algorithm minimizes the nonconvex cost function

J_p(t) = E\{[\,|y(t)|^p - \gamma_p\,]^2\}    (19.7)
where p is a positive integer and \gamma_p is a positive real constant defined by the statistics of the source signal:

\gamma_p = \frac{E\{|s(t)|^{2p}\}}{E\{|s(t)|^p\}}    (19.8)
The constant \gamma_p is chosen in such a way that the gradient of the cost function J_p(t) is zero when perfect deconvolution is attained, that is, when y(t) = s(t). The error signal (19.5) in the gradient algorithm (19.4) for minimizing the cost function (19.7) with respect to the weight w_k(t) has the form

e(t) = y(t)\,|y(t)|^{p-2}\,[\,\gamma_p - |y(t)|^p\,]    (19.9)

In computing e(t), the expectation in (19.7) has been omitted to obtain a simpler stochastic gradient type algorithm. The respective nonlinearity g(y(t)) is given by [171]

g(y(t)) = y(t) + y(t)\,|y(t)|^{p-2}\,[\,\gamma_p - |y(t)|^p\,]    (19.10)
Among the family of Godard algorithms, the so-called constant modulus algorithm (CMA) is widely used. It is obtained by setting p = 2 in the above formulas. The cost function (19.7) is then related to the minimization of the kurtosis. The CMA and, more generally, Godard algorithms perform appropriately for subgaussian sources only, but in communications applications the source signals are subgaussian.¹ The CMA algorithm is the most successful blind equalization (deconvolution) algorithm used in communications due to its low complexity, good performance, and robustness [315].
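As an illustration, one possible implementation sketch of the CMA (the Godard update with p = 2) for real-valued data is given below. The two-tap channel, equalizer length, and step size are illustrative assumptions, and for a unit-modulus source \gamma_2 = 1 by (19.8):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: binary (+-1) source through an assumed two-tap channel.
s = np.sign(rng.standard_normal(20000))
x = np.convolve(s, [1.0, 0.4], mode="full")[: len(s)]

L = 5                                   # equalizer spans 2L+1 taps, eq (19.3)
w = np.zeros(2 * L + 1)
w[L] = 1.0                              # standard center-tap initialization
gamma2 = 1.0                            # gamma_p of (19.8) for p = 2, |s| = 1
mu = 1e-3                               # small positive learning rate

for t in range(L, len(x) - L):
    # Window ordered as x(t-k) for k = -L, ..., L (real data, no conjugate).
    window = x[t - L : t + L + 1][::-1]
    y = np.dot(w, window)                    # equalizer output, eq (19.3)
    e = y * (gamma2 - abs(y) ** 2)           # Godard error with p = 2, eq (19.9)
    w += mu * e * window                     # LMS-type update, eq (19.4)
```

After convergence the output modulus clusters around \sqrt{\gamma_2}, and the equalizer output approximates a delayed, possibly sign-flipped copy of the source.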
Properties of the CMA cost function and algorithm have been studied thoroughly in [224]. The constant modulus property possessed by many types of communications signals has also been exploited in developing efficient algebraic blind equalization and source separation algorithms [441]. A good general review of Bussgang-type blind deconvolution methods is [39].
19.1.3 Cumulant-based methods
Another popular group of blind deconvolution methods consists of cumulant-based approaches [315, 170, 174, 171]. They apply higher-order statistics of the observed signal x(t) explicitly, whereas in the Bussgang methods higher-order statistics are involved in the estimation process only implicitly, via the nonlinear function g(\cdot). Cumulants have been defined and discussed briefly in Chapter 2.

¹ The CMA algorithm can be applied to blind deconvolution of supergaussian sources by using a negative learning parameter \mu in (19.4); see [11].
Shalvi and Weinstein [398] have derived necessary and sufficient conditions and a set of cumulant-based criteria for blind deconvolution. In particular, they introduced a stochastic gradient algorithm for maximizing a constrained kurtosis-based criterion. We shall next describe this algorithm briefly, because it is computationally simple, converges globally, and can be applied equally well to both subgaussian and supergaussian source signals s(t).
Assume that the source (input) signal s(t) is complex-valued and symmetric, satisfying the condition E\{s(t)^2\} = 0. Assume that the length of the causal FIR deconvolution filter is M. The output z(t) of this filter at discrete time t can then be expressed compactly as the inner product

z(t) = \mathbf{w}^T(t)\,\mathbf{y}(t)    (19.11)

where the M-dimensional filter weight vector \mathbf{w}(t) and output vector \mathbf{y}(t) at time t are respectively defined by

\mathbf{y}(t) = [y(t), y(t-1), \ldots, y(t-M+1)]^T    (19.12)
\mathbf{w}(t) = [w(t), w(t-1), \ldots, w(t-M+1)]^T    (19.13)
Shalvi and Weinstein's algorithm is then given by [398, 351]

\mathbf{u}(t+1) = \mathbf{u}(t) + \mu\,\mathrm{sign}(\kappa_s)\,|z(t)|^2\,z(t)\,\mathbf{y}^*(t)
\mathbf{w}(t+1) = \mathbf{u}(t+1)\,/\,\|\mathbf{u}(t+1)\|    (19.14)

Here \kappa_s is the kurtosis of s(t), \|\cdot\| is the usual Euclidean norm, and the unnormalized filter weight vector \mathbf{u}(t) is defined quite similarly to \mathbf{w}(t) in (19.13).
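A rough sketch of the update (19.14) for real-valued data is shown below. It assumes a binary source (subgaussian, so sign(\kappa_s) = -1), an assumed two-tap channel, and spectral prewhitening of the observed signal; all parameter values are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative setup: real binary source (negative kurtosis) through an
# assumed two-tap channel.
s = np.sign(rng.standard_normal(8192))
x = np.convolve(s, [1.0, -0.6], mode="full")[: len(s)]

# Spectral prewhitening: flatten the magnitude spectrum so that the
# algorithm input y(t) is (approximately) temporally white.
X = np.fft.fft(x)
y = np.real(np.fft.ifft(X / (np.abs(X) + 1e-12)))
y /= np.std(y)                          # normalize variance to unity

M, mu = 8, 1e-3
u = np.zeros(M)
u[0] = 1.0
w = u / np.linalg.norm(u)
sign_kappa = -1.0                       # sign of the kurtosis of s(t)

for t in range(M - 1, len(y)):
    yt = y[t - M + 1 : t + 1][::-1]     # [y(t), y(t-1), ..., y(t-M+1)]
    z = np.dot(w, yt)                   # filter output, eq (19.11)
    u = u + mu * sign_kappa * (abs(z) ** 2) * z * yt   # eq (19.14), real data
    w = u / np.linalg.norm(u)           # normalization step of (19.14)
```

The per-step normalization of u implements the constraint on the weight vector that follows from the temporal whiteness condition.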
It is important to notice that Shalvi and Weinstein's algorithm (19.14) requires whitening of the output signal y(t) in order to perform appropriately (assuming that s(t) is white, too). For a single complex-valued signal sequence (time series) \{y(t)\}, the temporal whiteness condition is

E\{y(t)\,y^*(t-k)\} = \sigma_y^2\,\delta_{k0} =
\begin{cases} \sigma_y^2, & k = 0 \\ 0, & k \neq 0 \end{cases}    (19.15)

where the variance of y(t) is often normalized to unity: \sigma_y^2 = 1. Temporal whitening can be achieved by spectral prewhitening in the Fourier domain, or by using time-domain techniques such as linear prediction [351]. Linear prediction techniques have been discussed, for example, in the books [169, 171, 419].
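As a sketch of the linear-prediction alternative, one can fit prediction coefficients from the sample autocorrelation via the normal (Yule-Walker) equations and take the prediction error as the whitened signal. The channel and the prediction order below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative colored signal: i.i.d. +-1 source through an assumed channel.
x = np.convolve(np.sign(rng.standard_normal(10000)), [1.0, 0.7],
                mode="full")[:10000]

p = 10                                  # prediction order (an assumption)

# Sample autocorrelation r(0), ..., r(p).
r = np.array([np.dot(x[: len(x) - k], x[k:]) / len(x) for k in range(p + 1)])

# Normal (Yule-Walker) equations: predict x(t) from x(t-1), ..., x(t-p).
R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
a_pred = np.linalg.solve(R, r[1 : p + 1])

# The prediction error is (approximately) temporally white, as in (19.15).
e = np.array([x[t] - np.dot(a_pred, x[t - p : t][::-1])
              for t in range(p, len(x))])
```

For the assumed two-tap channel the exact whitening filter is an infinite-order recursion, but its coefficients decay quickly, so a moderate prediction order already leaves the residual nearly white.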
Shalvi and Weinstein have presented a somewhat more complicated algorithm for the case E\{s(t)^2\} \neq 0 in [398]. Furthermore, they showed that there exists a close relationship between their algorithm and the CMA algorithm discussed in the previous subsection; see also [351]. Later, they derived fast-converging but more involved super-exponential algorithms for blind deconvolution in [399]. Shalvi and Weinstein have reviewed their blind deconvolution methods in [170]. Closely related algorithms were proposed earlier in [114, 457].
It is interesting to note that Shalvi and Weinstein's algorithm (19.14) can be derived by maximizing the absolute value of the kurtosis of the filtered (deconvolved) signal z(t) under the constraint that the output signal y(t) is temporally white [398, 351]. The temporal whiteness condition leads to the normalization constraint on the weight vector \mathbf{w}(t) in (19.14). The corresponding criterion for standard ICA is already familiar from Chapter 8, where gradient algorithms similar to (19.14) have been discussed. Shalvi and Weinstein's super-exponential algorithm [399] is also very similar to the cumulant-based FastICA algorithm introduced in Section 8.2.3. The connection between blind deconvolution and ICA is discussed in more detail in the next subsection.
Instead of cumulants, one can resort to higher-order spectra or polyspectra [319, 318]. They are defined as Fourier transforms of the cumulants, quite similarly to the way the power spectrum is defined as the Fourier transform of the autocorrelation function (see Section 2.8.5). Polyspectra provide a basis for blind deconvolution and, more generally, identification of nonminimum-phase systems, because they preserve the phase information of the observed signal. However, blind deconvolution methods based on higher-order spectra tend to be computationally more complex than Bussgang methods, and converge slowly [171]. Therefore, we shall not discuss them here. The interested reader can find more information on these methods in [170, 171, 315].
19.1.4 Blind deconvolution using linear ICA
In defining the blind deconvolution problem, the values of the original signal s(t) were assumed to be independent for different t, and nongaussian. Therefore, the blind deconvolution problem is formally closely related to the standard ICA problem. In fact, one can define a vector

\tilde{\mathbf{s}}(t) = [s(t), s(t-1), \ldots, s(t-n+1)]^T    (19.16)

by collecting the n last values of the source signal, and similarly define

\tilde{\mathbf{x}}(t) = [x(t), x(t-1), \ldots, x(t-n+1)]^T    (19.17)

Then \tilde{\mathbf{x}} and \tilde{\mathbf{s}} are n-dimensional vectors, and the convolution (19.1) can be expressed for a finite number of values of the summation index k as

\tilde{\mathbf{x}} = \mathbf{A}\tilde{\mathbf{s}}    (19.18)
where \mathbf{A} is a matrix that contains the coefficients a_k of the convolution filter as its rows, at different positions for each row. This is the classic matrix representation of a filter. This representation is not exact near the top and bottom rows, but for a sufficiently large n, it is good enough in practice.
From (19.18) we see that the blind deconvolution problem is actually (approximately) a special case of ICA. The components of \tilde{\mathbf{s}} are independent, and the mixing is linear, so we get the standard linear ICA model.
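The matrix form (19.18) is straightforward to verify numerically. In the sketch below, a three-tap channel and a small n are illustrative assumptions; to keep the representation exact, the stacked source vector is extended by the K-1 older samples that the border rows of \mathbf{A} need:

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed finite channel a_0, a_1, a_2 and window length n.
a = np.array([1.0, 0.5, -0.2])
n, K = 6, len(a)

# Row i of A holds the channel coefficients shifted i positions, so that
# x(t-i) = sum_k a_k s(t-i-k).
A = np.zeros((n, n + K - 1))
for i in range(n):
    A[i, i : i + K] = a

# s_vec[j] = s(t-j): the stacked source vector, newest sample first.
s_vec = rng.standard_normal(n + K - 1)
x_tilde = A @ s_vec                      # stacked [x(t), ..., x(t-n+1)]

# Cross-check against direct convolution of the time-ordered source.
x_direct = np.convolve(s_vec[::-1], a, mode="valid")[::-1]
```

Truncating \mathbf{A} to a square n-by-n matrix reproduces the inexactness near the border rows mentioned above.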
