19
Convolutive Mixtures and
Blind Deconvolution
This chapter deals with blind deconvolution and blind separation of convolutive
mixtures.
Blind deconvolution is a signal processing problem that is closely related to basic
independent component analysis (ICA) and blind source separation (BSS). In com-
munications and related areas, blind deconvolution is often called blind equalization.
In blind deconvolution, we have only one observed signal (output) and one source
signal (input). The observed signal consists of an unknown source signal mixed with
itself at different time delays. The task is to estimate the source signal from the
observed signal only, without knowing the convolving system, the time delays, and
mixing coefficients.
Blind separation of convolutive mixtures considers the combined blind deconvolu-
tion and instantaneous blind source separation problem. This estimation task appears
under many different names in the literature: ICA with convolutive mixtures, mul-
tichannel blind deconvolution or identification, convolutive signal separation, and
blind identification of multiple-input-multiple-output (MIMO) systems. In blind
separation of convolutive mixtures, there are several source (input) signals and sev-
eral observed (output) signals just like in the instantaneous ICA problem. However,
the source signals have different time delays in each observed signal due to the finite
propagation speed in the medium. Each observed signal may also contain time-
delayed versions of the same source due to multipath propagation caused typically
by reverberations from some obstacles. Figure 23.3 in Chapter 23 shows an example
of multipath propagation in mobile communications.
In the following, we first consider the simpler blind deconvolution problem, and
after that separation of convolutive mixtures. Many techniques for convolutive mix-
355
Independent Component Analysis. Aapo Hyv
¨
arinen, Juha Karhunen, Erkki Oja
Copyright
2001 John Wiley & Sons, Inc.
ISBNs: 0-471-40540-X (Hardback); 0-471-22131-7 (Electronic)
356
CONVOLUTIVE MIXTURES AND BLIND DECONVOLUTION
tures have in fact been developed by extending methods designed originally for either
the blind deconvolution or standard ICA/BSS problems. In the appendix, certain basic
concepts of discrete-time filters needed in this chapter are briefly reviewed.
19.1 BLIND DECONVOLUTION
19.1.1 Problem definition
In blind deconvolution [170, 171, 174, 315], it is assumed that the observed discrete-
time signal
x(t)
is generated from an unknown source signal
s(t)
by the convolution
model
x(t)=
1
X
k=1
a
k
s(t k )
(19.1)
Thus, delayed versions of the source signal are mixed together. This situation appears
in many practical applications, for example, in communications and geophysics.
In blind deconvolution, both the source signal
s(t)
and the convolution coefficients
a
k
are unknown. Observing
x(t)
only, we want to estimate the source signal
s(t)
.In
other words, we want to find a deconvolution filter
y (t)=
1
X
k=1
h
k
x(t k )
(19.2)
which provides a good estimate of the source signal
s(t)
at each time instant. This
is achieved by choosing the coefficients
h
k
of the deconvolution filter suitably. In
practice the deconvolving finite impulse response (FIR) filter (see the Appendix
for definition) in Eq. (19.2) is assumed to be of sufficient but finite length. Other
structures are possible, but this one is the standard choice.
To estimate the deconvolving filter, certain assumptions on the source signal
s(t)
must be made. Usually it is assumed that the source signal values
s(t)
at different
times
t
are nongaussian, statistically independent and identically distributed (i.i.d.).
The probability distribution of the source signal
s(t)
may be known or unknown. The
indeterminacies remaining in the blind deconvolution problem are that the estimated
source signal may have an arbitrary scaling (and sign) and time shift compared
with the true one. This situation is similar to the permutation and sign indeterminacy
encountered in ICA; the two models are,in fact, intimately related as will be explained
in Section 19.1.4.
Of course, the preceding ideal model usually does not exactly hold in practice.
There is often additive noise present, though we have omitted noise from the model
(19.1) for simplicity. The source signal sequence may not satisfy the i.i.d condition,
and its distribution is often unknown, or we may only know that the source signal is
subgaussian or supergaussian. Hence blind deconvolution often is a difficult signal
processing task that can be solved only approximately, in practice.
BLIND DECONVOLUTION
357
If the linear time-invariant system (19.1) is minimum phase (see the Appendix),
then the blind deconvolution problem can be solved in a straightforward way. On the
above assumptions, the deconvolving filter is simply a whitening filter that temporally
whitens the observed signal sequence
fx(t)g
[171, 174]. However, in many appli-
cations, for example, in telecommunications, the system is typically nonminimum
phase [174] and this simple solution cannot be used.
We shall next discuss some popular approaches to blind deconvolution. Blind
deconvolution is frequently needed in communications applications where it is con-
venient to use complex-valued data. Therefore we present most methods for this
general case. The respective algorithms for real data are obtained as special cases.
Methods for estimating the ICA model with complex-valued data are discussed later
in Section 20.3.
19.1.2 Bussgang methods
Bussgang methods [39, 171, 174, 315] include some of the earliest algorithms [152,
392] proposed for blind deconvolution, but they are still widely used. In Bussgang
methods, a noncausal FIR filter structure
y (t)=
L
X
k=L
w
k
(t)x(t k )
(19.3)
of length
2L +1
is used. Here
denotes the complex conjugate. The weights
w
k
(t)
of
the FIR filter depend on the time
t
, and they are adapted using the least-mean-square
(LMS) type algorithm [171]
w
k
(t +1) = w
k
(t)+x(t k )e
(t) k = L::: L
(19.4)
where the error signal is defined by
e(t)=g(y (t)) y (t)
(19.5)
In these equations,
is a positive learning parameter,
y (t)
is given by (19.3), and
g (:)
is a suitably chosen nonlinearity. It is applied separately to the real and imaginary
components of
y (t)
. The algorithm is initialized by setting
w
0
(0) = 1
,
w
k
(0) = 0
k 6=0
.
Assume that the filter length
2L +1
is large enough and the learning algorithm
has converged. It can be shown that then the following condition holds for the output
y (t)
of the FIR filter (19.3):
E
fy (t)y (t k )g
E
fy (t)g (y (t k ))g
(19.6)
A stochastic process that satisfies the condition (19.6) is called a Bussgang process.
The nonlinearity
g (t)
can be chosen in several ways, leading to different Bussgang
type algorithms [39, 171]. The Godard algorithm [152] is the best performing
Bussgang algorithm in the sense that it is robust and has the smallest mean-square
358
CONVOLUTIVE MIXTURES AND BLIND DECONVOLUTION
error after convergence; see [171] for details. The Godard algorithm minimizes the
nonconvex cost function
J
p
(t)=
E
fjy (t)j
p
p
]
2
g
(19.7)
where
p
is a positive integer and
p
is a positive real constant defined by the statistics
of the source signal:
p
=
E
fjs(t)j
2p
g
E
fjs(t)j
p
g
(19.8)
The constant
p
is chosen in such a way that the gradient of the cost function
J
p
(t)
is zero when perfect deconvolution is attained, that is, when
y (t)
=
s(t)
. The error
signal (19.5) in the gradient algorithm (19.4) for minimizing the cost function (19.7)
with respect to the weight
w
k
(t)
has the form
e(t)=y (t)jy (t)j
p2
p
jy (t)j
p
]
(19.9)
In computing
e(t)
, the expectation in (19.7) has been omitted for getting a simpler
stochastic gradient type algorithm. The respective nonlinearity
g (y (t))
is given by
[171]
g (y (t)) = y (t)+y(t)jy (t)j
p2
p
jy (t)j
p
]
(19.10)
Among the family of Godard algorithms,the so-called constant modulus algorithm
(CMA) is widely used. It is obtained by setting
p =2
in the above formulas. The
cost function (19.7) is then related to the minimization of the kurtosis. The CMA and
more generally Godard algorithms perform appropriately for subgaussian sources
only, but in communications applications the source signals are subgaussian.
1
.The
CMA algorithm is the most successful blind equalization (deconvolution) algorithm
used in communications due to its low complexity, good performance, and robustness
[315].
Properties of the CMA cost function and algorithm have been studied thoroughly
in [224]. The constant modulus property possessed by many types of communications
signals has been exploited also in developing efficient algebraic blind equalization
and source separation algorithms [441]. A good general review of Bussgang type
blind deconvolution methods is [39].
19.1.3 Cumulant-based methods
Another popular group of blind deconvolution methods consists of cumulant-based
approaches [315, 170, 174, 171]. They apply explicitly higher-order statistics of
the observed signal
x(t)
, while in the Bussgang methods higher-order statistics
1
The CMA algorithm can be applied to blind deconvolution of supergaussian sources by using a negative
learning parameter
in (19.4); see [11]
BLIND DECONVOLUTION
359
are involved into the estimation process implicitly via the nonlinear function
g ()
.
Cumulants have been defined and discussed briefly in Chapter 2.
Shalvi and Weinstein [398] have derived necessary and sufficient conditions and
a set of cumulant-based criteria for blind deconvolution. In particular, they intro-
duced a stochastic gradient algorithm for maximizing a constrained kurtosis based
criterion. We shall next describe this algorithm briefly, because it is computationally
simple, converges globally, and can be applied equally well to both subgaussian and
supergaussian source signals
s(t)
.
Assume that the source (input) signal
s(t)
is complex-valued and symmetric,
satisfying the condition E
fs(t)
2
g
=
0
. Assume that the length of the causal FIR
deconvolution filter is
M
. The output
z (t)
of this filter at discrete time
t
can then be
expressed compactly as the inner product
z (t)=w
T
(t)y(t)
(19.11)
where the
M
-dimensional filter weight vector
w(t)
and output vector
y(t)
at time
t
are respectively defined by
y(t)=y (t)y(t 1)::: y(t M + 1)]
T
(19.12)
w(t)=w(t)w(t 1)::: w(t M +1)]
T
(19.13)
Shalvi and Weinstein’s algorithm is then given by [398, 351]
u(t +1) = u(t)+
sign
(
s
)jz (t)j
2
z (t)]y
(t)
w(t +1) = u(t +1)= k u(t +1) k
(19.14)
Here
s
is the kurtosis of
s(t)
,
kk
is the usual Euclidean norm, and the unnormalized
filter weight vector
u(t)
is defined quite similarly as
w(t)
in (19.13).
It is important to notice that Shalvi and Weinstein’s algorithm (19.14) requires
whitening of the output signal
y (t)
for performing appropriately (assuming that
s(t)
is white, too). For a single complex-valued signal sequence (time series)
fy (t)g
,the
temporal whiteness condition is
E
fy (t)y
(t k )g =
2
y
tk
=
(
2
y
t = k
0 t 6= k
(19.15)
where the variance of
y (t)
is often normalized to unity:
2
y
=1
. Temporal whitening
can be achieved by spectral prewhitening in the Fourier domain, or by using time-
domain techniques such as linear prediction [351]. Linear prediction techniques have
been discussed for example in the books [169, 171, 419].
Shalvi and Weinstein have presented a somewhat more complicated algorithm
for the case E
fs(t)
2
g 6= 0
in [398]. Furthermore, they showed that there exists a
close relationship between their algorithm and the CMA algorithm discussed in the
previous subsection; see also [351]. Later, they derived fast converging but more
involved super-exponential algorithms for blind deconvolution in [399]. Shalvi and
360
CONVOLUTIVE MIXTURES AND BLIND DECONVOLUTION
Weinstein have reviewed their blind deconvolution methods in [170]. Closely related
algorithms were proposed earlier in [114, 457].
It is interesting to note that Shalvi and Weinstein’s algorithm (19.14) can be
derived by maximizing the absolute value of the kurtosis of the filtered (deconvolved)
signal
z (t)
under the constraint that the output signal
y (t)
is temporally white [398,
351]. The temporal whiteness condition leads to the normalization constraint of
the weight vector
w(t)
in (19.14). The corresponding criterion for standard ICA is
familiar already from Chapter 8, where gradient algorithms similar to (19.14) have
been discussed. Also Shalvi and Weinstein’s super-exponential algorithm [399] is
very similar to the cumulant-based FastICA as introduced in Section 8.2.3. The
connection between blind deconvolution and ICA is discussed in more detail in the
next subsection.
Instead of cumulants, one can resort to higher-order spectra or polyspectra [319,
318]. They are defined as Fourier transforms of the cumulants quite similarly as
the power spectrum is defined as a Fourier transform of the autocorrelation function
(see Section 2.8.5). Polyspectra provide a basis for blind deconvolution and more
generally identification of nonminimum-phase systems, because they preserve phase
information of the observed signal. However, blind deconvolution methods based
on higher-order spectra tend to be computationally more complex than Bussgang
methods, and converge slowly [171]. Therefore, we shall not discuss them here. The
interested reader can find more information on those methods in [170, 171, 315].
19.1.4 Blind deconvolution using linear ICA
In defining the blind deconvolution problem, the values of the original signal
s(t)
were assumed to be independent for different
t
and nongaussian. Therefore, the blind
deconvolution problem is formally closely related to the standard ICA problem. In
fact, one can define a vector
~
s(t)=s(t)s(t 1):::s(t n +1)]
T
(19.16)
by collecting
n
last values of the source signal, and similarly define
~
x(t)=x(t)x(t 1):::x(t n +1)]
T
(19.17)
Then
~
x
and
~
s
are
n
-dimensional vectors, and the convolution (19.1) can be expressed
for a finite number of values of the summation index
k
as
~
x = A
~
s
(19.18)
where
A
is a matrix that contains the coefficients
a
k
of the convolution filter as its
rows, at different positions for each row. This is the classic matrix representation
of a filter. This representation is not exact near the top and bottom rows, but for a
sufficiently large
n
, it is good enough in practice.
From (19.18) we see that the blind deconvolution problem is actually (approxi-
mately) a special case of ICA. The components of
s
are independent, and the mixing
is linear, so we get the standard linear ICA model.