19 Convolutive Mixtures and Blind Deconvolution
This chapter deals with blind deconvolution and blind separation of convolutive
mixtures.
Blind deconvolution is a signal processing problem that is closely related to basic independent component analysis (ICA) and blind source separation (BSS). In communications and related areas, blind deconvolution is often called blind equalization. In blind deconvolution, we have only one observed signal (output) and one source signal (input). The observed signal consists of an unknown source signal mixed with itself at different time delays. The task is to estimate the source signal from the observed signal only, without knowing the convolving system, the time delays, and the mixing coefficients.
Blind separation of convolutive mixtures considers the combined blind deconvolution and instantaneous blind source separation problem. This estimation task appears under many different names in the literature: ICA with convolutive mixtures, multichannel blind deconvolution or identification, convolutive signal separation, and blind identification of multiple-input-multiple-output (MIMO) systems. In blind separation of convolutive mixtures, there are several source (input) signals and several observed (output) signals just like in the instantaneous ICA problem. However, the source signals have different time delays in each observed signal due to the finite propagation speed in the medium. Each observed signal may also contain time-delayed versions of the same source due to multipath propagation caused typically by reverberations from some obstacles. Figure 23.3 in Chapter 23 shows an example of multipath propagation in mobile communications.
In the following, we first consider the simpler blind deconvolution problem, and after that separation of convolutive mixtures. Many techniques for convolutive mixtures have in fact been developed by extending methods designed originally for either the blind deconvolution or standard ICA/BSS problems. In the appendix, certain basic concepts of discrete-time filters needed in this chapter are briefly reviewed.
Independent Component Analysis. Aapo Hyvärinen, Juha Karhunen, Erkki Oja. Copyright © 2001 John Wiley & Sons, Inc. ISBNs: 0-471-40540-X (Hardback); 0-471-22131-7 (Electronic)
19.1 BLIND DECONVOLUTION
19.1.1 Problem definition
In blind deconvolution [170, 171, 174, 315], it is assumed that the observed discrete-time signal $x(t)$ is generated from an unknown source signal $s(t)$ by the convolution model
$$x(t) = \sum_{k=-\infty}^{\infty} a_k\, s(t-k) \tag{19.1}$$
Thus, delayed versions of the source signal are mixed together. This situation appears in many practical applications, for example, in communications and geophysics.
In blind deconvolution, both the source signal $s(t)$ and the convolution coefficients $a_k$ are unknown. Observing only $x(t)$, we want to estimate the source signal $s(t)$. In other words, we want to find a deconvolution filter
$$y(t) = \sum_{k=-\infty}^{\infty} h_k\, x(t-k) \tag{19.2}$$
which provides a good estimate of the source signal $s(t)$ at each time instant. This is achieved by choosing the coefficients $h_k$ of the deconvolution filter suitably. In practice the deconvolving finite impulse response (FIR) filter (see the Appendix for definition) in Eq. (19.2) is assumed to be of sufficient but finite length. Other structures are possible, but this one is the standard choice.
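As an illustration of the model (19.1) and of what a good deconvolution filter (19.2) achieves, the following minimal sketch convolves a source with a short filter and then undoes the convolution with a truncated FIR inverse. The channel coefficients, the binary source, and the filter length `K` are assumptions made for the example, and the inverse is computed from the known channel, so this is not a blind method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical i.i.d. nongaussian source: binary symbols (subgaussian).
s = rng.choice([-1.0, 1.0], size=2000)

# Hypothetical minimum-phase convolution coefficients a_0, a_1.
a = np.array([1.0, 0.5])

# Convolution model (19.1), restricted to causal taps and the observed range.
x = np.convolve(s, a)[: len(s)]

# The inverse of 1 + 0.5 z^{-1} is the series sum_k (-0.5)^k z^{-k};
# truncating it to K taps gives a finite-length FIR filter h_k as in (19.2).
K = 20
h = (-0.5) ** np.arange(K)

y = np.convolve(x, h)[: len(s)]
max_err = np.max(np.abs(y - s))
print(f"largest deviation between y(t) and s(t): {max_err:.2e}")
```

The residual error comes only from truncating the inverse filter, which is why a sufficiently long FIR filter is adequate in practice.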
To estimate the deconvolving filter, certain assumptions on the source signal $s(t)$ must be made. Usually it is assumed that the source signal values $s(t)$ at different times $t$ are nongaussian, statistically independent and identically distributed (i.i.d.). The probability distribution of the source signal $s(t)$ may be known or unknown. The indeterminacies remaining in the blind deconvolution problem are that the estimated source signal may have an arbitrary scaling (and sign) and time shift compared with the true one. This situation is similar to the permutation and sign indeterminacy encountered in ICA; the two models are, in fact, intimately related as will be explained in Section 19.1.4.
Of course, the preceding ideal model usually does not exactly hold in practice. There is often additive noise present, though we have omitted noise from the model (19.1) for simplicity. The source signal sequence may not satisfy the i.i.d. condition, and its distribution is often unknown, or we may only know that the source signal is subgaussian or supergaussian. Hence blind deconvolution is often a difficult signal processing task that can be solved only approximately in practice.
If the linear time-invariant system (19.1) is minimum phase (see the Appendix), then the blind deconvolution problem can be solved in a straightforward way. Under the above assumptions, the deconvolving filter is simply a whitening filter that temporally whitens the observed signal sequence $\{x(t)\}$ [171, 174]. However, in many applications, for example, in telecommunications, the system is typically nonminimum phase [174] and this simple solution cannot be used.
We shall next discuss some popular approaches to blind deconvolution. Blind deconvolution is frequently needed in communications applications where it is convenient to use complex-valued data. Therefore we present most methods for this general case. The respective algorithms for real data are obtained as special cases. Methods for estimating the ICA model with complex-valued data are discussed later in Section 20.3.
19.1.2 Bussgang methods
Bussgang methods [39, 171, 174, 315] include some of the earliest algorithms [152, 392] proposed for blind deconvolution, but they are still widely used. In Bussgang methods, a noncausal FIR filter structure
$$y(t) = \sum_{k=-L}^{L} w_k^*(t)\, x(t-k) \tag{19.3}$$
of length $2L+1$ is used. Here $*$ denotes the complex conjugate. The weights $w_k(t)$ of the FIR filter depend on the time $t$, and they are adapted using the least-mean-square (LMS) type algorithm [171]
$$w_k(t+1) = w_k(t) + \mu\, x(t-k)\, e^*(t), \quad k = -L, \ldots, L \tag{19.4}$$
where the error signal is defined by
$$e(t) = g(y(t)) - y(t) \tag{19.5}$$
In these equations, $\mu$ is a positive learning parameter, $y(t)$ is given by (19.3), and $g(\cdot)$ is a suitably chosen nonlinearity. It is applied separately to the real and imaginary components of $y(t)$. The algorithm is initialized by setting $w_0(0) = 1$ and $w_k(0) = 0$ for $k \neq 0$.
Assume that the filter length $2L+1$ is large enough and the learning algorithm has converged. It can be shown that then the following condition holds for the output $y(t)$ of the FIR filter (19.3):
$$\mathrm{E}\{y(t)\, y(t-k)\} = \mathrm{E}\{y(t)\, g(y(t-k))\} \tag{19.6}$$
A stochastic process that satisfies the condition (19.6) is called a Bussgang process.
The nonlinearity $g(\cdot)$ can be chosen in several ways, leading to different Bussgang type algorithms [39, 171]. The Godard algorithm [152] is the best performing Bussgang algorithm in the sense that it is robust and has the smallest mean-square error after convergence; see [171] for details. The Godard algorithm minimizes the nonconvex cost function
$$J_p(t) = \mathrm{E}\{(|y(t)|^p - \gamma_p)^2\} \tag{19.7}$$
where $p$ is a positive integer and $\gamma_p$ is a positive real constant defined by the statistics of the source signal:
$$\gamma_p = \frac{\mathrm{E}\{|s(t)|^{2p}\}}{\mathrm{E}\{|s(t)|^p\}} \tag{19.8}$$
The constant $\gamma_p$ is chosen in such a way that the gradient of the cost function $J_p(t)$ is zero when perfect deconvolution is attained, that is, when $y(t) = s(t)$. The error signal (19.5) in the gradient algorithm (19.4) for minimizing the cost function (19.7) with respect to the weight $w_k(t)$ has the form
$$e(t) = y(t)\,|y(t)|^{p-2}\,(\gamma_p - |y(t)|^p) \tag{19.9}$$
In computing $e(t)$, the expectation in (19.7) has been omitted for getting a simpler stochastic gradient type algorithm. The respective nonlinearity $g(y(t))$ is given by [171]
$$g(y(t)) = y(t) + y(t)\,|y(t)|^{p-2}\,(\gamma_p - |y(t)|^p) \tag{19.10}$$
Among the family of Godard algorithms, the so-called constant modulus algorithm (CMA) is widely used. It is obtained by setting $p = 2$ in the above formulas. The cost function (19.7) is then related to the minimization of the kurtosis. The CMA and more generally Godard algorithms perform appropriately for subgaussian sources only, but in communications applications the source signals are subgaussian (see footnote 1). The CMA is the most successful blind equalization (deconvolution) algorithm used in communications due to its low complexity, good performance, and robustness [315].
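The Bussgang update (19.3)-(19.5) with the Godard error (19.9) for $p = 2$ can be sketched as follows. This is a minimal illustration, not a tuned equalizer: the QPSK source, the two-tap channel, the filter half-length `L`, and the step size `mu` are assumptions chosen for the example. For a unit-modulus source, (19.8) gives $\gamma_2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, L, mu, gamma = 5000, 5, 0.005, 1.0   # hypothetical settings; gamma_2 = 1 for QPSK

# Unit-modulus QPSK source (subgaussian) and a hypothetical channel.
s = (rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)) / np.sqrt(2)
a = np.array([1.0, 0.3])
x = np.convolve(s, a)[:N]

# Initialization: w_0(0) = 1 and w_k(0) = 0 otherwise (index L holds k = 0).
w = np.zeros(2 * L + 1, dtype=complex)
w[L] = 1.0

y = np.zeros(N, dtype=complex)
for t in range(L, N - L):
    xv = x[t - L : t + L + 1][::-1]            # x(t-k) for k = -L, ..., L
    y[t] = np.vdot(w, xv)                      # (19.3): sum_k conj(w_k) x(t-k)
    e = y[t] * (gamma - np.abs(y[t]) ** 2)     # (19.9) with p = 2
    w += mu * xv * np.conj(e)                  # (19.4)

def dispersion(v):
    """Sample version of the cost (19.7) for p = 2."""
    return np.mean((np.abs(v) ** 2 - gamma) ** 2)

disp_before = dispersion(x[N - L - 1000 : N - L])
disp_after = dispersion(y[N - L - 1000 : N - L])
print(disp_before, disp_after)
```

Comparing the constant modulus cost of the raw observation with that of the equalizer output over the last samples gives a simple convergence check that does not depend on the unknown phase and delay of the estimate.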
Properties of the CMA cost function and algorithm have been studied thoroughly
in [224]. The constant modulus property possessed by many types of communications
signals has been exploited also in developing efficient algebraic blind equalization
and source separation algorithms [441]. A good general review of Bussgang type
blind deconvolution methods is [39].
19.1.3 Cumulant-based methods
Another popular group of blind deconvolution methods consists of cumulant-based approaches [315, 170, 174, 171]. They apply explicitly higher-order statistics of the observed signal $x(t)$, while in the Bussgang methods higher-order statistics are involved in the estimation process implicitly via the nonlinear function $g(\cdot)$. Cumulants have been defined and discussed briefly in Chapter 2.
Footnote 1: The CMA algorithm can be applied to blind deconvolution of supergaussian sources by using a negative learning parameter $\mu$ in (19.4); see [11].
Shalvi and Weinstein [398] have derived necessary and sufficient conditions and a set of cumulant-based criteria for blind deconvolution. In particular, they introduced a stochastic gradient algorithm for maximizing a constrained kurtosis based criterion. We shall next describe this algorithm briefly, because it is computationally simple, converges globally, and can be applied equally well to both subgaussian and supergaussian source signals $s(t)$.
Assume that the source (input) signal $s(t)$ is complex-valued and symmetric, satisfying the condition $\mathrm{E}\{s(t)^2\} = 0$. Assume that the length of the causal FIR deconvolution filter is $M$. The output $z(t)$ of this filter at discrete time $t$ can then be expressed compactly as the inner product
$$z(t) = \mathbf{w}^T(t)\,\mathbf{y}(t) \tag{19.11}$$
where the $M$-dimensional output vector $\mathbf{y}(t)$ and filter weight vector $\mathbf{w}(t)$ at time $t$ are respectively defined by
$$\mathbf{y}(t) = [y(t), y(t-1), \ldots, y(t-M+1)]^T \tag{19.12}$$
$$\mathbf{w}(t) = [w_0(t), w_1(t), \ldots, w_{M-1}(t)]^T \tag{19.13}$$
Shalvi and Weinstein's algorithm is then given by [398, 351]
$$\mathbf{u}(t+1) = \mathbf{u}(t) + \mu\, \mathrm{sign}(\kappa_z)\, |z(t)|^2 z(t)\, \mathbf{y}^*(t), \qquad \mathbf{w}(t+1) = \frac{\mathbf{u}(t+1)}{\|\mathbf{u}(t+1)\|} \tag{19.14}$$
Here $\kappa_z$ is the kurtosis of $z(t)$, $\|\cdot\|$ is the usual Euclidean norm, and the unnormalized filter weight vector $\mathbf{u}(t)$ is defined quite similarly as $\mathbf{w}(t)$ in (19.13).
It is important to notice that Shalvi and Weinstein's algorithm (19.14) requires whitening of the output signal $y(t)$ for performing appropriately (assuming that $s(t)$ is white, too). For a single complex-valued signal sequence (time series) $\{y(t)\}$, the temporal whiteness condition is
$$\mathrm{E}\{y(t)\, y^*(k)\} = \sigma_y^2\, \delta_{tk} \tag{19.15}$$
where $\delta_{tk}$ equals 1 for $t = k$ and 0 otherwise, and the variance $\sigma_y^2$ of $y(t)$ is often normalized to unity: $\sigma_y^2 = 1$. Temporal whitening can be achieved by spectral prewhitening in the Fourier domain, or by using time-domain techniques such as linear prediction [351]. Linear prediction techniques have been discussed for example in the books [169, 171, 419].
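A normalized stochastic gradient step on the kurtosis, in the spirit of Shalvi and Weinstein's algorithm, can be sketched as below. Several things here are assumptions made for illustration: the QPSK source, the all-pass channel (chosen so that the observed signal is already temporally white and no prewhitening is needed), the filter length `M`, the step size `mu`, and the use of a known kurtosis sign. The sketch also restarts every step from the normalized weight vector, a simplification that keeps the iterate bounded.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, mu, c = 8000, 10, 0.005, 0.5      # hypothetical settings

# QPSK source: complex, symmetric (E{s^2} = 0), subgaussian.
s = (rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)) / np.sqrt(2)

# Hypothetical all-pass, nonminimum-phase channel (c + z^-1) / (1 + c z^-1).
# Its output y(t) is temporally white with unit variance.
y = np.zeros(N, dtype=complex)
y[0] = c * s[0]
for t in range(1, N):
    y[t] = c * s[t] + s[t - 1] - c * y[t - 1]

def kurtosis(v):
    """Sample kurtosis of a zero-mean complex signal."""
    return (np.mean(np.abs(v) ** 4) - 2 * np.mean(np.abs(v) ** 2) ** 2
            - np.abs(np.mean(v ** 2)) ** 2)

sign_kappa = -1.0                        # assumed known: QPSK has negative kurtosis

w = np.zeros(M, dtype=complex)
w[M // 2] = 1.0                          # center-spike start (an assumption)
z = np.zeros(N, dtype=complex)
for t in range(M - 1, N):
    yv = y[t - M + 1 : t + 1][::-1]      # [y(t), y(t-1), ..., y(t-M+1)]
    z[t] = w @ yv                        # inner product without conjugation
    u = w + mu * sign_kappa * np.abs(z[t]) ** 2 * z[t] * np.conj(yv)
    w = u / np.linalg.norm(u)            # unit-norm constraint on the weights

k_before = kurtosis(y)
k_after = kurtosis(z[-2000:])
print(k_before, k_after)
```

The absolute kurtosis of the filter output should move toward that of the source as the residual intersymbol interference shrinks; the estimate remains subject to the usual delay and phase indeterminacy.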
Shalvi and Weinstein have presented a somewhat more complicated algorithm for the case $\mathrm{E}\{s(t)^2\} \neq 0$ in [398]. Furthermore, they showed that there exists a close relationship between their algorithm and the CMA algorithm discussed in the previous subsection; see also [351]. Later, they derived fast converging but more involved super-exponential algorithms for blind deconvolution in [399]. Shalvi and Weinstein have reviewed their blind deconvolution methods in [170]. Closely related algorithms were proposed earlier in [114, 457].
It is interesting to note that Shalvi and Weinstein's algorithm (19.14) can be derived by maximizing the absolute value of the kurtosis of the filtered (deconvolved) signal $z(t)$ under the constraint that the output signal $y(t)$ is temporally white [398, 351]. The temporal whiteness condition leads to the normalization constraint of the weight vector $\mathbf{w}(t)$ in (19.14). The corresponding criterion for standard ICA is familiar already from Chapter 8, where gradient algorithms similar to (19.14) have been discussed. Also Shalvi and Weinstein's super-exponential algorithm [399] is very similar to the cumulant-based FastICA as introduced in Section 8.2.3. The connection between blind deconvolution and ICA is discussed in more detail in the next subsection.
Instead of cumulants, one can resort to higher-order spectra or polyspectra [319, 318]. They are defined as Fourier transforms of the cumulants quite similarly as the power spectrum is defined as a Fourier transform of the autocorrelation function (see Section 2.8.5). Polyspectra provide a basis for blind deconvolution and more generally identification of nonminimum-phase systems, because they preserve phase information of the observed signal. However, blind deconvolution methods based on higher-order spectra tend to be computationally more complex than Bussgang methods, and converge slowly [171]. Therefore, we shall not discuss them here. The interested reader can find more information on those methods in [170, 171, 315].
19.1.4 Blind deconvolution using linear ICA
In defining the blind deconvolution problem, the values of the original signal $s(t)$ were assumed to be independent for different $t$ and nongaussian. Therefore, the blind deconvolution problem is formally closely related to the standard ICA problem. In fact, one can define a vector
$$\tilde{\mathbf{s}}(t) = [s(t), s(t-1), \ldots, s(t-n+1)]^T \tag{19.16}$$
by collecting the $n$ last values of the source signal, and similarly define
$$\tilde{\mathbf{x}}(t) = [x(t), x(t-1), \ldots, x(t-n+1)]^T \tag{19.17}$$
Then $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{s}}$ are $n$-dimensional vectors, and the convolution (19.1) can be expressed for a finite number of values of the summation index $k$ as
$$\tilde{\mathbf{x}} = \mathbf{A}\tilde{\mathbf{s}} \tag{19.18}$$
where $\mathbf{A}$ is a matrix that contains the coefficients $a_k$ of the convolution filter as its rows, at different positions for each row. This is the classic matrix representation of a filter. This representation is not exact near the top and bottom rows, but for a sufficiently large $n$, it is good enough in practice.
From (19.18) we see that the blind deconvolution problem is actually (approximately) a special case of ICA. The components of $\tilde{\mathbf{s}}$ are independent, and the mixing is linear, so we get the standard linear ICA model.
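The matrix representation (19.18) can be checked numerically with a small sketch. The causal three-tap filter, the window length `n`, and the helper `delay_vector` are hypothetical choices for the example. For a causal filter only the bottom rows of the matrix are inexact, because they would need source values older than those contained in the window.

```python
import numpy as np

rng = np.random.default_rng(2)
a = np.array([1.0, -0.6, 0.2])          # hypothetical causal filter a_0, a_1, a_2
K, n, T = len(a), 8, 200
s = rng.laplace(size=T)                 # i.i.d. nongaussian source
x = np.convolve(s, a)[:T]               # convolution model (19.1)

def delay_vector(sig, t, n):
    """Return [sig(t), sig(t-1), ..., sig(t-n+1)] as in (19.16)-(19.17)."""
    return sig[t - n + 1 : t + 1][::-1]

# Row i of A holds the filter coefficients shifted i positions to the right.
A = np.zeros((n, n))
for i in range(n):
    for k in range(K):
        if i + k < n:
            A[i, i + k] = a[k]

t = 100
x_tilde = delay_vector(x, t, n)
s_tilde = delay_vector(s, t, n)
residual = x_tilde - A @ s_tilde
print(np.round(residual, 12))           # only the last K-1 entries can be nonzero
```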
In fact, the one-unit (deflationary) ICA algorithms in Chapter 8 can be directly used to perform blind deconvolution. As defined above, the inputs $\tilde{\mathbf{x}}(t)$ should then consist of sample sequences $x(t), x(t-1), \ldots, x(t-n+1)$ of the signal $x(t)$ to be deconvolved. Estimating just one "independent component", we obtain the original deconvolved signal $s(t)$. If several components are estimated, they correspond to translated versions of the original signal, so it is enough to estimate just one component.

19.2 BLIND SEPARATION OF CONVOLUTIVE MIXTURES
19.2.1 The convolutive BSS problem
In several practical applications of ICA, some kind of convolution takes place simultaneously with the linear mixing. For example, in the classic cocktail-party problem, or separation of speech signals recorded by a set of microphones, the speech signals do not arrive at the microphones at the same time. This is because sound travels through the air at a finite speed. Moreover, the microphones usually record echoes of the speakers' voices caused by reverberations from the walls of the room or other obstacles. These two phenomena can be modeled in terms of convolutive mixtures. Here we have not considered noise and other complications that often appear in practice; see Section 24.2 and [429, 430].
Blind source separation of convolutive mixtures is basically a combination of standard instantaneous linear blind source separation and blind deconvolution. In the convolutive mixture model, each element of the mixing matrix $\mathbf{A}$ in the model $\mathbf{x} = \mathbf{A}\mathbf{s}$ is a filter instead of a scalar. Written out for each mixture, the data model for convolutive mixtures is given by
$$x_i(t) = \sum_{j=1}^{n} \sum_{k} a_{ikj}\, s_j(t-k), \quad \text{for } i = 1, \ldots, n \tag{19.19}$$
This is a FIR filter model, where each FIR filter (for fixed indices $i$ and $j$) is defined by the coefficients $a_{ikj}$. Usually these coefficients are assumed to be time-independent constants, and the number of terms over which the convolution index $k$ runs is finite. Again, we observe only the mixtures $x_i(t)$, and both the independent source signals $s_i(t)$ and all the coefficients $a_{ikj}$ must be estimated.
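The data model (19.19) can be simulated directly. In the sketch below the function name `convolutive_mix`, the array layout `a[i, k, j]`, and the random filters are assumptions made for illustration; only causal filters with a finite number of taps are covered.

```python
import numpy as np

def convolutive_mix(s, a):
    """Apply (19.19): x_i(t) = sum_j sum_k a[i, k, j] * s_j(t - k).

    s has shape (n_sources, T); a has shape (n_mixtures, K, n_sources),
    a hypothetical layout chosen to mirror the index order a_{ikj}.
    """
    n_mix, K, n_src = a.shape
    T = s.shape[1]
    x = np.zeros((n_mix, T))
    for i in range(n_mix):
        for j in range(n_src):
            # FIR filter from source j to mixture i, truncated to T samples.
            x[i] += np.convolve(s[j], a[i, :, j])[:T]
    return x

rng = np.random.default_rng(4)
s = rng.laplace(size=(2, 500))                       # two nongaussian sources
decay = np.array([1.0, 0.5, 0.25])[None, :, None]    # hypothetical tap decay
a = rng.normal(size=(2, 3, 2)) * decay
x = convolutive_mix(s, a)
```

With a single tap per filter (`K = 1`) the function reduces to the instantaneous mixing model, which gives a convenient sanity check.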
To invert the convolutive mixtures (19.19), a set of similar FIR filters is typically used:
$$y_i(t) = \sum_{j=1}^{n} \sum_{k} w_{ikj}\, x_j(t-k), \quad \text{for } i = 1, \ldots, n \tag{19.20}$$
The output signals $y_1(t), \ldots, y_n(t)$ of the separating system are estimates of the source signals $s_1(t), \ldots, s_n(t)$ at discrete time $t$. The $w_{ikj}$ give the coefficients of the FIR filters of the separating system. The FIR filters used in separation can be either causal or noncausal depending on the method. The number of coefficients in each separating filter must sometimes be very large (hundreds or even thousands) for achieving sufficient inversion accuracy. Instead of the feedforward FIR structure, feedback (IIR type) filters have sometimes been used for separating convolutive mixtures; an example is presented in Section 23.4. See [430] for a discussion of mutual advantages and drawbacks of these filter structures in convolutive BSS.
At this point, it is useful to discuss relationships between the convolutive BSS problem and the standard ICA problem on a general level [430]. Recall first that in standard linear ICA and BSS, the indeterminacies are the scaling and the order of the estimated independent components or sources (and their sign, which can be included in scaling). With convolutive mixtures the indeterminacies are more severe: the order of the estimated sources $y_i(t)$ is still arbitrary, but scaling is replaced by (arbitrary) filtering. In practice, many of the methods proposed for convolutive mixtures filter the estimated sources $y_i(t)$ so that they are temporally uncorrelated (white). This follows from the strong independence condition that most of the blind separation methods introduced for convolutive mixtures try to realize as well as possible. The temporal whitening effect causes some inevitable distortion if the original source signals themselves are not temporally white. Sometimes it is possible to get rid of this by using a feedback filter structure; see [430].
Denote by
$$\mathbf{y}(t) = [y_1(t), y_2(t), \ldots, y_n(t)]^T \tag{19.21}$$
the vector of estimated source signals. They are both temporally and spatially white if
$$\mathrm{E}\{\mathbf{y}(t)\, \mathbf{y}^H(t-k)\} = \delta_{k0}\, \mathbf{I} \tag{19.22}$$
where $H$ denotes complex conjugate transpose (Hermitian operator) and $\delta_{k0}$ equals 1 for $k = 0$ and 0 otherwise. The standard spatial whitening condition $\mathrm{E}\{\mathbf{y}(t)\mathbf{y}^H(t)\} = \mathbf{I}$ is obtained as a special case when $k = 0$. The condition (19.22) is required to hold for all the lag values $k$ for which the separating filters (19.20) are defined. Douglas and Cichocki have introduced a simple adaptive algorithm for whitening convolutive mixtures in [120]. Lambert and Nikias have given an efficient temporal whitening method based on FIR matrix algebra and Fourier transforms in [257].
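Whether a set of signals is temporally and spatially white can be checked empirically by estimating lagged covariance matrices and comparing them with the identity at lag zero and with zero elsewhere. The sketch below is only a diagnostic, not a whitening algorithm; the function name `lagged_covariance`, the test signal, and the largest lag are assumptions made for the example.

```python
import numpy as np

def lagged_covariance(y, k):
    """Sample estimate of E{y(t) y^H(t - k)} for y of shape (n, T), k >= 0."""
    T = y.shape[1]
    return y[:, k:] @ y[:, : T - k].conj().T / (T - k)

rng = np.random.default_rng(5)
n, T, max_lag = 3, 20000, 4
# Hypothetical test signal: unit-variance complex white noise in every channel.
y = (rng.normal(size=(n, T)) + 1j * rng.normal(size=(n, T))) / np.sqrt(2)

# Largest entrywise deviation from the ideal spatiotemporal whiteness pattern.
deviation = max(
    np.max(np.abs(lagged_covariance(y, k) - (k == 0) * np.eye(n)))
    for k in range(max_lag + 1)
)
print(f"largest deviation from whiteness: {deviation:.3f}")
```

For finite data the deviation is never exactly zero; it shrinks roughly in proportion to the inverse square root of the number of samples.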
Standard ICA makes use of spatial statistics of the mixtures to learn a spatial blind separation system. In general, higher-order spatial statistics are needed for achieving this goal. However, if the source signals are temporally correlated, second-order spatiotemporal statistics are sufficient for blind separation under some conditions, as shown in [424] and discussed in Chapter 18. In contrast, blind separation of convolutive mixtures must utilize spatiotemporal statistics of the mixtures to learn a spatiotemporal separation system.
Stationarity of the sources has a decisive role in separating convolutive mixtures, too. If the sources have nonstationary variances, second-order spatiotemporal statistics are enough as briefly discussed in [359, 456].
For convolutive mixtures, stationary sources require higher than second-order statistics, just as basic ICA, but the following simplification is possible [430]. Spatiotemporal second-order statistics can be used to decorrelate the mixtures. This step returns the problem to that of conventional ICA, which again requires higher-order spatial statistics. Examples of such approaches can be found in [78, 108, 156]. This simplification is not very widely used, however.
Alternatively, one can resort to higher-order spatiotemporal statistics from the beginning for sources that cannot be assumed nonstationary. This approach has been adopted in many papers, and it will be discussed briefly later in this chapter.
19.2.2 Reformulation as ordinary ICA
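The simplest approach to blind separation of convolutive mixtures is to reformulate the problem using the standard linear ICA model. This is possible because blind deconvolution can be formulated as a special case of ICA, as we saw in (19.18).
Define now a vector $\tilde{\mathbf{s}}$ by concatenating $M$ time-delayed versions of every source signal:
$$\tilde{\mathbf{s}}(t) = [s_1(t), s_1(t-1), \ldots, s_1(t-M+1), s_2(t), s_2(t-1), \ldots, s_2(t-M+1), \ldots, s_n(t), s_n(t-1), \ldots, s_n(t-M+1)]^T \tag{19.23}$$
and define similarly a vector
$$\tilde{\mathbf{x}}(t) = [x_1(t), x_1(t-1), \ldots, x_1(t-M+1), x_2(t), x_2(t-1), \ldots, x_2(t-M+1), \ldots, x_n(t), x_n(t-1), \ldots, x_n(t-M+1)]^T \tag{19.24}$$
Using these definitions, the convolutive mixing model (19.19) can be written
$$\tilde{\mathbf{x}} = \tilde{\mathbf{A}}\tilde{\mathbf{s}} \tag{19.25}$$
where $\tilde{\mathbf{A}}$ is a matrix containing the coefficients $a_{ikj}$ of the FIR filters in a suitable order. Now one can estimate the convolutive BSS model by applying ordinary ICA methods to the standard linear ICA model (19.25).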
Deflationary estimation is treated in [108, 401, 432]. These methods are based on
finding maxima of the absolute value of kurtosis, thus generalizing the kurtosis-based
methods of Chapter 8. Other examples of approaches in which the convolutive BSS
problem has been solved using conventional ICA can be found in [156, 292].
A problem with the formulation (19.25) is that when the original data vector $\mathbf{x}$ is expanded to $\tilde{\mathbf{x}}$, its dimension grows very much. The number $M$ of time delays that needs to be taken into account depends on the application, but it is often tens or hundreds, and the dimension of model (19.25) grows with the same factor, to $Mn$. This may lead to prohibitively high dimensions. Therefore, depending on the application and the dimensions $n$ and $M$, this reformulation may or may not solve the convolutive BSS problem satisfactorily.
In blind deconvolution, this is not such a big problem because we have just one signal to begin with, and we only need to estimate one independent component, which is easier than estimating all of them. In convolutive BSS, however, we often need to estimate all the independent components, and their number is $Mn$ in the model (19.25). Thus the computations may be very burdensome, and the number of data points needed to estimate such a large number of parameters can be prohibitive in practical applications. This is especially true if we want to estimate the separating system adaptively, trying to track changes in the mixing system. Estimation should then be fast both in terms of computations and data collection time.
Regrettably, these remarks hold largely for other approaches proposed for blind separation of convolutive mixtures, too. A fundamental reason for the computational difficulties encountered with convolutive mixtures is the fact that the number of the unknown parameters in the model (19.19) is so large. If the filters have length $M$, it is $M$-fold compared with the respective instantaneous ICA model. This basic problem cannot be avoided in any way.
19.2.3 Natural gradient methods
In Chapter 9, the well-known Bell-Sejnowski and natural gradient algorithms were derived from the maximum likelihood principle. This principle was shown to be quite closely related to the maximization of the output entropy, which is often called the information maximization (infomax) principle; see Chapter 9. These ICA estimation criteria and algorithms can be extended to convolutive mixtures in a straightforward way. Early results and derivations of algorithms can be found in [13, 79, 121, 268, 363, 426, 427]. An application to CDMA communication signals will be described later in Chapter 23.
Amari, Cichocki, and Douglas presented an elegant and systematic approach for deriving natural gradient type algorithms for blind separation of convolutive mixtures and related tasks. It is based on algebraic equivalences and their nice properties. Their work has been summarized in [11], where rather general natural gradient learning rules have been given for complex-valued data both in the time domain and the $z$-transform domain. The derived natural gradient rules can be implemented in either batch, on-line, or block on-line forms [11]. In the batch form, one can use a noncausal FIR filter structure, while the on-line algorithms require the filters to be causal.
In the following, we present an efficient natural gradient type algorithm [10, 13], described also in [430], for blind separation of convolutive mixtures. It can be implemented on-line using a feedforward (FIR) filter structure in the time domain. The algorithm is given for complex-valued data.
The separating filters are represented as a sequence of coefficient matrices $\mathbf{W}_k(t)$ at discrete time $t$ and lag (delay) $k$. The separated output with this notation and causal FIR filters is
$$\mathbf{y}(t) = \sum_{k=0}^{L} \mathbf{W}_k(t)\, \mathbf{x}(t-k) \tag{19.26}$$
Here $\mathbf{x}(t-k)$ is the $n$-dimensional data vector containing the values of the $n$ mixtures (19.19) at the time instant $t-k$, and $\mathbf{y}(t)$ is the output vector whose components are estimates of the source signals $s_i(t)$, $i = 1, \ldots, m$. Hence $\mathbf{y}(t)$ has $m$ components, with $m \leq n$.
This matrix notation allows the derivation of a separation algorithm using the natural gradient approach. The resulting weight matrix update algorithm, which takes into account the causal approximation of a doubly infinite filter by delaying the output by $L$ samples, is as follows [13, 430]:
$$\Delta\mathbf{W}_k(t) \propto \mathbf{W}_k(t) - \mathbf{g}(\mathbf{y}(t-L))\, \mathbf{v}^H(t-k), \quad k = 0, \ldots, L \tag{19.27}$$
Quite similarly as in Chapter 9, each component of the vector $\mathbf{g}$ applies the nonlinearity $g_i(\cdot)$ to the respective component of the argument vector. The optimal nonlinearity $g_i(\cdot)$ is the negative score function $g_i = -p_i'/p_i$ of the distribution $p_i$ of the source $s_i$. In (19.27), $\mathbf{v}(t)$ is reverse-filtered output computed using the $L$ latest samples backwards from the current sample:
$$\mathbf{v}(t) = \sum_{q=0}^{L} \mathbf{W}_{L-q}^H(t)\, \mathbf{y}(t-q) \tag{19.28}$$
The vector $\mathbf{v}$ needs to be stored for the latest $L$ samples to compute the update $\Delta\mathbf{W}_k(t)$ of the weight matrix $\mathbf{W}_k(t)$ for all lags $k = 0, \ldots, L$. The algorithm has rather modest computational and memory requirements.
Note that if $L = 0$, the formulas (19.27) and (19.28) reduce to the standard natural gradient algorithm. In [13], the authors present a speech separation experiment where about 50 seconds of mixed data were needed to achieve about 10-15 dB enhancement in the quality of separated signals.
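The on-line update can be sketched in a few lines, assuming the output is formed as a causal matrix FIR filter over lags $0, \ldots, L$, the reverse-filtered vector is $\mathbf{v}(t) = \sum_{q} \mathbf{W}_{L-q}^H \mathbf{y}(t-q)$, and each lag matrix is moved along $\mathbf{W}_k - \mathbf{g}(\mathbf{y}(t-L))\mathbf{v}^H(t-k)$. The function name, the identity initialization of the zero-lag matrix, the step size, and the choice of `tanh` as nonlinearity (a common choice for supergaussian sources) are assumptions for illustration, not the settings used in [13].

```python
import numpy as np

def natural_gradient_convolutive(x, L, mu, g, m=None):
    """Hypothetical on-line sketch of a convolutive natural gradient rule.

    x: mixtures of shape (n, T); L: largest lag; mu: step size;
    g: elementwise nonlinearity; m: number of outputs (defaults to n).
    Returns the lag matrices W of shape (L + 1, m, n) and the outputs y.
    """
    n, T = x.shape
    m = n if m is None else m
    W = np.zeros((L + 1, m, n), dtype=x.dtype)
    W[0] = np.eye(m, n)                       # assumed initialization
    y = np.zeros((m, T), dtype=x.dtype)
    v = np.zeros((n, T), dtype=x.dtype)
    for t in range(L, T):
        # Separated output: causal matrix FIR filter over lags 0..L.
        y[:, t] = sum(W[k] @ x[:, t - k] for k in range(L + 1))
        # Reverse-filtered output.
        v[:, t] = sum(W[L - q].conj().T @ y[:, t - q] for q in range(L + 1))
        gy = g(y[:, t - L])
        for k in range(L + 1):
            # Update of lag k: W_k - g(y(t-L)) v^H(t-k).
            W[k] += mu * (W[k] - np.outer(gy, v[:, t - k].conj()))
    return W, y

x_demo = np.random.default_rng(6).laplace(size=(2, 300))
W_demo, y_demo = natural_gradient_convolutive(x_demo, L=2, mu=1e-3, g=np.tanh)
```

With `L = 0` a single step of this function coincides with the standard natural gradient update $\mathbf{W} + \mu(\mathbf{I} - \mathbf{g}(\mathbf{y})\mathbf{y}^H)\mathbf{W}$, which is a useful consistency check on the indexing.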
19.2.4 Fourier transform methods
Fourier transform techniques are useful in dealing with convolutive mixtures, because convolutions become products between Fourier transforms in the frequency domain.
It was shown in Chapter 13 that filtering the data is allowed before performing ICA, since filtering does not change the mixing matrix. Using the same proof, one can see that applying Fourier transform to the data does not change the mixing matrix either. Thus we can apply Fourier transform to both sides of Eq. (19.19). Denoting by $X_i(\omega)$, $S_i(\omega)$, and $A_{ij}(\omega)$ the Fourier transforms of $x_i(t)$, $s_i(t)$, and $a_{ij}(t)$ (the filter with coefficients $a_{ikj}$), respectively, we obtain
$$X_i(\omega) = \sum_{j=1}^{n} A_{ij}(\omega)\, S_j(\omega), \quad \text{for } i = 1, \ldots, n \tag{19.29}$$
This shows that the convolutive mixture model (19.19) is transformed into an instantaneous linear ICA model in the frequency domain. The price that we have to pay for this is that the mixing matrix is now a function of the angular frequency $\omega$ while in the standard ICA/BSS problem it is constant.
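That a convolutive mixture becomes an instantaneous mixture at each frequency can be verified numerically with the discrete Fourier transform. In this sketch the array layout `a[i, k, j]`, the signal sizes, and the random filters are assumptions; zero-padding to length `T + K - 1` makes the linear convolution agree exactly with the product of transforms.

```python
import numpy as np

rng = np.random.default_rng(7)
n, K, T = 2, 4, 64
s = rng.laplace(size=(n, T))            # sources s_j(t)
a = rng.normal(size=(n, K, n))          # hypothetical filters a[i, k, j]

# Time domain: full linear convolution, length T + K - 1.
nfft = T + K - 1
x_full = np.zeros((n, nfft))
for i in range(n):
    for j in range(n):
        x_full[i] += np.convolve(s[j], a[i, :, j])

# Frequency domain: one mixing matrix A[:, f, :] per frequency bin f.
S = np.fft.fft(s, nfft, axis=1)         # shape (n, nfft)
A = np.fft.fft(a, nfft, axis=1)         # shape (n, nfft, n)
X = np.fft.fft(x_full, axis=1)

# Instantaneous mixing in every bin: X_i(f) = sum_j A_ij(f) S_j(f).
X_model = np.einsum('ifj,jf->if', A, S)
print(np.max(np.abs(X - X_model)))
```

Each bin `f` has its own complex mixing matrix `A[:, f, :]`, which is exactly the frequency dependence described in the text.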
To utilize standard ICA in practice in the Fourier domain, one can take short-time Fourier transforms of the data, instead of the global transform. This means that the data is windowed, usually by a smooth windowing function such as a gaussian envelope, and the Fourier transform is applied separately to each data window. The dependency of $X_i(\omega)$ on $\omega$ can be simplified by dividing the values of $\omega$ into a certain number of frequency bins (intervals). For every frequency bin, we have then a number of observations of $X_i(\omega)$, and we can estimate the ICA model separately for each frequency bin. Note that the ICs and the mixing matrix are now complex-valued. See Section 20.3 on how to estimate the ICA model with complex-valued data.
The problem with this Fourier approach is the indeterminacy of permutation and sign that is ubiquitous in ICA. The permutation and signs of the sources are usually different in each frequency interval. For reconstructing a source signal $s_i(t)$ in the time domain, we need all its frequency components. Hence we need some method for choosing which source signals in different frequency intervals belong together. To this end, various continuity criteria have been introduced by many authors; see [15, 59, 216, 356, 397, 405, 406, 430].
Another major group of Fourier methods developed for convolutive mixtures avoids the preceding problem by performing the actual separation in the time domain. Only selected parts of the separation procedure are carried out in the frequency domain. Separating filters may be easier to learn in the frequency domain because components are now orthogonal and do not depend on each other like the time domain coefficients [21, 430]. Examples of methods that apply their separation criterion in the time domain but do the rest in the frequency domain are reported in [21, 257]. A frequency domain representation of the filters is learned, and they are also applied in the frequency domain. The final time-domain result is reconstructed using for example the overlap-save technique of digital signal processing (see [339]). Thus, the permutation and scaling problem does not exist.
The work by Lambert and Nikias deserves special attention; see the review in [257]. They have introduced methods that utilize the Bussgang family of cost functions and standard adaptive filtering algorithms in blind separation of convolutive mixtures. FIR matrix algebra introduced in [256] is employed as an efficient tool for systematic development of methods. Lambert and Nikias [257] have considered three general classes of Bussgang type cost functions, namely blind least mean-squares (LMS), Infomax, and direct Bussgang costs. Most of these costs can be implemented in either the time or frequency domain, or in the batch or continuously adaptive modes. Lambert and Nikias have introduced several efficient and practical algorithms for blind separation of convolutive mixtures having different computational complexities and convergence speeds. For example, block-oriented frequency domain implementations can be used to perform robust blind separation on convolutive mixtures which have hundreds or thousands of time delays [257].
19.2.5 Spatiotemporal decorrelation methods
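Consider first the noisy instantaneous linear ICA model
$$\mathbf{x}(t) = \mathbf{A}\mathbf{s}(t) + \mathbf{n}(t) \tag{19.30}$$
which has been discussed in more detail in Chapter 15. Making the standard realistic assumption that the additive noise $\mathbf{n}(t)$ is independent of the source signals $\mathbf{s}(t)$, the spatial covariance matrix $\mathbf{C}_x(t)$ of $\mathbf{x}(t)$ at time $t$ is
$$\mathbf{C}_x(t) = \mathbf{A}\mathbf{C}_s(t)\mathbf{A}^T + \mathbf{C}_n(t) \tag{19.31}$$
where $\mathbf{C}_s(t)$ and $\mathbf{C}_n(t)$ are respectively the covariance matrices of the sources and the noise at time $t$. If the sources $\mathbf{s}(t)$ are nonstationary with respect to their covariances, then in general $\mathbf{C}_s(t) \neq \mathbf{C}_s(t+\tau)$ for $\tau \neq 0$. This allows us to write multiple conditions for different choices of $\tau$ to solve for $\mathbf{A}$, $\mathbf{C}_s(t)$, and $\mathbf{C}_n(t)$. Note that the covariance matrices $\mathbf{C}_s(t)$ and $\mathbf{C}_n(t)$ are diagonal. The diagonality of $\mathbf{C}_s(t)$ follows from the independence of the sources, and $\mathbf{C}_n(t)$ can be taken diagonal because the components of the noise vector $\mathbf{n}(t)$ are assumed to be uncorrelated.
We can also look at cross-covariance matrices $\mathbf{C}_x(t, t+\tau) = \mathrm{E}\{\mathbf{x}(t)\mathbf{x}(t+\tau)^T\}$ over time. This approach has been mentioned in the context of convolutive mixtures in [456], and it can be used with instantaneous mixtures as described in Chapter 18.
For convolutive mixtures, we can write in frequency domain for sample averages [359, 356]
$$\bar{\mathbf{C}}_x(\omega, t) = \mathbf{A}(\omega)\,\mathbf{C}_s(\omega, t)\,\mathbf{A}^H(\omega) + \mathbf{C}_n(\omega, t) \tag{19.32}$$
where $\bar{\mathbf{C}}_x$ is the averaged spatial covariance matrix. If $\mathbf{s}$ is nonstationary, one can again write multiple linearly independent equations for different time lags and solve for unknowns or find LMS estimates of them by diagonalizing a number of matrices in the frequency domain [123, 359, 356].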
If the mixing system is minimum phase, decorrelation alone can provide a unique
solution, and the nonstationarity of the signals is not needed [55, 280, 402]. Many
methods have been proposed for this case, for example, in [113, 120, 149, 281,
280, 296, 389, 390, 456]. More references are given in [430]. However, such
decorrelating methods cannot necessarily be applied to practical communications
and audio separation problems, because the mixtures encountered there are often not
minimum-phase. For example, in the cocktail-party problem the system is minimum
phase if each speaker is closest to his or her “own” microphone, and otherwise it is not [430].
19.2.6 Other methods for convolutive mixtures
Many methods proposed for blind separation of convolutive mixtures are extensions
of earlier methods originally designed either for the standard linear instantaneous BSS
(ICA) problem or for the blind deconvolution problem. We have already discussed
some extensions of the natural gradient method in Section 19.2.3 and Bussgang
methods in Section 19.2.4. Bussgang methods have been generalized for convolutive
mixtures also in [351]. Matsuoka’s method [296] for BSS of nonstationary sources is
modified for convolutive mixtures in [239] using natural gradient learning. Nguyen
Thi and Jutten [420] have generalized the seminal Hérault-Jutten algorithm described
in Chapter 12 to BSS of convolutive mixtures. Their approach has also been studied
in [74, 101]. A state-space approach for blind separation of convolutive mixtures has
been studied in [479].
There exist numerous approaches to convolutive BSS which are based on crite-
ria utilizing directly spatiotemporal higher-order statistics. Methods based on the
maximization of the sum of the squares of the kurtoses to estimate the whole sep-
arating system were introduced in [90], and further developed in [307]. Other
methods based on spatiotemporal higher-order statistics have been presented in
[1, 124, 145, 155, 218, 217, 400, 416, 422, 434, 433, 470, 471, 474]. More ref-
erences can be found in [91, 430].
19.3 CONCLUDING REMARKS
Historically, many ideas used in ICA were originally developed in the context of
blind deconvolution, which is an older topic of research than ICA. Later, it was
found that many methods developed for blind deconvolution can be directly applied
to ICA, and vice versa. Blind deconvolution can thus be considered an intellectual
ancestor of ICA. For example, Donoho proposed in [114] that the deconvolution filter
(19.2) could be found by finding the filter whose output is maximally nongaussian.
This is the same principle as used for ICA in Chapter 8. Douglas and Haykin have
explored relationships between blind deconvolution and blind source separation in
[122]. Elsewhere, it has been pointed out that Bussgang criteria are closely related
to nonlinear PCA criteria [236] and several other ICA methods [11].
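The principle attributed to Donoho above can be made concrete with a small numerical experiment: convolving an i.i.d. nongaussian signal with a filter pushes it toward gaussianity, so a deconvolution filter can be sought by making its output as nongaussian as possible. The sketch below only demonstrates the first half of that statement, using excess kurtosis as the nongaussianity measure. The Laplacian source and the filter coefficients are arbitrary choices for illustration.

```python
import numpy as np

def excess_kurtosis(y):
    """Sample excess kurtosis (zero for a gaussian signal)."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

rng = np.random.default_rng(0)
s = rng.laplace(size=50000)        # i.i.d. supergaussian source
a = np.array([1.0, 0.8, 0.5])      # hypothetical convolving filter
x = np.convolve(s, a, mode="valid")  # observed, convolved signal

k_source = excess_kurtosis(s)
k_observed = excess_kurtosis(x)
print(k_source, k_observed)  # the observed signal is closer to gaussian
```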
In this chapter, we have briefly discussed Bussgang, cumulant, and ICA-based
methods for blind deconvolution. One further prominent class of blind deconvolution
and separation methods for convolutive mixtures consists of subspace approaches
[143, 171, 311, 315, 425]. They can be used only if the number of output signals
(observed mixtures) strictly exceeds the number of sources. Subspace methods
resort to second-order statistics and fractional sampling, and they are applicable to
cyclostationary source signals which are commonplace in communications [91].
General references on blind deconvolution are [170, 171, 174, 315]. Blind deconvolution
and separation methods for convolutive mixtures have often been developed
in the context of blind channel estimation and identification problems in communications.
These topics are beyond the scope of our book, but the interested reader can
find useful review chapters on blind methods in communications in [143, 144].
In the second half of this chapter, we have considered separation of convolutive
mixtures. The mixing process then takes place both temporally and spatially, which
complicates the blind separation problem considerably. Numerous methods for
handling this problem have been proposed, but it is somewhat difficult to assess
their usefulness, because comparison studies are still lacking. The large number of
parameters is a problem, making it difficult to apply convolutive BSS methods to large
scale problems. Other practical problems in audio and communications applications
have been discussed in Torkkola’s tutorial review [430]. More information can be
found in the given references and recent reviews [257, 425, 429, 430] on convolutive
BSS.
Appendix Discrete-time filters and the $z$-transform
In this appendix, we briefly discuss certain basic concepts and results of discrete-time signal
processing which are needed in this chapter.
Linear causal discrete-time filters [169, 339] can generally be described by the difference
equation
$$y(n) + \sum_{i=1}^{M} \alpha_i y(n-i) = x(n) + \sum_{i=1}^{N} \beta_i x(n-i) \qquad (A.1)$$
which is mathematically equivalent to the ARMA model (2.127) in Section 2.8.6. In (A.1),
$n$ is discrete time, $x(n)$ is the input signal of the filter, and $y(n)$ its output at time instant
$n$. Causality means that in (A.1) there are no quantities that depend on future time instants
$n+j$, $j > 0$, making it possible to compute the filter output $y(n)$ in real time. The constant
coefficients $\beta_i$, $i = 1, \ldots, N$, define the FIR (Finite Impulse Response) part of the filter (A.1),
having the order $N$. Respectively, the coefficients $\alpha_i$, $i = 1, \ldots, M$, define the IIR (Infinite
Impulse Response) part of the filter (A.1) with the order $M$.
If $M = 0$, (A.1) defines a pure FIR filter, and if $N = 0$, a pure IIR filter results. Either
of these filter structures is typically used in separating convolutive mixtures. The FIR filter
is more popular, because it is always stable, which means that its output is bounded for
bounded input values $x(n-i)$ and coefficients $\beta_i$. On the other hand, an IIR filter can be unstable
because of its feedback (recurrent) structure.
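The difference equation can be evaluated sample by sample. The following minimal sketch takes (A.1) in the form $y(n) + \sum_{i=1}^{M} \alpha_i y(n-i) = x(n) + \sum_{i=1}^{N} \beta_i x(n-i)$ and assumes zero initial conditions. The function name `arma_filter` and the example coefficients are illustrative only.

```python
import numpy as np

def arma_filter(x, alpha, beta):
    """Causal filter: y(n) = x(n) + sum_i beta_i x(n-i) - sum_i alpha_i y(n-i).

    alpha holds the IIR (feedback) coefficients alpha_1..alpha_M and
    beta the FIR coefficients beta_1..beta_N; values before n = 0 are
    taken to be zero (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = x[n]
        for i, b in enumerate(beta, start=1):
            if n - i >= 0:
                acc += b * x[n - i]
        for i, a in enumerate(alpha, start=1):
            if n - i >= 0:
                acc -= a * y[n - i]
        y[n] = acc
    return y

impulse = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
fir_response = arma_filter(impulse, alpha=[], beta=[0.5, 0.25])  # M = 0
iir_response = arma_filter(impulse, alpha=[-0.5], beta=[])       # N = 0
print(fir_response)  # finite: vanishes after N + 1 samples
print(iir_response)  # infinite: decays geometrically but never ends
```

The pure FIR response is exactly zero after $N + 1$ samples, whereas the feedback term keeps the pure IIR response nonzero indefinitely, which is the source of the names.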
The stability and other properties of the discrete-time filter (A.1) can be analyzed
conveniently in terms of the $z$-transform [169, 339]. For a discrete-time real sequence $\{x(k)\}$, the
$z$-transform is defined as the series
$$X(z) = \sum_{k=-\infty}^{\infty} x(k) z^{-k} \qquad (A.2)$$
where $z$ is a complex variable with real and imaginary parts. For specifying the $z$-transform of
a sequence uniquely, one must also know its region of convergence.
The $z$-transform has several useful properties that follow from its definition. Of particular
importance in dealing with convolutive mixtures is the property that the $z$-transform of the
convolution sum
$$y(n) = \sum_{k} h_k x(n-k) \qquad (A.3)$$
is the product of the $z$-transforms of the sequences $\{h_k\}$ and $\{x(n)\}$:
$$Y(z) = H(z) X(z) \qquad (A.4)$$
The weights $h_k$ in (A.3) are called impulse response values, and the quantity $H(z) = Y(z)/X(z)$
is called the transfer function. The transfer function of the convolution sum (A.3) is
the $z$-transform of its impulse response sequence.
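The convolution property can be checked numerically for finite causal sequences, for which the $z$-transform $\sum_k x(k) z^{-k}$ is a finite sum. In the sketch below the helper `z_transform`, the two sequences, and the evaluation point `z0` are arbitrary choices made for illustration.

```python
import numpy as np

def z_transform(seq, z):
    """Evaluate sum_k seq[k] * z**(-k) for a finite causal sequence."""
    seq = np.asarray(seq, dtype=float)
    k = np.arange(len(seq), dtype=float)
    return np.sum(seq * np.power(z, -k))

h = [1.0, 0.5, 0.25]         # impulse response values h_k
x = [2.0, -1.0, 3.0, 0.5]    # input sequence
y = np.convolve(h, x)        # convolution sum y(n) = sum_k h_k x(n-k)

z0 = 0.7 + 0.4j              # any nonzero complex point will do
lhs = z_transform(y, z0)
rhs = z_transform(h, z0) * z_transform(x, z0)
print(lhs, rhs)  # the two values agree up to rounding
```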
The Fourier transform of a sequence is obtained from its $z$-transform as a special case by
constraining the variable $z$ to lie on the unit circle in the complex plane. This can be done by
setting
$$z = \exp(j\omega) = \cos\omega + j\sin\omega \qquad (A.5)$$
where $j$ is the imaginary unit and $\omega$ the angular frequency. The Fourier transform has
convolution and other properties similar to those of the $z$-transform [339].
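For a finite sequence, this relationship can be verified against the discrete Fourier transform. The sketch assumes the definition $X(z) = \sum_k x(k) z^{-k}$ and evaluates it at the $N$ equally spaced points $z = \exp(j\omega)$, $\omega = 2\pi m/N$, on the unit circle. The test sequence is arbitrary.

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.5])
N = len(x)
k = np.arange(N, dtype=float)

# Angular frequencies of the N-point DFT and the matching unit-circle points.
omega = 2 * np.pi * np.arange(N) / N
z = np.exp(1j * omega)

# X(z) = sum_k x(k) z^{-k}, evaluated at each point on the unit circle.
X_unit_circle = np.array([np.sum(x * zi ** (-k)) for zi in z])
X_fft = np.fft.fft(x)

print(np.allclose(X_unit_circle, X_fft))
```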
Applying the $z$-transform to both sides of Eq. (A.1) yields
$$A(z) Y(z) = B(z) X(z) \qquad (A.6)$$
where
$$A(z) = 1 + \sum_{k=1}^{M} \alpha_k z^{-k} \qquad (A.7)$$
is the $z$-transform of the coefficients $1, \alpha_1, \ldots, \alpha_M$, where the coefficient $\alpha_0 = 1$
corresponds to $y(n)$, and $Y(z)$ is the $z$-transform of the output sequence $y(n)$.
$B(z)$ and $X(z)$ are defined quite similarly as the $z$-transforms of the coefficients $1, \beta_1, \ldots, \beta_N$,
and the respective input signal sequence $x(n)$.
From (A.6), we get for the transfer function of the linear filter (A.1)
$$H(z) = \frac{Y(z)}{X(z)} = \frac{B(z)}{A(z)} \qquad (A.8)$$
Note that for a pure FIR filter, $A(z) = 1$, and for a pure IIR filter, $B(z) = 1$. The zeros of the
denominator polynomial $A(z)$ are called the poles of the transfer function (A.8), and the zeros
of the numerator $B(z)$ are called the zeros of (A.8). It can be shown (see for example [339]) that
the linear causal discrete-time filter (A.1) is stable if all the poles of the transfer function lie
inside the unit circle in the complex plane. This is also the stability condition for a pure IIR
filter.
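The pole condition is easy to test numerically. Taking $A(z) = 1 + \sum_{k=1}^{M} \alpha_k z^{-k}$ and multiplying by $z^M$ gives an ordinary polynomial whose roots are the poles, and the same applies to $B(z)$ and the zeros. The helper names below are hypothetical, and the sketch ignores any extra poles or zeros at $z = 0$ that arise when $M \neq N$, since these lie inside the unit circle anyway.

```python
import numpy as np

def poles_and_zeros(alpha, beta):
    """Roots of z^M A(z) and z^N B(z) for coefficient lists alpha, beta."""
    poles = np.roots(np.concatenate(([1.0], alpha)))
    zeros = np.roots(np.concatenate(([1.0], beta)))
    return poles, zeros

def is_stable(alpha):
    """True if every pole lies strictly inside the unit circle."""
    poles, _ = poles_and_zeros(alpha, [])
    return bool(np.all(np.abs(poles) < 1.0))

print(is_stable([-0.5]))       # pole at 0.5
print(is_stable([-1.5]))       # pole at 1.5
print(is_stable([-1.0, 0.5]))  # complex pole pair at 0.5 +/- 0.5j
print(is_stable([]))           # pure FIR filter: no poles to check
```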
From (A.8), $X(z) = G(z) Y(z)$, where the inverse filter $G(z)$ has the transfer function
$G(z) = A(z)/B(z)$. Hence, the inverse filter of a pure FIR filter is a pure IIR filter and vice
versa. Clearly, the general stability condition for the inverse filter $G(z)$ is that the zeros of
$B(z)$ (and hence the zeros of the filter $H(z)$) in (A.8) are inside the unit circle in the complex
plane. This is also the stability condition for the inverse of a pure FIR filter.
Generally, it is desirable that both the poles and the zeros of the transfer function (A.8)
lie inside the unit circle. Then both the filter and its inverse filter exist and are stable. Such
filters are called minimum phase filters. The minimum phase property is a necessity in many
methods developed for convolutive mixtures. It should be noted that a filter that has no stable
causal inverse may have a stable noncausal inverse, realized by a nonminimum-phase filter.
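The role of the minimum phase property can be seen in a first-order example. The sketch below filters a signal with the pure FIR filter $y(n) = x(n) + \beta_1 x(n-1)$ and then applies its causal IIR inverse $x(n) = y(n) - \beta_1 x(n-1)$. The function names and the two values of $\beta_1$ are assumptions made for illustration.

```python
import numpy as np

def fir(x, beta):
    """y(n) = x(n) + sum_i beta_i x(n-i), truncated to the input length."""
    return np.convolve(x, np.concatenate(([1.0], beta)))[:len(x)]

def causal_inverse_of_fir(y, beta):
    """x(n) = y(n) - sum_i beta_i x(n-i): the pure IIR inverse filter."""
    x = np.zeros(len(y))
    for n in range(len(y)):
        feedback = sum(b * x[n - i]
                       for i, b in enumerate(beta, start=1) if n - i >= 0)
        x[n] = y[n] - feedback
    return x

def is_minimum_phase_fir(beta):
    """True if all zeros of B(z) lie strictly inside the unit circle."""
    zeros = np.roots(np.concatenate(([1.0], beta)))
    return bool(np.all(np.abs(zeros) < 1.0))

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
beta_min = [0.5]     # zero at z = -0.5: minimum phase
beta_nonmin = [2.0]  # zero at z = -2: not minimum phase

recovered = causal_inverse_of_fir(fir(x, beta_min), beta_min)
impulse = np.zeros(20)
impulse[0] = 1.0
g_nonmin = causal_inverse_of_fir(impulse, beta_nonmin)

print(np.allclose(recovered, x))  # stable inverse recovers the input
print(abs(g_nonmin[-1]))          # inverse impulse response blows up
```

For the nonminimum-phase filter, the causal inverse has impulse response $(-2)^n$, which grows without bound. A stable inverse then has to be noncausal, as noted above.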
These matters are discussed much more thoroughly in many textbooks of digital signal
processing and related areas; see for example [339, 302, 169, 171].
