W. Kenneth Jenkins, et al. “Transform Domain Adaptive Filtering.”
2000 CRC Press LLC.
Transform Domain Adaptive Filtering
W. Kenneth Jenkins
University of Illinois, Urbana-Champaign
Daniel F. Marshall
MIT Lincoln Laboratory
22.1 LMS Adaptive Filter Theory
22.2 Orthogonalization and Power Normalization
22.3 Convergence of the Transform Domain Adaptive Filter
22.4 Discussion and Examples
22.5 Quasi-Newton Adaptive Algorithms
A Fast Quasi-Newton Algorithm • Examples
22.6 The 2-D Transform Domain Adaptive Filter
22.7 Block-Based Adaptive Filters
Comparison of the Constrained and Unconstrained Frequency Domain Block-LMS Adaptive Algorithms • Examples and Discussion
References
One of the earliest works on transform domain adaptive filtering was published in 1978 by Dentino
et al. [4], in which the concept of adaptive filtering in the frequency domain was proposed. Many
publications have since appeared that further develop the theory and expand the current
understanding of performance characteristics for this class of adaptive filters. In addition to the discrete
Fourier transform (DFT), other orthogonal transforms such as the discrete cosine transform (DCT)
and the Walsh Hadamard transform (WHT) can also be used effectively as a means to improve the
LMS algorithm without adding too much computational complexity. For this reason, the general
term transform domain adaptive filtering is used in the following discussion to mean that the input
signal is preprocessed by decomposing the input vector into orthogonal components, which are in
turn used as inputs to a parallel bank of simpler adaptive subfilters. With an orthogonal
transformation, the adaptation takes place in the transform domain, as it is possible to show that the adjustable
parameters are indeed related to an equivalent set of time domain filter coefficients by means of the
same transformation that is used for the real time processing [5, 14, 17].
A direct form FIR digital filter structure is shown in Fig. 22.1. The direct form requires N − 1
delays, N multiplications, and N − 1 additions for each output sample that is produced. The amount
of hardware (as well as power) required to implement the direct form structure depends on the degree
of hardware multiplexing that can be utilized within the speed demands of the application. A fully
parallel implementation consisting of N delay registers, N multipliers, and a tree of two-input adders
would be needed for very high-frequency applications. At the opposite end of the performance
spectrum, a sequential implementation consisting of a length N delay line and a single time multiplexed
multiplier and accumulation adder would provide the cheapest (and slowest) implementation. This
latter structure would be characteristic of a filter that is implemented in software on one of the many
commercially available DSP chips.
© 1999 by CRC Press LLC
FIGURE 22.1: The direct form adaptive filter structure.
Regardless of the hardware complexity that results from a particular implementation, the
computational complexity of the filter is determined by the requirements of the algorithm and, as such,
remains invariant with respect to different hardware structures. In particular, the computational
complexity of the direct form FIR filter is O[N], since N multiplications and (N − 1) additions must
be performed at each iteration. When designing an adaptive filter, it seems reasonable to seek an
adaptive algorithm whose order of complexity is no greater than the order of complexity of the basic
filter structure itself. This goal is achieved by the LMS algorithm, which is the major contributing
factor to the enormous success of that algorithm. Extending this principle for 2-D adaptive filters
implies that desirable 2-D adaptive algorithms have an order of complexity of $O[N^2]$, since a 2-D
FIR direct form filter has $O[N^2]$ complexity inherent in its basic structure [11, 21].
The transform domain adaptive filter is a generalization of the LMS FIR structure, in which a linear
transformation is performed on the input signal and each transformed “channel” is power normalized
to improve the convergence rate of the adaptation process. The linear transform is characterized
throughout the following discussions as a sliding window operator that consists of a transformation
matrix multiplying an input vector [14]. At each iteration n the input vector includes one new input
sample x(n), and N − 1 past input samples x(n − k), k = 1, ..., N − 1. As the window slides
forward sample by sample, filtered outputs are produced continuously at each value of the index n.
Since the input transformation is represented by a matrix-vector product, it might appear that
the computational complexity of the transform domain filter is at least $O[N^2]$. However, many
transformations can be implemented with fast algorithms that have complexities less than $O[N^2]$.
For example, the discrete Fourier transform can be implemented by the FFT algorithm, resulting in
a complexity of $O[N \log_2 N]$ per iteration. Some transformations can be implemented recursively
in a bank of parallel filters, resulting in a net complexity of O[N] per iteration. The main point to
be made here is that the complexity of the transform domain filter typically falls between O[N] and
$O[N^2]$, with the actual complexity depending on the specific algorithm that is used to compute the
sliding window transform operator [17].
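As an illustration of the recursive O[N] case, the sliding window DFT can be updated from its previous value with one complex multiplication per bin. The sketch below is only one possible realization, not taken from the chapter: the function name `sliding_dft` is hypothetical, NumPy is assumed, and the window is held in chronological order (oldest sample first), which differs from the reversed ordering of the input vector used in the text.

```python
import numpy as np

def sliding_dft(x, N):
    """Yield the N-point DFT of the most recent N samples at each time step.

    Uses the recursion X_k(n) = exp(j*2*pi*k/N) * (X_k(n-1) - x(n-N) + x(n)),
    which costs O(N) per new sample instead of recomputing a full transform.
    """
    twiddle = np.exp(2j * np.pi * np.arange(N) / N)  # one rotation factor per bin
    X = np.zeros(N, dtype=complex)                   # DFT of an all-zero window
    window = np.zeros(N)                             # circular buffer of past samples
    for n, sample in enumerate(x):
        oldest = window[n % N]                       # x(n - N); zero during start-up
        window[n % N] = sample
        X = twiddle * (X - oldest + sample)
        yield X.copy()

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
N = 8
outputs = list(sliding_dft(x, N))
# Compare the last recursive result against a direct FFT of the last N samples.
direct = np.fft.fft(x[-N:])
max_error = np.max(np.abs(outputs[-1] - direct))
```

In finite precision the recursion accumulates rounding error over long runs, so a practical design might periodically recompute the transform directly; that refinement is omitted here.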
22.1 LMS Adaptive Filter Theory
The LMS algorithm is derived as an approximation to the steepest descent optimization strategy. The
fact that the field of adaptive signal processing is based on an elementary principle from optimization
theory suggests that more advanced adaptive algorithms can be developed by incorporating other
results from the field of optimization [22]. This point of view recurs throughout this discussion, as
concepts are borrowed from the field of optimization and modified for adaptive filtering as needed.
In particular, one of the borrowed ideas that appears later is the quasi-Newton optimization strategy.
It will be shown that transform domain adaptive filtering algorithms are closely related to quasi-
Newton algorithms, but have computational complexity that is closer to the simple requirements of
the LMS algorithm.
For a length N FIR filter with the input expressed as a column vector
$x(n) = [x(n), x(n-1), \ldots, x(n-N+1)]^T$, the filter output y(n) is easily expressed as
$$ y(n) = w^T(n)\, x(n) , \tag{22.1} $$
where $w(n) = [w_0(n), w_1(n), \ldots, w_{N-1}(n)]^T$ is the time varying vector of filter coefficients (tap
weights), and the superscript “T” denotes vector transpose. The output error is formed as the
difference between the filter output and a training signal d(n), i.e., e(n) = d(n) − y(n). Strategies for
obtaining an appropriate d(n) vary from one application to another. In many cases the availability
of a suitable training signal determines whether an adaptive filtering solution will be successful
in a particular application. The ideal cost function is defined by the mean squared error (MSE)
criterion, $E[|e(n)|^2]$. The LMS algorithm is derived by approximating the ideal cost function by the
instantaneous squared error, resulting in $J_{LMS}(n) = |e(n)|^2$. While the LMS seems to make a rather
crude approximation at the very beginning, the approximation results in an unbiased estimator.
In many applications the LMS algorithm is quite robust and is able to converge rapidly to a small
neighborhood of the optimum Wiener solution.
The steepest descent optimization strategy is given by
$$ w(n+1) = w(n) - \mu \nabla_{E[|e|^2]}(n) , \tag{22.2} $$
where $\nabla_{E[|e|^2]}(n)$ is the gradient of the cost function with respect to the coefficient vector w(n). When
the gradient is formed using the LMS cost function $J_{LMS}(n) = |e(n)|^2$, the conventional LMS results:
$$ w(n+1) = w(n) + \mu e(n) x(n) , \qquad e(n) = d(n) - y(n) , \tag{22.3} $$
and
$$ y(n) = x(n)^T w(n) . $$
(Note: Many sources include a “2” before the µ factor in Eq. (22.3) because this factor arises during
the derivation of (22.3) from (22.2). In this discussion we assume this factor is absorbed into the µ,
so it will not appear explicitly.) Since the LMS algorithm is treated in considerable detail in other
sections of this book, we will not present any further derivation or analysis of it here. However, the
following observations will be useful when other algorithms are compared to the LMS as a baseline
design [2, 3, 6, 8].
1. Assume that all of the signals and filter variables are real-valued. The filter itself requires
N multiplications and N − 1 additions to produce y(n) at each value of n. The coefficient
update algorithm requires 2N multiplications and N additions, resulting in a total
computational burden of 3N multiplications and 2N − 1 additions per iteration. Since
N is generally much larger than the factor of three, the order of complexity of the LMS
algorithm is O[N].
2. The cost function given for the LMS algorithm is a simplified form of the one used for
the RLS algorithm. This implies that the LMS algorithm is a simplified version of the RLS
algorithm, where averages are replaced by single instantaneous terms.
3. The (power normalized) LMS algorithm is also a simplified form of the transform domain
adaptive filter, which results by setting the transform matrix equal to the identity matrix.
4. The LMS algorithm is also a simplified form of the Gauss-Newton optimization strategy,
which introduces second order statistics (the input autocorrelation function) to accelerate
the rate of convergence. In order to obtain the LMS algorithm from the Gauss-Newton
algorithm, two approximations must be made: (i) the gradient must be approximated by
the gradient of the instantaneous squared error, and (ii) the inverse of the input
autocorrelation matrix must be crudely approximated by the identity matrix.
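The LMS recursion of Eqs. (22.1) to (22.3) that serves as the baseline for these observations can be sketched in a few lines. This is a minimal illustration assuming real-valued signals and NumPy; the function name `lms`, the unknown system `h`, the step size, and the signal length are hypothetical choices made only for the demonstration.

```python
import numpy as np

def lms(x, d, N, mu):
    """Direct-form LMS for real-valued signals, following Eqs. (22.1)-(22.3)."""
    w = np.zeros(N)                  # tap weights w(n)
    xv = np.zeros(N)                 # input vector [x(n), x(n-1), ..., x(n-N+1)]
    e = np.zeros(len(x))
    for n in range(len(x)):
        xv = np.concatenate(([x[n]], xv[:-1]))   # slide the window by one sample
        y = w @ xv                                # Eq. (22.1)
        e[n] = d[n] - y                           # output error
        w = w + mu * e[n] * xv                    # Eq. (22.3), factor 2 absorbed in mu
    return w, e

rng = np.random.default_rng(1)
h = np.array([0.5, -0.3, 0.2, 0.1])   # hypothetical unknown system to identify
x = rng.standard_normal(5000)         # white input, the well-conditioned case
d = np.convolve(x, h)[:len(x)]        # noise-free training signal
w, e = lms(x, d, N=4, mu=0.01)
```

With a white, unit-variance input and no measurement noise, the weights in `w` should settle very close to `h`; a colored input would be expected to slow this down, which is the motivation for the transform domain structure.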
These observations suggest that many of the seemingly distinct adaptive filtering algorithms that
appear scattered about in the literature are indeed closely related, and can be considered to be
members of a family whose hereditary characteristics have their origins in Gauss-Newton optimization
theory [15, 16]. The different members of this family inherit their individual characteristics from
approximations that are made on the pure Gauss-Newton algorithm at various stages of their
derivations. However, after the individual derivations are complete and each algorithm is packaged in
its own algorithmic form, the algorithms look considerably different from one another. Unless a
conscious effort is made to reveal their commonality, the fact that they have evolved from common
roots may be entirely obscured.
The convergence behavior of the LMS algorithm, as applied to a direct form FIR filter structure, is
controlled by the autocorrelation matrix $R_x$ of the input process, where
$$ R_x \equiv E[x^{*}(n)\, x^{T}(n)] . \tag{22.4} $$
(The $*$ in Eq. (22.4) denotes complex conjugate to account for the general case of complex input
signals, although throughout most of the following discussions it will be assumed that x(n) and d(n)
are both real-valued signals.) The autocorrelation matrix $R_x$ is usually positive definite, which is
one of the conditions necessary to guarantee convergence to the Wiener solution. Another necessary
condition for convergence is $0 < \mu < 1/\lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of $R_x$. It is also
well established that the convergence of this algorithm is directly related to the eigenvalue spread of
$R_x$. The eigenvalue spread is measured by the condition number of $R_x$, defined as $\kappa = \lambda_{max}/\lambda_{min}$,
where $\lambda_{min}$ is the minimum eigenvalue of $R_x$. Ideal conditioning occurs when κ = 1 (white noise); as
this ratio increases, slower convergence results. The eigenvalue spread (condition number) depends
on the spectral distribution of the input signal and can be shown to be related to the maximum and
minimum values of the input power spectrum (22.4). From this line of reasoning it becomes clear that
white noise is the ideal input signal for rapidly training an LMS adaptive filter. The adaptive process
becomes slower and requires more computation for input signals that are more severely colored [6].
Convergence properties are reflected in the geometry of the MSE surface, which is simply the
mean squared output error $E[|e(n)|^2]$ expressed as a function of the N adaptive filter coefficients in
(N + 1)-space. An expression for the error surface of the direct form filter is
$$ J(z) \equiv E[|e(n)|^2] = J_{min} + z^{*T} R_x z , \tag{22.5} $$
with $R_x$ defined in (22.4) and $z \equiv w - w_{opt}$, where $w_{opt}$ is the vector of optimum filter coefficients in
the sense of minimizing the mean squared error ($w_{opt}$ is known as the Wiener solution). An example
of an error surface for a simple two-tap filter is shown in Fig. 22.2. In this example x(n) was specified
to be a colored noise input signal with an autocorrelation matrix
$$ R_x = \begin{bmatrix} 1.0 & 0.9 \\ 0.9 & 1.0 \end{bmatrix} . $$
Figure 22.2 shows three equal-error contours on the three dimensional surface. The term $z^{*T} R_x z$
in Eq. (22.5) is a quadratic form that describes the bowl shape of the FIR error surface. When $R_x$ is
positive definite, the equal-error contours of the surface are hyperellipses (N dimensional ellipses)
centered at the origin of the coefficient parameter space. Furthermore, the principal axes of these
hyperellipses are the eigenvectors of $R_x$, and their lengths are inversely proportional to the square
roots of the corresponding eigenvalues of $R_x$.
Since the convergence rate of the LMS algorithm is inversely related to the ratio of the maximum to the
minimum eigenvalues of $R_x$, large eccentricity of the equal-error contours implies slow convergence
of the adaptive system. In the case of an ideal white noise input, $R_x$ has a single eigenvalue of
multiplicity N, so that the equal-error contours are hyperspheres [8].
FIGURE 22.2: Example of an error surface for a simple two-tap filter.
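For the two-tap example, the eigenvalue spread can be checked numerically. The short sketch below assumes NumPy and simply applies the definitions of the condition number κ and the step size bound quoted earlier in this section to the autocorrelation matrix of the example.

```python
import numpy as np

R_x = np.array([[1.0, 0.9],
                [0.9, 1.0]])
eigvals, eigvecs = np.linalg.eigh(R_x)   # ascending eigenvalues, orthonormal eigenvectors
lam_min, lam_max = eigvals[0], eigvals[-1]
kappa = lam_max / lam_min                # condition number (eigenvalue spread)
mu_bound = 1.0 / lam_max                 # upper limit on mu quoted in the text
print(f"kappa = {kappa:.1f}, mu < {mu_bound:.3f}")
```

The eigenvalues of this matrix are 0.1 and 1.9, so κ = 19, and the eigenvectors lie along the ±45° directions of the coefficient plane. This is the eccentricity that the transform and power normalization of the following sections attempt to remove.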
22.2 Orthogonalization and Power Normalization
The transform domain adaptive filter (TDAF) structure is shown in Fig. 22.3. The input x(n) and
desired signal d(n) are assumed to be zero mean and jointly stationary. The input to the filter is a
vector of N current and past input samples, defined in the previous section and denoted as x(n).
This vector is processed by a unitary transform, such as the DFT. Once the filter order N is fixed, the
transform is simply an N × N matrix T, which is in general complex, with orthonormal rows. The
transformed outputs form a vector v(n) which is given by
$$ v(n) = [v_0(n), v_1(n), \ldots, v_{N-1}(n)]^T = T x(n) . \tag{22.6} $$
With an adaptive tap vector defined as
$$ W(n) = [W_0(n), W_1(n), \ldots, W_{N-1}(n)]^T , \tag{22.7} $$
the filter output is given by
$$ y(n) = W^T(n)\, v(n) = W^T(n)\, T x(n) . \tag{22.8} $$
The instantaneous output error
$$ e(n) = d(n) - y(n) \tag{22.9} $$
is then formed and used to update the adaptive filter taps using a modified form of the LMS
algorithm (22.11):
$$ W(n+1) = W(n) + \mu e(n) \Lambda^{-2} v^{*}(n) , \qquad \Lambda^{2} \equiv \mathrm{diag}[\sigma_0^2, \sigma_1^2, \ldots, \sigma_{N-1}^2] \tag{22.10} $$
where
$$ \sigma_i^2 = E[|v_i(n)|^2] . $$
FIGURE 22.3: The transform domain adaptive filter structure.
As before, the superscript asterisk in (22.10) indicates complex conjugation to account for the most
general case in which the transform is complex. Also, the use of the upper case coefficient vector
in Eq. (22.10) denotes that W(n) is a transform domain variable. The power estimates $\sigma_i^2$ can be
developed on-line by computing an exponentially weighted average of past samples according to
$$ \sigma_i^2(n) = \alpha \sigma_i^2(n-1) + |v_i(n)|^2 , \qquad 0 < \alpha < 1 . \tag{22.11} $$
If $\sigma_i^2$ becomes too small due to an insufficient amount of energy in the i-th channel, the update
mechanism becomes ill-conditioned due to a very large effective step size. In some cases the process
will become unstable and register overflow will cause the adaptation to catastrophically fail. So
the algorithm given by (22.10) should have the update mechanism disabled for the i-th orthogonal
channel if $\sigma_i^2$ falls below a critical threshold.
Alternatively, the transform domain algorithm may be stabilized by adding small positive constants
ε to the diagonal elements of the power normalization matrix $\Lambda^2$ of Eq. (22.10) according to
$$ \hat{\Lambda}^2 = \Lambda^2 + \varepsilon I . \tag{22.12} $$
Then $\hat{\Lambda}^2$ is used in place of $\Lambda^2$ in Eq. (22.10). For most input signals $\sigma_i^2 \gg \varepsilon$, and the inclusion
of the stabilization factors is transparent to the performance of the algorithm. However, whenever
$\sigma_i^2 \approx \varepsilon$, the stabilization terms begin to have a significant effect. Within this operating region the
power in the channels will not be uniformly normalized and the convergence rate of the filter will
begin to degrade, but catastrophic failure will be avoided.
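The update of Eqs. (22.6) to (22.12) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: NumPy is assumed, the transform is taken to be an orthonormal DCT-II, and the helper names `dct_matrix` and `tdaf`, the test system `h`, the AR(1) coloring of the input, and the values of µ, α, and ε are all hypothetical. Because the recursion (22.11) as written accumulates a weighted sum, the steady-state estimates are roughly 1/(1 − α) times the channel powers, so µ is chosen larger here than in a plain LMS filter.

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix (real, with orthonormal rows)."""
    k = np.arange(N)[:, None]
    m = np.arange(N)[None, :]
    T = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * m + 1) * k / (2 * N))
    T[0, :] /= np.sqrt(2.0)
    return T

def tdaf(x, d, T, mu, alpha=0.99, eps=1e-6):
    """Transform domain LMS with power normalization, Eqs. (22.6)-(22.12)."""
    N = T.shape[0]
    W = np.zeros(N, dtype=T.dtype)       # transform domain taps W(n)
    sigma2 = np.zeros(N)                 # running channel power estimates
    xv = np.zeros(N)                     # [x(n), x(n-1), ..., x(n-N+1)]
    e = np.zeros(len(x), dtype=T.dtype)
    for n in range(len(x)):
        xv = np.concatenate(([x[n]], xv[:-1]))
        v = T @ xv                                        # Eq. (22.6)
        e[n] = d[n] - W @ v                               # Eqs. (22.8), (22.9)
        sigma2 = alpha * sigma2 + np.abs(v) ** 2          # Eq. (22.11)
        W = W + mu * e[n] * np.conj(v) / (sigma2 + eps)   # Eqs. (22.10), (22.12)
    return W, e

rng = np.random.default_rng(2)
u = rng.standard_normal(20000)
x = np.zeros_like(u)
for n in range(1, len(u)):               # AR(1) coloring: a strongly correlated input
    x[n] = 0.9 * x[n - 1] + u[n]
h = np.array([0.5, -0.3, 0.2, 0.1])      # hypothetical unknown system
d = np.convolve(x, h)[:len(x)]           # noise-free training signal
T = dct_matrix(4)
W, e = tdaf(x, d, T, mu=0.5)
w_time = T.T @ W                         # equivalent time domain taps, from y = W^T T x
```

Since y(n) = W^T T x(n), the equivalent time domain coefficient vector is T^T W, which is what `w_time` holds; in this noise-free setting it should approach `h`. The ε term plays the role of the stabilization constant in (22.12) and also protects the first iterations, when the power estimates are still near zero.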
The motivation for using the TDAF adaptive system instead of a simpler LMS based system is
to achieve rapid convergence of the filter’s coefficients when the input signal is not white, while
maintaining a reasonably low computational complexity requirement. In the following section this
convergence rate improvement of the TDAF will be explained geometrically.
22.3 Convergence of the Transform Domain Adaptive Filter
In this section the convergence rate improvement of the TDAF is described in terms of the mean
squared error surface. From Eqs. (22.4) and (22.6) it is found that $R_v = T^{*} R_x T^{T}$, so that for the
transform structure without power normalization Eq. (22.5) becomes
$$ J(z) \equiv E[|e(n)|^2] = J_{min} + z^{*T} [T^{*} R_x T^{T}] z . \tag{22.13} $$
The difference between (22.5) and (22.13) is the presence of T in the quadratic term of (22.13). When
T is a unitary matrix, its presence in (22.13) gives a rotation and/or a reflection of the surface. The
eccentricity of the surface is unaffected by the transform, so the convergence rate of the system is
unchanged by the transformation alone.
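The two facts used in this argument can be written out in a few lines. The derivation below is a sketch that assumes T is a fixed (deterministic) matrix, so that it can be moved outside the expectation, and that T is unitary.

```latex
\begin{align*}
R_v &\equiv E[v^{*}(n)\, v^{T}(n)]
     = E[T^{*} x^{*}(n)\, x^{T}(n)\, T^{T}]
     = T^{*} R_x T^{T} , \\
T^{*} T^{T} &= (T\, T^{*T})^{*} = I
     \;\Longrightarrow\; T^{T} = (T^{*})^{-1}
     \;\Longrightarrow\; R_v = T^{*} R_x (T^{*})^{-1} .
\end{align*}
```

The last expression is a similarity transformation, so $R_v$ and $R_x$ share the same eigenvalues and therefore the same condition number, which is why the transform alone leaves the convergence rate unchanged.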
However, the signal power levels at the adaptive coefficients are changed by the transformation.
Consider the intersection of the equal-error contours with the rotated axes: letting
$z = [0 \cdots z_i \cdots 0]^T$, with $z_i$ in the i-th position, Eq. (22.13) becomes
$$ J(z) - J_{min} = [T^{*} R_x T^{T}]_i \, z_i^2 \approx \sigma_i^2 z_i^2 , \tag{22.14} $$
where $[\cdot]_i$ denotes the i-th main diagonal entry of the matrix.
If the equal-error contours are hyperspheres (the ideal case), then for a fixed value of the error
J(n), (22.14) must give $|z_i| = |z_j|$ for all i and j, since all points on a hypersphere are equidistant
from the origin. When the filter input is not white, this will not hold in general. But since the power
levels $\sigma_i^2$ are easily estimated, the rotated axes can be scaled to have this property. Let $\Lambda^{-1} \hat{z} = z$,
where $\Lambda$ is defined in (22.10). Then the error surface of the TDAF, with transform T and including
power normalization, is given by
$$ J(\hat{z}) = J_{min} + \hat{z}^{*T} \Lambda^{-1} T^{*} R_x T^{T} \Lambda^{-1} \hat{z} . \tag{22.15} $$
The main diagonal entries of $\Lambda^{-1} T^{*} R_x T^{T} \Lambda^{-1}$ are all equal to one, so (22.14) becomes
$J(\hat{z}) - J_{min} = \hat{z}_i^2$, which has the property described above.
Thus, the action of the TDAF system is to rotate the axes of the filter coefficient space using a
unitary rotation matrix T, and then to scale these axes so that the error surface contours become
approximately hyperspherical at the points where they can be easily observed, i.e., the points of
intersection with the new (rotated) axes. Usually the actual eccentricity of the error surface contours
is reduced by this scaling, and faster convergence is obtained.
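The effect of rotation followed by scaling on the eigenvalue spread can be checked numerically for the two-tap autocorrelation matrix used earlier. The sketch below assumes NumPy; the function name `normalized_condition` is hypothetical, and the 45° rotation is an assumed transform chosen because it happens to diagonalize this particular matrix exactly (it is not the transform used in the chapter's figures).

```python
import numpy as np

def normalized_condition(R_x, T):
    """Condition number of Lambda^{-1} T* R_x T^T Lambda^{-1}, as in Eq. (22.15)."""
    R_v = np.conj(T) @ R_x @ T.T                               # transformed autocorrelation
    lam_inv = np.diag(1.0 / np.sqrt(np.real(np.diag(R_v))))    # Lambda^{-1}
    R_n = lam_inv @ R_v @ lam_inv                              # unit main diagonal by construction
    eig = np.linalg.eigvalsh(R_n)                              # ascending eigenvalues
    return eig[-1] / eig[0]

R_x = np.array([[1.0, 0.9],
                [0.9, 1.0]])
identity = np.eye(2)                                           # no transform: plain normalized LMS
rot45 = np.array([[1.0, 1.0],
                  [-1.0, 1.0]]) / np.sqrt(2.0)                 # assumed transform (45 degree rotation)
kappa_identity = normalized_condition(R_x, identity)
kappa_rot45 = normalized_condition(R_x, rot45)
```

With the identity transform the main diagonal of this matrix is already one, so normalization changes nothing and the condition number stays at 19. With the 45° rotation the transformed matrix is diagonal, and scaling brings the condition number to 1, the hyperspherical case. A fixed transform that only partly diagonalizes the input autocorrelation matrix would be expected to fall between these two extremes.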
As a second example, transform domain processing is now added to the previous example, as
illustrated in Figs. 22.4 and 22.5. The error surface of Fig. 22.4 was created by using the (arbitrary)
transform
$$ T = \begin{bmatrix} 0.866 & 0.500 \\ 0.500 & 0.866 \end{bmatrix} , $$
on the error surface shown in Fig. 22.2, which produces clockwise rotation of the ellipsoidal contours
so that the major and minor axes more closely align with the coordinate axes than they did without
the transform. Power normalization was then applied using the normalization matrix $\Lambda^{-1}$ as shown
in Fig. 22.5, which represents the transformed and power normalized error surface. Note that the
elliptical contours after transform domain processing are nearly circular in shape, and in fact they
would have been perfectly circular if the rotation of Fig. 22.4 had brought the contours into precise
alignment with the coordinate axes. Perfect alignment did not occur in this example because T was
not able to perfectly diagonalize the input autocorrelation matrix for this particular x(n). Since T is
a fixed transform in the TDAF structure, it clearly cannot properly diagonalize $R_x$ for an arbitrary
x(n), hence the surface rotation (orthogonalization) will be less than perfect for most input signals. It