W. Kenneth Jenkins et al., "Transform Domain Adaptive Filtering," 2000 CRC Press LLC.
Transform Domain Adaptive Filtering

W. Kenneth Jenkins
University of Illinois, Urbana-Champaign

Daniel F. Marshall
MIT Lincoln Laboratory

22.1 LMS Adaptive Filter Theory
22.2 Orthogonalization and Power Normalization
22.3 Convergence of the Transform Domain Adaptive Filter
22.4 Discussion and Examples
22.5 Quasi-Newton Adaptive Algorithms
     A Fast Quasi-Newton Algorithm • Examples
22.6 The 2-D Transform Domain Adaptive Filter
22.7 Block-Based Adaptive Filters
     Comparison of the Constrained and Unconstrained Frequency Domain Block-LMS Adaptive Algorithms • Examples and Discussion
References
One of the earliest works on transform domain adaptive filtering was published in 1978 by Dentino et al. [4], in which the concept of adaptive filtering in the frequency domain was proposed. Many publications have since appeared that further develop the theory and expand the current understanding of performance characteristics for this class of adaptive filters. In addition to the discrete Fourier transform (DFT), other orthogonal transforms such as the discrete cosine transform (DCT) and the Walsh-Hadamard transform (WHT) can also be used effectively as a means to improve the LMS algorithm without adding too much computational complexity. For this reason, the general term transform domain adaptive filtering is used in the following discussion to mean that the input signal is preprocessed by decomposing the input vector into orthogonal components, which are in turn used as inputs to a parallel bank of simpler adaptive subfilters. With an orthogonal transformation, the adaptation takes place in the transform domain, as it is possible to show that the adjustable parameters are indeed related to an equivalent set of time domain filter coefficients by means of the same transformation that is used for the real time processing [5, 14, 17].
A direct form FIR digital filter structure is shown in Fig. 22.1. The direct form requires N − 1 delays, N multiplications, and N − 1 additions for each output sample that is produced. The amount of hardware (as well as power) required to implement the direct form structure depends on the degree of hardware multiplexing that can be utilized within the speed demands of the application. A fully parallel implementation consisting of N delay registers, N multipliers, and a tree of two-input adders would be needed for very high-frequency applications. At the opposite end of the performance spectrum, a sequential implementation consisting of a length N delay line and a single time multiplexed multiplier and accumulation adder would provide the cheapest (and slowest) implementation. This latter structure would be characteristic of a filter that is implemented in software on one of the many commercially available DSP chips.

FIGURE 22.1: The direct form adaptive filter structure.
Regardless of the hardware complexity that results from a particular implementation, the com-
putational complexity of the filter is determined by the requirements of the algorithm and, as such,
remains invariant with respect to different hardware structures. In particular, the computational
complexity of the direct form FIR filter is O[N], since N multiplications and (N −1) additions must
be performed at each iteration. When designing an adaptive filter, it seems reasonable to seek an
adaptive algorithm whose order of complexity is no greater than the order of complexity of the basic
filter structure itself. This goal is achieved by the LMS algorithm, which is the major contributing
factor to the enormous success of that algorithm. Extending this principle for 2-D adaptive filters
implies that desirable 2-D adaptive algorithms have an order of complexity of O[N^2], since a 2-D FIR direct form filter has O[N^2] complexity inherent in its basic structure [11, 21].
The transform domain adaptive filter is a generalization of the LMS FIR structure, in which a linear transformation is performed on the input signal and each transformed "channel" is power normalized to improve the convergence rate of the adaptation process. The linear transform is characterized throughout the following discussions as a sliding window operator that consists of a transformation matrix multiplying an input vector [14]. At each iteration n the input vector includes one new input sample x(n), and N − 1 past input samples x(n − k), k = 1, ..., N − 1. As the window slides forward sample by sample, filtered outputs are produced continuously at each value of the index n.
Since the input transformation is represented by a matrix-vector product, it might appear that the computational complexity of the transform domain filter is at least O[N^2]. However, many transformations can be implemented with fast algorithms that have complexities less than O[N^2]. For example, the discrete Fourier transform can be implemented by the FFT algorithm, resulting in a complexity of O[N log2 N] per iteration. Some transformations can be implemented recursively in a bank of parallel filters, resulting in a net complexity of O[N] per iteration. The main point to be made here is that the complexity of the transform domain filter typically falls between O[N] and O[N^2], with the actual complexity depending on the specific algorithm that is used to compute the sliding window transform operator [17].
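As an aside, one standard scheme of the recursive kind mentioned above is the sliding DFT, which updates all N frequency-domain components in O[N] operations per new input sample. The short NumPy sketch below is our own illustration of that idea (the chapter does not commit to a particular recursive transform), with illustrative function and variable names.

```python
import numpy as np

def sliding_dft_step(X, x_new, x_old, N):
    """Update the N bins of a length-N sliding-window DFT in O(N) per sample:
    X_k(n) = exp(j*2*pi*k/N) * ( X_k(n-1) + x(n) - x(n-N) )."""
    k = np.arange(N)
    return np.exp(2j * np.pi * k / N) * (X + x_new - x_old)

# Quick check against a direct DFT of the final window
N = 8
x = np.random.default_rng(0).standard_normal(64)
X = np.zeros(N, dtype=complex)       # transform of the initial all-zero window
history = np.zeros(len(x) + N)       # zero-padded past so x(n-N) is always defined
history[N:] = x
for n in range(len(x)):
    X = sliding_dft_step(X, history[N + n], history[n], N)
window = history[len(x):len(x) + N]  # chronological window x(n-N+1), ..., x(n)
assert np.allclose(X, np.fft.fft(window))
```

Each iteration costs N complex multiplications, compared with O[N log2 N] for recomputing an FFT of the whole window or O[N^2] for a direct matrix-vector product.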
22.1 LMS Adaptive Filter Theory

The LMS algorithm is derived as an approximation to the steepest descent optimization strategy. The
fact that the field of adaptive signal processing is based on an elementary principle from optimization
theory suggests that more advanced adaptive algorithms can be developed by incorporating other
results from the field of optimization [22]. This point of view recurs throughout this discussion, as
concepts are borrowed from the field of optimization and modified for adaptive filtering as needed.
In particular, one of the borrowed ideas that appears later is the quasi-Newton optimization strategy.
It will be shown that transform domain adaptive filtering algorithms are closely related to quasi-
Newton algorithms, but have computational complexity that is closer to the simple requirements of
the LMS algorithm.
For a length N FIR filter with the input expressed as a column vector x(n) = [x(n), x(n − 1), ..., x(n − N + 1)]^T, the filter output y(n) is easily expressed as

y(n) = w^T(n) x(n) ,    (22.1)

where w(n) = [w_0(n), w_1(n), ..., w_{N−1}(n)]^T is the time varying vector of filter coefficients (tap weights), and the superscript "T" denotes vector transpose. The output error is formed as the difference between the filter output and a training signal d(n), i.e., e(n) = d(n) − y(n). Strategies for obtaining an appropriate d(n) vary from one application to another. In many cases the availability of a suitable training signal determines whether an adaptive filtering solution will be successful in a particular application. The ideal cost function is defined by the mean squared error (MSE) criterion, E[|e(n)|^2]. The LMS algorithm is derived by approximating the ideal cost function by the instantaneous squared error, resulting in J_LMS(n) = |e(n)|^2. While the LMS seems to make a rather crude approximation at the very beginning, the approximation results in an unbiased estimator. In many applications the LMS algorithm is quite robust and is able to converge rapidly to a small neighborhood of the optimum Wiener solution.
The steepest descent optimization strategy is given by

w(n + 1) = w(n) − µ ∇_{E[|e|^2]}(n) ,    (22.2)

where ∇_{E[|e|^2]}(n) is the gradient of the cost function with respect to the coefficient vector w(n). When the gradient is formed using the LMS cost function J_LMS(n) = |e(n)|^2, the conventional LMS results:

w(n + 1) = w(n) + µ e(n) x(n) ,
e(n) = d(n) − y(n) ,    (22.3)
and
y(n) = x(n)^T w(n) .

(Note: Many sources include a "2" before the µ factor in Eq. (22.3) because this factor arises during the derivation of (22.3) from (22.2). In this discussion we assume this factor is absorbed into the µ,
so it will not appear explicitly.) Since the LMS algorithm is treated in considerable detail in other
sections of this book, we will not present any further derivation or analysis of it here. However, the
following observations will be useful when other algorithms are compared to the LMS as a baseline
design [2, 3, 6, 8].
1. Assume that all of the signals and filter variables are real-valued. The filter itself requires
N multiplications and N − 1 additions to produce y(n) at each value of n . The coeffi-
cient update algorithm requires 2N multiplications and N additions, resulting in a total
computational burden of 3N multiplications and 2N − 1 additions per iteration. Since
N is generally much larger than the factor of three, the order of complexity of the LMS
algorithm is O[N].
2. The cost function given for the LMS algorithm is a simplified form of the one used for
the RLS algorithm. This implies that the LMS algorithm is a simplified version of the RLS
algorithm, where averages are replaced by single instantaneous terms.
3. The (power normalized) LMS algorithm is also a simplified form of the transform domain adaptive filter, which results by setting the transform matrix equal to the identity matrix.
4. The LMS algorithm is also a simplified form of the Gauss-Newton optimization strategy, which introduces second order statistics (the input autocorrelation function) to accelerate the rate of convergence. In order to obtain the LMS algorithm from the Gauss-Newton algorithm, two approximations must be made: (i) the gradient must be approximated by the instantaneous error squared, and (ii) the inverse of the input autocorrelation matrix must be crudely approximated by the identity matrix.
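To make the baseline concrete, here is a minimal NumPy sketch of the direct form LMS filter of Eqs. (22.1) through (22.3). The system-identification setup, the step size, and all names are illustrative assumptions rather than anything prescribed in the chapter.

```python
import numpy as np

def lms_filter(x, d, N, mu):
    """Direct form LMS: y(n) = w^T(n) x(n), e(n) = d(n) - y(n),
    w(n+1) = w(n) + mu * e(n) * x(n).
    Per iteration this costs roughly 3N multiplies and 2N - 1 adds (observation 1)."""
    w = np.zeros(N)
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        x_n = x[n - N + 1:n + 1][::-1]   # [x(n), x(n-1), ..., x(n-N+1)]
        e[n] = d[n] - w @ x_n
        w = w + mu * e[n] * x_n
    return w, e

# Illustrative use: identify an unknown 8-tap FIR system driven by white noise
rng = np.random.default_rng(0)
h = rng.standard_normal(8)
x = rng.standard_normal(5000)
d = np.convolve(x, h)[:len(x)]
w, e = lms_filter(x, d, N=8, mu=0.01)
print(np.max(np.abs(w - h)))             # coefficient error is small after convergence
```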
These observations suggest that many of the seemingly distinct adaptive filtering algorithms that
appear scattered about in the literature are indeed closely related, and can be considered to be mem-
bers of a family whose hereditary characteristics have their origins in Gauss-Newton optimization
theory [15, 16]. The different members of this family inherit their individual characteristics from
approximations that are made on the pure Gauss-Newton algorithm at various stages of their deriva-
tions. However, after the individual derivations are complete and each algorithm is packaged in
its own algorithmic form, the algorithms look considerably different from one another. Unless a
conscious effort is made to reveal their commonality, the fact that they have evolved from common
roots may be entirely obscured.
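To make the relationship in observation 4 explicit, the fragment below sketches the intermediate recursion obtained when only approximation (i) is applied, so that the autocorrelation inverse is still present; substituting the identity matrix for R_inv collapses it to the LMS step of Eq. (22.3). The function name and calling convention are our own, and in practice R_x would have to be estimated and inverted online, which is exactly the cost that quasi-Newton and transform domain methods try to avoid.

```python
import numpy as np

def newton_lms_step(w, x_n, d_n, R_inv, mu):
    """One Gauss-Newton style coefficient update (observation 4):
    w(n+1) = w(n) + mu * R_x^{-1} e(n) x(n).
    With R_inv = np.eye(len(w)) this reduces to the ordinary LMS update."""
    e_n = d_n - w @ x_n
    return w + mu * e_n * (R_inv @ x_n), e_n
```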
The convergence behavior of the LMS algorithm, as applied to a direct form FIR filter structure, is controlled by the autocorrelation matrix R_x of the input process, where

R_x ≡ E[x*(n) x^T(n)] .    (22.4)
(The * in Eq. (22.4) denotes complex conjugate to account for the general case of complex input signals, although throughout most of the following discussions it will be assumed that x(n) and d(n) are both real-valued signals.) The autocorrelation matrix R_x is usually positive definite, which is one of the conditions necessary to guarantee convergence to the Wiener solution. Another necessary condition for convergence is 0 < µ < 1/λ_max, where λ_max is the largest eigenvalue of R_x. It is also well established that the convergence of this algorithm is directly related to the eigenvalue spread of R_x. The eigenvalue spread is measured by the condition number of R_x, defined as κ = λ_max/λ_min, where λ_min is the minimum eigenvalue of R_x. Ideal conditioning occurs when κ = 1 (white noise); as this ratio increases, slower convergence results. The eigenvalue spread (condition number) depends on the spectral distribution of the input signal and can be shown to be related to the maximum and minimum values of the input power spectrum (22.4). From this line of reasoning it becomes clear that white noise is the ideal input signal for rapidly training an LMS adaptive filter. The adaptive process becomes slower and requires more computation for input signals that are more severely colored [6].
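A small simulation makes the slowdown visible. The AR(1) coloring filter, the parameter values, and the compact LMS loop (repeated here so the fragment is self-contained) are our own illustrative choices.

```python
import numpy as np

def run_lms(x, d, N=8, mu=0.01):
    w = np.zeros(N)
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        x_n = x[n - N + 1:n + 1][::-1]
        e[n] = d[n] - w @ x_n
        w = w + mu * e[n] * x_n
    return e

rng = np.random.default_rng(1)
h = rng.standard_normal(8)                 # unknown system to identify
white = rng.standard_normal(20000)
colored = np.zeros_like(white)             # AR(1) coloring, pole at 0.9
for n in range(1, len(white)):
    colored[n] = 0.9 * colored[n - 1] + white[n]
colored /= colored.std()                   # equalize the input power

for name, x in [("white", white), ("colored", colored)]:
    d = np.convolve(x, h)[:len(x)]
    e = run_lms(x, d)
    print(name, np.mean(e[500:1500] ** 2)) # mid-run MSE: much larger for colored input
```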
Convergence properties are reflected in the geometry of the MSE surface, which is simply the mean squared output error E[|e(n)|^2] expressed as a function of the N adaptive filter coefficients in (N + 1)-space. An expression for the error surface of the direct form filter is

J(z) ≡ E[|e(n)|^2] = J_min + z^{*T} R_x z ,    (22.5)
with R_x defined in (22.4) and z ≡ w − w_opt, where w_opt is the vector of optimum filter coefficients in the sense of minimizing the mean squared error (w_opt is known as the Wiener solution). An example of an error surface for a simple two-tap filter is shown in Fig. 22.2. In this example x(n) was specified to be a colored noise input signal with an autocorrelation matrix

R_x = [ 1.0  0.9 ]
      [ 0.9  1.0 ] .
Figure 22.2 shows three equal-error contours on the three dimensional surface. The term z^{*T} R_x z in Eq. (22.5) is a quadratic form that describes the bowl shape of the FIR error surface. When R_x is
positive definite, the equal-error contours of the surface are hyperellipses (N dimensional ellipses) centered at the origin of the coefficient parameter space. Furthermore, the principal axes of these hyperellipses are the eigenvectors of R_x, and their lengths are proportional to the eigenvalues of R_x. Since the convergence rate of the LMS algorithm is inversely related to the ratio of the maximum to the minimum eigenvalues of R_x, large eccentricity of the equal-error contours implies slow convergence of the adaptive system. In the case of an ideal white noise input, R_x has a single eigenvalue of multiplicity N, so that the equal-error contours are hyperspheres [8].
FIGURE 22.2: Example of an error surface for a simple two-tap filter.
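For the two-tap example above, a few lines of NumPy (ours, not part of the original text) confirm the geometry: the eigenvectors of R_x are the principal axes of the elliptical contours, the eigenvalues set their relative lengths, and their ratio is the condition number κ.

```python
import numpy as np

R_x = np.array([[1.0, 0.9],
                [0.9, 1.0]])
eigvals, eigvecs = np.linalg.eigh(R_x)   # symmetric matrix, eigenvalues in ascending order
print(eigvals)                           # [0.1  1.9]
print(eigvecs)                           # columns: principal axes of the equal-error contours
print(eigvals[-1] / eigvals[0])          # condition number kappa = 19
```

A condition number of 19 indicates strongly eccentric contours and therefore slow LMS convergence for this input.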
22.2 Orthogonalization and Power Normalization
The transform domain adaptive filter (TDAF) structure is shown in Fig. 22.3. The input x(n) and
desired signal d(n) are assumed to be zero mean and jointly stationary. The input to the filter is a
vector of N current and past input samples, defined in the previous section and denoted as x(n).
This vector is processed by a unitary transform, such as the DFT. Once the filter order N is fixed, the
transform is simply an N × N matrix T, which is in general complex, with orthonormal rows. The
transformed outputs form a vector v(n) which is given by

v(n) = [v_0(n), v_1(n), ..., v_{N−1}(n)]^T = T x(n) .    (22.6)
With an adaptive tap vector defined as

W(n) = [W_0(n), W_1(n), ..., W_{N−1}(n)]^T ,    (22.7)

the filter output is given by

y(n) = W^T(n) v(n) = W^T(n) T x(n) .    (22.8)
FIGURE 22.3: The transform domain adaptive filter structure.

The instantaneous output error

e(n) = d(n) − y(n)    (22.9)

is then formed and used to update the adaptive filter taps using a modified form of the LMS algorithm (22.11):
W(n + 1) = W(n) + µ e(n) Λ^{−2} v*(n) ,
Λ^2 ≡ diag[σ_0^2, σ_1^2, ..., σ_{N−1}^2] ,    (22.10)

where σ_i^2 = E[|v_i(n)|^2].
As before, the superscript asterisk in (22.10) indicates complex conjugation to account for the most general case in which the transform is complex. Also, the use of the upper case coefficient vector in Eq. (22.10) denotes that W(n) is a transform domain variable. The power estimates σ_i^2 can be developed on-line by computing an exponentially weighted average of past samples according to

σ_i^2(n) = α σ_i^2(n − 1) + |v_i(n)|^2 ,  0 < α < 1 .    (22.11)
If σ_i^2 becomes too small due to an insufficient amount of energy in the i-th channel, the update mechanism becomes ill-conditioned due to a very large effective step size. In some cases the process will become unstable and register overflow will cause the adaptation to catastrophically fail. So the algorithm given by (22.10) should have the update mechanism disabled for the i-th orthogonal channel if σ_i^2 falls below a critical threshold.
Alternatively the transform domain algorithm may be stabilized by adding small positive constants ε to the diagonal elements of Λ^2 according to

Λ'^2 = Λ^2 + εI .    (22.12)

Then Λ'^2 is used in place of Λ^2 in Eq. (22.10). For most input signals σ_i^2 ≫ ε, and the inclusion of the stabilization factors is transparent to the performance of the algorithm. However, whenever σ_i^2 ≈ ε, the stabilization terms begin to have a significant effect. Within this operating region the power in the channels will not be uniformly normalized and the convergence rate of the filter will begin to degrade, but catastrophic failure will be avoided.
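Putting Eqs. (22.6), (22.8), (22.10), (22.11), and the ε stabilization of (22.12) together, one TDAF iteration can be sketched as below. The choice of the unitary DFT for T, the parameter values, and all names are our own illustrative assumptions.

```python
import numpy as np

class TDAF:
    """Transform domain adaptive filter with power-normalized LMS update (a sketch)."""
    def __init__(self, N, mu=0.01, alpha=0.95, eps=1e-6):
        self.T = np.fft.fft(np.eye(N)) / np.sqrt(N)  # unitary DFT matrix, orthonormal rows
        self.W = np.zeros(N, dtype=complex)          # transform domain tap vector W(n)
        self.sigma2 = np.zeros(N)                    # per-channel power estimates sigma_i^2
        self.mu, self.alpha, self.eps = mu, alpha, eps

    def step(self, x_vec, d_n):
        """x_vec = [x(n), x(n-1), ..., x(n-N+1)]; returns (y(n), e(n))."""
        v = self.T @ x_vec                           # Eq. (22.6): v(n) = T x(n)
        y = self.W @ v                               # Eq. (22.8): y(n) = W^T(n) v(n)
        e = d_n - y                                  # Eq. (22.9)
        # Eq. (22.11): exponentially weighted power estimate for each channel
        self.sigma2 = self.alpha * self.sigma2 + np.abs(v) ** 2
        # Eqs. (22.10) and (22.12): power-normalized update, stabilized by eps
        self.W = self.W + self.mu * e * np.conj(v) / (self.sigma2 + self.eps)
        return y, e

# Illustrative use (for real signals the imaginary part of y shrinks as the filter adapts)
rng = np.random.default_rng(2)
N = 16
filt = TDAF(N)
x = rng.standard_normal(4000)
d = np.convolve(x, rng.standard_normal(N))[:len(x)]
xbuf = np.zeros(N)
for n in range(len(x)):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = x[n]                                   # keep [x(n), ..., x(n-N+1)]
    y, e = filt.step(xbuf, d[n])
```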
The motivation for using the TDAF adaptive system instead of a simpler LMS based system is
to achieve rapid convergence of the filter’s coefficients when the input signal is not white, while
maintaining a reasonably low computational complexity requirement. In the following section this
convergence rate improvement of the TDAF will be explained geometrically.
22.3 Convergence of the Transform Domain Adaptive Filter
In this section the convergence rate improvement of the TDAF is described in terms of the mean
squared error surface. From Eqs. (22.4) and (22.6) it is found that R_v = T* R_x T^T, so that for the transform structure without power normalization Eq. (22.5) becomes

J(z) ≡ E[|e(n)|^2] = J_min + z^{*T} (T* R_x T^T) z .    (22.13)
The difference between (22.5) and (22.13) is the presence of T in the quadratic term of (22.13). When
T is a unitary matrix, its presence in (22.13) gives a rotation and/or a reflection of the surface. The
eccentricity of the surface is unaffected by the transform, so the convergence rate of the system is
unchanged by the transformation alone.
However, the signal power levels at the adaptive coefficients are changed by the transformation. Consider the intersection of the equal-error contours with the rotated axes: letting z = [0 ··· z_i ··· 0]^T, with z_i in the i-th position, Eq. (22.13) becomes

J(z) − J_min = [T* R_x T^T]_{ii} z_i^2 ≈ σ_i^2 z_i^2 .    (22.14)
If the equal-error contours are hyperspheres (the ideal case), then for a fixed value of the error J(n), (22.14) must give |z_i| = |z_j| for all i and j, since all points on a hypersphere are equidistant from the origin. When the filter input is not white, this will not hold in general. But since the power levels σ_i^2 are easily estimated, the rotated axes can be scaled to have this property. Let Λ^{−1} ẑ = z, where Λ is defined in (22.10). Then the error surface of the TDAF, with transform T and including power normalization, is given by
J(ẑ) = J_min + ẑ^{*T} Λ^{−1} T* R_x T^T Λ^{−1} ẑ .    (22.15)
The main diagonal entries of Λ^{−1} T* R_x T^T Λ^{−1} are all equal to one, so (22.14) becomes J(z) − J_min = ẑ_i^2, which has the property described above.
Thus, the action of the TDAF system is to rotate the axes of the filter coefficient space using a
unitary rotation matrix T, and then to scale these axes so that the error surface contours become
approximately hyperspherical at the points where they can be easily observed, i.e., the points of
intersection with the new (rotated) axes. Usually the actual eccentricity of the error surface contours
is reduced by this scaling, and faster convergence is obtained.
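The numerical effect can be seen directly with the two-tap R_x of Fig. 22.2. In this fragment (our illustration) the 2-point DCT, which for N = 2 coincides with the WHT, plays the role of T: the unitary transform alone leaves the condition number at 19, while power normalization after the transform reduces it, here all the way to 1 because a 2 × 2 symmetric Toeplitz matrix happens to be diagonalized exactly by this transform.

```python
import numpy as np

R_x = np.array([[1.0, 0.9],
                [0.9, 1.0]])
T = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)      # orthonormal 2-point DCT / WHT

R_v = T @ R_x @ T.T                             # transform domain autocorrelation (T real)
Lam_inv = np.diag(1.0 / np.sqrt(np.diag(R_v)))  # Lambda^{-1} built from the channel powers
R_norm = Lam_inv @ R_v @ Lam_inv                # power-normalized: unit main diagonal

print(np.linalg.cond(R_x), np.linalg.cond(R_v), np.linalg.cond(R_norm))  # 19, 19, 1
```

For larger filters and arbitrary inputs a fixed T does not diagonalize R_x exactly, so the normalized condition number is reduced rather than driven to 1, which is the point made in the example that follows.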
As a second example, transform domain processing is now added to the previous example, as illustrated in Figs. 22.4 and 22.5. The error surface of Fig. 22.4 was created by using the (arbitrary) transform

T = [ 0.866  0.500 ]
    [ 0.500  0.866 ] ,
on the error surface shown in Fig. 22.2, which produces clockwise rotation of the ellipsoidal contours
so that the major and minor axes more closely align with the coordinate axes than they did without
the transform. Power normalization was then applied using the normalization matrix Λ^{−1} as shown
in Fig. 22.5, which represents the transformed and power normalized error surface. Note that the
elliptical contours after transform domain processing are nearly circular in shape, and in fact they
would have been perfectly circular if the rotation of Fig. 22.4 had brought the contours into precise
alignment with the coordinate axes. Perfect alignment did not occur in this example because T was
not able to perfectly diagonalize the input autocorrelation matrix for this particular x(n). Since T is
a fixed transform in the TDAF structure, it clearly cannot properly diagonalize R_x for an arbitrary x(n), hence the surface rotation (orthogonalization) will be less than perfect for most input signals. It