
Williamson, G.A. "Adaptive IIR Filters"
Digital Signal Processing Handbook
Ed. Vijay K. Madisetti and Douglas B. Williams
Boca Raton: CRC Press LLC, 1999

© 1999 by CRC Press LLC
23  Adaptive IIR Filters

Geoffrey A. Williamson
Illinois Institute of Technology
23.1 Introduction
     The System Identification Framework for Adaptive IIR Filtering · Algorithms and Performance Issues · Some Preliminaries
23.2 The Equation Error Approach
     The LMS and LS Equation Error Algorithms · Instrumental Variable Algorithms · Equation Error Algorithms with Unit Norm Constraints
23.3 The Output Error Approach
     Gradient-Descent Algorithms · Output Error Algorithms Based on Stability Theory
23.4 Equation-Error/Output-Error Hybrids
     The Steiglitz-McBride Family of Algorithms
23.5 Alternate Parametrizations
23.6 Conclusions
References
23.1 Introduction
In comparison with adaptive finite impulse response (FIR) filters, adaptive infinite impulse response
(IIR) filters offer the potential to implement an adaptive filter meeting desired performance levels,
as measured by mean-square error, for example, with much less computational complexity. This advantage stems from the enhanced modeling capabilities provided by the pole/zero transfer function of the IIR structure, compared to the "all-zero" form of the FIR structure.
However, adapting an IIR filter brings with it a number of challenges in obtaining stable and
optimal behavior of the algorithms used to adjust the filter parameters. Since the 1970s, there has
been much active research focused on adaptive IIR filters, but many of these challenges have yet to be completely resolved. As a consequence, adaptive IIR filters appear in commercial practice far less frequently than adaptive FIR filters. Nonetheless, recent advances
in adaptive IIR filter research have provided new results and insights into the behavior of several
methods for adapting the filter parameters, and new algorithms have been proposed that address
some of the problems and open issues in these systems. Hence, this class of adaptive filter continues
to maintain promise as a potentially effective and efficient adaptive filtering option.
In this section, we provide an up-to-date overview of the different approaches to the adaptive IIR
filtering problem. Given the extensive literature on the subject, many readers may wish to consult several earlier general treatments of the topic as well. Johnson's 1984 paper [11] and Shynk's 1989 paper [23] are still current in the sense that a number of open issues cited therein remain open today. More recently, Regalia's 1995 book [19] provides a comprehensive view of the subject.
23.1.1 The System Identification Framework for Adaptive IIR Filtering
The spread of issues associated with adaptive IIR filters is most easily understood if one adopts a system identification perspective on the filtering problem. To this end, consider the diagram presented in Fig. 23.1. Available to the adaptive filter are two external signals: the input signal x(n) and the desired output signal d(n). The adaptive filtering problem is to adjust the parameters of the filter acting on x(n) so that its output y(n) approximates d(n). From the system identification perspective, the task at hand is to adjust the parameters of the filter generating y(n) from x(n) in Fig. 23.1 so that the filtering operation itself matches in some sense the system generating d(n) from x(n). These two viewpoints are closely related because if the systems are the same, then their outputs will be close. However, by adopting the convention that there is a system generating d(n) from x(n), clearer insights into the behavior and design of adaptive algorithms are obtained. This insight is useful even if the "system" generating d(n) from x(n) has only a statistical and not a physical basis in reality.
FIGURE 23.1: System identification configuration of the adaptive IIR filter.
The standard adaptive IIR filter is described by

y(n) + a_1(n) y(n-1) + ··· + a_N(n) y(n-N) = b_0(n) x(n) + b_1(n) x(n-1) + ··· + b_M(n) x(n-M),   (23.1)

or equivalently

[1 + a_1(n) q^{-1} + ··· + a_N(n) q^{-N}] y(n) = [b_0(n) + b_1(n) q^{-1} + ··· + b_M(n) q^{-M}] x(n).   (23.2)
As is shown in Fig. 23.1, Eq. (23.2) may be written in shorthand as

y(n) = [B(q^{-1}, n) / A(q^{-1}, n)] x(n),   (23.3)

where B(q^{-1}, n) and A(q^{-1}, n) are the time-dependent polynomials in the delay operator q^{-1} appearing in (23.2). The parameters that are updated by the adaptive algorithm are the coefficients of these polynomials. Note that the polynomial A(q^{-1}, n) is constrained to be monic, such that a_0(n) = 1.
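To make the recursion in (23.1) concrete, the following minimal NumPy sketch computes one output sample from the current coefficient values; the function and variable names are illustrative, not part of the chapter.

```python
import numpy as np

def iir_output(x_hist, y_hist, a, b):
    """One step of the adaptive IIR filter recursion of Eq. (23.1).

    x_hist : [x(n), x(n-1), ..., x(n-M)]   current and past inputs
    y_hist : [y(n-1), ..., y(n-N)]         past outputs
    a      : [a_1(n), ..., a_N(n)]         denominator coefficients (a_0 = 1)
    b      : [b_0(n), ..., b_M(n)]         numerator coefficients
    """
    # y(n) = sum_i b_i(n) x(n-i) - sum_i a_i(n) y(n-i)
    return np.dot(b, x_hist) - np.dot(a, y_hist)
```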
We adopt a rather more general description for the unknown system, assuming that d(n) is generated from the input signal x(n) via some linear time-invariant system H(q^{-1}), with the addition of a noise signal v(n) to reflect components in d(n) that are independent of x(n). We further break down H(q^{-1}) into a transfer function H_m(q^{-1}) that is explicitly modeled by the adaptive filter, and a transfer function H_u(q^{-1}) that is unmodeled. In this way, we view d(n) as a sum of three components: the signal y_m(n) that is modeled by the adaptive filter, the signal y_u(n) that is unmodeled but that depends on the input signal, and the signal v(n) that is independent of the input. Hence,

d(n) = y_m(n) + y_u(n) + v(n)   (23.4)
     = y_s(n) + v(n),           (23.5)

where y_s(n) = y_m(n) + y_u(n). The modeled component of the system output is viewed as

y_m(n) = [B_opt(q^{-1}) / A_opt(q^{-1})] x(n),   (23.6)

with B_opt(q^{-1}) = Σ_{i=0}^{M} b_{i,opt} q^{-i} and A_opt(q^{-1}) = 1 + Σ_{i=1}^{N} a_{i,opt} q^{-i}. Note that (23.6) has the same form as (23.3). The parameters {a_{i,opt}} and {b_{i,opt}} are considered to be the optimal values for the adaptive filter parameters, in a manner that we describe shortly.
Figure 23.1 shows two error signals: e_e(n), termed the equation error, and e_o(n), termed the output error. The parameters of the adaptive filter are usually adjusted so as to minimize some positive function of one or the other of these error signals. However, the figure of merit for judging adaptive filter performance that we will apply throughout this section is the mean-square output error E{e_o^2(n)}. In most adaptive filtering applications, the desired signal d(n) is available only during a "training phase" in which the filter parameters are adapted. At the conclusion of the training phase, the filter will be operated to produce the output signal y(n) as shown in the figure, with the difference between the filter output y(n) and the (now unmeasurable) system output d(n) the error. Thus, we adopt the convention that {a_{i,opt}} and {b_{i,opt}} are defined such that when a_i(n) ≡ a_{i,opt} and b_i(n) ≡ b_{i,opt}, E{e_o^2(n)} is minimized, with A_opt(q^{-1}) constrained to be stable.
At this point it is convenient to set down some notation and terminology. Define the regressor vectors

U_e(n) = [x(n) ··· x(n-M)  -d(n-1) ··· -d(n-N)]^T,      (23.7)
U_o(n) = [x(n) ··· x(n-M)  -y(n-1) ··· -y(n-N)]^T,      (23.8)
U_m(n) = [x(n) ··· x(n-M)  -y_m(n-1) ··· -y_m(n-N)]^T.  (23.9)

These vectors are the equation error regressor, output error regressor, and modeled system regressor vectors, respectively. Define a noise regressor vector

V(n) = [0 ··· 0  -v(n-1) ··· -v(n-N)]^T   (23.10)
with M + 1 leading zeros corresponding to the x(n-i) values in the preceding regressors. Furthermore, define the parameter vectors

W(n) = [b_0(n) b_1(n) ··· b_M(n)  a_1(n) ··· a_N(n)]^T,                  (23.11)
W_opt = [b_{0,opt} b_{1,opt} ··· b_{M,opt}  a_{1,opt} ··· a_{N,opt}]^T,  (23.12)
\tilde{W}(n) = W_opt − W(n),                                             (23.13)
\overline{W} = lim_{n→∞} E{W(n)}.                                        (23.14)

We will have occasion to use W to refer to the adaptive filter parameter vector when the parameters are considered to be held at fixed values. With this notation, we may for instance write y_m(n) = U_m^T(n) W_opt and y(n) = U_o^T(n) W(n).

The situation in which y_u(n) ≡ 0 is referred to as the sufficient order case. The situation in which y_u(n) ≢ 0 is termed the undermodeled case.
23.1.2 Algorithms and Performance Issues
A number of different algorithms for the adaptation of the parameter vector W(n) in (23.11) have been suggested. These may be characterized with respect to the form of the error criterion employed by the algorithm. Each algorithm attempts to drive to zero either the equation error, the output error, or some combination or hybrid of these two error criteria. Major algorithm classes that we consider for the equation error approach include the standard least-squares (LS) and least mean-square (LMS) algorithms, which parallel the algorithms used in adaptive FIR filtering. For equation error methods, we also examine the instrumental variables (IV) algorithm, as well as algorithms that constrain the parameters in the denominator of the adaptive filter's transfer function to improve estimation properties. In the output error class, we examine gradient algorithms and hyperstability-based algorithms. Within the equation and output error hybrid algorithm class, we focus predominantly on the Steiglitz-McBride (SM) algorithm, though there are several algorithms that are more straightforward combinations of equation and output error approaches.

In general, we desire that the adaptive filtering algorithm adjusts the parameter vector W(n) so that it converges to W_opt, the parameters that minimize the mean-square output error. The major issues for adaptive IIR filtering on which we will focus herein are

1. conditions for the stability and convergence of the algorithm used to adapt W(n), and
2. the asymptotic value \overline{W} of the adapted parameter vector, and its relationship to W_opt.

This latter issue relates to the minimum mean-square error achievable by the algorithm, as noted above. Other issues of importance include the convergence speed of the algorithm, its ability to track time variations of the "true" parameter values, and numerical properties, but these will receive less attention here. Of these, convergence speed is of particular concern to practitioners, especially as adaptive IIR filters tend to converge at a far slower rate than their FIR counterparts. However, we emphasize the stability and nature of convergence over the speed because if the algorithm fails to converge or converges to an undesirable solution, the rate at which it does so is of less concern. Furthermore, convergence speed is difficult to characterize for adaptive IIR filters due to a number of factors, including complicated dependencies on algorithm initializations, input signal characteristics, and the relationship between x(n) and d(n).
23.1.3 Some Preliminaries
Unless otherwise indicated, we assume in our discussion that all signals in Fig. 23.1 are stationary, zero mean, random signals with finite variance. In particular, the properties we ascribe to the various algorithms are stated under this assumption. Results based on a deterministic framework are similar to those developed here; see [1] for an example.
We shall also make use of the following definitions.
DEFINITION 23.1  A (scalar) signal x(n) is persistently exciting (PE) of order L if, with

X(n) = [x(n) ··· x(n-L+1)]^T,   (23.15)

there exist α and β satisfying 0 < α < β < ∞ such that αI < E{X(n)X^T(n)} < βI. The (vector) signal X(n) is then also said to be PE.
If x(n) contains at least L/2 distinct sinusoidal components, then x(n) is PE of order L. Any random signal x(n) whose power spectrum is nonzero over an interval of nonzero width will be PE for any value of L in (23.15). Such is the case, for example, if x(n) is uncorrelated or if x(n) is modeled as an AR, MA, or ARMA process driven by uncorrelated noise. PE conditions are required of all adaptive algorithms to ensure good behavior, because if there is inadequate excitation to provide information to the algorithm, convergence of the adapted parameter estimates will not necessarily follow [22].
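The PE condition can be probed numerically. The sketch below (illustrative names; the eigenvalue tolerance is an assumption) estimates E{X(n)X^T(n)} from data and checks that its smallest eigenvalue is bounded away from zero.

```python
import numpy as np

def is_pe(x, L, alpha=1e-6):
    """Empirically test whether x(n) appears PE of order L (Definition 23.1).

    Stacks X(n) = [x(n), ..., x(n-L+1)]^T, forms the sample average of
    X(n) X^T(n), and checks its smallest eigenvalue against alpha.
    """
    X = np.array([x[n - L + 1:n + 1][::-1] for n in range(L - 1, len(x))])
    R = X.T @ X / X.shape[0]          # sample estimate of E{X(n) X^T(n)}
    return bool(np.linalg.eigvalsh(R).min() > alpha)

# Example: a single sinusoid is PE of order 2 but not of order 3.
n = np.arange(1000)
x = np.sin(0.3 * n)
print(is_pe(x, 2), is_pe(x, 3))       # expect True, False
```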
DEFINITION 23.2  A transfer function H(q^{-1}) is said to be strictly positive real (SPR) if H(q^{-1}) is stable and the real part of its frequency response is positive at all frequencies.
An SPR condition will be required to ensure convergence for a few of the algorithms that we discuss. Note that such a condition cannot be guaranteed in practice when H(q^{-1}) is an unknown transfer function, or when H(q^{-1}) depends on an unknown transfer function.
23.2 The Equation Error Approach
To motivate the equation error approach, consider again Fig. 23.1. Suppose that y(n) in the figure were actually equal to d(n). Then the system relationship A(q^{-1}, n) y(n) = B(q^{-1}, n) x(n) would imply that A(q^{-1}, n) d(n) = B(q^{-1}, n) x(n). But of course this last equation does not hold exactly, and we term its error the "equation error" e_e(n). Hence, we define

e_e(n) = A(q^{-1}, n) d(n) − B(q^{-1}, n) x(n).   (23.16)
Using the notation developed in (23.7) through (23.14), we find that

e_e(n) = d(n) − U_e^T(n) W(n).   (23.17)

Equation error methods for adaptive IIR filtering typically adjust W(n) so as to minimize the mean-squared error (MSE) J_MSE(n) = E{e_e^2(n)}, where E{·} denotes statistical expectation, or the exponentially weighted least-squares (LS) error J_LS(n) = Σ_{k=0}^{n} λ^{n−k} e_e^2(k).
23.2.1 The LMS and LS Equation Error Algorithms
The equation error e_e(n) of (23.17) is the difference between d(n) and a prediction of d(n) given by U_e^T(n) W(n). Noting that U_e(n) does not depend on W(n), we see that equation error adaptive IIR filtering is a type of linear prediction, and in particular the form of the prediction is identical to that arising in adaptive FIR filtering. One would suspect that many adaptive FIR filter algorithms would then apply directly to adaptive IIR filters with an equation error criterion, and this is in fact the case. Two adaptive algorithms applicable to equation error adaptive IIR filtering are the LMS algorithm given by

W(n+1) = W(n) + µ(n) U_e(n) e_e(n),   (23.18)
and the recursive least-squares (RLS) algorithm given by

W(n+1) = W(n) + P(n) U_e(n) e_e(n),   (23.19)

P(n) = (1/λ) [ P(n−1) − P(n−1) U_e(n) U_e^T(n) P(n−1) / (λ + U_e^T(n) P(n−1) U_e(n)) ],   (23.20)

where the above expression for P(n) is a recursive implementation of

P(n) = [ Σ_{k=0}^{n} λ^{n−k} U_e(k) U_e^T(k) ]^{−1}.   (23.21)
Some typical choices for µ(n) in (23.18) are µ(n) ≡ µ_0, a constant, or µ(n) = µ̄/(ε + U_e^T(n) U_e(n)), a normalized step size. For convergence of the gradient algorithm in (23.18), µ_0 is chosen in the range 0 < µ_0 < 1/((M+1)σ_x^2 + Nσ_d^2), where σ_x^2 = E{x^2(n)} and σ_d^2 = E{d^2(n)}. Typically, values of µ_0 in the range 0 < µ_0 < 0.1/((M+1)σ_x^2 + Nσ_d^2) are chosen. With the normalized step size, we require 0 < µ̄ < 2 and ε > 0 for stability, with typical choices of µ̄ = 0.1 and ε = 0.001. In (23.20), we require that λ satisfy 0 < λ ≤ 1, with λ typically close to or equal to one, and we initialize P(0) = γI with γ a large, positive number. These results are analogous to the FIR filter cases considered in the earlier sections of this chapter.
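The pieces above assemble into a complete equation error LMS loop. The following sketch (function and variable names are illustrative assumptions, not from the chapter) implements (23.17) and (23.18) with the normalized step size:

```python
import numpy as np

def ee_lms(x, d, N, M, mu_bar=0.1, eps=1e-3):
    """Equation error LMS of Eqs. (23.17)-(23.18), normalized step size.

    Returns W = [b_0 ... b_M, a_1 ... a_N]^T after one pass over the data.
    """
    W = np.zeros(M + 1 + N)
    for n in range(max(M, N), len(x)):
        # U_e(n) = [x(n) ... x(n-M), -d(n-1) ... -d(n-N)]^T, Eq. (23.7)
        U_e = np.concatenate([x[n - M:n + 1][::-1], -d[n - N:n][::-1]])
        e_e = d[n] - U_e @ W                    # equation error, Eq. (23.17)
        mu = mu_bar / (eps + U_e @ U_e)         # normalized step size
        W = W + mu * U_e * e_e                  # LMS update, Eq. (23.18)
    return W
```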
These algorithms possess nice convergence properties, as we now discuss.

Property 1: Given that x is PE of order N + M + 1, under (23.18), and under (23.19) and (23.20), with algorithm parameters chosen to satisfy the conditions noted above, E{W(n)} converges to a value \overline{W} minimizing J_MSE(n) and J_LS(n), respectively, as n → ∞.
This property is desirable in that global convergence to parameter values optimal for the equation error cost function is guaranteed, just as with adaptive FIR filters. The convergence result holds whether the filter is operating in the sufficient order case or the undermodeled case. This is an important advantage of the equation error approach over other approaches. The reader is referred to Chapters 19, 20, and 21 for further details on the convergence behaviors of these algorithms and their variations. As in the FIR case, the eigenvalues of the matrix R = E{U_e(n) U_e^T(n)} determine the rates of convergence for the LMS algorithm. A large eigenvalue disparity in R engenders slow convergence in the LMS algorithm and ill-conditioning, with the attendant numerical instabilities, in the RLS algorithm. For adaptive IIR filters, compared to the FIR case, the presence of d(n) in U_e(n) tends to increase the eigenvalue disparity, so that slower convergence is typically observed for these algorithms.
Of importance is the value of the convergence points for the LMS and RLS algorithms with respect to the modeling assumptions of the system identification configuration of Fig. 23.1. For simplicity, let us first assume that the adaptive filter is capable of modeling the unknown system exactly; that is, H_u(q^{-1}) = 0. One may readily show that the parameter vector W that minimizes the mean-square equation error (or equivalently the asymptotic least-squares equation error, given ergodic stationary signals) is

W = E{U_e(n) U_e^T(n)}^{−1} E{U_e(n) d(n)}   (23.22)
  = [E{U_m(n) U_m^T(n)} + E{V(n) V^T(n)}]^{−1} (E{U_m(n) y_m(n)} + E{V(n) v(n)}).   (23.23)
Clearly, if v(n) ≡ 0, the W so obtained must equal W_opt, so that we have

W_opt = E{U_m(n) U_m^T(n)}^{−1} E{U_m(n) y_m(n)}.   (23.24)

By comparing (23.23) and (23.24), we can easily see that when v(n) ≢ 0, in general W ≠ W_opt. That is, the parameter estimates provided by (23.18) through (23.20) are, in general, biased from the desired values, even when the noise term v(n) is uncorrelated.
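This bias is easy to exhibit numerically. The experiment below is a hypothetical illustration (not from the chapter): a first-order, sufficient-order system with white measurement noise, with (23.22) solved using sample averages in place of expectations. The denominator estimate is pulled toward zero by the noise; setting the noise to zero removes the bias.

```python
import numpy as np
from scipy.signal import lfilter

# Unknown system: d(n) = [b0 / (1 + a1 q^{-1})] x(n) + v(n), a1 = -0.9.
rng = np.random.default_rng(0)
n_samp, a1, b0 = 200_000, -0.9, 1.0
x = rng.standard_normal(n_samp)
v = 0.5 * rng.standard_normal(n_samp)           # additive noise v(n)
d = lfilter([b0], [1.0, a1], x) + v

# Eq. (23.22) via sample averages, with U_e(n) = [x(n), -d(n-1)]^T
# and W = [b0, a1]^T (M = 0, N = 1).
U = np.column_stack([x[1:], -d[:-1]])
W = np.linalg.solve(U.T @ U, U.T @ d[1:])
print("estimated [b0, a1]:", W, "   true:", [b0, a1])
# The a1 estimate is biased toward zero; with v(n) = 0 the bias disappears.
```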
What effect does this bias have on adaptive filter performance? Since the parameters that minimize the mean-square equation error are not the same as W_opt, the values that minimize the mean-square output error, the adaptive filter performance will not be optimal. Situations can arise in which this bias is severe, with correspondingly significant degradation of performance.
Furthermore, a critical issue with regard to the parameter bias is the input-output stability of the resulting IIR filter. Because the equation error is formed as A(q^{-1}) d(n) − B(q^{-1}) x(n), a difference of two FIR filtered signals, there are no built-in constraints to keep the roots of A(q^{-1}) within the unit circle in the complex plane. Clearly, if an unstable polynomial results from the adaptation, then the filter output y(n) can grow unboundedly in operational mode, so that the adaptive filter fails. An example of such a situation is given in [25]. An important feature of this example is that the adaptive filter is capable of precisely modeling the unknown system, and that interactions of the noise process within the algorithm are all that is needed to destabilize the resulting model.

Nonetheless, under certain operating conditions, this kind of instability can be shown not to occur, as described in the following.
Property 2: [18] Consider the adaptive filter depicted in Fig. 23.1, where y(n) is given by (23.2). If x(n) is an autoregressive process of order no more than N, and v(n) is independent of x(n) and of finite variance, then the adaptive filter parameters minimizing the mean-square equation error E{e_e^2(n)} are such that A(q^{-1}) is stable.

For instance, if x(n) is an uncorrelated signal, then the convergence point of the equation error algorithms corresponds to a stable filter.
To summarize, for LMS and RLS adaptation in an equation error setting, we have guaranteed
global convergence, but bias in the presence of additive noise even in the exact modeling case, and
an estimated model guaranteed to be stable only under a limited set of conditions.
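Outside the conditions of Property 2, a practical implementation would typically monitor the adapted denominator explicitly. A minimal sketch of such a check (a common safeguard, not part of the LMS/RLS updates themselves):

```python
import numpy as np

def denominator_is_stable(a_coeffs):
    """True if A(q^{-1}) = 1 + a_1 q^{-1} + ... + a_N q^{-N} has all of its
    roots inside the unit circle, i.e., the adapted filter is stable."""
    roots = np.roots(np.concatenate([[1.0], a_coeffs]))
    return bool(np.all(np.abs(roots) < 1.0))

# An adaptive loop might call this after each update and, for example,
# skip or scale back updates that would move A outside the stable region.
```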
23.2.2 Instrumental Variable Algorithms
A number of different approaches to adaptive IIR filtering have been proposed with the intention of mitigating the undesirable bias properties of the LMS- and RLS-based equation error adaptive IIR filters. One such approach, still within the equation error context, is the instrumental variables (IV) method. Observe that the bias problem illustrated above stems from the presence of v(n) in both U_e(n) and e_e(n) in the update terms in (23.18) and (23.19), so that second-order terms in v(n) then appear in (23.23). This simultaneous presence creates, in expectation, a nonzero, noise-dependent driving term to the adaptation. The IV approach addresses this by replacing U_e(n) in these algorithms with a vector U_iv(n) of instrumental variables that are independent of v(n). If U_iv(n) remains correlated with U_m(n), the noiseless regressor, convergence to unbiased filter parameters is possible.
The IV algorithm is given by

W(n+1) = W(n) + µ(n) P_iv(n) U_iv(n) e_e(n),   (23.25)

P_iv(n) = (1/λ(n)) [ P_iv(n−1) − P_iv(n−1) U_iv(n) U_e^T(n) P_iv(n−1) / ((λ(n)/µ(n)) + U_e^T(n) P_iv(n−1) U_iv(n)) ],   (23.26)

with λ(n) = 1 − µ(n). Common choices for λ(n) are to set λ(n) ≡ λ_0, a fixed constant in the range 0 < λ_0 < 1 and usually chosen between 0.9 and 0.99, or to choose µ(n) = 1/n and λ(n) = 1 − µ(n). As with RLS methods, P_iv(0) = γI with γ a large, positive number. The vector U_iv(n) is typically chosen as

U_iv(n) = [x(n) ··· x(n−M)  −z(n−1) ··· −z(n−N)]^T   (23.27)
with either

z(n) = −x(n − M)   or   z(n) = [\bar{B}(q^{-1}) / \bar{A}(q^{-1})] x(n).   (23.28)
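A single iteration of (23.25) and (23.26) can be sketched as follows (an illustrative sketch; the function signature and the external handling of µ(n), λ(n), and the instrument z(n) of (23.28) are assumptions):

```python
import numpy as np

def iv_update(W, P, U_e, U_iv, d_n, mu, lam):
    """One IV iteration, Eqs. (23.25)-(23.26).

    W, P : current parameter vector and matrix P_iv(n-1)
    U_e  : equation error regressor, Eq. (23.7)
    U_iv : instrumental variable regressor, Eq. (23.27)
    """
    e_e = d_n - U_e @ W                                   # equation error (23.17)
    denom = lam / mu + U_e @ (P @ U_iv)
    P = (P - np.outer(P @ U_iv, U_e @ P) / denom) / lam   # Eq. (23.26)
    W = W + mu * (P @ U_iv) * e_e                         # Eq. (23.25)
    return W, P
```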