
Williamson, G.A. "Adaptive IIR Filters"
Digital Signal Processing Handbook
Ed. Vijay K. Madisetti and Douglas B. Williams
Boca Raton: CRC Press LLC, 1999

© 1999 by CRC Press LLC
23  Adaptive IIR Filters

Geoffrey A. Williamson
Illinois Institute of Technology
23.1 Introduction
     The System Identification Framework for Adaptive IIR Filtering · Algorithms and Performance Issues · Some Preliminaries
23.2 The Equation Error Approach
     The LMS and LS Equation Error Algorithms · Instrumental Variable Algorithms · Equation Error Algorithms with Unit Norm Constraints
23.3 The Output Error Approach
     Gradient-Descent Algorithms · Output Error Algorithms Based on Stability Theory
23.4 Equation-Error/Output-Error Hybrids
     The Steiglitz-McBride Family of Algorithms
23.5 Alternate Parametrizations
23.6 Conclusions
References
23.1 Introduction
In comparison with adaptive finite impulse response (FIR) filters, adaptive infinite impulse response
(IIR) filters offer the potential to implement an adaptive filter meeting desired performance levels,
as measured by mean-square error, for example, with much less computational complexity. This advantage stems from the enhanced modeling capabilities provided by the pole/zero transfer function of the IIR structure, compared to the "all-zero" form of the FIR structure.
However, adapting an IIR filter brings with it a number of challenges in obtaining stable and
optimal behavior of the algorithms used to adjust the filter parameters. Since the 1970s, there has
been much active research focused on adaptive IIR filters, but many of these challenges have yet to be completely resolved. As a consequence, adaptive IIR filters appear in commercial practice far less frequently than adaptive FIR filters. Nonetheless, recent advances
in adaptive IIR filter research have provided new results and insights into the behavior of several
methods for adapting the filter parameters, and new algorithms have been proposed that address
some of the problems and open issues in these systems. Hence, this class of adaptive filter continues
to maintain promise as a potentially effective and efficient adaptive filtering option.
In this section, we provide an up-to-date overview of the different approaches to the adaptive IIR
filtering problem. Given the extensive literature on the subject, many readers may wish to consult several earlier general treatments of the topic as well. Johnson's 1984 paper [11] and Shynk's 1989 paper [23] are still current in the sense that a number of open issues cited therein remain open today. More recently, Regalia's 1995 book [19] provides a comprehensive view of the subject.
23.1.1 The System Identification Framework for Adaptive IIR Filtering
The spread of issues associated with adaptive IIR filters is most easily understood if one adopts a system identification perspective on the filtering problem. To this end, consider the diagram presented in Fig. 23.1. Available to the adaptive filter are two external signals: the input signal x(n) and the desired output signal d(n). The adaptive filtering problem is to adjust the parameters of the filter acting on x(n) so that its output y(n) approximates d(n). From the system identification perspective, the task at hand is to adjust the parameters of the filter generating y(n) from x(n) in Fig. 23.1 so that the filtering operation itself matches in some sense the system generating d(n) from x(n). These two viewpoints are closely related because if the systems are the same, then their outputs will be close. However, by adopting the convention that there is a system generating d(n) from x(n), clearer insights into the behavior and design of adaptive algorithms are obtained. This insight is useful even if the "system" generating d(n) from x(n) has only a statistical and not a physical basis in reality.
FIGURE 23.1: System identification configuration of the adaptive IIR filter.
The standard adaptive IIR filter is described by

y(n) + a_1(n) y(n-1) + ··· + a_N(n) y(n-N) = b_0(n) x(n) + b_1(n) x(n-1) + ··· + b_M(n) x(n-M),   (23.1)

or equivalently

[1 + a_1(n) q^{-1} + ··· + a_N(n) q^{-N}] y(n) = [b_0(n) + b_1(n) q^{-1} + ··· + b_M(n) q^{-M}] x(n).   (23.2)
As is shown in Fig. 23.1, Eq. (23.2) may be written in shorthand as

y(n) = [B(q^{-1}, n) / A(q^{-1}, n)] x(n),   (23.3)

where B(q^{-1}, n) and A(q^{-1}, n) are the time-dependent polynomials in the delay operator q^{-1} appearing in (23.2). The parameters that are updated by the adaptive algorithm are the coefficients of these polynomials. Note that the polynomial A(q^{-1}, n) is constrained to be monic, such that a_0(n) = 1.
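To make the recursion in (23.1) concrete, the following minimal NumPy sketch computes one output sample from the current coefficient values; the function and variable names are illustrative, not part of the chapter.

```python
import numpy as np

def iir_output(x_hist, y_hist, a, b):
    """One step of the adaptive IIR filter recursion of Eq. (23.1).

    x_hist : [x(n), x(n-1), ..., x(n-M)]   current and past inputs
    y_hist : [y(n-1), ..., y(n-N)]         past outputs
    a      : [a_1(n), ..., a_N(n)]         denominator coefficients (a_0 = 1)
    b      : [b_0(n), ..., b_M(n)]         numerator coefficients
    """
    # y(n) = sum_i b_i(n) x(n-i) - sum_i a_i(n) y(n-i)
    return np.dot(b, x_hist) - np.dot(a, y_hist)
```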
We adopt a rather more general description for the unknown system, assuming that d(n) is generated from the input signal x(n) via some linear time-invariant system H(q^{-1}), with the addition of a noise signal v(n) to reflect components in d(n) that are independent of x(n). We further break down H(q^{-1}) into a transfer function H_m(q^{-1}) that is explicitly modeled by the adaptive filter, and a transfer function H_u(q^{-1}) that is unmodeled. In this way, we view d(n) as a sum of three components: the signal y_m(n) that is modeled by the adaptive filter, the signal y_u(n) that is unmodeled but that depends on the input signal, and the signal v(n) that is independent of the input. Hence,

d(n) = y_m(n) + y_u(n) + v(n)   (23.4)
     = y_s(n) + v(n),           (23.5)

where y_s(n) = y_m(n) + y_u(n). The modeled component of the system output is viewed as

y_m(n) = [B_opt(q^{-1}) / A_opt(q^{-1})] x(n),   (23.6)

with B_opt(q^{-1}) = Σ_{i=0}^{M} b_{i,opt} q^{-i} and A_opt(q^{-1}) = 1 + Σ_{i=1}^{N} a_{i,opt} q^{-i}. Note that (23.6) has the same form as (23.3). The parameters {a_{i,opt}} and {b_{i,opt}} are considered to be the optimal values for the adaptive filter parameters, in a manner that we describe shortly.
Figure 23.1 shows two error signals: e_e(n), termed the equation error, and e_o(n), termed the output error. The parameters of the adaptive filter are usually adjusted so as to minimize some positive function of one or the other of these error signals. However, the figure of merit for judging adaptive filter performance that we will apply throughout this section is the mean-square output error E{e_o^2(n)}. In most adaptive filtering applications, the desired signal d(n) is available only during a "training phase" in which the filter parameters are adapted. At the conclusion of the training phase, the filter will be operated to produce the output signal y(n) as shown in the figure, with the difference between the filter output y(n) and the (now unmeasurable) system output d(n) the error. Thus, we adopt the convention that {a_{i,opt}} and {b_{i,opt}} are defined such that when a_i(n) ≡ a_{i,opt} and b_i(n) ≡ b_{i,opt}, E{e_o^2(n)} is minimized, with A_opt(q^{-1}) constrained to be stable.
At this point it is convenient to set down some notation and terminology. Define the regressor vectors

U_e(n) = [x(n) ··· x(n-M)  -d(n-1) ··· -d(n-N)]^T,      (23.7)
U_o(n) = [x(n) ··· x(n-M)  -y(n-1) ··· -y(n-N)]^T,      (23.8)
U_m(n) = [x(n) ··· x(n-M)  -y_m(n-1) ··· -y_m(n-N)]^T.  (23.9)

These vectors are the equation error regressor, output error regressor, and modeled system regressor vectors, respectively. Define a noise regressor vector

V(n) = [0 ··· 0  -v(n-1) ··· -v(n-N)]^T   (23.10)
with M + 1 leading zeros corresponding to the x(n-i) values in the preceding regressors. Furthermore, define the parameter vectors

W(n) = [b_0(n) b_1(n) ··· b_M(n)  a_1(n) ··· a_N(n)]^T,                  (23.11)
W_opt = [b_{0,opt} b_{1,opt} ··· b_{M,opt}  a_{1,opt} ··· a_{N,opt}]^T,  (23.12)
\tilde{W}(n) = W_opt − W(n),                                             (23.13)
\overline{W} = lim_{n→∞} E{W(n)}.                                        (23.14)

We will have occasion to use W to refer to the adaptive filter parameter vector when the parameters are considered to be held at fixed values. With this notation, we may for instance write y_m(n) = U_m^T(n) W_opt and y(n) = U_o^T(n) W(n).

The situation in which y_u(n) ≡ 0 is referred to as the sufficient order case. The situation in which y_u(n) ≢ 0 is termed the undermodeled case.
23.1.2 Algorithms and Performance Issues
A number of different algorithms for the adaptation of the parameter vector W(n) in (23.11) have been suggested. These may be characterized with respect to the form of the error criterion employed by the algorithm. Each algorithm attempts to drive to zero either the equation error, the output error, or some combination or hybrid of these two error criteria. Major algorithm classes that we consider for the equation error approach include the standard least-squares (LS) and least mean-square (LMS) algorithms, which parallel the algorithms used in adaptive FIR filtering. For equation error methods, we also examine the instrumental variables (IV) algorithm, as well as algorithms that constrain the parameters in the denominator of the adaptive filter's transfer function to improve estimation properties. In the output error class, we examine gradient algorithms and hyperstability-based algorithms. Within the equation and output error hybrid algorithm class, we focus predominantly on the Steiglitz-McBride (SM) algorithm, though there are several algorithms that are more straightforward combinations of equation and output error approaches.

In general, we desire that the adaptive filtering algorithm adjusts the parameter vector W(n) so that it converges to W_opt, the parameters that minimize the mean-square output error. The major issues for adaptive IIR filtering on which we will focus herein are

1. conditions for the stability and convergence of the algorithm used to adapt W(n), and
2. the asymptotic value \overline{W} of the adapted parameter vector, and its relationship to W_opt.

This latter issue relates to the minimum mean-square error achievable by the algorithm, as noted above. Other issues of importance include the convergence speed of the algorithm, its ability to track time variations of the "true" parameter values, and numerical properties, but these will receive less attention here. Of these, convergence speed is of particular concern to practitioners, especially as adaptive IIR filters tend to converge at a far slower rate than their FIR counterparts. However, we emphasize the stability and nature of convergence over the speed because if the algorithm fails to converge or converges to an undesirable solution, the rate at which it does so is of less concern. Furthermore, convergence speed is difficult to characterize for adaptive IIR filters due to a number of factors, including complicated dependencies on algorithm initializations, input signal characteristics, and the relationship between x(n) and d(n).
23.1.3 Some Preliminaries
Unless otherwise indicated, we assume in our discussion that all signals in Fig. 23.1 are stationary, zero mean, random signals with finite variance. In particular, the properties we ascribe to the various algorithms are stated under this assumption. Results based on a deterministic framework are similar to those developed here; see [1] for an example.
We shall also make use of the following definitions.
DEFINITION 23.1  A (scalar) signal x(n) is persistently exciting (PE) of order L if, with

X(n) = [x(n) ··· x(n-L+1)]^T,   (23.15)

there exist α and β satisfying 0 < α < β < ∞ such that αI < E{X(n)X^T(n)} < βI. The (vector) signal X(n) is then also said to be PE.
If x(n) contains at least L/2 distinct sinusoidal components, then x(n) is PE of order L. Any random signal x(n) whose power spectrum is nonzero over an interval of nonzero width will be PE for any value of L in (23.15). Such is the case, for example, if x(n) is uncorrelated or if x(n) is modeled as an AR, MA, or ARMA process driven by uncorrelated noise. PE conditions are required of all adaptive algorithms to ensure good behavior, because if there is inadequate excitation to provide information to the algorithm, convergence of the adapted parameter estimates will not necessarily follow [22].
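The PE condition can be probed numerically. The sketch below (illustrative names; the eigenvalue tolerance is an assumption) estimates E{X(n)X^T(n)} from data and checks that its smallest eigenvalue is bounded away from zero.

```python
import numpy as np

def is_pe(x, L, alpha=1e-6):
    """Empirically test whether x(n) appears PE of order L (Definition 23.1).

    Stacks X(n) = [x(n), ..., x(n-L+1)]^T, forms the sample average of
    X(n) X^T(n), and checks its smallest eigenvalue against alpha.
    """
    X = np.array([x[n - L + 1:n + 1][::-1] for n in range(L - 1, len(x))])
    R = X.T @ X / X.shape[0]          # sample estimate of E{X(n) X^T(n)}
    return bool(np.linalg.eigvalsh(R).min() > alpha)

# Example: a single sinusoid is PE of order 2 but not of order 3.
n = np.arange(1000)
x = np.sin(0.3 * n)
print(is_pe(x, 2), is_pe(x, 3))       # expect True, False
```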
DEFINITION 23.2  A transfer function H(q^{-1}) is said to be strictly positive real (SPR) if H(q^{-1}) is stable and the real part of its frequency response is positive at all frequencies.
An SPR condition will be required to ensure convergence for a few of the algorithms that we discuss. Note that such a condition cannot be guaranteed in practice when H(q^{-1}) is an unknown transfer function, or when H(q^{-1}) depends on an unknown transfer function.
23.2 The Equation Error Approach
To motivate the equation error approach, consider again Fig. 23.1. Suppose that y(n) in the figure were actually equal to d(n). Then the system relationship A(q^{-1}, n) y(n) = B(q^{-1}, n) x(n) would imply that A(q^{-1}, n) d(n) = B(q^{-1}, n) x(n). But of course this last equation does not hold exactly, and we term its error the "equation error" e_e(n). Hence, we define

e_e(n) = A(q^{-1}, n) d(n) − B(q^{-1}, n) x(n).   (23.16)
Using the notation developed in (23.7) through (23.14), we find that

e_e(n) = d(n) − U_e^T(n) W(n).   (23.17)

Equation error methods for adaptive IIR filtering typically adjust W(n) so as to minimize the mean-squared error (MSE) J_MSE(n) = E{e_e^2(n)}, where E{·} denotes statistical expectation, or the exponentially weighted least-squares (LS) error J_LS(n) = Σ_{k=0}^{n} λ^{n−k} e_e^2(k).
23.2.1 The LMS and LS Equation Error Algorithms
The equation error e_e(n) of (23.17) is the difference between d(n) and a prediction of d(n) given by U_e^T(n) W(n). Noting that U_e(n) does not depend on W(n), we see that equation error adaptive IIR filtering is a type of linear prediction, and in particular the form of the prediction is identical to that arising in adaptive FIR filtering. One would suspect that many adaptive FIR filter algorithms would then apply directly to adaptive IIR filters with an equation error criterion, and this is in fact the case. Two adaptive algorithms applicable to equation error adaptive IIR filtering are the LMS algorithm given by

W(n+1) = W(n) + µ(n) U_e(n) e_e(n),   (23.18)
and the recursive least-squares (RLS) algorithm given by

W(n+1) = W(n) + P(n) U_e(n) e_e(n),   (23.19)

P(n) = (1/λ) [ P(n−1) − P(n−1) U_e(n) U_e^T(n) P(n−1) / (λ + U_e^T(n) P(n−1) U_e(n)) ],   (23.20)

where the above expression for P(n) is a recursive implementation of

P(n) = [ Σ_{k=0}^{n} λ^{n−k} U_e(k) U_e^T(k) ]^{−1}.   (23.21)
Some typical choices for µ(n) in (23.18) are µ(n) ≡ µ_0, a constant, or µ(n) = µ̄/(ε + U_e^T(n) U_e(n)), a normalized step size. For convergence of the gradient algorithm in (23.18), µ_0 is chosen in the range 0 < µ_0 < 1/((M+1)σ_x^2 + Nσ_d^2), where σ_x^2 = E{x^2(n)} and σ_d^2 = E{d^2(n)}. Typically, values of µ_0 in the range 0 < µ_0 < 0.1/((M+1)σ_x^2 + Nσ_d^2) are chosen. With the normalized step size, we require 0 < µ̄ < 2 and ε > 0 for stability, with typical choices of µ̄ = 0.1 and ε = 0.001. In (23.20), we require that λ satisfy 0 < λ ≤ 1, with λ typically close to or equal to one, and we initialize P(0) = γI with γ a large, positive number. These results are analogous to the FIR filter cases considered in the earlier sections of this chapter.
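The pieces above assemble into a complete equation error LMS loop. The following sketch (function and variable names are illustrative assumptions, not from the chapter) implements (23.17) and (23.18) with the normalized step size:

```python
import numpy as np

def ee_lms(x, d, N, M, mu_bar=0.1, eps=1e-3):
    """Equation error LMS of Eqs. (23.17)-(23.18), normalized step size.

    Returns W = [b_0 ... b_M, a_1 ... a_N]^T after one pass over the data.
    """
    W = np.zeros(M + 1 + N)
    for n in range(max(M, N), len(x)):
        # U_e(n) = [x(n) ... x(n-M), -d(n-1) ... -d(n-N)]^T, Eq. (23.7)
        U_e = np.concatenate([x[n - M:n + 1][::-1], -d[n - N:n][::-1]])
        e_e = d[n] - U_e @ W                    # equation error, Eq. (23.17)
        mu = mu_bar / (eps + U_e @ U_e)         # normalized step size
        W = W + mu * U_e * e_e                  # LMS update, Eq. (23.18)
    return W
```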
These algorithms possess nice convergence properties, as we now discuss.

Property 1: Given that x is PE of order N + M + 1, under (23.18), and under (23.19) and (23.20), with algorithm parameters chosen to satisfy the conditions noted above, E{W(n)} converges to a value \overline{W} minimizing J_MSE(n) and J_LS(n), respectively, as n → ∞.
This property is desirable in that global convergence to parameter values optimal for the equation error cost function is guaranteed, just as with adaptive FIR filters. The convergence result holds whether the filter is operating in the sufficient order case or the undermodeled case. This is an important advantage of the equation error approach over other approaches. The reader is referred to Chapters 19, 20, and 21 for further details on the convergence behaviors of these algorithms and their variations. As in the FIR case, the eigenvalues of the matrix R = E{U_e(n) U_e^T(n)} determine the rates of convergence for the LMS algorithm. A large eigenvalue disparity in R engenders slow convergence in the LMS algorithm and ill-conditioning, with the attendant numerical instabilities, in the RLS algorithm. For adaptive IIR filters, compared to the FIR case, the presence of d(n) in U_e(n) tends to increase the eigenvalue disparity, so that slower convergence is typically observed for these algorithms.
Of importance is the value of the convergence points for the LMS and RLS algorithms with respect to the modeling assumptions of the system identification configuration of Fig. 23.1. For simplicity, let us first assume that the adaptive filter is capable of modeling the unknown system exactly; that is, H_u(q^{-1}) = 0. One may readily show that the parameter vector W that minimizes the mean-square equation error (or equivalently the asymptotic least-squares equation error, given ergodic stationary signals) is

W = E{U_e(n) U_e^T(n)}^{−1} E{U_e(n) d(n)}   (23.22)
  = [E{U_m(n) U_m^T(n)} + E{V(n) V^T(n)}]^{−1} (E{U_m(n) y_m(n)} + E{V(n) v(n)}).   (23.23)
Clearly, if v(n) ≡ 0, the W so obtained must equal W_opt, so that we have

W_opt = E{U_m(n) U_m^T(n)}^{−1} E{U_m(n) y_m(n)}.   (23.24)

By comparing (23.23) and (23.24), we can easily see that when v(n) ≢ 0, in general W ≠ W_opt. That is, the parameter estimates provided by (23.18) through (23.20) are, in general, biased from the desired values, even when the noise term v(n) is uncorrelated.
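This bias is easy to exhibit numerically. The experiment below is a hypothetical illustration (not from the chapter): a first-order, sufficient-order system with white measurement noise, with (23.22) solved using sample averages in place of expectations. The denominator estimate is pulled toward zero by the noise; setting the noise to zero removes the bias.

```python
import numpy as np
from scipy.signal import lfilter

# Unknown system: d(n) = [b0 / (1 + a1 q^{-1})] x(n) + v(n), a1 = -0.9.
rng = np.random.default_rng(0)
n_samp, a1, b0 = 200_000, -0.9, 1.0
x = rng.standard_normal(n_samp)
v = 0.5 * rng.standard_normal(n_samp)           # additive noise v(n)
d = lfilter([b0], [1.0, a1], x) + v

# Eq. (23.22) via sample averages, with U_e(n) = [x(n), -d(n-1)]^T
# and W = [b0, a1]^T (M = 0, N = 1).
U = np.column_stack([x[1:], -d[:-1]])
W = np.linalg.solve(U.T @ U, U.T @ d[1:])
print("estimated [b0, a1]:", W, "   true:", [b0, a1])
# The a1 estimate is biased toward zero; with v(n) = 0 the bias disappears.
```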
What effect does this bias have on adaptive filter performance? Since the parameters that minimize the mean-square equation error are not the same as W_opt, the values that minimize the mean-square output error, the adaptive filter performance will not be optimal. Situations can arise in which this bias is severe, with correspondingly significant degradation of performance.
Furthermore, a critical issue with regard to the parameter bias is the input-output stability of the resulting IIR filter. Because the equation error is formed as A(q^{-1}) d(n) − B(q^{-1}) x(n), a difference of two FIR filtered signals, there are no built-in constraints to keep the roots of A(q^{-1}) within the unit circle in the complex plane. Clearly, if an unstable polynomial results from the adaptation, then the filter output y(n) can grow unboundedly in operational mode, so that the adaptive filter fails. An example of such a situation is given in [25]. An important feature of this example is that the adaptive filter is capable of precisely modeling the unknown system, and that interactions of the noise process within the algorithm are all that is needed to destabilize the resulting model.

Nonetheless, under certain operating conditions, this kind of instability can be shown not to occur, as described in the following.
Property 2: [18] Consider the adaptive filter depicted in Fig. 23.1, where y(n) is given by (23.2). If x(n) is an autoregressive process of order no more than N, and v(n) is independent of x(n) and of finite variance, then the adaptive filter parameters minimizing the mean-square equation error E{e_e^2(n)} are such that A(q^{-1}) is stable.

For instance, if x(n) is an uncorrelated signal, then the convergence point of the equation error algorithms corresponds to a stable filter.
To summarize, for LMS and RLS adaptation in an equation error setting, we have guaranteed
global convergence, but bias in the presence of additive noise even in the exact modeling case, and
an estimated model guaranteed to be stable only under a limited set of conditions.
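Outside the conditions of Property 2, a practical implementation would typically monitor the adapted denominator explicitly. A minimal sketch of such a check (a common safeguard, not part of the LMS/RLS updates themselves):

```python
import numpy as np

def denominator_is_stable(a_coeffs):
    """True if A(q^{-1}) = 1 + a_1 q^{-1} + ... + a_N q^{-N} has all of its
    roots inside the unit circle, i.e., the adapted filter is stable."""
    roots = np.roots(np.concatenate([[1.0], a_coeffs]))
    return bool(np.all(np.abs(roots) < 1.0))

# An adaptive loop might call this after each update and, for example,
# skip or scale back updates that would move A outside the stable region.
```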
23.2.2 Instrumental Variable Algorithms
A number of different approaches to adaptive IIR filtering have been proposed with the intention of mitigating the undesirable bias properties of the LMS- and RLS-based equation error adaptive IIR filters. One such approach, still within the equation error context, is the instrumental variables (IV) method. Observe that the bias problem illustrated above stems from the presence of v(n) in both U_e(n) and e_e(n) in the update terms in (23.18) and (23.19), so that second-order terms in v(n) then appear in (23.23). This simultaneous presence creates, in expectation, a nonzero, noise-dependent driving term to the adaptation. The IV approach addresses this by replacing U_e(n) in these algorithms with a vector U_iv(n) of instrumental variables that are independent of v(n). If U_iv(n) remains correlated with U_m(n), the noiseless regressor, convergence to unbiased filter parameters is possible.
The IV algorithm is given by

W(n+1) = W(n) + µ(n) P_iv(n) U_iv(n) e_e(n),   (23.25)

P_iv(n) = (1/λ(n)) [ P_iv(n−1) − P_iv(n−1) U_iv(n) U_e^T(n) P_iv(n−1) / ((λ(n)/µ(n)) + U_e^T(n) P_iv(n−1) U_iv(n)) ],   (23.26)

with λ(n) = 1 − µ(n). Common choices for λ(n) are to set λ(n) ≡ λ_0, a fixed constant in the range 0 < λ_0 < 1 and usually chosen between 0.9 and 0.99, or to choose µ(n) = 1/n and λ(n) = 1 − µ(n). As with RLS methods, P_iv(0) = γI with γ a large, positive number. The vector U_iv(n) is typically chosen as

U_iv(n) = [x(n) ··· x(n−M)  −z(n−1) ··· −z(n−N)]^T   (23.27)
with either

z(n) = −x(n − M)   or   z(n) = [\bar{B}(q^{-1}) / \bar{A}(q^{-1})] x(n).   (23.28)
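A single iteration of (23.25) and (23.26) can be sketched as follows (an illustrative sketch; the function signature and the external handling of µ(n), λ(n), and the instrument z(n) of (23.28) are assumptions):

```python
import numpy as np

def iv_update(W, P, U_e, U_iv, d_n, mu, lam):
    """One IV iteration, Eqs. (23.25)-(23.26).

    W, P : current parameter vector and matrix P_iv(n-1)
    U_e  : equation error regressor, Eq. (23.7)
    U_iv : instrumental variable regressor, Eq. (23.27)
    """
    e_e = d_n - U_e @ W                                   # equation error (23.17)
    denom = lam / mu + U_e @ (P @ U_iv)
    P = (P - np.outer(P @ U_iv, U_e @ P) / denom) / lam   # Eq. (23.26)
    W = W + mu * (P @ U_iv) * e_e                         # Eq. (23.25)
    return W, P
```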