
8
Adaptive Filtering
As discussed in previous chapters, filtering refers to the linear process designed to alter the spectral content of an input signal in a specified manner. In Chapters 5 and 6, we introduced techniques for designing and implementing FIR and IIR filters for given specifications. Conventional FIR and IIR filters are time-invariant. They perform linear operations on an input signal to generate an output signal based on fixed coefficients. Adaptive filters are time varying: filter characteristics such as bandwidth and frequency response change with time. Thus the filter coefficients cannot be determined when the filter is implemented. The coefficients of an adaptive filter are adjusted automatically by an adaptive algorithm based on incoming signals. This has the important effect of enabling adaptive filters to be applied in areas where the exact filtering operation required is unknown or is non-stationary.
In Section 8.1, we will review the concepts of random processes that are useful in the development and analysis of various adaptive algorithms. The most popular least-mean-square (LMS) algorithm will be introduced in Section 8.2. Its important properties will be analyzed in Section 8.3. Two widely used modified adaptive algorithms, the normalized and leaky LMS algorithms, will be introduced in Section 8.4. In this chapter, we introduce and analyze the LMS algorithm following the derivation and analysis given in [8]. In Section 8.5, we will briefly introduce some important applications of adaptive filtering. The implementation considerations will be discussed in Section 8.6, and the DSP implementations using the TMS320C55x will be presented in Section 8.7.
8.1 Introduction to Random Processes
A signal is called a deterministic signal if it can be described precisely and be reproduced exactly and repeatedly. However, the signals encountered in practice are not necessarily of this type. A signal that is generated in a random fashion and cannot be described by mathematical expressions or rules is called a random (or stochastic) signal. The signals in the real world are often random in nature. Some common examples of random signals are speech, music, and noise. These signals cannot be reproduced and need to be modeled and analyzed using statistical techniques. We briefly introduced probability and random variables in Section 3.3. In this section, we will review the important properties of random processes and introduce fundamental techniques for processing and analyzing them.
A random process may be defined as a set of random variables. We associate a time function x(n) = x(n, A) with every possible outcome A of an experiment. Each time function is called a realization of the random process or a random signal. The ensemble of all these time functions (called sample functions) constitutes the random process x(n). If we sample this process at some particular time n_0, we obtain a random variable. Thus a random process is a family of random variables.

We may consider the statistics of a random process in two ways. If we fix the time n at n_0 and consider the random variable x(n_0), we obtain statistics over the ensemble. For example, E[x(n_0)] is the ensemble average, where E[·] is the expectation operation introduced in Chapter 3. If we fix A and consider a particular sample function, we have a time function and the statistics we obtain are temporal. For example, E[x(n, A_i)] is the time average. If the time average is equal to the ensemble average, we say that the process is ergodic. The property of ergodicity is important because in practice we often have access to only one sample function. Since we generally work only with temporal statistics, it is important to be sure that the temporal statistics we obtain are a true representation of the process as a whole.
8.1.1 Correlation Functions
For many applications, one signal is compared with another in order to determine the similarity between the pair, and to extract additional information based on that similarity. Autocorrelation is used to quantify the similarity between two segments of the same signal. The autocorrelation function of the random process x(n) is defined as

    r_xx(n, k) = E[x(n)x(k)].                                          (8.1.1)

This function specifies the statistical relation between two samples at time indices n and k, and gives the degree of dependence between two random variables spaced (n - k) units apart. For example, consider a digital white noise x(n) consisting of uncorrelated random variables with zero mean and variance σ_x². Since E[x(n)x(k)] = E[x(n)]E[x(k)] = 0 for n ≠ k, the autocorrelation function is

    r_xx(n, k) = E[x(n)x(k)] = σ_x² for n = k, and 0 for n ≠ k.        (8.1.2)
If we subtract the means in (8.1.1) before taking the expected value, we have the autocovariance function

    γ_xx(n, k) = E{[x(n) - m_x(n)][x(k) - m_x(k)]} = r_xx(n, k) - m_x(n)m_x(k).    (8.1.3)

The objective in computing the correlation between two different random signals is to measure the degree to which the two signals are similar. The crosscorrelation and crosscovariance functions between two random processes x(n) and y(n) are defined as

    r_xy(n, k) = E[x(n)y(k)]                                           (8.1.4)
and

    γ_xy(n, k) = E{[x(n) - m_x(n)][y(k) - m_y(k)]} = r_xy(n, k) - m_x(n)m_y(k).    (8.1.5)

Correlation is a very useful DSP tool for detecting signals that are corrupted by additive random noise, measuring the time delay between two signals, determining the impulse response of a system (such as obtaining the room impulse response used in Section 4.5.2), and many other tasks. Signal correlation is often used in radar, sonar, digital communications, and other engineering areas. For example, in CDMA digital communications, data symbols are represented with a set of unique key sequences. If one of these sequences is transmitted, the receiver compares the received signal with every possible sequence from the set to determine which sequence has been received. In radar and sonar applications, the received signal reflected from the target is a delayed version of the transmitted signal. By measuring the round-trip delay, one can determine the location of the target.
Both correlation functions and covariance functions are extensively used in analyzing random processes. In general, the statistical properties of a random signal, such as the mean, variance, and autocorrelation and autocovariance functions, are time-varying functions. A random process is said to be stationary if its statistics do not change with time. The most useful and relaxed form of stationarity is the wide-sense stationary (WSS) process. A random process is called WSS if the following two conditions are satisfied:

1. The mean of the process is independent of time. That is,

    E[x(n)] = m_x,                                                     (8.1.6)

   where m_x is a constant.

2. The autocorrelation function depends only on the time difference. That is,

    r_xx(k) = E[x(n + k)x(n)].                                         (8.1.7)

Equation (8.1.7) indicates that the autocorrelation function of a WSS process is independent of the time shift, and r_xx(k) denotes the autocorrelation function for a time lag of k samples.
The autocorrelation function r_xx(k) of a WSS process has the following important properties:

1. The autocorrelation function is an even function of the time lag k. That is,

    r_xx(-k) = r_xx(k).                                                (8.1.8)

2. The autocorrelation function is bounded by the mean-squared value of the process, expressed as

    |r_xx(k)| ≤ r_xx(0),                                               (8.1.9)

   where r_xx(0) = E[x²(n)] is equal to the mean-squared value, or the power, of the random process.

In addition, if x(n) is a zero-mean random process, we have

    r_xx(0) = E[x²(n)] = σ_x².                                         (8.1.10)

Thus the autocorrelation function of a signal has its maximum value at zero lag. If x(n) has a periodic component, then r_xx(k) will contain the same periodic component.
Example 8.1: Given the sequence

    x(n) = a^n u(n),  0 < a < 1,

the autocorrelation function can be computed as

    r_xx(k) = Σ_{n=-∞}^{∞} x(n + k)x(n) = Σ_{n=0}^{∞} a^{n+k} a^n = a^k Σ_{n=0}^{∞} (a²)^n.

Since a < 1, we obtain

    r_xx(k) = a^k / (1 - a²).
Example 8.2: Consider the sinusoidal signal expressed as

    x(n) = cos(ωn).

Find the mean and the autocorrelation function of x(n).

(a) m_x = E[cos(ωn)] = 0.

(b) r_xx(k) = E[x(n + k)x(n)] = E[cos(ωn + ωk) cos(ωn)]
           = (1/2) E[cos(2ωn + ωk)] + (1/2) cos(ωk) = (1/2) cos(ωk).

The crosscorrelation function of two WSS processes x(n) and y(n) is defined as

    r_xy(k) = E[x(n + k)y(n)].                                         (8.1.11)

This crosscorrelation function has the property

    r_xy(k) = r_yx(-k).                                                (8.1.12)

Therefore r_yx(k) is simply the folded version of r_xy(k). Hence, r_yx(k) provides exactly the same information as r_xy(k) with respect to the similarity of x(n) to y(n).
In practice, we only have one sample sequence {x(n)} available for analysis. As discussed earlier, a stationary random process x(n) is ergodic if all its statistics can be determined from a single realization of the process, provided that the realization is long enough. Therefore time averages are equal to ensemble averages when the record length is infinite. Since we do not have data of infinite length, the averages we compute differ from the true values. When dealing with a finite-duration sequence, the sample mean of x(n) is defined as

    m̂_x = (1/N) Σ_{n=0}^{N-1} x(n),                                    (8.1.13)

where N is the number of samples in the short-time analysis interval. The sample variance is defined as

    σ̂_x² = (1/N) Σ_{n=0}^{N-1} [x(n) - m̂_x]².                          (8.1.14)

The sample autocorrelation function is defined as

    r̂_xx(k) = (1/(N - k)) Σ_{n=0}^{N-k-1} x(n + k)x(n),  k = 0, 1, ..., N - 1,    (8.1.15)

where N is the length of the sequence x(n). Note that for a given sequence of length N, Equation (8.1.15) generates values for up to N different lags. In practice, we can only expect good results for lags of no more than 5-10 percent of the length of the signals.
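As an illustration, a short MATLAB fragment along the following lines computes the sample statistics of (8.1.13)-(8.1.15) directly; the white-noise test signal, the record length, and the number of lags are illustrative choices, not values from the text:

N = 1024;
x = randn(1, N);                  % test signal: zero-mean white noise
mx = sum(x) / N;                  % sample mean, Eq. (8.1.13)
vx = sum((x - mx).^2) / N;        % sample variance, Eq. (8.1.14)
maxlag = round(0.05 * N);         % keep lags below about 5 percent of N
rxx = zeros(1, maxlag + 1);
for k = 0:maxlag
  rxx(k+1) = sum(x(1+k:N) .* x(1:N-k)) / (N - k);   % Eq. (8.1.15)
end

For this white-noise input, the lag-0 value of rxx should be close to the sample variance, and the remaining lags should be close to zero.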
The autocorrelation and crosscorrelation functions introduced in this section can be computed using the MATLAB function xcorr in the Signal Processing Toolbox. The crosscorrelation function r_xy(k) of the two sequences x(n) and y(n) can be computed using the statement

c = xcorr(x, y);

where x and y are length-N vectors and the crosscorrelation vector c has length 2N - 1. The autocorrelation function r_xx(k) of the sequence x(n) can be computed using the statement

c = xcorr(x);

In addition, the crosscovariance function can be estimated using

v = xcov(x, y);

and the autocovariance function can be computed with

v = xcov(x);
See Signal Processing Toolbox User's Guide for details.
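For example, the time-delay measurement described earlier can be sketched with xcorr as follows; the 20-sample delay and the noise level are arbitrary values chosen for illustration:

N = 512;
x = randn(1, N);                                    % transmitted (reference) signal
y = [zeros(1, 20), x(1:N-20)] + 0.1*randn(1, N);    % delayed, noisy received signal
[c, lags] = xcorr(y, x);                            % crosscorrelation, length 2N-1
[cmax, imax] = max(c);                              % locate the correlation peak
delay = lags(imax);                                 % estimated delay, close to 20 samples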
8.1.2 Frequency-Domain Representations
In the study of deterministic digital signals, we use the discrete-time Fourier transform (DTFT) or the z-transform to find the frequency contents of the signals. In this section, we will use the same transforms for random signals. Consider an ergodic random process x(n). This sequence cannot really be representative of the random process because the sequence x(n) is only one of infinitely many possible sequences. However, if we consider the autocorrelation function r_xx(k), the result is always the same no matter which sample sequence is used to compute r_xx(k). Therefore we should apply the transform to r_xx(k) rather than to x(n).

The correlation functions represent the time-domain description of the statistics of a random process. The frequency-domain statistics are represented by the power density spectrum (PDS), or autopower spectrum. The PDS is the DTFT (or the z-transform) of the autocorrelation function r_xx(k) of a WSS signal x(n), defined as

    P_xx(ω) = Σ_{k=-∞}^{∞} r_xx(k) e^{-jωk},                           (8.1.16)

or

    P_xx(z) = Σ_{k=-∞}^{∞} r_xx(k) z^{-k}.                             (8.1.17)

A sufficient condition for the existence of the PDS is that r_xx(k) is summable. The PDS defined in (7.3.16) is equal to the DFT of the autocorrelation function. The windowing technique introduced in Section 7.3.3 can be used to improve the convergence properties of (7.3.16) and (7.3.17) if the DFT is used in computing the PDS of random signals.
Equation (8.1.16) implies that the autocorrelation function is the inverse DTFT of the PDS, which is expressed as

    r_xx(k) = (1/2π) ∫_{-π}^{π} P_xx(ω) e^{jωk} dω.                    (8.1.18)

From (8.1.10), we have the mean-square value

    E[x²(n)] = r_xx(0) = (1/2π) ∫_{-π}^{π} P_xx(ω) dω.                 (8.1.19)

Thus r_xx(0) represents the average power in the random signal x(n). The PDS is a periodic function of the frequency ω, with period equal to 2π. We can show (in the exercise problems) that P_xx(ω) of a WSS signal is a real-valued function of ω. If x(n) is a real-valued signal, P_xx(ω) is an even function of ω. That is,

    P_xx(ω) = P_xx(-ω),                                                (8.1.20)

or

    P_xx(z) = P_xx(z^{-1}).                                            (8.1.21)
The DTFT P_xy(ω) of the crosscorrelation function of two WSS signals x(n) and y(n) is given by

    P_xy(ω) = Σ_{k=-∞}^{∞} r_xy(k) e^{-jωk},                           (8.1.22)

or

    P_xy(z) = Σ_{k=-∞}^{∞} r_xy(k) z^{-k}.                             (8.1.23)

This function is called the cross-power spectrum.
Example 8.3: The autocorrelation function of a WSS white random process can be defined as

    r_xx(k) = σ_x² δ(k) + m_x².                                        (8.1.24)

The corresponding PDS is given by

    P_xx(ω) = σ_x² + 2π m_x² δ(ω),  |ω| ≤ π.                           (8.1.25)

An important white random signal is called white noise, which has zero mean. Thus its autocorrelation function is expressed as

    r_xx(k) = σ_x² δ(k),                                               (8.1.26)

and the power spectrum is given by

    P_xx(ω) = σ_x²,  |ω| < π,                                          (8.1.27)

which is of constant value for all frequencies ω.
Consider a linear and time-invariant digital filter defined by the impulse response h(n), or the transfer function H(z). The input of the filter is a WSS random signal x(n) with the PDS P_xx(ω). As illustrated in Figure 8.1, the PDS of the filter output y(n) can be expressed as

    P_yy(ω) = |H(ω)|² P_xx(ω)                                          (8.1.28)

or

    P_yy(z) = |H(z)|² P_xx(z),                                         (8.1.29)

where H(ω) is the frequency response of the filter. Therefore the value of the output PDS at frequency ω depends on the squared magnitude response of the filter and the input PDS at the same frequency.

Figure 8.1  Linear filtering of random processes: the filter h(n), with frequency response H(ω), maps the input x(n), with PDS P_xx(ω), to the output y(n), with PDS P_yy(ω)
Other important relationships between x(n) and y(n) are

    m_y = E[ Σ_{l=-∞}^{∞} h(l) x(n - l) ] = Σ_{l=-∞}^{∞} h(l) E[x(n - l)] = m_x Σ_{l=-∞}^{∞} h(l),    (8.1.30)

and

    r_yx(k) = E[y(n + k)x(n)] = E[ Σ_{l=-∞}^{∞} h(l) x(n + k - l) x(n) ]
            = Σ_{l=-∞}^{∞} h(l) r_xx(k - l) = h(k) ∗ r_xx(k).                                         (8.1.31)

Taking the z-transform of both sides, we obtain

    P_yx(z) = H(z) P_xx(z).                                            (8.1.32)
Similarly, the relationships between the input and the output signals are

    r_xy(k) = Σ_{l=-∞}^{∞} h(l) r_xx(k + l) = h(k) ∗ r_xx(-k)          (8.1.33)
and

    P_xy(z) = H(z^{-1}) P_xx(z).                                       (8.1.34)
If the input signal x(n) is a zero-mean white noise with the autocorrelation function defined in (8.1.26), Equation (8.1.31) becomes

    r_yx(k) = Σ_{l=-∞}^{∞} h(l) σ_x² δ(k - l) = σ_x² h(k).             (8.1.35)

This equation shows that by computing the crosscorrelation function r_yx(k), the impulse response h(n) of a filter (or system) can be obtained. This fact can be used to estimate an unknown system, such as the room impulse response used in Chapter 4.
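A minimal MATLAB sketch of this idea follows; the "unknown" system h and the record length are assumptions made for illustration, not values from the text:

N = 10000;
h = [1, 0.5, -0.3, 0.2];            % hypothetical unknown impulse response
x = randn(1, N);                    % zero-mean white noise, sigma_x^2 = 1
y = filter(h, 1, x);                % output of the unknown system
K = 8;                              % number of lags to estimate
hest = zeros(1, K);
for k = 0:K-1
  hest(k+1) = sum(y(1+k:N) .* x(1:N-k)) / (N - k);  % r_yx(k), close to sigma_x^2 h(k)
end

Dividing the estimated r_yx(k) by the input power gives an estimate of h(k), as (8.1.35) indicates.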
Example 8.4: Let the system shown in Figure 8.1 be a second-order FIR filter. The input x(n) is the zero-mean white noise given in Example 8.3, and the I/O equation is expressed as

    y(n) = x(n) + 3x(n - 1) + 2x(n - 2).

Find the mean m_y and the autocorrelation function r_yy(k) of the output y(n).

(a) m_y = E[y(n)] = E[x(n)] + 3E[x(n - 1)] + 2E[x(n - 2)] = 0.

(b) r_yy(k) = E[y(n + k)y(n)]
            = 14 r_xx(k) + 9 r_xx(k - 1) + 9 r_xx(k + 1) + 2 r_xx(k - 2) + 2 r_xx(k + 2)
            = 14 σ_x²  if k = 0,
              9 σ_x²   if k = ±1,
              2 σ_x²   if k = ±2,
              0        otherwise.
8.2 Adaptive Filters
Many practical applications involve the reduction of noise and distortion for extraction of information from the received signal. The signal degradation in some physical systems is time varying, unknown, or possibly both. Adaptive filters provide a useful approach for these applications. Adaptive filters modify their characteristics to achieve certain objectives and usually accomplish the modification (adaptation) automatically. For example, consider a high-speed modem for transmitting and receiving data over telephone channels. It employs a filter called a channel equalizer to compensate for the channel distortion. Since the dial-up communication channels have different characteristics on each connection and are time varying, the channel equalizers must be adaptive.

Adaptive filters have received considerable attention from many researchers over the past 30 years. Many adaptive filter structures and adaptation algorithms have been developed for different applications. This chapter presents the most widely used adaptive filter based on the FIR filter with the LMS algorithm. Adaptive filters in this class are relatively simple to design and implement. They are well understood with regard to convergence speed, steady-state performance, and finite-precision effects.
8.2.1 Introduction to Adaptive Filtering
An adaptive filter consists of two distinct parts: a digital filter to perform the desired signal processing, and an adaptive algorithm to adjust the coefficients (or weights) of that filter. A general form of adaptive filter is illustrated in Figure 8.2, where d(n) is a desired signal (or primary input signal), y(n) is the output of a digital filter driven by a reference input signal x(n), and the error signal e(n) is the difference between d(n) and y(n). The function of the adaptive algorithm is to adjust the digital filter coefficients to minimize the mean-square value of e(n). Therefore the filter weights are updated so that the error is progressively minimized on a sample-by-sample basis.

Figure 8.2  Block diagram of adaptive filter: the digital filter driven by x(n) produces y(n), which is subtracted from d(n) to form e(n); the adaptive algorithm uses e(n) to update the filter

Figure 8.3  Block diagram of FIR filter for adaptive filtering: a tapped-delay line applying weights w_0(n), w_1(n), ..., w_{L-1}(n) to x(n), x(n-1), ..., x(n-L+1) to form y(n)
In general, there are two types of digital filters that can be used for adaptive filtering: FIR and IIR filters. The choice of an FIR or an IIR filter is determined by practical considerations. The FIR filter is always stable and can provide a linear phase response. On the other hand, the IIR filter involves both zeros and poles. Unless they are properly controlled, the poles in the filter may move outside the unit circle and make the filter unstable. Because the filter is required to be adaptive, the stability problems are much more difficult to handle. Thus the FIR adaptive filter is widely used for real-time applications. The discussions in the following sections will be restricted to the class of adaptive FIR filters.

The most widely used adaptive FIR filter is depicted in Figure 8.3. Given a set of L coefficients, w_l(n), l = 0, 1, ..., L - 1, and a data sequence, {x(n), x(n - 1), ..., x(n - L + 1)}, the filter output signal is computed as

    y(n) = Σ_{l=0}^{L-1} w_l(n) x(n - l),                              (8.2.1)

where the filter coefficients w_l(n) are time varying and updated by the adaptive algorithms that will be discussed next.

We define the input vector at time n as

    x(n) = [x(n) x(n - 1) ... x(n - L + 1)]^T                          (8.2.2)
and the weight vector at time n as

    w(n) = [w_0(n) w_1(n) ... w_{L-1}(n)]^T.                           (8.2.3)

Then the output signal y(n) in (8.2.1) can be expressed using the vector operation

    y(n) = w^T(n) x(n) = x^T(n) w(n).                                  (8.2.4)

The filter output y(n) is compared with the desired response d(n), which results in the error signal

    e(n) = d(n) - y(n) = d(n) - w^T(n) x(n).                           (8.2.5)

In the following sections, we assume that d(n) and x(n) are stationary, and our objective is to determine the weight vector so that the performance (or cost) function is minimized.
8.2.2 Performance Function
The general block diagram of the adaptive filter shown in Figure 8.2 updates the coefficients of the digital filter to optimize some predetermined performance criterion. The most commonly used performance measurement is based on the mean-square error (MSE), defined as

    ξ(n) = E[e²(n)].                                                   (8.2.6)

For an adaptive FIR filter, ξ(n) will depend on the L filter weights w_0(n), w_1(n), ..., w_{L-1}(n). The MSE function can be determined by substituting (8.2.5) into (8.2.6), expressed as

    ξ(n) = E[d²(n)] - 2p^T w(n) + w^T(n) R w(n),                       (8.2.7)

where p is the crosscorrelation vector defined as

    p = E[d(n) x(n)] = [r_dx(0) r_dx(1) ... r_dx(L - 1)]^T,            (8.2.8)

and

    r_dx(k) = E[d(n) x(n - k)]                                         (8.2.9)
is the crosscorrelation function between d(n) and x(n). In (8.2.7), R is the input autocorrelation matrix defined as

    R = E[x(n) x^T(n)]
      = [ r_xx(0)      r_xx(1)      ...  r_xx(L - 1) ]
        [ r_xx(1)      r_xx(0)      ...  r_xx(L - 2) ]
        [   ...          ...        ...     ...      ]
        [ r_xx(L - 1)  r_xx(L - 2)  ...  r_xx(0)     ],                (8.2.10)

where

    r_xx(k) = E[x(n) x(n - k)]                                         (8.2.11)

is the autocorrelation function of x(n).
Example 8.5: Given the filter illustrated in the following figure:

[Figure: a two-tap filter with coefficients 1 and w_1, so that y(n) = x(n) + w_1 x(n - 1) is subtracted from d(n) to form e(n)]

If E[x²(n)] = 1, E[x(n)x(n - 1)] = 0.5, E[d²(n)] = 4, E[d(n)x(n)] = -1, and E[d(n)x(n - 1)] = 1, find ξ.

From (8.2.10), R = [1 0.5; 0.5 1], and from (8.2.8), we have p = [-1; 1]. Therefore from (8.2.7), we obtain

    ξ = E[d²(n)] - 2p^T w + w^T R w
      = 4 - 2[-1 1][1; w_1] + [1 w_1][1 0.5; 0.5 1][1; w_1]
      = w_1² - w_1 + 7.
The optimum filter w_o minimizes the MSE cost function ξ(n). Vector differentiation of (8.2.7) gives w_o as the solution to

    R w_o = p.                                                         (8.2.12)

This system of equations defines the optimum filter coefficients in terms of two correlation functions: the autocorrelation function of the filter input and the crosscorrelation function between the filter input and the desired response. Equation (8.2.12) provides a solution to the adaptive filtering problem in principle. However, in many applications the signal may be non-stationary. The linear algebraic solution, w_o = R^{-1} p, requires continuous estimation of R and p, a considerable amount of computation. In addition, when the dimension of the autocorrelation matrix is large, the calculation of R^{-1} may present a significant computational burden. Therefore a more useful algorithm is obtained by developing a recursive method for computing w_o, which will be discussed in the next section.
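For a fixed data record, the normal equation (8.2.12) can of course be solved directly; the MATLAB fragment below is only a sketch of this, with an arbitrarily chosen two-tap example (the signals and coefficients are illustrative assumptions):

N = 5000;
x = randn(1, N);                       % reference input
d = 0.8*x + 0.4*[0, x(1:N-1)];         % hypothetical desired signal
X = [x; [0, x(1:N-1)]];                % rows hold x(n) and x(n-1)
R = (X * X') / N;                      % estimate of R = E[x(n) x^T(n)]
p = (X * d') / N;                      % estimate of p = E[d(n) x(n)]
wo = R \ p;                            % optimum weights, wo = R^{-1} p, about [0.8; 0.4]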
To obtain the minimum MSE, we substitute the optimum weight vector w_o = R^{-1} p for w(n) in (8.2.7), resulting in

    ξ_min = E[d²(n)] - p^T w_o.                                        (8.2.13)
Since R is positive semidefinite, the quadratic form on the right-hand side of (8.2.7) indicates that any departure of the weight vector w(n) from the optimum w_o would increase the error above its minimum value. In other words, the error surface is concave and possesses a unique minimum. This feature is very useful when we utilize search techniques in seeking the optimum weight vector. In such cases, our objective is to develop an algorithm that can automatically search the error surface to find the optimum weights that minimize ξ(n) using the input signal x(n) and the error signal e(n).
Example 8.6: Consider a second-order FIR filter with two coefficients w_0 and w_1, the desired signal d(n) = √2 sin(nω_0), n ≥ 0, and the reference signal x(n) = d(n - 1). Find w_o and ξ_min.

Similar to Example 8.2, we can obtain r_xx(0) = E[x²(n)] = E[d²(n)] = 1, r_xx(1) = cos(ω_0), r_xx(2) = cos(2ω_0), r_dx(0) = r_xx(1), and r_dx(1) = r_xx(2). From (8.2.12), we have

    w_o = R^{-1} p = [1 cos(ω_0); cos(ω_0) 1]^{-1} [cos(ω_0); cos(2ω_0)] = [2cos(ω_0); -1].

From (8.2.13), we obtain

    ξ_min = 1 - [cos(ω_0) cos(2ω_0)] [2cos(ω_0); -1] = 0.
Equation (8.2.7) is the general expression for the performance function of an adaptive FIR filter with given weights. That is, the MSE is a function of the filter coefficient vector w(n). It is important to note that the MSE is a quadratic function because the weights appear only to the first and second degrees in (8.2.7). For each coefficient vector w(n), there is a corresponding (scalar) value of MSE. Therefore the MSE values associated with w(n) form an (L + 1)-dimensional space, which is commonly called the MSE surface, or the performance surface.

For L = 2, this corresponds to an error surface in a three-dimensional space. The height of ξ(n) corresponds to the power of the error signal e(n) that results from filtering the signal x(n) with the coefficients w(n). If the filter coefficients change, the power in the error signal will also change. This is indicated by the changing height of the surface above the w_0-w_1 plane as the component values of w(n) are varied. Since the error surface is quadratic, a unique filter setting w(n) = w_o will produce the minimum MSE, ξ_min. In this two-weight case, the error surface is an elliptic paraboloid. If we cut the paraboloid with planes parallel to the w_0-w_1 plane, we obtain concentric ellipses of constant mean-square error. These ellipses are called the error contours of the error surface.
Example 8.7: Consider a second-order FIR filter with two coefficients w_0 and w_1. The reference signal x(n) is a zero-mean white noise with unit variance. The desired signal is given as

    d(n) = b_0 x(n) + b_1 x(n - 1).
Plot the error surface and error contours.
From Equation (8.2.10), we obtain R = [r_xx(0) r_xx(1); r_xx(1) r_xx(0)] = [1 0; 0 1]. From (8.2.8), we have p = [r_dx(0); r_dx(1)] = [b_0; b_1]. From (8.2.7), we get

    ξ = E[d²(n)] - 2p^T w + w^T R w = b_0² + b_1² - 2b_0 w_0 - 2b_1 w_1 + w_0² + w_1².

Letting b_0 = 0.3 and b_1 = 0.5, we have

    ξ = 0.34 - 0.6 w_0 - w_1 + w_0² + w_1².

The MATLAB script (exam8_7a.m in the software package) is used to plot the error surface shown in Figure 8.4(a), and the script exam8_7b.m is used to plot the error contours shown in Figure 8.4(b).
Figure 8.4  Performance surface and error contours, L = 2: (a) the MSE surface ξ plotted over the (w_0, w_1) plane; (b) the corresponding error contours
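The actual scripts are supplied with the book's software package; a minimal sketch of what they might contain is given below (the grid range is an arbitrary choice):

b0 = 0.3; b1 = 0.5;
[w0, w1] = meshgrid(-2:0.1:2, -2:0.1:2);
xi = b0^2 + b1^2 - 2*b0*w0 - 2*b1*w1 + w0.^2 + w1.^2;   % MSE from Eq. (8.2.7)
figure; surf(w0, w1, xi);                % error surface, as in Figure 8.4(a)
xlabel('w0'); ylabel('w1'); zlabel('MSE');
figure; contour(w0, w1, xi, 20);         % error contours, as in Figure 8.4(b)
xlabel('w0'); ylabel('w1');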
One of the most important properties of the MSE surface is that it has only one global minimum point. At that minimum point, the tangents to the surface must be 0. Minimizing the MSE is the objective of many current adaptive methods such as the LMS algorithm.
8.2.3 Method of Steepest Descent
As shown in Figure 8.4, the MSE of (8.2.7) is a quadratic function of the weights that can be pictured as a positive-concave hyperparabolic surface. Adjusting the weights to minimize the error involves descending along this surface until reaching the 'bottom of the bowl.' Various gradient-based algorithms are available. These algorithms are based on making local estimates of the gradient and moving downward toward the bottom of the bowl. The selection of an algorithm is usually decided by the speed of convergence, steady-state performance, and the computational complexity.

The steepest-descent method reaches the minimum by following the direction in which the performance surface has the greatest rate of decrease. Specifically, it is an algorithm whose path follows the negative gradient of the performance surface. The steepest-descent method is an iterative (recursive) technique that starts from some initial (arbitrary) weight vector and improves it as the number of iterations increases. Geometrically, it is easy to see that with successive corrections of the weight vector in the direction of the steepest descent on the concave performance surface, we should arrive at its minimum, ξ_min, at which point the weight vector components take on their optimum values. Let ξ(0) represent the value of the MSE at time n = 0 with an arbitrary choice of the weight vector w(0). The steepest-descent technique enables us to descend to the bottom of the bowl, w_o, in a systematic way. The idea is to move on the error surface in the direction of the tangent at that point. The weights of the filter are updated at each iteration in the direction of the negative gradient of the error surface.

The mathematical development of the method of steepest descent is easily seen from the viewpoint of a geometric approach using the MSE surface. Each selection of a filter weight vector w(n) corresponds to only one point on the MSE surface, [w(n), ξ(n)]. Suppose that an initial filter setting w(0) on the MSE surface, [w(0), ξ(0)], is arbitrarily chosen. A specific orientation to the surface is then described using the directional derivatives of the surface at that point. These directional derivatives quantify the rate of change of the MSE surface with respect to the w(n) coordinate axes. The gradient of the error surface, ∇ξ(n), is defined as the vector of these directional derivatives.
The concept of steepest descent can be implemented as the following algorithm:

    w(n + 1) = w(n) - (μ/2) ∇ξ(n),                                     (8.2.14)

where μ is a convergence factor (or step size) that controls stability and the rate of descent to the bottom of the bowl. The larger the value of μ, the faster the speed of descent. The vector ∇ξ(n) denotes the gradient of the error function with respect to w(n), and the negative sign increments the adaptive weight vector in the negative gradient direction. The successive corrections to the weight vector in the direction of
the steepest descent of the performance surface should eventually lead to the minimum mean-square error ξ_min, at which point the weight vector reaches its optimum value w_o. When w(n) has converged to w_o, that is, when it reaches the minimum point of the performance surface, the gradient ∇ξ(n) = 0. At this time, the adaptation in (8.2.14) is stopped and the weight vector stays at its optimum solution. The convergence can be viewed as a ball placed on the 'bowl-shaped' MSE surface at the point [w(0), ξ(0)]. If the ball was released, it would roll toward the minimum of the surface, and would initially roll in a direction opposite to the direction of the gradient, which can be interpreted as rolling towards the bottom of the bowl.
8.2.4 The LMS Algorithm
From (8.2.14), we see that the increment from w(n) to w(n + 1) is in the negative gradient direction, so the weight tracking will closely follow the steepest-descent path on the performance surface. However, in many practical applications the statistics of d(n) and x(n) are unknown. Therefore the method of steepest descent cannot be used directly, since it assumes exact knowledge of the gradient vector at each iteration. Widrow [13] used the instantaneous squared error, e²(n), to estimate the MSE. That is,

    ξ̂(n) = e²(n).                                                      (8.2.15)

Therefore the gradient estimate used by the LMS algorithm is

    ∇ξ̂(n) = 2[∇e(n)] e(n).                                             (8.2.16)

Since e(n) = d(n) - w^T(n)x(n), we have ∇e(n) = -x(n), and the gradient estimate becomes

    ∇ξ̂(n) = -2 x(n) e(n).                                              (8.2.17)

Substituting this gradient estimate into the steepest-descent algorithm of (8.2.14), we have

    w(n + 1) = w(n) + μ x(n) e(n).                                     (8.2.18)

This is the well-known LMS algorithm, or stochastic gradient algorithm. This algorithm is simple and does not require squaring, averaging, or differentiating. The LMS algorithm provides an alternative method for determining the optimum filter coefficients without explicitly computing the matrix inversion suggested in (8.2.12).
Figure 8.5  Block diagram of an adaptive filter with the LMS algorithm

Widrow's LMS algorithm is illustrated in Figure 8.5 and is summarized as follows (a MATLAB sketch of these steps is given after the list):

1. Determine L, μ, and w(0), where L is the order of the filter, μ is the step size, and w(0) is the initial weight vector at time n = 0.

2. Compute the adaptive filter output

    y(n) = Σ_{l=0}^{L-1} w_l(n) x(n - l).                              (8.2.19)

3. Compute the error signal

    e(n) = d(n) - y(n).                                                (8.2.20)

4. Update the adaptive weight vector from w(n) to w(n + 1) by using the LMS algorithm

    w_l(n + 1) = w_l(n) + μ x(n - l) e(n),  l = 0, 1, ..., L - 1.       (8.2.21)
8.3 Performance Analysis
A detailed discussion of the performance of the LMS algorithm is available in many textbooks. In this section, we present some important properties of the LMS algorithm such as stability, convergence rate, and the excess mean-square error due to gradient estimation error.
8.3.1 Stability Constraint
As shown in Figure 8.5, the LMS algorithm involves the presence of feedback. Thus the algorithm is subject to the possibility of becoming unstable. From (8.2.18), we observe that the parameter μ controls the size of the incremental correction applied to the weight vector as we adapt from one iteration to the next. For mean weight convergence of the LMS algorithm from the initial condition w(0) to the optimum filter w_o, the step size must satisfy

    0 < μ < 2/λ_max,                                                   (8.3.1)

where λ_max is the largest eigenvalue of the autocorrelation matrix R defined in (8.2.10). Applying the stability constraint on μ given in (8.3.1) is difficult because of the computation of λ_max when L is large.

In practical applications, it is desirable to estimate λ_max using a simple method. From (8.2.10), we have
trRLr
xx
0


LÀ1
l0
l
l
, 8:3:2
where tr[R] denotes the trace of matrix R. It follows that
l
max


LÀ1
l0
l
l
 Lr
xx
0LP
x
, 8:3:3
where
P
x
 r
xx
0E
Â
x
2
n

Ã
8:3:4
denotes the power of x(n). Therefore setting
0 < m <
2
LP
x
8:3:5
assures that (8.3.1) is satisfied.
Equation (8.3.5) provides some important guidelines on how to select μ, summarized as follows:

1. Since the upper bound on μ is inversely proportional to L, a small μ is used for large-order filters.

2. Since μ is made inversely proportional to the input signal power, weaker signals use a larger μ and stronger signals use a smaller μ (see the sketch after this list). One useful approach is to normalize μ with respect to the input signal power P_x. The resulting algorithm is called the normalized LMS algorithm, which will be discussed in Section 8.4.
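A simple sketch of this guideline is shown below; the filter length, the data record used to estimate the power, and the safety factor of 0.1 are illustrative choices:

L  = 8;                          % assumed adaptive filter length
x  = randn(1, 1000);             % available reference input samples
Px = mean(x.^2);                 % estimate of P_x = E[x^2(n)], Eq. (8.3.4)
mu_max = 2 / (L * Px);           % upper bound of Eq. (8.3.5)
mu = 0.1 * mu_max;               % conservative step size well inside the bound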
8.3.2 Convergence Speed
In the previous section, we saw that w(n) converges to w_o if the selection of μ satisfies (8.3.1). Convergence of the weight vector w(n) from w(0) to w_o corresponds to the convergence of the MSE from ξ(0) to ξ_min. Therefore convergence of the MSE toward its minimum value is a commonly used performance measurement in adaptive systems because of its simplicity. During adaptation, the squared error e²(n) is non-stationary as the weight vector w(n) adapts toward w_o. The corresponding MSE can thus be defined only based on ensemble averages. A plot of the MSE versus time n is referred to as the learning curve for a given adaptive algorithm. Since the MSE is the performance criterion of LMS algorithms, the learning curve is a natural way to describe the transient behavior.

Each adaptive mode has its own time constant, which is determined by the overall adaptation constant μ and the eigenvalue λ_l associated with that mode. Overall convergence is clearly limited by the slowest mode. Thus the overall MSE time constant can be approximated as
    τ_mse ≈ 1/(μ λ_min),                                               (8.3.6)

where λ_min is the minimum eigenvalue of the R matrix. Because τ_mse is inversely proportional to μ, we have a large τ_mse when μ is small (i.e., the speed of convergence is slow). If we use a large value of μ, the time constant is small, which implies faster convergence. The maximum time constant τ_mse = 1/(μ λ_min) is a conservative estimate of filter performance, since only large eigenvalues will exert significant influence on the convergence time. Since some of the projections may be negligibly small, the adaptive filter error convergence may be controlled by fewer modes than the number of adaptive filter weights. Consequently, the MSE often converges more rapidly than the upper bound of (8.3.6) would suggest.

Because the upper bound of τ_mse is inversely proportional to λ_min, a small λ_min can result in a large time constant (i.e., a slow convergence rate). Unfortunately, if λ_max is also very large, the selection of μ will be limited by (8.3.1) such that only a small μ can satisfy the stability constraint. Therefore if λ_max is very large and λ_min is very small, from (8.3.6), the time constant can be very large, resulting in very slow convergence. As previously noted, the fastest convergence of the dominant mode occurs for μ = 1/λ_max. Substituting this step size into (8.3.6) results in

    τ_mse ≈ λ_max/λ_min.                                               (8.3.7)

For stationary input and sufficiently small μ, the speed of convergence of the algorithm depends on the eigenvalue spread (the ratio of the maximum to minimum eigenvalues) of the matrix R.

As mentioned in the previous section, the eigenvalues λ_max and λ_min are very difficult to compute. However, there is an efficient way to estimate the eigenvalue spread from the spectral dynamic range. That is,

    λ_max/λ_min ≤ max|X(ω)|² / min|X(ω)|²,                              (8.3.8)

where X(ω) is the DTFT of x(n) and the maximum and minimum are calculated over the frequency range 0 ≤ ω ≤ π. From (8.3.7) and (8.3.8), input signals with a flat (white) spectrum have the fastest convergence speed.
8.3.3 Excess Mean-Square Error
The steepest-descent algorithm in (8.2.14) requires knowledge of the gradient ∇ξ(n), which must be estimated at each iteration. The estimated gradient ∇ξ̂(n) produces gradient estimation noise. After the algorithm converges, i.e., when w(n) is close to w_o, the true gradient ∇ξ(n) ≈ 0. However, the gradient estimate ∇ξ̂(n) ≠ 0. As indicated by the update Equation (8.2.14), perturbing the gradient will cause the weight vector w(n + 1) to move away from the optimum solution w_o. Thus the gradient estimation