Quantized Filter Analysis


CHAPTER 7
Quantized Filter Analysis
7.1 INTRODUCTION
The analysis and design of discrete-time systems, digital ﬁlters, and their realiza-
tions, computation of DFT-IDFT, and so on discussed in the previous chapters
of this book were carried out by using mostly the functions in the Signal Pro-
cessing Toolbox working in the MATLAB environment, and the computations
were carried out with double precision. This means that all the data representing
the values of the input signal, coefﬁcients of the ﬁlters, or the values of the unit
impulse response, and so forth were represented with 64 bits; therefore, these
numbers have a range approximately between $10^{-308}$ and $10^{308}$ and a
precision of $\sim 2^{-52} = 2.22 \times 10^{-16}$. Obviously this range is so
large and the precision with which the numbers are expressed is so fine that the
numbers can be assumed to have almost “infinite precision.” Once these digital
filters and DFT-IDFT have
been obtained by the procedures described so far, they can be further analyzed
by mainframe computers, workstations, and PCs under “inﬁnite precision.” But
when the algorithms describing the digital ﬁlters and FFT computations have
to be implemented as hardware in the form of special-purpose microprocessors
or application-speciﬁc integrated circuits (ASICs) or the digital signal processor
(DSP) chip, many practical considerations and constraints come into play. The
registers used in these hardware systems to store the numbers have finite length,
and the memory capacity required for processing the data is determined by the
number of bits, also called the wordlength, chosen for storing the data. More
memory means more power consumption and hence the need to minimize the
wordlength. In microprocessors and DSP chips and even in workstations and PCs,
we would like to use registers with as few bits as possible and yet obtain high
computational speed, low power, and low cost. But portable devices such as
cell phones and personal digital assistants (PDAs) have a limited amount of
memory and are powered by batteries with low voltage and short duration of
power supply. These constraints become more severe in other devices such as
digital hearing aids and biomedical probes embedded in capsules to be swallowed.
So there is a
Introduction to Digital Signal Processing and Filter Design, by B. A. Shenoi
Copyright © 2006 John Wiley & Sons, Inc.
great demand for designing digital filters, and the systems in which they are
embedded, with the lowest possible number of bits to represent the data or to
store the data in their registers. When the filters are built with registers of
finite length and the analog-to-digital converters (ADCs) are designed to
operate at increasingly high sampling rates, thereby reducing the number of bits
with which the samples of the input signal are represented, the frequency
response of the filters and the results of DFT-IDFT computations via the FFT are
expected to differ from those designed with “infinite precision.” This process
of representing the data with a finite number of bits is known as quantization,
which occurs at several points in the structure chosen to realize the filter or
the steps in the FFT computation of the DFT-IDFT. As pointed out in the previous
chapter, a vast number of structures are available to realize a given transfer
function when we assume infinite precision. But when we design the hardware with
registers of finite length to implement the corresponding difference equation,
the effect of finite wordlength is highly dependent on the structure. Therefore
we find it necessary to analyze this effect for a large number of structures.
This analysis is further compounded by the fact that quantization can be
carried out in several ways and the arithmetic operations of addition and
multiplication of numbers with finite precision yield results that are
influenced by the way that these numbers are quantized.
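The double-precision figures quoted at the start of this section can be checked
on any machine. The following is a minimal sketch in Python (used here only for
illustration; it is not part of the MATLAB tools discussed in this chapter) that
reads the floating-point parameters from the standard library's sys.float_info.

```python
import sys

# Parameters of the platform's double-precision (64-bit) floating-point type.
info = sys.float_info

# Spacing between 1.0 and the next larger representable number.
eps = info.epsilon

print(eps == 2.0 ** -52)                    # relative precision is 2^-52
print(f"precision ~ {eps:.3e}")             # about 2.22e-16
print(f"largest   ~ {info.max:.3e}")        # on the order of 1e308
print(f"smallest normalized ~ {info.min:.3e}")  # on the order of 1e-308
print(f"significand bits: {info.mant_dig}") # 52 stored bits plus 1 implied bit
```

The same numbers are what make it reasonable to treat double-precision results
as the "infinite precision" reference against which quantized results are
compared.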
In this chapter, we discuss a new MATLAB toolbox called FDA Tool, available¹
for analyzing and designing filters with a finite number of bits for the
wordlength. The different forms of representing binary numbers and the results
of adding and multiplying such numbers will be explained in a later section of
this chapter. Besides the structure and the form of quantization, a third factor
that influences the deviation of filter performance from the ideal case is the
choice of FIR or IIR filter. The type of approximation chosen for obtaining the
desired frequency response is another factor that also influences the effect of
finite wordlength. We discuss the effects of all these factors in this chapter,
illustrating their influence by means of a design example.
7.2 FILTER DESIGN–ANALYSIS TOOL
An enormous amount of research has been carried out to address these problems,
but the effects of quantization on the performance of digital filters and
systems are not well illustrated by specific examples. Although there is no
analytical method available at present to design or analyze a filter with finite
precision, some useful insight can be obtained from the research work, which
serves as a guideline in making preliminary decisions on the choice of suitable
structures and quantization forms. Any student interested in this research work
should read the material on finite wordlength effects found in other textbooks
[1,2,4]. In this chapter, we discuss the software for filter design and analysis
that has been developed by The MathWorks to address the abovementioned
problem.² This filter design–analysis (FDA) tool, found in the Filter Design
Toolbox, works in conjunction with the Signal Processing (SP) Toolbox. Unlike
the SP Toolbox, the FDA Tool has been developed by making extensive use of the
object-oriented programming capability of MATLAB, and the syntax for the
functions available in the FDA Tool is different from the syntax for the
functions we find in MATLAB and the SP Toolbox. When we log on to MATLAB and
type fdatool, we get two screens on display. On one screen, we type the fdatool
functions as command lines to design and analyze quantized filters, whereas the
other screen is a graphical user interface (GUI) that serves the same purpose.
The GUI window shown in Figure 7.1a displays a dialog box with an immense array
of design options as explained below.

¹ MATLAB and its Signal Processing Toolbox are found in computer systems of many
schools and universities, but the FDA Tool may not be available in all of them.
First we design a filter with double precision on the GUI window using the
FDA Tool or on the command window using the Signal Processing Toolbox and
then import it into the GUI window. In the dialog box for the FDA Tool, we can
choose the following options under the Filter Type panel:
1. Lowpass
2. Highpass
3. Bandpass
4. Bandstop
5. Differentiator. By clicking the arrow on the tab for this feature, we get
the following additional options.
6. Hilbert transformer
7. Multiband
8. Arbitrary magnitude
9. Raised cosine
10. Arbitrary group delay
11. Half-band lowpass
12. Half-band highpass
13. Nyquist
Below the Filter Type panel is the panel for the design method. When the
button for IIR filter is clicked, the dropdown list gives us the following
options specifying the type of frequency response:
• Butterworth
• Chebyshev I
• Chebyshev II
• Elliptic
• Least-pth norm
• Constrained least-pth norm
The following options are available for the FIR filter:
• Equiripple
• Least squares
• Window
• Maximally flat
• Least-pth norm
• Constrained equiripple

Figure 7.1 Screen capture of fdatool window: (a) window for filter design;
(b) window for quantization analysis.

² The author acknowledges that the material on the FDA Tool described in this
chapter is based on the Help Manual for Filter Design Toolbox found in MATLAB
version 6.5.
To the right of the panel for design method is the one for filter order. We can
either specify the order of the filter or let the program compute the minimum
order (by use of SP Toolbox functions cheb1ord, buttord, etc.). Remember to
choose an odd order for the lowpass filter when it is to be designed as a
parallel connection of two allpass filters, if an even number is given as the
minimum order. Below this panel is the panel for other options, which are
available depending on the abovementioned inputs. For example, if we choose a
FIR filter with the window option, this panel displays an option for the windows
that we can choose. By clicking the button for the windows, we get a dropdown
list of more than 10 windows. To the right of this panel are two panels that we
use to specify the frequency specifications, that is, to specify the sampling
frequency, cutoff frequencies for the passband and stopband, the magnitude in
the passband(s) and stopband(s), and so on depending on the type of filter and
the design method chosen. The frequencies can be expressed in hertz, kilohertz,
megahertz, gigahertz, or normalized frequency. The magnitude can be expressed in
decibels, as magnitude squared, or as actual magnitude, as displayed when we
click Analysis in the main menu bar and then click the option Frequency
Specifications in the dropdown list. The frequency specifications are displayed
in the Analysis panel, which is above the panel for frequency specifications,
when we start with the filter design.
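To make the remark about minimum order concrete, here is a minimal Python
sketch, assuming the classical Butterworth order formula applied to band edges
prewarped by the bilinear transformation. It is not claimed to reproduce the
toolbox's own algorithm; the function names butterworth_min_order and make_odd,
and the example specification, are hypothetical and chosen only for
illustration.

```python
import math

def butterworth_min_order(wp, ws, ap_db, as_db):
    """Estimate the minimum order of a lowpass digital Butterworth filter.

    wp, ws : passband and stopband edges, normalized so that 1.0 is the
             Nyquist frequency (wp < ws).
    ap_db  : maximum passband attenuation in dB.
    as_db  : minimum stopband attenuation in dB.
    """
    # Prewarp the digital band edges to the analog prototype frequencies.
    omega_p = math.tan(math.pi * wp / 2.0)
    omega_s = math.tan(math.pi * ws / 2.0)
    # Classical Butterworth order formula.
    ratio = (10.0 ** (as_db / 10.0) - 1.0) / (10.0 ** (ap_db / 10.0) - 1.0)
    n = math.log10(ratio) / (2.0 * math.log10(omega_s / omega_p))
    return math.ceil(n)

def make_odd(order):
    """Raise an even order by one, as needed for two allpass filters in parallel."""
    return order if order % 2 == 1 else order + 1

# Hypothetical specification: passband edge 0.2, stopband edge 0.3,
# 1 dB passband attenuation, 40 dB stopband attenuation.
order = butterworth_min_order(0.2, 0.3, 1.0, 40.0)
print(order, make_odd(order))
```

If the computed minimum order is even, make_odd raises it by one, which is the
adjustment the text asks for when the lowpass filter is to be realized as two
allpass filters in parallel.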
The options available under any of these categories are dependent on the
other options chosen. The FDA Tool functions that are also functions of the
SP Tool are called overloaded functions. After all the design options are
chosen, we click the Design Filter button at the bottom of the dialog box. The
program designs the filter and displays the magnitude response of the filter in
the Analysis area. But this is only the default choice; by clicking the
appropriate icons shown above this area, we can make the Analysis area display
one of the following features:
• Magnitude response
• Phase response
• Magnitude and phase response
• Group delay response
• Impulse response
• Step response
• Pole–zero plot
• Filter coefficients
This information can also be displayed by clicking the Analysis button in the
main menu bar and choosing the information we wish to display in the Analysis
area. We can also choose some additional information, for example, by clicking
Analysis Parameters. At the bottom of this dropdown list is the option Full
View Analysis. When this is chosen, whatever is displayed in the Analysis area
is shown in a new panel of larger dimensions with features that are available
in a figure displayed under the SP Tool. For example, by clicking the Edit
button and then selecting either Figure Properties, Axis Properties, or Current
Object Properties, the Property Editor becomes active and properties of these
three objects can be modified.
Finally, we look at the first panel, titled Current Filter Information.
This lists the structure, order, and number of sections of the filter that we
have designed. Below this information, it indicates whether the filter is stable
and points out whether the source is the designed filter (i.e., reference filter
designed with double precision) or the quantized filter with a finite
wordlength. The default structure for the IIR reference filter is a cascade
connection of second-order sections, and for the FIR filter, it is the direct
form. When we have completed the design of the reference filter with double
precision, we verify whether it meets the desired specification, and if we wish,
we can convert the structure of the reference filter to any one of the other
types listed below. We click the Edit button on the main menu and then the
Convert Structure button. A dropdown list shows the structures to which we can
convert from the default structure or from the one to which we have already
converted.
For IIR ﬁlters, the structures are
1. Direct form I
2. Direct form II
3. Direct form I transposed
4. Direct form II transposed
5. Lattice ARMA
6. Lattice-coupled allpass
7. Lattice-coupled allpass—power complementary
8. State space
Items 6 and 7 in this list refer to structures of the two allpass networks in
parallel as described in Chapter 6, with transfer functions
$G(z) = \tfrac{1}{2}[A_1(z) + A_2(z)]$ and $H(z) = \tfrac{1}{2}[A_1(z) - A_2(z)]$,
respectively. The allpass filters $A_1(z)$ and $A_2(z)$ are realized in the form
of lattice allpass structures like the one shown in Figure 6.19b. The MA and AR
structures, which are also discussed in Chapter 6, are considered special cases
of the lattice ARMA structure.
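The term "power complementary" in item 7 can be illustrated numerically. The
sketch below, in Python, assumes two simple allpass sections of our own choosing
(a first-order section and a first-order section followed by a unit delay);
these are hypothetical examples, not the lattice structures of Chapter 6. Since
each allpass filter has unit magnitude on the unit circle, the sum
$|G|^2 + |H|^2$ should equal 1 at every frequency.

```python
import cmath
import math

def allpass1(a, z):
    """First-order allpass A(z) = (a + z^-1) / (1 + a z^-1), with real |a| < 1."""
    z_inv = 1.0 / z
    return (a + z_inv) / (1.0 + a * z_inv)

def parallel_allpass(w, a1=0.5, a2=-0.3):
    """Return G and H at frequency w (radians/sample) for two allpass branches."""
    z = cmath.exp(1j * w)
    A1 = allpass1(a1, z)
    A2 = allpass1(a2, z) / z      # allpass section followed by a unit delay
    G = 0.5 * (A1 + A2)
    H = 0.5 * (A1 - A2)
    return G, H

# Check |G|^2 + |H|^2 at a few frequencies between 0 and pi.
for k in range(5):
    w = k * math.pi / 4.0
    G, H = parallel_allpass(w)
    print(round(abs(G) ** 2 + abs(H) ** 2, 12))
```

The coefficients a1 and a2 are arbitrary; any real values of magnitude less
than 1 give the same result, because the identity depends only on the allpass
property of the two branches.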
For FIR filters, the options for the structures are
• Direct-form FIR
• Direct-form FIR transposed
• Direct-form symmetric FIR
When we have converted to a new structure, the information that can be
displayed in the Analysis area, like the coefficients of the filter, changes. We
would also like to point out that any one of the lowpass, highpass, bandpass,
and bandstop filters that we have designed can be converted to any other type by
clicking the first icon on the left-hand bar in the dialog box and adding the
frequency specifications for the new filter.
7.3 QUANTIZED FILTER ANALYSIS
When we have finished the analysis of the reference filter, we can move on to
construct the quantized filter as an object by clicking the last icon on the bar
above the Analysis area and the second icon on the left-hand bar, which sets
the quantization parameters. The panel below the Analysis area now changes
as shown in Figure 7.1b. We can construct three objects inside the FDA Tool:
qfilt, qfft, and quantizer. Each of them has several properties, and these
properties have values, which may be strings or numerical values. Currently
we use the objects qfilt and quantizer to analyze the performance of the
reference filter when it is quantized. When we click the Turn Quantization On
button and the Set Quantization Parameters icon, we can choose the
quantization parameters for the coefficients of the filter. Quantization of the
filter coefficients alone is sufficient for finding the finite wordlength effect
on the magnitude response, phase response, and group delay response of the
quantized filter, which can then be compared with the response of the reference
filter displayed in the Analysis area. Quantization of the other data listed
below is necessary when we have to filter an input signal:
• The input signal
• The output signal
• The multiplicand: the value of the signal that is multiplied by the multiplier
• The product of the multiplicand and the multiplier constant
• The sum (the result of adding the products)
The object quantizer is used to convert each of these data, and this object has
four properties: Mode, Round Mode, Overflow Mode, and Format. In order to
understand the values of these properties, it is necessary to review and
understand the binary representation of numbers and the different results of
adding them and multiplying them. These will be discussed next.
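To give a feel for what such a quantizer does before the number formats are
reviewed, the following is a minimal Python sketch of a signed fixed-point
quantizer. It is an illustration under our own assumptions, not the toolbox's
quantizer object: the function name quantize, its parameters, and the mode
names "round", "floor", "fix", "saturate", and "wrap" are chosen here for the
example, and the coefficient values are hypothetical.

```python
import math

def quantize(x, wordlength=8, fraclength=7,
             round_mode="round", overflow_mode="saturate"):
    """Quantize x to a signed fixed-point value with the given format.

    wordlength : total number of bits, including the sign bit.
    fraclength : number of bits to the right of the binary point.
    """
    scale = 2 ** fraclength
    scaled = x * scale
    if round_mode == "round":        # nearest integer, ties toward +infinity
        q = math.floor(scaled + 0.5)
    elif round_mode == "floor":      # toward -infinity
        q = math.floor(scaled)
    elif round_mode == "fix":        # toward zero (truncation)
        q = math.trunc(scaled)
    else:
        raise ValueError("unknown round_mode")

    q_min = -(2 ** (wordlength - 1))
    q_max = 2 ** (wordlength - 1) - 1
    if overflow_mode == "saturate":  # clip to the largest or smallest value
        q = max(q_min, min(q_max, q))
    elif overflow_mode == "wrap":    # discard the bits that do not fit
        q = (q - q_min) % (2 ** wordlength) + q_min
    else:
        raise ValueError("unknown overflow_mode")
    return q / scale

# Hypothetical filter coefficients quantized to 8 bits (7 fractional bits).
coefficients = [0.0675, 0.1349, 0.0675]
quantized = [quantize(c) for c in coefficients]
errors = [abs(c - q) for c, q in zip(coefficients, quantized)]
print(quantized)
print(max(errors))   # bounded by half of one LSB, 2**-8, in "round" mode
```

The rounding choice decides how the error is distributed, while the overflow
choice decides what happens to values outside the representable range; with
wrapping, a value of 1.5 in this format comes out as -0.5.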
7.4 BINARY NUMBERS AND ARITHMETIC
Numbers representing the values of the signal, the coefficients of both the
filter and the difference equation or the recursive algorithm, and other
properties corresponding to the structure for the filter are represented in
binary form. They are based on the radix of 2 and therefore consist of only two
binary digits, 0 and 1, which are more commonly known as bits, just as the
decimal numbers based on a radix of 10 have 10 decimal digits from 0 to 9.
Placement of the bits in a string determines the binary number, as illustrated
by the example $x_2 = 1001.1010$, which is equivalent to
$x_{10} = 1 \times 2^{0} + 1 \times 2^{3} + 2^{-1} + 2^{-3} = 9.625$. In this
discussion of binary number representation, we have used the binary point (.)
to separate the integer part and the fractional part and the subscripts 2 and
10 to denote the binary number and the decimal number. Another example given by

$$x_2 = b_2 b_1 b_0 \,.\, b_{-1} b_{-2} b_{-3} b_{-4} \tag{7.1}$$

has a decimal value computed as

$$x_{10} = b_2 2^{2} + b_1 2^{1} + b_0 2^{0} + b_{-1} 2^{-1} + b_{-2} 2^{-2} + b_{-3} 2^{-3} + b_{-4} 2^{-4} \tag{7.2}$$

where the bits $b_2, b_1, b_0, b_{-1}, b_{-2}, b_{-3}, b_{-4}$ are either 1 or
0. In general, when $x_2$ is represented as

$$x_2 = b_{I-1} b_{I-2} \cdots b_1 b_0 \,.\, b_{-1} b_{-2} \cdots b_{-F} \tag{7.3}$$

the decimal number has a value given by

$$x_{10} = \sum_{i=-F}^{I-1} b_i 2^{i} \tag{7.4}$$
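Equation (7.4) translates directly into a few lines of code. The following is a
minimal Python sketch, assuming the binary number is given as a string with an
ordinary "." as the binary point; the function name fixed_point_to_decimal is
hypothetical.

```python
def fixed_point_to_decimal(bits):
    """Evaluate an unsigned fixed-point binary string such as '1001.1010'.

    Implements x10 = sum of b_i * 2**i for i from -F to I-1, where I and F
    are the numbers of integer and fractional bits.
    """
    integer_part, _, fractional_part = bits.partition(".")
    num_integer_bits = len(integer_part)
    value = 0.0
    # Integer bits carry weights 2**(I-1) down to 2**0.
    for k, b in enumerate(integer_part):
        value += int(b) * 2 ** (num_integer_bits - 1 - k)
    # Fractional bits carry weights 2**-1 down to 2**-F.
    for k, b in enumerate(fractional_part, start=1):
        value += int(b) * 2 ** (-k)
    return value

print(fixed_point_to_decimal("1001.1010"))  # the example used in the text
```

Applied to the example string used in the text, the function reproduces the
decimal value 9.625.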
In the binary representation (7.3), the integer part contains $I$ bits and the
bit $b_{I-1}$ at the leftmost position is called the most significant bit (MSB);
the fractional part contains $F$ bits, and the bit $b_{-F}$ at the rightmost
position is called the least significant bit (LSB). This can only represent the
magnitude of positive numbers and is known as the unsigned fixed-point binary
number. In order to represent positive as well as negative numbers, one more bit
called the sign bit is added to the left of the MSB. The sign bit, represented
by the symbol $s$ in (7.5), assigns a negative sign when this bit is 1 and a
positive sign when it is 0. So it becomes a signed magnitude fixed-point binary
number. Therefore a signed magnitude number $x_2 = 11001.1010$ is
$x_{10} = -9.625$. In general, the signed magnitude fixed-point number is given
by

$$x_{10} = (-1)^{s} \sum_{i=-F}^{I-1} b_i 2^{i} \tag{7.5}$$

and the total number of bits is called the wordlength $w = 1 + I + F$. When
two signed magnitude numbers with widely different values for the integer part
and/or the fractional part have to be added, it is not easy to program the
adders in the digital hardware to implement this operation. So it is common
practice to choose $I = 0$, keeping the sign bit and the bits for the fractional
part only so that $F = w - 1$ in the signed magnitude fixed-point
representation. But when two numbers larger than 0.5 in decimal value are added,
their sum is larger than 1, and this cannot be represented by the format shown
above, where $I = 0$. So two other forms of representing the numbers are more
commonly used: the one's-complement and two's-complement forms (also termed
one-complementary and two-complementary forms) for representing the signed
magnitude fixed-point numbers. In the one's-complement form of a negative
number, the bits of the fractional part are replaced by their complement, that
is, the ones are replaced by zeros and vice versa. By adding a one as the least
significant bit to the one's-complement form, we get the two's-complement form
of binary representation; the sign bit is retained in both forms. But it must be
observed that when the binary number is positive, the signed magnitude form,
one's-complement form, and two's-complement form are the same.
Example 7.1
Given: $x_2 = 0.1100$ is the 5-bit, signed magnitude fixed-point number equal to
$x_{10} = +2^{-1} + 2^{-2} = 0.75$, and $v_2 = 1.1100$ is equal to
$v_{10} = -0.75$. The one's complement of $v_2 = 1.1100$ is $1.0011$, whereas
the two's complement of $v_2$ is $1.0011 + 0.0001 = 1.0100$.
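The conversions in Example 7.1 can be reproduced with a short Python sketch. It
assumes the format used in the example, one sign bit followed by fractional bits
only ($I = 0$), written as a string such as "1.1100"; the function names are
hypothetical.

```python
def signed_magnitude_value(bits):
    """Decimal value of a signed magnitude string with I = 0, e.g. '1.1100'."""
    sign, _, frac = bits.partition(".")
    magnitude = int(frac, 2) / 2 ** len(frac)
    return -magnitude if sign == "1" else magnitude

def ones_complement(bits):
    """One's-complement form: complement the fractional bits of a negative number."""
    sign, _, frac = bits.partition(".")
    if sign == "0":                  # positive numbers are left unchanged
        return bits
    flipped = "".join("1" if b == "0" else "0" for b in frac)
    return sign + "." + flipped

def twos_complement(bits):
    """Two's-complement form: one's complement plus one in the LSB position."""
    sign, _, frac = bits.partition(".")
    if sign == "0":                  # positive numbers are left unchanged
        return bits
    n = len(frac)
    flipped = ones_complement(bits).partition(".")[2]
    # A carry out of the fractional bits, if any, is dropped by the modulo.
    incremented = (int(flipped, 2) + 1) % (2 ** n)
    return sign + "." + format(incremented, "0{}b".format(n))

def twos_complement_value(bits):
    """Decimal value of a two's-complement string: sign bit has weight -1."""
    sign, _, frac = bits.partition(".")
    return -int(sign) + int(frac, 2) / 2 ** len(frac)

v = "1.1100"
print(signed_magnitude_value(v))
print(ones_complement(v))
print(twos_complement(v))
print(twos_complement_value(twos_complement(v)))
```

Evaluating the two's-complement string with a weight of -1 on the sign bit
gives back the same decimal value as the signed magnitude form, which is a
quick way to check a conversion.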
The values that can be represented by the signed magnitude fixed-point
representation range from $-2^{w-F-1}$ to $2^{w-F-1} - 2^{-F}$. In order to
increase the range of numbers that can be represented, two more formats are
available: the floating-point and block floating-point representations. The
floating-point representation of a binary number is of the form

$$X_{10} = (-1)^{s} M (2^{E}) \tag{7.6}$$
where M is the mantissa, which is usually represented by a signed magnitude,
fixed-point binary number, and E is a positive- or negative-valued integer,
stored in a fixed number of bits, called the exponent. To get both positive and
negative exponents, the stored exponent is offset by an integer bias; usually
the bias is chosen as $2^{7} - 1 = 127$ when the exponent E is 8 bits or
$2^{10} - 1 = 1023$ when E is 11 bits. Without the bias, an 8-bit integer number
varies from 0 to 255, but with a bias of 127, the exponent varies from −127 to
128. Also the magnitude of the fractional part F is limited to $0 \le F < 1$. In
order to increase the range of the mantissa, one more bit is added to the left
of the most significant bit of F so that it is represented as (1.F). Now it is
assumed to be normalized, but this bit is not counted in the total wordlength.
The IEEE 754-1985 standard for representing ﬂoating-point numbers is the
most common standard used in DSP processors. It uses a single-precision format
with 32 bits and a double-precision format with 64 bits.
The single-precision floating-point number is given by

$$X_{10} = (-1)^{s} (1.F)\, 2^{E-127} \tag{7.7}$$
[Figure 7.2 IEEE format of bits for the 32- and 64-bit floating-point numbers:
(a) 32-bit format with sign bit s, exponent E (8 bits), and fraction F (23 bits);
(b) 64-bit format with sign bit s, exponent E (11 bits), and fraction F (52 bits).]

According to this standard, the (32-bit) single-precision, floating-point number
uses one sign bit, 8 bits for the exponent, and 23 bits for the fractional part
F (and one bit to normalize it). A representation of this format is shown in
Figure 7.2a. But this formula is implemented according to the following rules in
order to satisfy conditions other than the first one listed below:
1. When $0 < E < 255$, then $X_{10} = (-1)^{s} (1.F)\, 2^{E-127}$.
2. When $E = 0$ and $M \neq 0$, then $X_{10} = (-1)^{s} (0.F)\, 2^{-126}$.
3. When $E = 255$ and $M \neq 0$, then $X_{10}$ is not a number and is denoted
as NaN.
4. When $E = 255$ and $M = 0$, then $X_{10} = (-1)^{s} \infty$.
5. When $E = 0$ and $M = 0$, then $X_{10} = (-1)^{s} (0)$.
Here, (1.F) is the normalized mantissa with one integer bit and 23 fractional
bits, whereas (0.F) is only the fractional part with 23 bits. Most of the
commercial DSP chips use this 32-bit, single-precision, floating-point binary
representation, although 64-bit processors are becoming available. Note that
there is no provision for storing the binary point (.) in these chips; their
registers simply store the bits and implement the rules listed above. The binary
point is used only as a notation for our discussion of the binary number
representation and is not counted in the total number of bits.
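The five rules can be exercised with a short Python sketch that decodes a
32-bit word by hand and compares the result with the value produced by the
standard library's struct module. It assumes the standard IEEE 754 reading in
which rules 2 and 3 apply to a nonzero fraction field and rules 4 and 5 to a
zero fraction field; the function names decode_single and float_to_word are
hypothetical.

```python
import struct

def decode_single(word):
    """Decode a 32-bit unsigned integer as an IEEE 754 single-precision number."""
    s = (word >> 31) & 0x1          # sign bit
    e = (word >> 23) & 0xFF         # 8-bit biased exponent
    f = word & 0x7FFFFF             # 23-bit fraction field
    sign = -1.0 if s else 1.0
    frac = f / 2 ** 23              # value of 0.F
    if 0 < e < 255:                 # rule 1: normalized number
        return sign * (1.0 + frac) * 2.0 ** (e - 127)
    if e == 0 and f != 0:           # rule 2: denormalized number
        return sign * frac * 2.0 ** -126
    if e == 255 and f != 0:         # rule 3: not a number
        return float("nan")
    if e == 255:                    # rule 4: infinity
        return sign * float("inf")
    return sign * 0.0               # rule 5: signed zero

def float_to_word(x):
    """Bit pattern of x when stored as a single-precision number."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

word = float_to_word(-9.625)
print(hex(word), decode_single(word))
```

The value -9.625 is exactly representable with 23 fractional bits, so the
hand-decoded result agrees with the original number without any rounding error.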
The IEEE 754-1985 standard for the (64-bit), double-precision, floating-point
number is expressed by

$$X_{10} = (-1)^{s} (1.F)\, 2^{E-1023} \tag{7.8}$$

It uses one sign bit, 11 bits for the exponent E, and 52 bits for F (one bit is
added to normalize it but is not counted). The representation for this format is
shown in Figure 7.2b.
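As a brief check of (7.8), the following Python sketch splits a double-precision
number into its three fields and rebuilds a normalized value from them. The
function names double_fields and normal_double_value are hypothetical, and the
rebuild step covers only the normalized case $0 < E < 2047$.

```python
import struct

def double_fields(x):
    """Return (s, E, F) for x: sign bit, 11-bit exponent, 52-bit fraction field."""
    word = struct.unpack(">Q", struct.pack(">d", x))[0]
    s = word >> 63
    e = (word >> 52) & 0x7FF
    f = word & ((1 << 52) - 1)
    return s, e, f

def normal_double_value(s, e, f):
    """Evaluate (-1)^s (1.F) 2^(E-1023) for a normalized number."""
    return (-1.0) ** s * (1.0 + f / 2 ** 52) * 2.0 ** (e - 1023)

s, e, f = double_fields(-9.625)
print(s, e, e - 1023)               # sign, stored exponent, unbiased exponent
print(normal_double_value(s, e, f))
```

Since 9.625 equals 1.203125 times 2 to the power 3, the stored exponent is
3 + 1023 = 1026, which shows the bias of 1023 at work.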
Example 7.2
Consider the 16-bit ﬂoating-point number with 8 bits for the unbiased exponent
and 4 bits for the denormalized fractional part, namely, E = 8 and F = 4. The
