Quantized Filter Analysis

CHAPTER 7
Quantized Filter Analysis
7.1 INTRODUCTION
The analysis and design of discrete-time systems, digital filters, and their
realizations, the computation of the DFT-IDFT, and so on, discussed in the previous
chapters of this book, were carried out mostly by using the functions in the Signal
Processing Toolbox working in the MATLAB environment, and the computations
were carried out with double precision. This means that all the data representing
the values of the input signal, the coefficients of the filters, the values of the unit
impulse response, and so forth were represented with 64 bits; therefore, these
numbers have a range approximately between $10^{-308}$ and $10^{308}$ and a precision
of $\sim 2^{-52} = 2.22 \times 10^{-16}$. This range is so large, and the precision with
which the numbers are expressed is so fine, that the numbers can be assumed to
have almost “infinite precision.” Once these digital filters and DFT-IDFT have
been obtained by the procedures described so far, they can be further analyzed
by mainframe computers, workstations, and PCs under “infinite precision.” But
when the algorithms describing the digital filters and FFT computations have
to be implemented as hardware in the form of special-purpose microprocessors
or application-specific integrated circuits (ASICs) or the digital signal processor
(DSP) chip, many practical considerations and constraints come into play. The
registers used in these hardware systems to store the numbers have finite length,
and the memory capacity required for processing the data is determined by the
number of bits (also called the wordlength) chosen for storing the data. More
memory means more power consumption, hence the need to minimize the
wordlength. In microprocessors and DSP chips, and even in workstations and PCs,
we would like to use registers with as few bits as possible and yet obtain high
computational speed, low power, and low cost. Portable devices such as
cell phones and personal digital assistants (PDAs) have a limited amount of
memory and run on low-voltage batteries that supply power for only a short time.
These constraints become more severe in other devices such as digital hearing
aids and biomedical probes embedded in capsules to be swallowed. So there is a
Introduction to Digital Signal Processing and Filter Design, by B. A. Shenoi
Copyright © 2006 John Wiley & Sons, Inc.
great demand for designing digital filters, and the systems in which they are
embedded, with the lowest possible number of bits to represent the data or to store
the data in their registers. When the filters are built with registers of finite length and
the analog-to-digital converters (ADCs) are designed to operate at increasingly
high sampling rates, thereby reducing the number of bits with which the samples
of the input signal are represented, the frequency response of the filters and the
results of DFT-IDFT computations via the FFT are expected to differ from those
designed with “infinite precision.” This process of representing the data with a
finite number of bits is known as quantization, which occurs at several points
in the structure chosen to realize the filter or in the steps of the FFT computation
of the DFT-IDFT. As pointed out in the previous chapter, a vast number of
structures are available to realize a given transfer function when we assume
infinite precision. But when we design the hardware with registers of finite length to
implement the corresponding difference equation, the effect of finite wordlength
is highly dependent on the structure. Therefore we find it necessary to analyze
this effect for a large number of structures. This analysis is further complicated
by the fact that quantization can be carried out in several ways, and the arithmetic
operations of addition and multiplication of numbers with finite precision yield
results that are influenced by the way these numbers are quantized.
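The double-precision figures quoted at the start of this section, and the basic act of quantizing a number to a finite wordlength, can be checked numerically. The following is a minimal Python sketch and not part of any toolbox discussed in this chapter; the helper name quantize_fraction and the choice of rounding to the nearest level are assumptions made for illustration.

```python
import sys

# Double-precision limits reported by the Python runtime (IEEE 754 binary64).
eps = sys.float_info.epsilon      # spacing between 1.0 and the next double, 2**-52
largest = sys.float_info.max      # largest finite double, on the order of 1e308
smallest = sys.float_info.min     # smallest normalized positive double, on the order of 1e-308


def quantize_fraction(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (hypothetical helper)."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step


print(eps, largest, smallest)
# A coefficient of 0.3 stored with only 4 fractional bits moves to a nearby level.
print(quantize_fraction(0.3, 4))
```

With 4 fractional bits the available levels are multiples of 1/16, so a coefficient generally cannot be stored exactly; this is the kind of deviation whose effect on the filter response is studied in the rest of the chapter.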
In this chapter, we discuss a new MATLAB toolbox called FDA Tool, available
(see footnote 1) for analyzing and designing filters with a finite number of bits for the
wordlength. The different forms of representing binary numbers and the results of
adding and multiplying such numbers will be explained in a later section of this
chapter. The third factor that influences the deviation of filter performance from
the ideal case is the choice of FIR or IIR filter. The type of approximation chosen
for obtaining the desired frequency response is another factor that influences
the effect of finite wordlength. We discuss the effects of all these factors in this
chapter, illustrating their influence by means of a design example.
7.2 FILTER DESIGN–ANALYSIS TOOL
An enormous amount of research has been carried out to address these problems,
but analyzing the effects of quantization on the performance of digital filters
and systems is not well illustrated by specific examples. Although there is no
analytical method available at present to design or analyze a filter with finite
precision, some useful insight can be obtained from the research work, which
serves as a guideline in making preliminary decisions on the choice of suitable
structures and quantization forms. Any student interested in this research work
should read the material on finite wordlength effects found in other textbooks
[1,2,4]. In this chapter, we discuss the software for filter design and analysis
that has been developed by The MathWorks to address the abovementioned
problem (see footnote 2). This filter design–analysis (FDA) tool, found in the Filter
Design Toolbox, works in conjunction with the Signal Processing (SP) Toolbox.
Unlike the SP Toolbox, the FDA Tool has been developed by making extensive use
of the object-oriented programming capability of MATLAB, and the syntax for the
functions available in the FDA Tool is different from the syntax for the functions
we find in MATLAB and the SP Toolbox. When we log on to MATLAB and type
fdatool, we get two screens on display. On one screen, we type the fdatool
functions as command lines to design and analyze quantized filters, whereas the
other screen is a graphical user interface (GUI) that serves the same purpose. The
GUI window shown in Figure 7.1a displays a dialog box with an immense array
of design options, as explained below.
Footnote 1: MATLAB and its Signal Processing Toolbox are found in computer systems of many schools and
universities, but the FDA Tool may not be available in all of them.
First we design a filter with double precision on the GUI window using the
FDA Tool, or on the command window using the Signal Processing Toolbox, and
then import it into the GUI window. In the dialog box for the FDA Tool, we can
choose the following options under the Filter Type panel:
1. Lowpass
2. Highpass
3. Bandpass
4. Bandstop
5. Differentiator. By clicking the arrow on the tab for this feature, we get
the following additional options.
6. Hilbert transformer
7. Multiband
8. Arbitrary magnitude
9. Raised cosine
10. Arbitrary group delay
11. Half-band lowpass
12. Half-band highpass
13. Nyquist
Below the Filter Type panel is the panel for the design method. When the
button for IIR filter is clicked, the dropdown list gives us the following options
specifying the type of frequency response:
• Butterworth
• Chebyshev I
• Chebyshev II
• Elliptic
• Least-pth norm
• Constrained least-pth norm
Footnote 2: The author acknowledges that the material on the FDA Tool described in this chapter is based on
the Help Manual for Filter Design Toolbox found in MATLAB version 6.5.
Figure 7.1 Screen capture of the fdatool window: (a) window for filter design;
(b) window for quantization analysis.
The following options are available for the FIR filter:
• Equiripple
• Least squares
• Window
• Maximally flat
• Least-pth norm
• Constrained equiripple
To the right of the panel for design method is the one for filter order. We can
either specify the order of the filter or let the program compute the minimum order
(by use of SP Tool functions Chebord, Buttord, etc.). Remember to choose an
odd order for the lowpass filter when it is to be designed as a parallel connection
of two allpass filters, if an even number is given as the minimum order. Below
this panel is the panel for other options, which are available depending on the
abovementioned inputs. For example, if we choose an FIR filter with the window
option, this panel displays an option for the windows that we can choose. By
clicking the button for the windows, we get a dropdown list of more than 10
windows. To the right of this panel are two panels that we use to specify the
frequency specifications, that is, the sampling frequency, the cutoff frequencies
for the passband and stopband, the magnitude in the passband(s) and
stopband(s), and so on, depending on the type of filter and the design method
chosen. The frequencies can be expressed in hertz, kilohertz, megahertz, gigahertz,
or normalized frequency. The magnitude can be expressed in decibels, as magnitude
squared, or as actual magnitude, as displayed when we click Analysis in the main
menu bar and then click the option Frequency Specifications in the dropdown
list. The frequency specifications are displayed in the Analysis panel,
which is above the panel for frequency specifications, when we start with the
filter design.
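To make the minimum-order option concrete, the following Python sketch evaluates the standard textbook formula for the minimum order of a lowpass Butterworth filter designed through the bilinear transformation. It is an illustration only and is not the code behind the toolbox functions named above; the function names and the example specifications are assumptions.

```python
import math


def butterworth_min_order(wp, ws, rp_db, rs_db):
    """Minimum lowpass Butterworth order for band edges wp < ws.

    wp, ws: passband and stopband edges, normalized so that 1.0 is half the
    sampling frequency. rp_db, rs_db: maximum passband attenuation and
    minimum stopband attenuation in decibels.
    """
    # Prewarp the digital band edges to the analog prototype (bilinear transformation).
    omega_p = math.tan(math.pi * wp / 2.0)
    omega_s = math.tan(math.pi * ws / 2.0)
    ratio = (10.0 ** (0.1 * rs_db) - 1.0) / (10.0 ** (0.1 * rp_db) - 1.0)
    return math.ceil(math.log10(ratio) / (2.0 * math.log10(omega_s / omega_p)))


def odd_order_for_allpass_pair(n):
    """Raise an even minimum order to the next odd order, as the text advises."""
    return n + 1 if n % 2 == 0 else n


n_min = butterworth_min_order(0.2, 0.3, 1.0, 40.0)   # example specifications (assumed)
print(n_min, odd_order_for_allpass_pair(n_min))
```

For these example specifications the minimum order comes out even, so a design intended for the parallel connection of two allpass filters would use the next odd order instead.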
The options available under any of these categories are dependent on the
other options chosen. All the FDA Tool functions, which are also the functions
of the SP Tool, are called overloaded functions. After all the design options are
chosen, we click the Design Filter button at the bottom of the dialog box. The
program designs the filter and displays the magnitude response of the filter in the
Analysis area. This is only the default choice; by clicking the appropriate
icons shown above this area, we can make the Analysis area display one of the
following features:
• Magnitude response
• Phase response
• Magnitude and phase response
• Group delay response
• Impulse response
• Step response
• Pole–zero plot
• Filter coefficients
This information can also be displayed by clicking the Analysis button in the
main menu bar and choosing the information we wish to display in the Analysis
area. We can also choose some additional information, for example, by
clicking Analysis Parameters. At the bottom of this dropdown list is the
option Full View Analysis. When this is chosen, whatever is displayed in the
Analysis area is shown in a new panel of larger dimensions with features that
are available in a figure displayed under the SP Tool. For example, by clicking the
Edit button and then selecting either Figure Properties, Axis Properties,
or Current Object Properties, we activate the Property Editor, and the
properties of these three objects can be modified.
Finally, we look at the first panel, titled Current Filter Information.
This lists the structure, order, and number of sections of the filter that we have
designed. Below this information, it indicates whether the filter is stable and
points out whether the source is the designed filter (i.e., the reference filter designed
with double precision) or the quantized filter with a finite wordlength. The default
structure for the IIR reference filter is a cascade connection of second-order
sections, and for the FIR filter, it is the direct form. When we have completed
the design of the reference filter with double precision, we verify whether it
meets the desired specification, and if we wish, we can convert the structure of
the reference filter to any one of the other types listed below. We click the Edit
button on the main menu and then the Convert Structure button. A dropdown
list shows the structures to which we can convert from the default structure or
from the one to which we have already converted.
For IIR filters, the structures are
1. Direct form I
2. Direct form II
3. Direct form I transposed
4. Direct form II transposed
5. Lattice ARMA
6. Lattice-coupled allpass
7. Lattice-coupled allpass—power complementary
8. State space
Items 6 and 7 in this list refer to structures of the two allpass networks in
parallel as described in Chapter 6, with transfer functions
$G(z) = \tfrac{1}{2}[A_1(z) + A_2(z)]$ and $H(z) = \tfrac{1}{2}[A_1(z) - A_2(z)]$,
respectively. The allpass filters $A_1(z)$ and $A_2(z)$ are realized in the form of
lattice allpass structures like the one shown in Figure 6.19b. The MA and AR
structures, which are also discussed in Chapter 6, are considered special cases of
the lattice ARMA structure.
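The power-complementary property of the pair G(z) and H(z) can be checked numerically. The sketch below is only an illustration: it uses two first-order allpass sections with arbitrarily chosen real coefficients (an assumption, not a design from this chapter) and confirms that the squared magnitudes of G and H sum to 1 at several frequencies.

```python
import cmath
import math


def allpass1(a, w):
    """First-order allpass A(z) = (a + z^-1) / (1 + a z^-1) at z = exp(jw), |a| < 1."""
    zinv = cmath.exp(-1j * w)
    return (a + zinv) / (1.0 + a * zinv)


def parallel_allpass_pair(a1, a2, w):
    """Return G and H of the parallel allpass pair at frequency w (radians per sample)."""
    A1 = allpass1(a1, w)
    A2 = allpass1(a2, w)
    return 0.5 * (A1 + A2), 0.5 * (A1 - A2)


for w in (0.0, 0.7, 1.9, math.pi):
    G, H = parallel_allpass_pair(0.5, -0.3, w)   # coefficients chosen arbitrarily
    print(w, abs(G) ** 2 + abs(H) ** 2)
```

Because each allpass section has unit magnitude at every frequency, the sum of the squared magnitudes equals 1 regardless of the coefficients, which is what the name "power complementary" expresses.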
For FIR filters, the options for the structures are
• Direct-form FIR
• Direct-form FIR transposed
• Direct-form symmetric FIR
When we have converted to a new structure, the information that can be
displayed in the Analysis area, such as the coefficients of the filter, changes. We
would also like to point out that any one of the lowpass, highpass, bandpass, and
bandstop filters that we have designed can be converted to any other type by clicking
the first icon on the left-hand bar in the dialog box and adding the frequency
specifications for the new filter.
7.3 QUANTIZED FILTER ANALYSIS
When we have finished the analysis of the reference filter, we can move on to
construct the quantized filter as an object by clicking the last icon on the bar
above the Analysis area and the second icon on the left-hand bar, which sets
the quantization parameters. The panel below the Analysis area now changes
as shown in Figure 7.1b. We can construct three objects inside the FDA Tool:
qfilt, qfft, and quantizer. Each of them has several properties, and these
properties have values, which may be strings or numerical values. Currently
we use the objects qfilt and quantizer to analyze the performance of the
reference filter when it is quantized. When we click the Turn Quantization On
button and the Set Quantization Parameters icon, we can choose the
quantization parameters for the coefficients of the filter. Quantization of the filter
coefficients alone is sufficient for finding the finite wordlength effect on the
magnitude response, phase response, and group delay response of the quantized
filter, which can then be compared with the response of the reference filter displayed
in the Analysis area. Quantization of the other data listed below is necessary
when we have to filter an input signal:
• The input signal
• The output signal
• The multiplicand: the value of the signal that is multiplied by the multiplier
• The product of the multiplicand and the multiplier constant
• The output signal
The object quantizer is used to convert each of these data, and this object has
four properties: Mode, Round Mode, Overflow mode, and Format. In order to
understand the values of these properties, it is necessary to review and understand
the binary representation of numbers and the different results of adding them and
multiplying them. These will be discussed next.
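To give a feel for what a rounding property, an overflow property, and a format property control, the following Python sketch models a fixed-point quantizer. It is a hypothetical stand-in written for illustration and is not the MATLAB quantizer object: the class name, the field names, the property values ("round", "floor", "saturate", "wrap"), and the two's-complement code range used for overflow are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class FixedQuantizer:
    """Hypothetical fixed-point quantizer; not the MATLAB quantizer object."""
    wordlength: int = 16             # format: total bits, sign bit included
    fraclength: int = 15             # format: bits to the right of the binary point
    round_mode: str = "round"        # "round" (to nearest) or "floor" (toward minus infinity)
    overflow_mode: str = "saturate"  # "saturate" or "wrap"

    def quantize(self, x):
        scale = 2 ** self.fraclength
        if self.round_mode == "round":
            n = math.floor(x * scale + 0.5)
        elif self.round_mode == "floor":
            n = math.floor(x * scale)
        else:
            raise ValueError("unknown round mode")
        lo = -(2 ** (self.wordlength - 1))       # most negative integer code
        hi = 2 ** (self.wordlength - 1) - 1      # most positive integer code
        if self.overflow_mode == "saturate":
            n = max(lo, min(hi, n))              # clip to the representable range
        elif self.overflow_mode == "wrap":
            n = (n - lo) % (2 ** self.wordlength) + lo   # modular wraparound
        else:
            raise ValueError("unknown overflow mode")
        return n / scale


q = FixedQuantizer(wordlength=8, fraclength=7)
print(q.quantize(0.3), q.quantize(1.5))
```

With 8 bits and 7 fractional bits, an in-range value moves to the nearest multiple of 1/128, while an out-of-range value is either clipped to the largest level or wrapped around to a value of the opposite sign, depending on the overflow setting.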
7.4 BINARY NUMBERS AND ARITHMETIC
Numbers representing the values of the signal, the coefficients of both the filter
and the difference equation or the recursive algorithm, and other properties
corresponding to the structure for the filter are represented in binary form. They are
based on the radix of 2 and therefore consist of only two binary digits, 0 and 1,
which are more commonly known as bits, just as the decimal numbers based on a
radix of 10 have 10 decimal digits from 0 to 9. Placement of the bits in a string
determines the binary number, as illustrated by the example $x_2 = 1001.1010$,
which is equivalent to $x_{10} = 1 \times 2^{0} + 1 \times 2^{3} + 2^{-1} + 2^{-3} = 9.625$. In this
discussion of binary number representation, we have used the binary point (.) to separate
the integer part and the fractional part, and the subscripts 2 and 10 to denote the
binary number and the decimal number. Another example given by
$$x_2 = b_2 b_1 b_0 \,.\, b_{-1} b_{-2} b_{-3} b_{-4} \tag{7.1}$$
has a decimal value computed as
$$x_{10} = b_2 2^{2} + b_1 2^{1} + b_0 2^{0} + b_{-1} 2^{-1} + b_{-2} 2^{-2} + b_{-3} 2^{-3} + b_{-4} 2^{-4} \tag{7.2}$$
where the bits $b_2, b_1, b_0, b_{-1}, b_{-2}, b_{-3}, b_{-4}$ are either 1 or 0. In general, when
$x_2$ is represented as
$$x_2 = b_{I-1} b_{I-2} \cdots b_1 b_0 \,.\, b_{-1} b_{-2} \cdots b_{-F} \tag{7.3}$$
the decimal number has a value given by
$$x_{10} = \sum_{i=-F}^{I-1} b_i 2^{i} \tag{7.4}$$
In the binary representation (7.3), the integer part contains I bits, and the bit $b_{I-1}$
at the leftmost position is called the most significant bit (MSB); the fractional
part contains F bits, and the bit $b_{-F}$ at the rightmost position is called the least
significant bit (LSB). This can only represent the magnitude of positive numbers
and is known as the unsigned fixed-point binary number. In order to represent
positive as well as negative numbers, one more bit called the sign bit is added to
the left of the MSB. The sign bit, represented by the symbol s in (7.5), assigns
a negative sign when this bit is 1 and a positive sign when it is 0. So it becomes
a signed magnitude fixed-point binary number. Therefore a signed magnitude
number $x_2 = 11001.1010$ (the sign bit 1 followed by 1001.1010) is $x_{10} = -9.625$.
In general, the signed magnitude fixed-point number is given by
$$x_{10} = (-1)^{s} \sum_{i=-F}^{I-1} b_i 2^{i} \tag{7.5}$$
and the total number of bits is called the wordlength $w = 1 + I + F$. When
two signed magnitude numbers with widely different values for the integer part
and/or the fractional part have to be added, it is not easy to program the adders
in the digital hardware to implement this operation. So it is common practice
to choose $I = 0$, keeping the sign bit and the bits for the fractional part only,
so that $F = w - 1$ in the signed magnitude fixed-point representation. But when
two numbers larger than 0.5 in decimal value are added, their sum is larger
than 1, and this cannot be represented by the format shown above, where $I = 0$.
So two other forms of representing the numbers are more commonly used: the
one’s-complement and two’s-complement forms (also termed one-complementary
and two-complementary forms) for representing the signed magnitude fixed-point
numbers. In the one’s-complement form of a negative number, the bits of the
fractional part are replaced by their complement, that is, the ones are replaced by
zeros and vice versa. By adding a one as the least significant bit to the
one’s-complement form, we get the two’s-complement form of binary representation;
the sign bit is retained in both forms. But it must be observed that when the binary
number is positive, the signed magnitude form, one’s-complement form, and
two’s-complement form are the same.
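Equations (7.4) and (7.5) translate directly into a few lines of code. The following Python sketch is only an illustration; the function names are hypothetical, and the binary point is written as a period in the input string.

```python
def unsigned_value(bits):
    """Decimal value of an unsigned binary string such as '1001.1010', per Eq. (7.4)."""
    integer, _, fraction = bits.partition(".")
    value = 0.0
    # Integer bits carry the weights 2^(I-1) ... 2^0, read from left to right.
    for k, b in enumerate(integer):
        value += int(b) * 2.0 ** (len(integer) - 1 - k)
    # Fractional bits carry the weights 2^-1 ... 2^-F.
    for k, b in enumerate(fraction, start=1):
        value += int(b) * 2.0 ** -k
    return value


def signed_magnitude_value(sign_bit, bits):
    """Decimal value of a signed magnitude number, per Eq. (7.5)."""
    return (-1) ** sign_bit * unsigned_value(bits)


print(unsigned_value("1001.1010"))             # 9.625
print(signed_magnitude_value(1, "1001.1010"))  # -9.625
```

The two printed values reproduce the worked examples in the text; the wordlength of the signed number is w = 1 + 4 + 4 = 9 bits.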
Example 7.1
Given: $x_2 = 0.1100$ is the 5-bit, signed magnitude fixed-point number equal to
$x_{10} = +2^{-1} + 2^{-2} = 0.75$, and $v_2 = 1.1100$ is equal to $v_{10} = -0.75$. The one’s
complement of $v_2 = 1.1100$ is $1.0011$, whereas the two’s complement of $v_2$ is
$1.0011 + 0.0001 = 1.0100$.
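The bit manipulations in Example 7.1 can be reproduced with a short Python sketch. The function names are hypothetical, the input is assumed to be the signed magnitude form of a negative number with the sign bit to the left of the binary point, and a carry out of the fractional part is simply dropped (an assumption that matters only for an all-zero fraction).

```python
def ones_complement(word):
    """Invert the fractional bits of a negative number 's.ffff', keeping the sign bit."""
    sign, _, fraction = word.partition(".")
    flipped = "".join("1" if b == "0" else "0" for b in fraction)
    return sign + "." + flipped


def twos_complement(word):
    """Add one LSB to the one's-complement form, keeping the sign bit."""
    sign, _, fraction = ones_complement(word).partition(".")
    width = len(fraction)
    total = (int(fraction, 2) + 1) % (2 ** width)   # carry out of the fraction is dropped
    return sign + "." + format(total, "0{}b".format(width))


print(ones_complement("1.1100"))   # 1.0011
print(twos_complement("1.1100"))   # 1.0100
```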
The values that can be represented by the signed magnitude fixed-point
representation range from $-2^{w-F-1}$ to $2^{w-F-1} - 2^{-F}$. In order to increase the range
of numbers that can be represented, two more formats are available: the
floating-point and block floating-point representations. The floating-point representation
of a binary number is of the form
$$X_{10} = (-1)^{s} M (2^{E}) \tag{7.6}$$
where M is the mantissa, which is usually represented by a signed magnitude,
fixed-point binary number, and E, called the exponent, is a positive- or
negative-valued integer represented with a fixed number of bits. To get both positive
and negative exponents, a bias is provided by an integer; usually the bias is chosen
as $2^{7} - 1 = 127$ when the exponent E is 8 bits or $2^{10} - 1 = 1023$ when E is 11 bits.
Without the bias, an 8-bit integer number varies from 0 to 255, but with a bias of 127,
the exponent varies from −127 to 128. Also, the magnitude of the mantissa formed
by the fractional part F alone is limited to $0 \le M < 1$. In order to increase the range
of the mantissa, one more bit is added to the left of the most significant bit of F so
that it is represented as (1.F). The mantissa is then said to be normalized, but this
added bit is not counted in the total wordlength.
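The role of the bias can be seen by inspecting the exponent field that a computer actually stores. The following Python sketch is an illustration under stated assumptions: it relies on the standard struct module to obtain the 32-bit single-precision pattern of a number, and the function names are hypothetical.

```python
import struct


def exponent_bias(exp_bits):
    """Bias 2**(exp_bits - 1) - 1 for an exponent field that is exp_bits wide."""
    return 2 ** (exp_bits - 1) - 1


def stored_exponent_single(x):
    """Return the 8-bit exponent field stored for x in 32-bit single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return (word >> 23) & 0xFF


print(exponent_bias(8), exponent_bias(11))        # 127 1023
# 0.15625 = 1.25 * 2**-3, so the stored field is the true exponent plus the bias.
field = stored_exponent_single(0.15625)
print(field, field - exponent_bias(8))            # 124 -3
```

The stored field is always a nonnegative integer; subtracting the bias recovers the true exponent, which may be negative.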
The IEEE 754-1985 standard for representing floating-point numbers is the
most common standard used in DSP processors. It uses a single-precision format
with 32 bits and a double-precision format with 64 bits.
Figure 7.2 IEEE format of bits for the 32- and 64-bit floating-point numbers: (a) 32-bit format with sign bit s, exponent E (8 bits), and fraction F (23 bits); (b) 64-bit format with sign bit s, exponent E (11 bits), and fraction F (52 bits).
The single-precision floating-point number is given by
$$X_{10} = (-1)^{s} (1.F)\, 2^{E-127} \tag{7.7}$$
According to this standard, the (32-bit) single-precision, floating-point number
uses one sign bit, 8 bits for the exponent, and 23 bits for the fractional part
F (and one bit to normalize it). A representation of this format is shown in
Figure 7.2a. But this formula is implemented according to the following rules in
order to satisfy conditions other than the first one listed below:
1. When $0 < E < 255$, then $X_{10} = (-1)^{s} (1.F)\, 2^{E-127}$.
2. When $E = 0$ and $M \neq 0$, then $X_{10} = (-1)^{s} (0.F)(2^{-126})$.
3. When $E = 255$ and $M \neq 0$, then $X_{10}$ is not a number and is denoted as NaN.
4. When $E = 255$ and $M = 0$, then $X_{10} = (-1)^{s} \infty$.
5. When $E = 0$ and $M = 0$, then $X_{10} = (-1)^{s} (0)$.
Here, (1.F) is the normalized mantissa with one integer bit and 23 fractional bits,
whereas (0.F) is only the fractional part with 23 bits. Most of the commercial
DSP chips use this 32-bit, single-precision, floating-point binary representation,
although 64-bit processors are becoming available. Note that there is no provision
for storing the binary point (.) in these chips; their registers simply store the bits
and implement the rules listed above. The binary point is used only as a notation
for our discussion of the binary number representation and is not counted in the
total number of bits.
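The five rules for the 32-bit format can be transcribed into a small decoder. The Python sketch below is an illustration, not code from any DSP chip or toolbox; the function names are hypothetical, and rules 2 and 3 are taken to apply when the 23-bit fraction field is nonzero.

```python
import math
import struct


def decode_single(word):
    """Decode a 32-bit pattern by the five rules; returns a Python float."""
    s = (word >> 31) & 0x1
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF               # 23-bit fraction field
    sign = -1.0 if s else 1.0
    frac = f / 2.0 ** 23              # value of 0.F
    if 0 < e < 255:
        return sign * (1.0 + frac) * 2.0 ** (e - 127)   # rule 1: normalized number
    if e == 0 and f != 0:
        return sign * frac * 2.0 ** -126                # rule 2: denormalized number
    if e == 255 and f != 0:
        return math.nan                                 # rule 3: not a number
    if e == 255:
        return sign * math.inf                          # rule 4: signed infinity
    return sign * 0.0                                   # rule 5: signed zero


def bits_of(x):
    """32-bit pattern stored for x in single precision, obtained via struct."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


print(decode_single(bits_of(0.15625)))   # 0.15625
print(decode_single(0x7F800000))         # inf
```

Note that the decoder never stores a binary point: it works only with the three bit fields, in the same spirit as the registers described above.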
The IEEE 754-1985 standard for the (64-bit), double-precision, floating-point
number is expressed by
$$X_{10} = (-1)^{s} (1.F)\, 2^{E-1023} \tag{7.8}$$
It uses one sign bit, 11 bits for the exponent E, and 52 bits for F (one bit is
added to normalize it but is not counted). The representation for this format is
shown in Figure 7.2b.
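Since Python's own floating-point numbers use this 64-bit format, the three fields of Eq. (7.8) can be extracted from any number and used to rebuild it. The sketch below is an illustration with hypothetical function names; the rebuilding step covers only normalized numbers.

```python
import struct


def fields_of_double(x):
    """Split a Python float into its sign bit, 11-bit exponent field, and 52-bit fraction field."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    s = word >> 63
    e = (word >> 52) & 0x7FF
    f = word & ((1 << 52) - 1)
    return s, e, f


def rebuild_normalized(s, e, f):
    """Evaluate Eq. (7.8); valid only for normalized numbers, 0 < e < 2047."""
    return (-1.0) ** s * (1.0 + f / 2.0 ** 52) * 2.0 ** (e - 1023)


s, e, f = fields_of_double(-9.625)
print(s, e, rebuild_normalized(s, e, f))   # 1 1026 -9.625
```

Increasing the fraction field of 1.0 by one unit changes the value by $2^{-52}$, which is the double-precision resolution quoted in the introduction.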
Example 7.2
Consider the 16-bit floating-point number with 8 bits for the unbiased exponent
and 4 bits for the denormalized fractional part, namely, E = 8 and F = 4. The
