WAVELET IMAGE AND VIDEO
COMPRESSION
Contents

1 Introduction
  by Pankaj N. Topiwala
  1 Background
  2 Compression Standards
  3 Fourier versus Wavelets
  4 Overview of Book
    4.1 Part I: Background Material
    4.2 Part II: Still Image Coding
    4.3 Part III: Special Topics in Still Image Coding
    4.4 Part IV: Video Coding
  5 References

I Preliminaries

2 Preliminaries
  by Pankaj N. Topiwala
  1 Mathematical Preliminaries
    1.1 Finite-Dimensional Vector Spaces
    1.2 Analysis
    1.3 Fourier Analysis
  2 Digital Signal Processing
    2.1 Digital Filters
    2.2 Z-Transform and Bandpass Filtering
  3 Primer on Probability
  4 References

3 Time-Frequency Analysis, Wavelets and Filter Banks
  by Pankaj N. Topiwala
  1 Fourier Transform and the Uncertainty Principle
  2 Fourier Series, Time-Frequency Localization
    2.1 Fourier Series
    2.2 Time-Frequency Representations
  3 The Continuous Wavelet Transform
  4 Wavelet Bases and Multiresolution Analysis
  5 Wavelets and Subband Filter Banks
    5.1 Two-Channel Filter Banks
    5.2 Example FIR PR QMF Banks
  6 Wavelet Packets
  7 References

4 Introduction to Compression
  by Pankaj N. Topiwala
  1 Types of Compression
  2 Resume of Lossless Compression
    2.1 DPCM
    2.2 Huffman Coding
    2.3 Arithmetic Coding
    2.4 Run-Length Coding
  3 Quantization
    3.1 Scalar Quantization
    3.2 Vector Quantization
  4 Summary of Rate-Distortion Theory
  5 Approaches to Lossy Compression
    5.1 VQ
    5.2 Transform Image Coding Paradigm
    5.3 JPEG
    5.4 Pyramid
    5.5 Wavelets
  6 Image Quality Metrics
    6.1 Metrics
    6.2 Human Visual System Metrics
  7 References

5 Symmetric Extension Transforms
  by Christopher M. Brislawn
  1 Expansive vs. nonexpansive transforms
  2 Four types of symmetry
  3 Nonexpansive two-channel SETs
  4 References

II Still Image Coding

6 Wavelet Still Image Coding: A Baseline MSE and HVS Approach
  by Pankaj N. Topiwala
  1 Introduction
  2 Subband Coding
  3 (Sub)optimal Quantization
  4 Interband Decorrelation, Texture Suppression
  5 Human Visual System Quantization
  6 Summary
  7 References

7 Image Coding Using Multiplier-Free Filter Banks
  by Alen Docef, Faouzi Kossentini, Wilson C. Chung and Mark J. T. Smith
  (Based on "Multiplication-Free Subband Coding of Color Images", by Docef, Kossentini, Chung and Smith, which appeared in the Proceedings of the Data Compression Conference, Snowbird, Utah, March 1995, pp. 352-361, ©1995 IEEE.)
  1 Introduction
  2 Coding System
  3 Design Algorithm
  4 Multiplierless Filter Banks
  5 Performance
  6 References

8 Embedded Image Coding Using Zerotrees of Wavelet Coefficients
  by Jerome M. Shapiro
  1 Introduction and Problem Statement
    1.1 Embedded Coding
    1.2 Features of the Embedded Coder
    1.3 Paper Organization
  2 Wavelet Theory and Multiresolution Analysis
    2.1 Trends and Anomalies
    2.2 Relevance to Image Coding
    2.3 A Discrete Wavelet Transform
  3 Zerotrees of Wavelet Coefficients
    3.1 Significance Map Encoding
    3.2 Compression of Significance Maps using Zerotrees of Wavelet Coefficients
    3.3 Interpretation as a Simple Image Model
    3.4 Zerotree-like Structures in Other Subband Configurations
  4 Successive-Approximation
    4.1 Successive-Approximation Entropy-Coded Quantization
    4.2 Relationship to Bit Plane Encoding
    4.3 Advantage of Small Alphabets for Adaptive Arithmetic Coding
    4.4 Order of Importance of the Bits
    4.5 Relationship to Priority-Position Coding
  5 A Simple Example
  6 Experimental Results
  7 Conclusion
  8 References

9 A New Fast/Efficient Image Codec Based on Set Partitioning in Hierarchical Trees
  by Amir Said and William A. Pearlman
  1 Introduction
  2 Progressive Image Transmission
  3 Transmission of the Coefficient Values
  4 Set Partitioning Sorting Algorithm
  5 Spatial Orientation Trees
  6 Coding Algorithm
  7 Numerical Results
  8 Summary and Conclusions
  9 References

10 Space-Frequency Quantization for Wavelet Image Coding
  by Zixiang Xiong, Kannan Ramchandran, and Michael T. Orchard
  1 Introduction
  2 Background and Problem Statement
    2.1 Defining the tree
    2.2 Motivation and high level description
    2.3 Notation and problem statement
    2.4 Proposed approach
  3 The SFQ Coding Algorithm
    3.1 Tree pruning algorithm: Phase I (for fixed quantizer q and fixed λ)
    3.2 Predicting the tree: Phase II
    3.3 Joint Optimization of Space-Frequency Quantizers
    3.4 Complexity of the SFQ algorithm
  4 Coding Results
  5 Extension of the SFQ Algorithm from Wavelet to Wavelet Packets
  6 Wavelet packets
  7 Wavelet packet SFQ
  8 Wavelet packet SFQ coder design
    8.1 Optimal design: Joint application of the single tree algorithm and SFQ
    8.2 Fast heuristic: Sequential applications of the single tree algorithm and SFQ
  9 Experimental Results
    9.1 Results from the joint wavelet packet transform and SFQ design
    9.2 Results from the sequential wavelet packet transform and SFQ design
  10 Discussion and Conclusions
  11 References

11 Subband Coding of Images Using Classification and Trellis Coded Quantization
  by Rajan Joshi and Thomas R. Fischer
  1 Introduction
  2 Classification of blocks of an image subband
    2.1 Classification gain for a single subband
    2.2 Subband classification gain
    2.3 Non-uniform classification
    2.4 The trade-off between the side rate and the classification gain
  3 Arithmetic coded trellis coded quantization
    3.1 Trellis coded quantization
    3.2 Arithmetic coding
    3.3 Encoding generalized Gaussian sources with ACTCQ system
  4 Image subband coder based on classification and ACTCQ
    4.1 Description of the image subband coder
  5 Simulation results
  6 Acknowledgment
  7 References

12 Low-Complexity Compression of Run Length Coded Image Subbands
  by John D. Villasenor and Jiangtao Wen
  1 Introduction
  2 Large-scale statistics of run-length coded subbands
  3 Structured code trees
    3.1 Code Descriptions
    3.2 Code Efficiency for Ideal Sources
  4 Application to image coding
  5 Image coding results
  6 Conclusions
  7 References

III Special Topics in Still Image Coding

13 Fractal Image Coding as Cross-Scale Wavelet Coefficient Prediction
  by Geoffrey Davis
  1 Introduction
  2 Fractal Block Coders
    2.1 Motivation for Fractal Coding
    2.2 Mechanics of Fractal Block Coding
    2.3 Decoding Fractal Coded Images
  3 A Wavelet Framework
    3.1 Notation
    3.2 A Wavelet Analog of Fractal Block Coding
  4 Self-Quantization of Subtrees
    4.1 Generalization to non-Haar bases
    4.2 Fractal Block Coding of Textures
  5 Implementation
    5.1 Bit Allocation
  6 Results
    6.1 SQS vs. Fractal Block Coders
    6.2 Zerotrees
    6.3 Limitations of Fractal Coding
  7 References

14 Region of Interest Compression in Subband Coding
  by Pankaj N. Topiwala
  1 Introduction
  2 Error Penetration
  3 Quantization
  4 Simulations
  5 Acknowledgements
  6 References

15 Wavelet-Based Embedded Multispectral Image Compression
  by Pankaj N. Topiwala
  1 Introduction
  2 An Embedded Multispectral Image Coder
    2.1 Algorithm Overview
    2.2 Transforms
    2.3 Quantization
    2.4 Entropy Coding
  3 Simulations
  4 References

16 The FBI Fingerprint Image Compression Specification
  by Christopher M. Brislawn
  1 Introduction
    1.1 Background
    1.2 Overview of the algorithm
  2 The DWT subband decomposition for fingerprints
    2.1 Linear phase filter banks
    2.2 Symmetric boundary conditions
    2.3 Spatial frequency decomposition
  3 Uniform scalar quantization
    3.1 Quantizer characteristics
    3.2 Bit allocation
  4 Huffman coding
    4.1 The Huffman coding model
    4.2 Adaptive Huffman codebook construction
  5 The first-generation fingerprint image encoder
    5.1 Source image normalization
    5.2 First-generation wavelet filters
    5.3 Optimal bit allocation and quantizer design
    5.4 Huffman coding blocks
  6 Conclusions
  7 References

17 Embedded Image Coding Using Wavelet Difference Reduction
  by Jun Tian and Raymond O. Wells, Jr.
  1 Introduction
  2 Discrete Wavelet Transform
  3 Differential Coding
  4 Binary Reduction
  5 Description of the Algorithm
  6 Experimental Results
  7 SAR Image Compression
  8 Conclusions
  9 References

18 Block Transforms in Progressive Image Coding
  by Trac D. Tran and Truong Q. Nguyen
  1 Introduction
  2 The wavelet transform and progressive image transmission
  3 Wavelet and block transform analogy
  4 Transform Design
  5 Coding Results
  6 References

IV Video Coding

19 Brief on Video Coding Standards
  by Pankaj N. Topiwala
  1 Introduction
  2 H.261
  3 MPEG-1
  4 MPEG-2
  5 H.263 and MPEG-4
  6 References

20 Interpolative Multiresolution Coding of Advanced TV with Subchannels
  by K. Metin Uz, Didier J. LeGall and Martin Vetterli
  (©1991 IEEE. Reprinted, with permission, from IEEE Transactions on Circuits and Systems for Video Technology, pp. 86-99, March 1991.)
  1 Introduction
  2 Multiresolution Signal Representations for Coding
  3 Subband and Pyramid Coding
    3.1 Characteristics of Subband Schemes
    3.2 Pyramid Coding
    3.3 Analysis of Quantization Noise
  4 The Spatiotemporal Pyramid
  5 Multiresolution Motion Estimation and Interpolation
    5.1 Basic Search Procedure
    5.2 Stepwise Refinement
    5.3 Motion Based Interpolation
  6 Compression for ATV
    6.1 Compatibility and Scan Formats
    6.2 Results
    6.3 Relation to Emerging Video Coding Standards
  7 Complexity
    7.1 Computational Complexity
    7.2 Memory Requirement
  8 Conclusion and Directions
  9 References

21 Subband Video Coding for Low to High Rate Applications
  by Wilson C. Chung, Faouzi Kossentini and Mark J. T. Smith
  (Based on "A New Approach to Scalable Video Coding", by Chung, Kossentini and Smith, which appeared in the Proceedings of the Data Compression Conference, Snowbird, Utah, March 1995, ©1995 IEEE.)
  1 Introduction
  2 Basic Structure of the Coder
  3 Practical Design & Implementation Issues
  4 Performance
  5 References

22 Very Low Bit Rate Video Coding Based on Matching Pursuits
  by Ralph Neff and Avideh Zakhor
  1 Introduction
  2 Matching Pursuit Theory
  3 Detailed System Description
    3.1 Motion Compensation
    3.2 Matching-Pursuit Residual Coding
    3.3 Buffer Regulation
    3.4 Intraframe Coding
  4 Results
  5 Conclusions
  6 References

23 Object-Based Subband/Wavelet Video Compression
  by Soo-Chul Han and John W. Woods
  1 Introduction
  2 Joint Motion Estimation and Segmentation
    2.1 Problem formulation
    2.2 Probability models
    2.3 Solution
    2.4 Results
  3 Parametric Representation of Dense Object Motion Field
    3.1 Parametric motion of objects
    3.2 Appearance of new regions
    3.3 Coding the object boundaries
  4 Object Interior Coding
    4.1 Adaptive Motion-Compensated Coding
    4.2 Spatiotemporal (3-D) Coding of Objects
  5 Simulation results
  6 Conclusions
  7 References

24 Embedded Video Subband Coding with 3D SPIHT
  by William A. Pearlman, Beong-Jo Kim, and Zixiang Xiong
  1 Introduction
  2 System Overview
    2.1 System Configuration
    2.2 3D Subband Structure
  3 SPIHT
  4 3D SPIHT and Some Attributes
    4.1 Spatio-temporal Orientation Trees
    4.2 Color Video Coding
    4.3 Scalability of SPIHT image/video Coder
    4.4 Multiresolutional Encoding
  5 Motion Compensation
    5.1 Block Matching Method
    5.2 Hierarchical Motion Estimation
    5.3 Motion Compensated Filtering
  6 Implementation Details
  7 Coding results
    7.1 The High Rate Regime
    7.2 The Low Rate Regime
    7.3 Embedded Color Coding
    7.4 Computation Times
  8 Conclusions
  9 References

A Wavelet Image and Video Compression: The Home Page
  by Pankaj N. Topiwala
  1 Homepage For This Book
  2 Other Web Resources

B The Authors

C Index
1 Introduction
Pankaj N. Topiwala
1 Background
It is said that a picture is worth a thousand words. However, in the Digital Era,
we find that a typical color picture corresponds to more than a million words, or
bytes. Our ability to sense something like 30 frames a second of color imagery, each
the equivalent of tens of millions of pixels, means that we can process a wealth of
image data – the equivalent of perhaps 1 Gigabyte/second.
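As a rough check of these figures, the arithmetic can be spelled out. The frame size below is an illustrative assumption chosen only to match "tens of millions of pixels," not a measurement:

```python
# Back-of-the-envelope raw data rates for uncompressed color imagery.
# The frame size is an illustrative assumption, not a measurement.
width, height = 4000, 3000        # roughly 12 megapixels per frame (assumed)
bytes_per_pixel = 3               # 24-bit color: one byte each for R, G, B
frames_per_second = 30

frame_bytes = width * height * bytes_per_pixel
print(f"one frame:  {frame_bytes / 1e6:.0f} MB")                        # ~36 MB
print(f"raw stream: {frame_bytes * frames_per_second / 1e9:.2f} GB/s")  # ~1.08 GB/s
```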
The wonder and preeminent role of vision in our world can hardly be overstated,
and today’s digital dialect allows us to quantify this. In fact, our eyes and minds
are not only acquiring and storing this much data, but processing it for a multitude
of tasks, from 3-D rendering to color processing, from segmentation to pattern
recognition, from scene analysis and memory recall to image understanding and
finally data archiving, all in real time. Rudimentary computer science analogs of
similar image processing functions can consume tens of thousands of operations
per pixel, corresponding to perhaps operations/s. In the end, this continuous
data stream is stored in what must be the ultimate compression system, utilizing
a highly prioritized, time-dependent bit allocation method. Yet despite the high
density mapping (which is lossy—nature chose lossy compression!), on important
enough data sets we can reconstruct images (e.g., events) with nearly perfect clarity.
While estimates on the brain’s storage capacity are indeed astounding (e.g.,
B [1]), even such generous capacity must be efficiently used to permit the
full breadth of human capabilities (e.g., a weekend’s worth of video data, if stored
“raw,” would alone swamp this storage). However, there is fairly clear evidence that
sensory information is not stored raw, but highly processed and stored in a kind
of associative memory. At least since Plato, it has been known that we categorize
objects by proximity to prototypes (i.e., sensory memory is contextual or “object-
oriented”), which may be the key.
What we can do with computers today isn’t anything nearly so remarkable. This
is especially true in the area of pattern recognition over diverse classes of structures.
Nevertheless, an exciting new development has taken place in this digital arena that
has captured the imagination and talent of researchers around the globe—wavelet
image compression. This technology has deep roots in theories of vision (e.g. [2])
and promises performance improvements over all other compression methods, such
as those based on Fourier transforms, vector quantizers, fractals, neural nets, and
many others. It is this revolutionary new technology that we wish to present in this
edited volume, in a form that is accessible to the largest readership possible. A first
glance at the power of this approach is presented below in figure 1, where we achieve
a dramatic 200:1 compression. Compare this to the international standard JPEG,
which cannot achieve 200:1 compression on this image; at its coarsest quantization
(highest compression), it delivers an interesting example of cubist art, figure 2.
FIGURE 1. (a) Original shuttle image, in full color (24 b/p); (b) Shuttle
image compressed 200:1 using wavelets.
FIGURE 2. The shuttle image compressed by the international JPEG standard, using the
coarsest quantization possible, giving 176:1 compression (maximum).

If we could do better pattern recognition than we can today, up to the level of
“image understanding,”
then neural nets or some similar learning-based technology
could potentially provide the most valuable avenue for compression, at least for
purposes of subjective interpretation. While we may be far from that objective,
wavelets offer the first advance in this direction, in that the multiresolution image
analysis appears to be well-matched to the low-level characteristics of human vision.
It is an exciting challenge to develop this approach further and incorporate addi-
tional aspects of human vision, such as spectral response characteristics, masking,
pattern primitives, etc. [2].
Computer data compression is, of course, a powerful, enabling technology that
is playing a vital role in the Information Age. But while the technologies for ma-
chine data acquisition, storage, and processing are witnessing their most dramatic
developments in history, the ability to deliver that data to the broadest audiences
is still often hampered by either physical limitations, such as available spectrum
for broadcast TV, or existing bandlimited infrastructure, such as the twisted cop-
per telephone network. Of the various types of data commonly transferred over
networks, image data comprises the bulk of the bit traffic; for example, current
estimates indicate that image data transfers take up over 90% of the volume on
the Internet. The explosive growth in demand for image and video data, cou-
pled with these and other delivery bottlenecks, means that compression technol-
ogy is at a premium. While emerging distribution systems such as Hybrid Fiber
Cable Networks (HFCs), Asymmetric Digital Subscriber Lines (ADSL), Digital
Video/Versatile Discs (DVDs), and satellite TV offer innovative solutions, all of
these approaches still depend on heavy compression to be viable. A Fiber-To-The-
Home Network could potentially boast enough bandwidth to circumvent the need
for compression for the foreseeable future. However, contrary to optimistic early pre-
dictions, that technology is not economically feasible today and is unlikely to be
widely available anytime soon. Meanwhile, we must put digital imaging “on a diet.”

The subject of digital dieting, or data compression, divides neatly into two cate-
gories: lossless compression, in which exact recovery of the original data is ensured;
and lossy compression, in which only an approximate reconstruction is available.
The latter naturally requires further analysis of the type and degree of loss, and its
suitability for specific applications. While certain data types cannot tolerate any
loss (e.g., financial data transfers), the volume of traffic in such data types is typ-
ically modest. On the other hand, image data, which is both ubiquitous and data
intensive, can withstand surprisingly high levels of compression while permitting
reconstruction qualities adequate for a wide variety of applications, from consumer
imaging products to publishing, scientific, defense, and even law enforcement imag-
ing. While lossless compression can offer a useful two-to-one (2:1) reduction for
most types of imagery, it cannot begin to address the storage and transmission bot-
tlenecks for the most demanding applications. Although the use of lossless compres-
sion techniques is sometimes tenaciously guarded in certain quarters, burgeoning
data volumes may soon cast a new light on lossy compression. Indeed, it may be
refreshing to ponder the role of lossy memory in natural selection.
While lossless compression is largely independent of lossy compression, the reverse
is not true. On the face of it, this is hardly surprising since there is no harm (and
possibly some value) in applying lossless coding techniques to the output of any
lossy coding system. However, the relationship is actually much deeper, and lossy
compression relies in a fundamental way on lossless compression. In fact, it is only
a slight exaggeration to say that the art of lossy compression is to “simplify” the
given data appropriately in order to make lossless compression techniques effective.
We thus include a brief tour of lossless coding in this book devoted to lossy coding.
2 Compression Standards
It is one thing to pursue compression as a research topic, and another to develop
live imaging applications based on it. The utility of imaging products, systems and
networks depends critically on the ability of one system to “talk” with another—
interoperability. This requirement has mandated a baffling array of standardization
efforts to establish protocols, formats, and even classes of specialized compression
algorithms. A number of these efforts have met with worldwide agreement leading
to products and services of great public utility (e.g., fax), while others have faced
division along regional or product lines (e.g., television). Within the realm of digital
image compression, there have been a number of success stories in the standards
efforts which bear a strong relation to our topic: JPEG, MPEG-1, MPEG-2, MPEG-
4, H.261, H.263. The last five deal with video processing and will be touched upon
in the section on video compression; JPEG is directly relevant here.
The JPEG standard derives its name from the international body which drafted
it: the Joint Photographic Experts Group, a joint committee of the International
Standards Organization (ISO), the International Telephone and Telegraph Con-
sultative Committee (CCITT, now called the International Telecommunications
Union-Telecommunications Sector, ITU-T), and the International Electrotechnical
Commission (IEC). It is a transform-based coding algorithm, the structure of which
will be explored in some depth in these pages. Essentially, there are three stages
in such an algorithm: transform, which reorganizes the data; quantization, which
reduces data complexity but incurs loss; and entropy coding, which is a lossless cod-
ing method. What distinguishes the JPEG algorithm from other transform coders
is that it uses the so-called Discrete Cosine Transform (DCT) as the transform,
applied individually to 8-by-8 blocks of image data.
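The sketch below isolates the transform stage of that pipeline. It builds the orthonormal 8-by-8 DCT-II matrix directly in NumPy, rather than calling any particular JPEG implementation, and applies it to one arbitrary block; the sample values are illustrative only, and the quantization and entropy coding stages would follow.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix: row k holds the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    c[0, :] /= np.sqrt(2)
    return c

D = dct_matrix(8)

# An arbitrary 8x8 block of level-shifted image samples (values are illustrative).
block = np.arange(64, dtype=float).reshape(8, 8) - 32.0

coeffs = D @ block @ D.T           # forward 2-D DCT: rows, then columns (separable)
restored = D.T @ coeffs @ D        # inverse; D is orthogonal, so its inverse is D.T

assert np.allclose(restored, block)        # the transform alone is lossless
print(coeffs[:2, :2].round(2))             # energy concentrates in low frequencies
```

Because the matrix is orthogonal, the transform by itself loses nothing; all of the loss in such a coder comes from the quantization stage that follows.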
Work on the JPEG standard began in 1986, and a draft standard was approved in
1992 to become International Standard (IS) 10918-1. Aspects of the JPEG standard
are being worked out even today, e.g., the lossless standard, so that JPEG in its
entirety is not yet complete. Even so, a new international effort called JPEG2000, to
be completed in the year 2000, has been launched to develop a novel still color image
compression standard to supersede JPEG. Its objective is to deliver a combination
of improved performance and a host of new features like progressive rendering, low-
bit rate tuning, embedded coding, error tolerance, region-based coding, and perhaps
even compatibility with the yet unspecified MPEG-4 standard. While unannounced
by the standards bodies, it may be conjectured that new technologies, such as

the ones developed in this book, offer sufficient advantages over JPEG to warrant
rethinking the standard in part. JPEG is fully covered in the book [5], while all of
these standards are covered in [3].
3 Fourier versus Wavelets
The starting point of this monograph, then, is that we replace the well-worn 8-by-8
DCT by a different transform: the wavelet transform (WT), which, for reasons to be
clarified later, is now applied to the whole image. Like the DCT, the WT belongs to
a class of linear, invertible, “angle-preserving” transforms called unitary transforms.
However, unlike the DCT, which is essentially unique, the WT has an infinitude of
instances or realizations, each with somewhat different properties. In this book, we
will present evidence suggesting that the wavelet transform is more suitable than
the DCT for image coding, for a variety of realizations.
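As a minimal, concrete instance of such a transform (a sketch only: it uses the single-level Haar realization on a synthetic image, not the longer filters favored later in the book), one level of a 2-D wavelet decomposition and its unitarity can be written out directly:

```python
import numpy as np

def haar2d_level(x):
    """One level of an orthonormal 2-D Haar wavelet transform (rows, then columns).
    Returns the LL, LH, HL, HH subbands, each half the size of x in each dimension."""
    s = 1 / np.sqrt(2)
    lo = s * (x[:, 0::2] + x[:, 1::2])    # horizontal lowpass
    hi = s * (x[:, 0::2] - x[:, 1::2])    # horizontal highpass
    ll = s * (lo[0::2, :] + lo[1::2, :])
    lh = s * (lo[0::2, :] - lo[1::2, :])
    hl = s * (hi[0::2, :] + hi[1::2, :])
    hh = s * (hi[0::2, :] - hi[1::2, :])
    return ll, lh, hl, hh

# A smooth synthetic "image"; real images add edges and texture on top of this.
n = 256
row, col = np.mgrid[0:n, 0:n]
image = 128 + 60 * np.sin(col / 80.0) * np.cos(row / 100.0)

ll, lh, hl, hh = haar2d_level(image)

# The transform is unitary: total energy is preserved exactly.
assert np.isclose(np.sum(image ** 2),
                  sum(np.sum(b ** 2) for b in (ll, lh, hl, hh)))

details = np.concatenate([lh.ravel(), hl.ravel(), hh.ravel()])
print("detail coefficients below 1.0 in magnitude:", np.mean(np.abs(details) < 1.0))
```

On smooth content the three detail subbands come out almost entirely near zero, which is the property the coders in Part II exploit.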
What are the relative virtues of wavelets versus Fourier analysis? The real strength
of Fourier-based methods in science is that oscillations—waves—are everywhere in
nature. All electromagnetic phenomena are associated with waves, which satisfy
Maxwell's (wave) equations. Additionally, we live in a world of sound waves, vibra-
tional waves, and many other waves. Naturally, waves are also important in vision,
as light is a wave. But visual information—what is in images—doesn’t appear to
have much oscillatory structure. Instead, the content of natural images is typically
that of variously textured objects, often with sharp boundaries. The objects them-
selves, and their texture, therefore constitute important structures that are often
present at different “scales.” Much of the structure occurs at fine scales, and is of
low “amplitude” or contrast, while key structures often occur at mid to large scales
with higher contrast. A basis more suitable for representing information at a variety
of scales, with local contrast changes as well as larger scale structures, would be a
better fit for image data; see figure 3.
The importance of Fourier methods in signal processing comes from station-
arity assumptions on the statistics of signals. Stationary signals have a “spec-
tral” representation. While this has been historically important, the assumption

of stationarity—that the statistics of the signal (at least up to 2nd order) are con-
stant in time (or space, or whatever the dimensionality)—may not be justified for
many classes of signals. So it is with images; see figure 4. In essence, images have
locally varying statistics, have sharp local features like edges as well as large homo-
geneous regions, and generally defy simple statistical models for their structure. As
an interesting contrast, in the wavelet domain image in figure 5, the local statis-
tics in most parts of the image are fairly consistent, which aids modeling. Even
more important, the transform coefficients are for the most part very nearly zero
in magnitude, requiring few bits for their representation.
Inevitably, this change in transform leads to different approaches to follow-on
stages of quantization and even entropy coding to a lesser extent. Nevertheless, a
simple baseline coding scheme can be developed in a wavelet context that roughly
parallels the structure of the JPEG algorithm; the fundamental difference in the
new approach is only in the transform. This allows us to measure in a sense the
value added by using wavelets instead of DCT. In our experience, the evidence is
conclusive: wavelet coding is superior. This is not to say that the main innovations
discussed herein are in the transform. On the contrary, there is apparently fairly
wide agreement on some of the best performing wavelet “filters,” and the research
represented here has been largely focused on the following stages of quantization
and encoding.
Wavelet image coders are among the leading coders submitted for consideration
in the upcoming JPEG2000 standard. While this book is not meant to address the
issues before the JPEG2000 committee directly (we just want to write about our
favorite subject!), it is certainly hoped that the analyses, methods and conclusions
presented in this volume may serve as a valuable reference—to trained researchers
and novices alike. However, to meet that objective and simultaneously reach a broad
audience, it is not enough to concatenate a collection of papers on the latest wavelet
algorithms.

FIGURE 3. The time-frequency structure of local Fourier bases (a), and wavelet bases (b).

FIGURE 4. A typical natural image (Lena), with an analysis of the local histogram variations in the image domain.

Such an approach would not only fail to reach the vast and growing
numbers of students, professionals and researchers, from whatever background and
interest, who are drawn to the subject of wavelet compression and to whom we are
principally addressing this book; it would perhaps also risk early obsolescence. For
the algorithms presented herein will very likely be surpassed in time; the real value
and intent here is to educate the readers in the methodology of creating compression
algorithms, and to enable them to take the next step on their own.

FIGURE 5. An analysis of the local histogram variations in the wavelet transform domain.

Towards that
end, we were compelled to provide at least a brief tour of the basic mathematical
concepts involved, a few of the compression paradigms of the past and present, and
some of the tools of the trade. While our introductory material is definitely not
meant to be complete, we are motivated by the belief that even a little background
can go a long way towards bringing the topic within reach.
4 Overview of Book
This book is divided into four parts: (I) Background Material, (II) Still Image
Coding, (III) Special Topics in Image Coding, and (IV) Video Coding.
4.1 Part I: Background Material
Part I introduces the basic mathematical structures that underlie image compression
algorithms. The intent is to provide an easy introduction to the mathematical
concepts that are prerequisites for the remainder of the book. This part, written
largely by the editor, is meant to explain such topics as change of bases, scalar and
vector quantization, bit allocation and rate-distortion theory, entropy coding, the
discrete-cosine transform, wavelet filters, and other related topics in the simplest
terms possible. In this way, it is hoped that we may reach the many potential
readers who would like to understand image compression but find the research
literature frustrating. Thus, it is explicitly not assumed that the reader regularly
reads the latest research journals in this field. In particular, little attempt is made

to refer the reader to the original sources of ideas in the literature, but rather the
most accessible source of reference is given (usually a book). This departure from
convention is dictated by our unconventional goals for such a technical subject. Part
I can be skipped by advanced readers and researchers in the field, who can proceed
directly to relevant topics in parts II through IV.
Chapter 2 (“Preliminaries”) begins with a review of the mathematical concepts
of vector spaces, linear transforms including unitary transforms, and mathematical
analysis. Examples of unitary transforms include Fourier Transforms, which are at
the heart of many signal processing applications. The Fourier Transform is treated
in both continuous and discrete time, which leads to a discussion of digital signal
processing. The chapter ends with a quick tour of probability concepts that are
important in image coding.
Chapter 3 (“Time-Frequency Analysis, Wavelets and Filter Banks”) reviews the
continuous Fourier Transform in more detail, introduces the concepts of transla-
tions, dilations and modulations, and presents joint time-frequency analysis of sig-
nals by various tools. This leads to a discussion of the continuous wavelet transform
(CWT) and time-scale analysis. Like the Fourier Transform, there is an associated
discrete version of the CWT, which is related to bases of functions which are transla-
tions and dilations of a fixed function. Orthogonal wavelet bases can be constructed
from multiresolution analysis, which then leads to digital filter banks. A review of
wavelet filters, two-channel perfect reconstruction filter banks, and wavelet packets
round out this chapter.
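The two-channel perfect reconstruction property reviewed there can be demonstrated in a few lines. The sketch below uses the orthonormal Haar pair because its filters are only two taps long; it is an illustration of the analysis/synthesis structure, not of the longer filter banks treated in Chapter 3.

```python
import numpy as np

s = 1 / np.sqrt(2)
h0 = np.array([s,  s])    # lowpass analysis filter (Haar)
h1 = np.array([s, -s])    # highpass analysis filter (Haar)

def analysis(x):
    """Filter and downsample by 2; the Haar filters act on non-overlapping pairs."""
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)
    return pairs @ h0, pairs @ h1            # lowpass channel, highpass channel

def synthesis(low, high):
    """Upsample both channels, filter, and add them back into one signal."""
    pairs = np.outer(low, h0) + np.outer(high, h1)
    return pairs.reshape(-1)

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])   # even length assumed
low, high = analysis(x)
assert np.allclose(synthesis(low, high), x)               # perfect reconstruction
print("lowpass:", low, " highpass:", high)
```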
With these preliminaries, chapter 4 (“Introduction to Compression”) begins the
introduction to compression concepts in earnest. Compression divides into lossless
and lossy compression. After a quick review of lossless coding techniques, including
Huffman and arithmetic coding, there is a discussion of both scalar and vector
quantization — the key area of innovation in this book. The well-known Lloyd-
Max quantizers are outlined, together with a discussion of rate-distortion concepts.
Finally, examples of compression algorithms are given in brief vignettes, covering

vector quantization, transforms such as the discrete-cosine transform (DCT), the
JPEG standard, pyramids, and wavelets. A quick tour of potential mathematical
definitions of image quality metrics is provided, although this subject is still in its
formative stage.
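As an illustration of the quantizer design discussed there, the following sketch runs Lloyd's iteration (the Lloyd-Max conditions: midpoint thresholds and centroid codewords) on Gaussian samples; the codebook size and the training data are arbitrary choices made only for the example.

```python
import numpy as np

def lloyd_max(samples, levels=4, iters=50):
    """Design a scalar quantizer by Lloyd's iteration: thresholds are midpoints
    between codewords (nearest-neighbor rule) and codewords are the means of
    their decision regions (centroid rule)."""
    samples = np.asarray(samples, dtype=float)
    codes = np.quantile(samples, (np.arange(levels) + 0.5) / levels)  # initial guess
    for _ in range(iters):
        bounds = (codes[:-1] + codes[1:]) / 2
        cells = np.searchsorted(bounds, samples)
        codes = np.array([samples[cells == k].mean() if np.any(cells == k) else codes[k]
                          for k in range(levels)])
    bounds = (codes[:-1] + codes[1:]) / 2
    return codes, bounds

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 100_000)          # illustrative Gaussian source
codes, bounds = lloyd_max(data, levels=4)
quantized = codes[np.searchsorted(bounds, data)]
print("codewords:", codes.round(3))
print("MSE:", round(float(np.mean((data - quantized) ** 2)), 4))
```

For a unit-variance Gaussian and four levels this converges to codewords near ±0.45 and ±1.51, the classical Lloyd-Max values.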
Chapter 5 (“Symmetric Extension Transforms”) is written by Chris Brislawn, and
explains the subtleties of how to treat image boundaries in the wavelet transform.
Image boundaries are a significant source of compression errors due to the disconti-
nuity. Good methods for treating them rely on extending the boundary, usually by
reflecting the point near the boundary to achieve continuity (though not a contin-
uous derivative). Preferred filters for image compression then have a symmetry at
their middle, which can fall either on a tap or in between two taps. The appropri-
ate reflection at boundaries depends on the type of symmetry of the filter and the
length of the data. The end result is that after transform and downsampling, one
preserves the sampling rate of the data exactly, while treating the discontinuity at
the boundary properly for efficient coding. This method is extremely useful, and is
applied in practically every algorithm discussed.
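The extension itself is simple to state in code. The sketch below shows whole-sample symmetric extension (reflection about the boundary sample itself); the half-sample variant, which reflects between two samples, and the pairing of each with the appropriate filter symmetry are exactly what Chapter 5 works out.

```python
import numpy as np

def symmetric_extend(x, pad):
    """Whole-sample symmetric extension: reflect about the boundary samples
    themselves, e.g. [a b c d] -> ... c b | a b c d | c b ...  This avoids the
    artificial discontinuity that zero padding or periodic wrap-around creates."""
    x = np.asarray(x, dtype=float)
    left = x[1:pad + 1][::-1]          # mirror of the samples just inside the left edge
    right = x[-pad - 1:-1][::-1]       # mirror of the samples just inside the right edge
    return np.concatenate([left, x, right])

signal = np.array([10.0, 12.0, 13.0, 11.0, 8.0, 7.0])
print(symmetric_extend(signal, pad=2))
# -> [13. 12. 10. 12. 13. 11.  8.  7.  8. 11.]
```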
4.2 Part II: Still Image Coding
Part II presents a spectrum of wavelet still image coding techniques. Chapter 6
(“Wavelet Still Image Coding: A Baseline MSE and HVS Approach”) by Pankaj
Topiwala presents a very low-complexity image coder that is tuned to minimizing
distortion according to either mean-squared error or models of the human visual system.
Short integer wavelets, a simple scalar quantizer, and a bare-bones arithmetic coder
are used to get optimized compression speed. While the use of simplified image models
means that the performance is suboptimal, this coder can serve as a baseline for
comparison with more sophisticated coders which trade complexity for performance
gains. Similar in spirit is chapter 7 by Alen Docef et al (“Image Coding Using
Multiplier-Free Filter Banks”), which employs multiplication-free subband filters for
efficient image coding. The complexity of this approach appears to be comparable
to that of chapter 6’s, with similar performance.

Chapter 8 (“Embedded Image Coding Using Zerotrees of Wavelet Coefficients”)
by Jerome Shapiro is a reprint of the landmark 1993 paper in which the concept
of zerotrees was used to derive a rate-efficient, embedded coder. Essentially, the
correlations and self-similarity across wavelet subbands are exploited to reorder
(and reindex) the transform coefficients in terms of “significance” to provide for
embedded coding. Embedded coding means that a single coded bitstream can be
decoded at any bitrate below the encoding rate (with optimal performance) by
simple truncation, which can be highly advantageous for a variety of applications.
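The embedding principle can be illustrated without the zerotree machinery itself. The toy sketch below keeps only the top few magnitude bit planes of some made-up coefficients, mimicking what a decoder sees after truncating an embedded bitstream at successively earlier points; Shapiro's coder adds the significance-map and arithmetic coding that make this ordering rate-efficient.

```python
import numpy as np

def bitplane_reconstruction(coeffs, planes_kept, total_planes=7):
    """Keep only the top `planes_kept` magnitude bit planes of each coefficient,
    mimicking what a decoder sees after truncating an embedded bitstream."""
    mags = np.abs(coeffs).astype(int)
    step = 1 << (total_planes - planes_kept)   # everything below this is discarded
    return np.sign(coeffs) * ((mags // step) * step)

coeffs = np.array([97, -3, 0, 41, -18, 2, 0, -65])   # made-up wavelet coefficients
for planes in (1, 2, 4, 7):
    rec = bitplane_reconstruction(coeffs, planes)
    mse = np.mean((coeffs - rec) ** 2)
    print(f"{planes} plane(s): {rec}  MSE = {mse:.1f}")
```

Each additional plane refines every coefficient at once, so the distortion falls monotonically as more of the stream is decoded.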
Chapter 9 (“A New, Fast and Efficient Image Codec Based on Set Partitioning in
Hierarchical Trees”) by Amir Said and William Pearlman, presents one of the best-
known wavelet image coders today. Inspired by Shapiro’s embedded coder, Said-
Pearlman developed a simple set-theoretic data structure to achieve a very efficient
embedded coder, improving upon Shapiro’s in both complexity and performance.
Chapter 10 (“Space-Frequency Quantization for Wavelet Image Coding”) by Zix-
iang Xiong, Kannan Ramchandran and Michael Orchard, develops one of the most
sophisticated and best-performing coders in the literature. In addition to using
Shapiro’s advance of exploiting the structure of zero coefficients across subbands,
this coder uses iterative optimization to decide the order in which nonzero pixels
should be nulled to achieve the best rate-distortion performance. A further in-
novation is to use wavelet packet decompositions, and not just wavelet ones, for
enhanced performance. Chapter 11 (“Subband Coding of Images Using Classifica-
tion and Trellis Coded Quantization”) by Rajan Joshi and Thomas Fischer presents
a very different approach to sophisticated coding. Instead of conducting an itera-
tive search for optimal quantizers, these authors attempt to index regions of an
image that have similar statistics (classification, say into four classes) in order to
achieve tighter fits to models. A further innovation is to use “trellis coded quanti-
zation,” which is a type of vector quantization in which codebooks are themselves
divided into disjoint codebooks (e.g., successive VQ) to achieve high-performance
at moderate complexity. Note that this is a “forward-adaptive” approach in that
statistics-based pixel classification decisions are made first, and then quantization is

applied; this is in distinction from recent “backward-adaptive” approaches such as
[4] that also perform extremely well. Finally, Chapter 12 (“Low-Complexity Com-
pression of Run Length Coded Image Subbands”) by John Villasenor and Jiangtao
Wen innovates on the entropy coding approach rather than in the quantization as
in nearly all other chapters. Explicitly aiming for low-complexity, these authors
consider the statistics of quantized and run-length coded image subbands for good
statistical fits. Generalized Gaussian source statistics are used, and matched to a
set of Golomb-Rice codes for efficient encoding. Excellent performance is achieved
at a very modest complexity.
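A Golomb-Rice code is simple enough to sketch here: each run length is split into a quotient, sent in unary, and a k-bit remainder. The parameter k and the sample run lengths below are illustrative only; the chapter's contribution lies in matching k to the measured subband statistics.

```python
def rice_encode(n, k):
    """Golomb-Rice code with parameter k: send the quotient n // 2**k in unary
    ('1' repeated, terminated by a '0'), then the remainder n % 2**k in k bits."""
    quotient = n >> k
    remainder = n & ((1 << k) - 1)
    remainder_bits = format(remainder, "b").zfill(k) if k > 0 else ""
    return "1" * quotient + "0" + remainder_bits

run_lengths = [0, 1, 3, 4, 9, 2, 0, 7]        # illustrative zero-run lengths
for k in (0, 1, 2):
    codes = [rice_encode(n, k) for n in run_lengths]
    print(f"k={k}: {sum(len(c) for c in codes)} bits  {codes}")
```

Small k favors sources dominated by short runs, while larger k pays off when long runs are common, which is why the choice is driven by the subband statistics.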
4.3 Part III: Special Topics in Still Image Coding
Part III is a potpourri of example coding schemes with a special flavor in either ap-
proach or application domain. Chapter 13 (“Fractal Image Coding as Cross-Scale
Wavelet Coefficient Prediction”) by Geoffrey Davis is a highly original look at
fractal image compression as a form of wavelet subtree quantization. This insight
leads to effective ways to optimize the fractal coders but also reveals their limi-
tations, giving the clearest evidence available on why wavelet coders outperform
fractal coders. Chapter 14 (“Region of Interest Compression in Subband Coding”)
by Pankaj Topiwala develops a simple second-generation coding approach in which
regions of interest within images are exploited to achieve image coding with vari-
able quality. As wavelet coding techniques mature and compression gains saturate,
the next performance gain available is to exploit high-level content-based criteria to
trigger effective quantization decisions. The pyramidal structure of subband coders
affords a simple mechanism for achieving this capability, which is especially relevant
to surveillance applications. Chapter 15 (“Wavelet-Based Embedded Multispectral
Image Compression”) by Pankaj Topiwala develops an embedded coder in the con-
text of a multiband image format. This is an extension of standard color coding,
in which three spectral bands are given (red, green, and blue) and involves a fixed
color transform (e.g., RGB to YUV, see the chapter) followed by coding of each
band separately. For multiple spectral bands (from three to hundreds) a fixed spectral
transform may not be efficient, and a Karhunen-Loeve Transform is used for
spectral decorrelation.
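A sketch of that spectral decorrelation step, on made-up data: each pixel's vector of band values is projected onto the eigenvectors of the inter-band covariance matrix (the KLT), which packs most of the energy into the first few transformed bands before a spatial wavelet coder is applied. The band count and the synthetic bands below are illustrative assumptions.

```python
import numpy as np

def spectral_klt(cube):
    """Karhunen-Loeve transform across the band axis of a (bands, H, W) cube:
    each pixel's spectral vector is projected onto the eigenvectors of the
    inter-band covariance matrix, which decorrelates the bands."""
    bands, h, w = cube.shape
    pixels = cube.reshape(bands, -1)                    # one column per pixel
    centered = pixels - pixels.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / centered.shape[1]     # bands-by-bands covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = eigvals.argsort()[::-1]                     # strongest component first
    transformed = eigvecs[:, order].T @ centered
    return transformed.reshape(bands, h, w)

# Three strongly correlated synthetic bands standing in for multispectral data.
rng = np.random.default_rng(1)
base = rng.normal(size=(64, 64))
cube = np.stack([base + 0.05 * rng.normal(size=(64, 64)) for _ in range(3)])

klt_bands = spectral_klt(cube)
print("band variances after KLT:", klt_bands.var(axis=(1, 2)).round(4))
# nearly all of the energy collapses into the first transformed band
```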
Chapter 16 (“The FBI Fingerprint Image Compression Standard”) by Chris Bris-
lawn is a review of the FBI’s Wavelet Scalar Quantization (WSQ) standard by one
of its key contributors. Set in 1993, WSQ was a landmark application of wavelet
transforms for live imaging systems that signalled their ascendancy. The standard
was an early adopter of the now famous Daubechies 9/7 filters, and uses a spe-
cific subband decomposition that is tailor-made for fingerprint images. Chapter 17
(“Embedded Image Coding using Wavelet Difference Reduction”) by Jun Tian and
Raymond Wells Jr. presents what is actually a general-purpose image compression
algorithm. It is similar in spirit to the Said-Pearlman algorithm in that it uses sets
of (in)significant coefficients and successive approximation for data ordering and
quantization, and achieves similar high-quality coding. A key application discussed
is in the compression of synthetic aperture radar (SAR) imagery, which is criti-
cal for many military surveillance applications. Finally, chapter 18 (“Block Trans-
forms in Progressive Image Coding”) by Trac Tran and Truong Nguyen presents
an update on what block-based transforms can achieve, with the lessons learned
from wavelet image coding. In particular, they develop advanced, overlapping block
coding techniques, generalizing an approach initiated by Henrique Malvar called the
Lapped Orthogonal Transform to achieve extremely competitive coding results, at
some cost in transform complexity.
4.4 Part IV: Video Coding
Part IV examines wavelet and pyramidal coding techniques for video data. Chapter
19 (“Review of Video Coding Standards”) by Pankaj Topiwala is a quick lineup of
relevant video coding standards ranging from H.261 to MPEG-4, to provide appro-
priate context in which the following contributions on video compression can be
compared. Chapter 20 (“Interpolative Multiresolution Coding of Advanced Tele-
vision with Compatible Subchannels”) by K. Metin Uz, Martin Vetterli and Didier
LeGall is a reprint of a very early (1991) application of pyramidal methods (in
both space and time) for video coding. It uses the freedom of pyramidal (rather
than perfect reconstruction subband) coding to achieve excellent coding with bonus
features, such as random access, error resiliency, and compatibility with variable
resolution representation. Chapter 21 (“Subband Video Coding for Low to High
Rate Applications”) by Wilson Chung, Faouzi Kossentini and Mark Smith, adapts
the motion-compensation and I-frame/P-frame structure of MPEG-2, but intro-
duces spatio-temporal subband decompositions instead of DCTs. Within the
spatio-temporal subbands, an optimized rate-allocation mechanism is constructed,
which allows for more flexible yet consistent picture quality in the video stream. Ex-
perimental results confirm both consistency as well as performance improvements
against MPEG-2 on test sequences. A further benefit of this approach is that it is
highly scalable in rate, and comparisons are provided against the H.263 standard
as well.
Chapter 22 (“Very Low Bit Rate Video Coding Based on Matching Pursuits”) by
Ralph Neff and Avideh Zakhor is a status report of their contribution to MPEG-4.
It is aimed at surpassing H.263 in performance at target bitrates of 10 and 24 kb/s.
The main innovation is to use not an orthogonal basis but a highly redundant dic-
tionary of vectors (e.g., a “frame”) made of time-frequency-scale translates of a
single function in order to get greater compression and feature selectivity. The com-
putational demands of such representations are extremely high, but these authors
report fast search algorithms that are within potentially acceptable complexity
costs compared to H.263, while providing some performance gains. A key objec-
tive of MPEG-4 is to achieve object-level access in the video stream, and chapter
23 (“Object-Based Subband/Wavelet Video Compression”) by Soo-Chul Han and
John Woods directly attempts a coding approach based on object-based image seg-
mentation and object-tracking. Markov models are used for object transitions, and a
version of I-P frame coding is adopted. Direct comparisons with H.263 indicate that
while PSNRs are similar, the object-based approach delivers superior visual quality

at extremely low bitrates (8 and 24 kb/s). Finally, Chapter 24 (“Embedded Video
Subband Coding with 3D Set Partitioning in Hierarchical Trees (3D SPIHT)”) by
William Pearlman, Beong-Jo Kim and Zixiang Xiong develops a low-complexity
motion-compensation free video coder based on 3D subband decompositions and
application of Said-Pearlman’s set-theoretic coding framework. The result is an
embedded, scalable video coder that outperforms MPEG-2 and matches H.263, all
with very low complexity. Furthermore, the performance gains over MPEG-2 are
not just in PSNR, but are visual as well.
5 References
[1] A. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, 1989.
[2] D. Marr, Vision, W. H. Freeman and Co., 1982.
[3] K. Rao and J. Hwang, Techniques and Standards for Image, Video and Audio Coding, Prentice-Hall, 1996.
[4] S. LoPresto, K. Ramchandran, and M. Orchard, “Image Coding Based on Mixture Modelling of Wavelet Coefficients and a Fast Estimation-Quantization Framework,” DCC97, Proc. Data Comp. Conf., Snowbird, UT, pp. 221-230, March 1997.
[5] W. Pennebaker and J. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand, 1993.
Part I
Preliminaries