Ramstad, T.A. "Still Image Compression"
Digital Signal Processing Handbook
Ed. Vijay K. Madisetti and Douglas B. Williams
Boca Raton: CRC Press LLC, 1999
© 1999 by CRC Press LLC
52
Still Image Compression¹

Tor A. Ramstad
Norwegian University of Science and Technology (NTNU)
52.1 Introduction
     Signal Chain • Compressibility of Images • The Ideal Coding System • Coding with Reduced Complexity
52.2 Signal Decomposition
     Decomposition by Transforms • Decomposition by Filter Banks • Optimal Transforms/Filter Banks • Decomposition by Differential Coding
52.3 Quantization and Coding Strategies
     Scalar Quantization • Vector Quantization • Efficient Use of Bit-Resources
52.4 Frequency Domain Coders
     The JPEG Standard • Improved Coders: State-of-the-Art
52.5 Fractal Coding
     Mathematical Background • Mean-Gain-Shape Attractor Coding • Discussion
52.6 Color Coding
References
52.1 Introduction
Digital representationofimagesisimportant for digital transmission and storage on differentmedia
suchasmagneticorlaserdisks. However,pictorialmaterialrequiresvastamountsofbitsifrepresented
throughdirectquantization. Asanexample,anSVGAcolorimagerequires3×600×800bytes = 1, 44
Mbytes when each color component is quantized using 1 byte per pixel, the amount of bytes that
can be stored on one standard 3.5-inch diskette. Itis therefore evident that compression (often called
coding) is necessary for reducing the amount of data [33].
In this chapter we address three fundamental questions concerning image compression:

• Why is image compression possible?
• What are the theoretical coding limits?
• Which practical compression methods can be devised?
The first two questions concern statistical and structural properties of the image material and human visual perception. Even if we were able to answer these questions accurately, the methodology for image compression (third question) does not follow thereof. That is, the practical coding algorithms must be found otherwise. The bulk of the chapter will review image coding principles and present some of the best proposed still image coding methods.

¹Parts of this manuscript are based on Ramstad, T.A., Aase, S.O., and Husøy, J.H., Subband Compression of Images — Principles and Examples, Elsevier Science Publishers BV, North Holland, 1995. Permission to use the material is given by Elsevier Science Publishers BV.
The prevailing technique for image coding is transform coding. This is part of the JPEG (Joint Photographic Experts Group) standard [14] as well as a part of all the existing video coding standards (H.261, H.263, MPEG-1, MPEG-2) [15, 16, 17, 18]. Another closely related technique, subband coding, is in some respects better, but has not yet been recognized by the standardization bodies. A third technique, differential coding, has not been successful for still image coding, but is often used to code the lowpass-lowpass band in subband coders, and is an integral part of hybrid video coders for removal of temporal redundancy. Vector quantization (VQ) would be the ultimate technique if there were no complexity constraints. Because all practical systems must have limited complexity, VQ is usually used as a component in a multi-component coding scheme. Finally, fractal or attractor coding is based on an idea far from the other methods, but it is, nevertheless, strongly related to vector quantization.

For natural images, no exact digital representation exists because quantization, which is an integral part of digital representations, is a lossy technique. Lossy techniques will always add noise, but the noise level and its characteristics can be controlled and depend on the number of bits per pixel as well as on the performance of the method employed. Lossless techniques will be discussed as components in other coding methods.
52.1.1 Signal Chain

We assume a model where the input signal is properly bandlimited and digitized by an appropriate analog-to-digital converter. All subsequent processing in the encoder will be digital. The decoder is also digital up to the digital-to-analog converter, which is followed by a lowpass reconstruction filter.

Under idealized conditions, the interconnection of the signal chain excluding the compression unit will be assumed to be noise-free. (In reality, the analog-to-digital conversion will render a noise power which can be approximated by $\Delta^2/12$, where $\Delta$ is the quantizer interval. This interval depends on the number of bits, and we assume that the number of bits is so high that the contribution to the overall noise from this process is negligible.) The performance of the coding chain can then be assessed from the difference between the input and output of the digital compression unit, disregarding the analog part.

Still images must be sampled on some two-dimensional grid. Several schemes are viable choices, and there are good reasons for selecting nonrectangular grids. However, to simplify, only rectangular sampling will be considered, and all filtering will be based on separable operations, first performed on the rows and subsequently on the columns of the image. The theory is therefore presented for one-dimensional models only.
52.1.2 Compressibility of Images
There are two reasons why images can be compressed:

• All meaningful images exhibit some form of internal structure, often expressed through statistical dependencies between pixels. We call this property signal redundancy.
• The human visual system is not perfect. This means that certain degradations cannot be perceived by human observers. The degree of allowable noise is called irrelevancy or visual redundancy. If we furthermore accept visual degradation, we can exploit what might be termed tolerance.

In this section we make some speculations about the compression potential resulting from redundancy and irrelevancy.
The two fundamental concepts in evaluating a coding scheme are distortion, which measures the quality of the compressed signal, and rate, which measures how costly it is to transmit or store a signal.

Distortion is a measure of the deviation between the encoded/decoded signal and the original signal. Usually, distortion is measured by a single number for a given coder and bit rate. There are numerous ways of mapping an error signal onto a single number. Moreover, it is hard to conceive that a single number could mimic the quality assessment performed by a human observer. An easy-to-use and well-known error measure is the mean square error (mse). The visual correctness of this measure is poor: the human visual system is sensitive to errors in shapes and deterministic patterns, but not so much in stochastic textures. The mse defined over the entire image can, therefore, be entirely erroneous in the visual sense. Still, mse is the prevailing error measure, and it can be argued that it reflects well the small changes due to optimization within a given coder structure, but poorly the comparison between different models that create different noise characteristics.

Rate is defined as bits per pixel and is connected to the information content in a signal, which can be measured by entropy.
A Lower Bound for Lossless Coding

To define image entropy, we introduce the set $S$ containing all possible images of a certain size and call the number of images in the set $N_S$. To exemplify, assume the image set under consideration has dimension 512 × 512 pixels and each pixel is represented by 8 bits. The number of different images that exist in this set is $2^{512 \times 512 \times 8}$, an overwhelming number!
Given the probability $P_i$ of each image in the set $S$, where $i \in \{1, 2, \ldots, N_S\}$ is the index pointing to the different images, the source entropy is given by

$$H = -\sum_{i=1}^{N_S} P_i \log_2 P_i . \tag{52.1}$$

The entropy is a lower bound for the rate in lossless coding of the digital images.
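To make Eq. (52.1) concrete, the sketch below (plain Python with NumPy; the probability values are illustrative, not from the chapter) evaluates the entropy of a small source:

```python
import numpy as np

def entropy_bits(p):
    """First-order entropy H = -sum p_i log2 p_i of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                    # terms with p_i = 0 contribute nothing
    return float(-np.sum(p * np.log2(p)))

# A heavily skewed 4-symbol source needs far fewer than
# log2(4) = 2 bits per symbol on average.
print(entropy_bits([0.7, 0.2, 0.05, 0.05]))   # about 1.26 bits/symbol
```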
A Lower Bound for Visually Lossless Coding

In order to incorporate perceptual redundancies, it is observed that not all the images in the given set can be distinguished visually. We therefore introduce visual entropy as an abstract measure which incorporates distortion.

We now partition the image set into disjoint subsets, $S_i$, in which all the different images have similar appearance. One image from each subset is chosen as the representation image. The collection of these $N_R$ representation images constitutes a subset $R$, that is, a set spanning all distinguishable images in the original set.

Assume that image $i \in R$ appears with probability $\hat{P}_i$. Then the visual entropy is defined by

$$H_V = -\sum_{i=1}^{N_R} \hat{P}_i \log_2 \hat{P}_i . \tag{52.2}$$

The minimum attainable bit rate for image coders without visual degradation is lower bounded by this number.
52.1.3 The Ideal Coding System

Theoretically, we can approach the visual entropy limit using an unrealistic vector quantizer (VQ) in conjunction with an ideal entropy coder. The principle of such an optimal coding scheme is described next.

The set of representation images is stored in what is usually called a codebook. The encoder and decoder have identical copies of this codebook. In the encoding process, the image to be coded is compared to all the vectors in the codebook applying the visually correct distortion measure. The codebook member with the closest resemblance to the sample image is used as the coding approximation. The corresponding codebook index (address) is entropy coded and transmitted to the decoder. The decoder looks up the image located at the address given by the transmitted index.

Obviously, the above method is unrealistic. The complexity is beyond any practical limit both in terms of storage and computational requirements. Also, the correct visual distortion measure is not presently known. We should therefore only view the indicated coding strategy as the limit for any coding scheme.
52.1.4 Coding with Reduced Complexity

In practical coding methods, there are basically two ways of avoiding the extreme complexity of ideal VQ. In the first method, the encoder operates on small image blocks rather than on the complete image. This is obviously suboptimal because the method cannot profit from the redundancy offered by large structures in an image; but the larger the blocks, the better the method. The second strategy is very different and applies some preprocessing to the image prior to quantization. The aim is to remove statistical dependencies among the image pixels, thus avoiding representation of the same information more than once. Both techniques are exploited in practical coders, either separately or in combination.

A typical image encoder incorporating preprocessing is shown in Fig. 52.1.

FIGURE 52.1: Generic encoder structure block diagram. D = decomposition unit, Q = quantizer, B = coder for minimum bit-representation.

The first block (D) decomposes the signal into a set of coefficients. The coefficients are subsequently quantized (in Q), and are finally coded to a minimum bit representation (in B). This model is correct for frequency domain coders, but in closed-loop differential coders (DPCM), the decomposition and quantization are performed in the same block, as will be demonstrated later. Usually the decomposition is exact. In fractal coding, the decomposition is replaced by approximate modeling.
Let us consider the decoder and introduce a series expansion as a unifying description of the different image representation methods:

$$\hat{x}(l) = \sum_{k} \hat{a}_k \phi_k(l) . \tag{52.3}$$

The formula represents the recombination of signal components. Here $\{\hat{a}_k\}$ are the coefficients (the parameters in the representation), and $\{\phi_k(l)\}$ are the basis functions. A major distinction between coding methods is their set of basis functions, as will be demonstrated in the next section.
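As a numerical illustration of Eq. (52.3) (a sketch only; the orthonormal basis chosen here is the length-8 DCT-II, one of many possible bases), analysis amounts to inner products and synthesis to the weighted sum of basis functions:

```python
import numpy as np

L = 8
n = np.arange(L)
# Rows of phi are the orthonormal DCT-II basis functions phi_k(l)
phi = np.array([np.sqrt((1 if k == 0 else 2) / L) *
                np.cos((2 * n + 1) * k * np.pi / (2 * L)) for k in range(L)])

x = np.random.randn(L)
a = phi @ x                    # analysis: coefficients a_k = <x, phi_k>
x_hat = phi.T @ a              # synthesis: x_hat(l) = sum_k a_k phi_k(l)
print(np.allclose(x, x_hat))   # True: a complete basis gives exact recombination
```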
The complete decoder consists of three major parts, as shown in Fig. 52.2. The first block (I) receives the bit representation, which it partitions into entities representing the different coder parameters, and decodes them. The second block ($Q^{-1}$) is a dequantizer which maps the code to the parametric approximation. The third block (R) reconstructs the signal from the parameters using the series representation.

FIGURE 52.2: Block diagram of generic decoder structure. I = bit-representation decoder, $Q^{-1}$ = inverse quantizer, R = signal reconstruction unit.

The second important distinction between compression structures is the coding of the series expansion coefficients in terms of bits. This is dealt with in Section 52.3.
52.2 Signal Decomposition

As introduced in the previous section, series expansion can be viewed as a common tool to describe signal decomposition. The choice of basis functions will distinguish different coders and influence such features as coding gain and the types of distortions present in the decoded image for low bit rate coding. Possible classes of basis functions are:

1. Block-oriented basis functions:
   • The basis functions can cover the whole signal length L. L linearly independent basis functions will make a complete representation.
   • Blocks of size N ≤ L can be decomposed individually. Transform coders operate in this way. If the blocks are small, the decomposition can catch fast transients. On the other hand, regions with constant features, such as smooth areas or textures, require long basis functions to fully exploit the correlation.

2. Overlapping basis functions:
   The length of the basis functions and the degree of overlap are important parameters. The issue of reversibility of the system becomes nontrivial.
   • In differential coding, one basis function is used over and over again, shifted by one sample relative to the previous function. In this case, the basis function usually varies slowly according to some adaptation criterion with respect to the local signal statistics.
   • In subband coding using a uniform filter bank, N distinct basis functions are used. These are repeated over and over with a shift between each group by N samples. The length of the basis functions is usually several times larger than the shifts, accommodating fast transients as well as long-term correlations if the basis functions taper off at both ends.
   • The basis functions may be finite (FIR filters) or semi-infinite (IIR filters).

Both time domain and frequency domain properties of the basis functions are indicators of the coder performance. It can be argued that decomposition, whether it is performed by a transform or a filter bank, represents a spectral decomposition. Coding gain is obtained if the different output channels are decorrelated. It is therefore desirable that the frequency responses of the different basis functions are localized and separated in frequency. At the same time, they must cover the whole frequency band in order to make a complete representation.

The desire for highly localized basis functions to handle transients, and for localized Fourier transforms to obtain good coding gain, leads to contradictory requirements due to the Heisenberg uncertainty relation [33] between a function and its Fourier transform. The selection of the basis functions must be a compromise between these conflicting requirements.
52.2.1 Decomposition by Transforms

When nonoverlapping block transforms are used, the Karhunen-Loève transform (KLT) decorrelates, in a statistical sense, the signal within each block completely. It is composed of the eigenvectors of the correlation matrix of the signal. This means that one either has to know the signal statistics in advance or estimate the correlation matrix from the image itself.

Mathematically, the eigenvalue equation is given by

$$R_{xx} h_n = \lambda_n h_n . \tag{52.4}$$

If the eigenvectors are column vectors, the KLT matrix is composed of the eigenvectors $h_n$, $n = 0, 1, \ldots, N-1$, as its rows:

$$K = \begin{bmatrix} h_0 & h_1 & \cdots & h_{N-1} \end{bmatrix}^T . \tag{52.5}$$

The decomposition is performed as

$$y = Kx . \tag{52.6}$$

The eigenvalues are equal to the power of each transform coefficient.
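A minimal numerical sketch of Eqs. (52.4)-(52.6), assuming the AR(1) correlation model that is introduced later in Eq. (52.15):

```python
import numpy as np

N, rho = 8, 0.95
# Correlation matrix of a first-order Markov source: R_xx[i,j] = rho^|i-j|
Rxx = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

lam, H = np.linalg.eigh(Rxx)        # eigenvalues lambda_n and eigenvectors h_n
K = H.T                             # KLT matrix: eigenvectors as rows
# The coefficients y = Kx are decorrelated: K Rxx K^T is diagonal,
# with the eigenvalues (coefficient powers) on the diagonal.
print(np.allclose(K @ Rxx @ K.T, np.diag(lam)))   # True
print(lam)
```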
In practice, the so-called Cosine Transform (of type II) is usually used because it is a fixed transform and it is close to the KLT when the signal can be described as a first-order autoregressive process with correlation coefficient close to 1.
The cosine transform of length N in one dimension is given by

$$y(k) = \sqrt{\frac{2}{N}} \, \alpha(k) \sum_{n=0}^{N-1} x(n) \cos\frac{(2n+1)k\pi}{2N} , \quad k = 0, 1, \ldots, N-1 , \tag{52.7}$$

where

$$\alpha(0) = \frac{1}{\sqrt{2}} \quad \text{and} \quad \alpha(k) = 1 \text{ for } k \neq 0 . \tag{52.8}$$

The inverse transform is similar except that the scaling factor $\alpha(k)$ is inside the summation.
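A direct implementation of Eq. (52.7) (a sketch; practical coders use fast algorithms rather than this O(N²) form):

```python
import numpy as np

def dct_typeII(x):
    """Length-N cosine transform of type II, Eq. (52.7)."""
    N = len(x)
    n = np.arange(N)
    y = np.empty(N)
    for k in range(N):
        alpha = 1 / np.sqrt(2) if k == 0 else 1.0
        y[k] = np.sqrt(2 / N) * alpha * np.sum(
            x * np.cos((2 * n + 1) * k * np.pi / (2 * N)))
    return y

x = np.random.randn(8)
# Should match scipy.fft.dct(x, type=2, norm='ortho') if SciPy is available.
print(np.round(dct_typeII(x), 4))
```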

Many other transforms have been suggested in the literature (DFT, Hadamard Transform, Sine
Transform, etc.), but none of these seem to have any significance today.
52.2.2 Decomposition by Filter Banks

Uniform analysis and synthesis filter banks are shown in Fig. 52.3.

In the analysis filter bank the input signal is split into contiguous and slightly overlapping frequency bands denoted subbands. An ideal frequency partitioning is shown in Fig. 52.4.

If the analysis filter bank were able to decorrelate the signal completely, the output signal would be white. For all practical signals, complete decorrelation requires an infinite number of channels.

In the encoder the symbol ↓N indicates decimation by a factor of N. By performing this decimation in each of the N channels, the total number of samples is conserved from the system input to the decimator outputs. With the channel arrangement in Fig. 52.4, the decimation also serves as a demodulator: all channels will have a baseband representation in the frequency range [0, π/N] after decimation.

FIGURE 52.3: Subband coder system.

FIGURE 52.4: Ideal frequency partitioning in the analysis channel filters in a subband coder.
The synthesis filter bank, as shown in Fig. 52.3, consists of N branches with interpolators indicated by ↑N and bandpass filters arranged as the filters in Fig. 52.4.

The reconstruction formula constitutes the following series expansion of the output signal:

$$\hat{x}(l) = \sum_{n=0}^{N-1} \sum_{k=-\infty}^{\infty} e_n(k) \, g_n(l - kN) , \tag{52.9}$$

where $\{e_n(k),\ n = 0, 1, \ldots, N-1,\ k = -\infty, \ldots, -1, 0, 1, \ldots, \infty\}$ are the expansion coefficients representing the quantized subband signals and $\{g_n(k),\ n = 0, 1, \ldots, N-1\}$ are the basis functions, which are implemented as unit sample responses of bandpass filters.
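The reconstruction of Eq. (52.9) is easiest to verify for the simplest identity system, the two-channel Haar filter bank. The sketch below (illustrative only; this is not one of the filter banks used later in the chapter) performs analysis with decimation by 2 followed by perfect reconstruction:

```python
import numpy as np

x = np.random.randn(16)                  # even length for critical sampling

# Analysis: Haar lowpass/highpass combined with decimation by 2
s = (x[0::2] + x[1::2]) / np.sqrt(2)     # subband 0 (lowpass)
d = (x[0::2] - x[1::2]) / np.sqrt(2)     # subband 1 (highpass)

# Synthesis: upsampling by 2 and the matching basis functions g_n
x_hat = np.empty_like(x)
x_hat[0::2] = (s + d) / np.sqrt(2)
x_hat[1::2] = (s - d) / np.sqrt(2)
print(np.allclose(x, x_hat))             # True: perfect reconstruction
```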
Filter Bank Structures

Through the last two decades, an extensive literature on filter banks and filter bank structures has evolved. Perfect reconstruction (PR) is often considered desirable in subband coding systems. It is not a trivial task to design such systems due to the downsampling required to maintain a minimum sampling rate. PR filter banks are often called identity systems. Certain filter bank structures inherently guarantee PR.

It is beyond the scope of this chapter to give a comprehensive treatment of filter banks. We shall only present different alternative solutions at an overview level, and discuss in detail an important two-channel system with inherent perfect reconstruction properties.

We can distinguish between different filter banks based on several properties. In the following, five classifications are discussed.
1. FIR vs. IIR filters — Although IIR filters have an attractive complexity, their inherently long unit sample responses and nonlinear phase are obstacles in image coding. The unit sample response length influences the ringing problem, which is a main source of objectionable distortion in subband coders. The nonlinear phase makes the edge mirroring technique [30] for efficient coding of images near their borders impossible.

2. Uniform vs. nonuniform filter banks — This issue concerns the spectrum partitioning into frequency subbands. Currently, the general conception is that nonuniform filter banks perform better than uniform filter banks. There are two reasons for that. The first reason is that our visual system also performs a nonuniform partitioning, and the coder should mimic the type of receptor for which it is designed. The second reason is that the filter bank should be able to cope with slowly varying signals (correlation over a large region) as well as transients that are short and represent high frequency signals. Ideally, the filter banks should be adaptive (and good examples of adaptive filter banks have been demonstrated in the literature [2, 11]), but without adaptivity one filter bank has to be a good compromise between the two extreme cases cited above. Nonuniform filter banks can give the best tradeoff in terms of space-frequency resolution.

3. Parallel vs. tree-structured filter banks — Parallel filter banks are the most general, but tree-structured filter banks enjoy a large popularity, especially for octave band (dyadic frequency partitioning) filter banks, as they are easily constructed and implemented. The popular subclass of filter banks denoted wavelet filter banks or wavelet transforms belongs to this class. For octave band partitioning, the tree-structured filter banks are as general as the parallel filter banks when perfect reconstruction is required [4].

4. Linear phase vs. nonlinear phase filters — There is no general consensus about the optimality of linear phase. In fact, the traditional wavelet transforms cannot be made linear phase. There are, however, three indications that linear phase should be chosen. (1) The noise in the reconstructed image will be antisymmetrical around edges with nonlinear phase filters, which does not appear to be visually pleasing. (2) The mirror extension technique [30] cannot be used for nonlinear phase filters. (3) Practical coding gain optimizations have given better results for linear than for nonlinear phase filters.

5. Unitary vs. nonunitary systems — A unitary filter bank has the same analysis and synthesis filters (except for a reversal of the unit sample responses in the synthesis filters with respect to the analysis filters to make the overall phase linear). Because the analysis and synthesis filters play different roles, it seems plausible that they, in fact, should not be equal. Also, the gain can be larger, as demonstrated in Section 52.2.3, for nonunitary filter banks as long as straightforward scalar quantization is performed on the subbands.
Several other issues could be taken into consideration when optimizing a filter bank. These are, among others, the actual frequency partitioning including the number of bands, the lengths of the individual filters, and design criteria other than coding gain to alleviate coding artifacts, especially at low rates. As an example of the last requirement, it is important that the different phases in the reconstruction process generate the same noise; in other words, the noise should be stationary rather than cyclo-stationary. This may be guaranteed through requirements on the norms of the unit sample responses of the polyphase components [4].
The Two-Channel Lattice Structure

A versatile perfect reconstruction system can be built from two-channel substructures based on lattice filters [36]. The analysis filter bank is shown in Fig. 52.5. It consists of delay-free blocks given in matrix form as

$$\eta = \begin{bmatrix} a & b \\ c & d \end{bmatrix} , \tag{52.10}$$

and single delays in the lower branch between each block. At the input, the signal is multiplexed into the two branches, which also constitutes the decimation in the analysis system.

FIGURE 52.5: Multistage two-channel lattice analysis filter bank.

FIGURE 52.6: Multistage two-channel polyphase synthesis lattice filter bank.

A similar synthesis filter structure is shown in Fig. 52.6. In this case, the lattices are given by the inverse of the matrix in Eq. (52.10):

$$\eta^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} , \tag{52.11}$$

and the delays are in the upper branches. It is not hard to realize that the two systems are inverse systems provided $ad - bc \neq 0$, except for a system delay.

As the structure can be extended as much as wanted, the flexibility is good. The filters can be made unitary, or they can have linear phase. In the unitary case, the coefficients are related through $a = d = \cos\phi$ and $b = -c = \sin\phi$, whereas in the linear phase case, the coefficients are $a = d = 1$ and $b = c$. In the linear phase case, the last block ($\eta_L$) must be a Hadamard transform.
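A sketch verifying the inverse relationship of Eqs. (52.10) and (52.11) for a single lattice stage (the coefficient values are illustrative; any a, b, c, d with ad − bc ≠ 0 work, and the single delays between stages only contribute to the overall system delay):

```python
import numpy as np

a, b, c, d = 1.0, 0.4, 0.4, 1.0             # linear phase case: a = d = 1, b = c
eta = np.array([[a, b], [c, d]])
eta_inv = np.array([[d, -b], [-c, a]]) / (a * d - b * c)

x = np.random.randn(2, 10)                   # two polyphase branches
print(np.allclose(eta_inv @ (eta @ x), x))   # True: the stages cancel exactly
```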
Tree Structured Filter Banks

In tree-structured filter banks, the signal is first split into two channels. The resulting outputs are input to a second stage with further separation. This process can go on as indicated in Fig. 52.7 for a system where at every stage the outputs are split further until the required resolution has been obtained.

Tree-structured systems have a rather high flexibility. Nonuniform filter banks are obtained by splitting only some of the outputs at each stage. To guarantee perfect reconstruction, each stage in the synthesis filter bank (Fig. 52.7) must reconstruct the input signal to the corresponding analysis filter.
52.2.3 Optimal Transforms/Filter Banks

The gain in subband and transform coders depends on the detailed construction of the filter bank as well as on the quantization scheme.

Assume that the analysis filter bank unit sample responses are given by $\{h_n(k),\ n = 0, 1, \ldots, N-1\}$. The corresponding unit sample responses of the synthesis filters are required to have unit norm:

$$\sum_{k=0}^{L-1} g_n^2(k) = 1 .$$

FIGURE 52.7: Left: Tree structured analysis filter bank consisting of filter blocks where the signal is split in two and decimated by a factor of two to obtain critical sampling. Right: Corresponding synthesis filter bank for recombination and interpolation of the signals.

The coding gain of a subband coder is defined as the ratio between the noise using scalar quantization (PCM) and the subband coder noise incorporating optimal bit allocation, as explained in Section 52.3:

$$G_{SBC} = \left[ \prod_{n=0}^{N-1} \frac{\sigma^2_{x_n}}{\sigma^2_x} \right]^{-1/N} . \tag{52.12}$$
Here $\sigma^2_x$ is the variance of the input signal, while $\{\sigma^2_{x_n},\ n = 0, 1, \ldots, N-1\}$ are the subband variances, given by

$$\sigma^2_{x_n} = \sum_{l=-\infty}^{\infty} R_{xx}(l) \sum_{j=-\infty}^{\infty} h_n(j) h_n(l+j) \tag{52.13}$$

$$= \int_{-\pi}^{\pi} S_{xx}(e^{j\omega}) \left| H_n(e^{j\omega}) \right|^2 \frac{d\omega}{2\pi} . \tag{52.14}$$

The subband variances depend both on the filters and on the second order spectral information of the input signal.

For images, the gain is often estimated assuming that the image can be modeled as a first order Markov source (also called an AR(1) process) characterized by

$$R_{xx}(l) = \sigma^2_x \, 0.95^{|l|} . \tag{52.15}$$

(Strictly speaking, the model is valid only after removal of the image average.)
We consider the maximum gain using this model for three special cases. The first is the transform coder performance, which is an important reference as all image and video coding standards are based on transform coding. The second is for unitary filter banks, for which optimality is reached by using ideal brick-wall filters. The third case is for nonunitary filter banks, often denoted biorthogonal when the perfect reconstruction property is guaranteed. In the nonunitary case, half-whitening is obtained within each band. Mathematically, this can be seen from the optimal magnitude response for the filter in channel n:

$$\left| H_n(e^{j\omega}) \right| =
\begin{cases}
c_2 \left[ \dfrac{S_{xx}(e^{j\omega})}{\sigma^2_x} \right]^{-1/4} & \text{for } \omega \in \pm\left[ \dfrac{\pi n}{N}, \dfrac{\pi(n+1)}{N} \right] \\[2mm]
0 & \text{otherwise,}
\end{cases} \tag{52.16}$$
where $c_2$ is a constant that can be selected for correct gain in each band.

The inverse operation must be performed in the synthesis filter to make completely flat responses within each band.
In Fig. 52.8, we give optimal coding gains as a function of the number of channels.
FIGURE 52.8: Maximum coding gain as a function of the number of channels for different one-dimensional coders operating on a first order Markov source with one-delay correlation ρ = 0.95. Lower curve: Cosine transform. Middle curve: Unitary filter bank. Upper curve: Unconstrained (nonunitary) filter bank.
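The transform-coding point of these curves can be reproduced numerically: under the AR(1) model of Eq. (52.15), the coefficient variances of an N-point DCT are the diagonal of $K R_{xx} K^T$, and the gain is the arithmetic/geometric mean ratio of the variances (cf. Eq. (52.12)). A sketch (the 8.8 dB figure is what this computation should yield, not a value quoted from the chapter):

```python
import numpy as np

def dct_matrix(N):
    n = np.arange(N)
    return np.array([np.sqrt((1 if k == 0 else 2) / N) *
                     np.cos((2 * n + 1) * k * np.pi / (2 * N)) for k in range(N)])

N, rho = 8, 0.95
Rxx = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
K = dct_matrix(N)
var = np.diag(K @ Rxx @ K.T)                 # transform coefficient variances
gain = var.mean() / np.exp(np.mean(np.log(var)))
print(10 * np.log10(gain))                   # about 8.8 dB for N = 8
```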
52.2.4 Decomposition by Differential Coding

In closed-loop differential coding, the generic encoder structure (Fig. 52.1) is not valid, as the quantizer is placed inside a feedback loop. The decoder, however, behaves according to the generic decoder structure. Basic block diagrams of a closed-loop differential encoder and the corresponding decoder are shown in Figs. 52.9(a) and (b), respectively.

In the encoder, the input signal x is represented by the bit-stream b. Q is the quantizer and $Q^{-1}$ the dequantizer, but $QQ^{-1} \neq 1$ except in the case of infinite resolution in the quantizer. The signal d, which is quantized and transmitted by some binary code, is the difference between the input signal and a predicted value of the input signal based on previous outputs and a prediction filter with transfer function $G(z) = 1/(1 - P(z))$. Notice that the decoder is a substructure of the encoder, and that $\tilde{x} = x$ in the limiting case of infinite quantizer resolution. The last property guarantees exact representation when disregarding quantization.

FIGURE 52.9: (a) DPCM encoder. (b) DPCM decoder.

Introducing the inverse z-transform of G(z) as g(l), the reconstruction is performed on the dequantized values as

$$\tilde{x}(l) = \sum_{k=0}^{\infty} e(k) \, g(l - k) . \tag{52.17}$$

The output is thus a linear combination of unit sample responses excited by the sample amplitudes at different times, and can be viewed as a series expansion of the output signal. In this case, the basis functions are generated by shifts of a single basis function [the unit sample response g(l)], and the coefficients represent the coded difference signal e(k).

With an adaptive filter, the basis function will vary slowly, depending on some spectral modification derived from the incoming samples.
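A minimal closed-loop DPCM sketch with a fixed first-order predictor $P(z) = a z^{-1}$ and a uniform quantizer (the parameter values are illustrative assumptions). Because the quantizer sits inside the loop, the reconstruction error equals the quantization error of d and stays bounded by half the quantizer step:

```python
import numpy as np

def dpcm_encode_decode(x, a=0.95, step=0.1):
    """Closed-loop DPCM: the encoder contains the decoder as a substructure."""
    x_rec = np.zeros(len(x))
    pred = 0.0
    for l in range(len(x)):
        d = x[l] - pred                  # prediction error
        e = step * np.round(d / step)    # quantize then dequantize (Q, Q^-1)
        x_rec[l] = pred + e              # reconstruction, identical in decoder
        pred = a * x_rec[l]              # predictor driven by previous outputs
    return x_rec

x = np.cumsum(np.random.randn(100)) * 0.1    # slowly varying test signal
x_rec = dpcm_encode_decode(x)
print(np.max(np.abs(x - x_rec)) <= 0.05 + 1e-12)   # True: error <= step/2
```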
52.3 Quantization and Coding Strategies

Quantization is the means of providing approximations to signals and signal parameters by a finite number of representation levels. This process is nonreversible and thus always introduces noise. The representation levels constitute a finite alphabet which is usually represented by binary symbols, or bits. The mapping from symbols in a finite alphabet to bits is not unique. Some important techniques for quantization and coding will be reviewed next.
52.3.1 Scalar Quantization

The simplest quantizer is the scalar quantizer. It can be optimized to match the probability density function (pdf) of the input signal.

A scalar quantizer maps a continuous variable x to a finite set according to the rule

$$x \in R_i \Rightarrow Q[x] = y_i , \tag{52.18}$$

where $R_i = (x_i, x_{i+1})$, $i = 1, \ldots, L$, are nonoverlapping, contiguous intervals covering the real line, and $(\cdot, \cdot)$ denotes open, half open, or closed intervals. $\{y_i,\ i = 1, 2, \ldots, L\}$ are referred to as representation levels or reconstruction values. The associated values $\{x_i\}$ defining the partition are referred to as decision levels or decision thresholds. Fig. 52.10 depicts the representation and decision levels.

FIGURE 52.10: Quantization notation.

In a uniform quantizer, all intervals are of the same length and the representation levels are the midpoints of each interval. Furthermore, in a uniform threshold quantizer, the decision levels form a uniform partitioning of the real line, while the representation levels are the centroids (see below) in each decision interval. Strictly speaking, uniform quantizers consist of an infinite number of intervals. In practice, the number of intervals is adapted to the dynamic range of the signal. All other quantizers are non-uniform.

The optimization task is to minimize the average distortion between the original samples and the appropriate representation levels given the number of levels. This is the so-called pdf-optimized quantizer. Allowing for variable rate per symbol, the entropy constrained quantizer can be used. These schemes are described in the following two subsections.
The Lloyd-Max Quantizer

The Lloyd-Max quantizer is a scalar quantizer where the first order signal pdf is exploited to increase the quantizer performance. It is therefore often referred to as a pdf-optimized quantizer. Each signal sample is quantized using the same number of bits. The optimization is done by minimizing the total distortion of a quantizer with a given number L of representation levels. For an input signal X with pdf $p_X(x)$, the average mean square distortion is

$$D = \sum_{i=1}^{L} \int_{x_i}^{x_{i+1}} (x - y_i)^2 \, p_X(x) \, dx . \tag{52.19}$$

Minimization of D leads to the following implicit expressions connecting the decision and representation levels:

$$x_{k,\mathrm{opt}} = \frac{1}{2}\left( y_{k,\mathrm{opt}} + y_{k-1,\mathrm{opt}} \right), \quad k = 1, \ldots, L-1 \tag{52.20}$$

$$x_{0,\mathrm{opt}} = -\infty \tag{52.21}$$

$$x_{L,\mathrm{opt}} = \infty \tag{52.22}$$

$$y_{k,\mathrm{opt}} = \frac{\displaystyle\int_{x_{k,\mathrm{opt}}}^{x_{k+1,\mathrm{opt}}} x \, p_X(x) \, dx}{\displaystyle\int_{x_{k,\mathrm{opt}}}^{x_{k+1,\mathrm{opt}}} p_X(x) \, dx} , \quad k = 0, \ldots, L-1 . \tag{52.23}$$

Equation (52.20) indicates that the decision levels should be the midpoints between neighboring representation levels, while Eq. (52.23) requires that the optimal representation levels be the centroids of the pdf in the appropriate intervals.

The equations can be solved iteratively [21]. For high bit rates it is possible to derive approximate formulas assuming that the signal pdf is flat within each quantization interval [21].

In most practical situations the pdf is not known, and the optimization is based on a training set. This will be discussed in Section 52.3.2.
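The coupled conditions of Eqs. (52.20)-(52.23) can be solved by alternating them (the Lloyd iteration). A sketch for a zero-mean, unit-variance Gaussian pdf, using a fine grid as a stand-in for the integrals; the printed levels should approach the well-known 2-bit values near ±0.45 and ±1.51:

```python
import numpy as np

def lloyd_max_gaussian(L=4, iters=200):
    """Alternate Eq. (52.20) (midpoints) and Eq. (52.23) (centroids)."""
    grid = np.linspace(-6, 6, 20001)
    pdf = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
    y = np.linspace(-2, 2, L)                     # initial representation levels
    for _ in range(iters):
        xb = (y[1:] + y[:-1]) / 2                 # Eq. (52.20): decision levels
        edges = np.concatenate(([-np.inf], xb, [np.inf]))
        idx = np.searchsorted(edges, grid) - 1    # cell index of each grid point
        y = np.array([np.sum(grid[idx == i] * pdf[idx == i]) /
                      np.sum(pdf[idx == i]) for i in range(L)])   # Eq. (52.23)
    return y

print(np.round(lloyd_max_gaussian(4), 3))
```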
Entropy Constrained Quantization

When minimizing the total distortion for a fixed number of possible representation levels, we have tacitly assumed that every signal sample is coded using the same number of bits: $\log_2 L$ bits/sample. If we allow for a variable number of bits for coding each sample, a further rate-distortion advantage is gained. The Lloyd-Max solution is then no longer optimal. A new optimization is needed, leading to the entropy constrained quantizer.

At high bit rates, the optimum is reached when using a uniform quantizer with an infinite number of levels. At low bit rates, uniform quantizers perform close to optimum provided the representation levels are selected as the centroids according to Eq. (52.23). The performance of the entropy constrained quantizer is significantly better than the performance of the Lloyd-Max quantizer [21].

A standard algorithm for assigning codewords of variable length to the representation levels was given by Huffman [12]. The Huffman code will minimize the average rate for a given set of probabilities, and the resulting average bit rate will be close to the entropy bound. Even closer performance to the bound is obtained by arithmetic coders [32].

At high bit rates, scalar quantization of statistically independent samples renders a bit rate which is at least 0.255 bits/sample higher than the rate distortion bound, irrespective of the signal pdf. Huffman coding of the quantizer output typically gives a somewhat higher rate.
52.3.2 Vector Quantization

Simultaneous quantization of several samples is referred to as vector quantization (VQ) [9], as mentioned in the introductory section. VQ is a generalization of scalar quantization: a vector quantizer maps a continuous N-dimensional vector x to a discrete-valued N-dimensional vector according to the rule

$$x \in C_i \Rightarrow Q[x] = y_i , \tag{52.24}$$

where $C_i$ is an N-dimensional cell. The L possible cells are nonoverlapping and contiguous and fill the entire geometric space. The vectors $\{y_i\}$ correspond to the representation levels of a scalar quantizer. In a VQ setting, the collection of representation levels is referred to as the codebook. The cells $C_i$, also called Voronoi regions, correspond to the decision regions, and can be thought of as solid polygons in the N-dimensional space.

In the scalar case, it is trivial to test if a signal sample belongs to a given interval. In VQ an indirect approach is utilized via a fidelity criterion or distortion measure $d(\cdot, \cdot)$:

$$Q[x] = y_i \iff d(x, y_i) \leq d(x, y_j), \quad j = 0, \ldots, L-1 . \tag{52.25}$$

When the best match, $y_i$, has been found, the index i identifies that vector and is therefore coded as an efficient representation of the vector. The receiver can then reconstruct the vector $y_i$ by looking up the contents of cell number i in a copy of the codebook. Thus, the bit rate in bits per sample in this scheme is $(\log_2 L)/N$ when using a straightforward bit-representation for i. A block diagram of vector quantization is shown in Fig. 52.11.

FIGURE 52.11: Vector quantization procedure.
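A sketch of the encoding rule of Eq. (52.25) with the squared Euclidean distortion (the random codebook here is purely illustrative; a real codebook would be trained):

```python
import numpy as np

def vq_encode(x, codebook):
    """Eq. (52.25): index of the codebook vector nearest to x."""
    dist = np.sum((codebook - x) ** 2, axis=1)   # squared Euclidean distortion
    return int(np.argmin(dist))

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 4))          # L = 16 codevectors, N = 4
x = rng.standard_normal(4)
i = vq_encode(x, codebook)
x_hat = codebook[i]                              # decoder: pure table lookup
print(i, np.round(x_hat, 3))                     # rate: log2(16)/4 = 1 bit/sample
```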
In the previous section we stated that scalar entropy coding is sub-optimal, even for sources producing independent samples. The reason for the sub-optimal performance of the entropy constrained quantizer is a phenomenon called sphere packing: in higher dimensions, quantization cells can be shaped to fill space more efficiently than the hypercubes induced by scalar quantization. In addition to obtaining good sphere packing, a VQ scheme also exploits both correlation and higher order statistical dependencies of a signal. The higher order statistical dependency can be thought of as "a preference for certain vectors". Excellent examples of sphere packing and higher order statistical dependencies can be found in [28].

In principle, the codebook design is based on the N-dimensional pdf. But as the pdf is usually not known, the codebook is optimized from a training data set. This set consists of a large number of vectors that are representative of the signal source. A sub-optimal codebook can then be designed using an iterative algorithm, for example the K-means or LBG algorithm [25].
Multistage Vector Quantization

To alleviate the complexity problems of vector quantization, several methods have been suggested. They all introduce some structure into the codebook which makes fast search possible. Some systems also reduce storage requirements, like the one we present in this subsection. The obtainable performance is always reduced, but the performance of an implementable coder can be improved.

Fig. 52.12 illustrates the encoder structure.

FIGURE 52.12: K-stage VQ encoder structure showing the successive approximation of the signal vector.

The first block in the encoder makes a rough approximation to the input vector by selecting the codebook vector which, upon scaling by $e_1$, is closest in some distortion measure. Then this approximation is subtracted from the input signal. In the second stage, the difference signal is approximated by a vector from the second codebook scaled by $e_2$. This procedure continues in K stages, and can be thought of as a successive approximation to the input vector. The indices $\{i(k),\ k = 1, 2, \ldots, K\}$ are transmitted as part of the code for the particular vector under consideration.

Compared to unstructured VQ, this method is suboptimal, but it has a much lower complexity than the optimal case due to the small codebooks that can be used.

A special case is the mean-gain-shape VQ [9], where only one stage is kept, but in addition the mean is represented separately.

In all multistage VQs, the code consists of the codebook addresses and codes for the quantized versions of the scaling coefficients.
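A sketch of the K-stage successive-approximation encoder described above (hypothetical small random codebooks; for clarity the scaling coefficients $e_k$ are found by least squares and left unquantized, whereas a real coder would quantize and transmit them):

```python
import numpy as np

def multistage_vq_encode(x, codebooks):
    """Each stage codes the residual left by the previous stages."""
    residual = x.copy()
    code = []
    for cb in codebooks:                       # one small codebook per stage
        # optimal gain for every codevector v: g = <residual, v> / <v, v>
        gains = cb @ residual / np.sum(cb * cb, axis=1)
        errs = [np.sum((residual - g * v) ** 2) for g, v in zip(gains, cb)]
        i = int(np.argmin(errs))               # best (index, gain) pair
        code.append((i, gains[i]))
        residual = residual - gains[i] * cb[i]
    return code

rng = np.random.default_rng(1)
codebooks = [rng.standard_normal((8, 16)) for _ in range(3)]   # K = 3 stages
x = rng.standard_normal(16)
print(multistage_vq_encode(x, codebooks))
```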
52.3.3 Efficient Use of Bit-Resources

Assume we have a signal that can be split into classes with different statistics. As an example, after applying signal decomposition, the different transform coefficients typically have different variances. Assume also that we have a pool of bits to be used for representing a collection of signal vectors from the different classes, or that we try to minimize the number of bits to be used after all signals have been quantized. These two situations are described below.
Bit Allocation

Assume that a signal consists of N components $\{x_i,\ i = 1, 2, \ldots, N\}$ forming a vector x, where the variance of component number i is equal to $\sigma^2_{x_i}$ and all components are zero mean.

We want to quantize the vector x using scalar quantization of each of the components and minimize the total distortion with the only constraint that the total number of bits to be used for the whole vector be fixed and equal to B. Denoting the quantized signal components $Q_i(x_i)$, the average distortion per component can be written as

$$D_{DS} = \frac{1}{N} \sum_{i=1}^{N} E\left[ x_i - Q_i(x_i) \right]^2 = \frac{1}{N} \sum_{i=1}^{N} D_i , \tag{52.26}$$

where $E[\cdot]$ is the expectation operator, and the subscript DS stands for decomposed source.

The bit-constraint is given by

$$B = \sum_{i=1}^{N} b_i , \tag{52.27}$$

where $b_i$ is the number of bits used to quantize component number i.

Minimizing $D_{DS}$ with Eq. (52.27) as a constraint, we obtain the following bit assignment:

$$b_j = \frac{B}{N} + \frac{1}{2} \log_2 \frac{\sigma^2_{x_j}}{\left[ \prod_{n=1}^{N} \sigma^2_{x_n} \right]^{1/N}} . \tag{52.28}$$

This formula will in general render noninteger and even negative values of the bit count. So-called "greedy" algorithms can be used to avoid this problem.
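A sketch of one such greedy allocation (an illustrative variant, not the chapter's specific algorithm): bits are assigned one at a time to the component whose distortion is currently largest, using the high-rate model in which each extra bit divides a component's distortion by four. This yields nonnegative integer counts that approximate Eq. (52.28):

```python
import numpy as np

def bit_allocation(variances, B):
    """Greedy integer bit allocation approximating Eq. (52.28)."""
    b = np.zeros(len(variances), dtype=int)
    d = np.array(variances, dtype=float)   # per-component distortion estimate
    for _ in range(B):
        i = int(np.argmax(d))              # next bit where distortion is largest
        b[i] += 1
        d[i] /= 4                          # one extra bit: distortion / 4
    return b

var = [16.0, 4.0, 1.0, 0.25]
print(bit_allocation(var, 8))              # [4 3 1 0], skewed to large variances
```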
To evaluate the coder performance, we use the coding gain. It is defined as the distortion advantage of the component-wise quantization over a direct scalar quantization at the same rate. For the example at hand, the coding gain is found to be

$$G_{DS} = \frac{\frac{1}{N} \sum_{j=1}^{N} \sigma^2_{x_j}}{\left( \prod_{j=1}^{N} \sigma^2_{x_j} \right)^{1/N}} . \tag{52.29}$$

The gain is equal to the ratio between the arithmetic mean and the geometric mean of the component variances. The minimum value of the variance ratio is equal to 1, attained when all the component variances are equal. Otherwise, the gain is larger than one. Using the optimal bit allocation, the noise contribution is equal in all components.
If we assume that the different components are obtained by passing the signal through a bank of bandpass filters, then the variance of one band is given by the integral of the power spectral density over that band. If the process is non-white, the variances become more different the more colored the original spectrum is. The maximum possible gain is obtained when the number of bands tends to infinity [21]. Then the gain is equal to the maximum gain of a differential coder, which again is inversely proportional to the spectral flatness measure [21] given by

$$\gamma^2_x = \frac{\exp\left[ \displaystyle\int_{-\pi}^{\pi} \ln S_{xx}(e^{j\omega}) \, \frac{d\omega}{2\pi} \right]}{\displaystyle\int_{-\pi}^{\pi} S_{xx}(e^{j\omega}) \, \frac{d\omega}{2\pi}} , \tag{52.30}$$

where $S_{xx}(e^{j\omega})$ is the spectral density of the input signal. In both subband coding and differential coding, the complexity of the systems must approach infinity to reach the coding gain limit.

To be able to apply bit allocation dynamically to non-stationary sources, the decoder must receive information about the local bit allocation. This can be done either by transmitting the bit allocation table, or the variances from which the bit allocation was derived. For real images, where the statistics vary rapidly, transmitting this side information may become costly, especially for low rate coders.
Rate Allocation

Assume we have the same signal collection as above. This time we want to minimize the number of bits to be used after the signal components have been quantized. The first order entropy of the decomposed source will be selected as the measure of the obtainable minimum bit-rate when scalar representation is specified.

To simplify, assume all signal components are Gaussian. The entropy of a Gaussian source with zero mean and variance $\sigma^2_x$ and statistically independent samples, quantized by a uniform quantizer with quantization interval $\Delta$, can for high rates be approximated by

$$H^G_\Delta(X) = \frac{1}{2} \log_2 \left( 2\pi e (\sigma_x/\Delta)^2 \right) . \tag{52.31}$$

The rate difference [24] between direct scalar quantization of the signal collection using one entropy coder and the rate when using an adapted entropy coder for each component is

$$\Delta H = H_{PCM} - H_{DS} = \frac{1}{2} \log_2 \frac{\sigma^2_x}{\left[ \prod_{i=1}^{N} \sigma^2_{x_i} \right]^{1/N}} , \tag{52.32}$$

provided the decomposition is power conserving, meaning that

$$\sigma^2_x = \frac{1}{N} \sum_{i=1}^{N} \sigma^2_{x_i} . \tag{52.33}$$

The coding gain in Eq. (52.29) and the rate gain in Eq. (52.32) are equivalent for Gaussian sources.

In order to exploit this result in conjunction with signal decomposition, we can view each output component as a stationary source, each with different signal statistics. The variances will depend on the spectrum of the input signal. From Eq. (52.32) and Eq. (52.33) we see that the rate difference is larger the more different the channel variances are.

To obtain the rate gain indicated by Eq. (52.32), different Huffman or arithmetic coders [9] adapted to the rates given by Eq. (52.31) must be employed. In practice, a pool of such coders should be generated and stored. During encoding, the closest fitting coder is chosen for each block of components. An index indicating which coder was used is transmitted as side information to enable the decoder to reinterpret the received code.
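Once the component variances are known, the rate gain of Eq. (52.32) is a one-line computation (a sketch, assuming the power-conservation condition of Eq. (52.33) and reusing the illustrative variances from the bit-allocation example):

```python
import numpy as np

var = np.array([16.0, 4.0, 1.0, 0.25])        # component variances
sigma2_x = np.mean(var)                       # Eq. (52.33): power conservation
gm = np.exp(np.mean(np.log(var)))             # geometric mean of the variances
delta_H = 0.5 * np.log2(sigma2_x / gm)        # Eq. (52.32), bits/sample saved
print(delta_H)                                # about 0.70 bits/sample here
```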
52.4 Frequency Domain Coders

In this section we present the JPEG standard and some of the best subband coders that have been presented in the literature.

52.4.1 The JPEG Standard

The JPEG coder [37] is the only internationally standardized still image coding method. Presently there is an international effort to bring forth a new, improved standard under the title JPEG 2000.

The principle can be sketched as follows. First, the image is decomposed using a two-dimensional cosine transform of size 8 × 8. Then, the transform coefficients are arranged in an 8 × 8 matrix as given in Fig. 52.13, where i and j are the horizontal and vertical frequency indices, respectively. A vector is formed by a scanning sequence which is chosen to make large amplitudes, on average, appear first, and smaller amplitudes at the end of the scan. In this arrangement, the samples at the end of the scan string approach zero. The scan vector is quantized in a non-uniform scalar quantizer with characteristics as depicted in Fig. 52.14.

FIGURE 52.13: Zig-zag scanning of the coefficient matrix.

FIGURE 52.14: Non-uniform quantizer characteristic obtained by combining a midtread uniform quantizer and a thresholder. Δ is the quantization interval and T is the threshold.

Due to the thresholder, many of the trailing coefficients in the scan vector are set to zero. Often the zero values appear in clusters. This property is exploited by using runlength coding, which basically amounts to finding zero-runs. After runlength coding, each run is represented by a number pair (a, r), where the number a is the amplitude and r is the length of the run. Finally, the number pair is entropy coded using the Huffman method or arithmetic coding.

The thresholding will increase the distortion and lower the entropy both with and without decomposition, although not necessarily by the same amounts.

As can be observed from Fig. 52.13, the coefficient in position (0,0) is not part of the string. This coefficient represents the block average. After collecting all block averages in one image, this image is coded using a DPCM scheme [37].
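A sketch of the scan-and-runlength step described above, following the chapter's (amplitude, run) pairing (the standard zig-zag order is assumed; entropy coding and the end-of-block signaling details are omitted):

```python
import numpy as np

def zigzag_indices(N=8):
    """Zig-zag scan order for an N x N coefficient matrix."""
    return sorted(((i, j) for i in range(N) for j in range(N)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def runlength(coeffs):
    """Pair each nonzero amplitude a with the length r of the zero-run before it."""
    pairs, run = [], 0
    for a in coeffs:
        if a == 0:
            run += 1
        else:
            pairs.append((int(a), run))
            run = 0
    return pairs              # trailing zeros are signaled by an end-of-block code

q = np.zeros((8, 8), dtype=int)
q[0, 1], q[1, 0], q[2, 3] = 5, -3, 1    # a few surviving quantized coefficients
scan = [q[i, j] for (i, j) in zigzag_indices()][1:]   # skip the (0,0) DC term
print(runlength(scan))                  # [(5, 0), (-3, 0), (1, 14)]
```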
Coding results for three images are given in Fig. 52.16.
52.4.2 Improved Coders: State-of-the-Art

Many coders that outperform JPEG have been presented in the scientific literature. Most of these are based on subband decomposition (or the special case, wavelet decomposition). Subband coders have a higher potential coding gain by using filter banks rather than transforms, and thus exploiting correlations over larger image areas. Figure 52.8 shows the theoretical gain for a stochastic image model. Visually, subband coders can avoid the blocking effects experienced in transform coders at low bit-rates. This property is due to the overlap of basis functions in subband coders. On the other hand, the Gibbs phenomenon is more prevalent in subband coders and can cause severe ringing in homogeneous areas close to edges. The detailed choice and optimization of the filter bank will strongly influence the visual performance of subband coders. The other factor which decides the coding quality is the detailed quantization of the subband signals. The final bit-representation method does not affect the quality, only the rate for a given quality.

Depending on the bit-representation, the total rate can be preset for some coders, while it depends on some specified quality factor for other coders. Even though it would be desirable to preset the visual quality in a coder, this is a challenging task which has not yet been satisfactorily solved.

In the following we present four subband coders with different coding schemes and different filter banks.

Subband Coder Based on Entropy Coder Allocation [24]

This coder uses an 8 × 8 uniform filter bank optimized for reducing blocking and ringing artifacts, plus maximizing the coding gain [1]. The lowpass-lowpass band is quantized using a fixed rate DPCM coder with a third-order two-dimensional predictor. The other subband signals are segmented into blocks of size 4 × 4, and each block is classified based on the block power. Depending on the block power, each block is allocated a corresponding entropy coder (implemented as an arithmetic coder). The entropy coders have been preoptimized by minimizing the first-order entropy given the number of available entropy coders (see Section 52.3.3). This number is selected to balance the amount of side information necessary in the decoder to identify the correct entropy decoder against the gain from using more entropy coders. Depending on the bit-rate, the number of entropy coders is typically 3 to 5. In the presented results, three arithmetic coders are used. Conditional arithmetic coding has been used to represent the side information efficiently.

Coding results are presented in Fig. 52.16 under the name "Lervik".
Zero-Tree Coding

Shapiro [35] introduced a method that exploits the dependency between pixels at corresponding locations in the bands of an octave band filter bank. The basic assumed dependencies are illustrated in Fig. 52.15. The low-pass band is coded separately. Starting at any location in any of the other three bands of the same size, any pixel will have an increasing number of descendants as one passes down the tree representing information from the same location in the original image. The number of corresponding pixels increases by a factor of four from one level to the next. When used in a coding context, the tree is terminated at any zero-valued pixel (obtained after quantization using some threshold), after which all subsequent pixels are assumed to be zero as well. Due to the growth by a factor of four between levels, many samples can be discarded this way.

FIGURE 52.15: Zero-tree arrangement in an octave-band decomposed image.

What is the underlying mechanism that makes this technique work so well? On one hand, the image spectrum falls off rapidly as a function of frequency for most images. This means that there is a tendency to have many zeros when approaching the leaves of the tree. Our visual system is furthermore more tolerant to high frequency errors. This should be compared to the zig-zag scan in the JPEG coder. On the other hand, viewed from a pure statistical angle, the subbands are uncorrelated if the filter bank has done what is required of it! However, the statistical argument is based on the assumption of "local ergodicity", which means that statistical parameters derived locally from the data have the same mean values everywhere. With real images, composed of objects with edges, textures, etc., these assumptions do not hold. The "activity" in the subbands tends to appear in the same locations. This is typical at edges. One can look at these connections as energy correlations among the subbands. The zero-tree method copes efficiently with these types of phenomena.

Shapiro furthermore combined the zero-tree representation with bit-plane coding. Said [34] went one step further and introduced what he calls set partitioning. The resulting algorithm is simple and fast, and is embedded in the sense that the bit-stream can be cut off at any point in time in the decoder, and the obtained approximation is optimal for that number of bits. The subbands are obtained using the 9/7 biorthogonal spline filters [38].

Coding results from Said's coder are shown in Fig. 52.16, marked "Said".
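The parent-descendant geometry of Fig. 52.15 is purely positional. A sketch of the index arithmetic, assuming the usual convention that a coefficient at position (i, j) in one level has the four children (2i, 2j), (2i, 2j+1), (2i+1, 2j), (2i+1, 2j+1) in the next finer level of the same orientation:

```python
def children(i, j):
    """Four descendants of coefficient (i, j) one octave level down."""
    return [(2 * i, 2 * j), (2 * i, 2 * j + 1),
            (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]

def descendants(i, j, levels):
    """All descendants over a number of finer levels: 4, 16, 64, ... per level."""
    frontier, out = [(i, j)], []
    for _ in range(levels):
        frontier = [c for p in frontier for c in children(*p)]
        out.extend(frontier)
    return out

# One zero-tree root prunes every position below it:
print(len(descendants(3, 2, 3)))   # 4 + 16 + 64 = 84 positions discarded
```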
Pyramid VQ and Improved Filter Bank

This coder is based on bit-allocation, or rather, allocation of vector quantizers of different sizes. This implies that the coder is fixed rate, that is, we can preset the total number of bits for an image. It is assumed that the subband signals have a Laplacian distribution, which makes it possible to apply pyramid vector quantizers [6]. These are suboptimal compared to trained-codebook vector quantizers, but significantly better than scalar quantizers, without increasing the complexity too much.

The signal decomposition in the encoder is performed using an 8 × 8 channel uniform filter bank [1], followed by an octave-band filter bank of three stages operating on the resulting lowpass-lowpass band. The uniform filter bank is nonunitary and optimized for coding gain. The building blocks of the octave band filter bank have been carefully selected from all available perfect reconstruction, two-channel filter systems with limited FIR filter orders.

Coding results from this coder are shown in Fig. 52.16, marked "Balasingham".
Trellis Coded Quantization

Joshi [22] has presented what is presently the "state-of-the-art" coder. Being based on trellis coded quantization [29], the encoder is more complex than the other coders presented. Furthermore, it does not have the embedded character of Said's coder.

The filter bank employed has 22 subbands. This is obtained by first employing a 4 × 4 uniform filter bank, followed by a further split of the resulting lowpass-lowpass band using a two-stage octave band filter bank. All filters in the curves shown in the next section are 9/7 biorthogonal spline filters [38].

The encoding of the subbands is performed in several stages:

• Separate classification of signal blocks in each band.
• Rate allocation among all blocks.
• Individual arithmetic coding of the trellis-coded quantized signals in each class.

Trellis coded quantization [7] is a method that can reach the rate distortion bound in the same way as vector quantization. It uses search methods in the encoder, which add to its complexity. The decoder is much simpler.

Coding results from this coder are shown in Fig. 52.16, marked "Joshi".
Frequency Domain Coding Results

The five coders presented above are compared in this section. All of them are simulated using the three images "Lenna", "Barbara", and "Goldhill" of size 512 × 512. These three images have quite different contents in terms of spectrum, textures, edges, and so on. Fig. 52.16 shows the PSNR as a function of bit-rate for the five coders. The PSNR is defined as

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\frac{1}{NM} \sum_{n=1}^{N} \sum_{m=1}^{M} \left( x(n,m) - \hat{x}(n,m) \right)^2} . \tag{52.34}$$
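Eq. (52.34) translates directly into code (a sketch for 8-bit images; the random test data is illustrative only):

```python
import numpy as np

def psnr(x, x_hat):
    """Eq. (52.34) for 8-bit images (peak value 255)."""
    mse = np.mean((x.astype(float) - x_hat.astype(float)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

x = np.random.randint(0, 256, (512, 512))
noise = np.random.randn(512, 512) * 2.0
print(psnr(x, x + noise))        # roughly 42 dB for noise of standard deviation 2
```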
As is observed, the coding quality among the coders varies when they are exposed to such different stimuli. The exception is that all subband coders are superior to JPEG, which was expected from the use of better decomposition as well as more clever quantization and coding strategies. Joshi's coder is best for "Lenna" and "Goldhill" at high rates. Balasingham's coder is, however, better for "Barbara" and for "Goldhill" at low rates. These results are interpreted as follows. The Joshi coder uses the most elaborate quantization/coding scheme, but the Balasingham coder applies a better filter bank in two respects. First, it has better high frequency resolution, which explains that the "Barbara" image, with a relatively high frequency content, gives a better result for the latter coder. Second, the improved low frequency resolution of this filter bank also implies better coding at low rates, as for "Goldhill".

From the results above, it is also observed that the Said coder performs well for images with a lowpass character, such as the "Lenna" image, especially at low rates. In these cases there are many "zeros" to be represented, and the zero-tree coding can typically cope well with zero-representations.

A combination of several of the aforementioned coders, picking up their best components, would probably render an improved system.
FIGURE 52.16: Coding results. Top: "Lenna"; middle: "Barbara"; bottom: "Goldhill".

52.5 Fractal Coding

This section is placed towards the end of the chapter because fractal coding deviates in many respects from the generic coder on the one hand, but on the other hand can be compared to vector quantization. A good overview of the field can be found in [8].
Fractal coding (also called attractor coding) is based on Banach's fixed point theorem and exploits self-similarity or partial self-similarity among different scales of a given image. A nonlinear transform gives the fractal image representation. Iterative operations using this transform, starting from any initial image, will converge to the image approximation, called the attractor. The success of such a scheme rests upon the compactness, in terms of bits, of the description of the nonlinear transform.

A classical example of self-similarity is Michael Barnsley's fern, where each branch is a small copy of the complete fern. Even the branches are composed of small copies of themselves. A very compact description can be found for the class of images exhibiting self-similarity; in fact, the fern can be described by 24 numbers, according to Barnsley.

Self-similarity is a dependency among image elements (possibly objects) that is not described by correlation, but can be called affine correlation.

There is an enormous potential for image compression if images really have the self-similarity property. However, there seems to be no reason to believe that global self-similarity exists in any complex image created, e.g., by photographing natural or man-made scenes. The less demanding notion of partial self-similarity among image blocks of different scales has proven to be fruitful [19].

In this section we will, in fact, present a practical fractal coder exploiting partial self-similarity among different scales, which can be directly compared to mean-gain-shape vector quantization (MGSVQ). The difference between the two systems is that the vector quantizer uses an optimized codebook based on data from a large collection of different images, whereas the fractal coder uses a self codebook, in the sense that the codebook is generated from the image itself and implicitly and approximately transmitted to the receiver as part of the image code. The question is then, "Is the 'adaptive' nature of the fractal codebook better than the statistically optimized codebook of standard vector quantization?"

We will also comment on other models and give a brief status report on fractal compression techniques.
52.5.1 Mathematical Background

The code of an image, in the language of fractal coding, is given as the bit-representation of a nonlinear transform T. The transform defines what is called the collage $x_c$ of the image. The collage is found by

$$x_c = Tx ,$$

where x is the original image.

The collage is the object we try to make resemble the image as closely as possible in the encoder, through minimization of the distortion function

$$D = d(x, x_c) . \tag{52.35}$$

Usually the distortion function is chosen as the Euclidean distance between the two vectors. The decoder cannot reconstruct the collage, as it depends on knowledge of the original image, and not only the transform T. We therefore have to accept reconstruction of the image with less accuracy.

The reconstruction algorithm is based on Banach's fixed point theorem: if a transform T is contractive or eventually contractive [26], the fixed point theorem states that the transform has a unique attractor or fixed point given by

$$x_T = T x_T , \tag{52.36}$$

and that the fixed point can be approached by iteration from any starting vector according to

$$x_T = \lim_{n \to \infty} T^n y \quad \forall y \in X , \tag{52.37}$$

where X is a normed linear space.
The similarity between the collage and the attractor is indicated by an extended version of the collage theorem [27]: given an original image x and its collage Tx, where $\|x - Tx\| \leq \epsilon$, then

$$\|x - x_T\| \leq \frac{1 - s_1^K}{(1 - s_1)(1 - s_K)} \, \epsilon , \tag{52.38}$$

where $s_1$ and $s_K$ are the Lipschitz constants of T and $T^K$, respectively, provided $|s_1| < 1$ and $|s_K| < 1$.

Provided the collage is a good approximation of the original image and the Lipschitz constants are small enough, there will also be similarity between the original image and the attractor.

In the special case of fractal block coding, a given image block (usually called a domain block) is supposed to resemble another block (usually called a range block) after some affine transformation. The transformation that is most commonly used moves the image block to a different position while shrinking the block, rotating it or shuffling the pixels, and adding what we denote a fixed term, which could be some predefined function with possible parameters to be decided in the encoding process. In most natural images it is not difficult to find affine similarity, e.g., in the form of objects situated at different distances and positions in relation to the camera. In standard block coding methods, only local statistical dependencies can be utilized. The inclusion of affine redundancies should therefore offer some extra advantage.

In this formalism we do not see much resemblance to VQ. However, the similarities and differences between fractal coding and VQ were pointed out already in the original work by Jacquin [20]. We shall, in the following section, present a specific model that enforces further similarity to VQ.
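The fixed-point machinery of Eqs. (52.36) and (52.37) is easy to demonstrate in miniature: iterate a contractive affine map T y = A y + b and the iterates converge to the attractor regardless of the starting vector (a toy numerical sketch, not an image transform):

```python
import numpy as np

rng = np.random.default_rng(2)
A = 0.5 * rng.standard_normal((4, 4))
A /= max(1.0, np.linalg.norm(A, 2) / 0.8)   # force Lipschitz constant s <= 0.8
b = rng.standard_normal(4)                  # the additive (nonlinear) term

x_T = np.linalg.solve(np.eye(4) - A, b)     # exact fixed point: x_T = A x_T + b
y = rng.standard_normal(4) * 100            # arbitrary starting "image"
for _ in range(200):                        # decoder iteration, Eq. (52.37)
    y = A @ y + b
print(np.allclose(y, x_T))                  # True: converged to the attractor
```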
52.5.2 Mean-Gain-Shape Attractor Coding

It has been proven [31] that in all cases where each domain block is a union of range blocks, the decoding algorithm for sampled images converges fully after a finite and small number of iterations, provided the nonlinear part (fixed term) of the transform is orthogonal to the image transformed by the linear part. In one special case there are no iterations at all [31], and then $x_T = Tx$. We shall discuss only this important case here, because it has an important application potential due to its simplicity in the decoder, but, more importantly, because we can more clearly demonstrate the similarity to VQ.

Codebook Formation

In the encoder two tasks have to be performed: the codebook formation and the codebook search, to find the best representation of the transform T with as few bits as possible.

First the image is split into non-overlapping blocks of size L × L so that the complete image is covered. The codebook construction goes as follows:

• Calculate the mean value m of each block.
• Quantize the mean values, resulting in the approximation $\hat{m}$, and transmit their code to the receiver.

These values will serve two purposes:

1. They are the additive, nonlinear terms in the block transform.
2. They are the building elements for the codebook.

All the following steps must be performed both in the encoder and the decoder.