Ramstad, T.A. “Still Image Compression”
Digital Signal Processing Handbook
Ed. Vijay K. Madisetti and Douglas B. Williams
Boca Raton: CRC Press LLC, 1999
© 1999 by CRC Press LLC
52
Still Image Compression¹

Tor A. Ramstad
Norwegian University of Science and Technology (NTNU)
52.1 Introduction
    Signal Chain • Compressibility of Images • The Ideal Coding System • Coding with Reduced Complexity
52.2 Signal Decomposition
    Decomposition by Transforms • Decomposition by Filter Banks • Optimal Transforms/Filter Banks • Decomposition by Differential Coding
52.3 Quantization and Coding Strategies
    Scalar Quantization • Vector Quantization • Efficient Use of Bit-Resources
52.4 Frequency Domain Coders
    The JPEG Standard • Improved Coders: State-of-the-Art
52.5 Fractal Coding
    Mathematical Background • Mean-Gain-Shape Attractor Coding • Discussion
52.6 Color Coding
References
52.1 Introduction
Digital representation of images is important for digital transmission and storage on different media such as magnetic or laser disks. However, pictorial material requires vast amounts of bits if represented through direct quantization. As an example, an SVGA color image requires 3 × 600 × 800 bytes = 1.44 Mbytes when each color component is quantized using 1 byte per pixel, the amount of bytes that can be stored on one standard 3.5-inch diskette. It is therefore evident that compression (often called coding) is necessary for reducing the amount of data [33].
In this chapter we address three fundamental questions concerning image compression:
• Why is image compression possible?
• What are the theoretical coding limits?
• Which practical compression methods can be devised?
The first two questions concern statistical and structural properties of the image material and human visual perception. Even if we were able to answer these questions accurately, the methodology for image compression (third question) does not follow thereof. That is, the practical coding algorithms must be found otherwise. The bulk of the chapter will review image coding principles and present some of the best proposed still image coding methods.

¹Parts of this manuscript are based on Ramstad, T.A., Aase, S.O., and Husøy, J.H., Subband Compression of Images — Principles and Examples, Elsevier Science Publishers BV, North Holland, 1995. Permission to use the material is given by Elsevier Science Publishers BV.
The prevailing technique for image coding is transform coding. This is part of the JPEG (Joint Photographic Experts Group) standard [14] as well as a part of all the existing video coding standards (H.261, H.263, MPEG-1, MPEG-2) [15, 16, 17, 18]. Another closely related technique, subband coding, is in some respects better, but has not yet been recognized by the standardization bodies. A third technique, differential coding, has not been successful for still image coding, but is often used to code the lowpass-lowpass band in subband coders, and is an integral part of hybrid video coders for removal of temporal redundancy. Vector quantization (VQ) would be the ultimate technique if there were no complexity constraints. Because all practical systems must have limited complexity, VQ is usually used as a component in a multi-component coding scheme. Finally, fractal or attractor coding is based on an idea far from the other methods, but it is, nevertheless, strongly related to vector quantization.
For natural images, no exact digital representation exists because the quantization, which is an
integral part of digital representations, is a lossy technique. Lossy techniques will always add noise,
but the noise level and its characteristics can be controlled and depend on the number of bits per
pixel as well as the performance of the method employed. Lossless techniques will be discussed as a
component in other coding methods.
52.1.1 Signal Chain
We assume a model where the input signal is properly bandlimited and digitized by an appropriate
analog-to-digital converter. All subsequent processing in the encoder will be digital. The decoder is
also digital up to the digital-to-analog converter, which is followed by a lowpass reconstruction filter.
Under idealized conditions, the interconnection of the signal chain excluding the compression
unit will be assumed to be noise-free. (In reality, the analog-to-digital conversion will render a noise power which can be approximated by $\Delta^2/12$, where $\Delta$ is the quantizer interval. This interval depends on the number of bits, and we assume that the number of bits is so high that the contribution to the overall noise from this process is negligible.) The performance of the coding chain can then be assessed from the difference between the input and output of the digital compression unit, disregarding the analog part.
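As a quick sanity check of this approximation, the following sketch (not from the chapter; the signal and the quantizer interval are arbitrary choices) quantizes a uniform source and compares the measured noise power with $\Delta^2/12$:

```python
import numpy as np

# Minimal sketch: empirically check that uniform quantization of a
# well-behaved signal yields a noise power close to Delta^2 / 12,
# as assumed for the A/D converter in the signal chain.
rng = np.random.default_rng(0)
delta = 0.05                          # quantizer interval (hypothetical value)
x = rng.uniform(-1.0, 1.0, 100_000)   # stand-in for the bandlimited input
xq = delta * np.round(x / delta)      # uniform (mid-tread) quantizer

noise_power = np.mean((x - xq) ** 2)
print(noise_power, delta**2 / 12)     # the two values should nearly agree
```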
Still images must be sampled on some two-dimensional grid. Several schemes are viable choices,
and there are good reasons for selecting nonrectangular grids. However, to simplify, rectangular sam-
pling will be considered only, and all filtering will be based on separable operations, first performed
on the rows and subsequently on the columns of the image. The theory is therefore presented for
one-dimensional models, only.
52.1.2 Compressibility of Images
There are two reasons why images can be compressed:
• All meaningful images exhibit some form of internal structure, often expressed through
statistical dependencies between pixels. We call this property signal redundancy.
• The human visual system is not perfect. This means that certain degradations cannot be perceived by human observers. The degree of allowable noise is called irrelevancy or visual redundancy. If we furthermore accept visual degradation, we can exploit what might be termed tolerance.
In this section we make some speculations about the compression potential resulting from redun-
dancy and irrelevancy.
The two fundamental concepts in evaluating a coding scheme are distortion, which measures quality in the compressed signal, and rate, which measures how costly it is to transmit or store a signal.
Distortion is a measure of the deviation between the encoded/decoded signal and the original
signal. Usually, distortion is measured by a single number for a given coder and bit rate. There are
numerous ways of mapping an error signal onto a single number. Moreover, it is hard to conceive
that a single number could mimic the quality assessment performed by a human observer. An easy-
to-use and well-known error measure is the mean square error (mse). The visual correctness of this
measure is poor. The human visual system is sensitive to errors in shapes and deterministic patterns,
but not so much in stochastic textures. The mse defined over the entire image can, therefore, be
entirely erroneous in the visual sense. Still, mse is the prevailing error measure, and it can be argued that it reflects well the small changes due to optimization within a given coder structure, but that it is a poor measure for comparing models that create different noise characteristics.
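Since the mse recurs throughout the chapter, a minimal helper may be useful. This sketch (our addition; the 8-bit peak value is an assumption) computes the mse and the PSNR figure commonly derived from it:

```python
import numpy as np

# Small helper (not part of the chapter): mse between an original and a
# decoded image, plus the derived PSNR for peak value 255 (8-bit images).
def mse(original: np.ndarray, decoded: np.ndarray) -> float:
    diff = original.astype(np.float64) - decoded.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original: np.ndarray, decoded: np.ndarray, peak: float = 255.0) -> float:
    return 10.0 * np.log10(peak ** 2 / mse(original, decoded))
```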
Rate is defined as bits per pixel and is connected to the information content in a signal, which can
be measured by entropy.
A Lower Bound for Lossless Coding
To define image entropy, we introduce the set $S$ containing all possible images of a certain size and call the number of images in the set $N_S$. To exemplify, assume the image set under consideration has dimension 512 × 512 pixels and each pixel is represented by 8 bits. The number of different images that exist in this set is $2^{512 \times 512 \times 8}$, an overwhelming number!
Given the probability $P_i$ of each image in the set $S$, where $i \in \{1, 2, \ldots, N_S\}$ is the index pointing to the different images, the source entropy is given by

$$H = -\sum_{i=1}^{N_S} P_i \log_2 P_i . \qquad (52.1)$$
The entropy is a lower bound for the rate in lossless coding of the digital images.
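The following sketch illustrates Eq. (52.1) numerically; the probability vectors are invented examples, since the image-set entropy itself is a conceptual quantity:

```python
import numpy as np

# Sketch of Eq. (52.1): the entropy of a source given its probabilities.
# The "source symbols" could be pixels or whole images; this merely
# illustrates the formula on small invented distributions.
def entropy_bits(probabilities: np.ndarray) -> float:
    p = probabilities[probabilities > 0]          # 0 log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))

# Example: an 8-level source; a uniform distribution maximizes entropy.
print(entropy_bits(np.full(8, 1 / 8)))                     # 3.0 bits
print(entropy_bits(np.array([0.5, 0.25, 0.125, 0.125])))   # 1.75 bits
```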
A Lower Bound for Visually Lossless Coding
In order to incorporate perceptual redundancies, it is observed that not all the images in the given set can be distinguished visually. We therefore introduce visual entropy as an abstract measure which incorporates distortion.

We now partition the image set into disjoint subsets, $S_i$, in which all the different images have similar appearance. One image from each subset is chosen as the representation image. The collection of these $N_R$ representation images constitutes a subset $R$, that is, a set spanning all distinguishable images in the original set.

Assume that image $i \in R$ appears with probability $\hat{P}_i$. Then the visual entropy is defined by

$$H_V = -\sum_{i=1}^{N_R} \hat{P}_i \log_2 \hat{P}_i . \qquad (52.2)$$
The minimum attainable bit rate is lower bounded by this number for image coders without visual
degradation.
52.1.3 The Ideal Coding System
Theoretically, we can approach the visual entropy limit using an unrealistic vector quantizer (VQ), in
conjunction with an ideal entropy coder. The principle of such an optimal coding scheme is described
next.
The set of representation images is stored in what is usually called a codebook. The encoder
and decoder have similar copies of this codebook. In the encoding process, the image to be coded
is compared to all the vectors in the codebook applying the visually correct distortion measure.
The codebook member with the closest resemblance to the sample image is used as the coding
approximation. The corresponding codebook index (address) is entropy coded and transmitted to
the decoder. The decoder looks up the image located at the address given by the transmitted index.
Obviously, the above method is unrealistic. The complexity is beyond any practical limit both in
terms of storage and computational requirement. Also, the correct visual distortion measure is not
presently known. We should therefore only view the indicated coding strategy as the limit for any
coding scheme.
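A sketch of this exhaustive search is given below; the mse stands in for the visually correct distortion measure, which, as noted above, is not presently known:

```python
import numpy as np

# Conceptual sketch of the ideal (but impractical) VQ encoder: exhaustive
# nearest-neighbor search over a codebook shared by encoder and decoder.
def vq_encode(image_vector: np.ndarray, codebook: np.ndarray) -> int:
    # codebook has shape (num_codewords, vector_length)
    distortions = np.sum((codebook - image_vector) ** 2, axis=1)
    return int(np.argmin(distortions))     # index transmitted to the decoder

def vq_decode(index: int, codebook: np.ndarray) -> np.ndarray:
    return codebook[index]                 # simple table lookup
```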
52.1.4 Coding with Reduced Complexity
In practical coding methods, there are basically two ways of avoiding the extreme complexity of ideal
VQ. In the first method, the encoder operates on small image blocks rather than on the complete
image. This is obviously suboptimal because the method cannot profit from the redundancy offered
by large structures in an image. But the larger the blocks, the better the method. The second strategy
is very different and applies some preprocessing on the image prior to quantization. The aim is to
remove statistical dependencies among the image pixels, thus avoiding representation of the same
information more than once. Both techniques are exploited in practical coders, either separately or
in combination.
A typical image encoder incorporating preprocessing is shown in Fig. 52.1.
FIGURE 52.1: Generic encoder structure block diagram. D = decomposition unit, Q = quantizer,
B = coder for minimum bit-representation.
The first block (D) decomposes the signal into a set of coefficients. The coefficients are subsequently quantized (in Q), and are finally coded to a minimum bit representation (in B). This model is correct for frequency domain coders, but in closed loop differential coders (DPCM), the decomposition and quantization are performed in the same block, as will be demonstrated later. Usually the decomposition is exact. In fractal coding, the decomposition is replaced by approximate modeling.
Let us consider the decoder and introduce a series expansion as a unifying description of the different image representation methods:

$$\hat{x}(l) = \sum_k \hat{a}_k \phi_k(l) . \qquad (52.3)$$

The formula represents the recombination of signal components. Here $\{\hat{a}_k\}$ are the coefficients (the parameters in the representation), and $\{\phi_k(l)\}$ are the basis functions. A major distinction between coding methods is their set of basis functions, as will be demonstrated in the next section.
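The following sketch illustrates Eq. (52.3) with an arbitrary orthonormal basis (a random one, purely for illustration); any of the decompositions discussed in the next section fits the same recombination formula:

```python
import numpy as np

# Sketch of the series expansion in Eq. (52.3): the decoder rebuilds the
# signal as a weighted sum of basis functions phi_k.
L = 8
basis, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(L, L)))
x = np.arange(L, dtype=float)              # some signal

coeffs = basis.T @ x                       # analysis: a_k = <x, phi_k>
x_hat = sum(a * phi for a, phi in zip(coeffs, basis.T))  # Eq. (52.3)
print(np.allclose(x, x_hat))               # True: exact recombination
```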
The complete decoder consists of three major parts as shown in Fig. 52.2. The first block (I) receives the bit representation, which it partitions into entities representing the different coder parameters, and decodes them. The second block (Q⁻¹) is a dequantizer which maps the code to the parametric approximation. The third block (R) reconstructs the signal from the parameters using the series representation.
FIGURE 52.2: Block diagram of generic decoder structure. I = bit-representation decoder, Q⁻¹ = inverse quantizer, R = signal reconstruction unit.
The second important distinction between compression structures is the coding of the series
expansion coefficients in terms of bits. This is dealt with in section 52.3.
52.2 Signal Decomposition
As introduced in the previous section, series expansion can be viewed as a common tool to describe
signal decomposition. The choice of basis functions will distinguish different coders and influence
such features as coding gain and the types of distortions present in the decoded image for low bit rate
coding. Possible classes of basis functions are:
1. Block-oriented basis functions.
• The basis functions can cover the whole signal length L. L linearly independent
basis functions will make a complete representation.
• Blocks of size N ≤ L can be decomposed individually. Transform coders operate in
this way. If the blocks are small, the decomposition can catch fast transients. On
the other hand, regions with constant features, such as smooth areas or textures,
require long basis functions to fully exploit the correlation.
2. Overlapping basis functions:
The length of the basis functions and the degree of overlap are important parameters.
The issue of reversibility of the system becomes nontrivial.
• In differential coding, one basis function is used over and over again, shifted by one
sample relative to the previous function. In this case, the basis function usually
varies slowly according to some adaptation criterion with respect to the local signal
statistics.
• In subband coding using a uniform filter bank, N distinct basis functions are used.
These are repeated over and over with a shift between each group by N samples.
The length of the basis functions is usually several times larger than the shifts ac-
commodating for handling fast transients as well as long-term correlations if the
basis functions taper off at both ends.
• The basis functions may be finite (FIR filters) or semi-infinite (IIR filters).
Both time domain and frequency domain properties of the basis functions are indicators of the
coder performance. It can be argued that decomposition, whether it is performed by a transform
or a filter bank, represents a spectral decomposition. Coding gain is obtained if the different output
channels are decorrelated. It is therefore desirable that the frequency responses of the different basis
functions are localized and separate in frequency. At the same time, they must cover the whole
frequency band in order to make a complete representation.
The desire to have highly localized basis functions to handle transients, combined with localized Fourier transforms to obtain good coding gain, poses contradictory requirements due to the Heisenberg uncertainty relation [33] between a function and its Fourier transform. The selection of the basis functions must be a compromise between these conflicting requirements.
52.2.1 Decomposition by Transforms
When nonoverlapping block transforms are used, the Karhunen-Loève transform decorrelates, in a statistical sense, the signal within each block completely. It is composed of the eigenvectors of the correlation matrix of the signal. This means that one either has to know the signal statistics in advance or estimate the correlation matrix from the image itself.
Mathematically the eigenvalue equation is given by

$$R_{xx} h_n = \lambda_n h_n . \qquad (52.4)$$

If the eigenvectors are column vectors, the KLT matrix is composed of the eigenvectors $h_n$, $n = 0, 1, \ldots, N-1$, as its rows:

$$K = \left[ h_0 \; h_1 \; \cdots \; h_{N-1} \right]^T . \qquad (52.5)$$

The decomposition is performed as

$$y = Kx . \qquad (52.6)$$
The eigenvalues are equal to the power of each transform coefficient.
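A sketch of this construction is shown below for the AR(1) image model introduced in section 52.2.3 (Eq. (52.15)); the block size N = 8 is an arbitrary choice:

```python
import numpy as np

# Sketch: build the KLT for an AR(1) source with correlation coefficient
# 0.95. Rows of K are the eigenvectors of the correlation matrix, as in
# Eqs. (52.4)-(52.5).
N = 8
Rxx = 0.95 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

eigvals, eigvecs = np.linalg.eigh(Rxx)     # columns of eigvecs are h_n
K = eigvecs[:, ::-1].T                     # sort to descending eigenvalues

x = np.random.default_rng(2).normal(size=N)
y = K @ x                                  # decomposition, Eq. (52.6)
# The eigenvalues (descending) are the transform coefficient powers.
print(eigvals[::-1])
```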
In practice, the so-called Cosine Transform (of type II) is usually used because it is a fixed transform
and it is close to the KLT when the signal can be described as a first-order autoregressive process with
correlation coefficient close to 1.
The cosine transform of length N in one dimension is given by:
$$y(k) = \sqrt{\frac{2}{N}} \, \alpha(k) \sum_{n=0}^{N-1} x(n) \cos\left( \frac{(2n+1)k\pi}{2N} \right), \qquad k = 0, 1, \ldots, N-1 , \qquad (52.7)$$

where

$$\alpha(0) = \frac{1}{\sqrt{2}} \qquad \text{and} \qquad \alpha(k) = 1 \ \text{for} \ k \neq 0 . \qquad (52.8)$$

The inverse transform is similar except that the scaling factor $\alpha(k)$ is inside the summation.
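A direct matrix implementation of Eq. (52.7) is sketched below; the orthonormality check reflects the fact that the inverse is obtained by transposition:

```python
import numpy as np

# Direct implementation of the length-N DCT-II of Eq. (52.7) as a matrix.
def dct_ii_matrix(N: int) -> np.ndarray:
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
    C[0, :] *= 1.0 / np.sqrt(2.0)          # alpha(0) = 1/sqrt(2)
    return C

C = dct_ii_matrix(8)
print(np.allclose(C @ C.T, np.eye(8)))     # True: orthonormal transform
```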
Many other transforms have been suggested in the literature (DFT, Hadamard Transform, Sine
Transform, etc.), but none of these seem to have any significance today.
52.2.2 Decomposition by Filter Banks
Uniform analysis and synthesis filter banks are shown in Fig. 52.3.
In the analysis filter bank, the input signal is split into contiguous and slightly overlapping frequency bands denoted subbands. An ideal frequency partitioning is shown in Fig. 52.4.
If the analysis filter bank was able to decorrelate the signal completely, the output signal would be
white. For all practical signals, complete decorrelation requires an infinite number of channels.
In the encoder the symbol ↓ N indicates decimation by a factor of N. By performing this deci-
mation in each of the N channels, the total number of samples is conserved from the system input
to decimator outputs. With the channel arrangement in Fig. 52.4, the decimation also serves as a
demodulator. All channels will have a baseband representation in the frequency range [0,π/N] after
decimation.
FIGURE 52.3: Subband coder system.
FIGURE 52.4: Ideal frequency partitioning in the analysis channel filters in a subband coder.
The synthesis filter bank, as shown in Fig. 52.3, consists of N branches with interpolators indicated
by ↑ N and bandpass filters arranged as the filters in Fig. 52.4.
The reconstruction formula constitutes the following series expansion of the output signal:
$$\hat{x}(l) = \sum_{n=0}^{N-1} \sum_{k=-\infty}^{\infty} e_n(k) g_n(l - kN) , \qquad (52.9)$$

where $\{e_n(k),\ n = 0, 1, \ldots, N-1,\ k = -\infty, \ldots, -1, 0, 1, \ldots, \infty\}$ are the expansion coefficients representing the quantized subband signals and $\{g_n(k),\ n = 0, 1, \ldots, N-1\}$ are the basis functions, which are implemented as unit sample responses of bandpass filters.
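A sketch of this synthesis operation (our notation; the filter array g and the quantized subband signals are assumed given) is:

```python
import numpy as np

# Sketch of the synthesis side of Eq. (52.9): each subband e_n is
# upsampled by N and filtered with g_n, and the branches are summed.
def synthesize(subbands: np.ndarray, g: np.ndarray, N: int) -> np.ndarray:
    # subbands: shape (num_bands, K); g: shape (num_bands, filter_length)
    num_bands, K = subbands.shape
    out = np.zeros(K * N + g.shape[1] - 1)
    for n in range(num_bands):
        up = np.zeros(K * N)
        up[::N] = subbands[n]              # insert N-1 zeros between samples
        out += np.convolve(up, g[n])       # bandpass interpolation filter g_n
    return out
```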
Filter Bank Structures
Through the last two decades, an extensive literature on filter banks and filter bank structures
has evolved. Perfect reconstruction (PR) is often considered desirable in subband coding systems. It
is not a trivial task to design such systems due to the downsampling required to maintain a minimum
sampling rate. PR filter banks are often called identity systems. Certain filter bank structures
inherently guarantee PR.
It is beyond the scope of this chapter to give a comprehensive treatment of filter banks. We shall
only present different alternative solutions at an overview level, and in detail discuss an important
two-channel system with inherent perfect reconstruction properties.
We can distinguish between different filter banks based on several properties. In the following,
five classifications are discussed.
1. FIR vs. IIR filters — Although IIR filters have an attractive complexity, their inherent
long unit sample response and nonlinear phase are obstacles in image coding. The
unit sample response length influences the ringing problem, which is a main source of
objectionable distortion in subband coders. The nonlinear phase makes the edge mirroring technique [30] for efficient coding of images near their borders impossible.
2. Uniform vs. nonuniform filter banks — This issue concerns the spectrum partitioning into frequency subbands. Currently, the general conception is that nonuniform filter banks perform better than uniform filter banks. There are two reasons for that. The first reason is that our visual system also performs a nonuniform partitioning, and the coder should mimic the type of receptor for which it is designed. The second reason is that the filter bank should be able to cope with slowly varying signals (correlation over a large region) as well as transients that are short and represent high frequency signals. Ideally, the filter banks should be adaptive (and good examples of adaptive filter banks have been demonstrated in the literature [2, 11]), but without adaptivity one filter bank has to be a good compromise between the two extreme cases cited above. Nonuniform filter banks can give the best tradeoff in terms of space-frequency resolution.
3. Parallel vs. tree-structured filter banks — The parallel filter banks are the most general, but tree-structured filter banks enjoy a large popularity, especially for octave band (dyadic frequency partitioning) filter banks, as they are easily constructed and implemented. The popular subclass of filter banks denoted wavelet filter banks or wavelet transforms belongs to this class. For octave band partitioning, the tree-structured filter banks are as general as the parallel filter banks when perfect reconstruction is required [4].
4. Linear phase vs. nonlinear phase filters — There is no general consensus about the
optimality of linear phase. In fact, the traditional wavelet transforms cannot be made
linear phase. There are, however, three indications that linear phase should be chosen.
(1) The noise in the reconstructed image will be antisymmetrical around edges with
nonlinear phase filters. This does not appear to be visually pleasing. (2) The mirror
extension technique [30] cannot be used for nonlinear phase filters. (3) Practical coding
gain optimizations have given better results for linear than nonlinear phase filters.
5. Unitary vs. nonunitary systems — A unitary filter bank has the same analysis and synthesis filters (except for a reversal of the unit sample responses in the synthesis filters with respect to the analysis filters to make the overall phase linear). Because the analysis and synthesis filters play different roles, it seems plausible that they, in fact, should not be equal. Also, the gain can be larger, as demonstrated in section 52.2.3, for nonunitary filter banks as long as straightforward scalar quantization is performed on the subbands.
Several other issues could be taken into consideration when optimizing a filter bank. These are,
among others, the actual frequency partitioning including the number of bands, the length of the
individual filters, and other design criteria than coding gain to alleviate coding artifacts, especially
at low rates. As an example of the last requirement, it is important that the different phases in the
reconstruction process generate the same noise; in other words, the noise should be stationary rather
than cyclo-stationary. This may be guaranteed through requirements on the norms of the unit sample responses of the polyphase components [4].
The Two-Channel Lattice Structure
A versatile perfect reconstruction system can be built from two-channel substructures based
on lattice filters [36]. The analysis filter bank is shown in Fig. 52.5. It consists of delay-free blocks
given in matrix form as

$$\eta = \begin{bmatrix} a & b \\ c & d \end{bmatrix} , \qquad (52.10)$$

and single delays in the lower branch between each block. At the input, the signal is multiplexed into the two branches, which also constitutes the decimation in the analysis system.
FIGURE 52.5: Multistage two-channel lattice analysis lattice filter bank.
FIGURE 52.6: Multistage two-channel polyphase synthesis lattice filter bank.
A similar synthesis filter structure is shown in Fig. 52.6. In this case, the lattices are given by the inverse of the matrix in Eq. 52.10:

$$\eta^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} , \qquad (52.11)$$

and the delays are in the upper branches. It is not hard to realize that the two systems are inverse systems, provided $ad - bc \neq 0$, except for a system delay.
As the structure can be extended as much as wanted, the flexibility is good. The filters can be
made unitary or they can have a linear phase. In the unitary case, the coefficients are related through $a = d = \cos\phi$ and $b = -c = \sin\phi$, whereas in the linear phase case, the coefficients are $a = d = 1$ and $b = c$. In the linear phase case, the last block ($\eta_L$) must be a Hadamard transform.
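The following sketch (single stage, unitary coefficients with an arbitrary angle) verifies the perfect reconstruction property numerically:

```python
import numpy as np

# Sketch of a one-stage two-channel lattice: the input is demultiplexed
# into even/odd branches, a 2x2 block eta mixes them, and a one-sample
# delay sits in the lower branch. The synthesis side applies eta^{-1}
# per Eq. (52.11) with the delay moved to the upper branch, giving
# perfect reconstruction up to a system delay.
rng = np.random.default_rng(3)
x = rng.normal(size=32)
u, v = x[0::2].copy(), x[1::2].copy()      # demultiplex = decimation

a, b = np.cos(0.3), np.sin(0.3)            # unitary case: a=d=cos, b=-c=sin
c, d = -np.sin(0.3), np.cos(0.3)

# analysis: block eta, then delay in the lower branch
y0, y1 = a * u + b * v, c * u + d * v
y1 = np.concatenate(([0.0], y1[:-1]))      # z^{-1} in lower branch

# synthesis: delay in the upper branch, then inverse block eta^{-1}
y0 = np.concatenate(([0.0], y0[:-1]))      # z^{-1} in upper branch
det = a * d - b * c
u_hat = ( d * y0 - b * y1) / det
v_hat = (-c * y0 + a * y1) / det

x_hat = np.zeros_like(x)
x_hat[0::2], x_hat[1::2] = u_hat, v_hat    # multiplex back
print(np.allclose(x[:-2], x_hat[2:]))      # True: identity up to a delay
```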
Tree Structured Filter Banks
In tree-structured filter banks, the signal is first split in two channels. The resulting outputs
are input to a second stage with further separation. This process can go on as indicated in Fig. 52.7
for a system where at every stage the outputs are split further until the required resolution has been
obtained.
Tree-structured systems have a rather high flexibility. Nonuniform filter banks are obtained by
splitting only some of the outputs at each stage. To guarantee perfect reconstruction, each stage in
the synthesis filter bank (Fig. 52.7) must reconstruct the input signal to the corresponding analysis
filter.
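As an illustration, the sketch below builds a two-level octave-band tree from the two-channel Haar pair; Haar is chosen only because its perfect reconstruction is easy to verify, not because it is a good coding filter bank:

```python
import numpy as np

# Sketch of an octave-band (dyadic) tree: at each stage only the lowpass
# output is split further, as in Fig. 52.7.
def haar_split(x):
    # analysis stage: lowpass/highpass followed by decimation by two
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def haar_merge(lo, hi):
    # synthesis stage: exact inverse of haar_split
    x = np.empty(2 * lo.size)
    x[0::2], x[1::2] = (lo + hi) / np.sqrt(2), (lo - hi) / np.sqrt(2)
    return x

x = np.random.default_rng(4).normal(size=16)
lo, hi1 = haar_split(x)                  # stage 1
lo, hi2 = haar_split(lo)                 # stage 2: split the lowpass again
x_hat = haar_merge(haar_merge(lo, hi2), hi1)
print(np.allclose(x, x_hat))             # True: perfect reconstruction
```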
52.2.3 Optimal Transforms/Filter Banks
The gain in subband and transform coders depends on the detailed construction of the filter bank as
well as the quantization scheme.
Assume that the analysis filter bank unit sample responses are given by $\{h_n(k),\ n = 0, 1, \ldots, N-1\}$. The corresponding unit sample responses of the synthesis filters are required to have unit norm:

$$\sum_{k=0}^{L-1} g_n^2(k) = 1 .$$
FIGURE 52.7: Left: Tree structured analysis filter bank consisting of filter blocks where the signal
is split in two and decimated by a factor of two to obtain critical sampling. Right: Corresponding
synthesis filter bank for recombination and interpolation of the signals.
The coding gain of a subband coder is defined as the ratio between the noise using scalar quantization (PCM) and the subband coder noise incorporating optimal bit-allocation, as explained in section 52.3:

$$G_{SBC} = \left[ \prod_{n=0}^{N-1} \frac{\sigma_{x_n}^2}{\sigma_x^2} \right]^{-1/N} . \qquad (52.12)$$

Here $\sigma_x^2$ is the variance of the input signal while $\{\sigma_{x_n}^2,\ n = 0, 1, \ldots, N-1\}$ are the subband variances given by

$$\sigma_{x_n}^2 = \sum_{l=-\infty}^{\infty} R_{xx}(l) \sum_{j=-\infty}^{\infty} h_n(j) h_n(l+j) \qquad (52.13)$$

$$\;\; = \int_{-\pi}^{\pi} S_{xx}(e^{j\omega}) \, |H_n(e^{j\omega})|^2 \, \frac{d\omega}{2\pi} . \qquad (52.14)$$
The subband variances depend both on the filters and the second order spectral information of the
input signal.
For images, the gain is often estimated assuming that the image can be modeled as a first order
Markov source (also called an AR(1) process) characterized by
$$R_{xx}(l) = \sigma_x^2 \, 0.95^{|l|} . \qquad (52.15)$$
(Strictly speaking, the model is valid only after removal of the image average).
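As an illustration, the sketch below evaluates Eq. (52.12) for an N = 8 DCT on this model; for an orthonormal transform the "subband" variances are simply the diagonal elements of $K R_{xx} K^T$:

```python
import numpy as np

# Sketch: coding gain of Eq. (52.12) for an N = 8 transform coder on the
# AR(1) model of Eq. (52.15) (rho = 0.95, unit variance). K is the DCT-II
# matrix built as in the earlier sketch.
N = 8
Rxx = 0.95 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

n = np.arange(N)
K = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
K[0, :] /= np.sqrt(2.0)

band_vars = np.diag(K @ Rxx @ K.T)
gain = 1.0 / np.prod(band_vars) ** (1.0 / N)   # sigma_x^2 = 1 here
print(10 * np.log10(gain), "dB")               # roughly 8.8 dB for the DCT
```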
We consider the maximum gain using this model for three special cases. The first is the transform
coder performance, which is an important reference as all image and video coding standards are
based on transform coding. The second is for unitary filter banks, for which optimality is reached by
using ideal brick-wall filters. The third case is for nonunitary filter banks, often denoted biorthogonal
when the perfect reconstruction property is guaranteed. In the nonunitary case, half-whitening is obtained within each band. Mathematically this can be seen from the optimal magnitude response for the filter in channel $n$:

$$|H_n(e^{j\omega})| = \begin{cases} c \left[ \dfrac{S_{xx}(e^{j\omega})}{\sigma_x^2} \right]^{-1/4} & \text{for } \omega \in \pm\left[ \dfrac{\pi n}{N}, \dfrac{\pi(n+1)}{N} \right] , \\ 0 & \text{otherwise,} \end{cases} \qquad (52.16)$$