Multimedia Engineering
Lecture 3: Lossless Compression Techniques
Lecturer: Dr. Đỗ Văn Tuấn
Department of Electronics and Telecommunications
Email:
Lecture contents
1. Introduction
2. Basics of Information Theory
3. Run-Length Coding
4. Variable-Length Coding (VLC)
5. Arithmetic Coding
6. Dictionary-based Coding
Introduction
Compression: the process of coding that will effectively reduce the total
number of bits needed to represent certain information.
If the compression and decompression processes induce no information loss,
then the compression scheme is lossless; otherwise, it is lossy.
Compression ratio:
    compression ratio = B0 / B1
where
    B0 – number of bits before compression
    B1 – number of bits after compression
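As a quick numerical sketch of the formula (the sizes below are hypothetical, chosen only for illustration):

    # Hypothetical sizes, for illustrating the compression-ratio formula only.
    B0 = 256 * 256 * 8      # bits before compression: a 256x256 image at 8 bits/pixel
    B1 = 131072             # bits after compression (assumed)
    print(B0 / B1)          # compression ratio = 4.0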
Basics of Information Theory
The entropy η of an information source with alphabet S = {s1, s2, ..., sn} is

    η = Σi pi log2(1/pi) = − Σi pi log2 pi

where
    pi – the probability that symbol si will occur in S.
    log2(1/pi) – the amount of information (self-information) contained in si,
    which corresponds to the number of bits needed to encode si.
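A small Python sketch of this formula; the two distributions below are assumptions chosen to match the histogram examples on the following slides:

    import math

    def entropy(probs):
        # eta = sum over i of p_i * log2(1 / p_i); zero-probability symbols contribute nothing
        return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

    # Uniform 256-level image: p_i = 1/256 for every gray level -> eta = 8 bits/symbol
    print(entropy([1.0 / 256] * 256))     # 8.0

    # Only two gray levels, with probabilities 1/3 and 2/3 -> eta ≈ 0.92 bits/symbol
    print(entropy([1.0 / 3, 2.0 / 3]))    # 0.918...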
Distribution of Grey Level Intensities
The figure below shows the histogram of an image with a uniform distribution of
gray-level intensities, i.e., pi = 1/256 (i = 1..256). Hence, the entropy of this
image is η = log2 256 = 8.
Figure: Histograms for Two Gray-level Images
Entropy and Code Length
The entropy η is a weighted sum of terms; hence it represents the average
amount of information contained per symbol in the source S.
The entropy η specifies the lower bound for the average number of bits needed to
code each symbol in S, i.e., η ≤ ave(len), where
ave(len) – the average length (measured in bits) of the codewords produced by
the encoder.
For the uniform gray-level image above (η = 8), the average number of bits needed
to represent each gray-level intensity is at least 8.
For a source with only two gray levels occurring with probabilities 1/3 and 2/3,
η = (1/3) log2 3 + (2/3) log2(3/2) ≈ 0.92.
The entropy is greater when the probability distribution is flat and smaller
when it is more peaked.
Run-Length Coding
Rationale for RLC: if the information source has the property that symbols
tend to form continuous groups, then such a symbol and the length of its group
can be coded.
Run-length encoding (RLE) is a very simple form of data compression in
which runs of data (that is, sequences in which the same data value occurs in
many consecutive data elements) are stored as a single data value and count,
rather than as the original run.
Example:
Text to be coded (length: 67)
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
Text after coding (length: 18)
12W1B12W3B24W1B14W
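A minimal encoder/decoder along the lines of this example (the count-then-symbol text format is an assumption for illustration; real RLE formats differ):

    import re

    def rle_encode(text):
        # Collapse each run of identical characters into "<count><char>".
        out, i = [], 0
        while i < len(text):
            j = i
            while j < len(text) and text[j] == text[i]:
                j += 1
            out.append(f"{j - i}{text[i]}")
            i = j
        return "".join(out)

    def rle_decode(coded):
        # Inverse: read a decimal count, then the character to repeat.
        return "".join(ch * int(n) for n, ch in re.findall(r"(\d+)(\D)", coded))

    s = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
    assert rle_encode(s) == "12W1B12W3B24W1B14W"
    assert rle_decode(rle_encode(s)) == s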
Shannon – Fano Algorithm
A top-down approach
Sort the symbols according to the frequency count of their occurrences.
Recursively divide the symbols into two parts, each with approximately the
same number of counts, until all parts contain only one symbol.
Shannon – Fano Algorithm
1. For a given list of symbols, develop a corresponding list of probabilities or
frequency counts so that each symbol's relative frequency of occurrence is known.
2. Sort the list of symbols according to frequency, with the most frequently
occurring symbols at the left and the least common at the right.
3. Divide the list into two parts, with the total frequency counts of the left part
being as close to the total of the right as possible.
4. The left part of the list is assigned the binary digit 0, and the right part is
assigned the digit 1. This means that the codes for the symbols in the first part
will all start with 0, and the codes in the second part will all start with 1.
5. Recursively apply steps 3 and 4 to each of the two halves, subdividing
groups and adding bits to the codes until each symbol has become a
corresponding code leaf on the tree.
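A compact recursive sketch of these steps; the frequency counts in the demo call are assumptions for illustration, not the table from the example slide below:

    def shannon_fano(freqs):
        # freqs: dict mapping symbol -> frequency count. Returns dict symbol -> code string.
        codes = {s: "" for s in freqs}

        def split(symbols):
            if len(symbols) <= 1:
                return
            total = sum(freqs[s] for s in symbols)
            # Find the split where the left part's count is as close to half the total as possible.
            running, best_i, best_diff = 0, 1, float("inf")
            for i in range(1, len(symbols)):
                running += freqs[symbols[i - 1]]
                diff = abs(2 * running - total)
                if diff < best_diff:
                    best_i, best_diff = i, diff
            left, right = symbols[:best_i], symbols[best_i:]
            for s in left:
                codes[s] += "0"   # left part gets a 0
            for s in right:
                codes[s] += "1"   # right part gets a 1
            split(left)
            split(right)

        split(sorted(freqs, key=freqs.get, reverse=True))   # most frequent symbols first
        return codes

    print(shannon_fano({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))
    # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}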
Shannon – Fano – Example (wiki)
A sequence of symbols with given frequency counts, the resulting output codes,
and the average number of bits per symbol.
Huffman Coding
A bottom-up approach
Initialization: put all symbols on a list sorted according to their frequency counts.
Repeat until the list has only one symbol left:
1. From the list pick two symbols with the lowest frequency counts. Form a
Huffman sub-tree that has these two symbols as child nodes and create a parent node.
2. Assign the sum of the children's frequency counts to the parent and insert
it into the list such that the order is maintained.
3. Delete the children from the list.
Assign a codeword for each leaf based on the path from the root.
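A minimal sketch of this procedure, using a heap as the sorted list (the demo frequencies are assumed for illustration):

    import heapq
    from itertools import count

    def huffman(freqs):
        # freqs: dict mapping symbol -> frequency count. Returns dict symbol -> code string.
        tick = count()   # tie-breaker so the heap never has to compare tree nodes
        # Heap entries: (frequency, tie-breaker, tree); a tree is a symbol or a (left, right) pair.
        heap = [(f, next(tick), s) for s, f in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)   # the two lowest-frequency entries
            f2, _, t2 = heapq.heappop(heap)
            # Parent node carries the sum of the children's counts; the children leave the list.
            heapq.heappush(heap, (f1 + f2, next(tick), (t1, t2)))

        codes = {}
        def walk(tree, code):
            if isinstance(tree, tuple):       # internal node: 0 for the left branch, 1 for the right
                walk(tree[0], code + "0")
                walk(tree[1], code + "1")
            else:                             # leaf: record the codeword for this symbol
                codes[tree] = code or "0"
        walk(heap[0][2], "")
        return codes

    print(huffman({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}))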
Huffman Coding – Example (wiki)
A sequence of symbols with given frequency counts, the resulting output codes,
and the average number of bits per symbol.
Arithmetic Coding
Arithmetic coding is a more modern coding method that usually outperforms
Huffman coding.
Huffman coding assigns each symbol a codeword whose length is an integral
number of bits, whereas arithmetic coding can treat the whole message as one unit.
A message is represented by a half-open interval [a, b), where a and b are
real numbers between 0 and 1. Initially, the interval is [0, 1).
As the message becomes longer, the interval shortens and the number of bits
needed to represent it increases.
Arithmetic Coding Encoder
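The encoder repeatedly narrows the current interval by the sub-range of each incoming symbol. A floating-point sketch under an assumed symbol table (practical coders use integer arithmetic and emit bits incrementally to avoid precision loss):

    def cumulative_ranges(probs):
        # Map each symbol to its sub-interval [low, high) of [0, 1).
        ranges, low = {}, 0.0
        for sym, p in probs.items():
            ranges[sym] = (low, low + p)
            low += p
        return ranges

    def arithmetic_encode(message, probs):
        ranges = cumulative_ranges(probs)
        low, high = 0.0, 1.0
        for sym in message:
            width = high - low
            sym_low, sym_high = ranges[sym]
            # Narrow [low, high) to the part assigned to this symbol.
            low, high = low + width * sym_low, low + width * sym_high
        return low, high   # any number in [low, high) identifies the whole message

    probs = {"A": 0.5, "B": 0.4, "C": 0.1}   # assumed distribution (same as the decoder example)
    print(arithmetic_encode("BBB", probs))   # ≈ (0.78, 0.844): an interval of width 0.4**3 = 0.064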
Example: Coding in Arithmetic Coding
Table: Probability distribution of symbols
Figure: Graphical display of shrinking ranges
Table: New low, high, and range generated
Arithmetic Coding Decoder
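A matching floating-point decoder sketch, continuing the encoder sketch above (it reuses cumulative_ranges and the same assumed probabilities, and takes the message length explicitly instead of using a terminator symbol):

    def arithmetic_decode(value, probs, n_symbols):
        ranges = cumulative_ranges(probs)   # same symbol table as the encoder
        low, high = 0.0, 1.0
        out = []
        for _ in range(n_symbols):
            width = high - low
            scaled = (value - low) / width
            # Pick the symbol whose sub-interval contains the scaled value, then narrow as the encoder did.
            for sym, (sym_low, sym_high) in ranges.items():
                if sym_low <= scaled < sym_high:
                    out.append(sym)
                    low, high = low + width * sym_low, low + width * sym_high
                    break
        return "".join(out)

    low, high = arithmetic_encode("BBB", probs)
    print(arithmetic_decode(low, probs, 3))   # BBB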
If the alphabet is [A, B, C] and the probability distribution is
pA = 0.5, pB = 0.4, pC = 0.1, then for sending BBB,
Huffman coding: 6 bits
Arithmetic coding: 4 bits
Arithmetic coding can treat the whole message as one unit. In practice, the
input data is usually broken up into chunks to avoid error propagation.
Lempel-Ziv-Welch Algorithm
LZW uses fixed-length codewords to represent variable-length strings of
symbols/characters that commonly occur together, e.g., words in English text.
The LZW encoder and decoder build up the same dictionary dynamically
while receiving the data.
LZW places longer and longer repeated entries into a dictionary, and then
emits the code for an element, rather than the string itself, if the element has
already been placed in the dictionary.
The predecessors of LZW are LZ77 and LZ78, due to Jacob Ziv and Abraham
Lempel in 1977 and 1978, respectively.
Terry Welch improved the technique in 1984.
LZW is used in many applications, such as UNIX compress, GIF for images,
V.42 bis for modems, and others.
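A minimal LZW encoder/decoder sketch over character strings, assuming an initial dictionary of all single byte values and unbounded dictionary growth (real implementations cap the code width, commonly at 12 bits):

    def lzw_encode(data):
        # Start with all single characters; grow the dictionary as longer strings repeat.
        dictionary = {chr(i): i for i in range(256)}
        w, out = "", []
        for ch in data:
            wc = w + ch
            if wc in dictionary:
                w = wc                       # keep extending the current match
            else:
                out.append(dictionary[w])    # emit the code for the longest known prefix
                dictionary[wc] = len(dictionary)
                w = ch
        if w:
            out.append(dictionary[w])
        return out

    def lzw_decode(codes):
        # Rebuild exactly the same dictionary on the fly, as the encoder did.
        dictionary = {i: chr(i) for i in range(256)}
        w = dictionary[codes[0]]
        out = [w]
        for code in codes[1:]:
            entry = dictionary[code] if code in dictionary else w + w[0]   # not-yet-defined code case
            out.append(entry)
            dictionary[len(dictionary)] = w + entry[0]
            w = entry
        return "".join(out)

    msg = "ABABABABBBABABAA"
    assert lzw_decode(lzw_encode(msg)) == msg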