Data Compression
Lecture 2
Arithmetic Code
Alexander Kolesnikov
Arithmetic code
Alphabet extension (blocking symbols) can improve coding
efficiency.
How about treating the entire sequence as one symbol?
Not practical with Huffman coding
Arithmetic coding allows you to do precisely this
Basic idea: map each data sequence to a sub-interval of [0,1)
whose length equals the probability of the corresponding
sequence.
1) Huffman coder:      H ≤ R ≤ H + 1 bit/(symbol, pel)
2) Arithmetic coder:   H ≤ R ≤ H + 1 bit/message (!)
Arithmetic code: History
Rissanen [1976]: arithmetic code
Pasco [1976]: arithmetic code
Arithmetic code: Algorithm (1)
0) Start by defining the current interval as [0,1).
1) REPEAT for each symbol s in the input stream
a) Divide the current interval [L, H) into subintervals
whose sizes are proportional to the symbols'
probabilities.
b) Select the subinterval [L, H) for the symbol s
and define it as the new current interval
2) When the entire input stream has been processed,
the output is any number V that uniquely
identifies the final interval [L, H).
Arithmetic code: Algorithm (2)
(Figure: graphical example of the interval subdivision; value 0.70)
Arithmetic code: Algorithm (3)
Probabilities: p1, p2, …, pN.
Cumulants: C1 = 0; C2 = C1 + p1 = p1; C3 = C2 + p2 = p1 + p2; …;
CN = p1 + p2 + … + pN−1; CN+1 = 1.
0) Current interval [L, H) = [0.0, 1.0);
1) REPEAT for each symbol si in the input stream:
   r ← H − L;
   H ← L + r*C(si+1);
   L ← L + r*C(si);
2) UNTIL the entire input stream has been processed.
The output code V is any number that uniquely identifies
the final interval [L, H).
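To make the update rule concrete, here is a minimal Python sketch of the encoder (my own illustration, not from the slides). It uses plain floating point; a practical coder works with scaled integers and renormalizes the interval to avoid precision loss.

    def arithmetic_encode(message, prob):
        """Return the final interval [L, H) for the message.

        prob maps each symbol to its probability; the cumulative bounds
        C(si), C(si+1) are derived from it in a fixed symbol order.
        """
        cum = {}
        total = 0.0
        for s, p in prob.items():      # build [C(si), C(si+1)) per symbol
            cum[s] = (total, total + p)
            total += p

        L, H = 0.0, 1.0
        for s in message:
            r = H - L                  # width of the current interval
            c_lo, c_hi = cum[s]
            H = L + r * c_hi           # shrink the upper bound first,
            L = L + r * c_lo           # then the lower one, using the saved width r
        return L, H

    # Model of Example 1; the insertion order reproduces the intervals below.
    prob = {'_': 0.1, 'M': 0.1, 'I': 0.2, 'W': 0.1, 'S': 0.5}
    L, H = arithmetic_encode('SWISS_MISS', prob)
    # L, H ≈ 0.71753375, 0.71753500 (up to floating-point rounding)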
Example 1: Statistics
Message: 'SWISS_MISS'
Char   Freq   Prob          [C(si), C(si+1))
S      5      5/10 = 0.5    [0.5, 1.0)
W      1      1/10 = 0.1    [0.4, 0.5)
I      2      2/10 = 0.2    [0.2, 0.4)
M      1      1/10 = 0.1    [0.1, 0.2)
_      1      1/10 = 0.1    [0.0, 0.1)
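The Prob and [C(si), C(si+1)) columns follow directly from the symbol counts. A short Python sketch of that bookkeeping (the symbol order is a free choice; here it matches the table above):

    from collections import Counter

    msg = 'SWISS_MISS'
    freq = Counter(msg)                 # counts: S=5, I=2, W=1, M=1, _=1

    table, total = {}, 0.0
    for s in ['_', 'M', 'I', 'W', 'S']:
        p = freq[s] / len(msg)          # probability p(s)
        table[s] = (total, total + p)   # cumulative interval [C(si), C(si+1))
        total += p
    # table ≈ {'_': (0.0, 0.1), 'M': (0.1, 0.2), 'I': (0.2, 0.4),
    #          'W': (0.4, 0.5), 'S': (0.5, 1.0)}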
Example 1: Encoding
Char   Prob   [C(si), C(si+1))
S      0.5    [0.5, 1.0)
W      0.1    [0.4, 0.5)
I      0.2    [0.2, 0.4)
M      0.1    [0.1, 0.2)
_      0.1    [0.0, 0.1)
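Step by step, the interval narrows as follows (a worked reconstruction of the encoding figure; the last line is the interval used on the following slides):

    after S: [0.5,        1.0)
    after W: [0.70,       0.75)
    after I: [0.71,       0.72)
    after S: [0.715,      0.720)
    after S: [0.7175,     0.7200)
    after _: [0.71750,    0.71775)
    after M: [0.717525,   0.717550)
    after I: [0.7175300,  0.7175350)
    after S: [0.71753250, 0.71753500)
    after S: [0.71753375, 0.71753500)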
Example 1: Decoding
V ∈ [0.71753375, 0.71753500)
Char   Prob   [C(si), C(si+1))
S      0.5    [0.5, 1.0)
W      0.1    [0.4, 0.5)
I      0.2    [0.2, 0.4)
M      0.1    [0.1, 0.2)
_      0.1    [0.0, 0.1)
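A matching decoder sketch, again in illustrative floating-point Python (not from the slides). Note that the decoder must know the number of symbols n, or the model must include an end-of-message symbol.

    def arithmetic_decode(V, n, prob):
        """Recover n symbols from the code value V using the same model."""
        cum = {}
        total = 0.0
        for s, p in prob.items():       # same cumulative table as the encoder
            cum[s] = (total, total + p)
            total += p

        out = []
        L, H = 0.0, 1.0
        for _ in range(n):
            r = H - L
            for s, (c_lo, c_hi) in cum.items():
                # pick the symbol whose subinterval of [L, H) contains V
                if L + r * c_lo <= V < L + r * c_hi:
                    out.append(s)
                    L, H = L + r * c_lo, L + r * c_hi
                    break
        return ''.join(out)

    prob = {'_': 0.1, 'M': 0.1, 'I': 0.2, 'W': 0.1, 'S': 0.5}
    # arithmetic_decode(0.71753407, 10, prob) == 'SWISS_MISS'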
Example 1: Compression?
V ∈ [0.71753375, 0.71753500)
• How many bits do we need to encode a number V
in the final interval [L, H)?
(Figure: dyadic subdivision of [0,1): 1 bit → 2 intervals, 2 bits → 4,
3 bits → 8, 4 bits → 16 intervals)
m = 4 bits: 16 = 2^4 intervals of size ∆ = 1/16.
• The number of bits m needed to represent a value in an interval
of size ∆: m = −log2(∆) bits.
Example 1: Compression (1)
V ∈ [L, H) = [0.71753375, 0.71753500)
• Interval size (range) r: r = p(s1)*p(s2)*…*p(sn) = ∏ p(si), i = 1…n
r = 0.5*0.1*0.2*0.5*0.5*0.1*0.1*0.2*0.5*0.5 = 0.00000125
• The number of bits to represent a value in the
interval [L, H) = [L, L+r) of size r:
m = −log2 r = −∑ log2 p(si) = n·H, the entropy of the whole message
m = −log2(0.00000125) = 19.6 → 20 bits
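The same computation as a quick Python check (math.prod needs Python 3.8+; the probabilities are listed in message order):

    import math

    p = [0.5, 0.1, 0.2, 0.5, 0.5, 0.1, 0.1, 0.2, 0.5, 0.5]   # S W I S S _ M I S S
    r = math.prod(p)                  # 1.25e-06, the final interval width
    m = math.ceil(-math.log2(r))      # ceil(19.6) = 20 bits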
Example 1: Compression (2)
• Entropy = 1.96 bits/char
• Arithmetic coder:
a) Codeword V: L ≤ V < H
V = (0.71753407…)10 = (0.10110111101100000101)2   (20 bits)
0.71753375 < 0.71753407… < 0.71753500
b) Codelength m:
m = −log2(r) = −log2(0.00000125) = 19.6 → 20 bits
c) Bitrate: R=20 bits/10 chars = 2.0 bits/char
• Huffman coder: (1+3+2+1+1+4+4+2+1+1)/10 = 20/10 = 2.0 bits/char
Properties of arithmetic code
In practice, for images, arithmetic coding gives a 15-30%
improvement in compression ratio over a simple Huffman coder;
its computational complexity, however, is 50-300% higher.
Exercise
Encode the message 'BE_A_BBE' with an arithmetic coder.