Design of integer motion estimator of HEVC for asymmetric motion-partitioning mode and 4K-UHD

J. Byun, Y. Jung and J. Kim

Fig. 2 Data flow of search area registers
(registers p00_00 to p64_63 loaded with SRAM data; scan directions a, b and c)

In Fig. 2, the solid line indicates the data flow in direction (a), and the two types of dashed lines show the data flow in direction (b) or (c). The grey registers are the registers added on the bottom line. Because they read the data beforehand, these registers reduce the read time in direction (b) to only one clock cycle.


Introduction: To provide a higher compression ratio than previous standards, the inter-prediction of high-efficiency video coding (HEVC) uses a 64 × 64 basic unit, called the coding tree unit (CTU), the recursive quad-tree coding unit structure and the asymmetric motion-partitioning (AMP) mode [1, 2]. These features allow more flexible block partitioning than previous standards do, but they make motion-estimator hardware difficult to implement. Previous motion-estimator system structures are not suitable for supporting these features [3–5]. Therefore, HEVC requires a motion-estimator structure that differs from those of previous standards.


A design for an integer motion estimator of high-efficiency video coding (HEVC) is presented. To achieve a high compression ratio, HEVC supports the 64 × 64 coding tree unit, the recursive quad-tree coding unit structure and the asymmetric motion-partitioning mode. These features require an integer-motion-estimation structure that is more complex than that of H.264/AVC. New structures for a memory read controller and a sum of absolute difference (SAD) summation block are proposed. The new memory read controller reduces the internal memory read time, and the new SAD summation block structure supports the recursive quad-tree coding unit structure and the asymmetric motion-partitioning mode. The proposed design is implemented in Verilog HDL and synthesised using 65 nm CMOS technology. The gate count is 3.56 M, and the internal static random access memory is about 20 kbyte. The operation frequency is 250 MHz when a 4K-ultra high definition (UHD) (3840 × 2160P at 30 Hz) video is encoded.


Top-level structure: Our system consists of search area memories, a current memory, 256 process elements (PEs), a sum of absolute difference (SAD) summation block, a cost block and a comparison tree. The search area memories and the current memory store the pixel values of the reference frame and the current coding unit, respectively. Each PE calculates the SAD value of one 4 × 4 block, so the 256 PEs together cover one 64 × 64 CTU. The SAD summation block calculates the SAD values of the various block sizes from the PE results. The cost block computes the cost values of the variously sized blocks, and the comparison tree block selects the best mode, i.e. the one with the smallest cost value. The basic unit of HEVC is 16 times larger than that of H.264/AVC (64 × 64 versus 16 × 16 pixels), and HEVC uses the recursive quad-tree coding unit structure and the AMP mode. For these reasons, new structures for the memory read controller and the SAD summation block are required.
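The PE array can be modelled in software to show how 256 independent 4 × 4 SADs tile one 64 × 64 CTU. The sketch below is a minimal behavioural model, not the authors' hardware. The names `candidate`, `pe_sad_4x4` and `pe_array` are hypothetical. It assumes a 127 × 127 search area, so that offsets 0 to 63 in each direction give the 64 × 64 = 4096 candidate positions.

```python
from typing import List

Block = List[List[int]]

CTU = 64            # CTU width/height in pixels
SUB = 4             # each PE handles one 4x4 block
GRID = CTU // SUB   # 16 x 16 = 256 PEs


def candidate(search: Block, dy: int, dx: int) -> Block:
    """64x64 reference block at offset (dy, dx) of the 127x127 search area."""
    return [row[dx:dx + CTU] for row in search[dy:dy + CTU]]


def pe_sad_4x4(cur: Block, ref: Block, by: int, bx: int) -> int:
    """One PE: SAD of the 4x4 block whose top-left pixel is (4*by, 4*bx)."""
    return sum(abs(cur[y][x] - ref[y][x])
               for y in range(by * SUB, (by + 1) * SUB)
               for x in range(bx * SUB, (bx + 1) * SUB))


def pe_array(cur: Block, ref: Block) -> List[List[int]]:
    """All 256 PEs for one candidate position: a 16x16 grid of 4x4 SADs."""
    return [[pe_sad_4x4(cur, ref, by, bx) for bx in range(GRID)]
            for by in range(GRID)]


# Usage: a synthetic ramp image. Shifting the candidate one pixel to the
# right changes every pixel by exactly 1, so each 4x4 SAD is 16.
search = [[x + y for x in range(127)] for y in range(127)]
cur = candidate(search, 0, 0)
grid = pe_array(cur, candidate(search, 0, 1))
print(grid[0][0], len(grid) * len(grid[0]))  # 16 256
```

In hardware, all 256 SADs for a candidate are produced in the same clock cycle. The nested loops here only stand in for that parallelism.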

Fig. 1 Scan order of search area memories
(64 × 64 processing area inside a 127 × 127 search area; scan directions a, b and c)

(Fig. 3c blocks: N = 4 SAD sums 0 to 15, N = 8 SAD sums 0 to 3, N = 16 SAD sum and N = 32 SAD sum)

Memory read controller: Fig. 1 shows the scan order of the processing area, which is the region of the search area being calculated at a given time. Because the search area memories consist of line memories, each line memory of the search area supplies only 1 byte per clock cycle. This is not a problem when the scan proceeds in direction (a) or (c). However, when it proceeds in direction (b), the last line memory of the search area has to supply 64 bytes in one clock cycle. With a memory bit width of 128 bits, this read takes four clock cycles instead of one. The result is 388 800 unnecessary clock cycles in one 4K-ultra high definition (UHD) (3840 × 2160P at 30 Hz) frame. To solve this problem, we added registers on the bottom line. Fig. 2 shows the data flow in the search area registers.
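The 388 800-cycle figure can be reproduced with simple arithmetic under two assumptions that are ours, not the Letter's. The first is that there are 64 direction-(b) steps per CTU. The second is that the frame counts as 3840 × 2160 / 64² = 2025 CTU-equivalents, with the partial bottom CTU row counted by area. Each step needs 64 / 16 = 4 reads of a 128-bit word, i.e. three cycles more than the single read the other directions need. The constant names below are illustrative.

```python
FRAME_W, FRAME_H = 3840, 2160   # 4K-UHD frame
CTU = 64
MEM_WIDTH_BITS = 128            # search area memory bit width
BYTES_PER_B_STEP = 64           # bytes needed in one clock cycle in direction (b)
B_STEPS_PER_CTU = 64            # assumption: direction-(b) steps per CTU


def direction_b_overhead() -> int:
    bytes_per_read = MEM_WIDTH_BITS // 8          # 16 bytes per memory access
    reads = BYTES_PER_B_STEP // bytes_per_read    # 4 accesses instead of 1
    extra = reads - 1                             # 3 unnecessary cycles per step
    ctus = FRAME_W * FRAME_H // (CTU * CTU)       # 2025 CTU-equivalents (area-based)
    return extra * B_STEPS_PER_CTU * ctus


print(direction_b_overhead())  # 388800
```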

Fig. 3 Structure of SAD summation block
a N = 4, 8 or 16
b N = 32
c Hierarchical structure of SAD summation block

ELECTRONICS LETTERS 29th August 2013 Vol. 49 No. 18

SAD summation block: The SAD summation block computes SAD values for various block sizes from the 256 4 × 4 SAD values calculated by the PEs. H.264/AVC uses only seven block sizes. However, because HEVC uses the recursive quad-tree coding unit structure and the AMP mode, it needs 27 block sizes [1, 2]. These block sizes require a SAD summation block whose structure differs from that of H.264/AVC. Fig. 3a shows the structure of the SAD summation block when N is 4, 8 or 16, and Fig. 3b shows the structure when N is 32. Since HEVC uses the recursive quad-tree coding unit structure, 16 structures are needed for N = 4, four for N = 8, and only one each for N = 16 and N = 32. As shown in Fig. 3c, these structures are connected hierarchically. For N = 32, the process of the SAD summation block is similar to that of H.264/AVC. For N = 4, 8 or 16, however, the bold lines in Fig. 3a indicate the AMP mode. These parts efficiently compute the SAD values of the AMP partitions from smaller SAD values. The proposed SAD summation block thus obtains the SAD values of every HEVC inter-prediction mode and depth by adding small neighbouring SADs.
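Every HEVC inter partition, including the AMP ones, is aligned to the 4 × 4 grid, so its SAD is just a sum of neighbouring PE results. The sketch below makes this concrete. It is a behavioural illustration, not the structure of Fig. 3. The function names are hypothetical, the partition labels follow the HEVC part_mode names, and the minimum CU size is assumed to be 8 × 8. HEVC permits AMP only for CUs larger than the minimum CU size.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # 16x16 grid of 4x4 SADs, one per PE
UNIT = 4                # side of one PE block in pixels


def region_sad(grid: Grid, y: int, x: int, h: int, w: int) -> int:
    """SAD of an h x w pixel rectangle at (y, x), built from 4x4 SADs."""
    if any(v % UNIT for v in (y, x, h, w)):
        raise ValueError("rectangle must be aligned to the 4x4 grid")
    return sum(grid[r][c]
               for r in range(y // UNIT, (y + h) // UNIT)
               for c in range(x // UNIT, (x + w) // UNIT))


def partition_sads(grid: Grid, y: int, x: int, size: int,
                   min_cu: int = 8) -> Dict[str, Tuple[int, ...]]:
    """SADs of the inter partitions of one size x size CU at (y, x)."""
    half, quarter = size // 2, size // 4

    def s(dy: int, dx: int, h: int, w: int) -> int:
        return region_sad(grid, y + dy, x + dx, h, w)

    modes = {
        "2Nx2N": (s(0, 0, size, size),),
        "2NxN": (s(0, 0, half, size), s(half, 0, half, size)),
        "Nx2N": (s(0, 0, size, half), s(0, half, size, half)),
    }
    if size > min_cu:  # AMP only above the (assumed 8x8) minimum CU size
        rest = size - quarter
        modes["2NxnU"] = (s(0, 0, quarter, size), s(quarter, 0, rest, size))
        modes["2NxnD"] = (s(0, 0, rest, size), s(rest, 0, quarter, size))
        modes["nLx2N"] = (s(0, 0, size, quarter), s(0, quarter, size, rest))
        modes["nRx2N"] = (s(0, 0, size, rest), s(0, rest, size, quarter))
    return modes


grid = [[1] * 16 for _ in range(16)]             # every 4x4 SAD equal to 1
print(partition_sads(grid, 0, 0, 16)["2NxnU"])  # (4, 12)
```

The hardware reuses intermediate sums hierarchically instead of re-adding cells for each partition. The two partition SADs of every mode must still add up to the 2Nx2N SAD, which is a convenient consistency check.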
Cost block and comparison tree: The cost block calculates the cost values of every prediction mode and depth from the SAD values and the motion vectors. The comparison tree then determines the final prediction mode and depth of the CTU by comparing the costs produced by the cost block.
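The Letter does not give the cost function. A common choice in encoders is J = SAD + λ·R, where R is the rate of the motion vector. The sketch below assumes that form, with several simplifications:

- It uses the signed Exp-Golomb code length of the motion vector difference as a stand-in for R. HEVC's actual MVD binarization differs.
- λ = 4 and the predictor `mvp` are hypothetical values.
- `comparison_tree` models the comparison tree as a pairwise-minimum reduction with one tree level per loop pass.

```python
from typing import List, Tuple

Candidate = Tuple[int, str]   # (cost, mode/depth label)


def se_golomb_bits(v: int) -> int:
    """Bit length of the signed Exp-Golomb code for v (rate proxy only)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def cost(sad: int, mv: Tuple[int, int], mvp: Tuple[int, int], lam: int = 4) -> int:
    """Assumed cost J = SAD + lam * bits(mv - mvp); lam and mvp are hypothetical."""
    bits = se_golomb_bits(mv[0] - mvp[0]) + se_golomb_bits(mv[1] - mvp[1])
    return sad + lam * bits


def comparison_tree(cands: List[Candidate]) -> Candidate:
    """Pairwise-minimum reduction: each pass is one level of the tree."""
    level = list(cands)
    while len(level) > 1:
        nxt = [min(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])   # an unpaired entry passes to the next level
        level = nxt
    return level[0]


print(cost(100, (0, 0), (0, 0)))                              # 108
print(comparison_tree([(50, "a"), (30, "b"), (40, "c")]))    # (30, 'b')
```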

Conclusion: This Letter presents a motion-estimator structure that effectively supports the recursive quad-tree coding unit and the AMP mode and that reduces the number of memory read cycles. The designed integer-motion-estimator system is implemented in 65 nm CMOS technology. The gate count is 3.56 M with 20.23 kbyte of internal SRAM. The system can encode 4K-UHD video in real time at a clock speed of 250 MHz.
Acknowledgment: This work was supported by the IT R&D program
of MOTIE/KEIT (10035389) research on high speed and low power
wireless communication SoC for high resolution video information
mining.
© The Institution of Engineering and Technology 2013
24 March 2013

doi: 10.1049/el.2013.0936
J. Byun, Y. Jung and J. Kim (School of Electrical and Electronic
Engineering, Yonsei University, Seoul, Republic of Korea)
E-mail:

Pipeline process: Fig. 4 shows the pipeline process of the proposed system. Because registers are added on the bottom line, the memory read stage uses only one clock cycle, and no additional clock cycles are required in scan direction (b). As a result, the proposed integer-motion-estimator system uses 4105 clock cycles to perform the integer motion estimation of one CTU.
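The 4105-cycle total matches a linear pipeline that accepts one new search position per clock. We read the stage latencies from the clock labels in Fig. 4: 1 clock each for memory read and PE, 2 clocks each for SAD summation and cost block, and 4 clocks for the comparison tree. That mapping is our interpretation of the figure. The sketch simply adds the fill latency, the remaining 4095 positions and the comparison tree.

```python
# Stage latencies as labelled in Fig. 4 (our reading of the figure).
STAGE_LATENCY = {"memory read": 1, "PE": 1, "SAD summation": 2, "cost block": 2}
COMPARISON_TREE = 4          # clocks after the last cost-block result
SEARCH_POSITIONS = 64 * 64   # one new candidate enters the pipeline per clock


def ctu_cycles(n: int = SEARCH_POSITIONS) -> int:
    fill = sum(STAGE_LATENCY.values())   # latency of the first candidate: 6
    return fill + (n - 1) + COMPARISON_TREE


print(ctu_cycles())  # 4105
```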
Fig. 4 Pipeline process of proposed system
(memory read, PE, SAD summation and cost block stages for iterations 1 to 4096, followed by the comparison tree; clock labels 1, 1, 2, 2 and 4; 4105 clock cycles in total)

Synthesised results: The proposed system was implemented in Verilog HDL and synthesised using 65 nm CMOS technology. The gate count is 3.56 M and the internal static random access memory (SRAM) is 20 225 bytes. The operation frequency is 250 MHz when a 4K-UHD-sized video is encoded. Table 1 compares the proposed system with the previous H.264/AVC integer-motion-estimation system [5]. The proposed system supports a greater variety of block sizes and a higher video resolution (4K-UHD) than the previous one does.
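The 250 MHz real-time claim can be checked against the 4105 cycles per CTU. The check multiplies CTUs per frame, cycles per CTU and 30 frames/s. The result depends on how the partial bottom CTU row of a 2160-line frame is counted:

- Counted by area (2025 CTU-equivalents), about 249.4 MHz is needed.
- Counted as 34 full CTU rows (2040 CTUs), about 251.2 MHz is needed.

The helper below and its `whole_ctus` flag are illustrative only.

```python
import math


def required_mhz(cycles_per_ctu: int = 4105, width: int = 3840, height: int = 2160,
                 fps: int = 30, ctu: int = 64, whole_ctus: bool = False) -> float:
    if whole_ctus:   # count the partial bottom CTU row as full CTUs: 60 * 34
        ctus = math.ceil(width / ctu) * math.ceil(height / ctu)
    else:            # area-based count: 8 294 400 / 4096 = 2025
        ctus = width * height / (ctu * ctu)
    return ctus * cycles_per_ctu * fps / 1e6


print(round(required_mhz(), 2), round(required_mhz(whole_ctus=True), 2))  # 249.38 251.23
```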

References
1 Bross, B., Han, W.-J., Sullivan, G.J., Ohm, J.-R., and Wiegand, T.: ‘High
Efficiency Video Coding (HEVC) Text Specification Draft 9’, ITU-T/
ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC),
October 2012, JCTVC-K1003
2 Francois, E., Guillo, L., Ichigaya, A., and Yu, H.: ‘TE12: report on AMP
evaluation’, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding
(JCT-VC), October 2010, JCTVC-C030
3 Kang, J.S., Lee, Y.T., and Jeon, J.W.: ‘Motion estimator with adaptive
reduction of search points’, Electron. Lett., 2003, 39, (22),
pp. 1584–1586
4 Hsia, S.-C., and Hong, P.-Y.: ‘Very large scale integration (VLSI)
implementation of low-complexity variable block size motion estimation
for H.264/AVC coding’, IET Circuits Devices Syst., 2010, 4, (5),
pp. 414–424
5 Kao, C.Y., and Lin, Y.L.: ‘A memory-efficient and highly parallel architecture for variable block size integer motion estimation in H.264/AVC’,
IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2010, 18, (6),
pp. 866–874

Table 1: Comparison of proposed system with previous H.264/AVC integer-motion-estimation system [5]

Parameter | [5] | Proposed
Video standard | H.264/AVC | HEVC
Process | 0.18 μm | 65 nm
Gate count (SRAM) | 1.45 M (2.97 kb) | 3.56 M (20.23 kb)
Block size | 16 × 16 to 4 × 4 (seven kinds, without AMP) | 64 × 64 to 8 × 4 (27 kinds, with AMP)
Search range | 64 × 64 | 64 × 64
Number of reference frames | 2 | 1
Operation frequency | 130 MHz (FHD) | 250 MHz (4K-UHD)
