Image and Video Compression P118

20

© 2000 by CRC Press LLC

MPEG System — Video, Audio,
and Data Multiplexing

In this chapter, we present the methods and standards that specify how to multiplex and synchronize
the MPEG-coded video, audio, and other data into a single bitstream or multiple bitstreams for
storage and transmission.

20.1 INTRODUCTION

ISO/IEC MPEG has completed work on the ISO/IEC 11172 and 13818 standards known as MPEG-1
and MPEG-2, respectively, which deal with the coding of digital audio and video signals. Currently,
ISO/IEC is working on ISO/IEC 14496 known as MPEG-4 that is object-based generic coding for
multimedia applications. As mentioned in the previous chapters, the MPEG-1, 2, and 4 standards
are designed as generic standards and as such are suitable for use in a wide range of audiovisual
applications. The coding part of each standard converts the digital video, audio, and data signals into
compressed formats represented as binary bits. The task of the MPEG system part is to multiplex
and synchronize the coded audio, video, and data into a single bitstream or multiple bitstreams.
In other words, the compressed video, audio, and data are each first represented in binary form
as bitstreams, and the system layer then combines these bitstreams. For this purpose, the system
part of the standard has to address several issues:
• Distinguishing different data, such as audio, video, or other data;
• Allocating bandwidth during muxing;
• Reallocating or decoding the different data during demuxing;
• Protecting the bitstreams in error-prone media and detecting the errors;
• Dynamically multiplexing several bitstreams.

Additional requirements for the system concern extensibility:
• New service extensions should be possible;
• Existing decoders should recognize and ignore data they cannot understand;
• The syntax should have extension capacity.
It should also be noted that all system-timing signals are included in the bitstream. This is a
major difference between digital systems and traditional analog systems, in which the timing signals
are transmitted separately. In this chapter, we introduce the concept of systems and give detailed
explanations of existing standards such as MPEG-2. However, we will not go through the standards
page by page to explain the syntax. Instead, we pay more attention to the core parts of the standard
and to the parts that often cause confusion during implementation; one of the key issues is system
timing. For MPEG-4, we present the current status of the system part of the standard.

20.2 MPEG-2 SYSTEM

The MPEG-2 system standard is also referred to as ITU-T Rec. H.222.0/ISO/IEC 13818-1
(ISO/IEC, 1996). The ISO document gives a very detailed description of this standard. A simpliﬁed
overview of this system is shown in Figure 20.1.
The MPEG-2 system coding is speciﬁed in two forms: the transport stream and the program
stream. Each is optimized for a different set of applications. The audio and video data are ﬁrst
encoded by an audio and a video encoder, respectively. The coded data are the compressed
bitstreams, which follow the syntax rules specified by the video-coding standard 13818-2 and the
audio-coding standard 13818-3. The compressed audio and video bitstreams are then packetized into
packetized elementary streams (PES). The video PES and audio PES are multiplexed by system coding
into a transport stream or a program stream according to the requirements of the application.
The system coding provides a coding syntax that is necessary and sufficient to synchronize
the decoding and presentation of the video and audio information; at the same time it has to
ensure that data buffers in the decoders neither overflow nor underflow. Of course, buffer regulation
is also handled by the buffer control or rate control mechanism in the encoder. The video, audio,
and data information are multiplexed according to the system syntax by inserting time stamps for
decoding, presenting, and delivering the coded audio, video, and other data. It should be noted that
both the program stream and the transport stream use packet-oriented multiplexing. Before we
explain these streams, we first give a set of parameter definitions used in the system documents.
Then, we describe the overall picture regarding the basic multiplexing approach for single video
and audio elementary streams.

20.2.1 Major Technical Definitions in the MPEG-2 System Document

In this section, the technical deﬁnitions that are often used in the system document are provided.
First, the major packet- and stream-related deﬁnitions are given.

Access unit: A coded representation of a presentation unit. In the case of audio, an access
unit is the coded representation of an audio frame. In the case of video, an access unit includes
all the coded data for a picture, and any stuffing that follows it, up to but not including
the start of the next access unit. In other words, the access unit begins with the first byte
of the first start code. For the last picture before a sequence end code, all bytes between the
last byte of the coded picture and the sequence end code belong to the access unit.

DSM-CC: Digital storage media command and control.

Elementary stream (ES): A generic term for one of the coded video, coded audio, or other
coded bitstreams in PES packets. One elementary stream is carried in a sequence of PES
packets with one and only one stream identification. This implies that one elementary
stream can only carry the same type of data, such as audio or video.

FIGURE 20.1 Simplified overview of system layer scope. (From ISO/IEC 13818-1, 1996. With permission.)

Packet: A packet consists of a header followed by a number of contiguous bytes from an
elementary data stream.

Packet identification (PID): A unique integer value used to associate elementary streams
of a program in a single- or multiprogram transport stream. It is a 13-bit field, which
indicates the type of data stored in the packet payload.

PES packet: The data structure used to carry elementary stream data. It contains a PES
packet header followed by a PES packet payload.

PES stream: A PES stream consists of PES packets, all of whose payloads consist of data
from a single elementary stream, and all of which have the same stream identification.
Specific semantic constraints apply.

PES packet header: The leading fields in the PES packet up to and not including the PES
packet data byte fields. Its function will be explained in the section on syntax description.

System target decoder (STD): A hypothetical reference model of a decoding process used
to describe the semantics of the MPEG-2 system-multiplexed bitstream.

Program-specific information (PSI): PSI includes normative data that is used by decoders for
demultiplexing the programs in the transport stream. One case of PSI, the nonmandatory
network information table, is privately defined.

System header: The leading fields of program stream packets.

Transport stream packet header: The leading fields of transport stream packets.

The following definitions are related to timing information:

Time stamp: A term that indicates the time of a specific action such as the arrival of a byte
or the presentation of a presentation unit.

System clock reference (SCR): A time stamp in the program stream from which decoder
timing is derived.

Elementary stream clock reference (ESCR): A time stamp in the PES stream from which
decoders of the PES stream may derive timing information.

Decoding time stamp (DTS): A time stamp that may be present in a PES packet header,
used to indicate the time when an access unit is decoded in the system target decoder.

Program clock reference (PCR): A time stamp in the transport stream from which decoder
timing is derived.

Presentation time stamp (PTS): A time stamp that may be present in the PES packet
header, used to indicate the time that a presentation unit is presented in the system target
decoder.
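To make the relation between DTS and PTS concrete, here is a minimal sketch. It assumes that both time stamps count ticks of a 90 kHz clock, which this section does not state, and it ignores counter wraparound. The function names are hypothetical and chosen only for illustration.

```python
# Assumption (not stated in this section): PTS and DTS count ticks of a 90 kHz clock.
TICKS_PER_SECOND = 90_000


def ticks_to_seconds(ticks: int) -> float:
    """Convert a time stamp expressed in clock ticks to seconds."""
    return ticks / TICKS_PER_SECOND


def presentation_delay(dts: int, pts: int) -> float:
    """Seconds between decoding an access unit (DTS) and presenting it (PTS).

    Wraparound of the time stamp counter is deliberately ignored in this sketch.
    """
    if pts < dts:
        raise ValueError("an access unit cannot be presented before it is decoded")
    return ticks_to_seconds(pts - dts)


if __name__ == "__main__":
    # A picture decoded at tick 0 and presented 3600 ticks later.
    print(presentation_delay(dts=0, pts=3600))
```

A nonzero delay of this kind typically arises for video pictures that must be decoded before pictures that are displayed earlier; for audio the two time stamps normally coincide.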

20.2.2 Transport Streams

The transport stream is a stream deﬁnition that is designed for communicating or storing one or
more programs of coded video, audio, and other kinds of data in lossy or noisy environments where
signiﬁcant errors may occur. A transport stream combines one or more programs with one or more
time bases into a single stream. However, there are some difﬁculties with constructing and delivering
a transport stream containing multiple programs with independent time bases such that the overall
bit rate is variable. As in other standards, the transport stream may be constructed by any method
that results in a valid stream. In other words, the standards just specify the system coding syntax.
In this way, all compliant decoders can decode bitstreams generated according to the standard
syntax. However, the standard does not specify how the encoder generates the bitstreams. It is
possible to generate transport streams containing one or more programs from elementary coded
data streams, from program streams, or from other transport streams, which may themselves contain
one or more programs. An important feature of the transport stream is that it is designed so that
the following operations, several of which are forms of transcoding, are possible with minimum
effort:
• Retrieve the coded data from one program within the transport stream, decode it, and
present the decoded results. In this operation, the transport stream is directly demulti-
plexed and decoded. The data in the transport stream are constructed in two layers: a
system layer and a compression layer. The system decoder decodes the transport streams
and demultiplexes them to the compressed video and audio streams that are further
decoded to the video and audio data by the video decoder and the audio decoder,
respectively. It should be noted that nonaudio/video data is also allowed. The function
of the transport decoder includes demultiplexing, depacketization, and other functions
such as error detection, which will be explained in detail later. This procedure is shown
in Figure 20.2.

• Extract the transport stream packets from one program within the transport stream and
produce as output a new transport stream that contains only that one program. This
operation can be seen as system-layer transcoding that converts a transport stream
containing multiple programs to a transport stream containing only a single program. In
this case, the remultiplexing operation may need to correct the PCR values to account
for changes in the PCR locations in the bitstream.
• Extract the transport stream packets of one or more programs from one or more transport
streams and produce a new transport stream as output. This is another kind of
transcoding that converts selected programs of one transport stream to a different one.
• Extract the contents of one program from the transport stream and produce a
program stream as output. This is a transcoding that converts the transport stream program to a
program stream for certain applications.
• Convert a program stream to a transport stream that can be used in a lossy communication
environment.
To answer the question of how to deﬁne the transport stream and then make the above
transcoding simpler and more efﬁcient, we will begin by describing the technical detail of the
systems speciﬁcation in the following section.
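As a rough illustration of the second operation in the list above, the following sketch keeps only the packets whose PID belongs to one program. It assumes 188-byte packets that start with the sync byte 0x47 and carry the 13-bit PID in the low 5 bits of the second byte and all of the third byte, as described in the syntax section of this chapter. The function names are hypothetical, the caller is assumed to know the PIDs of the wanted program already, and the PCR correction that a real remultiplexer may need is not attempted.

```python
from typing import Iterable

TS_PACKET_SIZE = 188  # bytes per transport stream packet
SYNC_BYTE = 0x47      # fixed value of the first header byte


def packet_pid(packet: bytes) -> int:
    """Return the 13-bit PID of one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport stream packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]


def extract_program(stream: bytes, wanted_pids: Iterable[int]) -> bytes:
    """Copy only the packets whose PID is in wanted_pids (hypothetical helper).

    PCR values are copied unchanged; correcting them is outside this sketch.
    """
    wanted = set(wanted_pids)
    out = bytearray()
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet_pid(packet) in wanted:
            out += packet
    return bytes(out)
```

Because the selection works on whole packets, the compressed audio and video data are never decoded, which is what makes this kind of system-layer transcoding cheap.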

20.2.2.1 Structure of Transport Streams

FIGURE 20.2 Example of transport demultiplexing and decoding. (From ISO/IEC 13818-1, 1996. With permission.)

As described earlier, the task of the transport stream coding layer is to allow one or more programs
to be combined into a single stream. Data from each elementary stream are multiplexed together
with timing information, which is used for synchronization and presentation of the elementary
streams during decoding. Therefore, the transport stream consists of one or more programs, each made
up of audio, video, and data elementary stream access units. The transport stream structure is a layered
structure. All the bits in the transport stream are packetized into transport packets. The size of
a transport packet is 188 bytes, of which 4 bytes are used as the transport stream
packet header. In the first layer, the header of the transport packet indicates whether or not the
transport packet has an adaptation field. If there is no adaptation field, the transport payload may
consist of only PES packets or of both PES packets and PSI packets. Figure 20.3 illustrates
the case of PES packets only. If the transport stream carries both PES and PSI packets,
then the structure shown in Figure 20.4 results. If the transport stream
packet header indicates that the transport stream packet includes an adaptation field, then the
structure is as shown in Figure 20.5.
In Figure 20.5, the appearance of each optional field depends on the flag settings. The function
of the adaptation field will be explained in the syntax section. Before we go ahead, however, we should
briefly explain the size of the transport stream packet. More specifically, why
is a packet size of 188 bytes chosen? There are several reasons. First, the transport packet
size needs to be large enough that the overhead due to the transport headers is not too significant.
Second, the size should not be so large that the packet-based error correction code becomes
inefficient. Finally, the size of 188 bytes is also compatible with the 47 bytes of data carried per
ATM cell; one transport stream packet fits exactly into four ATM cells. So the size of 188 bytes is not a
theoretical optimum but a practical compromise.
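The arithmetic behind these reasons can be checked with a few lines. This is only a sketch: the constant names are ours, and the 47 data bytes per ATM cell is taken from the text above rather than derived.

```python
TS_PACKET_SIZE = 188   # bytes per transport stream packet
TS_HEADER_SIZE = 4     # bytes of fixed transport stream packet header
ATM_DATA_BYTES = 47    # data bytes per ATM cell, as stated in the text

# Fraction of every packet spent on the fixed header.
header_overhead = TS_HEADER_SIZE / TS_PACKET_SIZE

# How many ATM cells one packet occupies, and whether any bytes are left over.
cells_per_packet, leftover_bytes = divmod(TS_PACKET_SIZE, ATM_DATA_BYTES)

if __name__ == "__main__":
    print(f"header overhead: {header_overhead:.2%}")
    print(f"ATM cells per packet: {cells_per_packet}, leftover bytes: {leftover_bytes}")
```

The fixed header costs a little over 2% of each packet, and the packet divides into four 47-byte pieces with nothing left over.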

FIGURE 20.3 Structure of transport stream containing only PES packets. (From ISO/IEC 13818-1, 1996. With permission.)

FIGURE 20.4 Structure of transport stream containing both PES packets and PSI packets.

20.2.2.2 Transport Stream Syntax

As we indicated, the transport stream is a layered structure. To explain the transport stream syntax,
we start from the transport stream packet header. Since the header is the highest layer of the
stream and is very important, we describe it in more detail. For the rest, we do not repeat the standard
document and just indicate the important parts that we think may cause some confusion for readers.
The details of the parts that are not covered here can be found in the MPEG standard document
(ISO/IEC, 1996).

FIGURE 20.5 Structure of transport stream whose header contains an adaptation field.

Transport stream packet header: This header contains four bytes that are divided into eight fields:

Syntax                          No. of bits   Mnemonic
sync_byte                       8             bslbf
transport_error_indicator       1             bslbf
payload_unit_start_indicator    1             bslbf
transport_priority              1             bslbf
PID                             13            uimsbf
transport_scrambling_control    2             bslbf
adaptation_field_control        2             bslbf
continuity_counter              4             uimsbf

The mnemonics in the above table mean:

bslbf     Bit string, left bit first
uimsbf    Unsigned integer, most significant bit first

The fields have the following meanings:
• The sync_byte is a fixed 8-bit field whose value is 0100 0111 (hexadecimal 47, decimal 71).
• The transport_error_indicator is a 1-bit flag. When it is set to 1, it indicates that at least
1 uncorrectable bit error exists in the associated transport stream packet. It will not be
reset to 0 unless the bit values in error have been corrected. This flag is useful for error
concealment, since it indicates the error location. When an error exists, either
resynchronization or another concealment method can be used.
• The payload_unit_start_indicator is a 1-bit flag whose meaning depends on whether the
transport stream packet carries PES packets or PSI data. If the packet carries PES packets, the
flag indicates that a PES header starts in this transport packet. If it contains PSI data, the flag
indicates that a PSI table starts in this transport packet.
• The transport_priority is a 1-bit flag which is used to indicate that the associated packet
is of greater priority than other packets having the same PID which do not have the flag
bit set to 1. The original idea of adding a flag to indicate the priority of packets comes
from video coding. The video elementary bitstream contains mostly bits that are converted
from DCT coefficients. The priority indicator can set a partitioning point that
divides the data into a more important part and a less important part. The important part
includes the header information and low-frequency coefficients, and the less important
part includes only the high-frequency coefficients that have less effect on the decoding
and quality of reconstructed pictures.
• PID is a 13-bit field that provides information for multiplexing and demultiplexing by
uniquely identifying which packet belongs to a particular bitstream.
• The transport_scrambling_control is a 2-bit field. The value 00 indicates that the packet is not
scrambled; the other three values (01, 10, and 11) indicate that the packet is scrambled by a
user-defined scrambling method. It should be noted that the transport packet header and
adaptation field (when it is present) should not be scrambled. In other words, only the
payload of transport packets can be scrambled.
• The adaptation_field_control is a 2-bit indicator that is used to signal whether or not
there is an adaptation field present in the transport packet. The value 00 is reserved for future use;
01 indicates no adaptation field; 10 indicates that there is only an adaptation field and
no payload. Finally, 11 indicates that there is an adaptation field followed by a payload
in the transport stream packet.
• The continuity_counter is a 4-bit counter which increases with each transport stream
packet having the same PID. Being 4 bits wide, it wraps around to 0 after reaching 15.
From the header of the transport stream packet we can obtain information about the bits that follow.
There are two possibilities: if the adaptation field control value is 10 or 11, then the bits following
the header are the adaptation field; otherwise, the bits are payload. The information contained in the
adaptation field is described as follows.
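The header layout described above can be unpacked with a few bit operations. The following is a minimal sketch rather than a complete demultiplexer: the bit positions follow from the field order and widths in the header table, while the class name `TsHeader` and the function `parse_ts_header` are hypothetical names introduced here.

```python
from dataclasses import dataclass

SYNC_BYTE = 0x47  # fixed value of sync_byte


@dataclass
class TsHeader:
    transport_error_indicator: bool
    payload_unit_start_indicator: bool
    transport_priority: bool
    pid: int
    transport_scrambling_control: int
    adaptation_field_control: int
    continuity_counter: int

    @property
    def has_adaptation_field(self) -> bool:
        # 10: adaptation field only, 11: adaptation field followed by payload
        return self.adaptation_field_control in (0b10, 0b11)

    @property
    def has_payload(self) -> bool:
        # 01: payload only, 11: adaptation field followed by payload
        return self.adaptation_field_control in (0b01, 0b11)


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the 4-byte transport stream packet header."""
    if len(packet) < 4:
        raise ValueError("need at least 4 bytes for the header")
    if packet[0] != SYNC_BYTE:
        raise ValueError("sync_byte is not 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TsHeader(
        transport_error_indicator=bool(b1 & 0x80),
        payload_unit_start_indicator=bool(b1 & 0x40),
        transport_priority=bool(b1 & 0x20),
        pid=((b1 & 0x1F) << 8) | b2,           # 5 + 8 = 13 bits
        transport_scrambling_control=(b3 >> 6) & 0x03,
        adaptation_field_control=(b3 >> 4) & 0x03,
        continuity_counter=b3 & 0x0F,
    )
```

For example, a packet beginning with the bytes 47 41 00 37 (hexadecimal) decodes to PID 0x100 with the payload unit start indicator set, an adaptation field followed by a payload, and a continuity counter of 7.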

Adaptation field: The structure of the adaptation field data is shown in Figure 20.5. The
functionality of these fields is basically related to the timing and decoding of the elementary
bitstream. Some important fields are explained below:
• The adaptation field length is an 8-bit field specifying the number of bytes immediately
following it in the adaptation field, including stuffing bytes.
• The discontinuity indicator is a 1-bit flag that, when set to 1, indicates that the discontinuity
state is true for the current transport packet. When this flag is set to 0, the
discontinuity state is false. The discontinuity indicator is used to indicate two types of
discontinuities: system time-base discontinuities and continuity-counter discontinuities.
In the first type, the transport stream packet belongs to a PID designated as a PCR-PID,
and the next PCR represents a sample of a new system time clock for the associated
program. In the second type, the transport stream packet may belong to any PID. If the
transport stream packet does not belong to a PID designated as a PCR-PID, the continuity counter may be
discontinuous with respect to the previous packet with the same PID, or when a system
time-base discontinuity occurs. For those PIDs that are not designated as PCR-PIDs, the
discontinuity indicator may be set to 1 in the next transport stream packet with the same
PID, but will not be set to 1 in three consecutive transport stream packets with the same
PID.
• The random access indicator is a 1-bit flag that indicates that the current transport stream
packet, and possibly subsequent packets with the same PID, contain some information to aid random access
at this point. Specifically, when this flag is set to 1, the next PES packet in the payload
of the transport stream packet with the current PID will contain the first byte of a video
sequence header or the first byte of an audio frame.
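The fields above can be read directly from the bytes that follow the 4-byte header. The sketch below makes one assumption that this section does not spell out: that the discontinuity indicator and the random access indicator are the two most significant bits of the byte following the adaptation field length. The function name `parse_adaptation_field` is hypothetical, and the remaining adaptation field contents are not decoded.

```python
from typing import Optional

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_adaptation_field(packet: bytes) -> Optional[dict]:
    """Return the length and two flags of the adaptation field, or None if absent."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport stream packet")

    adaptation_field_control = (packet[3] >> 4) & 0x03
    if adaptation_field_control not in (0b10, 0b11):
        return None  # 01 means payload only, 00 is reserved

    length = packet[4]  # number of bytes that follow the length field
    info = {
        "adaptation_field_length": length,
        "discontinuity_indicator": False,
        "random_access_indicator": False,
    }
    if length > 0:
        flags = packet[5]
        # Assumed bit positions: the two most significant bits of the flags byte.
        info["discontinuity_indicator"] = bool(flags & 0x80)
        info["random_access_indicator"] = bool(flags & 0x40)
    return info
```

A decoder that wants to start playback in the middle of a stream could, under these assumptions, scan packets of the chosen PID until `random_access_indicator` is true and begin decoding there.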
