
Visual Media Coding and Transmission
Visual Media Coding and Transmission
Ahmet Kondoz
Centre for Communication Systems Research, University of Surrey, UK
This edition first published 2009
© 2009 John Wiley & Sons Ltd.
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for
permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright,
Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any
form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK
Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be
available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and
product names used in this book are trade names, service marks, trademarks or registered trademarks of their
respective owners. The publisher is not associated with any product or vendor mentioned in this book. This
publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is
sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice
or other expert assistance is required, the services of a competent professional should be sought.
© 1998, © 2001, © 2002, © 2003, © 2004. 3GPP™ TSs and TRs are the property of ARIB, ATIS, CCSA, ETSI,
TTA and TTC who jointly own the copyright in them. They are subject to further modifications and are therefore
provided to you ‘as is’ for information purposes only. Further use is strictly prohibited.


Library of Congress Cataloging-in-Publication Data
Kondoz, A. M. (Ahmet M.)
Visual media coding and transmission / Ahmet Kondoz.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-74057-6 (cloth)
1. Multimedia communications. 2. Video compression. 3. Coding theory. 4. Data transmission systems. I. Title.
TK5105.15.K65 2009
621.382’1 dc22
2008047067
A catalogue record for this book is available from the British Library.
ISBN 9780470740576 (H/B)
Set in 10/12pt Times New Roman by Thomson Digital, Noida, India.
Printed in Great Britain by CPI Antony Rowe, Chippenham, England
Contents
VISNET II Researchers xiii
Preface xv
Glossary of Abbreviations xvii
1 Introduction 1
2 Video Coding Principles 7
2.1 Introduction 7
2.2 Redundancy in Video Signals 7
2.3 Fundamentals of Video Compression 8
2.3.1 Video Signal Representation and Picture Structure 8
2.3.2 Removing Spatial Redundancy 9
2.3.3 Removing Temporal Redundancy 14
2.3.4 Basic Video Codec Structure 16
2.4 Advanced Video Compression Techniques 17
2.4.1 Frame Types 17
2.4.2 MC Accuracy 19

2.4.3 MB Mode Selection 20
2.4.4 Integer Transform 21
2.4.5 Intra Prediction 22
2.4.6 Deblocking Filters 22
2.4.7 Multiple Reference Frames and Hierarchical Coding 24
2.4.8 Error-Robust Video Coding 24
2.5 Video Codec Standards 28
2.5.1 Standardization Bodies 28
2.5.2 ITU Standards 29
2.5.3 MPEG Standards 29
2.5.4 H.264/MPEG-4 AVC 31
2.6 Assessment of Video Quality 31
2.6.1 Subjective Performance Evaluation 31
2.6.2 Objective Performance Evaluation 32
2.7 Conclusions 35
References 36
3 Scalable Video Coding 39
3.1 Introduction 39
3.1.1 Applications and Scenarios 40
3.2 Overview of the State of the Art 41
3.2.1 Scalable Coding Techniques 42
3.2.2 Multiple Description Coding 45
3.2.3 Stereoscopic 3D Video Coding 47
3.3 Scalable Video Coding Techniques 48
3.3.1 Scalable Coding for Shape, Texture, and Depth for 3D Video 48
3.3.2 3D Wavelet Coding 68
3.4 Error Robustness for Scalable Video and Image Coding 74
3.4.1 Correlated Frames for Error Robustness 74
3.4.2 Odd Even Frame Multiple Description Coding
for Scalable H.264/AVC 82

3.4.3 Wireless JPEG 2000: JPWL 91
3.4.4 JPWL Simulation Results 94
3.4.5 Towards a Theoretical Approach for Optimal Unequal
Error Protection 96
3.5 Conclusions 98
References 99
4 Distributed Video Coding 105
4.1 Introduction 105
4.1.1 The Video Codec Complexity Balance 106
4.2 Distributed Source Coding 109
4.2.1 The Slepian-Wolf Theorem 109
4.2.2 The Wyner-Ziv Theorem 110
4.2.3 DVC Codec Architecture 111
4.2.4 Input Bitstream Preparation: Quantization and Bit Plane Extraction 112
4.2.5 Turbo Encoder 112
4.2.6 Parity Bit Puncturer 114
4.2.7 Side Information 114
4.2.8 Turbo Decoder 115
4.2.9 Reconstruction: Inverse Quantization 116
4.2.10 Key Frame Coding 117
4.3 Stopping Criteria for a Feedback Channel-based Transform
Domain Wyner-Ziv Video Codec 118
4.3.1 Proposed Technical Solution 118
4.3.2 Performance Evaluation 120
4.4 Rate-distortion Analysis of Motion-compensated Interpolation
at the Decoder in Distributed Video Coding 122
4.4.1 Proposed Technical Solution 122
4.4.2 Performance Evaluation 126
4.5 Nonlinear Quantization Technique for Distributed Video Coding 129
4.5.1 Proposed Technical Solution 129

4.5.2 Performance Evaluation 132
4.6 Symmetric Distributed Coding of Stereo Video Sequences 134
4.6.1 Proposed Technical Solution 134
4.6.2 Performance Evaluation 137
4.7 Studying Error-resilience Performance for a Feedback Channel-based
Transform Domain Wyner-Ziv Video Codec 139
4.7.1 Proposed Technical Solution 139
4.7.2 Performance Evaluation 140
4.8 Modeling the DVC Decoder for Error-prone Wireless Channels 144
4.8.1 Proposed Technical Solution 145
4.8.2 Performance Evaluation 149
4.9 Error Concealment Using a DVC Approach for Video
Streaming Applications 151
4.9.1 Proposed Technical Solution 152
4.9.2 Performance Evaluation 155
4.10 Conclusions 158
References 159
5 Non-normative Video Coding Tools 161
5.1 Introduction 161
5.2 Overview of the State of the Art 162
5.2.1 Rate Control 162
5.2.2 Error Resilience 164
5.3 Rate Control Architecture for Joint MVS Encoding and Transcoding 165
5.3.1 Problem Definition and Objectives 165
5.3.2 Proposed Technical Solution 166
5.3.3 Performance Evaluation 169
5.3.4 Conclusions 171
5.4 Bit Allocation and Buffer Control for MVS Encoding Rate Control 171
5.4.1 Problem Definition and Objectives 171

5.4.2 Proposed Technical Approach 172
5.4.3 Performance Evaluation 177
5.4.4 Conclusions 179
5.5 Optimal Rate Allocation for H.264/AVC Joint MVS Transcoding 179
5.5.1 Problem Definition and Objectives 179
5.5.2 Proposed Technical Solution 180
5.5.3 Performance Evaluation 181
5.5.4 Conclusions 182
5.6 Spatio-temporal Scene-level Error Concealment for Segmented Video 182
5.6.1 Problem Definition and Objectives 182
5.6.2 Proposed Technical Solution 183
5.6.3 Performance Evaluation 187
5.6.4 Conclusions 188
5.7 An Integrated Error-resilient Object-based Video
Coding Architecture 189
5.7.1 Problem Definition and Objectives 189
5.7.2 Proposed Technical Solution 189
5.7.3 Performance Evaluation 195
5.7.4 Conclusions 195
5.8 A Robust FMO Scheme for H.264/AVC Video Transcoding 195
5.8.1 Problem Definition and Objectives 195
5.8.2 Proposed Technical Solution 195
5.8.3 Performance Evaluation 197
5.8.4 Conclusions 198
5.9 Conclusions 199
References 199
6 Transform-based Multi-view Video Coding 203
6.1 Introduction 203
6.2 MVC Encoder Complexity Reduction using a Multi-grid

Pyramidal Approach 205
6.2.1 Problem Definition and Objectives 205
6.2.2 Proposed Technical Solution 205
6.2.3 Conclusions and Further Work 208
6.3 Inter-view Prediction using Reconstructed Disparity
Information 208
6.3.1 Problem Definition and Objectives 208
6.3.2 Proposed Technical Solution 208
6.3.3 Performance Evaluation 210
6.3.4 Conclusions and Further Work 211
6.4 Multi-view Coding via Virtual View Generation 212
6.4.1 Problem Definition and Objectives 212
6.4.2 Proposed Technical Solution 212
6.4.3 Performance Evaluation 215
6.4.4 Conclusions and Further Work 216
6.5 Low-delay Random View Access in Multi-view Coding Using
a Bit Rate-adaptive Downsampling Approach 216
6.5.1 Problem Definition and Objectives 216
6.5.2 Proposed Technical Solution 216
6.5.3 Performance Evaluation 219
6.5.4 Conclusions and Further Work 222
References 222
7 Introduction to Multimedia Communications 225
7.1 Introduction 225
7.2 State of the Art: Wireless Multimedia Communications 228
7.2.1 QoS in Wireless Networks 228
7.2.2 Constraints on Wireless Multimedia Communications 231
7.2.3 Multimedia Compression Technologies 234
7.2.4 Multimedia Transmission Issues in Wireless Networks 235
7.2.5 Resource Management Strategy in Wireless Multimedia

Communications 239
7.3 Conclusions 244
References 244
8 Wireless Channel Models 247
8.1 Introduction 247
8.2 GPRS/EGPRS Channel Simulator 247
8.2.1 GSM/EDGE Radio Access Network (GERAN) 247
8.2.2 GPRS Physical Link Layer Model Description 250
8.2.3 EGPRS Physical Link Layer Model Description 252
8.2.4 GPRS Physical Link Layer Simulator 256
8.2.5 EGPRS Physical Link Layer Simulator 261
8.2.6 E/GPRS Radio Interface Data Flow Model 268
8.2.7 Real-time GERAN Emulator 270
8.2.8 Conclusion 271
8.3 UMTS Channel Simulator 272
8.3.1 UMTS Terrestrial Radio Access Network (UTRAN) 272
8.3.2 UMTS Physical Link Layer Model Description 279
8.3.3 Model Verification for Forward Link 290
8.3.4 UMTS Physical Link Layer Simulator 298
8.3.5 Performance Enhancement Techniques 307
8.3.6 UMTS Radio Interface Data Flow Model 309
8.3.7 Real-time UTRAN Emulator 312
8.3.8 Conclusion 313
8.4 WiMAX IEEE 802.16e Modeling 316
8.4.1 Introduction 316
8.4.2 WiMAX System Description 317
8.4.3 Physical Layer Simulation Results and Analysis 323
8.4.4 Error Pattern Files Generation 324
8.5 Conclusions 328

8.6 Appendix: Eb/No and DPCH Ec/Io Calculation 329
References 330
9 Enhancement Schemes for Multimedia Transmission over
Wireless Networks 333
9.1 Introduction 333
9.1.1 3G Real-time Audiovisual Requirements 333
9.1.2 Video Transmission over Mobile Communication Systems 335
9.1.3 Circuit-switched Bearers 339
9.1.4 Packet-switched Bearers 348
9.1.5 Video Communications over GPRS 350
9.1.6 GPRS Traffic Capacity 351
9.1.7 Error Performance 354
9.1.8 Video Communications over EGPRS 357
9.1.9 Traffic Characteristics 357
9.1.10 Error Performance 358
9.1.11 Voice Communication over Mobile Channels 359
9.1.12 Support of Voice over UMTS Networks 360
9.1.13 Error-free Performance 361
9.1.14 Error-prone Performance 362
9.1.15 Support of Voice over GPRS Networks 362
9.1.16 Conclusion 363

9.2 Link-level Quality Adaptation Techniques 365
9.2.1 Performance Modeling 365
9.2.2 Probability Calculation 367
9.2.3 Distortion Modeling 368
9.2.4 Propagation Loss Modeling 368
9.2.5 Energy-optimized UEP Scheme 369
9.2.6 Simulation Setup 370
9.2.7 Performance Analysis 372
9.2.8 Conclusion 373
9.3 Link Adaptation for Video Services 373
9.3.1 Time-varying Channel Model Design 374
9.3.2 Link Adaptation for Real-time Video Communications 379
9.3.3 Link Adaptation for Streaming Video Communications 389
9.3.4 Link Adaptation for UMTS 396
9.3.5 Conclusion 402
9.4 User-centric Radio Resource Management in UTRAN 403
9.4.1 Enhanced Call-admission Control Scheme 403
9.4.2 Implementation of UTRAN System-level Simulator 403
9.4.3 Performance Evaluation of Enhanced CAC Scheme 410
9.5 Conclusions 411
References 413
10 Quality Optimization for Cross-network Media Communications 417
10.1 Introduction 417
10.2 Generic Inter-networked QoS-optimization Infrastructure 418
10.2.1 State of the Art 418
10.2.2 Generic of QoS for Heterogeneous Networks 420
10.3 Implementation of a QoS-optimized Inter-networked Emulator 422
10.3.1 Emulation System Physical Link Layer Simulation 426
10.3.2 Emulation System Transmitter/Receiver Unit 428
10.3.3 QoS Mapping Architecture 428

10.3.4 General User Interface 438
10.4 Performances of Video Transmission in Inter-networked Systems 442
10.4.1 Experimental Setup 442
10.4.2 Test for the EDGE System 443
10.4.3 Test for the UMTS System 445
10.4.4 Tests for the EDGE-to-UMTS System 445
10.5 Conclusions 452
References 453
11 Context-based Visual Media Content Adaptation 455
11.1 Introduction 455
11.2 Overview of the State of the Art in Context-aware Content Adaptation 457
11.2.1 Recent Developments in Context-aware Systems 457
11.2.2 Standardization Efforts on Contextual Information for
Content Adaptation 467
11.3 Other Standardization Efforts by the IETF and W3C 476
11.4 Summary of Standardization Activities 479
11.4.1 Integrating Digital Rights Management (DRM) with Adaptation 480
11.4.2 Existing DRM Initiatives 480
11.4.3 The New ‘‘Adaptation Authorization’’ Concept 481
11.4.4 Adaptation Decision 482
11.4.5 Context-based Content Adaptation 488
11.5 Generation of Contextual Information and Profiling 492
11.5.1 Types and Representations of Contextual Information 492
11.5.2 Context Providers and Profiling 494
11.5.3 User Privacy 497
11.5.4 Generation of Contextual Information 498
11.6 The Application Scenario for Context-based Adaptation
of Governed Media Contents 499
11.6.1 Virtual Classroom Application Scenario 500

11.6.2 Mechanisms using Contextual Information in a Virtual
Collaboration Application 502
11.6.3 Ontologies in Context-aware Content Adaptation 503
11.6.4 System Architecture of a Scalable Platform for Context-aware
and DRM-enabled Content Adaptation 504
11.6.5 Context Providers 507
11.6.6 Adaptation Decision Engine 510
11.6.7 Adaptation Authorization 514
11.6.8 Adaptation Engines Stack 517
11.6.9 Interfaces between Modules of the Content Adaptation Platform 544
11.7 Conclusions 552
References 553
Index 559

VISNET II Researchers
UniS
Omar Abdul-Hameed
Zaheer Ahmad
Hemantha Kodikara Arachchi
Murat Badem
Janko Calic
Safak Dogan
Erhan Ekmekcioglu
Anil Fernando
Christine Glaser
Banu Gunel
Huseyin Hacihabiboglu
Hezerul Abdul Karim
Ahmet Kondoz

Yingdong Ma
Marta Mrak
Sabih Nasir
Gokce Nur
Surachai Ongkittikul
Kan Ren
Daniel Rodriguez
Amy Tan
Eeriwarawe Thushara
Halil Uzuner
Stephane Villette
Rajitha Weerakkody
Stewart Worrall
Lasith Yasakethu
HHI
Peter Eisert
Jürgen Rurainsky
Anna Hilsmann
Benjamin Prestele
David Schneider
Philipp Fechteler
Ingo Feldmann
Jens Güther
Karsten Grüneberg

Oliver Schreer
Ralf Tanger
EPFL
Touradj Ebrahimi
Frederic Dufaux
Thien Ha-Minh
Michael Ansorge
Shuiming Ye
Yannick Maret
David Marimon
Ulrich Hoffmann
Mourad Ouaret
Francesca De Simone
Carlos Bandeirinha
Peter Vajda
Ashkan Yazdani
Gelareh Mohammadi
Alessandro Tortelli
Luca Bonardi
Davide Forzati
IST
Fernando Pereira
João Ascenso
Catarina Brites
Luis Ducla Soares
Paulo Nunes
Paulo Correia
José Diogo Areia
José Quintas Pedro
Ricardo Martins
UPC-TSC
Pere Joaquim Mindan
José Luis Valenzuela
Toni Rama
Luis Torres
Francesc Tarrés
UPC-AC
Jaime Delgado
Eva Rodríguez
Anna Carreras
Rubén Tous
TRT-UK
Chris Firth
Tim Masterton
Adrian Waller
Darren Price

Rachel Craddock
Marcello Goccia
Ian Mockford
Hamid Asgari
Charlie Attwood
Peter de Waard
Jonathan Dennis
Doug Watson
Val Millington
Andy Vooght
TUB
Thomas Sikora
Zouhair Belkoura
Juan Jose Burred
Michael Droese
Ronald Glasberg
Lutz Goldmann
Shan Jin
Mustafa Karaman
Andreas Krutz
Amjad Samour
TiLab
Giovanni Cordara
Gianluca Francini
Skjalg Lepsoy
Diego Gibellino
UPF
Enric Peig
Víctor Torres
Xavier Perramon
PoliMi
Fabio Antonacci
Alberto Calatroni
Marco Marcon
Matteo Naccari
Davide Onofrio
Giorgio Prandi
Davide Riva
Francesco Santagata
Marco Tagliasacchi
Stefano Tubaro
Giuseppe Valenzise
IPW
Stanisław Badura
Lilla Bagińska
Jarosław Baszun
Filip Borowski
Andrzej Buchowicz
Emil Dmoch
Edyta Dąbrowska
Grzegorz Galiński
Piotr Garbat
Krystian Ignasiak
Mariusz Jakubowski

Mariusz Leszczyński
Marcin Morgoś
Jacek Naruniec
Artur Nowakowski
Adam Ołdak
Grzegorz Pastuszak
Andrzej Pietrasiewicz
Adam Pietrowcew
Sławomir Rymaszewski
Radosław Sikora
Władysław Skarbek
Marek Sutkowski
Michał Tomaszewski
Karol Wnukowicz
INESC Porto
Giorgiana Ciobanu
Filipe Sousa
Jaime Cardoso
Jaime Dias
Jorge Mamede
José Ruela
Luís Corte-Real
Luís Gustavo Martins
Luís Filipe Teixeira
Maria Teresa Andrade
Pedro Carvalho
Ricardo Duarte
Vítor Barbosa
Preface
VISNET II is a European Union Network of Excellence (NoE) in the 6th Framework Programme,
which brings together 12 leading European organizations in the field of Networked Audiovisual
Media Technologies. The consortium consists of organizations with a proven track record and
strong national and international reputations in audiovisual information technologies. VISNET II
integrates over 100 researchers who have made significant contributions to this field of
technology, through standardization activities, international publications, conference and workshop
activities, patents, and many other prestigious achievements. The 12 integrated organizations
represent 7 European states spanning a major part of Europe, thereby promising efficient
dissemination and exploitation of the resulting technological developments to larger communities.
This book contains some of the research output of VISNET II in the area of Advanced
Video Coding and Networking. It details video coding principles, which
lead to advanced video coding developments in the form of scalable coding, distributed
video coding, non-normative video coding tools, and transform-based multi-view coding.
Having detailed the latest work in visual media coding, the networking aspects of video
communication are presented in the second part of the book. Various wireless channel
models are presented, to form the basis for the following chapters. Both link-level quality of
service (QoS) and cross-network transmission of compressed visual data are considered.

Finally, context-based visual media content adaptation is discussed with some examples.
It is hoped that this book will be used as a reference not only for some of the advanced
video coding techniques, but also for the transmission of video across various wireless
systems with well-defined channel models.
Ahmet Kondoz
University of Surrey
VISNET II Coordinator

Glossary of Abbreviations
3GPP 3rd Generation Partnership Project
AA Adaptation Authorizer
ADE Adaptation Decision Engine
ADMITS Adaptation in Distributed Multimedia IT Systems
ADTE Adaptation Decision Taking Engine
AE Adaptation Engine
AES Adaptation Engine Stack
AIR Adaptive Intra Refresh
API Application Programming Interface
AQoS Adaptation Quality of Service
ASC Aspect-Scale-Context
AV Audiovisual
AVC Advanced Video Coding
BLER Block Error Rate
BSD Bitstream Syntax Description
BSDL Bitstream Syntax Description Language
CC Convolutional Coding
CC Creative Commons
CC/PP Composite Capabilities/Preferences Profile
CD Coefficient Dropping
CDN Content Distribution Networks

CIF Common Intermediate Format
CoBrA Context Broker Architecture
CoDAMoS Context-Driven Adaptation of Mobile Services
CoOL Context Ontology Language
CoGITO Context Gatherer, Interpreter and Transformer using Ontologies
CPU Central Processing Unit
CROSLOCIS Creation of Smart Local City Services
CS/H.264/AVC Cropping and Scaling of H.264/AVC Encoded Video
CxP Context Provider
DAML DARPA Agent Markup Language
DANAE Dynamic and distributed Adaptation of scalable multimedia content
in a context-Aware Environment
dB Decibel
DB Database
DCT Discrete Cosine Transform
DI Digital Item
DIA Digital Item Adaptation
DID Digital Item Declaration
DIDL Digital Item Declaration Language
DIP Digital Item Processing
DistriNet Distributed Systems and Computer Networks
DPRL Digital Property Rights Language
DRM Digital Rights Management
DS Description Schemes
EC European Community
EIMS ENTHRONE Integrated Management Supervisor
FA Frame Adaptor
FD Frame Dropping
FMO Flexible Macroblock Ordering
FP Framework Program

gBS Generic Bitstream Syntax
HCI Human Computer Interface
HDTV High-Definition Television
HP Hewlett Packard
HTML HyperText Markup Language
IEC International Electrotechnical Commission
IETF Internet Engineering Task Force
IBM International Business Machines Corporation
iCAP Internet Content Adaptation Protocol
IPR Intellectual Property Rights
IROI Interactive Region of Interest
ISO International Organization for Standardization
IST Information Society Technologies
ITEC Department of Information Technology, Klagenfurt University
JPEG Joint Photographic Experts Group
JSVM Joint Scalable Video Model
MB Macroblock
MDS Multimedia Description Schemes
MIT Massachusetts Institute of Technology
MOS Mean Opinion Score
MP3 Moving Picture Experts Group Layer-3 Audio (audio file format/extension)
MPEG Moving Picture Experts Group
MVP Motion Vector Predictor
NAL Network Abstraction Layer
NALU Network Abstraction Layer Unit
NoE Network of Excellence
ODRL Open Digital Rights Language
OIL Ontology Interchange Language

OMA Open Mobile Alliance
OSCRA Optimized Source and Channel Rate Allocation
OWL Web Ontology Language
P2P Peer-to-Peer
PDA Personal Digital Assistant
PSNR Peak Signal-to-Noise Ratio
QCIF Quarter Common Intermediate Format
QoS Quality of Service
QP Quantization Parameter
RD Rate Distortion
RDF Resource Description Framework
RDB Reference Data Base
RDD Rights Data Dictionary
RDOPT Rate Distortion Optimization
REL Rights Expression Language
ROI Region of Interest
SECAS Simple Environment for Context-Aware Systems
SNR Signal-to-Noise Ratio
SOAP Simple Object Access Protocol
SOCAM Service-Oriented Context-Aware Middleware
SVC Scalable Video Coding
TM5 Test Model 5
UaProf User Agent Profile
UCD Universal Constraints Descriptor
UED Usage Environment Descriptions
UEP Unequal Error Protection
UF Utility Function
UI User Item
UMA Universal Multimedia Access

UMTS Universal Mobile Telecommunications System
URI Uniform Resource Identifier
UTRAN UMTS Terrestrial Radio Access Network
VCS Virtual Collaboration System
VoD Video on Demand
VOP Video Object Plane
VQM Video Quality Metric
W3C World Wide Web Consortium
WAP Wireless Application Protocol
WCDMA Wideband Code Division Multiple Access
WDP Wireless Datagram Protocol
WLAN Wireless Local Area Network
WML Wireless Markup Language
WiFi Wireless Fidelity (IEEE 802.11b Wireless Networking)
XML eXtensible Markup Language
XrML eXtensible rights Markup Language
XSLT eXtensible Stylesheet Language Transformations
1
Introduction
Networked Audio-Visual Technologies form the basis for the multimedia communication
systems that we currently use. The communication systems that must be supported are diverse,
ranging from fixed wired to mobile wireless systems. In order to enable an efficient and cost-
effective Networked Audio-Visual System, two major technological areas need to be investi-
gated: first, how to process the content for transmission purposes, which involves various media
compression processes; and second, how to transport it over the diverse network technologies
that are currently in use or will be deployed in the near future. In this book, therefore, visual data
compression schemes are presented first, followed by a description of various media trans-
mission aspects, including various channel models, and content and link adaptation techniques.
Raw digital video signals are very large in size, making it very difficult to transmit or store

them. Video compression techniques are therefore essential enabling technologies for digital
multimedia applications. Since 1984, a wide range of digital video codecs have been
standardized, each of which represents a step forward either in terms of compression efficiency
or in functionality. The MPEG-x and H.26x video coding standards adopt a hybrid coding
approach, employing block-matching motion estimation/compensation, in addition to the
discrete cosine transform (DCT) and quantization. The reasons are: first, a significant
proportion of the motion trajectories found in natural video can be approximately described
with a rigid translational motion model; second, fewer bits are required to describe simple
translational motion; and finally, the implementation is relatively straightforward and amena-
ble to hardware solutions. These hybrid video systems have provided interoperability in
heterogeneous network systems. Considering that transmission bandwidth is still a valuable
commodity, ongoing developments in video coding seek scalability solutions to achieve a
one-coding multiple-decoding feature. To this end, the Joint Video Team of the ITU-T Video
Coding Expert Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) have
standardized a scalability extension to the existing H.264/AVC codec. The H.264-based
Scalable Video Coding (SVC) allows partial transmission and decoding of the bit stream,
resulting in various options in terms of picture quality and spatial-temporal resolutions.
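To make the block-matching motion estimation step concrete, the following Python sketch performs an exhaustive full search over a small window using the sum of absolute differences (SAD) criterion. It is a minimal illustration under assumed conventions (the function name, block size, and search range are ours); practical codecs use fast search strategies, sub-pixel refinement, and rate-constrained vector selection rather than this brute-force form.

```python
import numpy as np

def full_search_block_match(cur_frame, ref_frame, block_size=16, search_range=7):
    """Illustrative full-search block matching using the SAD criterion.

    cur_frame, ref_frame: 2D arrays of luma samples with identical dimensions.
    Returns one (dy, dx) motion vector per block of the current frame.
    """
    h, w = cur_frame.shape
    mvs = np.zeros((h // block_size, w // block_size, 2), dtype=int)

    for by in range(0, h - block_size + 1, block_size):
        for bx in range(0, w - block_size + 1, block_size):
            cur_block = cur_frame[by:by + block_size, bx:bx + block_size].astype(int)
            best_sad, best_mv = None, (0, 0)
            # Test every candidate displacement inside the search window.
            for dy in range(-search_range, search_range + 1):
                for dx in range(-search_range, search_range + 1):
                    ry, rx = by + dy, bx + dx
                    if ry < 0 or rx < 0 or ry + block_size > h or rx + block_size > w:
                        continue  # candidate block falls outside the reference frame
                    ref_block = ref_frame[ry:ry + block_size, rx:rx + block_size].astype(int)
                    sad = int(np.abs(cur_block - ref_block).sum())
                    if best_sad is None or sad < best_sad:
                        best_sad, best_mv = sad, (dy, dx)
            mvs[by // block_size, bx // block_size] = best_mv
    return mvs
```

The prediction error that remains after subtracting each best-matching reference block is what the transform and quantization stages then compress.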
In this book, several advanced features/techniques relating to scalable video coding are
further described, mostly to do with 3D scalable video coding applications. Applications and
scenarios for the scalable coding systems, advances in scalable video coding for 3D video
applications, a non-standardized scalable 2D model-based video coding scheme applied to the
texture, and depth coding of 3D video are all discussed. A scalable, multiple description coding
(MDC) application for stereoscopic 3D video is detailed. Multi-view coding and Distributed
Video Coding concepts representing the latest advancements in video coding are also covered
in significant depth.
The definition of video coding standards is of the utmost importance because it guarantees
that video coding equipment from different manufacturers will be able to interoperate.
However, the definition of a standard also represents a significant constraint for manufacturers

because it limits what they can do. Therefore, in order to minimize the restrictions imposed on
manufacturers, only those tools that are essential for interoperability are typically specified in
the standard: the normative tools. The remaining tools, which are not standardized but are also
important in video coding systems, are referred to as non-normative tools and this is where
competition and evolution of the technology have been taking place. In fact, this strategy of
specifying only the bare minimum that can guarantee interoperability ensures that the latest
developments in the area of non-normative tools can be easily incorporated in video codecs
without compromising their standard compatibility, even after the standard has been finalized.
In addition, this strategy makes it possible for manufacturers to compete against each other and
to distinguish between their products in the market. A significant amount of research effort is
being devoted to the development of non-normative video coding tools, with the target of
improving the performance of standard video codecs. In particular, due to their importance, rate
control and error resilience non-normative tools are being researched. In this book, therefore,
the development of efficient tools for the modules that are non-normative in video coding
standards, such as rate control and error concealment, is discussed. For example, multiple video
sequence (MVS) joint rate control addresses the development of rate control solutions for
encoding video scenes formed from a composition of video objects (VOs), such as in the
MPEG-4 standard, and can also be applied to the joint encoding and transcoding of multiple
video sequences (VSs) to be transmitted over bandwidth-limited channels using the H.264/
AVC standard.
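As a concrete, if deliberately simplified, picture of what a non-normative rate control module does, the Python sketch below adjusts the quantization parameter (QP) from the fullness of a virtual buffer, loosely in the spirit of classical buffer-based schemes such as MPEG-2 TM5. The thresholds, step sizes, and function name are assumptions made for illustration; they do not reproduce any standardized algorithm or the MVS joint rate control techniques described later in the book.

```python
def next_qp(prev_qp, buffer_bits, buffer_size, bits_last_frame,
            target_bits_per_frame, qp_min=10, qp_max=51):
    """Toy buffer-based rate control: raise QP as the virtual buffer fills,
    lower it as the buffer drains. Returns (new_qp, updated_buffer_bits)."""
    # Add what the encoder actually produced, drain the per-frame channel budget.
    buffer_bits = max(0, buffer_bits + bits_last_frame - target_bits_per_frame)
    fullness = buffer_bits / buffer_size      # 0.0 (empty) ... 1.0 (full)

    if fullness > 0.8:          # nearly full: quantize more coarsely
        qp = prev_qp + 2
    elif fullness > 0.6:
        qp = prev_qp + 1
    elif fullness < 0.2:        # nearly empty: spend more bits on quality
        qp = prev_qp - 1
    else:
        qp = prev_qp
    return min(max(qp, qp_min), qp_max), buffer_bits
```

Because the decoder never needs to know how the QP was chosen, such logic can be refined freely without affecting standard compliance, which is precisely why rate control is left non-normative.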
The goal of wireless communication is to allow a user to access required services at any time
with no regard to location or mobility. Recent developments in wireless communications,
multimedia technologies, and microelectronics technologies have created a new paradigm in
mobile communications. Third/fourth-generation (3G/4G) wireless communication technol-
ogies provide significantly higher transmission rates and service flexibility over a wide
coverage area, as compared with second-generation (2G) wireless communication systems.
High-compression, error-robust multimedia codecs have been designed to enable the support
of multimedia applications over error-prone, bandwidth-limited channels. The advances of
VLSI and DSP technologies are enabling lightweight, low-cost, portable devices capable of
transmitting and viewing multimedia streams. The above technological developments have

shifted the service requirements of mobile communication from conventional voice telephony
to business- and entertainment-oriented multimedia services in wireless communication
systems. In order to successfully meet the challenges set by current and future
audiovisual communication requirements, the International Telecommunication Union
Radiocommunication Sector (ITU-R) has elaborated a framework for global 3G standards by
recognizing a limited number of radio access technologies. These are: Universal Mobile
Telecommunications System (UMTS), Enhanced Data rates for GSM Evolution (EDGE), and
CDMA2000. UMTS is based on Wideband CDMA technology and is employed in Europe and
Asia using the frequency band around 2 GHz. EDGE is based on TDMA technology and uses
the same air interface as the successful 2G mobile system GSM. General Packet Radio Service
(GPRS) and High-Speed Circuit Switched Data (HSCSD) were introduced in Phase 2+ of the
GSM standardization process. They support enhanced services with data rates up to 144 kbps in
the packet-switched and circuit-switched domains, respectively. EDGE, which is the evolution
of GPRS and HSCSD, provides 3G services up to 500 kbps within GSM carrier spacing of
200 kHz. CDMA2000 is based on multi-carrier CDMA technology and provides the upgraded
solution for existing IS-95 operators, mainly in North America. EDGE and UMTS are the most
widely accepted 3G radio access technologies. They are standardized by the 3rd Generation
Partnership Project (3GPP). Even though EDGE and UMTS are based on two different
multiple-access technologies, both systems share the same core network. The evolved GSM
core network serves as a common GSM/UMTS core network that supports GSM/GPRS/
EDGE and UMTS access. In addition, Wireless Local Area Networks (WLAN) are becoming
more and more popular for communication in homes, offices, and indoor public areas such as
campus environments, airports, hotels, shopping centres and so on. IEEE 802.11 has a number
of physical layer specifications with a common MAC operation. IEEE 802.11 includes two
physical layers, a frequency-hopping spread-spectrum (FHSS) physical layer and a direct-
sequence spread-spectrum (DSSS) physical layer, and operates at 2 Mbps. The currently
deployed IEEE 802.11b standard provides an additional physical layer based on a high-rate

direct-sequence spread-spectrum (HR/DSSS). It operates in the 2.4 GHz unlicensed band and
provides bit rates up to 11 Mbps. The IEEE 802.11a standard for the 5 GHz band provides high bit rates
up to 54 Mbps and uses a physical layer based on orthogonal frequency division multiplexing
(OFDM). Recently, the IEEE 802.11g standard has also been issued to achieve similarly high bit rates
in the 2.4 GHz band.
The Worldwide Interoperability for Microwave Access (WiMAX) is a telecommunications
technology aimed at providing wireless data over long distances in different ways, from point-
to-point links to full mobile cellular access. It is based on the IEEE 802.16 standard, which is
also called WirelessMAN. The name WiMAX was created by the WiMAX Forum, which was
formed in June 2001 to promote conformance and interoperability of the standard. The forum
describes WiMAX as “a standards-based technology enabling the delivery of last mile wireless
broadband access as an alternative to cable and DSL”. Mobile WiMAX IEEE 802.16e provides
fixed, nomadic and mobile broadband wireless access systems with superior throughput
performance. It enables non-line-of-sight reception, and can also cope with high mobility
of the receiving station. The IEEE 802.16e enables nomadic capabilities for laptops and other
mobile devices, allowing users to benefit from metro area portability of an xDSL-like service.
Multimedia services by definition require the transmission of multiple media streams, such
as video, still picture, music, voice, and text data. A combination of these media types provides
a number of value-added services, including video telephony, E-commerce services, multi-
party video conferencing, virtual office, and 3D video. 3D video, for example, provides more
natural and immersive visual information to end users than standard 2D video. In the near
future, certain 2D video application scenarios are likely to be replaced by 3D video in order to
achieve a more involving and immersive representation of visual information and to provide
more natural methods of communication. 3D video transmission, however, requires more
resources than the conventional video communication applications.
Different media types have different quality-of-service (QoS) requirements and enforce
conflicting constraints on the communication networks. Still picture and text data are
categorized as background services and require high data rates but have no constraints on
the transmission delay. Voice services, on the other hand, are characterized by low delay.

However, they can be coded using fixed low-rate algorithms operating in the 5-24 kbps range.
In contrast to voice and data services, low-bit-rate video coding involves rates of tens to
hundreds of kbps. Moreover, video applications are delay sensitive and impose tight constraints
on system resources. Mobile multimedia applications, consisting of multiple signal types, play
an important role in the rapid penetration of future communication services and the success of
these communication systems. Even though the high transmission rates and service flexibility
have made wireless multimedia communication possible over 3G/4G wireless communication
systems, many challenges remain to be addressed in order to support efficient communications
in multi-user, multi-service environments. In addition to the high initial cost associated with the
deployment of 3G systems, the move from telephony and low-bit-rate data services to
bandwidth-consuming 3G services implies high system costs, as these consume a large
portion of the available resources. However, for rapid market evolution, these wideband
services should not be substantially more expensive than the services offered today. Therefore,
efficient system resource (mainly the bandwidth-limited radio resource) utilization and QoS
management are critical in 3G/4G systems.
Efficient resource management and the provision of QoS for multimedia applications are in
sharp conflict with one another. Of course, it is possible to provide high-quality multimedia
services by using a large amount of radio resources and very strong channel protection.
However, this is clearly inefficient in terms of system resource allocation. Moreover, the
perceptual multimedia quality received by end users depends on many factors, such as source
rate, channel protection, channel quality, error resilience techniques, transmission/processing
power, system load, and user interference. Therefore, it is difficult to obtain an optimal source
and network parameter combination for a given set of source and channel characteristics. The
time-varying error characteristics of the radio access channel aggravate the problem. In this
book, therefore, various QoS-based resource management systems are detailed. For compari-
son and validation purposes, a number of wireless channel models are described. The key QoS
improvement techniques, including content and link-adaptation techniques, are covered.
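The search for a good operating point can be pictured as a small optimization over candidate source and channel parameters, as in the hedged Python sketch below. The distortion and residual-loss models passed in as functions are placeholders that a real system would replace with measured rate-distortion data and channel statistics; the sketch shows only the structure of the joint source/channel trade-off, not an algorithm from the later chapters.

```python
import math

def choose_operating_point(total_rate_kbps, source_rates, code_rates,
                           source_distortion, loss_probability, loss_penalty):
    """Pick the (source rate, channel code rate) pair minimizing expected
    distortion subject to a total bit-rate budget. All models are supplied
    by the caller, so this is a structural sketch only."""
    best = None
    for rs in source_rates:          # candidate source (video) rates, kbps
        for rc in code_rates:        # candidate channel code rates, e.g. 1/2, 2/3
            gross_rate = rs / rc     # bits actually sent over the radio channel
            if gross_rate > total_rate_kbps:
                continue             # violates the radio resource budget
            p_loss = loss_probability(rc)   # residual loss after channel coding
            d = (1 - p_loss) * source_distortion(rs) + p_loss * loss_penalty
            if best is None or d < best[0]:
                best = (d, rs, rc)
    return best  # (expected distortion, source rate, code rate) or None


# Example usage with crude placeholder models (purely illustrative numbers).
if __name__ == "__main__":
    print(choose_operating_point(
        total_rate_kbps=384,
        source_rates=[64, 128, 192, 256, 384],
        code_rates=[1 / 3, 1 / 2, 2 / 3, 1.0],
        source_distortion=lambda rs: 255 ** 2 / (1 + 0.05 * rs),  # toy R-D curve
        loss_probability=lambda rc: min(1.0, 0.02 * math.exp(4 * (rc - 1 / 3))),
        loss_penalty=0.5 * 255 ** 2,
    ))
```

In practice the channel statistics vary with time, so the search has to be repeated or approximated adaptively, which is exactly the role of the link adaptation techniques discussed in Chapter 9.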
Future media Internet will allow new applications with support for ubiquitous media-rich
content service technologies to be realized. Virtual collaboration, extended home platforms,
augmented, mixed and virtual realities, gaming, telemedicine, e-learning and so on, in which

users with possibly diverse geographical locations, terminal types, connectivity, usage
environments, and preferences access and exchange pervasive yet protected and trusted
content, are just a few examples. These multiple forms of diversity require content to be
transported and rendered in different forms, which necessitates the use of context-aware
content adaptation. This avoids the alternative of predicting, generating and storing all the
different forms required for every item of content. Therefore, there is a growing need for
devising adequate concepts and functionalities of a context-aware content adaptation platform
that suits the requirements of such multimedia application scenarios. This platform needs to be
able to consume low-level contextual information to infer higher-level contexts, and thus
decide the need and type of adaptation operations to be performed upon the content. In this way,
usage constraints can be met while restrictions imposed by the Digital Rights Management
(DRM) governing the use of protected content are satisfied.
In this book, comprehensive discu ssions are presented on the use of contextual information
in adaptation decision operations, with a view to managing the DRM and the authorization
for adaptation, consequently outlining the appropriate adaptation decision techniques and
adaptation mechanisms. The main challenges are found by identifying integrated tools and
systems that support adaptive, context-aware and distributed applications which react to the
characteristics and conditions of the usage environment and provide transparent access and
delivery of content, where digital rights are adequately managed. The discussions focus on
describing a scalable platform for context-aware and DRM-enabled adaptation of multimedia
content. The platform has a modular architecture to ensure scalability, and well-defined
interfaces based on open standards for interoperability as well as portability. The modules are
classified into four categories, namely: 1. Adaptation Decision Engine (ADE); 2. Adaptation
Authoriser (AA); 3. Context Providers (CxPs); and 4. Adaptation Engine Stacks (AESs),
which comprise Adaptation Engines (AEs). During the adaptation decision-taking stage the
platform uses ontologies to enable semantic description of real-world situations. The decision-
taking process is triggered by low-level contextual information and driven by rules provided by
the ontologies. It supports a variety of adaptations, which can be dynamically configured. The
overall objective of this platform is to enable the efficient gathering and use of context

information, ultimately in order to build content adaptation applications that maximize user
satisfaction.
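As a rough illustration of how these four module categories could interact, the Python sketch below models one pass of the decision loop: context providers supply low-level context, the decision engine infers an adaptation operation, the authorizer checks it against the governing licence, and an engine stack executes it. All class and method names are hypothetical; they mirror the roles named above rather than the actual interfaces of the platform described in Chapter 11.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AdaptationDecision:
    operation: str                # e.g. "transcode", "downscale", "none"
    parameters: Dict[str, str]


class ContextProvider:
    """CxP: supplies low-level contextual information (terminal, network, user)."""
    def get_context(self) -> Dict[str, str]:
        return {"terminal.resolution": "320x240", "network.bitrate_kbps": "128"}


class AdaptationDecisionEngine:
    """ADE: infers the required adaptation from the aggregated context."""
    def decide(self, context: Dict[str, str]) -> AdaptationDecision:
        if int(context.get("network.bitrate_kbps", "0")) < 256:
            return AdaptationDecision("transcode", {"target_kbps": "96"})
        return AdaptationDecision("none", {})


class AdaptationAuthorizer:
    """AA: checks the proposed adaptation against the DRM licence terms."""
    def authorize(self, decision: AdaptationDecision,
                  licence_allows: List[str]) -> bool:
        return decision.operation == "none" or decision.operation in licence_allows


class AdaptationEngineStack:
    """AES: chain of adaptation engines that actually transforms the content."""
    def run(self, content: bytes, decision: AdaptationDecision) -> bytes:
        # A real engine would invoke a transcoder; here the content passes through.
        return content


def adapt(content: bytes) -> bytes:
    """One illustrative pass through the context-aware adaptation loop."""
    context = ContextProvider().get_context()
    decision = AdaptationDecisionEngine().decide(context)
    if AdaptationAuthorizer().authorize(decision, licence_allows=["transcode"]):
        return AdaptationEngineStack().run(content, decision)
    return content
```

In the platform itself, the hard-coded threshold above would be replaced by ontology-driven rules and dynamically configured adaptation chains, as described in Chapter 11.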
