


Raymond W. Yeung

Information Theory and
Network Coding

May 31, 2008

Springer



To my parents and my family



Preface

This book is an evolution from my book A First Course in Information Theory
published in 2002, when network coding was still in its infancy. The last few
years have witnessed the rapid development of network coding into a research
field of its own in information science. With its root in information theory,
network coding not only has brought about a paradigm shift in network communications at large, but also has had significant influence on such specific
research fields as coding theory, networking, switching, wireless communications, distributed data storage, cryptography, and optimization theory. While
new applications of network coding keep emerging, the fundamental results
that lay the foundation of the subject are more or less mature. One of the
main goals of this book therefore is to present these results in a unifying and
coherent manner.
While the previous book focused only on information theory for discrete
random variables, the current book contains two new chapters on information
theory for continuous random variables, namely the chapter on differential
entropy and the chapter on continuous-valued channels. With these topics
included, the book becomes more comprehensive and is more suitable to be
used as a textbook for a course in an electrical engineering department.
What is in this book
Out of the twenty-one chapters in this book, the first sixteen chapters
belong to Part I, Components of Information Theory, and the last five chapters
belong to Part II, Fundamentals of Network Coding. Part I covers the basic
topics in information theory and prepares the reader for the discussions in
Part II. A brief rundown of the chapters will give a better idea of what is in
this book.
Chapter 1 contains a high-level introduction to the contents of this book.
First, there is a discussion on the nature of information theory and the main
results in Shannon’s original paper in 1948 which founded the field. There are
also pointers to Shannon’s biographies and his works.



Chapter 2 introduces Shannon’s information measures for discrete random
variables and their basic properties. Useful identities and inequalities in information theory are derived and explained. Extra care is taken in handling
joint distributions with zero probability masses. There is a section devoted
to the discussion of maximum entropy distributions. The chapter ends with a
section on the entropy rate of a stationary information source.
Chapter 3 is an introduction to the theory of I-Measure which establishes
a one-to-one correspondence between Shannon’s information measures and
set theory. A number of examples are given to show how the use of information diagrams can simplify the proofs of many results in information theory.

Such diagrams are becoming standard tools for solving information theory
problems.
Chapter 4 is a discussion of zero-error data compression by uniquely decodable codes, with prefix codes as a special case. A proof of the entropy
bound for prefix codes which involves neither the Kraft inequality nor the
fundamental inequality is given. This proof facilitates the discussion of the
redundancy of prefix codes.
Chapter 5 is a thorough treatment of weak typicality. The weak asymptotic equipartition property and the source coding theorem are discussed. An
explanation of the fact that a good data compression scheme produces almost
i.i.d. bits is given. There is also an introductory discussion of the Shannon-McMillan-Breiman theorem. The concept of weak typicality will be further
developed in Chapter 10 for continuous random variables.
Chapter 6 contains a detailed discussion of strong typicality which applies
to random variables with finite alphabets. The results developed in this chapter will be used for proving the channel coding theorem and the rate-distortion
theorem in the next two chapters.
The discussion in Chapter 7 of the discrete memoryless channel is an enhancement of the discussion in the previous book. In particular, the new definition of the discrete memoryless channel enables rigorous formulation and
analysis of coding schemes for such channels with or without feedback. The
proof of the channel coding theorem uses a graphical model approach that
helps explain the conditional independence of the random variables.
Chapter 8 is an introduction to rate-distortion theory. The version of the
rate-distortion theorem here, proved by using strong typicality, is a stronger
version of the original theorem obtained by Shannon.
In Chapter 9, the Blahut-Arimoto algorithms for computing the channel
capacity and the rate-distortion function are discussed, and a simplified proof
for convergence is given. Great care is taken in handling distributions with
zero probability masses.
Chapter 10 and Chapter 11 are the two chapters devoted to the discussion of information theory for continuous random variables. Chapter 10 introduces differential entropy and related information measures, and their basic
properties are discussed. The asymptotic equipartition property for continuous
random variables is proved. The last section on maximum differential entropy
distributions echoes the section in Chapter 2 on maximum entropy distributions.
Chapter 11 discusses a variety of continuous-valued channels, with the
continuous memoryless channel being the basic building block. In proving the
capacity of the memoryless Gaussian channel, a careful justification is given
for the existence of the differential entropy of the output random variable.
Based on this result, the capacity of a system of parallel/correlated Gaussian channels is obtained. Heuristic arguments leading to the formula for the
capacity of the bandlimited white/colored Gaussian channel are given. The
chapter ends with a proof of the fact that zero-mean Gaussian noise is the
worst additive noise.
Chapter 12 explores the structure of the I-Measure for Markov structures.
Set-theoretic characterizations of full conditional independence and Markov
random field are discussed. The treatment of Markov random field here may be
too specialized for the average reader, but the structure of the I-Measure and
the simplicity of the information diagram for a Markov chain are best explained
as a special case of a Markov random field.
Information inequalities are sometimes called the laws of information theory because they govern the impossibilities in information theory. In Chapter 13, the geometrical meaning of information inequalities and the relation
between information inequalities and conditional independence are explained
in depth. The framework for information inequalities discussed here is the
basis of the next two chapters.
Chapter 14 explains how the problem of proving information inequalities
can be formulated as a linear programming problem. This leads to a complete
characterization of all information inequalities provable by conventional techniques. These inequalities, called Shannon-type inequalities, can be proved by
ITIP, a software package available on the World Wide Web. It is also shown how
Shannon-type inequalities can be used to tackle the implication problem of
conditional independence in probability theory.
Shannon-type inequalities were all the information inequalities known during the first half century of information theory. In the late 1990s, a few new
inequalities, called non-Shannon-type inequalities, were discovered. These inequalities imply the existence of laws in information theory beyond those laid
down by Shannon. In Chapter 15, we discuss these inequalities and their applications.
Chapter 16 explains an intriguing relation between information theory
and group theory. Specifically, for every information inequality satisfied by
any joint probability distribution, there is a corresponding group inequality
satisfied by any finite group and its subgroups, and vice versa. Inequalities
of the latter type govern the orders of any finite group and its subgroups.
Group-theoretic proofs of Shannon-type information inequalities are given. At
the end of the chapter, a group inequality is obtained from a non-Shannontype inequality discussed in Chapter 15. The meaning and the implication of
this inequality are yet to be understood.



Chapter 17 starts Part II of the book with a discussion of the butterfly
network, the primary example in network coding. Variations of the butterfly
network are analyzed in detail. The advantage of network coding over store-and-forward in wireless and satellite communications is explained through a
simple example. We also explain why network coding with multiple information sources is substantially different from network coding with a single
information source.
In Chapter 18, the fundamental bound for single-source network coding,
called the max-flow bound, is explained in detail. The bound is established
for a general class of network codes.
In Chapter 19, we discuss various classes of linear network codes on acyclic
networks that achieve the max-flow bound to different extents. Static network codes, a special class of linear network codes that achieves the max-flow
bound in the presence of channel failure, are also discussed. Polynomial-time
algorithms for constructing these codes are presented.
In Chapter 20, we formulate and analyze convolutional network codes on
cyclic networks. The existence of such codes that achieve the max-flow bound
is proved.

Network coding theory is further developed in Chapter 21. The scenario
in which more than one information source is multicast in a point-to-point
acyclic network is discussed. An implicit characterization of the achievable
information rate region which involves the framework for information inequalities developed in Part I is proved.
How to use this book

[Chart: recommended reading order, showing the dependencies among Chapters 1-16 (Part I) and Chapters 17-21 (Part II).]



Part I of this book by itself may be regarded as a comprehensive textbook
in information theory. The main reason why the book is in the present form
is that, in my opinion, the discussion of network coding in Part II is incomplete without Part I. Nevertheless, except for Chapter 21 on multi-source
network coding, Part II by itself may be used satisfactorily as a textbook on
single-source network coding.
An elementary course on probability theory and an elementary course
on linear algebra are prerequisites to Part I and Part II, respectively. For
Chapter 11, some background knowledge on digital communication systems
would be helpful, and for Chapter 20, some prior exposure to discrete-time
linear systems is necessary. The reader is recommended to read the chapters
according to the above chart. However, one will not have too much difficulty
jumping around in the book because there should be sufficient references to
the previous relevant sections.
This book inherits the writing style from the previous book, namely that all
the derivations are from first principles. The book contains a large number
of examples, where important points are very often made. To facilitate the
use of the book, there is a summary at the end of each chapter.
This book can be used as a textbook or a reference book. As a textbook,
it is ideal for a two-semester course, with the first and second semesters covering selected topics from Part I and Part II, respectively. A comprehensive
instructor’s manual is available upon request. Please contact the author at
for information and access.
Just like any other lengthy document, this book surely contains errors
and omissions. To alleviate the problem, an errata list will be maintained at the
book homepage book2/.

Hong Kong, China
December, 2007

Raymond W. Yeung



Acknowledgments

The current book, an expansion of my previous book A First Course in Information Theory, was written within the year 2007. Thanks to the generous
support of the Friedrich Wilhelm Bessel Research Award from the Alexander von Humboldt Foundation of Germany, I had the luxury of working on
the project full-time from January to April when I visited Munich University
of Technology. I would like to thank Joachim Hagenauer and Ralf Koetter
for nominating me for the award and for hosting my visit. I also would like
to thank the Department of Information Engineering, The Chinese University of
Hong Kong, for making this arrangement possible.
There are many individuals who have directly or indirectly contributed to
this book. First, I am indebted to Toby Berger who taught me information
theory and writing. I am most thankful to Zhen Zhang, Ning Cai, and Bob Li
for their friendship and inspiration. Without the results obtained through our
collaboration, the book could not possibly be in its current form. I would also like
to thank Venkat Anantharam, Vijay Bhargava, Dick Blahut, Agnes and Vincent Chan, Tom Cover, Imre Csiszár, Tony Ephremides, Bob Gallager, Bruce
Hajek, Te Sun Han, Jim Massey, Prakash Narayan, Alon Orlitsky, Shlomo
Shamai, Sergio Verdú, Victor Wei, Frans Willems, and Jack Wolf for their
support and encouragement throughout the years. I also would like to thank
all the collaborators of my work for their contribution and all the anonymous
reviewers for their useful comments.
I would like to thank a number of individuals who helped in the project.
I benefited tremendously from the discussions with David Tse who gave a lot
of suggestions for writing the chapters on differential entropy and continuous-valued channels. Terence Chan, Ka Wo Cheung, Bruce Hajek, Siu-Wai Ho,
Siu Ting Ho, Tat Ming Lok, Prakash Narayan, Will Ng, Sagar Shenvi, Xiang-Gen Xia, Shaohua Yang, Ken Zeger, and Zhixue Zhang gave many valuable
comments at different stages of the writing. My graduate students Silas Fong,
Min Tan, and Shenghao Yang proofread the chapters on network coding in
great detail. Silas Fong also helped compose the figures throughout the book.



On the domestic side, I am most grateful to my wife Rebecca for her love.
During our stay in Munich, she took good care of the whole family so that I
was able to concentrate on my writing. We are most thankful to our family
friend Ms. Pui Yee Wong for taking care of Rebecca when she was ill during the
final stage of the project, and to my sister Georgiana for her moral support.
In this regard, we are indebted to Dr. Yu Lap Yip for his timely diagnosis.
I also would like to thank my sister-in-law Ophelia Tsang who comes over
during the weekends to help take care of our daughter Shannon, who continues
to be the sweetheart of the family and was most supportive during the time her
mom was ill.


Contents

1    The Science of Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   1

Part I Components of Information Theory
2    Information Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   7
2.1 Independence and Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . .   7
2.2 Shannon's Information Measures . . . . . . . . . . . . . . . . . . . . . . . . . .  12
2.3 Continuity of Shannon's Information Measures for Fixed
Finite Alphabets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  18
2.4 Chain Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  21
2.5 Informational Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  23
2.6 The Basic Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  26
2.7 Some Useful Information Inequalities . . . . . . . . . . . . . . . . . . . . . . .  28
2.8 Fano's Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  32
2.9 Maximum Entropy Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . .  36
2.10 Entropy Rate of a Stationary Source . . . . . . . . . . . . . . . . . . . . . . .  38
Appendix 2.A: Approximation of Random Variables with
Countably Infinite Alphabets by Truncation . . . . . . . . . . . . . . . . .  41
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  43
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  45
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  49

3    The I-Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  51
3.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  52
3.2 The I-Measure for Two Random Variables . . . . . . . . . . . . . . . . . . .  53
3.3 Construction of the I-Measure µ* . . . . . . . . . . . . . . . . . . . . . . . . . .  55
3.4 µ* Can be Negative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  59
3.5 Information Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  61
3.6 Examples of Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  67
Appendix 3.A: A Variation of the Inclusion-Exclusion Formula . . . . .  74



Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4    Zero-Error Data Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  81
4.1 The Entropy Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  82
4.2 Prefix Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  86
4.2.1 Definition and Existence . . . . . . . . . . . . . . . . . . . . . . . . . . . .  86
4.2.2 Huffman Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  88
4.3 Redundancy of Prefix Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  93
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  97
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  98
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  99

5    Weak Typicality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.1 The Weak AEP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2 The Source Coding Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.3 Efficient Source Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.4 The Shannon-McMillan-Breiman Theorem . . . . . . . . . . . . . . . . . . 107
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

6    Strong Typicality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.1 Strong AEP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.2 Strong Typicality Versus Weak Typicality . . . . . . . . . . . . . . . . . . 121
6.3 Joint Typicality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

6.4 An Interpretation of the Basic Inequalities . . . . . . . . . . . . . . . . . . 131
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

7    Discrete Memoryless Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
7.1 Definition and Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.2 The Channel Coding Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
7.3 The Converse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.4 Achievability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.5 A Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
7.6 Feedback Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
7.7 Separation of Source and Channel Coding . . . . . . . . . . . . . . . . . . 172
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181



8    Rate-Distortion Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8.1 Single-Letter Distortion Measures . . . . . . . . . . . . . . . . . . . . . . . . . 184
8.2 The Rate-Distortion Function R(D) . . . . . . . . . . . . . . . . . . . . . . . 187
8.3 The Rate-Distortion Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

8.4 The Converse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
8.5 Achievability of RI (D) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

9    The Blahut-Arimoto Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
9.1 Alternating Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
9.2 The Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
9.2.1 Channel Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
9.2.2 The Rate-Distortion Function . . . . . . . . . . . . . . . . . . . . . . . 219
9.3 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
9.3.1 A Sufficient Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
9.3.2 Convergence to the Channel Capacity . . . . . . . . . . . . . . . . 225
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

10 Differential Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
10.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
10.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
10.3 Joint and Conditional Differential Entropy . . . . . . . . . . . . . . . . . . 238
10.4 The AEP for Continuous Random Variables . . . . . . . . . . . . . . . . 245
10.5 Informational Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
10.6 Maximum Differential Entropy Distributions . . . . . . . . . . . . . . . . 248
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

11 Continuous-Valued Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
11.1 Discrete-Time Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
11.2 The Channel Coding Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
11.3 Proof of the Channel Coding Theorem . . . . . . . . . . . . . . . . . . . . . 262
11.3.1 The Converse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
11.3.2 Achievability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
11.4 Memoryless Gaussian Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
11.5 Parallel Gaussian Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
11.6 Correlated Gaussian Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
11.7 The Bandlimited White Gaussian Channel . . . . . . . . . . . . . . . . . . 280
11.8 The Bandlimited Colored Gaussian Channel . . . . . . . . . . . . . . . . 287
11.9 Zero-Mean Gaussian Noise is the Worst Additive Noise . . . . . . . 289



Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
12 Markov Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
12.1 Conditional Mutual Independence . . . . . . . . . . . . . . . . . . . . . . . . . 300
12.2 Full Conditional Mutual Independence . . . . . . . . . . . . . . . . . . . . . 309
12.3 Markov Random Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
12.4 Markov Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
13 Information Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
13.1 The Region Γ*n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
13.2 Information Expressions in Canonical Form . . . . . . . . . . . . . . . . . 326

13.3 A Geometrical Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
13.3.1 Unconstrained Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . 329
13.3.2 Constrained Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
13.3.3 Constrained Identities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
13.4 Equivalence of Constrained Inequalities . . . . . . . . . . . . . . . . . . . . 333
13.5 The Implication Problem of Conditional Independence . . . . . . . 336
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
14 Shannon-Type Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
14.1 The Elemental Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
14.2 A Linear Programming Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 341
14.2.1 Unconstrained Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . 343
14.2.2 Constrained Inequalities and Identities . . . . . . . . . . . . . . . 344
14.3 A Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
14.4 Machine Proving – ITIP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
14.5 Tackling the Implication Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 351
14.6 Minimality of the Elemental Inequalities . . . . . . . . . . . . . . . . . . . . 353
Appendix 14.A: The Basic Inequalities and the Polymatroidal
Axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
15 Beyond Shannon-Type Inequalities . . . . . . . . . . . . . . . . . . . . . . . . 361
15.1 Characterizations of Γ*2, Γ*3, and Γ*n . . . . . . . . . . . . . . . . . . . . . . 361
15.2 A Non-Shannon-Type Unconstrained Inequality . . . . . . . . . . . . . 369
15.3 A Non-Shannon-Type Constrained Inequality . . . . . . . . . . . . . . . 374



15.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
16 Entropy and Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
16.1 Group Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
16.2 Group-Characterizable Entropy Functions . . . . . . . . . . . . . . . . . . 393
16.3 A Group Characterization of Γ*n . . . . . . . . . . . . . . . . . . . . . . . . . . 398
16.4 Information Inequalities and Group Inequalities . . . . . . . . . . . . . 401
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
Part II Fundamentals of Network Coding
17 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
17.1 The Butterfly Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
17.2 Wireless and Satellite Communications . . . . . . . . . . . . . . . . . . . . . 415
17.3 Source Separation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
18 The Max-Flow Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
18.1 Point-to-Point Communication Networks . . . . . . . . . . . . . . . . . . . 421
18.2 Examples Achieving the Max-Flow Bound . . . . . . . . . . . . . . . . . . 424
18.3 A Class of Network Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
18.4 Proof of the Max-Flow Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431

Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
19 Single-Source Linear Network Coding: Acyclic Networks . . 435
19.1 Acyclic Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
19.2 Linear Network Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
19.3 Desirable Properties of a Linear Network Code . . . . . . . . . . . . . . 442
19.4 Existence and Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
19.5 Generic Network Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
19.6 Static Network Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
19.7 Random Network Coding: A Case Study . . . . . . . . . . . . . . . . . . . 473
19.7.1 How the System Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
19.7.2 Model and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478



Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482
20 Single-Source Linear Network Coding: Cyclic Networks . . . . 485
20.1 Delay-Free Cyclic Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
20.2 Convolutional Network Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
20.3 Decoding of Convolutional Network Codes . . . . . . . . . . . . . . . . . . 498
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504
21 Multi-Source Network Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
21.1 The Max-Flow Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
21.2 Examples of Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508

21.2.1 Multilevel Diversity Coding . . . . . . . . . . . . . . . . . . . . . . . . . 508
21.2.2 Satellite Communication Network . . . . . . . . . . . . . . . . . . . 510
21.3 A Network Code for Acyclic Networks . . . . . . . . . . . . . . . . . . . . . 510
21.4 The Achievable Information Rate Region . . . . . . . . . . . . . . . . . . . 512
21.5 Explicit Inner and Outer Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . 515
21.6 The Converse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
21.7 Achievability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
21.7.1 Random Code Construction . . . . . . . . . . . . . . . . . . . . . . . . 524
21.7.2 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561


1
The Science of Information

In a communication system, we try to convey information from one point to
another, very often in a noisy environment. Consider the following scenario. A
secretary needs to send facsimiles regularly and she wants to convey as much
information as possible on each page. She has a choice of the font size, which
means that more characters can be squeezed onto a page if a smaller font size
is used. In principle, she can squeeze as many characters as desired on a page
by using a small enough font size. However, there are two factors in the system
which may cause errors. First, the fax machine has a finite resolution. Second,
the characters transmitted may be received incorrectly due to noise in the
telephone line. Therefore, if the font size is too small, the characters may not
be recognizable on the facsimile. On the other hand, although some characters
on the facsimile may not be recognizable, the recipient can still figure out the
words from the context provided that the number of such characters is not
excessive. In other words, it is not necessary to choose a font size such that
all the characters on the facsimile are recognizable almost surely. Then we are
motivated to ask: What is the maximum amount of meaningful information
which can be conveyed on one page of facsimile?
This question may not have a definite answer because it is not very well
posed. In particular, we do not have a precise measure of meaningful information. Nevertheless, this question is an illustration of the kind of fundamental
questions we can ask about a communication system.
Information, which is not a physical entity but an abstract concept, is hard
to quantify in general. This is especially the case if human factors are involved
when the information is utilized. For example, when we play Beethoven’s
violin concerto from an audio compact disc, we receive the musical information
from the loudspeakers. We enjoy this information because it arouses certain
kinds of emotion within ourselves. While we receive the same information
every time we play the same piece of music, the kinds of emotions aroused
may be different from time to time because they depend on our mood at
that particular moment. In other words, we can derive utility from the same
information every time in a different way. For this reason, it is extremely
difficult to devise a measure which can quantify the amount of information
contained in a piece of music.
In 1948, Bell Telephone Laboratories scientist Claude E. Shannon (1916-2001) published a paper entitled "A Mathematical Theory of Communication" [322] which laid the foundation of an important field now known as
information theory. In his paper, the model of a point-to-point communication
system depicted in Figure 1.1 is considered.

[Fig. 1.1. Schematic diagram for a general point-to-point communication
system: information source, transmitter, noise source, receiver, and
destination.]

In this model, a message is generated by the information source. The message
is converted by the transmitter
into a signal which is suitable for transmission. In the course of transmission,
the signal may be contaminated by a noise source, so that the received signal
may be different from the transmitted signal. Based on the received signal,
the receiver then makes an estimate on the message and deliver it to the
destination.
In this abstract model of a point-to-point communication system, one is
only concerned about whether the message generated by the source can be
delivered correctly to the receiver without worrying about how the message
is actually used by the receiver. In a way, Shannon's model does not cover all
possible aspects of a communication system. However, in order to develop a
precise and useful theory of information, the scope of the theory has to be
restricted.
In [322], Shannon introduced two fundamental concepts about “information” from the communication point of view. First, information is uncertainty.
More specifically, if a piece of information we are interested in is deterministic,
then it has no value at all because it is already known with no uncertainty.
From this point of view, for example, the continuous transmission of a still
picture on a television broadcast channel is superfluous. Consequently, an
information source is naturally modeled as a random variable or a random
process, and probability is employed to develop the theory of information.
Second, information to be transmitted is digital. This means that the information source should first be converted into a stream of 0's and 1's called bits,
and the remaining task is to deliver these bits to the receiver correctly with no
reference to their actual meaning. This is the foundation of all modern digital
communication systems. In fact, this work of Shannon appears to contain the
first published use of the term “bit,” which stands for binary digit.
In the same work, Shannon also proved two important theorems. The first
theorem, called the source coding theorem, introduces entropy as the fundamental measure of information which characterizes the minimum rate of a
source code representing an information source essentially free of error. The
source coding theorem is the theoretical basis for lossless data compression.¹
The second theorem, called the channel coding theorem, concerns communication through a noisy channel. It was shown that associated with every noisy
channel is a parameter, called the capacity, which is strictly positive except
for very special channels, such that information can be communicated reliably
through the channel as long as the information rate is less than the capacity.

These two theorems, which give fundamental limits in point-to-point communication, are the two most important results in information theory.
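
For the reader who wants a preview, the two quantities behind these theorems can be sketched as follows; this is only a minimal sketch using standard notation that is assumed here rather than defined, and the precise definitions and the theorems themselves are developed in Part I of the book.

% Entropy of a discrete source X with distribution p(x): by the source
% coding theorem, the minimum rate, in bits per source symbol, at which
% the source can be compressed essentially free of error.
H(X) = -\sum_{x} p(x) \log_2 p(x)

% Capacity of a discrete memoryless channel with transition probabilities
% p(y|x): by the channel coding theorem, reliable communication is possible
% at any rate below C and impossible at any rate above it.
C = \max_{p(x)} I(X;Y)
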
In science, we study the laws of Nature which must be obeyed by any physical system. These laws are used by engineers to design systems to achieve
specific goals. Therefore, science is the foundation of engineering. Without
science, engineering can only be done by trial and error.
In information theory, we study the fundamental limits in communication regardless of the technologies involved in the actual implementation of
the communication systems. These fundamental limits are not only used as
guidelines by communication engineers, but they also give insights into what
optimal coding schemes are like. Information theory is therefore the science
of information.
Since Shannon published his original paper in 1948, information theory
has been developed into a major research field in both communication theory
and applied probability.
For a non-technical introduction to information theory, we refer the reader
to Encyclopedia Britannica [49]. In fact, we strongly recommend the reader to
first read this excellent introduction before starting this book. For biographies
of Claude Shannon, a legend of the 20th Century who had made fundamental
contributions to the Information Age, we refer the readers to [56] and [340].
The latter is also a complete collection of Shannon’s papers.
Unlike most branches of applied mathematics in which physical systems are
studied, abstract systems of communication are studied in information theory.
In reading this book, it is not unusual for a beginner to be able to understand
all the steps in a proof but have no idea what the proof is leading to. The best
way to learn information theory is to study the materials first and come back
at a later time. Many results in information theory are rather subtle, to the
extent that an expert in the subject may from time to time realize that his/her
understanding of certain basic results has been inadequate or even incorrect.
While a novice should expect to raise his/her level of understanding of the
subject by reading this book, he/she should not be discouraged if, after
finishing the book, he/she finds that there are actually more things yet to be understood.
In fact, this is exactly the challenge and the beauty of information theory.

¹ A data compression scheme is lossless if the data can be recovered with an
arbitrarily small probability of error.


Part I

Components of Information Theory

