SPEECH CODING
ALGORITHMS
Foundation and Evolution
of Standardized Coders
WAI C. CHU
Mobile Media Laboratory
DoCoMo USA Labs
San Jose, California
A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as
permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior
written permission of the Publisher, or authorization through payment of the appropriate per-copy fee
to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400,
fax 978-750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should
be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken,
NJ 07030, (201) 748-6011, fax (201) 748-6008, e-mail:
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representations or warranties with respect to the accuracy or
completeness of the contents of this book and specifically disclaim any implied warranties of
merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
representatives or written sales materials. The advice and strategies contained herein may not be suitable
for your situation. You should consult with a professional where appropriate. Neither the publisher nor
author shall be liable for any loss of profit or any other commercial damages, including but not limited to
special, incidental, consequential, or other damages.
For general information on our other products and services please contact our Customer Care Department
within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print,
however, may not be available in electronic format.
Library of Congress Cataloging-in-Publication Data:
Chu, Wai C. —
Speech coding algorithms: Foundation and evolution of standardized coders
ISBN 0-471-37312-5
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
Intelligence is the fruit of industriousness
Accretion of knowledge creates genii
A Chinese proverb

CONTENTS
PREFACE xiii
ACRONYMS xix
NOTATION xxiii
1 INTRODUCTION 1
1.1 Overview of Speech Coding / 2
1.2 Classification of Speech Coders / 8
1.3 Speech Production and Modeling / 11
1.4 Some Properties of the Human Auditory System / 18
1.5 Speech Coding Standards / 22
1.6 About Algorithms / 26
1.7 Summary and References / 31
2 SIGNAL PROCESSING TECHNIQUES 33
2.1 Pitch Period Estimation / 33
2.2 All-Pole and All-Zero Filters / 45
2.3 Convolution / 52
2.4 Summary and References / 57
Exercises / 57
3 STOCHASTIC PROCESSES AND MODELS 61
3.1 Power Spectral Density / 62
3.2 Periodogram / 67
3.3 Autoregressive Model / 69
3.4 Autocorrelation Estimation / 73
3.5 Other Signal Models / 85
3.6 Summary and References / 86
Exercises / 87
4 LINEAR PREDICTION 91
4.1 The Problem of Linear Prediction / 92
4.2 Linear Prediction Analysis of Nonstationary Signals / 96
4.3 Examples of Linear Prediction Analysis of Speech / 101
4.4 The Levinson–Durbin Algorithm / 107
4.5 The Leroux–Gueguen Algorithm / 114
4.6 Long-Term Linear Prediction / 120
4.7 Synthesis Filters / 127
4.8 Practical Implementation / 131
4.9 Moving Average Prediction / 137
4.10 Summary and References / 138
Exercises / 139
5 SCALAR QUANTIZATION 143
5.1 Introduction / 143
5.2 Uniform Quantizer / 147
5.3 Optimal Quantizer / 149
5.4 Quantizer Design Algorithms / 151
5.5 Algorithmic Implementation / 155
5.6 Summary and References / 158
Exercises / 158
6 PULSE CODE MODULATION AND ITS VARIANTS 161
6.1 Uniform Quantization / 161
6.2 Nonuniform Quantization / 166
6.3 Differential Pulse Code Modulation / 172
6.4 Adaptive Schemes / 175
6.5 Summary and References / 180
Exercises / 181
7 VECTOR QUANTIZATION 184
7.1 Introduction / 185
7.2 Optimal Quantizer / 188
7.3 Quantizer Design Algorithms / 189
7.4 Multistage VQ / 194
7.5 Predictive VQ / 216
7.6 Other Structured Schemes / 219
7.7 Summary and References / 221
Exercises / 222
8 SCALAR QUANTIZATION OF LINEAR
PREDICTION COEFFICIENT 227
8.1 Spectral Distortion / 227
8.2 Quantization Based on Reflection Coefficient and
Log Area Ratio / 232
8.3 Line Spectral Frequency / 239
8.4 Quantization Based on Line Spectral Frequency / 252
8.5 Interpolation of LPC / 256
8.6 Summary and References / 258
Exercises / 260
9 LINEAR PREDICTION CODING 263
9.1 Speech Production Model / 264
9.2 Structure of the Algorithm / 268
9.3 Voicing Detector / 271
9.4 The FS1015 LPC Coder / 275
9.5 Limitations of the LPC Model / 277
9.6 Summary and References / 280
Exercises / 281
10 REGULAR-PULSE EXCITATION CODERS 285
10.1 Multipulse Excitation Model / 286
10.2 Regular-Pulse-Excited–Long-Term Prediction / 289
10.3 Summary and References / 295
Exercises / 296
11 CODE-EXCITED LINEAR PREDICTION 299
11.1 The CELP Speech Production Model / 300
11.2 The Principle of Analysis-by-Synthesis / 301
11.3 Encoding and Decoding / 302
11.4 Excitation Codebook Search / 308
11.5 Postfilter / 317
11.6 Summary and References / 325
Exercises / 326
12 THE FEDERAL STANDARD VERSION OF CELP 330
12.1 Improving the Long-Term Predictor / 331
12.2 The Concept of the Adaptive Codebook / 333
12.3 Incorporation of the Adaptive Codebook to
the CELP Framework / 336
12.4 Stochastic Codebook Structure / 338
12.5 Adaptive Codebook Search / 341
12.6 Stochastic Codebook Search / 344
12.7 Encoder and Decoder / 346
12.8 Summary and References / 349
Exercises / 350
13 VECTOR SUM EXCITED LINEAR PREDICTION 353
13.1 The Core Encoding Structure / 354
13.2 Search Strategies for Excitation Codebooks / 356
13.3 Excitation Codebook Searches / 357
13.4 Gain Related Procedures / 362
13.5 Encoder and Decoder / 366
13.6 Summary and References / 368
Exercises / 369
14 LOW-DELAY CELP 372
14.1 Strategies to Achieve Low Delay / 373
14.2 Basic Operational Principles / 375
14.3 Linear Prediction Analysis / 377
14.4 Excitation Codebook Search / 380
14.5 Backward Gain Adaptation / 385
14.6 Encoder and Decoder / 389
14.7 Codebook Training / 391
14.8 Summary and References / 393
Exercises / 394
15 VECTOR QUANTIZATION OF LINEAR
PREDICTION COEFFICIENT 396
15.1 Correlation Among the LSFs / 396
15.2 Split VQ / 399
15.3 Multistage VQ / 403
15.4 Predictive VQ / 407
15.5 Summary and References / 418
Exercises / 419
16 ALGEBRAIC CELP 423
16.1 Algebraic Codebook Structure / 424
16.2 Adaptive Codebook / 425
16.3 Encoding and Decoding / 433
16.4 Algebraic Codebook Search / 437
16.5 Gain Quantization Using Conjugate VQ / 443
16.6 Other ACELP Standards / 446
16.7 Summary and References / 451
Exercises / 451
17 MIXED EXCITATION LINEAR PREDICTION 454
17.1 The MELP Speech Production Model / 455
17.2 Fourier Magnitudes / 456
17.3 Shaping Filters / 464
17.4 Pitch Period and Voicing Strength Estimation / 466
17.5 Encoder Operations / 474
17.6 Decoder Operations / 477
17.7 Summary and References / 481
Exercises / 482
18 SOURCE-CONTROLLED VARIABLE BIT-RATE CELP 486
18.1 Adaptive Rate Decision / 487
18.2 LP Analysis and LSF-Related Operations / 494
18.3 Decoding and Encoding / 496
18.4 Summary and References / 498
Exercises / 499
19 SPEECH QUALITY ASSESSMENT 501
19.1 The Scope of Quality and Measuring Conditions / 501
19.2 Objective Quality Measurements for Waveform Coders / 502
19.3 Subjective Quality Measures / 504
19.4 Improvements on Objective Quality Measures / 505
APPENDIX A MINIMUM-PHASE PROPERTY OF THE
FORWARD PREDICTION-ERROR FILTER 507
APPENDIX B SOME PROPERTIES OF LINE
SPECTRAL FREQUENCY 514
APPENDIX C RESEARCH DIRECTIONS IN
SPEECH CODING 518
APPENDIX D LINEAR COMBINER FOR
PATTERN CLASSIFICATION 522
APPENDIX E CELP: OPTIMAL LONG-TERM PREDICTOR TO
MINIMIZE THE WEIGHTED DIFFERENCE 531
APPENDIX F REVIEW OF LINEAR ALGEBRA:
ORTHOGONALITY, BASIS, LINEAR
INDEPENDENCE, AND THE
GRAM–SCHMIDT ALGORITHM 537
BIBLIOGRAPHY 542
INDEX 553
PREFACE
My first contact with speech coding was in 1993 when I was a Field Application
Engineer at Texas Instruments, Inc. Soon after joining the company I was assigned
to design a demo prototype for the digital telephone answering device project.
Initially I was in charge of hardware including circuit design and printed circuit
board layout. The core of the board consisted of a microcontroller sending
commands to a mixed signal processor, where all the signal processing tasks—
including speech coding—were performed. In those days a major concern was the
excessive cost associated with random-access memory (RAM), and compressing
the digital speech before storing was almost a mandatory requirement, as this
greatly improved cost-effectiveness.
Soon after the hardware was finished, the focus switched to software (or firmware)
design, mainly dealing with the control of various on-board peripheral devices. My
true interest, however, was the program code inside the mixed signal processor,
which was developed by a separate team of "advanced" engineers. I was told that
voice signals were compressed using a code-excited linear prediction (CELP)
algorithm. Also, it was possible to play back fixed announcement messages—such
as numbers and days of the week—with the messages stored in the linear prediction
coding (LPC) format. I had no idea what these algorithms were, nor how they
worked to compress speech. However, I was eager to learn the details, and decided
to go back to school and pursue a PhD with a concentration in speech coding.
This book is the result of my personal experience as a researcher and practitioner
in the field of speech coding. Four years ago I decided to put in extra hours, usually
late nights and early mornings as well as weekends, to organize the literature in
speech coding and develop it into a logical presentation in terms of content and
terminology. Speech coding has evolved into a highly mature branch of signal
processing, with deployment in a plethora of products such as cellular phones,
answering machines, communication devices, and more recently, voice over
internet protocol (VoIP). It is obvious that a thorough textbook is necessary
for students, professors, and engineering professionals to handle the subject
appropriately. My sincere hope is that the availability of a book that collects
many of the techniques used in speech coding and presents them in an accessible
fashion will create excitement and enthusiasm, ensuring continuous rapid advances
in the field.
Philosophy and Approach
Speech Coding Algorithms reflects the core subject of the book, since most coding
techniques are implemented as algorithms, or computational procedures performed
by a processor. However, this is by no means an exhaustive documentation of all
methods developed in this field; it is rather the study of the most successful
techniques, defined as those incorporated in a standard. By doing so we concentrate
our effort on understanding the most influential ideas, which is a rather efficient
manner to navigate this vast territory of knowledge.
In my own personal learning curve, I found that there is a different and
refreshing lesson to be found in every standard. To understand a new standard it
is often necessary to look back into the developed techniques adopted by past
standards or studies. Attempting to learn by reading the official documentation
describing the standard is very often a frustrating experience, since the assumption
made in preparing those materials is that the audience consists of experts in the
subject, and hence the logical order and justification of a given approach is
routinely omitted. Therefore the origin and the reason behind a certain practice
cannot be fully understood. This might not be a problem if one’s objective is to
implement the algorithm without comprehending it. However, for those researchers
eager to delve deeply into its roots, alternative reference sources must be explored,
which can be a strenuous and prolonged process. In this book I have summarized
the knowledge acquired over an extended period of time, with the intention of
filling the void between principles and implementations.
In writing this book, a balance is sought between theory and practice, and
between intuition and rigor. Theoretical ideas are included only if they are used to
solve practical problems, and thorough proofs are provided. Speech coding is
related to human perception, and therefore a degree of fuzziness exists, in the sense
that no absolute right or wrong can be established for certain situations; in other
words, no mathematical proofs are obtainable. In these cases, solutions are often
found and justified on an intuitive basis. For the most part, the book is meant to be
pragmatic, since the discussed techniques are widely used in industry.
Prerequisites
The minimum background required to understand the book is explained, with
reference to popular textbooks where the relevant subjects can be found.
• Advanced calculus, including complex variables [Churchill and Brown, 1990].
• Discrete-time signals and systems, Fourier transforms, z-transforms, filtering,
and convolution [Oppenheim and Schafer, 1989; Stearns and Hush, 1990].
• Random variables and stochastic processes, expectation, probability, and
wide-sense stationarity [Papoulis, 1991; Peebles, 1993].
• Linear algebra, including linear equations, matrices, and vectors [Strang,
1988].
• Experience with high-level programming using a language such as C.
The above list is covered in most undergraduate Electrical Engineering curricula;
with this background, the book is self-contained.
Organization
The text is divided into 19 chapters. Chapter 1 provides an overview of the subjects
covered, with references to various aspects of speech coding, standards, algorithms,
and comments on notation and terminology. Chapter 2 is a review of some signal
processing techniques; some are very general, while others are less well known outside
the speech coding literature. Chapter 3 contains some foundations of stochastic
processes and models, which are important for an understanding of the theoretical
aspects. Chapter 4 is about linear prediction, an integral part of almost all modern
speech coders. Chapter 5 reviews the various aspects of scalar quantization, which
are utilized routinely by many speech coding algorithms. One of the earliest digital
coding techniques is pulse code modulation (PCM); it and its variants are the topic
of Chapter 6. Chapter 7 deals with vector quantization, which has become more and
more important for the achievement of high efficiency in coding systems. Linear
prediction coefficients (LPC) are normally quantized for transmission as part of the
compressed bit-stream; Chapter 8 covers the various methods for scalar quantization
of these coefficients. One of the landmarks in low bit-rate speech coding is the
linear prediction coding (LPC) algorithm, discussed in Chapter 9. Chapter 10
is devoted to regular pulse excitation coders, with a thorough description of the
GSM 6.10 standard. Principles of code-excited linear prediction (CELP) are given
in Chapter 11, covering the various aspects of analysis-by-synthesis, signal
calculation, postfilter design, and efficiency. Chapters 12 and 13 present the
structure of two standardized CELP coders: FS1016 and IS54, respectively; these
are both milestones in speech coding development. Chapter 14 is dedicated to
the G.728 low-delay CELP standard, with thorough explanations of strategies for
delay reduction and detailed structures of the coder. Vector quantization of LPC
is included in Chapter 15, representing a huge advance with respect to scalar
quantization techniques covered in Chapter 8, and methods used by various
standardized coders are analyzed. The highly influential algebraic CELP (ACELP)
algorithm is covered in Chapter 16, where several ACELP-based standards are
described, with focus on the G.729 standard. The mixed excitation linear prediction
(MELP) algorithm is discussed in Chapter 17, and is shown to be an improvement
upon the LPC coder, covered in Chapter 9. Chapter 18 is devoted to the IS96
variable bit-rate CELP algorithm, which is a source-controlled multimode coder
with the operating mode selected by the input characteristics of the speech signal.
Finally, Chapter 19 is concerned with various methods to assess the quality of
speech signals, especially those processed by a speech coding algorithm.
The following table summarizes the chapters and their prerequisites.
Chapter  Title                                                   Prerequisites
1        Introduction
2        Signal Processing Techniques                            1
3        Stochastic Processes and Models
4        Linear Prediction                                       1, 2, 3
5        Scalar Quantization
6        Pulse Code Modulation and its Variants                  4, 5
7        Vector Quantization                                     5
8        Scalar Quantization of Linear Prediction Coefficients   4, 5
9        Linear Prediction Coding                                4, 8
10       Regular-Pulse Excitation Coders                         4, 8
11       Code-Excited Linear Prediction                          2, 4
12       The Federal Standard Version of CELP                    2, 8, 11
13       Vector Sum Excited Linear Prediction                    8, 12
14       Low-Delay CELP                                          4, 11
15       Vector Quantization of Linear Prediction Coefficients   7, 8
16       Algebraic CELP                                          7, 12, 15
17       Mixed Excitation Linear Prediction                      9, 15
18       Source-Controlled Variable Bit-Rate CELP                11
19       Speech Quality Assessment                               1
Acknowledgments
Throughout my professional career, I have had the opportunity to work with and
learn from a number of people whom I should like to publicly acknowledge. My
former advisor Dr. Nirmal K. Bose at the Pennsylvania State University
provided me with invaluable instruction, trust, and friendship during my graduate
studies; his methodical style, hard-working spirit, and commitment toward education
have served as a role model to follow. I am grateful to my former supervisor
Dr. Tandhoni S. Rao at Texas Instruments Inc., who guided me through projects
involving adaptive filters, speech coding, and programming of digital signal
processors.
I would like to dedicate this book to my parents, who have always encouraged
my academic interests and provided moral support throughout my life and
career. I am deeply indebted to my cousin Chi-Ming Chu and his wife Kam-Chi Chu
for their help and support during my graduate studies at Stevens Tech; their
industriousness and candid spirit have given me a great deal of positive influence.
I am particularly indebted to my wife Laura for her love and patience, and for
thoroughly reviewing and proofreading the first version of the manuscript.
I am grateful to the Wiley team for their professionalism and help during the
production of this book; special thanks to George Telecki (Executive Editor) and
Rosalyn Farkas (Associate Editor). I am also most grateful to Dr. Andreas Spanias
and Dr. Allen Levesque, both early reviewers of the manuscript, for their
encouraging comments and constructive critiques. I also wish to thank my former
colleague at Texas Instruments Inc., Wai-Ming Lai, for her help in examining
some chapters of the text.
Last but not least, this book is dedicated to Universidad Simón Bolívar, the
school where I received most of my early engineering education. Universidad
Simón Bolívar has generously given me the vigor, the strength, and the wisdom
needed to conquer obstacles and overcome difficulties, both in engineering
and in life. With this book I hope to give those aspiring to this branch of
engineering the same that the respected university has given me.
Feedback
A book of this length is certain to contain errors and omissions. While attempts
were made to provide a highly understandable and correct content, there are
doubtless many places where improvements are possible. Feedback is welcome to
the author via email. Please note that a personal reply to all
messages might not be possible.
WAI C. CHU


ACRONYMS
2-D Two-dimensional
3GPP Third generation partnership project
AbS Analysis-by-synthesis
ACELP Algebraic code-excited linear prediction
ACR Absolute category rating
ADPCM Adaptive differential pulse code modulation
AES Audio Engineering Society
ANSI American National Standards Institute
APCM Adaptive pulse code modulation
AR Autoregressive
ARMA Autoregressive moving average
CCITT International Telegraph and Telephone Consultative Committee
(replaced by ITU-T)
CCR Comparison category rating
CDMA Code division multiple access
CELP Code-excited linear prediction
CEPT Conference of European Posts and Telephones
CS-ACELP Conjugate structure algebraic code-excited linear prediction
DC Direct current
DCR Degradation category rating
DFT Discrete Fourier transform
DMOS Degradation mean opinion score
DoD U.S. Department of Defense
DPCM Differential pulse code modulation
DSP Digital signal processing/processor
DTAD Digital telephone answering device
DTFT Discrete time Fourier transform
DTMF Dual-tone multifrequency
EFR Enhanced full rate
ETSI European Telecommunications Standards Institute
FFT Fast Fourier transform
FIR Finite impulse response
FM Frequency modulation
FS Federal Standard
GLA Generalized Lloyd algorithm
GSM Groupe Spécial Mobile
ICASSP International Conference on Acoustics, Speech, and Signal
Processing
IDFT Inverse discrete Fourier transform
IEC International Electrotechnical Commission
IEEE Institute of Electrical and Electronics Engineers
IIR Infinite impulse response
ISO International Organization for Standardization
ITU International Telecommunication Union
ITU–R ITU–Radiocommunication Sector
ITU–T ITU–Telecommunication Standardization Sector
LAR Log area ratio
LD-CELP Low-delay code-excited linear prediction
LMS Least mean square
LP Linear prediction
LPC Linear prediction coding/coefficient
LSF Line spectral frequency
LSP Line spectral pair
LTI Linear time-invariant
MA Moving average
MIPS Millions of instructions per second
MNB Measuring normalizing block
MOS Mean opinion score
MP–MLQ Multipulse–maximum likelihood quantization
MPEG Moving Picture Experts Group
MSE Mean square error
MSVQ Multistage vector quantization
NCS National Communications System
PAQM Perceptual audio quality measure
PC Personal computer
PCM Pulse code modulation
PDF Probability density function
PESQ Perceptual evaluation of speech quality
PG Prediction gain
PMF Probability mass function
PSD Power spectral density
PSQM Perceptual speech quality measure
PVQ Predictive vector quantization
QCELP Qualcomm code-excited linear prediction
RAM Random access memory
RC Reflection coefficient
RCR Research and Development Center for Radio Systems of Japan
RMS Root mean square
ROM Read only memory
RPE–LTP Regular pulse excited–long-term prediction
RV Random variable
SD Spectral distortion
SNR Signal to noise ratio
SPG Segmental prediction gain
SSE Sum of squared error
SSNR Segmental signal to noise ratio
TDMA Time division multiple access
TI Texas Instruments
TIA Telecommunications Industry Association
TTS Text to speech
UMTS Universal Mobile Telecommunications System
VBR Variable bit-rate
VoIP Voice over internet protocol
VQ Vector quantization
VSELP Vector sum excited linear prediction
WSS Wide sense stationary