Performance evaluation of speech quality for VOIP on the internet


PERFORMANCE EVALUATION OF SPEECH
QUALITY FOR VOIP ON THE INTERNET

LIU XIAOMIN

A THESIS SUBMITTED FOR THE DEGREE OF
MASTER OF SCIENCE
SCHOOL OF COMPUTING
NATIONAL UNIVERSITY OF SINGAPORE

2011

Acknowledgement

First, I would like to express my most profound gratitude to my advisor, Professor Roger Zimmermann, for his guidance and support. Working with him has
been an invaluable experience in my life. Professor Zimmermann is a brilliant
computer scientist and a great man with a gentle heart. It has been a huge
honor for me to be his student.
Second, I would like to extend my appreciation to my peers who gave me
great support during my life at NUS.


Abstract

This thesis reports on a measurement study evaluating the speech quality of two
widely used VoIP codecs, Speex and SILK, on the Internet using the Perceptual
Evaluation of Speech Quality (PESQ) metric. To obtain realistic results, we
developed our testbed on PlanetLab, so that all the experiments were conducted
on a shared network. We chose different sets of parameters for each experiment
for the two codecs to evaluate the speech quality under different conditions.
Overall, we found that the SILK codec performs slightly better than the Speex
codec.


Contents

List of Tables
List of Figures
1 Introduction
    1.1 Motivation
    1.2 Thesis Objectives
    1.3 Thesis Contributions
    1.4 Organization of the Thesis
2 Background and Literature Survey
    2.1 Transmission Protocols
    2.2 Audio Coding Introduction
        2.2.1 Audio File Formats
        2.2.2 Audio Coding Algorithms
        2.2.3 The Speex Codec
        2.2.4 The SILK Codec
        2.2.5 Summary
    2.3 Error Control Mechanisms
        2.3.1 ARQ-based Error Control Mechanisms
        2.3.2 FEC-based Error Control Mechanisms
        2.3.3 Referential Loss Recovery for Application Level Multicast
        2.3.4 Summary
    2.4 Speech Quality Evaluation for VoIP Communications
        2.4.1 Subjective Speech Quality Measurements
        2.4.2 Objective Speech Quality Measurements
        2.4.3 Summary
3 Experimental Setup and Results
    3.1 Box Plot Graph Overview
    3.2 System Design
        3.2.1 Test Audio Files
        3.2.2 Peer-to-Peer Topology
        3.2.3 The Client/Server Model
    3.3 Test Results
        3.3.1 Tests of Narrowband Mode on PlanetLab
        3.3.2 Tests on Wideband Mode on PlanetLab
        3.3.3 Tests on Narrowband Mode on LAN Testbed
        3.3.4 Summary
4 Conclusions and Future Work
    4.1 Conclusions
    4.2 Future Work

List of Tables

2.1 Comparison of the control parameters for the Speex and the SILK codecs.
2.2 Comparison of different error control mechanisms.
3.1 PESQ values for burst length of packet loss with the Gilbert Model on LAN.
3.2 PESQ values when the complexity stays the same in narrow band mode for LAN.
3.3 PESQ values when the bit rate stays the same in narrow band mode for LAN.
3.4 PESQ values when the complexity stays the same in wide band mode for LAN.

List of Figures

2.1 Example of a protocol stack: application data is transported via RTP, UDP, IP and Ethernet/ATM [18].
2.2 Speech quality versus the bit rate for speech codec types [33].
2.3 Structure of the PESQ model [31].
3.1 Elements and thresholds of a box plot.
3.2 The overall structure of the client/server test model.
3.3 PESQ values for all the combinations of bit rate and complexity parameters for the Speex and the SILK codecs.
3.4 PESQ values for loss rate ≤ 0.01 when the bit rate is 18,200 bps for the Speex and the SILK codecs.
3.5 PESQ values for 0.01 < loss rate ≤ 0.05 when the bit rate is 18,200 bps for the Speex and the SILK codecs.
3.6 PESQ values for 0.05 < loss rate ≤ 0.1 when the bit rate is 18,200 bps for the Speex and the SILK codecs.
3.7 PESQ values for 0.1 < loss rate ≤ 1.0 when the bit rate is 18,200 bps for the Speex and the SILK codecs.
3.8 PESQ values for loss rate ≤ 0.01 for the Speex and the SILK codecs.
3.9 PESQ values for 0.01 < loss rate ≤ 0.05 for the Speex and the SILK codecs.
3.10 PESQ values for 0.05 < loss rate ≤ 0.1 for the Speex and the SILK codecs.
3.11 PESQ values for 0.1 < loss rate ≤ 1.0 for the Speex and the SILK codecs.
3.12 PESQ values for loss rate ≤ 0.01 with a bit rate of 18,200 bps for the Speex and the SILK codecs.
3.13 PESQ values for 0.01 < loss rate ≤ 0.05 with a bit rate of 18,200 bps for the Speex and the SILK codecs.
3.14 PESQ values for 0.05 < loss rate ≤ 0.1 with a bit rate of 18,200 bps for the Speex and the SILK codecs.
3.15 PESQ values for 0.1 < loss rate ≤ 1.0 with a bit rate of 18,200 bps for the Speex and the SILK codecs.
3.16 PESQ values for loss rate ≤ 0.01 with a bit rate of 18,200 bps for the Speex and the SILK codecs.
3.17 PESQ values for 0.01 < loss rate ≤ 0.05 with a bit rate of 18,200 bps for the Speex and the SILK codecs.
3.18 PESQ values for 0.05 < loss rate ≤ 0.1 with a bit rate of 18,200 bps for the Speex and the SILK codecs.
3.19 PESQ values for 0.1 < loss rate ≤ 1.0 with a bit rate of 18,200 bps for the Speex and the SILK codecs.
3.20 PESQ value without packet loss for the Speex and the SILK codecs in wide band mode.
3.21 The two-state Gilbert Model [13].

Chapter 1

Introduction

1.1 Motivation

In recent years, the growth of the Internet has not only made our work lives
much easier (for example, we can hold a video or audio conference from home
instead of meeting in person), but it has also affected and redefined
many areas of entertainment, in particular how we consume video and audio.
We can now conveniently enjoy real-time Internet music streaming services such
as Jango (jango.com). Furthermore, the number of people who enjoy online
video and audio is growing very quickly, and streaming technology has become a
very active topic. Since many Internet services are commercial entities, there
has been growing interest in the quality of online media delivery, including
video and audio streaming. This thesis reports on a detailed study of speech
quality measurements for two modern, important Voice-over-IP (VoIP) codecs,
Speex and SILK.
Speex is a free and open source codec which is often used in free IP audio
communication applications. Speex is quite flexible in that it has been designed
for packet networks and VoIP applications, as well as file-based compression.
Due to these characteristics, it has drawn significant attention from
researchers. We chose this codec as one of the candidates to study the achievable
VoIP speech quality on a real network. SILK was developed by Skype for its VoIP
application. The Skype software is very popular and has users world-wide. At the
end of 2009, there were already 500 million registered Skype users. Recently the
company was acquired by Microsoft, and the number of users is expected to
increase further. Our goal is to explore the performance of these two codecs to
see how well they work in realistic environments. In the literature survey
chapter we will introduce these two codecs in detail.
As there are many configuration parameters for both Speex and SILK, our
objective was to explore the differences between these two codecs over a range of
different encoding and decoding parameters. To achieve realistic results, we
built testbed software for the PlanetLab environment and conducted extensive
experiments.

1.2 Thesis Objectives

The objective of this thesis is to study the speech quality of two modern,
widely used VoIP codecs, Speex and SILK. As there are many configuration
parameters for both Speex and SILK, our methodology is to select common
parameters and vary them over a wide range of settings during our experiments
to investigate how they affect the speech quality when running in a common
environment.
As our testbed we selected PlanetLab, a global research network that
provides resources to researchers at academic institutions and industrial research
labs to develop new network services. With the PlanetLab resources, we designed
point-to-point testing software to simulate real-time speech streaming, and
the resulting output was then evaluated for its speech quality using the PESQ
standard. Our testbed is built on a public, shared network, which means that
packet loss is unpredictable. For this reason, we also conducted some local
lab experiments in which the packet loss rate can be controlled. We use the
Gilbert Model to simulate packet losses in our local tests.
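
To illustrate how controlled, bursty packet loss can be generated, the following is a minimal Python sketch of a two-state Gilbert model. It is not the thesis implementation. The function names, the parameters p (good-to-bad transition probability) and q (bad-to-good transition probability), and the assumption that every packet sent in the bad state is lost are ours.

```python
import random


def gilbert_losses(n_packets, p, q, seed=None):
    """Return a list of booleans; True means the packet is lost.

    p: probability of moving from the good state to the bad state.
    q: probability of moving from the bad state back to the good state.
    Assumption: packets are lost only (and always) in the bad state.
    """
    rng = random.Random(seed)
    bad = False  # start in the good state
    losses = []
    for _ in range(n_packets):
        if bad:
            bad = rng.random() >= q  # stay in the bad state with probability 1 - q
        else:
            bad = rng.random() < p   # enter the bad state with probability p
        losses.append(bad)
    return losses


def expected_loss_rate(p, q):
    """Stationary probability of being in the bad state."""
    return p / (p + q)


def expected_burst_length(q):
    """Mean number of consecutive losses (geometric with parameter q)."""
    return 1.0 / q


def mean_burst_length(losses):
    """Average length of the runs of consecutive lost packets."""
    bursts, current = [], 0
    for lost in losses:
        if lost:
            current += 1
        elif current:
            bursts.append(current)
            current = 0
    if current:
        bursts.append(current)
    return sum(bursts) / len(bursts) if bursts else 0.0
```

Under these assumptions, choosing p and q fixes both the long-run loss rate, p / (p + q), and the average burst length, 1 / q, which is what makes the model convenient for repeatable lab tests.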

1.3 Thesis Contributions

The main contributions of my thesis are summarized as follows:
• First, we implemented an n-way point-to-point testing system on PlanetLab and conducted a set of experiments for speech quality evaluation.
• Second, we studied the characteristics of the Speex and SILK codecs and
compared the performance of the two codecs under different sets of parameters
using the PESQ metric.

1.4 Organization of the Thesis

To better describe my work, I have organized my thesis into four chapters.
Chapter 1 Introduction explains the motivation, objectives and contributions of my thesis.
Chapter 2 Background and Literature Survey first introduces the transmission protocols used in our testbed. Next, it discusses
audio coding algorithms and error control mechanisms. Lastly, it reviews prior
work in the field of VoIP measurements.
Chapter 3 Experimental Setup and Results describes our testbed architecture as well as our test results.
Chapter 4 Conclusions and Future Work summarizes the work performed in the thesis.

Chapter 2

Background and Literature Survey

Digital media are now well established as an integral part of many applications. A considerable amount of research has focused on audio streaming
over the Internet. In this chapter, we introduce the techniques commonly
used in audio streaming, including transmission protocols for audio streaming,
audio coding algorithms, and packet loss recovery mechanisms. We also
review previous work in the area of Voice-over-IP (VoIP) measurements.

2.1 Transmission Protocols

End-system applications often do not implement all the detailed communication features themselves; instead, they make use of existing communication protocols.
A number of protocols can be used for audio streaming. For example,
a network protocol can be used to forward datagrams across a physical channel,
and a transport protocol can be used for end-to-end services. The combination
of protocols is called a protocol stack. The typical protocol stack used for data
transmission over the Internet is TCP (the Transmission Control Protocol) or
UDP (the User Datagram Protocol) [28] on top of IP (the Internet Protocol) [27].
Figure 2.1 shows an example of a protocol stack.

Figure 2.1: Example of a protocol stack: application data is transported via
RTP, UDP, IP and Ethernet/ATM [18].

TCP is a connection-oriented, reliable and full-duplex protocol. It uses an
acknowledgement and retransmission scheme to make sure that every packet is
received by the receiver. Moreover, TCP ensures an ordered reception of packets
by delivering packet p_m only when all the previous packets p_j, j < m, have been
received. Because of these mechanisms, TCP is very suitable for
applications such as FTP, telnet and web servers, while it is less suitable for
real-time media delivery because retransmissions can delay packets considerably.
Thus, real-time applications tend to use UDP, which is connectionless and
best-effort and has no flow control mechanism, but can provide a low-latency
service for real-time audio streaming.
The Real-time Transport Protocol (RTP) [35] is a transport layer protocol
framework which was developed by the Internet Engineering Task Force (IETF)
Audio/Video Transport working group to deliver streamed media over
the Internet. Each packet contains timing information, a packet sequence number
and optional parameters. Different payload formats have been developed for
different audio and video compression standards. It has been proposed to
combine RTP with the receiver-initiated retransmission scheme of SRM [25]. RTP can also cooperate with RTCP (the Real-Time Transport Control
Protocol) [15], which allows the collection of feedback from receivers. RTP provides
end-to-end network transport functions for real-time audio streaming [42]. Its
specification states that “RTP is intended to be malleable to provide the information required by a particular application and will often be integrated into the
application processing rather than being implemented as a separate layer.” In
practice, RTP usually runs on top of UDP [22].
RTCP is an accompanying protocol of RTP designed to exchange control information related to real-time data transmissions. Either UDP or TCP can be used
as the underlying transmission protocol, depending on the requirements of the
application. Since RTCP was designed with large-scale multimedia applications
in mind, the protocol can offer considerable control information.
For our testing system, we employed the RTP-over-UDP protocol stack.
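
As an illustration of the timing information and sequence number that each RTP packet carries, the following Python sketch packs and parses the 12-byte fixed RTP header (version 2, no padding, no extension, no CSRC entries). It is a minimal sketch and not the testbed code. The function names are hypothetical, and the payload type value used in any call is an assumption.

```python
import struct

RTP_VERSION = 2
HEADER_FORMAT = "!BBHII"  # network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build the fixed RTP header (no padding, no extension, zero CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        HEADER_FORMAT,
        byte0,
        byte1,
        sequence & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


def unpack_rtp_header(packet):
    """Parse the first 12 bytes of a packet into the header fields."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack(
        HEADER_FORMAT, packet[:12]
    )
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def build_packet(payload, payload_type, sequence, timestamp, ssrc):
    """Prepend the RTP header to an encoded audio frame."""
    return pack_rtp_header(payload_type, sequence, timestamp, ssrc) + payload
```

A sender would hand the bytes returned by build_packet to a UDP socket. The receiver can use the sequence number to detect lost or reordered packets and the timestamp to schedule playback.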

2.2 Audio Coding Introduction

In this section, we introduce a few audio formats that are used in conjunction with different codecs, followed by audio coding algorithms. The details are
described in the following subsections.

2.2.1 Audio File Formats

An audio file format is a file format for storing audio data on a storage medium.
The general method for storing digital audio is to sample the voltage of the
audio waveform at regular intervals; on playback, each sample corresponds to a
signal level in an individual channel at a given resolution. The data can be stored uncompressed or compressed to reduce the file size. There are three groups of audio
file formats:
• Uncompressed audio formats such as WAV, AIFF, AU or raw header-less
PCM.
• Audio formats with lossless compression such as FLAC, Monkey’s Audio
(filename extension APE), TTA, Apple Lossless, MPEG-4 SLS, MPEG-4
ALS, MPEG-4 DST and Windows Media Audio Lossless (WMA Lossless).
• Audio formats with lossy compression such as MP3, Vorbis, AAC, ATRAC
and lossy Windows Media Audio (WMA).
There is one major uncompressed audio format, Pulse-Code Modulation (PCM),
which is usually stored as .WAV files on Windows or as .AIFF on Mac OS X.
WAV and AIFF files are suitable for storing and archiving original recordings
because their flexible file formats can store more or less any combination of
sampling rates and sample resolutions. In our system, we use WAV files as
high-quality inputs for the audio streaming codecs.
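
To make the relationship between sampling rate, sample resolution and data rate concrete, the following Python sketch reads the parameters of a PCM WAV file with the standard wave module and computes the uncompressed bit rate. It is only an illustration. The helper names and the 8 kHz, 16-bit, mono test signal are assumptions and not the test files used in the thesis.

```python
import io
import wave


def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def describe_wav(file_obj):
    """Read the PCM parameters of a WAV file or file-like object."""
    with wave.open(file_obj, "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate_hz": wav.getframerate(),
            "frames": wav.getnframes(),
        }
    info["bit_rate_bps"] = pcm_bit_rate(
        info["sample_rate_hz"], info["bits_per_sample"], info["channels"]
    )
    return info


def make_silent_wav(sample_rate_hz=8000, seconds=1):
    """Create an in-memory 16-bit mono WAV file containing silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * sample_rate_hz * seconds)
    buffer.seek(0)
    return buffer
```

For example, 8 kHz mono audio at 16 bits per sample amounts to 8,000 × 16 × 1 = 128,000 bps before compression, which is the baseline a speech codec has to reduce.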

2.2.2 Audio Coding Algorithms

Data compression converts an input data stream into another data stream
which is smaller than the original. This is very useful for transmissions when the network bandwidth is limited, especially for real-time audio
or video streaming, which may require considerable bandwidth. There exist
several types of data compression algorithms, such as methods for text compression, image compression, simple dictionary compression, video compression and
audio compression. Here, we consider only audio compression. Many types of codecs have been developed for audio encoding, such
as µ-Law and A-Law companding, ADPCM, MLP, speech compression, FLAC and
Monkey’s Audio. We introduce only a few commonly used ones.
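
As a small worked example of one of the techniques named above, the following Python sketch implements the continuous µ-law companding curve, F(x) = sgn(x) ln(1 + µ|x|) / ln(1 + µ) with µ = 255, together with its inverse and a uniform quantizer. It is a sketch for illustration only: practical telephony codecs use a piecewise-linear approximation of this curve, and the function names and the 8-bit quantizer are assumptions.

```python
import math

MU = 255.0


def mu_law_compress(x, mu=MU):
    """Map a sample x in [-1, 1] onto the companded range [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def quantize(value, bits=8):
    """Uniform quantizer on [-1, 1] with 2**(bits - 1) - 1 positive levels."""
    levels = 2 ** (bits - 1) - 1
    return round(value * levels) / levels


def companded_error(x, bits=8):
    """Reconstruction error when quantizing in the companded domain."""
    return abs(mu_law_expand(quantize(mu_law_compress(x), bits)) - x)


def direct_error(x, bits=8):
    """Reconstruction error when quantizing the sample directly."""
    return abs(quantize(x, bits) - x)
```

Because the curve expands small amplitudes before quantization, a quiet sample such as x = 0.01 is reconstructed far more accurately with companding than with direct 8-bit uniform quantization, which is why companding suits speech.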
Two important distinguishing characteristics of audio compression algorithms are:
• Whether the compression is lossy or lossless; and
• Whether the encoding and decoding complexities are symmetric or not,
i.e., how fast the decompression is.
There exist both lossy and lossless algorithms for audio compression. Audio
is often stored in compressed form, then decompressed in real time and
played back to listeners. Thus, most audio compression methods are asymmetric:
the encoder can be slow, but the decoder must be fast. As there are many kinds
of compression algorithms, we introduce only a few of them in the following
sections.

Speech Compression
Some audio codecs are specifically designed for speech signals. Because human
speech has many properties that can be exploited, such signals can be
compressed efficiently. There exists considerable research on this topic, such as the
codecs introduced in the book by Jayant et al. [17].
There are three main types of speech codecs. Waveform speech codecs produce
good to excellent speech quality after compression and decompression, but
generate bit rates of 10 to 64 kbps. Source codecs (vocoders) generally produce
poor to fair speech quality, but can reduce the bit rate to a very low level,
for example 2 kbps. Hybrid codecs combine these two methods and can generate
fair to good speech quality at bit rates between 2 and 16 kbps. Figure 2.2
shows the qualitative speech quality versus the bit rate of these three codec
types.

Figure 2.2: Speech quality versus the bit rate for speech codec types [33].

Waveform codecs. This codec type is not concerned with how
the original sound was generated, but tries to produce a decompressed
audio signal that matches the original signal as closely as possible. It is not
designed for speech specifically and can be used for other kinds of audio
data. The simplest waveform encoder is pulse code modulation (PCM).
Enhanced versions are the differential PCM and ADPCM encoders. Waveform coders may also operate in the frequency domain.
Source codecs. In general, a source encoder uses a mathematical model of the
source of the data. The model depends on certain parameters, which are
estimated from the input data. After the parameters are computed, they
are written into the compressed stream. The decoder uses the parameters
and the mathematical model to rebuild the original data. If the original
data is audio, the source coder is also called a vocoder.
Hybrid codecs. This kind of speech codec combines both of the previously described codec types. The most popular hybrid codecs are Analysis-by-Synthesis
(AbS) time-domain algorithms. An AbS encoder starts with a set of speech
samples (a frame), encodes the samples in a similar way to an LPC (Linear
Predictive Coder) [29], decodes them, and subtracts the decoded samples
from the original ones. The differences are sent through an error minimization process that outputs improved encoding samples. These samples are
again decoded and subtracted from the original samples, and new differences
are computed. This process is repeated until the differences satisfy a termination condition. The encoder then proceeds to the next set of speech
samples (i.e., the next frame) [33].
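
The encode, decode, subtract and minimize cycle described for hybrid codecs can be sketched as follows. This toy Python example replaces the iterative refinement in the description with an exhaustive search over a small excitation codebook and a few gains, which is the form of analysis-by-synthesis used in CELP-style coders. The codebook, the gains, the one-tap synthesis filter and all function names are hypothetical and chosen only for illustration.

```python
def synthesize(excitation, coeffs):
    """All-pole synthesis filter: y[n] = e[n] + sum_k coeffs[k] * y[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        y = e
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                y += a * out[n - 1 - k]
        out.append(y)
    return out


def squared_error(a, b):
    """Sum of squared differences between two equal-length frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def abs_search(frame, codebook, gains, coeffs):
    """Decode every (codebook entry, gain) pair and keep the closest match.

    Returns (error, codebook_index, gain) for the best candidate.
    """
    best = None
    for index, vector in enumerate(codebook):
        for gain in gains:
            candidate = synthesize([gain * v for v in vector], coeffs)
            error = squared_error(frame, candidate)
            if best is None or error < best[0]:
                best = (error, index, gain)
    return best
```

The encoder would transmit only the winning index and gain (plus the filter coefficients), and the decoder would call synthesize with the same values to rebuild the frame.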


We will now describe two modern, state-of-the-art codecs, which we will be
using in our experiments, namely Speex and SILK.

2.2.3 The Speex Codec

The Speex codec is open-source and free from patent royalties. It is designed
for packet networks and Voice-over-IP (VoIP) applications, as well as file-based
compression. The Speex codec is quite flexible: many parameters, such as the
bit rate, can be selected. It is also quite robust to packet
losses. This property is based on the assumption that in VoIP applications
packets either arrive late or are lost, but are not corrupted. Below is a list of parameters
that can be adjusted during encoding and decoding for the Speex codec [39]:
• Sampling rate. The sampling rate is expressed in Hertz (Hz). It indicates
the number of samples taken from a signal per second. Speex is mainly
designed for three different sampling rates: 8 kHz, 16 kHz, and 32 kHz.
These sampling rates are respectively referred to as narrowband, wideband
and ultra-wideband.
• Bit rate. The bit rate is the number of bits used to encode each second of
the speech signal. It is measured in bits per second (bps). When the speech
signal is encoded in narrowband mode, the bit rate can be set from 2.15 kbps
to 24.6 kbps; when the speech signal is encoded in wideband mode, the bit
rate can be set from 4 kbps to 44.2 kbps.
• Quality. Speex is a lossy codec. It achieves compression at the expense of
the fidelity of the input speech signal. It is possible to control the tradeoff
made between quality and the bit rate. In the Speex encoding process, the
quality parameter can be set from 0 to 10.
• Complexity. With Speex, it is possible to change the complexity parameter of the encoder. The complexity can be set from 1 to 10. For
normal use, the noise level at complexity 10 is between 1 and 2 dB lower
than at complexity 1, but the CPU requirements for complexity 10 are about
5 times higher than for complexity 1. Hence, in practice, the best tradeoff
is a setting between 2 and 4.
There also exist other parameters, such as discontinuous transmission (DTX),
which can be changed when encoding a speech signal. We currently consider only
the most commonly used parameters listed above in our system.

2.2.4 The SILK Codec

The SILK codec is preferred for Skype-to-Skype calls. It is a speech codec for
real-time, packet-based voice communications. It provides scalability in several
dimensions. It supports four different sampling frequencies for encoding the
audio input signal. It can adapt to the network characteristics through the
control of the bit rate, the packet rate, the packet loss resilience and the use of
DTX. The SILK codec also allows several complexity levels which can be changed
to let it take advantage of the available processing power without relying on it.
All of these properties can be adjusted while the codec is processing data.
The SILK codec consists of an encoder and a decoder [40]. For the encoder,
there exist a number of parameters that can be changed to control the encoding
operation.
• Sampling rate. SILK can select one of four modes during call setup:
– Narrowband (NB): 8 kHz sampling rate;
– Mediumband (MB): 8 or 12 kHz sampling rate;
– Wideband (WB): 8, 12 or 16 kHz sampling rate; and
– Super Wideband (SWB): 8, 12, 16 or 24 kHz sampling rate.
The purpose of the modes is to allow the decoder to utilize the highest
sampling rate used by the encoder.
• Packet rate. SILK encodes frames of 20 milliseconds each. It can combine
1, 2, 3, 4 or 5 frames in one payload, so each packet corresponds to 20, 40,
60, 80 or 100 milliseconds of audio data. Sending fewer packets per second
reduces the bit rate, but it increases the latency and the sensitivity to
packet losses, since each lost packet then carries a larger fraction of the audio
information. In our system we encode one frame into each packet.
• Bit rate. The bit rate can be set in the range from 6 to 40 kbps. A
higher bit rate can improve the audio quality by lowering the amount of
quantization noise in the decoded signal. For the narrowband mode, the
bit rate can be set in the range from 6 kbps to 20 kbps, while for the
wideband mode, it can be set between 8 kbps and 30 kbps.
• Complexity. SILK has three complexity levels which can be chosen. A
low level can reduce the CPU load by a few times at the cost of increasing
the bit rate by a few percentage points. The three complexity levels are
high (2), medium (1) and low (0).
• DTX. The DTX function can reduce the bit rate during silence or background noise. For our tests it is disabled.
On the decoder side, each received packet is split into the frames
it contains. Each of the frames contains the necessary information
to reconstruct a 20 ms frame of the original input signal.
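
The packet rate tradeoff described above can be checked with some simple arithmetic. The following Python sketch computes, for a given codec bit rate and number of 20 ms frames per packet, the packet duration, the packets sent per second, and the total network bit rate including headers. It assumes RTP over UDP over IPv4 without header options (12 + 8 + 20 = 40 header bytes per packet). The function name and the example bit rate are hypothetical.

```python
FRAME_MS = 20                # SILK frame duration stated in the text
HEADER_BYTES = 20 + 8 + 12   # assumed IPv4 + UDP + RTP headers, no options


def packet_stats(bit_rate_bps, frames_per_packet):
    """Per-packet figures for a codec bit rate and packetization choice."""
    if frames_per_packet not in (1, 2, 3, 4, 5):
        raise ValueError("SILK combines 1 to 5 frames per packet")
    duration_ms = FRAME_MS * frames_per_packet
    packets_per_second = 1000 / duration_ms
    payload_bytes = bit_rate_bps * duration_ms / 8000
    total_bps = (payload_bytes + HEADER_BYTES) * 8 * packets_per_second
    return {
        "duration_ms": duration_ms,
        "packets_per_second": packets_per_second,
        "payload_bytes": payload_bytes,
        "total_bps": total_bps,
    }
```

Under these assumptions, a 20 kbps stream sent as one frame per packet costs 36,000 bps on the wire, whereas five frames per packet cost 23,200 bps. The saving comes entirely from sending fewer headers, at the price of 100 ms of audio being lost with every dropped packet.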

2.2.5 Summary

As described in this section, there exist many codecs for audio compression,
and we have listed only some of them briefly. Different codecs have different features
which make them suitable for different conditions and audio formats. Speech
compressors can be grouped into three categories, as described earlier.
Waveform speech codecs. They produce good to excellent speech quality,
and the bit rate is between 10 and 64 kbps;
Source codecs. They produce poor to fair speech quality, and the bit rate can
be as low as 2 kbps;
Hybrid codecs. They are a combination of the waveform speech codec and the
source codec. The speech quality varies from fair to good. The bit rate
ranges from 2 to 16 kbps.
For the Speex and the SILK codecs, there are many parameters which can be
changed during the encoding process. For clarity, Table 2.1 summarizes the
parameters of these two codecs.

Parameter       Speex codec                         SILK codec

Bit rate (kbps) 1. Narrowband: 2.15 - 24.6          1. Narrowband: 6 - 20
                2. Wideband: 4 - 44.2               2. Wideband: 8 - 30

Packet rate     Frame size is 20 ms long.           Frame size is 20 ms long.

Quality         0 - 10.                             No such parameter.

Complexity      1 to 10. The noise level differs    0 (low), 1 (medium) and 2 (high).
                by only 1 to 2 dB between           As the difference between these
                complexity 1 and 10, while the      complexity levels is small, the
                CPU requirement at complexity 1     tradeoff is ordinarily set to
                is about 1/5 of that at             complexity 1.
                complexity 10. In practice, the
                tradeoff is 3.

Sampling rate   1. Narrowband: 8 kHz                1. Narrowband: 8 kHz
                2. Wideband: 16 kHz                 2. Mediumband: 8 or 12 kHz
                3. Ultra-wideband: 32 kHz           3. Wideband: 8, 12 or 16 kHz
                                                    4. Super Wideband: 8, 12, 16 or
                                                       24 kHz

Delay           No more than 30 ms.                 Around 30 ms for narrowband mode
                                                    and 34 ms for wideband mode.

Table 2.1: Comparison of the control parameters for the Speex and
the SILK codecs.
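
When sweeping both codecs over a common set of parameters, the ranges in Table 2.1 can be encoded as data and used to filter out invalid combinations. The following Python sketch shows one way to do this. The dictionary layout, the function names and the candidate values passed to parameter_grid are hypothetical; only the numeric limits come from the table.

```python
from itertools import product

# Limits taken from Table 2.1; the structure itself is an illustrative assumption.
CODEC_LIMITS = {
    "speex": {
        "bit_rate_kbps": {"narrowband": (2.15, 24.6), "wideband": (4.0, 44.2)},
        "complexity": (1, 10),
    },
    "silk": {
        "bit_rate_kbps": {"narrowband": (6.0, 20.0), "wideband": (8.0, 30.0)},
        "complexity": (0, 2),
    },
}


def is_valid(codec, mode, bit_rate_kbps, complexity):
    """Check a parameter combination against the ranges of Table 2.1."""
    limits = CODEC_LIMITS[codec]
    low, high = limits["bit_rate_kbps"][mode]
    c_low, c_high = limits["complexity"]
    return low <= bit_rate_kbps <= high and c_low <= complexity <= c_high


def parameter_grid(codec, mode, bit_rates_kbps, complexities):
    """All (bit rate, complexity) pairs that the codec accepts in this mode."""
    return [
        (rate, level)
        for rate, level in product(bit_rates_kbps, complexities)
        if is_valid(codec, mode, rate, level)
    ]
```

For instance, a narrowband bit rate of 24.6 kbps is valid for Speex but falls outside the 6 to 20 kbps range of SILK, so it would be dropped from a SILK test grid.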
