
Scalable VoIP Mobility: Integration and Deployment (Part 8)


Elements of Voice Quality 69
www.newnespress.com
3.2.4 Jitter
Jitter is the variation in delays that the receiver experiences. Jitter is a nuisance that the user
does not hear directly, because the phones employ a jitter buffer to correct for any delays.
Jitter can be defined in a number of ways. One way is to use the standard deviation or
maximum deviation around the mean delay per packet. Another way is to use the known
arrival intervals (such as 20ms), and subtract consecutive delays of packets that were not
lost from the known arrival time, then take the standard deviation or the maximum
deviation. Either way, the jitter, measured in times or percentages against the mean, tells
how variable the network is.
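The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the input shapes (a list of per-packet delays, or a list of sequence-number and arrival-time pairs), and the choice of population standard deviation are not from the text.

```python
import statistics

def jitter_from_delays(delays_ms):
    """Jitter as the deviation of per-packet delay around the mean delay."""
    mean = statistics.mean(delays_ms)
    return {
        "mean_delay_ms": mean,
        "stdev_ms": statistics.pstdev(delays_ms),
        "max_deviation_ms": max(abs(d - mean) for d in delays_ms),
    }

def jitter_from_arrivals(arrivals, interval_ms=20.0):
    """Jitter from arrival times only, given the known sending interval.

    arrivals is a list of (sequence_number, arrival_time_ms) pairs for the
    packets that were not lost. Each gap between consecutive received
    packets is compared with the gap the sender is assumed to have used.
    """
    deviations = []
    for (seq_a, time_a), (seq_b, time_b) in zip(arrivals, arrivals[1:]):
        expected_gap = (seq_b - seq_a) * interval_ms
        deviations.append((time_b - time_a) - expected_gap)
    return {
        "stdev_ms": statistics.pstdev(deviations),
        "max_deviation_ms": max(abs(d) for d in deviations),
    }
```

The second function needs no synchronized clocks between sender and receiver, which is why it is often the more practical of the two.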
Jitter is introduced by variable queuing delays within network equipment. Phones and PBXs
are well known for having very regular transmission intervals. However, the intervening
network may have variable traffic. As queue depths change and network loads
fluctuate, and as contention-based media such as Wi-Fi links become congested with
many devices, packets are forced to wait. Wireless networks are the biggest culprit
for introducing variable delay into an enterprise private network. This is because
wireless packets can be lost and retransmitted, and the time it takes to retransmit
a packet is usually on the order of milliseconds.
A jitter buffer’s job is to sit on the receiver and prevent the jitter from causing an underrun
of the voice decoder. An underrun is an awkward period of silence that happens when the
phone has finished playing the previous packet and needs another packet to play, but one
has not yet arrived. These underruns count as a form of error or loss, even if every packet
does make it to the receiver, and loss concealment will work to disguise them. The problem
with jitter becomes that an underrun must be followed by an increase in delay of the same
amount, assuming no packets are lost. This can be seen by realizing that the delayed packet
will hold up the line for packets behind it.
Here, the value of the jitter buffer can be seen. The jitter buffer lets the receiver build up
a slight delay in the output. If this delay is greater than the amount of actual jitter on the
network, the jitter buffer will be able to smooth things out without underrunning.
In this sense, the jitter buffer converts jitter directly into delay. If the jitter becomes too
large, the jitter buffer may have limited room, and start dropping earlier samples in the
buffer to let the call catch up to be closer to real time. In this way, the jitter buffer can
convert the jitter directly into loss.
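A small simulation can make the trade-off concrete. The sketch below assumes the simplest possible design, a fixed playout delay that never adapts, which is only one of the behaviors a real phone might implement; the function name and the sample delays are hypothetical.

```python
def simulate_fixed_jitter_buffer(network_delays_ms, buffer_ms, interval_ms=20.0):
    """Return the indices of packets that arrive too late to be played.

    Packet i is sent at i * interval_ms and arrives network_delays_ms[i]
    later. Playback of packet 0 starts buffer_ms after it arrives, and
    every later packet is due exactly one interval after the previous one.
    A packet that arrives after its due time causes an underrun.
    """
    first_playout = network_delays_ms[0] + buffer_ms
    late = []
    for i, delay in enumerate(network_delays_ms):
        arrival = i * interval_ms + delay
        due = first_playout + i * interval_ms
        if arrival > due:
            late.append(i)
    return late

# Hypothetical one-way delays in milliseconds, with two spikes.
delays = [40, 45, 70, 42, 95, 41]
for buffer_ms in (0, 30, 60):
    print(buffer_ms, simulate_fixed_jitter_buffer(delays, buffer_ms))
```

With no buffer, every packet slower than the first one is late. A 30 ms buffer absorbs the smaller spike, and a 60 ms buffer absorbs both, at the cost of adding 60 ms to the delay of every packet.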
Because jitter is always converted into delay first, then loss, it does not have a direct impact
on the E-model by itself, but instead can be folded into the other measures. However, the
complication arises because the user or administrator does not usually know the exact
parameters of the jitter buffer. How many samples, how much delay, will the jitter buffer
take before it starts to drop audio? Does the jitter buffer start off with a fixed delay? Does
it build up the delay as jitter forces it to? Or does it try to proactively build in some delay,
which can grow or shrink as the underruns occur? These all have an impact on the E-model
call quality.
As a result, a rule of thumb here is to match the jitter tolerance to the delay tolerance. The
network, at least, should not introduce more than 50ms of jitter.
3.2.5 Non-IP Effects that Should Be Kept in Mind
The E-model makes plenty of room for non-IP effects on voice quality, and we would be
wise to consider them here, even though the previous few sections chose to focus only on
the network effects.
As mentioned earlier, echo is a problem to be tackled whenever calls are being tied together
in conference bridges or are traversing through multiple media gateways. Analog lines
introduce the problem of noise, as well as volume or gain control. Some analog lines may
be tuned softer than others. Most of this requires reasonable end-to-end testing, however.
Then there are the intangibles. Is the network provisioned well enough that calls go through
or are held predictably and reliably? Is the voice mobility network laid out well enough that
users know that every point in the campus is a hot spot, or are some areas weak or dead?
Cellular companies make entire marketing campaigns on the premise of the importance of
coverage and dropped calls. (The number of bars on the phone or people standing behind
the spokesman are both powerful examples of how important the predictability of the call
quality is to callers.) This same concern needs to be applied to voice mobility networks
produced within the enterprise. No amount of modeling will answer how much tolerance
exists, but the general consensus is that voice mobility networks must work better than the
cellular networks, when the callers are in the office. Mobility within the office does not
generally count as a factor that can be used to increase the acceptance of the quality of the
calls, and although mobility is a tremendous driving force to achieve higher productivity
and less frustration, it is the sort of benefit that is hardly noticed until it is gone.
Keep in mind that the codec chosen can make an immediate ten-point difference in the
R-value, in many cases.
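To see what a ten-point change in the R-value means on the more familiar MOS scale, the R-value can be converted with the mapping published in ITU-T G.107. The sketch below assumes that mapping; the function name is illustrative.

```python
def r_to_mos(r):
    """Convert an E-model R-value to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

# A ten-point drop in R near the top of the scale.
print(round(r_to_mos(90), 3), round(r_to_mos(80), 3))
```

Under this mapping, dropping from R = 90 to R = 80 lowers the estimated MOS by roughly 0.3.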
3.3 How to Measure Voice Quality Yourself
The final section in this chapter is concerned with the ways in which administrators of voice
mobility networks can directly ascertain the quality of the network.
3.3.1 The Expensive, Accurate Approach: End-to-End Voice Quality Testers
As mentioned in the discussion of PESQ (Section 3.1.2), existing tools can measure the
quality of the voice network by directly pumping in prerecorded voice samples and
comparing the output. These tools are either expensive or home-grown, and are used to test
large networks as a part of a planning or predeployment phase.
This sort of testing is more of a tuning exercise, and—much like how piano tuning is a rare
and complicated enough exercise that it is not performed frequently—direct end-to-end
testing is not diagnostic. Telephone equipment testing companies do make the sort of
equipment to perform this end-to-end inspection, and these tools can be rented.
Unfortunately, it is very difficult to know where to invest in this sort of heavily proactive
effort.
More likely, the voice quality is measured by having administrators walk around the
network with some number of the phones in question, assuring themselves that whatever
problems they may face will likely be manageable. The problem with both forms of
proactive testing is that they normally occur on only lightly loaded networks, and thus are
not able to measure the effect of network load on voice quality. Network load generally
has the largest impact on voice quality, in fact, partly because voice mobility network managers
do a good job of testing their networks for basic problems before they launch them, and
quickly correct what they find, and partly because voice mobility networks are more likely to be
robust enough out of the box for basic voice connectivity.
3.3.2 Network Specific: Packet Capture Tests
Most of the major packet capture tools, for wireline and for wireless, make modules that are
able to indirectly infer the MOS values using E-model calculations. Sometimes, these work
by tracing the voice setup protocols, such as SIP, and determining what RTP flows map to
phone calls and the properties of the phone calls. Other times, these tools will just look
directly at the RTP streams, and not try to find out what phone numbers the streams map
to. In both cases, the tools then use the sequence number and timestamp fields in the RTP
stream to determine values such as loss, delay, and jitter. Using assumed values for the jitter
buffer, with the option of having the user overwrite them, the tools then model the expected
effect and produce a score.
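As a rough sketch of what such a tool does with those two header fields, the following Python estimates loss from gaps in the sequence numbers and jitter with the running interarrival estimator defined in RFC 3550. The function name and input format are assumptions, sequence-number wraparound is ignored for brevity, and an 8 kHz RTP clock is assumed. Note that absolute one-way delay cannot be recovered from these fields alone without synchronized clocks.

```python
def rtp_stats(packets, clock_rate=8000):
    """Estimate loss and jitter for one RTP stream.

    packets is a list of (sequence_number, rtp_timestamp, arrival_seconds)
    tuples in arrival order, as a capture tool would see them.
    """
    seqs = [seq for seq, _, _ in packets]
    expected = max(seqs) - min(seqs) + 1
    lost = expected - len(set(seqs))

    jitter = 0.0   # running estimate, in RTP timestamp units
    previous = None
    for _, timestamp, arrival in packets:
        arrival_units = arrival * clock_rate
        if previous is not None:
            prev_timestamp, prev_arrival_units = previous
            # Difference between arrival spacing and sending spacing.
            d = (arrival_units - prev_arrival_units) - (timestamp - prev_timestamp)
            jitter += (abs(d) - jitter) / 16.0
        previous = (timestamp, arrival_units)

    return {
        "expected": expected,
        "lost": lost,
        "loss_percent": 100.0 * lost / expected,
        "jitter_ms": 1000.0 * jitter / clock_rate,
    }
```

A tool would then feed numbers like these, together with its assumed jitter buffer parameters, into the E-model to produce a score.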
The major issue with these tools is that they show quality only up to the point where they
are inserted. An easy example of the problem is to look at wireless networks. On a Wi-Fi
network, a packet capture tool may be able to directly determine what packets it sees and
come up with a score. By looking at the Wi-Fi protocol, the tool may do a good job of
inferring whether the mobile phone received the packet from the access point, and at what
time, and may produce a reasonably close call quality number. On the other hand, the
upstream flow is likely to look quite good from the point of view of the test tool, because
there is only one network in between the client and the tool. The entirety of the network
upstream from the client goes missing, and the upstream MOS value can be entirely
misleading.
Some network infrastructure devices are able to do these inferences within themselves, as
they pass the data through. This may be a reasonable thing to do, again depending on the
point of insertion and how well they are able to capture information as late into the network
as possible. It is important, when using all of these tools, for you to consult with the vendor
or maker of the tools to find out where the tools are measuring. For a wireless controller
with voice metric capabilities, for example, make sure that the downstream metrics are
measured on the access point, based on what happened over the air, and not just passing
through the controller. For wireless overlay monitoring, make sure that there is an option to
do a similar capture using a wired mirror port on one of the switches, for cases in which
voice quality might begin to suffer and the network needs direct attention. Overall, do not
rely on just one tool, and believe what the users say—no matter what the tool tells you.
3.3.3 The Device Itself
The most accurate and reasonable way to measure voice quality is from the endpoints
themselves. Some handsets and PBXs offer the ability for the device to produce the
one-way MOS value or R-value for the receive side at the device itself. These numbers are
based entirely on E-model calculations, assuming best-case or known-default scenarios for
the rest of the system, but are likely to be the most accurate. Of course, it is difficult to ask
a user to determine what the voice quality is of a call while on it, especially given that
voice quality is not something a user wants to measure. However, for diagnosing locations
that are having troubles, this tool is valuable for the administrator herself, who is able to
avoid having to guess as to whether the call sounds reasonable, and may be able to detect
variations in the MOS value or R-value.
In the end, keep in mind that the absolute values produced by any of the methods should
be taken with a grain of salt. As time goes on, the administrator of a voice mobility
network should be able to learn what the real quality means for any given value the tool
suggests, even when the tool is placing results half a MOS point too high or too low.
However, the variation of the scores, especially when the network has changed, can be a
valuable tool for pointing the way towards the solution.
CHAPTER 4
Voice Over Ethernet
4.0 Introduction
This chapter introduces the technologies necessary to carry voice over wireline packet
networks. The first half of the chapter is a basic review of the concepts within packet
networks, including IP and Ethernet. The second half takes a look directly at voice over
these networking technologies.

4.1 The IP-Based Voice Network
The previous chapters explored the basics of how calls are set up and voice is carried over
packet-based IP networks. However, the details about what makes the IP network itself
work have not yet been addressed.
Voice started out on analog phone lines. Each pair of copper wires was dedicated to one
specific phone, and to nothing else. This notion of a dedicated circuit has its advantages.
It provides complete isolation of whatever might be going on with that line from the
circumstances and problems of other phones in the network. No amount of calls being
placed on a neighbor’s line can make the original line itself become busy. This isolation
and invariance is necessary for voice networks to function when unexpected circumstances
occur, and ensures that the voice network is reliable in the face of massive fluctuations in
the system. Provisioning is simple, as well, with one line per phone at the edge.
The problem with the concept of the dedicated line is that it is extremely wasteful. When
the phone is not in use, the line stays empty. No other calls can be placed on that line. Even
when a call is in place, the copper wire is fully occupied with carrying the voice traffic, a
small bandwidth application, and a tremendous amount of excess signal capacity exists.
Dedicated wires might make sense for short distances between the phone and some
next-level aggregation equipment, but these dedicated lines were used as trunks between
the aggregators, causing tremendous waste from both idleness and lost bandwidth. But
probably the property that caused the most complications with wireline networking was
that the dedicated line is not robust. If network problems occur—the bundle of cables is cut,
or some intermediate equipment fails and can’t do its job—all lines that are attached along
that path are brought down with it.
©2010 Elsevier Inc. All rights reserved.
doi:10.1016/B978-1-85617-508-1.00001-3.
Digital telephone networks started to eliminate some of the problems inherent to the one-
line dedication of early circuit switching. By having digital processes encode and carry the
voice, more voice calls could be multiplexed onto each line, better using the bandwidth
available on the copper wire. Furthermore, by allowing for hop-by-hop switching with
smarter switches between trunks, failures along one trunk could be accommodated.
However, the network was still circuit-switched. A voice line could be used only for voice.
Even where voice circuits were set aside for data links, the link was either fully in use or
not in use at all. The granularity of the 64kbps audio line, the DS0, became a burden. Running
applications that are not always on and have massive peak throughput but equally meek
average throughput requirements meant that provisioning was always an expensive
proposition: either dedicate enough lines to cover the peak requirement case, and pay for
all of the unused capacity, or cap the capacity offered to the application. Furthermore, these
circuits needed to be considered, managed, and monitored rather separately. The hard
divisions between two circuits became a hard division between applications. Voice networks
were famous for their reliability, strict clockwork operation—and complexity. They were
not for easy-to-set-up, easy-to-move operations. The wires were drawn once and carefully,
and the switches and intermediate equipment were set up by a team of dedicated and expensive
experts who did nothing but voice all day. If you were serious about voice, you operated
your own little phone company, complete with dedicated operators. If not, your only option
was to have the phone company run your phone network for you.
Along came packet-switched networks. Sending small, self-contained messages between
arbitrary endpoints on a network inherently made sense for computers. The idea of sending
a message quickly, without tying up lines or going through cumbersome setup and teardown
operations removed the restrictions on wasted lines. Although it was still true that lines
could remain idle when not being used, the notion of allowing these packets of information
into the line as the fundamental concept, rather than requiring continuous occupation and
streaming, meant that lines that carried aggregated traffic from multiple users and multiple
messages could be used more efficiently. If the messages were short enough, one line might
do. No concerns about running out of lines and having the needed, or only, path to the
receiver blocked. Instead, these messages could just be queued until space was available.
Along with this whole new way of thinking about occupying the resources came a different
way of thinking about addressing and connecting the resources. In the early days, a phone
number used to encode the exact topological location of the extension. Each exchange, or
switch with switchboard operator, had a name and number, and calls were routed from
exchange to exchange based on that number first. Changes to the structure or layout of the
telephone system would require changes to the numbers. Packet-switching technologies
changed that. Lines themselves lost their names and numbers. Instead, those names and
numbers were moved to the equipment that glued the lines together. Every device itself now
had the address. The binding of the addresses to the topology of the network remained, at
some level. Devices could not be given any arbitrary address. Rather, they needed to have
addresses that were similar to their neighbors. The notion of exchange-to-exchange routing
was retained.
This notion, though, proved to be a burden. Changes to the network were quite possible, as
either more devices needed addresses, or more new “exchanges” were added to the network.
Either way, the problem of figuring out how to route messages through the network
remained. The original design had each router know which lines needed to be used to send
the messages along their way. The router might not know how the message should get to
the final destination, but it always knew the next step, and could direct traffic along the
right roads to the next intersection, where the next router took over. As the number of
intersections increased, and the number of devices expanded, the complexity of maintaining
these routing tables exploded. A way was needed for neighboring routers to find out about
each other, and more importantly, to find out about what end devices they knew routes to.
Thus, the routing protocol was born. These protocols spoke from router to router,
exchanging information on a regular basis, ensuring that routers always had recent
information on what destinations were valid and how to get there from here. But another
thing happened. This idea of exchanging the routes had another benefit, in that it allowed
the network itself to be restructured, or to fail in spots, and yet still be able to send traffic.
Routers did not need to know the entire path to the destination, only the next hop. If a
router knew two different next hops for the same message, and one of the routes went
down, the router could try the second one. If the router lost all of its paths to a particular
set of destinations, the router before it could learn about that, and avoid using that path to
get the messages through. If there was a way to get the message there, the network would
find it, through the process of convergence, or agreement over time on the consistency of
whether and how messages could be sent. The network became resilient, and point failures
would not stop traffic from flowing.
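The next-hop idea can be illustrated with a toy routing table. This is a deliberately simplified sketch: the class, its method names, and the router names are hypothetical, and a real routing protocol learns and withdraws routes by exchanging messages rather than by direct method calls.

```python
class Router:
    """Toy router that knows only next hops, never whole paths."""

    def __init__(self, name):
        self.name = name
        self.routes = {}          # destination -> next hops, in preference order
        self.failed_hops = set()  # next hops currently known to be unreachable

    def add_route(self, destination, next_hop):
        self.routes.setdefault(destination, []).append(next_hop)

    def link_down(self, next_hop):
        self.failed_hops.add(next_hop)

    def next_hop(self, destination):
        """Return the first usable next hop, or None if none is left."""
        for hop in self.routes.get(destination, []):
            if hop not in self.failed_hops:
                return hop
        return None

router = Router("A")
router.add_route("10.1.0.0/16", "B")
router.add_route("10.1.0.0/16", "C")
print(router.next_hop("10.1.0.0/16"))  # B is preferred
router.link_down("B")
print(router.next_hop("10.1.0.0/16"))  # falls back to C
```

When `next_hop` returns `None`, the router has lost all of its paths to that destination. In a real network it would tell its neighbors so, and they would stop sending that traffic its way.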
This is the story of the Internet, and of all the protocols that make it work. Clearly, the story
is simplified (and perhaps romanticized to highlight the point at hand), but the fundamentals
are there. Circuit switching is difficult to manage, because it is incredibly wasteful and
inflexible. Packet switching is much simpler to manage, and can recover from failures.
The Internet grew up on top of the lines offered by the circuit-switched technologies, but
used a better way to dedicate the resources. It wasn’t long before someone realized that
voice itself could be put over these packet-switched lines. At first, that might sound
wasteful, as using a digital line to carry a packet containing voice can never be more
efficient than using that line to carry the same bits of voice directly because of the packet
overhead. But packet networking technologies matured, and the throughputs offered on
simple point-to-point links grew much faster than did the corresponding uses of the same
copper line for digital voice—at least, in the enterprise. And the advantages of using a
multipurpose technology allowed these voice over IP pioneers to use the network’s
flexibility and lack of dedication to one purpose to add to the voice over IP offerings
quickly, without requiring retooling of physical wires. The ways in which provisioning was
thought about changed, and the idea that voice and data networks can perhaps use the same
resources became a compelling reason to try to save deployment and management costs.
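The packet overhead mentioned above is easy to quantify. The following sketch assumes a 64kbps codec packetized every 20ms (160 bytes of voice per packet), carried in RTP over UDP over IPv4 on untagged Ethernet. Those choices, and the function name, are assumptions for illustration rather than figures from the text.

```python
def voip_wire_rate(payload_bytes=160, packets_per_second=50):
    """Compare the voice payload rate with the rate consumed on the wire."""
    rtp, udp, ipv4 = 12, 8, 20         # header sizes in bytes, no options
    ethernet_header_and_fcs = 14 + 4   # addresses, Ethertype, checksum
    preamble_and_gap = 8 + 12          # line time used around every frame
    on_wire = (payload_bytes + rtp + udp + ipv4
               + ethernet_header_and_fcs + preamble_and_gap)
    return {
        "payload_kbps": payload_bytes * 8 * packets_per_second / 1000,
        "wire_kbps": on_wire * 8 * packets_per_second / 1000,
        "efficiency": payload_bytes / on_wire,
    }

print(voip_wire_rate())
```

Under these assumptions, a 64kbps voice stream occupies about 95kbps of Ethernet line capacity in each direction, so roughly a third of the capacity used goes to packet overhead.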
There are a tremendous number of resources available for understanding the intricacies of
how IP networks operate, including details on how to manage routing protocols and large
trunk lines. Here, we will explore how voice fits into the packet-based IP network.
4.1.1 Wireline Networking Technologies and Packetization
The wireline networking technologies range from the most basic definition of how electrical
signals are encoded over the copper line to the higher-level ways that computer software
endpoints ensure that messages do not flood the network.

4.1.1.1 Ethernet
Nearly all wireline voice mobility networks in the enterprise start with Ethernet. Ethernet
is a family of related networking technologies that establish how two machines that are
physically connected can talk to each other. Ethernet was designed to be as simple to deploy
as possible, so that it can be set up as an unmanaged network, where physically connecting
two endpoints together, somehow, through the network is enough to allow them to find each
other and communicate. (Note that this doesn’t mean that higher-level protocols will work
on this network without effort—just Ethernet itself.)
All of the Ethernet protocols belong to the IEEE 802.3 series and are based on the idea of
encoding frames. A frame is a well-defined packet message, with a source, a destination,
a length, and a type. The logical format of the Ethernet frame is shown in Table 4.1.
Table 4.1: Ethernet Frame Format
Field:  Destination  Source   Ethertype  Frame Body  FCS
Size:   6 bytes      6 bytes  2 bytes    n bytes     4 bytes
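The layout in Table 4.1 maps directly onto a few lines of code. The sketch below splits a raw frame, as it would appear without the preamble, into the five fields; the function name and the returned dictionary keys are illustrative.

```python
import struct

def parse_ethernet_frame(frame: bytes):
    """Split a raw Ethernet frame into the fields of Table 4.1."""
    if len(frame) < 18:
        raise ValueError("too short to hold the header and FCS")
    # Network byte order: 6-byte destination, 6-byte source, 2-byte type.
    destination, source, ethertype = struct.unpack("!6s6sH", frame[:14])

    def as_text(address):
        return ":".join(f"{byte:02x}" for byte in address)

    return {
        "destination": as_text(destination),
        "source": as_text(source),
        "ethertype": ethertype,
        "body": frame[14:-4],
        "fcs": frame[-4:],
    }
```

An Ethertype of 0x0800, for example, indicates that the frame body carries an IPv4 packet.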
In Ethernet, links are anonymous. Endpoints, however—the line cards that the Ethernet
cables plug into—are given addresses. These addresses are assigned at the time the device
is built, and are permanently associated with the device. The Ethernet address is a 48-bit
(6-byte) address, as shown in Table 4.2. The first three bytes, or 24 bits, are called the
Organizationally Unique Identifier (OUI). Each manufacturer of Ethernet equipment is
assigned one or more of these OUIs by the Institute of Electrical and Electronics Engineers
(IEEE) Registration Authority. The manufacturer chooses the second 24 bits from a unique
pool, often in order starting from 00:00:01. Together, the scheme guarantees that this
address will never be accidentally taken by another device.
Ethernet also defines two special flags in the address. The L bit specifies a local address,
which is dynamic and invented by a device for temporary usage. This has an application
in Wi-Fi (see Chapter 5), but is otherwise not common. The G bit is for group-addressed
frames, either broadcast or multicast. A group-addressed frame is meant to go out to
multiple devices at once, for all of them to receive. Multicast transmissions use this
mechanism. The special group address FF:FF:FF:FF:FF:FF (all 1s) is the broadcast address,
and specifically requests to go to every device, whether they are in a multicast group or not.
Table 4.2: The Ethernet Address Format
Field:  OUI    Manufacturer-Defined
Bit:    0–23   24–47
Flag:   L      G
Bit:    6      7
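The address fields and flags can be pulled apart as follows. This sketch assumes the table numbers bits from the most significant bit of the first byte, so that bit 6 (L) corresponds to the mask 0x02 and bit 7 (G) to the mask 0x01 of that byte. The function name and the example addresses are illustrative.

```python
def describe_mac(address: str):
    """Break an Ethernet address into its OUI, device part, and flags."""
    octets = bytes.fromhex(address.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError("an Ethernet address has exactly 6 bytes")

    def as_text(part):
        return ":".join(f"{byte:02x}" for byte in part)

    first = octets[0]
    return {
        "oui": as_text(octets[:3]),
        "manufacturer_defined": as_text(octets[3:]),
        "local": bool(first & 0x02),   # L bit: locally assigned address
        "group": bool(first & 0x01),   # G bit: multicast or broadcast
        "broadcast": octets == b"\xff" * 6,
    }

print(describe_mac("00:1b:63:00:00:01"))
print(describe_mac("ff:ff:ff:ff:ff:ff"))
```

The broadcast address has both flags set, since every bit in it is a one.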
This is one way by which Ethernet guarantees that it does not require management to add or
remove devices from the network. When a device wants to transmit over a wire to another
device, it has no way of knowing if that second device is there. Ethernet was intentionally
designed to be as simple as possible, so senders have to transmit and hope that the other
device is there. When the sender creates a frame, it places the destination Ethernet address
first in the frame, followed by its own address. Then comes the type of the frame, used to
figure out what network protocol is running on top of Ethernet. An arbitrary frame body
follows, subject to size restrictions: the body of the frame usually cannot be greater than 1500
bytes, and cannot be less than 46 bytes, so that the complete frame is at least 64 bytes.
(Shorter bodies must be padded.) Finally,
Ethernet provides a way to determine whether noise on the Ethernet line causes any bit
errors, by using a frame check sequence (FCS), a mathematical checksum of the bits in the
frame that will generally not match the contents of a frame if there are any errors. Ethernet
uses a CRC-32 checksum.
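The sender's steps can be sketched with Python's standard library, whose `zlib.crc32` uses the same CRC-32 polynomial as Ethernet. The helper names are hypothetical, the FCS is assumed to be appended least significant byte first, and the 46-byte minimum body (which yields a 64-byte minimum frame) is stated here as an assumption of the sketch.

```python
import struct
import zlib

MIN_BODY = 46     # assumed minimum body, giving a 64-byte minimum frame
MAX_BODY = 1500   # usual maximum body size

def build_frame(destination: bytes, source: bytes, ethertype: int, body: bytes):
    """Assemble a frame: addresses, type, padded body, then the FCS."""
    if len(body) > MAX_BODY:
        raise ValueError("frame body too large")
    body = body.ljust(MIN_BODY, b"\x00")   # pad short bodies with zeros
    covered = destination + source + struct.pack("!H", ethertype) + body
    fcs = struct.pack("<I", zlib.crc32(covered))
    return covered + fcs

def fcs_matches(frame: bytes):
    """Recompute the checksum as a receiver would and compare."""
    return struct.pack("<I", zlib.crc32(frame[:-4])) == frame[-4:]

frame = build_frame(b"\xff" * 6, bytes.fromhex("001122334455"), 0x0800, b"hi")
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip a single bit
print(len(frame), fcs_matches(frame), fcs_matches(damaged))
```

A receiver that finds a mismatch simply discards the frame. Ethernet itself makes no attempt to repair or retransmit it.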
Ethernet itself is a serial protocol, much like serial lines used to connect modems together,
but operating with much more sophistication and at a faster rate. Most Ethernet types today
fall into two categories: copper and fiber. The commercially available copper Ethernet
technologies all use a modified version of a telephone cable, made out of copper wires.
Each cable carries eight small, insulated copper wires, twisted into pairs as is done for
analog telephone lines. The plastic connectors at each end also look like telephone
connectors, but have eight pins, rather than the usual six. These connectors, often referred
to as RJ45, a specification in which the connectors figure prominently, snap into the
corresponding sockets on all Ethernet devices. Differing numbers of the pairs within the
four-pair cable may be used for different Ethernet technologies.
The first RJ45-based Ethernet is called 10BASE-T, or simply original Ethernet. Devices that
support 10BASE-T run at 10Mbps, across just two of the pairs within the cable, one for
reception, and one for transmission. (The other pairs are not used for data.) These Ethernet
lines run a serial protocol, where the voltage on the line is flipped to signal a one or a zero
in the bits used to encode the frame. However, these serial lines do not constantly transmit.
Instead, the line is usually idle. But when a device wants to transmit on the line, it simply
starts transmitting. The transmission itself is the frame, just described. Before the frame
itself is sent, a few bits are prepended to it. These bits, known as the preamble, are used to
alert the device at the other end that the transmission is going to begin. The preamble is a
64-bit sequence of alternating ones and zeros, except for the last two bits, which are both
ones. The receiving device detects that a transmission comes in, by looking for the sharp
swings in voltage in the line from idle, representing the preamble’s bits. By the time the
preamble is done, the receiver will have figured out the timing of the bit patterns, in case
the receiver’s clock is slightly off from the sender’s. The full bits of the frame proper come
in, including the checksum. At the end of the transmission, the sender and receiver have
to wait for a few microseconds, and then the line becomes idle and ready to be transmitted
on again.
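The preamble pattern described above, and the way a receiver uses its final two ones to find the start of the frame, can be sketched with bit strings. The function names are illustrative, and a real receiver works on line voltages rather than on characters.

```python
def preamble_bits():
    """64 bits: alternating ones and zeros, ending in two ones."""
    return "10" * 31 + "11"

def frame_start(bit_stream: str):
    """Return the index of the first frame bit after the preamble.

    The receiver may miss the first few preamble bits while it locks on,
    so it looks only for the first pair of consecutive ones.
    """
    marker = bit_stream.find("11")
    if marker < 0:
        raise ValueError("no start-of-frame marker found")
    return marker + 2

stream = preamble_bits() + "0110"      # hypothetical first frame bits
late_lock = stream[5:]                 # receiver missed five preamble bits
print(frame_start(stream), late_lock[frame_start(late_lock):])
```

At 10Mbps, the 64 preamble bits occupy 6.4 microseconds of line time before any frame data is sent.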
Given that 10BASE-T is a point-to-point physical system, as there can only be one
transmitter on one twisted pair, and the other transmitter on the second, there needed to be
some way to interconnect multiple lines and thus multiple devices together. The solution to
that is the Ethernet hub. The hub works by connecting the twisted pair that is used by a
device to transmit, to every link’s twisted pair used to receive. This connection allows the
transmission by one device to reach all of the others on the same segment, or other devices
attached to the same hub. Hubs are purely electrical, and do not participate in the network
itself. When a device transmits on an Ethernet hub, every device on that hub hears the
signal. A receiver knows that the frame is for it by looking at the destination Ethernet
address. If the address matches, then the frame is kept; otherwise, it is discarded unless the
operating system on that device requests to receive all frames on the line. The use of hubs,
and the definitions for 10BASE-T, require that the transmissions are all half-duplex,
meaning that a reception and transmission cannot occur independently.
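The hub's repeat-to-everyone behavior, and the filtering done by each receiver, can be modeled in a few lines. The class and attribute names below are hypothetical, and the model ignores timing and collisions entirely.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

class Hub:
    """Repeats every transmission to all other attached devices."""

    def __init__(self):
        self.devices = []

    def attach(self, device):
        self.devices.append(device)
        device.hub = self

    def repeat(self, sender, destination, payload):
        for device in self.devices:
            if device is not sender:
                device.on_signal(destination, payload)

class Device:
    """Keeps frames addressed to it, or all frames if promiscuous."""

    def __init__(self, address, promiscuous=False):
        self.address = address
        self.promiscuous = promiscuous
        self.received = []
        self.hub = None

    def send(self, destination, payload):
        self.hub.repeat(self, destination, payload)

    def on_signal(self, destination, payload):
        if self.promiscuous or destination in (self.address, BROADCAST):
            self.received.append(payload)
```

Attaching three ordinary devices and one promiscuous device, then sending a frame from the first device to the second, leaves the payload in the second device's and the promiscuous device's `received` lists only. The third ordinary device hears the signal but discards the frame.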
Adding multiple devices together on an Ethernet link introduces a problem. Two or more
devices are capable of transmitting at the same time. If two devices do transmit at the same
time, their signals will mix on the wire, and all of the receivers will receive the garbage
created by the interference. Thankfully, there is a solution to avoid this. The overall concept
is known by the unwieldy phrase Carrier Sense Multiple Access with Collision Detection
(CSMA/CD). Let’s break that phrase apart, starting from the end. The collision detection
