Implementing Service Quality in IP Networks (P7)

7
Measurements
NOIAYTON
(Know thyself; Inscription on the fa¸cade of Apollo’s temple in Delphi)
The traffic engineering process for IP networks includes obtaining
feedback from the network as the basis of assessing the need
to modify transport network parameters and to optimize its
behaviour. Obtaining information in precise and meaningful form
is imperative for being able to make the right adjustments to
improve the network performance. This chapter deals with the
means of monitoring service quality, the analysis methods applied
to measurement data, and examples of network performance
optimization based on processed measurement information.
A formulation of the goals and methods of traffic engineering
measurements in IP networks has been given in a recent Inter-
net draft [LCT + 02]. This draft will be used as an introduction
to the topic in this chapter, quoting other sources and adding
issues specific to multi-service networks as necessary. In the draft,
ISPs are singled out as one likely user for the methods presented.
The scope of the conceptual framework development therein is
intra-domain operations, but the definitions are intended to be
transferable also across operator domain boundaries. As such,
they should be applicable to different per-domain technologies as
well. The need for consistency, precision and effectiveness of traffic
engineering methods is cited as the reason for applying an
overarching framework for traffic engineering. The ultimate goal of
measurements is to serve the needs of the traffic engineering
process, including forecasting, planning, dimensioning, control, and
performance monitoring.
Implementing Service Quality in IP Networks. Vilho Räisänen.
© 2003 John Wiley & Sons, Ltd. ISBN: 0-470-84793-X
The major tasks of traffic engineering measurements are defined
in [LCT + 02] as
• traffic characterization;
• network monitoring;
• traffic control.
These will be discussed in more detail later in this chapter. Let
us note in passing again that in multi-service networks, the char-
acterization, monitoring, and traffic control tasks may need to be
applied to multiple traffic aggregates on individual links.
Three approximate timescales pertaining to the use of data ob-
tained with measurements are identified in [LCT + 02], namely:
• Months or longer. This timescale relates to network planning and
upgrading. Forecast traffic volumes per service class are impor-
tant here.
• Hours to days. Capacity management is the primary use of mea-
surement data at these timescales. Measurement data could be
used to control default routing of traffic aggregates and resource
allocation in network nodes.
• Minutes or less. This timescale belongs to real-time control of the
network. In [LCT + 02], the example of temporary rerouting of
traffic aggregates to circumvent congestion is cited.
As we shall see later, measurements dealing with different
timescales have different requirements and analyses associated
with them. Some general requirements for measurements are:
• Accuracy in capturing important phenomena at different timescales.
The shortest timescale that is relevant to network performance
for the purpose at hand must be known, and the
measurement methods should be chosen accordingly. In a
multi-service network, the required accuracy may be different for
different traffic aggregates. For example, millisecond-accuracy
measurements for delay may be desirable for VoIP, but not
necessary for best-effort traffic.
• Network performance should not be degraded by measurements. This
applies both to the load placed on the elements being measured and
to the effect that performing measurements and transferring
measurement data has on the normal operation of a production
network. Interestingly, this is an issue not only for active
measurements but for passive ones as well.
• The amount of data generated should be moderate.
Regarding the last two points, these issues are typically more
challenging in a multi-service network than in a best-effort
network, since there may be multiple quality support aggregates
involved for each link. Consequently, both performing the actual
measurements and transmitting measurement data in such a way
as not to interfere with the normal network operation need to be
carefully planned.
Next, we shall take a look at the three tasks of traffic engineering
measurements as defined by our IETF framework [LCT + 02],
and then will discuss the definition of measured characteristics,
sources of information, measurement methods, and the required
measurement infrastructure in general. The present chapter will
be concluded with case studies.
7.1 TRAFFIC CHARACTERIZATION
Traffic characterization is the first task of traffic engineering
measurements as defined by [LCT + 02], having the following goals:
• Identifying variations in traffic patterns using statistical analysis,
including development of traffic profiles to capture daily,
weekly, or seasonal variations.
• Determining traffic distributions in the network on the basis of
flows, interfaces, links, nodes, node-pairs, paths, or destinations.
• Estimation of the traffic load according to service classes in
different routers and the network.
• Observing trends for traffic growth and forecasting of traffic
demands.
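As an illustration of the first and third goals above, a traffic
profile per service class can be sketched as follows. This is a minimal
sketch and is not taken from [LCT + 02]: the function name
hourly_profile, the sample format, and the class labels EF and BE are
assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(samples):
    """Return the mean load per (service class, hour of day).

    samples: iterable of (service_class, hour_of_day, load) tuples,
    where hour_of_day is 0..23 and load is in any consistent unit.
    """
    buckets = defaultdict(list)
    for service_class, hour, load in samples:
        buckets[(service_class, hour)].append(load)
    return {key: mean(loads) for key, loads in buckets.items()}


# Hypothetical load samples (Mbit/s) collected on two days.
samples = [
    ("EF", 9, 40.0), ("EF", 9, 60.0),
    ("EF", 21, 10.0), ("EF", 21, 20.0),
    ("BE", 9, 100.0), ("BE", 9, 140.0),
]
profile = hourly_profile(samples)

# Busy hour of the EF class according to the profile.
ef_hours = {hour: load for (cls, hour), load in profile.items() if cls == "EF"}
ef_busy_hour = max(ef_hours, key=ef_hours.get)
```

Weekly or seasonal profiles could be obtained in the same way by
replacing the hour of day with another time key.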
The determination of traffic distributions in the network is partly
related to the estimation of traffic matrix discussed in Chapter 4.
In other words, direct measurement can be used to obtain the
topological distribution of traffic in the network. However, per-
link volumes need to be linked to traffic aggregates entering and
exiting the network domain in order to influence the distribution
with routing control.
For a best-effort IP network domain, traffic pattern variations
may relate to changes in the composition of protocol types in the
totality of traffic, as well as information about traffic volumes in
topological context. In multi-service networks, such information
should be available per service quality support aggregate.
In a multi-service network with service mapping onto
service quality support aggregates on the network edge, traffic
engineering benefits from the ability to compare characterizations
of both incoming service types and service quality support
aggregates, side by side. Such a comparison makes it possible to
effectively evaluate the suitability of both the service/aggregate
mapping at the network edge, and the service quality support for
aggregates in the network core (see Figure 7.1).
Depending on the measurement methods, discussed in more
detail below, modelling of data may be needed to interpret
the results in a context relevant for traffic engineering. For
modelling, the alternatives are fitting of measurement results
into an existing model and providing generic modelling for
measurement results without reference to the use context. In
certain situations modelling has a risk associated with it, since
it makes assumptions about what the results should look like.
Thus, when modelling is used, consistency checks should be
constructed to check that the situation in the measured network
is consistent with the assumptions made in the model. Another
useful technique is computation of the same characteristics using
multiple different methods.
[Figure 7.1: Modelling both incoming services at the network edge and
loads of service quality support bearers in the network core. Labels:
ER; Service distribution; Traffic load for service support aggregates.]
According to [LCT + 02], Internet traffic is bandwidth-limited
but non-stationary; traffic can be heavy-tailed and have strong
correlations at short timescales. This is often the case in best-
effort Internet with no per-flow policing. The suggestion of the
Internet draft is to use decomposition of measurement results
into stationary and trend parts. Obviously the stationary part
also needs to account for diurnal variations in traffic intensity.
Regarding the scope of this book, this breakdown should be
possible per service quality support aggregate. A more fine-
grained temporal classification of the problem area of traffic
decomposition could be as follows:
• Trend prediction (days or longer periods of time). This timescale
is relevant for capacity management in traffic engineering.
• Busy-hour traffic characterization.
• Statistics and correlation at the timescale of seconds.
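The decomposition of a measured load series into a trend part and a
remaining part can be illustrated with a simple moving average. The
moving average is chosen here only for illustration and is not taken
from the cited draft; the function names and the sample values are
assumptions. In practice the averaging window would need to span whole
days so that diurnal variation stays in the non-trend part.

```python
def moving_average(series, window):
    """Trailing moving average.

    For the first window-1 points, only the samples available so far
    are averaged.
    """
    averaged = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1):i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


def decompose(series, window):
    """Split a series into (trend, remainder) with series = trend + remainder."""
    trend = moving_average(series, window)
    remainder = [x - t for x, t in zip(series, trend)]
    return trend, remainder


# Hypothetical load samples for one service quality support aggregate.
series = [2.0, 4.0, 6.0, 8.0]
trend, remainder = decompose(series, window=2)
```

The same computation would be repeated per service quality support
aggregate in a multi-service network.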
The reported measurement data should be associated with
information on the scale of applicability, partly stemming from
the details of the actual measurement. This requirement is valid
irrespective of the possible use of modelling as a part of the
measurement. As an example used in the cited draft, modelling of
flow burstiness can be performed by fitting measurement results
to a token bucket model, the outcome being the probability that a
flow can be accommodated by the token bucket, as estimated from a
measurement. The result can, in general, depend on the
length of the measurement period. Thus, a full description of
measurements in this case consists of:
• token bucket rate;
• token bucket depth;
• probability;
• timescale of applicability.
The timescale is of importance especially when dealing with
self-similar traffic patterns, where the average magnitude of variations
varies with the length of the observation period.
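The four-part description above can be sketched in code. In this
minimal sketch the fraction of packets found conforming is used as a
simple estimate of the probability; that estimator, the function name,
and the packet trace are assumptions made for illustration and are not
specified in the cited draft.

```python
def token_bucket_fit(packets, rate, depth):
    """Describe how well a packet trace fits a token bucket.

    packets: list of (arrival_time_s, size_bytes), sorted by time.
    rate: token rate in bytes per second.
    depth: bucket depth in bytes (the bucket starts full).
    """
    tokens = depth
    last_time = packets[0][0]
    conforming = 0
    for arrival, size in packets:
        # Refill tokens for the elapsed time, capped at the bucket depth.
        tokens = min(depth, tokens + (arrival - last_time) * rate)
        last_time = arrival
        if size <= tokens:
            tokens -= size
            conforming += 1
    return {
        "rate": rate,
        "depth": depth,
        "probability": conforming / len(packets),
        "timescale_s": packets[-1][0] - packets[0][0],
    }


# Hypothetical trace: the second packet arrives before enough tokens exist.
trace = [(0.0, 1000), (0.1, 1000), (1.0, 1000), (1.1, 400)]
fit = token_bucket_fit(trace, rate=1000, depth=1500)
```

Running the same fit over traces of different lengths would show how
the estimated probability depends on the measurement period.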
7.2 NETWORK MONITORING
The goals of network monitoring, the second traffic engineering
measurement task, are:

• Determining the operational state of the network, including fault
detection.
• Monitoring the continuity and quality of network services,
to ensure that service quality objectives are met for
traffic aggregates. Further goals include verification of the
performance of delivered services.
• Evaluating the effectiveness of traffic engineering policies,
or triggering certain policy-based actions (such as alarm
generation, or path pre-emption) upon threshold crossing; this
may make use of past performance data, in addition to latest
measurement results.
• Verifying peering agreements (SLAs) between service providers
by monitoring/measuring the traffic flows over interconnecting
links at border routers. This includes the estimation of inter-
and intra-network traffic, as well as originating, terminating,
and transit traffic that is being exchanged between peers.
Performance monitoring, meaning continuous monitoring of
the performance of network elements to ensure that they
are performing correctly, is a central part of performance
management. Possible reasons for suboptimal service performance
include route flapping, congestion, hardware or software failures,
and element misconfiguration.
In a multi-service network domain, the above goals need to be
implemented for the different service quality support aggregates.
This sets higher requirements for the measurement infrastructure.
For example, one needs to be able to detect the malfunctioning
at the granularity of individual service quality support classes.
Similarly, the triggering of traffic engineering actions needs to
work reliably in the multi-service case.
Obtaining data for various subtasks of network monitoring is a
trade-off between the number of different measurement types and
accuracy for individual monitoring purposes. It is clearly desirable
to derive multiple levels of performance characteristics from a
single set of measurement data, if the methods for performing this
are known and sufficient processing power for this is available.
Alas, these two conditions are not always both met. Especially for new
types of services, the analysis methods for obtaining service
quality description may not yet be available. If services need to be
created quickly for commercial reasons, the very act of configuring
measurements using low-level data sources may be too
time-consuming, and it may be easier to use separate
service-level measurements. Later, when the behaviour of the network
is understood better and on a more quantitative level, the
service-specific performance data can possibly be extracted from a smaller
number of measurements.
The scheduling of measurements is part of the network
monitoring task. One important concept here is the temporal
distance between successive measurements, known as the read-
out period. The length of the read-out period should be sufficiently
short to prevent momentary variations from being averaged out,
but long enough to avoid disturbing the router. In a multi-
service network, the measurement scheduling periods need to
take into account different service quality support types. For
measurements directly addressing the service level performance of
a particular service support aggregate, the actual lengths of read-
out periods may be different from each other. For measurements,
the results of which are used for assessing the performance
of multiple types of services, on the other hand, the length
of the read-out period may be determined by the strictest
reporting requirement.
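The two points made above about the read-out period can be sketched as
follows: a shared measurement takes the strictest requirement, and too
long a period averages out momentary variations. The aggregate names
and all numeric values are hypothetical.

```python
def shared_readout_period(requirements_s):
    """Read-out period for a measurement shared by several service types.

    requirements_s maps an aggregate name to the longest read-out period
    (in seconds) that still satisfies its reporting requirement. The
    strictest requirement is the shortest period.
    """
    return min(requirements_s.values())


def averages(samples, period):
    """Average consecutive samples in blocks of `period` samples."""
    blocks = [samples[i:i + period] for i in range(0, len(samples), period)]
    return [sum(block) / len(block) for block in blocks]


requirements_s = {"EF": 60, "AF": 300, "BE": 900}  # hypothetical values
period = shared_readout_period(requirements_s)

load = [10, 10, 90, 10]            # hypothetical per-second utilization (%)
short_period = averages(load, 1)   # the burst of 90 % stays visible
long_period = averages(load, 4)    # the burst is averaged out
```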
An important part of network monitoring is presenting the
measurement data within the right context. Taking the load as
an example, in a simple routed IP network, load levels might
be collected per link, out of which a meaningful traffic matrix
needs to be constructed. If MPLS is in use, however, load
levels per LSP are also of interest. If a single LSP is shared by multiple
traffic aggregates, load levels per DSCP/LSP combination would
be useful information for traffic engineering purposes. A router
can belong to multiple such contexts. As an example, an MPLS-
capable DiffServ router could perform monitoring both per LSP,
as well as per network interface.
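Presenting the same load data in several contexts can be sketched as a
grouping operation. The record layout below (fields link, lsp, dscp,
and bytes) and the identifiers used in it are assumptions made for
illustration; they do not correspond to any particular MIB.

```python
from collections import Counter


def load_by(records, key_fields):
    """Sum the byte counts of the records, grouped by the given fields."""
    totals = Counter()
    for record in records:
        key = tuple(record[field] for field in key_fields)
        totals[key] += record["bytes"]
    return dict(totals)


# Hypothetical per-interval records reported by one router.
records = [
    {"link": "if1", "lsp": "lsp-a", "dscp": "EF", "bytes": 100},
    {"link": "if1", "lsp": "lsp-a", "dscp": "BE", "bytes": 300},
    {"link": "if1", "lsp": "lsp-b", "dscp": "BE", "bytes": 200},
]

per_link = load_by(records, ["link"])
per_lsp = load_by(records, ["lsp"])
per_lsp_dscp = load_by(records, ["lsp", "dscp"])
```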
7.2.1 Troubleshooting measurements for services
Troubleshooting-type measurements for service quality can be
used as necessary to complement data from normal “traffic
engineering” type measurements. Troubleshooting measurements
can be based on testing the actual service performance, either
using realistic protocols in an emulation of an application stream,
or otherwise measuring the quality of a traffic aggregate
used as a bearer for applications. Application-emulation-type
troubleshooting measurements can be performed end-to-end,
per-domain, or narrowed down to smaller areas to identify potential
problem locations. For non-end-to-end measurements, one needs
to be careful in interpreting the measurement results in terms
of impact on end-to-end performance, as has been discussed in
Chapter 5.
Making an ICMP PING test for a route or for an LSP is an example
of a simple test of the ability of an aggregate to transfer any data.
Such tests could be carried out periodically to verify the
operational status of routes or LSPs.
An example of an application emulation troubleshooting
measurement is transmission of an emulated VoIP stream between two
laptops, and measuring delay time series and packet loss
characteristics from the stream. The emulation produces correct stream
metrics (packet sizes and transmission intervals can be chosen
to correspond to a realistic codec) and uses the same four
lowest OSI layers for the measurement stream as a VoIP
application (UDP/IP/LL/PHY). As with active measurements in general,
ICMP PING and traceroute may not yield correct delays due to
possibly different PDU treatment in routers for ICMP and L3+
traffic. Keeping in mind the related discussion in previous
chapters, all measurements should also take into account the possible effects
of measurement endpoints.
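The post-processing of such an emulated stream can be sketched as
follows. The sketch assumes that each probe packet carries a sequence
number, that send and receive times are logged in milliseconds, and
that the two endpoint clocks are synchronized; the function name and
the sample values are hypothetical.

```python
def stream_statistics(sent_ms, received_ms):
    """Compute the delay time series and loss ratio of a probe stream.

    sent_ms: dict mapping sequence number -> send time (ms).
    received_ms: dict mapping sequence number -> receive time (ms);
    sequence numbers missing from this dict are counted as lost.
    """
    delays = [
        (seq, received_ms[seq] - sent_ms[seq])
        for seq in sorted(sent_ms)
        if seq in received_ms
    ]
    lost = len(sent_ms) - len(delays)
    return delays, lost / len(sent_ms)


# Hypothetical stream with 20 ms packet spacing; packet 2 is lost.
sent_ms = {0: 0, 1: 20, 2: 40, 3: 60}
received_ms = {0: 35, 1: 57, 3: 98}
delays, loss_ratio = stream_statistics(sent_ms, received_ms)
```

Without synchronized clocks, the one-way delays computed this way would
contain the clock offset between the endpoints.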
An example of an end-to-end test is verifying the signalling per-
formance of a SIP proxy by using a SIP client, which is able to time
the individual signalling events. In this case, provided that the
endpoints (client and SIP proxy) are operating correctly, such a test
shows an example of the effect of network performance on
end-to-end application performance. Similarly, assessing end-to-end VoIP
performance could be based on transmission of reference samples
over the network using actual VoIP terminal applications having
hooks to provide application-level performance characteristics.
A possible set-up and its relation to traffic engineering
measurements in a domain are illustrated in Figure 7.2. The service
flow emulation measurements can be made on the same route
as the traffic engineering measurements, but may be expressly
designed to test the performance relevant to the tested
application. An example of this is performing active measurements with
20 ms inter-packet separation to probe timescales of performance,
which are relevant for the AMR codec.
[Figure 7.2: Traffic engineering measurements (left) and different types of
troubleshooting-type measurements (right). Labels: IP domain; ER;
Traffic engineering measurements; Testing of aggregate performance;
SIP proxy; Terminal; Testing of signalling performance; Testing of
voice quality.]
7.3 TRAFFIC CONTROL
The third task of traffic engineering measurements, traffic control,
is related to interfacing of the control part of traffic engineering
to measurements. According to [LCT + 02], some of the relevant
subtasks are:
• Adaptively optimizing network performance in response to net-
work events. Rerouting around points of congestion or failure
in the network is given as example of this.
• Providing a feedback mechanism in the reverse flow messaging
of RSVP-TE or CR-LDP signalling in MPLS to report on actual
topology state information such as link bandwidth availability.
• Support of measurement-based admission control, i.e., by pre-
dicting the future demands of the aggregate of existing flows so
that admission decisions can be made on new flows.
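The third subtask, measurement-based admission control, can be
sketched as follows. The exponentially weighted moving average used as
the predictor, the target utilization of 0.9, and all names and values
are assumptions chosen for illustration; [LCT + 02] is not the source
of this particular algorithm.

```python
def ewma(samples, alpha):
    """Exponentially weighted moving average of measured load samples."""
    estimate = samples[0]
    for sample in samples[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def admit(load_samples, new_flow_rate, capacity, alpha=0.5, target=0.9):
    """Admit a new flow if predicted load plus its rate fits the budget.

    load_samples: recent measured loads of the aggregate of existing flows.
    new_flow_rate: rate requested by the new flow (same unit as the loads).
    capacity: capacity available to the aggregate.
    target: fraction of the capacity that may be committed (assumed value).
    """
    predicted = ewma(load_samples, alpha)
    return predicted + new_flow_rate <= target * capacity


# Hypothetical measurements in Mbit/s on a 100 Mbit/s aggregate.
recent_load = [40.0, 60.0, 50.0]
small_flow_admitted = admit(recent_load, new_flow_rate=30.0, capacity=100.0)
large_flow_admitted = admit(recent_load, new_flow_rate=45.0, capacity=100.0)
```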
The examples listed above relate to routing control and admission
control means of service quality control. More generally, the
examples are partly overlapping with activities in the traffic
engineering process, which take measurements as input. Routing
control above is an example of this. Another example could be
using measurements for triggering configuration optimization and
reconfiguration of DiffServ parameters in the network elements
using policy management. Other aspects of traffic control, as
exemplified by the list above, fall within shorter timescales than
what is relevant for the entire measure-model-reconfigure loop.
Direct measurement-based feedback to admission control is an
example of such shorter timescale processes.
Further examples of traffic control include:
• Triggering of reconfiguration of policing at the network edge.
• Adjustments of link costs in link-state routing protocols based
on measurements. This requires a management interface to link
cost computation in routers.
The first of these two examples is more clearly within the appli-
cation of the traffic engineering loop, requiring configuration of
multiple network elements. The reaction to a certain set of mea-
surement results could still be preconfigured, avoiding the need
to perform a full modelling step. Some aspects of dynamic traf-
fic control falling within the concept of bandwidth brokers are
discussed in the following chapter.
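A preconfigured reaction of the kind mentioned above can be sketched
for the link cost example in the list. The thresholds, the multipliers,
and the function name are hypothetical; the sketch only computes a
proposed cost and does not model the management interface to the
routers.

```python
# Preconfigured reaction table (assumed values): utilization threshold
# and the multiplier applied to the base link cost, strictest first.
COST_STEPS = ((0.9, 4), (0.7, 2))


def adjusted_cost(base_cost, utilization, steps=COST_STEPS):
    """Return the link cost to propose for a measured utilization.

    The first step whose threshold is reached determines the multiplier;
    below all thresholds the base cost is kept.
    """
    for threshold, multiplier in steps:
        if utilization >= threshold:
            return base_cost * multiplier
    return base_cost


congested = adjusted_cost(10, 0.95)
loaded = adjusted_cost(10, 0.75)
normal = adjusted_cost(10, 0.30)
```

Because the reaction is a table lookup, no modelling step is needed at
the time the measurement result arrives.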
It should be noted that in traffic control, modelling of
system-level behaviour is needed. The means for this have been discussed
in Chapter 5. Modelling of system behaviour should be contrasted
with modelling of data in the traffic characterization task, the latter
providing input for the former.
7.4 DEFINITION OF MEASURED CHARACTERISTICS
The measured characteristics, in all, span multiple scopes. For
Service Level Agreements, a general goal should be to use
system-level characteristics that are independent
of protocols and platforms, and that are uniform across operator
boundaries [LCT + 02]. This goal is rather similar to the
Per-Domain Behaviours [RFC3086] mentioned in the last chapter. More
fine-grained characteristics are needed for detailed performance
information.
The classification used in this chapter is as follows: characteris-
tics can be measured at service level, packet or traffic aggregate
level, or with network element granularity. Examples of charac-
teristics at each of these levels are given below.
• Service level:
– Service availability and continuity.
– Average holding time. Typically computed for a flow at the net-
work edge. Holding time of certain types of long-lived service
instances may also reflect network reliability.
– Service response time. This is also relevant for individual ser-
vice instances.
The above service level characteristics can be measured directly,
when passive monitoring of production streams is feasible and
technically possible. Service-specific data can also be obtained from
appropriate service elements. An example of the latter could be
querying a SIP proxy for information about call statistics. When the
SIP proxy is outside of the network operator’s domain, delivery
of such information could be part of the Service Level Agreement.
Service response time can also be estimated with application-level
measurement, in which case modelling may be necessary.
• Packet/aggregate level:
– Bandwidth availability. Can be used for traffic control using mul-
tiple routes for a single traffic aggregate, measurement-based
admission control (MBAC), or adjusting admission control or
service quality support instantiation policy, for example.
– Throughput. [LCT + 02] defines throughput as the maximum
sustainable rate of packet transfer at which the service
quality support objectives are met.
– Delay. One-way or two-way measurements are possible for
active measurements. With passive monitoring, two-point
sampling is possible for non-encrypted flows with sequence
numbers.
– Delay variation. Different measurements possible (one-point
vs. two-point) [RFC3432, I.380].
– Packet loss. Depends on the protocol layer. Some of the loss
for non-SLA-conformant packets may arise at network edge.
– Statistics on erroneous PDUs.
Model-based measurements may be useful in this category in min-
imizing measurement overhead, provided that the effect of fitting
the results into a model on the results is properly understood.
• Network element/link level:
– Traffic volume. The share of offered traffic for the measured
entity minus the effect of network condition, congestion, and
other effects.
– Resource usage. Link utilization, buffer occupancy, CPU load,
and memory usage.
– Traffic aggregate level statistics.
Measurement results in this category are obtained by direct polling
of elements or by a reporting interface to the measuring device
(network element or probe). In a multi-service environment, results
for multiple traffic aggregates per network interface are typically
needed.
Measured characteristics are called entities in the terminology of
[LCT + 02]. Above, the list of entities has been complemented by
service-level characteristics.
Characteristics being statistical in nature, only estimates can
be obtained, as discussed in [SMH01]. When the dynamics and
the characteristics of the associated phenomena are well enough
known, model-based measurements can enhance the accuracy of
the estimators.
7.5 SOURCES OF MEASUREMENT DATA
Physically, measurement data can be obtained from network ele-
ments and measurement probes. Network elements, in turn, can
be IP or link layer devices such as routers, or service elements such
as HTTP servers or SIP proxies. For a transport network operator,
direct access to all service elements is not always possible, but
service usage information could be included in an SLA.
7.5.1 Measurement interfaces
Let us next give an overview of the measurement
interfaces. The interfaces will be discussed in more depth in Section 7.6.
From the network elements, data can be fetched using a
vendor-proprietary interface (for example, a command line interface), or a
standardized interface such as Simple Network Management
Protocol (SNMP). The data available in the network element can be in
the form of proprietary counters, or standard MIBs. The
communication between the management system and network element can
be based on polling of the elements by the management system,
and on elements sending SNMP trap messages to the management
system in predefined conditions.
Probes can be of a passive or an active type, the former
monitoring production traffic in the network, the latter transmitting probe
streams in the network. The Remote Monitoring (RMON) MIB
[RFC2819] is a possible interface with regard to passive probes, not
requiring constant connection between the probe and the
management system. IETF is also currently in the process of standardizing
the IP flow information exporting protocol that could be useful for
this purpose. For active probes and two-point passive
measurements, standardization work is ongoing.
7.5.2 Measured characteristics
Typical examples of measured characteristics are given below:
• Element-specific. For example, CPU loads.
• Packet statistics. Lengths of packets, number of packets, cor-
rupted packets. For passive monitoring, also the time the packet
was received.
• Flow. Source and destination addresses, port numbers, proto-
cols, time of start and end of a flow, packet count, IP version
(IPv4/IPv6). When the application flows are encrypted, obtain-
ing of some of the information may be limited.
• Traffic aggregate. Example characteristics are per-aggregate buffer
occupancy levels, offered traffic and goodput, and packet loss
statistics.
• Service event. Example characteristics include end availability,
user identity, invocation time, and service-specific parameters.
• Per link. Relevant characteristics include overall link utilization
statistics and packet loss rate.
• Per node pair. Delay, delay variation, packet loss, packet loss
correlation.
The measured objects are called bases in [LCT + 02]. Table 7.1
relating measured characteristics (entities) and measured objects
(bases) is given there.
Some of the measurement types require information that is ob-
tained by combining per-element or per-probe measurement data.
Table 7.1 Relation of measured characteristics and measured objects
Bases (columns, left to right): Flow (passive); Interface, node (passive);
Node pair (active & passive); Path (active & passive)
Entities (rows; marks listed in left-to-right order, empty cells omitted):
Traffic Volume: X(1), X, X(3), X(3)
Average Holding Time: X, X(3)
Available Bandwidth: X, X(3)
Throughput: X(4), X(4)
Delay: X(2), X(4), X(4)
Delay Variation: X(2), X(4), X(4)
Packet Loss: X, X(5), X(5)
Note: See p. 000 for an explanation of terminology and numbered notes
Source: From [LCT + 02].
The following notes, indicated by the appropriate number in parentheses, apply to Table 7.1:
(1) This measurement type can be used to derive flow size statistics.
(2) These are one-point measurements.
(3) As a starting point, statistics collected by passive measurement through the MPLS traffic
engineering MIBs may be used.
(4) Active measurements based on IPPM metrics are currently in use for node-pairs; they
may be also applied to paths, but without means of controlling the routing no one-to-
one correspondence necessarily exists.
(5) Besides active measurements based on IPPM methodologies, path loss may possibly
be inferred from the difference between ingress and egress traffic statistics at the two
endpoints of a path. However, such inference for the cumulative losses between a
given node pair over multiple routes may be less useful, since different routes may
have different loss characteristics. Also, loss of even one source (network element) of
packet loss information along the measured path results in incorrect statistics on the
domain level.
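The inference described in note (5) amounts to a simple difference of
counters. The following is a minimal sketch under the assumption that
ingress and egress packet counts for the same path and the same
measurement period are available; the function name and the numbers are
hypothetical.

```python
def path_loss_ratio(ingress_packets, egress_packets):
    """Estimate path loss from packet counts at the two path endpoints.

    Both counts must refer to the same traffic and the same period;
    packets still in flight at the period boundaries are ignored here.
    """
    if ingress_packets == 0:
        return 0.0
    return (ingress_packets - egress_packets) / ingress_packets


# Hypothetical counts for one measurement period on one path.
loss = path_loss_ratio(ingress_packets=1000, egress_packets=990)
```

As the note points out, the estimate is meaningful per path; summing
counts over several routes between a node pair would mix different loss
characteristics.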
In service level elements, relevant counters depend on the ser-
vice in question. Typical generally interesting characteristics are
• total number of service requests per measurement period;
• number of successful service requests per measurement period;
• number of non-successful service requests per measurement
period;
• CPU load of the service element.
MIB support for service depends on service details. Some aspects
of application performance measurements have been specified in
[RFC2564].
Service level measurements can be performed per service type,
or per service quality support aggregate. In the latter case, the
measurement with a selected service type probes the performance
of the service quality support aggregate. Measuring per service
type end-to-end provides the results that are easiest to relate to end
user experience. When the network operator is providing service
quality support in the form of aggregates, most likely aggregate
level measurements are used.
7.6 MEASUREMENT METHODS
Different measurement methods were enumerated in the context
of traffic engineering earlier. Let us take another look at them now,
from the point of view of what is being measured. Higher-level
performance characterizations can be obtained by combining per-
element data, possibly using some sort of system modelling in
the process.
7.6.1 Obtaining performance data from network elements
There are two basic ways of obtaining performance data from net-
work elements:
1. Reading performance counters in the devices, also known as
polling.
2. Devices reporting when predefined conditions occur. This
can include both periodic reporting, and notifications.
Related to the first method, first – and obviously – the relevant
counters must be in place for monitoring. They may also need to
be explicitly enabled. The data to be monitored can relate to packet
statistics, traffic aggregates, flows, links, and service events. The
first four items listed are IP or link level characteristics, whereas
the last one is a service-level characteristic.
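Turning two polled counter readings into a utilization figure can be
sketched as follows. The sketch assumes a 32-bit octet counter that
wraps around to zero (as SNMP Counter32 objects do) and at most one
wrap between two polls; the function names and the sample readings are
hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps to 0 after 2**32 - 1


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference of two counter readings, tolerating a single wrap."""
    return (current - previous) % modulus


def utilization(prev_octets, curr_octets, interval_s, link_bps):
    """Average link utilization (0..1) over one read-out period."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return bits / (interval_s * link_bps)


# Hypothetical readings: the counter wrapped between the two polls.
wrapped_delta = counter_delta(4294967290, 10)

# Hypothetical 10 Mbit/s link polled every 10 seconds.
link_utilization = utilization(0, 1_250_000, interval_s=10, link_bps=10_000_000)
```

If the counter can wrap more than once between polls, the delta is
ambiguous, which is one more reason to keep the read-out period
sufficiently short on fast links.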
A subset of IP-level characteristics can be available in any IP
device such as a router, depending on the set of counters available.
The palette of available characteristics (and counters) may depend
on the operating environment of the device – indeed, it is the basic
idea of DiffServ to keep complex per-flow operations at the edge
of the network.
