The Complete IS-IS Routing Protocol- P19 ppt

Each router in the Figure 6.13 setup forms an adjacency with the other routers, effectively
forming a full-mesh. So far, so good. Now, consider the following scenario: the ATM vir-
tual circuit between Seattle and Los Angeles breaks for some reason, as indicated by the
dotted gray line. Both Seattle and LA notice the break and therefore generate a new LSP
(incrementing the Sequence Number and removing the adjacency between Seattle and
LA). The new LSP is sent according to the flooding rule on all interfaces where there are
adjacencies in the Up state. Thus, both Seattle and LA send four copies (gray arrows) of
their new LSPs into the network. Next, the four other routers will receive the two LSPs
(white arrows). Here is where the trouble starts: because the flooding algorithm is so simple,
the algorithm does not yet know that all the other routers have already been updated
and know that the adjacency between Seattle and LA is down. What follows is a
multiplication of LSPs due to the simplicity of the flooding algorithm. All of the routers
receive the two new LSPs and re-send the LSP to all the logical interfaces except on the
ones on which they got the LSP (gray arrows). What results is that 32 LSPs are sent for
a single broken ATM VC. This does not sound too stressful for a modern router’s control
plane; however, just think if there are not six routers, but 100 routers in the network. The
problem is that the number of LSPs grows by the square of the number of routers, or in
mathematical speak O(N^2). Thus, a single failing VC in the network may generate up to
10,000 LSP updates, all flying around in a relatively short amount of time. This is an
awful lot of stress for the control plane of a router, no matter how powerful.
Flooding 167
FIGURE 6.13. ATM overlay networks and flooding stress (routers shown: Seattle, Los Angeles, San Francisco, New York, Atlanta, Chicago; arrows: tx LSP, rx LSP)


Things get even worse with another failure scenario: what if not a single VC, but an
entire router goes down (due to a reboot, for example)? The number of LSPs grows
by O(N^3). In a network of 100 routers spanning a full-mesh, this means that a single
failing router generates up to 1,000,000 LSPs in a short amount of time. Ironically, 99 per
cent of the LSPs hold information that is already known by some other neighbour. So
what can be done to mitigate the dark side of flooding? The answer to this is discussed in
the next section.
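The two failure scenarios above can be counted with a few lines of Python. This is only a sketch of the simplified model used in the text (every router relays every received LSP once on all interfaces except the receiving one); the function names are made up for illustration, and the router-failure model (each surviving router originates one new LSP) is an assumption, since the text only gives the order of growth.

```python
def vc_failure_lsps(n):
    """LSP copies sent after one VC fails in a full mesh of n routers.

    Returns (copies sent by the two VC endpoints, copies relayed by the others).
    """
    # Each endpoint still has n - 2 adjacencies in the Up state.
    originated = 2 * (n - 2)
    # Every received copy is relayed on all n - 1 interfaces of the
    # receiving router except the one it arrived on.
    reflooded = originated * (n - 2)
    return originated, reflooded


def router_failure_lsps(n):
    """LSP copies sent after one router fails in a full mesh of n routers.

    Assumption: each of the n - 1 survivors originates one new LSP.
    """
    survivors = n - 1
    # Each survivor sends its new LSP to its n - 2 remaining neighbours.
    originated = survivors * (survivors - 1)
    # Each received copy is relayed on the n - 3 other interfaces still Up.
    reflooded = originated * (survivors - 2)
    return originated, reflooded


print(vc_failure_lsps(6))        # the six-router example from Figure 6.13
print(router_failure_lsps(100))  # a 100-router full mesh
```

For six routers the first function reproduces the 4 + 4 originated copies and the 32 relayed copies described for Figure 6.13. For 100 routers the counts land in the same order of magnitude as N^2 and N^3 respectively.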
6.4.2 Mesh-Groups
Let’s go back to the basic flooding algorithm and change it a little bit. Now the rule
is: Do not send out a received LSP on all the links where we have an adjacency in the Up
state. Rather, send out the LSP on some of these links. Figure 6.14 shows a router that
is not sending out an LSP on all of the possible links. Instead, some links have been pruned
off the flooding topology. The result is that all routers still see LSP updates, but the excessive
multiplication of LSPs is avoided. The official name for this kind of functionality is
Mesh-Groups, and it has been documented in RFC 2973. The Mesh-Group pruning
is done based on the topology of the network and is not automatic.
There are two basic concepts behind Mesh-Groups. The first concept is blocking an
interface entirely, as shown in Figure 6.14. Here, one or a set of interfaces is removed
from the flooding list. It is also very straightforward to configure on IOS and JUNOS
software, as shown in the following two configuration snippets. Both vendors share the
same spirit in their implementation of the Mesh-Group functionality. The LSP flooding
in both vendors’ implementations is an interface property. In IOS, you configure everything
at the physical/logical interfaces prepended by the keyword isis. In JUNOS software,
all the logical interfaces can be referenced directly under the protocols isis
interface configuration branch, which is very practical, as the relevant information
is then in one place.
168 6. Generating, Flooding and Ageing LSPs
FIGURE 6.14. Mesh-Group blocks remove certain links from the flooding topology (legend: tx LSP, rx LSP, pruned "flooding" links)
In IOS, LSP flooding can be reduced using the isis mesh-group blocked configuration
command in interface-configuration mode, as shown in the following:
IOS configuration
London# show running-config
[… ]
interface atm 1/2.1
 ip router isis
 isis mesh-group blocked
[… ]
In JUNOS the configuration statement is very similar. The first flavour of Mesh-Groups
can be enabled by use of the mesh-group blocked configuration directive
under the protocols isis interface <interface-name> configuration
hierarchy, as shown in the following:
JUNOS software configuration
hannes@Frankfurt> show configuration
[… ]
protocols {
    isis {
        interface at-4/0/0.200 {
            mesh-group blocked;
        }
    }
}
[… ]
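The effect of blocking an interface can be sketched as a small change to the flooding rule: the flooding list is the set of interfaces with an Up adjacency, minus the receiving interface, minus any blocked interface. The function and the third interface name below are hypothetical and only illustrate the idea; they are not taken from either vendor's implementation.

```python
def flood_targets(up_interfaces, received_on=None, blocked=frozenset()):
    """Return the interfaces on which a received LSP is sent out again.

    up_interfaces: interfaces with an adjacency in the Up state
    received_on:   interface the LSP arrived on (None for a self-originated LSP)
    blocked:       interfaces configured as "mesh-group blocked"
    """
    return [
        ifname
        for ifname in up_interfaces
        if ifname != received_on and ifname not in blocked
    ]


interfaces = ["at-4/0/0.100", "at-4/0/0.200", "at-4/0/0.300"]
print(flood_targets(interfaces, received_on="at-4/0/0.100",
                    blocked={"at-4/0/0.200"}))
```

With at-4/0/0.200 blocked as in the JUNOS snippet above, an LSP received on at-4/0/0.100 is relayed only on the remaining interface.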
You may ask why the word Group is contained in Mesh-Group. So far we have not con-
figured a Group number. What is the Group number related to? This number is related to the
refined version of Mesh-Groups where the flooding is not turned off entirely for an interface.

Some LSPs are still sent. How is this second flavour of Mesh-Groups configured? First, all
the logical interfaces on an IS-IS router have to be organized in groups of interfaces. In
Figure 6.15 you can see that the first three interfaces have been grouped together in Mesh-
Group #11 and the second three interfaces have been grouped together in Mesh-Group #47.
Once an LSP is received over a logical interface (white arrow), the IS-IS router first deter-
mines the Mesh-Group number that the receiving interface belongs to. In our example the
receiving interface belongs to Mesh-Group #11. When this LSP is now flooded to all neigh-
bours, the router does flood the LSP on interfaces belonging to that specific group (Mesh-
Group #11 with the gray arrows). This solves the multiplicative effect of basic flooding.
The second flavour of Mesh-Groups that has just been described can be configured in
a similar way on IOS and in the JUNOS software. The only difference here is that a
Mesh-Group Number replaces the keyword blocked. Similar to the mesh-group
blocked command, this is configured under interface configuration mode.
In IOS, LSP flooding can be reduced according to the second flavour of Mesh-groups
using the isis mesh-group <group-number> configuration command in interface-
configuration mode, as shown in the following:
IOS configuration
London# show running-config
[… ]
interface atm 1/2.1
 ip router isis
 isis mesh-group 11
interface atm 1/2.2
 ip router isis
 isis mesh-group 11
interface atm 1/2.3
 ip router isis
 isis mesh-group 11
[… ]

In JUNOS, the Mesh-Group Number replaces the blocked statement. The
second flavour of Mesh-Groups can be enabled by use of the mesh-group <group-
number> configuration directive under the protocols isis interface
<interface-name> configuration hierarchy, as shown in the following:
JUNOS software configuration
hannes@Frankfurt> show configuration
[… ]
protocols {
    isis {
        interface at-4/0/0.100 {
            mesh-group 11;
        }
        interface at-4/0/0.101 {
            mesh-group 11;
        }
        interface at-4/0/0.102 {
            mesh-group 11;
        }
    }
}
[… ]
FIGURE 6.15. Mesh-Groups relay an LSP only to interfaces inside the same Mesh-Group (groups shown: Mesh group #11, Mesh group #47; arrows: tx LSP, rx LSP)
Mesh-Groups help to reduce the flooding explosion in densely meshed environments.

However, keep in mind that flooding is a necessity to get information across the internal
network. In a sense, it is “too-much” flooding that causes harm. However, a “too-little”
flooding strategy can cause harm in a different way. Thus, be very careful when setting
up Mesh-Groups. Mesh-Groups must not be so “tight” that they result in desynchronized
link-state databases. In Chapter 8 you will learn about the impact of desynchronized
link-state databases and what can be done to avoid them. At the end of the chapter, a
refinement of ISO 10589 is presented to make sure that routers that have been acciden-
tally pruned off the flooding topology (due to a wrong Mesh-Group configuration, for
example) still receive good information for synchronization.
Although Mesh-Groups must be hand-configured by a network administrator, it is
easy to determine if Mesh-Groups are needed by looking at the statistics that IOS and the
JUNOS software can provide. In IOS, for example, the relevant IS-IS statistics can be displayed
using the show clns traffic command, as shown in the following:
IOS command output
Amsterdam# show clns traffic
[… ]
IS-IS: Time since last clear: never
IS-IS: Level-1 Hellos (sent/rcvd): 115/19
IS-IS: Level-2 Hellos (sent/rcvd): 120/14
IS-IS: PTP Hellos (sent/rcvd): 0/0
IS-IS: Level-1 LSPs sourced (new/refresh): 10/0
IS-IS: Level-2 LSPs sourced (new/refresh): 14/0
IS-IS: Level-1 LSPs flooded (sent/rcvd): 2/2
IS-IS: Level-2 LSPs flooded (sent/rcvd): 3/2
IS-IS: LSP Retransmissions: 0
IS-IS: Level-1 CSNPs (sent/rcvd): 0/2
IS-IS: Level-2 CSNPs (sent/rcvd): 3/0
IS-IS: Level-1 PSNPs (sent/rcvd): 0/0
IS-IS: Level-2 PSNPs (sent/rcvd): 0/0
IS-IS: Level-1 DR Elections: 3

IS-IS: Level-2 DR Elections: 2
IS-IS: Level-1 SPF Calculations: 7
IS-IS: Level-2 SPF Calculations: 7
IS-IS: Level-1 Partial Route Calculations: 0
IS-IS: Level-2 Partial Route Calculations: 0
IS-IS: LSP checksum errors received: 0
IS-IS: Update process queue depth: 0/200
IS-IS: Update process packets dropped: 0
[… ]
In every case, a big disparity between the LSPs being sent and the LSPs being received
is an indication that there is excess flooding in the network that needs to be controlled via
Mesh-Groups.
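Checking for such a disparity can be scripted. The following is a minimal sketch that assumes the output format shown above; the function names are made up, and the text gives no threshold for what counts as a "big" disparity, so the script only reports a ratio and leaves the judgement to the operator.

```python
import re

# Matches lines such as "IS-IS: Level-2 LSPs flooded (sent/rcvd): 3/2".
FLOODED_RE = re.compile(r"Level-(\d) LSPs flooded \(sent/rcvd\): (\d+)/(\d+)")


def flooded_counters(output):
    """Map IS-IS level -> (LSPs sent, LSPs received) from 'show clns traffic'."""
    return {
        int(m.group(1)): (int(m.group(2)), int(m.group(3)))
        for m in FLOODED_RE.finditer(output)
    }


def disparity(sent, rcvd):
    """Ratio of the larger counter to the smaller one; 1.0 means balanced."""
    low, high = sorted((sent, rcvd))
    if high == 0:
        return 1.0           # nothing sent or received yet
    if low == 0:
        return float("inf")  # traffic in one direction only
    return high / low


SAMPLE = """\
IS-IS: Level-1 LSPs flooded (sent/rcvd): 2/2
IS-IS: Level-2 LSPs flooded (sent/rcvd): 3/2
"""

for level, (sent, rcvd) in flooded_counters(SAMPLE).items():
    print(f"Level-{level}: sent={sent} rcvd={rcvd} ratio={disparity(sent, rcvd):.2f}")
```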
In the JUNOS software, you can display the global IS-IS statistics using the show isis
statistics command. Watch for a disparity between LSPs being sent and received:
JUNOS software command output
hannes@Frankfurt> show isis statistics
IS-IS statistics for Frankfurt:
PDU type    Received  Processed  Drops      Sent  Rexmit
LSP           220201     220201      0    152846     381
IIH          5640823    5640823      0   3762071       0
CSNP         5486953    5486953      0   9893412       0
PSNP           32766      32766      0    192857       0
Unknown            0          0      0         0       0
Totals      11380743   11380743      0  14001186     381
Total packets received: 11380743 Sent: 14001567
SNP queue length: 0 Drops: 0
LSP queue length: 0 Drops: 0
SPF runs: 121371
Fragments rebuilt: 336

LSP regenerations: 151
Purges initiated: 0
Mesh-Groups solved a big problem in ATM or Frame-Relay overlay networks of the
mid-1990s. However, today Mesh-Groups are of limited use because ATM and FR trans-
port networks connecting routers have gone away for the most part. Today, routers are
typically interconnected by packet-over-SONET/SDH links in a sparse-meshed fashion.
A typical core router these days has on average no more than four or five interfaces facing
other core routers. In these environments, Mesh-Groups are a nice tuning capability, but
not the necessity they were only a few years ago when networks were melting down in
the absence of a sound LSP flooding scheme.
6.5 Network-wide Purging of LSPs
The flooding of LSPs updates the network with the most accurate state information. The
link-state database is therefore continually increasing as new or updated information is
added to it. If a link is down, issue a new LSP. When it comes back up, issue another new
LSP. So far there have been no negative LSPs that make the database shrink in size. But
what if IS-IS wants to remove a router from the distributed link-state database in all of
the other routers in the network? There is always the option to wait until the LSP ages
out, but that can take up to 65,535 seconds (18 hours, 12 minutes). For certain events,
such as router removal, IS-IS needs to have the capability to issue a negative LSP update.
This negative LSP, or purge LSP, exists and is a “crippled” version of the original LSP.
All the purge LSP contains is the LSP header without any further information. The
Lifetime and the Checksum fields of the purge LSP header are set to zero to indicate that
this is a purge. This negative LSP update, which is called a network-wide purge, is used
for a variety of events. One of these events is DIS election.
6.5.1 DIS Election
On IS-IS broadcast links there is at least one router performing a special function. This
IS-IS router is called the Designated Intermediate System (DIS). The role of the DIS
was first discussed in Chapter 5. Each DIS borrows an ID that is unique across the net-
work from the LAN on which it is the DIS. The DIS floods that LAN-ID throughout
the network to tell other routers that there is connectivity to the LAN. Now, if the DIS is

changed (re-elected) due to changes, such as a higher DIS election priority or the
time-out of the old DIS, then the new DIS must generate a new LAN-ID and flood this
throughout the network. The has-been DIS needs to remove the old LAN-ID from
the network in order to ensure that it does not lead to corrupt network information.
Figure 6.16 shows the chain of LSPs that are generated to accomplish this. In order
to remove the stale LSP from the former DIS, the old DIS generates an LSP with the
sequence number incremented by one, but with the Checksum and Lifetime set to
zero. Each router that receives this purge LSP will remove the referenced LSP-ID from
its link-state database.
FIGURE 6.16. At DIS re-election the old pseudo node LSP gets purged
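The construction of a purge described above (sequence number incremented by one, Checksum and Lifetime set to zero) can be illustrated with a small sketch. The LspHeader class is a simplified, hypothetical stand-in for the real LSP header, and the starting sequence number and lifetime are made-up sample values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LspHeader:
    """Simplified LSP header; real headers carry more fields."""
    lsp_id: str
    sequence: int
    checksum: int
    lifetime: int  # remaining lifetime in seconds


def make_purge(header):
    """Build the purge for an LSP: next sequence number, zero Checksum and Lifetime."""
    return replace(header, sequence=header.sequence + 1, checksum=0, lifetime=0)


def is_purge(header):
    """A header with zero Checksum and zero Lifetime is treated as a purge."""
    return header.checksum == 0 and header.lifetime == 0


old_pseudonode = LspHeader("New-York.02-00", 0x2FB0, 0x6F71, 1100)
purge = make_purge(old_pseudonode)
print(purge, is_purge(purge))
```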
6.5.2 Expiration of LSPs
Whenever a router ages out an LSP whose Lifetime has become zero, it needs to tell the
other routers that the LSP has been aged out. Recall that each router has an internal clock
and those clocks are subject to clock drift. At the same time, all the routers in a given
IS-IS level fundamentally rely on the fact that their link-state databases are synchronized with
each other. So for further robustness in the face of clock drift, the first router that detects that
an LSP’s Lifetime has gone to zero initiates a network-wide purge of that expired LSP.
Lifetime expiration of LSPs is common for routers that have been removed from the network
for one reason or another. Recall that under normal conditions, each LSP gets
refreshed by the Originator before it expires and therefore should never count down the
Lifetime field to zero. This should only happen during the purge of an LSP.
If a router purges an LSP from the link-state database, the LSP is not removed immediately.
Instead, the LSP is retained for a ZeroAgeLifetime of 60 seconds. Keeping the
purged LSP for 60 seconds ensures that an LSP is not re-learned (for instance) through
an adjacency that has been Down and is now transitioning to Up again.
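The retention rule can be pictured as a small table of purge timestamps. This is an illustrative sketch only: the class and method names are hypothetical, and time is passed in explicitly (in seconds) to keep the example deterministic.

```python
ZERO_AGE_LIFETIME = 60  # seconds, the retention time given in the text


class PurgedLspTable:
    """Remembers purged LSP-IDs until ZeroAgeLifetime has elapsed."""

    def __init__(self):
        self._purged_at = {}  # LSP-ID -> time of purge

    def purge(self, lsp_id, now):
        """Record that an LSP was purged at time 'now'."""
        self._purged_at[lsp_id] = now

    def is_retained(self, lsp_id, now):
        """True while the purged LSP is still held in the database."""
        purged_at = self._purged_at.get(lsp_id)
        return purged_at is not None and now - purged_at < ZERO_AGE_LIFETIME

    def expire(self, now):
        """Drop every purged LSP whose retention time is over."""
        for lsp_id in [i for i, t in self._purged_at.items()
                       if now - t >= ZERO_AGE_LIFETIME]:
            del self._purged_at[lsp_id]


table = PurgedLspTable()
table.purge("New-York.02-00", now=0.0)
print(table.is_retained("New-York.02-00", now=59.0))
print(table.is_retained("New-York.02-00", now=60.0))
```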
You can recognize a purged LSP that is still in the database if its Lifetime value is in
brackets. This is similar to the accounting world, where red numbers are in brackets as
well. And this is exactly what the User Interfaces do as well: they essentially show you a
zombie, an LSP that is already dead but is kept visible to help with
troubleshooting.
IOS command output
Amsterdam# show isis database
[… ]
IS-IS Level-1 Link State Database:
LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime  ATT/P/OL
New-York.02-00        0x00002fb1   0x6f71        (23)          1/0/0
[… ]
JUNOS software command output
hannes@New-York> show isis database
IS-IS level 1 link-state database:
LSP ID                Sequence  Checksum  Lifetime  Attributes
New-York.02-00        0x2fb1    0x6f71    (48)      L1 Attached
[… ]
4 LSPs
Typically you do not see many purged LSPs in your database as this is a very rare case
(DIS routers do not change very often). However, if you see a lot of bracketed LSPs or
one LSP always containing a bracketed Lifetime, then probably a harmful event like a
flood-purge storm is raging because of duplicate System-IDs.
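Spotting bracketed Lifetimes can be automated when the database output is collected regularly. The sketch below assumes the column layout of the two outputs shown above; the regular expression, the function name and the second, non-purged sample line are made up for illustration.

```python
import re

# An LSP-ID (name.pseudonode-fragment) at the start of a line, followed
# somewhere on the same line by a lifetime in brackets, for example "(23)".
PURGED_RE = re.compile(
    r"^(\S+\.[0-9A-Fa-f]{2}-[0-9A-Fa-f]{2})\s.*\((\d+)\)", re.MULTILINE
)


def purged_lsps(output):
    """Return (LSP-ID, bracketed lifetime) for every purged LSP in the output."""
    return [(m.group(1), int(m.group(2))) for m in PURGED_RE.finditer(output)]


SAMPLE = """\
LSPID LSP Seq Num LSP Checksum LSP Holdtime ATT/P/OL
New-York.02-00 0x00002fb1 0x6f71 (23) 1/0/0
Chicago.00-00 0x00000123 0x1a2b 1100 0/0/0
"""

print(purged_lsps(SAMPLE))
```

Counting how often the same LSP-ID shows up across successive samples gives a rough hint of whether a flood-purge storm is in progress.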
6.5.3 Duplicate System-IDs
Whenever a router receives an LSP that contains its own System-ID as Originator, and
the router is sure that it did not generate this LSP, the router must assume that there is
another router on the network that is configured with a duplicate System-ID. All the
receiving router can do is to log this event and generate a purge LSP. The other router will
most likely try to re-originate this LSP with a higher Sequence Number. Of course, this
purge process needs to be carefully paced. Otherwise a flood-purge-storm will start to
rage as the two routers continually try to update and purge each other’s wrong LSP. You
will see in the next section how these storms can be prevented. Actually, the LSP will be
purged because duplicate System-IDs are also an obstacle for a clean SPF calculation.
This ensures that the network itself stays clean.
6.6 Flow Control and Throttling of LSPs
In link-state routing protocols, the implementer needs to make an effort not to over-
whelm neighbours with excessive LSP updates. Excessive LSPs might churn the net-
work. In typical transport protocols such as TCP there is a built-in feedback mechanism
that makes the sender slow down if the receiver feels overwhelmed. This is called flow
control. However, virtually all IGPs (including IS-IS) have no way to tell a neighbour
that the IS-IS router is busy and make the other neighbouring routers throttle down LSP
transmissions. It is beyond the scope of this book as to why the protocol designers did not
address flow control in the IS-IS specification. But this lack of flow control means that
an IS-IS router has to carefully pace (spread out in time) LSPs toward a neighbour. In
good IS-IS implementations there are a lot of built-in throttles that make the IS-IS router
well behaved, even when the network is in a transient stage and several LSP updates are
flying around. Additionally, there are also limits for how frequently a router can originate
LSP updates. A router not only has to take care that it does not overwhelm its directly
connected neighbours, but the router needs to take care that it does not overwhelm all the
routers that are beyond the immediately adjacent neighbouring routers. Recall that all
routers in a given IS-IS level need to dedicate some resources (such as CPU cycles,
bandwidth and so on) to process and relay LSPs farther across the network. So let’s be
nice to these routers and not overload them, as we need them to distribute reachability
information of all types.
Most modern implementations of the IS-IS protocol support a variety of control knobs
that make an IS-IS router slower instead of faster. Realize that going slower when there
are transient conditions or LSP storms is the only option that a router has left if the router
is to continue running. There are a couple of big “Must-Not’s” that an implementation of
IS-IS has to observe.
We must not trash our neighbours. IS-IS Hellos must always be sent. If a router does
not send IS-IS Hellos in time, the adjacency times out. Losing an adjacency in transient
situations will additionally contribute more LSPs to a network that is already shaky to
begin with.
We must not forget to acknowledge LSPs of a neighbour. Even when a router is under
pressure in the form of extreme packet loads, not acknowledging an LSP update means
that after five seconds the LSP will be retransmitted. So it is much better to acknowledge
the LSP the first time before the LSP gets retransmitted. A retransmission consumes the
resources of the neighbouring router as well as the receiving router because an LSP has
to be retransmitted by the neighbour and re-processed on the receiving side as well.
So if making things slower is the only thing a router can do, exactly what kind of
events need to be made slower or throttled? The important events to throttle are in the
areas of:

• The LSPs on an interface
• Frequency of originating (generating) LSPs per router
• Retransmissions on an interface
Each of these is discussed in the following sections.
6.6.1 LSP-transmit-interval
The LSP transmit interval is one form of pacing that was originally mentioned in ISO
10589. The specification says that an implementation of IS-IS should make sure not to send
more than 30 LSPs per second on a given broadcast link. Both IOS and JUNOS software
extended this requirement so that LSPs are paced on every IS-IS interface type (broadcast and
point-to-point). You can tweak that throttling timer in both JUNOS software and IOS.

In IOS, LSP throttling can be enabled using the isis lsp-interval <time>
configuration command in interface-configuration mode. The time is a constant
expressed in milliseconds (ms). The default value is 33 ms. This example sets the
LSP pacing so as not to exceed 20 LSPs per second (pacing of 50 ms means 20 LSPs
per second).
IOS configuration
London# show running-config
[… ]
interface atm 1/2.1
 ip router isis
 isis lsp-interval 50
[… ]
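The relationship between the pacing interval and the resulting LSP rate is simple arithmetic, shown here as a short sketch. The function names are made up for illustration; the interval values are the ones quoted in this section.

```python
def lsps_per_second(interval_ms):
    """Maximum LSP rate on an interface for a given pacing interval."""
    return 1000.0 / interval_ms


def paced_send_times(lsp_count, interval_ms, start_ms=0):
    """Earliest send time (in ms) of each queued LSP under strict pacing."""
    return [start_ms + i * interval_ms for i in range(lsp_count)]


for interval in (33, 50, 20):
    print(f"{interval} ms -> {lsps_per_second(interval):.1f} LSPs per second")

# Three queued LSPs with a 50 ms interval leave at 0, 50 and 100 ms.
print(paced_send_times(3, 50))
```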
In JUNOS software, the throttling of LSPs can be enabled by use of the isis lsp-interval
<time> configuration directive under the protocols isis interface
<interface-name> configuration hierarchy. The default value is 20 ms, which allows
50 LSPs per second. This means that JUNOS software by default sends faster than the
30 LSPs per second of the original specification, but this limit is fairly old. Modern routers
should easily handle 50 LSPs per second. This example sets the JUNOS software value to
50 ms (20 LSPs per second).