Understanding Linux Network Internals (2005), part 10

Value | Description
RTPROT_STATIC
Route installed by administrator. Not used.
Table 36-9. Values of fib_protocol used by user space
Value | Description
RTPROT_GATED
The route was added by GateD.
RTPROT_RA
The route was added by RDISC (IPv4) and ND (IPv6) router advertisements. There is a mechanism, the
ICMP Router Discovery Protocol defined in RFC 1256, that lets hosts find neighboring routers. rdisc, which
is part of the iputils package, is the user-space tool that implements ICMP Router Discovery Messages.
RTPROT_MRT
The route was added by the Multi-Threaded Routing Toolkit (MRT).
RTPROT_ZEBRA
The route was added by Zebra.
RTPROT_BIRD
The route was added by BIRD.
RTPROT_DNROUTED
The route was added by the DECnet routing daemon.
RTPROT_XORP
The route was added by the XORP routing daemon.
u32 fib_prefsrc
Preferred source IP address. See the section "Selecting the Source IP Address" in Chapter 35.
u32 fib_priority
Priority of the route. The smaller the value, the higher the priority. Its value can be configured with IPROUTE2 using the
metric/priority/preference keywords. When not explicitly set, the kernel initializes it to the default value 0.
u32 fib_metrics[RTAX_MAX]
When you configure a route, the ip route command allows you to also specify a set of metrics. fib_metrics is a vector used to
store them. Metrics not explicitly configured are initialized to zero. See the section "Essential Elements of Routing" in Chapter
30 for a list of the available metrics. Table 36-10 shows the relationships between the metrics listed in that section and the
associated kernel symbols RTAX_XXX defined in include/linux/rtnetlink.h.


Table 36-10. Routing metrics
Metric | Kernel symbol
Not a metric | RTAX_LOCK
Path MTU | RTAX_MTU
Maximum Advertised Window | RTAX_WINDOW
Round Trip Time | RTAX_RTT
RTT Variance | RTAX_RTTVAR
Slow Start threshold | RTAX_SSTHRESH
Congestion Window | RTAX_CWND
Maximum Segment Size | RTAX_ADVMSS
Maximal Reordering | RTAX_REORDERING
Default Time To Live (TTL) | RTAX_HOPLIMIT
Initial Congestion Window | RTAX_INITCWND
Not a metric | RTAX_FEATURES
int fib_power
This field is part of the data structure only when the kernel is compiled with support for multipath. See the section "Concepts
Behind Multipath Routing" in Chapter 31.
struct fib_nh fib_nh[0]
int fib_nhs
fib_nh is a variable-length vector of fib_nh structures, and fib_nhs is its size. fib_nhs can be greater than 1 only when the
kernel supports the Multipath feature. See the section "Concepts Behind Multipath Routing" in Chapter 31, and see Figure
34-1 in Chapter 34.
u32 fib_mp_alg
Multipath caching algorithm. The IP_MP_ALG_XXX IDs of the algorithms introduced in the section "Cache Support for
Multipath" in Chapter 31 are listed in include/linux/ip_mp_alg.h. This field is part of the data structure only when the kernel is
compiled with support for multipath caching.
#define fib_dev fib_nh[0].nh_dev
Macro used to access the nh_dev field of the first fib_nh instance of the fib_nh vector. See Figure 34-1 in Chapter 34.
#define fib_mtu fib_metrics[RTAX_MTU-1]
#define fib_window fib_metrics[RTAX_WINDOW-1]
#define fib_rtt fib_metrics[RTAX_RTT-1]
#define fib_advmss fib_metrics[RTAX_ADVMSS-1]
Macros used to access specific elements of the fib_metrics vector.
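The off-by-one indexing used by these macros can be illustrated with a small user-space sketch. It assumes that the RTAX_XXX identifiers are numbered consecutively in the order of Table 36-10, starting from an unused value 0, so that metric n is stored at index n-1. The fib_metrics_sketch type, the RTAX_SKETCH_END sentinel, and the sketch_get_metric helper are hypothetical names introduced only for this example.

```c
/* Metric identifiers in the order of Table 36-10. The numeric values
 * are an assumption based on include/linux/rtnetlink.h. */
enum {
    RTAX_UNSPEC,        /* 0: placeholder, never stored in the vector */
    RTAX_LOCK,
    RTAX_MTU,
    RTAX_WINDOW,
    RTAX_RTT,
    RTAX_RTTVAR,
    RTAX_SSTHRESH,
    RTAX_CWND,
    RTAX_ADVMSS,
    RTAX_REORDERING,
    RTAX_HOPLIMIT,
    RTAX_INITCWND,
    RTAX_FEATURES,
    RTAX_SKETCH_END     /* hypothetical sentinel used to derive RTAX_MAX */
};
#define RTAX_MAX (RTAX_SKETCH_END - 1)

/* Reduced stand-in for fib_info: only the metrics vector is kept. */
struct fib_metrics_sketch {
    unsigned int fib_metrics[RTAX_MAX];
};

/* Same shape as the macros above: metric n lives at index n - 1. */
#define fib_mtu    fib_metrics[RTAX_MTU - 1]
#define fib_advmss fib_metrics[RTAX_ADVMSS - 1]

/* Generic accessor for any metric identifier in 1..RTAX_MAX. */
unsigned int sketch_get_metric(const struct fib_metrics_sketch *fi, int id)
{
    return fi->fib_metrics[id - 1];
}
```

With these definitions, an assignment such as fi.fib_mtu = 1500 writes the second element of the vector, and sketch_get_metric(&fi, RTAX_MTU) reads it back.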
36.5.6. fib_nh Structure

For each next hop, the kernel needs to keep more than just the IP address. The fib_nh structure stores that extra information in the
following fields.
struct net_device *nh_dev
This is the net_device data structure associated with the device ID nh_oif (described later). Since both the ID and the pointer
to the net_device structure are needed (in different contexts), both of them are kept in the fib_nh structure, even though either
one could be used to retrieve the other.
struct hlist_node nh_hash
Used to insert the structure into the hash table described in the section "Organization of Next-Hop Router Structures" in
Chapter 34.

struct fib_info *nh_parent
Pointer to the fib_info structure that contains this fib_nh instance. See Figure 34-1 in Chapter 34.
unsigned nh_flags
A set of RTNH_F_XXX flags defined in include/linux/rtnetlink.h and listed in Table 36-7 earlier in this chapter.
unsigned char nh_scope
Scope of the route used to get to the next hop. It is RT_SCOPE_LINK in most cases. This field is initialized by fib_check_nh.
int nh_weight
int nh_power
These two fields are part of the fib_nh data structure only when the kernel is compiled with support for multipath, and are
described in detail in the section "Concepts Behind Multipath Routing" in Chapter 31. nh_power is initialized by the kernel;
nh_weight is set by the user with the keyword weight.
__u32 nh_tclassid
This field is part of the fib_nh data structure only when the kernel is compiled with support for the routing table based
classifier. Its value is set with the realms keyword. See the section "Policy Routing and Routing Table Based Classifier" in
Chapter 35.
int nh_oif
ID of the egress device. It is set with the keywords oif and dev.
u32 nh_gw
IP address of the next hop gateway provided with the keyword via. Note that in the case of NAT, this represents the address
that the NAT router advertises to the world, and to which replies are sent before the router sends them on to the host on the
internal network. For example, the command ip route add nat 10.1.1.253/32 via 151.41.196.1 would set nh_gw to
151.41.196.1. Note that NAT support in the routing code, known as FastNAT, has been dropped in 2.6 kernels.
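As a rough illustration of how the next hops of a route sit in memory, the following sketch models a fib_info with its trailing vector of fib_nh entries, sized by fib_nhs. The _sketch types, the reduced set of fields, and the helper names are assumptions made for this example. The fib_info description above declares the vector as fib_nh[0]; the sketch uses a C99 flexible array member, which gives the same layout.

```c
#include <stdlib.h>

/* Reduced stand-in for fib_nh: three of the fields described above. */
struct fib_nh_sketch {
    int nh_oif;            /* ID of the egress device */
    unsigned int nh_gw;    /* address of the next hop gateway */
    int nh_weight;         /* weight configured by the user */
};

/* Reduced stand-in for fib_info: the next hops follow the fixed part. */
struct fib_info_sketch {
    int fib_nhs;                      /* number of next hops */
    struct fib_nh_sketch fib_nh[];    /* variable-length vector */
};

/* Allocate the fixed part and nhs next hops as one zeroed block. */
struct fib_info_sketch *sketch_fib_info_alloc(int nhs)
{
    struct fib_info_sketch *fi;

    if (nhs < 1)
        return NULL;
    fi = calloc(1, sizeof(*fi) + (size_t)nhs * sizeof(fi->fib_nh[0]));
    if (fi)
        fi->fib_nhs = nhs;
    return fi;
}

/* Return the first next hop that uses the given egress device, if any. */
const struct fib_nh_sketch *sketch_find_nh(const struct fib_info_sketch *fi,
                                           int oif)
{
    int i;

    for (i = 0; i < fi->fib_nhs; i++)
        if (fi->fib_nh[i].nh_oif == oif)
            return &fi->fib_nh[i];
    return NULL;
}
```

A single allocation keeps the route's shared parameters and all of its next hops together, which is why a macro such as fib_dev can reach the first next hop simply through fib_nh[0].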
36.5.7. fib_rule Structure

Policy routing rules (also called policies) are configured with the ip rule command. If the IPROUTE2 package is installed on your Linux
system, you can type ip rule help to see the syntax of the command. Policies are stored in fib_rule structures, whose fields are described
here:
struct fib_rule *r_next

Links these structures within a global list that contains all fib_rule structures (see Figure 35-8 in Chapter 35).
atomic_t r_clntref
Reference count. It is incremented by fib_lookup (in the Policy Routing version only), which explains why fib_res_put (which
decrements it) is always called after a successful lookup.
u32 r_preference
Priority of the rule. This can be configured using the keywords priority, preference and order when the administrator adds a
policy with IPROUTE2. When not explicitly configured, the kernel assigns a priority that is one unit smaller than the priority of
the last user-added rule (see inet_rtm_newrule). Priorities 0, 0x7FFE, and 0x7FFF are reserved for special rules installed by
the kernel (see the section "fib_lookup with Policy Routing" in Chapter 35, and the definitions of the three default rules
local_rule, main_rule, and default_rule in net/ipv4/fib_rules.c).
unsigned char r_table
Routing table identifier. Ranges from 0 to 255. When it is not specified by the user, IPROUTE2 uses the following defaults:
RT_TABLE_MAIN when the user command adds a rule, and RT_TABLE_UNSPEC in other cases (e.g., when deleting a rule).
unsigned char r_action
The values allowed for this field are the rtm_type enum listed in include/linux/rtnetlink.h (RTN_UNICAST, etc.). The meanings
of these values are described in the section "rtable Structure."
This field can be explicitly set by the user using the type keyword when configuring a rule. When it is not explicitly configured
by the user, IPROUTE2 sets it to RTN_UNICAST when adding rules, and RTN_UNSPEC otherwise (e.g., when deleting
rules).
unsigned char r_dst_len
unsigned char r_src_len
Length of the destination and source IP addresses, expressed in bits. They are used to compute r_srcmask and r_dstmask.
When not initialized, they are set to zero.
u32 r_src
u32 r_srcmask
IP address and netmask, respectively, of the source network from which packets must come.
u32 r_dst
u32 r_dstmask

IP address and netmask, respectively, of the destination network to which packets must be directed.
u32 r_srcmap
Field that used to be set with the user-space keywords nat and map-to and was used by the Routing NAT implementation.
Routing NAT support has been removed, so this field is not used anymore. See the section "Recently Dropped Options" in
Chapter 32.
u8 r_flags
Set of flags. Currently not used.
u8 r_tos
IP header's TOS field. Included because the definition of a rule can include a condition placed on the IP header TOS field.
u32 r_fwmark
When the kernel is compiled with support for the "Use Netfilter MARK value as routing key" feature, it is possible to define
rules in terms of firewall tags. This is the tag specified by the fwmark keyword when the administrator defines a policy rule.
int r_ifindex
char r_ifname[IFNAMSIZ]
r_ifname is the name of the device the policy applies to. Given r_ifname, the kernel finds the associated net_device instance
and copies the value of its ifindex field into r_ifindex. The value -1 for r_ifindex is used to disable the rule (see the section
"Impacts on the policy database" in Chapter 32).
__u32 r_tclassid
This field is included in the data structure only when the kernel is compiled with support for the routing table based classifier.
Its meaning is described in the section "Policy Routing and Routing Table Based Classifier" in Chapter 35.
int r_dead
When a rule is available for use, this field is 0. When the rule is removed with inet_rtm_delrule, this field is set to 1. Every time
a reference to the fib_rule data structure is removed with fib_rule_put, the reference count is decremented, and when it gets
to zero the structure is supposed to be freed. At that point, however, if r_dead is not set, it means that something wrong
happened (for instance, code has set the reference count incorrectly).
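The interplay between the reference count and r_dead can be sketched in user space as follows. This is only a model of the behavior described for fib_rule_put, not the kernel's code: the rule_ref_sketch type and the return values are hypothetical, a plain int stands in for the atomic_t counter, and free and fprintf replace the kernel's own primitives.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reduced stand-in for fib_rule: only the two fields involved. */
struct rule_ref_sketch {
    int r_clntref;   /* reference count (atomic_t in the kernel) */
    int r_dead;      /* 1 once the rule has been removed */
};

/*
 * Drop one reference.
 * Returns  0 if other references remain,
 *          1 if the last reference was dropped and the rule was freed,
 *         -1 if the count reached zero on a rule not marked dead,
 *            which signals a reference counting bug (rule not freed).
 */
int sketch_rule_put(struct rule_ref_sketch *r)
{
    if (--r->r_clntref > 0)
        return 0;
    if (r->r_dead) {
        free(r);
        return 1;
    }
    fprintf(stderr, "dropping last reference to alive rule %p\n", (void *)r);
    return -1;
}
```

The check works because removing a rule is what sets r_dead, so a live rule should always hold at least one reference.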
36.5.8. fib_result Structure

The fib_result structure is initialized by fib_semantic_match to the result of a routing lookup. See Chapters 33 and 35 (in particular, the
section "Semantic Matching on Subsidiary Criteria") for more details. The fields in the structure are:
unsigned char prefixlen
Prefix length of the matching route. See the description of fz_order in the section "fn_zone Structure."
unsigned char nh_sel
Multipath routes are defined with multiple next hops. This field identifies the next hop that has been selected.
unsigned char type
unsigned char scope
These two fields are initialized to the values of the fa_type and fa_scope fields of the matching fib_alias instance.
__u32 network
__u32 netmask
These two fields are included in the data structure definition only when the kernel is compiled with support for multipath
caching. See the section "Weighted Random Algorithm" in Chapter 33 for how they are used by the weighted random
multipath caching algorithm.
struct fib_info *fi
The fib_info instance associated with the matching fib_alias instance.
struct fib_rule *r
Unlike the previous fields, this one is initialized by fib_lookup. This field is included in the data structure definition only when
the kernel is compiled with support for Policy Routing.
36.5.9. rtable Structure

IPv4 uses rtable data structures to store routing table entries in the cache.[*] To dump the contents of the routing cache, you can view
/proc/net/rt_cache (see the section "Tuning via /proc Filesystem"), or issue the ip route list cache or route -C commands. Here is a
field-by-field description of the data structure:
[*] IPv6 uses rt6_info, and DECnet (not covered in this book) uses dn_route.
union { } u

This union is used to embed a dst_entry structure into the rtable structure (see the section "Hash Table Organization" in
Chapter 33). One of its fields, rt_next, is used to link the rtable instances that collide into the same hash table's bucket.
struct in_device *idev
Pointer to the IP configuration block of the egress device. Note that when the route is used for ingress packets that are to be
delivered locally, the egress device is the loopback device.
unsigned rt_flags
The flags you can set in this bitmap are the RTCF_XXX values defined in include/linux/in_route.h and listed in Table 36-11.
Table 36-11. Possible values for rt_flags
Flag | Description
RTCF_NOTIFY
Interested user-space applications are notified of any change to the routing entry via Netlink. This option is
not yet completely implemented. The flag is set with commands such as ip route get 10.0.1.0/24 notify.
RTCF_REDIRECTED
The entry has been added in response to a received ICMP_REDIRECT message (see ip_rt_redirect and its
caller).
RTCF_DOREDIRECT
This flag is set by ip_route_input_slow when an ICMP_REDIRECT message must be sent back to the source.
ip_forward, described in detail in Chapter 20, decides whether to actually send the ICMP redirect based on
this flag and other information. For instance, if the packet was source routed, no ICMP redirect would be
generated.
RTCF_DIRECTSRC
This flag is used mostly to tell the ICMP code that it should not reply to Address Mask Request Messages.
The flag is set every time a call to fib_validate_source says that the source of the received packet is
reachable with a next hop that has a local scope (RT_SCOPE_HOST). See Chapters 25 and 35 for more
detail.
RTCF_SNAT
RTCF_DNAT
RTCF_NAT

These flags are not used anymore by IPv4. They were used by the FastNAT feature that has been removed
from the 2.6 kernels (see the section "Recently Dropped Options" in Chapter 32).
RTCF_BROADCAST
The destination address of the route is a broadcast address.
RTCF_MULTICAST
The destination address of the route is a multicast address.
RTCF_LOCAL
The destination address of the route is local (i.e., configured on one of the local interfaces). This flag is also
set for local broadcast and multicast addresses (see ip_route_input_mc).
RTCF_REJECT
Not used. According to the syntax of IPROUTE2's ip rule command, there is a reject keyword, but it is not
accepted.
RTCF_TPROXY
Not used.
RTCF_DIRECTDST
Not used.
RTCF_FAST
Not used. This flag is obsolete; it used to be set to mark a route as eligible for Fast Switching, a feature that
has been dropped in the 2.6 kernels.
RTCF_MASQ
Not used anymore by IPv4. The flag was supposed to mark packets coming from masqueraded source
addresses.
unsigned rt_type
Type of route. It indirectly defines the action to take when the route matches on a routing lookup. The possible values for this
field are the RTN_XXX macros defined in include/linux/rtnetlink.h and listed in Table 36-12.
Table 36-12. Possible values for rt_type
Route type | Description
RTN_UNSPEC

Defines a noninitialized value. This value is used, for instance, when removing an entry from the routing
table, because that operation does not require the type of entry to be specified.
RTN_LOCAL
The destination address is configured on a local interface.
RTN_UNICAST
The route is a direct or indirect (via a gateway) route to a unicast address. This is the default value set by the
ip route command when no other type is specified by the user.
RTN_MULTICAST
The destination address is a multicast address.
RTN_BROADCAST
The destination address is a broadcast address. Matching ingress packets are delivered locally as
broadcasts, and matching egress packets are sent as broadcasts.
RTN_ANYCAST
Matching ingress packets are delivered locally as broadcasts, and matching egress packets are sent as
unicast. Not used by IPv4.
RTN_BLACKHOLE
RTN_UNREACHABLE
RTN_PROHIBIT
RTN_THROW
These values are associated with specific administrative configurations rather than destination address
types. See the section "Route Types and Actions" in Chapter 30.
RTN_NAT
The source and/or destination IP address must be translated. Not used because the associated feature,
FastNAT, has been dropped in the 2.6 kernels.
RTN_XRESOLVE
An external resolver will take care of this route. This functionality is currently not implemented.
__u16 rt_multipath_alg
Multipath caching algorithm. It is initialized based on the algorithm configured on the associated route (see fib_mp_alg in the
section "fib_info Structure").
__u32 rt_dst
__u32 rt_src
Destination and source IP addresses.
int rt_iif
ID of the ingress device. Its value is extracted from the net_device data structure of the ingress device. For traffic generated
locally (and hence not received on any interface), the field is set to the ifindex field of the outgoing device. Do not confuse this
field with the iif field of the flowi data structure fl described later in this chapter. The latter field is set to zero (loopback_dev) for
locally generated traffic.
__u32 rt_gateway
When the destination host is directly connected (it is on-link), rt_gateway matches the destination address. When a gateway
is needed to reach the destination, rt_gateway is set to the next hop gateway identified by the route.
struct flowi fl
Search key used for the cache lookups, described in the section "flowi Structure."
__u32 rt_spec_dst
RFC 1122-specific destination, explained in the section "Preferred Source Address Selection" in Chapter 35.
struct inet_peer *peer
The inet_peer structure, introduced in Chapter 19, stores long-living information about the IP peer, which is the host with the
destination IP address of this cached route. There is an inet_peer structure for each remote IP address to which the local host
has been talking in the recent past.
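The embedding performed by the u union at the top of rtable can be modeled with a short sketch. It assumes that the embedded dst_entry starts with its next pointer, so that rt_next and the embedded next field occupy the same storage and the collision chain can be walked with rtable pointers directly. The _sketch types and the sketch_chain_length helper are hypothetical and keep only a couple of fields.

```c
/* Reduced stand-in for dst_entry; next must remain the first field. */
struct dst_entry_sketch {
    struct dst_entry_sketch *next;
    int refcnt;
};

/* Reduced stand-in for rtable: the union overlays rt_next on dst.next. */
struct rtable_sketch {
    union {
        struct dst_entry_sketch dst;
        struct rtable_sketch *rt_next;
    } u;
    unsigned int rt_dst;
};

/* Count the entries that collide into one hash bucket. */
int sketch_chain_length(const struct rtable_sketch *head)
{
    int n = 0;

    for (; head; head = head->u.rt_next)
        n++;
    return n;
}
```

Writing u.rt_next therefore also updates u.dst.next, which lets protocol-independent code follow the same chain through dst_entry pointers.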
36.5.10. dst_entry Structure

The data structure dst_entry is used to store the protocol-independent information concerning cached routes. L3 protocols keep their
own, additional private information in separate structures. (For example, IPv4 uses rtable structures.)
Here is the field-by-field description:
struct dst_entry *next
Used to link the dst_entry instances that collide into the same hash table's bucket. See Figure 33-1 in Chapter 33.
struct dst_entry *child
unsigned short header_len
unsigned short trailer_len

struct dst_entry *path
struct xfrm_state *xfrm
These fields are used by IPsec code.
atomic_t __refcnt
Reference count. See the section "Deleting DST Entries" in Chapter 33.
int __use
Number of times this entry has been used (i.e., number of times that a cache lookup has returned it). Do not confuse this
value with rt_cache_stat[smp_processor_id()].in_hit: the latter (described in the section "Statistics") represents the global
number of cache hits for the device.
struct net_device *dev
Egress device (i.e., where to transmit to reach the destination).
short obsolete
Used to define the usability status of this dst_entry instance: 0 (the default value) means the structure is valid and can be
used, 2 means the structure is being removed and therefore cannot be used, and -1 is used by IPsec and IPv6 but not by
IPv4.
int flags
Set of flags. DST_HOST is used by TCP and means the route leads to a host (i.e., it is not a route to a network or a
broadcast/multicast address). DST_NOXFRM, DST_NOPOLICY, and DST_NOHASH are used only by IPsec.
unsigned long lastuse
Timestamp used to remember the last time this entry was used. It is updated when there is a successful cache lookup and it
is used by the garbage collection routines to select the best structures to free.
unsigned long expires
Timestamp that indicates when the entry will expire. See the section "Expiration Criteria" in Chapter 33.
u32 metrics[RTAX_MAX]
Vector of metrics, used mostly by TCP. This vector is initialized with a copy of the fib_info->fib_metrics vector (if it is defined),
and default values are used where needed. See the function rt_set_nexthop and Chapter 35. See Table 36-10 for a
description of the vector's possible values.
The RTAX_LOCK value needs a little explanation. RTAX_LOCK is not a metric but a bitmap: when the bit in position n is set, it
means that the metric with enum value n has been configured with the lock keyword. In other words, a command like ip
route add advmss lock sets the 1<<RTAX_ADVMSS bit. When a metric is locked, it cannot be changed by protocol
events.
unsigned long rate_last
unsigned long rate_tokens
These two fields are used to rate limit two types of ICMP messages. See the section "Egress ICMP REDIRECT Rate
Limiting" in Chapter 33 and the section "Routing Failure" in Chapter 35.
short error
When the fib_lookup API (used only by IPv4) fails, the error is saved into error (with a positive sign) and used later by ip_error
to decide how to handle the failure (i.e., to decide which ICMP to generate).
struct neighbour *neighbour
struct hh_cache *hh
neighbour is the data structure that contains the L3-to-L2 address mapping for the next hop. hh is the cached L2 header. See
the chapters in Part VI for details.
int (*input)(struct sk_buff*)
int (*output)(struct sk_buff**)
Functions used to process ingress and egress packets, respectively. See the section "Cache Lookup" in Chapter 33.
__u32 tclassid
Routing table based classifier's tag. See the section "Policy Routing and Routing Table Based Classifier" in Chapter 35.
struct dst_ops *ops
Virtual function table (VFT) whose functions are used to manipulate dst_entry structures.
struct rcu_head rcu_head
Takes care of mutual exclusion.
char info[0]
This field can be useful as a pointer to the end of the data structure. It is only a placeholder.
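The RTAX_LOCK bitmap described under the metrics field can be modeled with a few helpers. The sketch assumes that metric n is stored at index n-1 of the vector (the same convention as the fib_metrics macros) and that the enum values shown match include/linux/rtnetlink.h. All sketch_ functions are hypothetical names for this example.

```c
/* Subset of the metric identifiers; the values are an assumption
 * based on include/linux/rtnetlink.h. */
enum { RTAX_LOCK = 1, RTAX_MTU = 2, RTAX_ADVMSS = 8, RTAX_MAX = 12 };

/* Metric n is stored at index n - 1. */
unsigned int sketch_metric(const unsigned int *metrics, int n)
{
    return metrics[n - 1];
}

/* A metric is locked when bit n of the RTAX_LOCK element is set. */
int sketch_metric_locked(const unsigned int *metrics, int n)
{
    return (sketch_metric(metrics, RTAX_LOCK) & (1u << n)) != 0;
}

/* Models the effect of the lock keyword on metric n. */
void sketch_metric_lock(unsigned int *metrics, int n)
{
    metrics[RTAX_LOCK - 1] |= 1u << n;
}

/* Models an update triggered by a protocol event: locked metrics are
 * left untouched. Returns 1 if the value was changed, 0 otherwise. */
int sketch_metric_update(unsigned int *metrics, int n, unsigned int val)
{
    if (sketch_metric_locked(metrics, n))
        return 0;
    metrics[n - 1] = val;
    return 1;
}
```

Locking RTAX_ADVMSS in this model sets bit 8 of the first element of the vector, after which sketch_metric_update refuses to change the advertised MSS while still accepting updates to unlocked metrics such as the path MTU.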
36.5.11. dst_ops Structure


The dst_ops structure is the interface between the protocol-independent cache and L3 protocols that use a routing cache. See the
section "Interface Between the DST and Calling Protocols" in Chapter 33. Here is the field-by-field description:
unsigned short family
Address family. See AF_XXX values in include/linux/socket.h.
unsigned short protocol
Protocol ID. See ETH_P_XXX values in include/linux/if_ether.h.
unsigned gc_thresh
This field, used by the garbage collection algorithm, specifies the size (number of buckets) of the routing cache. The
initialization is done in ip_rt_init (the IPv4 routing subsystem initialization function).
int (*gc)(void)
atomic_t entries
gc is the garbage collection function invoked by dst_alloc when the number of dst_entry instances (entries) already allocated
by the protocol is greater than or equal to the threshold gc_thresh.
struct dst_entry * (*check)(struct dst_entry *, __u32 cookie)
void (*destroy)(struct dst_entry *)
void (*ifdown)(struct dst_entry *, struct net_device *dev, int how)
struct dst_entry * (*negative_advice)(struct dst_entry *)
void (*link_failure)(struct sk_buff *)
void (*update_pmtu)(struct dst_entry *dst, u32 mtu)
int (*get_mss)(struct dst_entry *dst, u32 mtu)
See the section "Interface Between the DST and Calling Protocols" in Chapter 33.
int entry_size
Size of the outer L3 routing cache structure (e.g., rtable for IPv4).
kmem_cache_t *kmem_cachep
Memory pool used to allocate routing cache elements.
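The way gc, entries, gc_thresh, and entry_size cooperate at allocation time can be sketched as follows. The comparison follows the wording used above for gc (greater than or equal to the threshold); the exact test in a given kernel version may differ. The sketch also assumes that gc returns a nonzero value when it could not free enough entries, in which case the allocation is refused. The _sketch names, the plain int counter, and the two sample callbacks are hypothetical, and calloc stands in for the kmem_cachep memory pool.

```c
#include <stdlib.h>

/* Reduced stand-in for dst_ops. */
struct dst_ops_sketch {
    unsigned int gc_thresh;   /* threshold that triggers gc */
    int (*gc)(void);          /* assumed: nonzero means gc failed */
    int entries;              /* allocated entries (atomic_t in the kernel) */
    size_t entry_size;        /* size of the outer L3 structure */
};

int sketch_gc_runs;           /* counts gc invocations, for illustration */

/* Sample callbacks: one reports success, one reports an overflow. */
int sketch_gc_ok(void)       { sketch_gc_runs++; return 0; }
int sketch_gc_overflow(void) { sketch_gc_runs++; return 1; }

/* Allocate one zeroed cache entry, running gc first when needed. */
void *sketch_dst_alloc(struct dst_ops_sketch *ops)
{
    void *dst;

    if (ops->gc && (unsigned int)ops->entries >= ops->gc_thresh) {
        if (ops->gc())
            return NULL;      /* gc could not make room */
    }
    dst = calloc(1, ops->entry_size);
    if (dst)
        ops->entries++;
    return dst;
}
```

Because the allocator is protocol independent, it only needs entry_size to hand back a block large enough for the protocol's own structure.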
36.5.12. flowi Structure


With the flowi data structure, it is possible to define classes of traffic based on the combination of fields such as ingress and egress
devices, parameters from the L3 and L4 protocol headers, etc. It is commonly used as a search key for lookups, as a traffic selector for
IPsec policies, and other advanced uses. Here is a brief description of its fields:
int oif
int iif
Egress and ingress device IDs.
union { } nl_u
Union whose fields are structures that can be used to specify the values of L3 parameters. The protocols currently supported
are IPv4, IPv6, and DECnet.
_ _u8 proto
L4 protocol.
_ _u8 flags
The only flag defined in this variable, FLOWI_FLAG_MULTIPATHOLDROUTE, originally was used by the multipath code, but
it is not used anymore.
union { } uli_u
Union whose fields are mainly structures that can be used to specify the values of L4 parameters. The protocols currently
supported are TCP, UDP, ICMP, DECnet, and the IPsec suite.
Because the data structure is not flat, but contains unions and structs, the kernel provides a set of macros that can be used to access
some of its fields.
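A reduced model shows why such macros are convenient. The layout below keeps only an IPv4 member in nl_u and two members in uli_u; the member names inside the unions, the macro names (modeled on the kernel's fl4_XXX and fl_ip_XXX style), and the sketch_flow_equal helper with its choice of compared fields are assumptions made for this example.

```c
#include <stdint.h>

/* Reduced stand-in for flowi. */
struct flowi_sketch {
    int oif;                        /* egress device ID */
    int iif;                        /* ingress device ID */
    union {
        struct {
            uint32_t daddr;
            uint32_t saddr;
            uint8_t tos;
        } ip4_u;                    /* IPv6 and DECnet members omitted */
    } nl_u;
    uint8_t proto;                  /* L4 protocol */
    uint8_t flags;
    union {
        struct { uint16_t sport, dport; } ports;
        struct { uint8_t type, code; } icmpt;
    } uli_u;
};

/* Shorthands that hide the nesting of unions and structs. */
#define fl4_dst     nl_u.ip4_u.daddr
#define fl4_src     nl_u.ip4_u.saddr
#define fl4_tos     nl_u.ip4_u.tos
#define fl_ip_sport uli_u.ports.sport
#define fl_ip_dport uli_u.ports.dport

/* Hypothetical key comparison on the L3 fields and the device IDs. */
int sketch_flow_equal(const struct flowi_sketch *a,
                      const struct flowi_sketch *b)
{
    return a->fl4_dst == b->fl4_dst &&
           a->fl4_src == b->fl4_src &&
           a->fl4_tos == b->fl4_tos &&
           a->oif == b->oif &&
           a->iif == b->iif;
}
```

Code that fills in or compares a search key can then write fl.fl4_dst instead of spelling out fl.nl_u.ip4_u.daddr each time.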
36.5.13. rt_cache_stat Structure

rt_cache_stat stores the counters used for the statistics introduced in the section "Statistics." Here are its counters:
in_hit
out_hit
Number of received and locally generated packets, respectively, that have been routed with a successful lookup on the
routing cache (see ip_route_input and ip_route_output_key).
in_slow_tot
in_slow_mc

in_slow_tot is the number of packets that required a lookup on the routing table because the cache lookup failed (see
ip_route_input_slow). Only successful routing table lookups are counted. The counter is called slow because a lookup on the
routing tables can be much slower than a lookup on the routing cache. This counter includes broadcasts, but it does not
include multicast traffic, which is counted in in_slow_mc.
out_slow_tot
out_slow_mc
out_slow_tot and out_slow_mc play the same role as in_slow_tot and in_slow_mc for the egress traffic.
in_no_route
Number of ingress packets that could not be forwarded because the routing table did not know how to reach the destination
IP address (which is possible only if no default gateway is configured or usable). See ip_route_input_slow. There is no
counter to keep track of the locally generated packets that could not be sent for lack of a route.
in_brd
Number of broadcast packets received correctly (no sanity check failed). There is no counter for the number of transmitted
broadcasts.
in_martian_dst
in_martian_src
These two counters represent the number of packets that were dropped because the sanity check failed on the destination or
source IP addresses, respectively. Examples of sanity checks are that the source IP address cannot be multicast or
broadcast and that the destination address cannot belong to the so-called zero-network; that is, it cannot look like 0.n.n.n.
gc_total
gc_ignored
gc_goal_miss
gc_dst_overflow
These four fields are updated by rt_garbage_collect, described in the section "rt_garbage_collect Function" in Chapter 33.
gc_total keeps track of the number of times rt_garbage_collect is invoked.
gc_ignored is the number of times rt_garbage_collect returns immediately because it was called too recently.
gc_goal_miss is the number of times the cache has been scanned by rt_garbage_collect without meeting the goal set at the
beginning of the function.

gc_dst_overflow is the number of times rt_garbage_collect fails by not reducing the number of cache entries below the
ip_rt_max_size threshold.
in_hlist_search
out_hlist_search
These are updated by the routines used for the cache lookups, ip_route_input and __ip_route_output_key, respectively.
They represent the number of cache elements that have been tested and did not match (not just the number of cache
misses).
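The difference between the hit counters and the hlist_search counters can be seen in a simplified bucket walk. The sketch below is a user-space model only: the single integer key, the _sketch types, and the sketch_cache_lookup function are hypothetical stand-ins for the real lookup routines and their flowi-based comparison.

```c
#include <stddef.h>

/* One element of a hash bucket's collision list. */
struct cache_entry_sketch {
    struct cache_entry_sketch *next;
    unsigned int key;               /* stands in for the real search key */
};

/* Two of the counters described above. */
struct cache_stat_sketch {
    unsigned int in_hit;            /* successful cache lookups */
    unsigned int in_hlist_search;   /* elements tested that did not match */
};

/* Walk one bucket, counting every element that is tested and rejected. */
const struct cache_entry_sketch *
sketch_cache_lookup(const struct cache_entry_sketch *bucket,
                    unsigned int key, struct cache_stat_sketch *st)
{
    for (; bucket; bucket = bucket->next) {
        if (bucket->key == key) {
            st->in_hit++;
            return bucket;
        }
        st->in_hlist_search++;
    }
    return NULL;
}
```

A hit on the third element of a bucket adds one to in_hit and two to in_hlist_search, so a high ratio between the two counters suggests long collision lists.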
36.5.14. ip_mp_alg_ops Structure

ip_mp_alg_ops represents the interface between the routing cache and the Multipath caching feature. It consists of the following function
pointers:
void (*mp_alg_select_route) (const struct flowi *flp, struct rtable *rth, struct rtable **rp)
void (*mp_alg_flush) (void)
void (*mp_alg_set_nhinfo) (__u32 network, __u32 netmask, unsigned char prefixlen, const struct fib_nh *nh)
void (*mp_alg_remove) (struct rtable *rth)
These functions are invoked by the algorithm-independent wrappers described in the section "Interface Between the Routing
Cache and Multipath" in Chapter 33.
36.6. Functions and Variables Featured in This Part of the Book

Table 36-13 summarizes the main functions, variables, and data structures introduced or referenced in the chapters of this book covering
the routing subsystem. You can find more in the sections "Generic Helper Routines and Macros" and "Helper Routines" in Chapter 32, and
in the two "Helper Routines" sections in Chapter 35.
Table 36-13. Functions, variables, and data structures in the routing subsystem
Functions

for_ifa, endfor_ifa

for_primary_ifa, endfor_ifa
Macros used to browse the IPv4 addresses configured on a network device. See the section
"Primary and Secondary IP Addresses" in Chapter 32.
FIB_RES_XXX
Set of macros used to access the fields of the fib_result structure. See the section "Generic
Helper Routines and Macros" in Chapter 32.
LOOPBACK
ZERONET
MULTICAST
LOCAL_MCAST/BADCLASS
Macros used to recognize special IP addresses. See the section "Generic Helper Routines and
Macros" in Chapter 32.
fib_hash_lock
fib_info_lock
fib_rules_lock
rt_flush_lock
fib_multipath_lock
alg_table_lock
Locks used to protect various pieces of data. See the section "Global Locks" in Chapter 32.
ip_rt_init
ip_fib_init
devinet_init
fib_rules_init
fib_hash_init
dst_init
Initialization routines. See the section "Routing Subsystem Initialization" in Chapter 32.
dst_alloc
Allocate an entry for the routing cache. See the section "Cache Entry Allocation and Reference
Counts" in Chapter 33.
rt_periodic_timer
rt_secret_timer
Timers. See the sections "Garbage Collection" and "Flushing the Routing Cache" in Chapter 33.
fib_netdev_event
fib_inetaddr_event
Handlers for the netdev_chain and inetaddr_chain notification chains. See the section "External
Events" in Chapter 32.
fib_add_ifaddr
fib_del_ifaddr
Used to update the routing table upon the addition or removal of an IP address from the
configuration of a local network device. See the sections "Adding an IP address" and "Removing
an IP address" in Chapter 32.
fib_magic
Used by the kernel to insert routes under specific conditions. See the section "Routes Inserted by
the Kernel: The fib_magic Function."
fib_rules_detach
fib_rules_attach
Disable and enable routing policies when network devices are unregistered and registered,
respectively. See the section "Impacts on the policy database" in Chapter 32.
rtmsg_fib
Used to send notification on a specific Netlink multicast group when routes are added or
removed. See the section "Netlink Notifications" in Chapter 32.
ip_route_input
__ip_route_output_key
ip_route_output_flow
ip_route_output_key
ip_route_connect
ip_route_newports
The first two functions are routing cache lookup routines, and the others are wrappers around
them. See the section "Cache Lookup" in Chapter 33.
ip_route_input_slow
ip_route_output_slow
Routing table lookup routines. See Chapter 35.
ip_route_input_mc
Lookup routine used for multicast destinations.
ip_mkroute_input
ip_mkroute_input_def
ip_mkroute_output
ip_mkroute_output_def
fib_select_default
fib_select_multipath
Various support routines used by ip_route_input_slow and ip_route_output_slow. See Chapter 35.
fib_lookup
fn_hash_lookup
fib_semantic_match
Routines called at different stages during a routing table lookup. See the section "High-Level
View of Lookup Functions" in Chapter 35.
fn_hash_insert
Add a new route to a routing table. See the section "Adding a Route" in Chapter 34.
fn_hash_delete
Remove a route from a routing table. See the section "Deleting a Route" in Chapter 34.
rt_intern_hash
Add an entry to the routing cache. See the section "Adding Elements to the Cache" in Chapter 33.
multipath_alg_register
multipath_alg_unregister
Register and unregister a multipath caching algorithm. See the section "Registering a Caching
Algorithm" in Chapter 33.
multipath_select_route
multipath_flush
multipath_set_nhinfo
multipath_remove
Various routines used to manage cache entries associated with multipath routes. See the section
"Interface Between the Routing Cache and Multipath" in Chapter 33. More routines are listed in
the section "Helper Routines" in the same chapter.
rt_free
dst_free
Free an rtable and a dst_entry structure, respectively.
rt_garbage_collect
rt_may_expire
Garbage collection routines used for the routing cache. See the section "rt_garbage_collect
Function" in Chapter 33.
dst_input
dst_output
Complete the reception and transmission of a packet, respectively. See the section "Cache
Lookup" in Chapter 33. See also the section "Setting Functions for Reception and Transmission"
in Chapter 35.
rt_garbage_collect
dst_destroy
dst_ifdown
dst_negative_advice
dst_link_failure
dst_set_expires
Routines used for the initialization of the dst_ops instance associated with the IPv4 protocol. See
the section "Interface Between the DST and Calling Protocols" in Chapter 33.
dst_dev_event
Handler used by the DST subsystem to process notifications from the netdev_chain notification
chain. See the section "External Events" in Chapter 32.
RT_CACHE_STAT_INC
Update per-CPU statistics. See the section "Statistics."
Variables

ip_fib_local_table
ip_fib_main_table
Routing tables. See the section "The Two Default Routing Tables: ip_fib_main_table and
ip_fib_local_table" in Chapter 34.
rt_hash_table
Routing cache. See Chapter 33.
rt_hash_mask
Size of the routing cache (i.e., number of buckets of the hash table).
dst_garbage_list
List of dst_entry instances that cannot be removed because they are still referenced. See Chapter
33.
fib_tables
List of fib_table instances. See Figure 34-1 in Chapter 34.
fib_rules
List of routing policies. See the section "fib_lookup with Policy Routing" in Chapter 35.
fib_info_cnt
Number of outstanding fib_info instances. See the section "Dynamic resizing of global hash
tables" in Chapter 34.
fib_info_hash
fib_info_laddrhash
Hash tables used to search fib_info instances. See the section "Organization of fib_info
Structures" in Chapter 34.

fib_info_devhash
Hash table used to search fib_nh instances. See the section "Organization of Next-Hop Router
Structures" in Chapter 34.
fib_props
Vector whose elements are used by the lookup routine fib_semantic_match to map route types to
return values. See the section "Return value from fib_semantic_match" in Chapter 35.
Data structures

fib_table structure
fn_zone structure
fib_node structure
fib_alias structure
fib_info structure
fib_nh structure
fib_rule structure
rtable structure
dst_entry structure
dst_ops structure
flowi structure
rt_cache_stat structure
ip_mp_alg_ops structure
Key data structures used by the routing code. They are described in detail in the section "Data
Structures Featured in This Part of the Book."
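
As an illustration of two macro families listed in Table 36-13, the following user-space sketch models the special-address tests (LOOPBACK, ZERONET, MULTICAST, LOCAL_MCAST, BADCLASS) and the for_ifa/for_primary_ifa iterators. The macro bodies are written after the 2.6-era kernel headers and should be checked against the kernel tree in use. The in_ifaddr and in_device layouts are pared-down stand-ins, and addr_class, count_ifa, and count_primary_ifa are hypothetical helpers introduced only for this example.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Address-class tests; x is an IPv4 address in network byte order. */
#define LOOPBACK(x)    (((x) & htonl(0xff000000)) == htonl(0x7f000000))
#define ZERONET(x)     (((x) & htonl(0xff000000)) == htonl(0x00000000))
#define MULTICAST(x)   (((x) & htonl(0xf0000000)) == htonl(0xe0000000))
#define LOCAL_MCAST(x) (((x) & htonl(0xffffff00)) == htonl(0xe0000000))
#define BADCLASS(x)    (((x) & htonl(0xf0000000)) == htonl(0xf0000000))

/* Simplified stand-ins (assumption): only the fields the iterators need. */
#define IFA_F_SECONDARY 0x01
struct in_ifaddr {
    struct in_ifaddr *ifa_next;   /* next address on the same device */
    uint32_t          ifa_local;  /* address, network byte order */
    unsigned int      ifa_flags;  /* IFA_F_SECONDARY marks a secondary */
};
struct in_device {
    struct in_ifaddr *ifa_list;   /* primaries are assumed to come first */
};

/* Each iterator opens a block that declares the cursor `ifa`;
 * endfor_ifa closes that block. */
#define for_ifa(in_dev) { struct in_ifaddr *ifa; \
    for (ifa = (in_dev)->ifa_list; ifa; ifa = ifa->ifa_next)
#define for_primary_ifa(in_dev) { struct in_ifaddr *ifa; \
    for (ifa = (in_dev)->ifa_list; \
         ifa && !(ifa->ifa_flags & IFA_F_SECONDARY); \
         ifa = ifa->ifa_next)
#define endfor_ifa(in_dev) }

enum addr_kind {
    ADDR_UNICAST, ADDR_ZERONET, ADDR_LOOPBACK,
    ADDR_LOCAL_MCAST, ADDR_MULTICAST, ADDR_BADCLASS
};

/* Classify an address. LOCAL_MCAST is a subset of MULTICAST,
 * so it has to be tested first. */
static enum addr_kind addr_class(uint32_t addr)
{
    if (ZERONET(addr))     return ADDR_ZERONET;
    if (LOOPBACK(addr))    return ADDR_LOOPBACK;
    if (LOCAL_MCAST(addr)) return ADDR_LOCAL_MCAST;
    if (MULTICAST(addr))   return ADDR_MULTICAST;
    if (BADCLASS(addr))    return ADDR_BADCLASS;
    return ADDR_UNICAST;
}

/* Count every address configured on the device. */
static int count_ifa(struct in_device *in_dev)
{
    int n = 0;

    for_ifa(in_dev) {
        n++;
    } endfor_ifa(in_dev);
    return n;
}

/* Count primary addresses only; the walk stops at the first secondary. */
static int count_primary_ifa(struct in_device *in_dev)
{
    int n = 0;

    for_primary_ifa(in_dev) {
        n++;
    } endfor_ifa(in_dev);
    return n;
}
```

In this model, for_primary_ifa terminates at the first secondary address rather than skipping it, so it visits all primaries only if they precede the secondaries in ifa_list. A caller would build an in_device whose list holds, say, two primary addresses followed by one secondary, and then compare count_ifa with count_primary_ifa.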
36.7. Files and Directories Featured in This Part of the Book

Figure 36-6 lists the files and directories referred to in the chapters in Part VII.
Figure 36-6. Files and directories featured in this part of the book
About the Authors
Christian Benvenuti received his master's degree in computer science at the University of Bologna in Italy. He collaborated for a few
years with the International Center for Theoretical Physics (ICTP) in Trieste, where he developed ad hoc software based on the Linux
kernel, was a scientific consultant for a project on remote collaboration, and served as an instructor for several training sessions on
networking. The training sessions, held mainly in Europe, Africa, and South America, were all based on Linux systems and addressed to
scientists from developing countries, where the ICTP has been promoting Linux for many years. He occasionally collaborates with a
nonprofit organization founded by ICTP members, Collaborium.org, to continue promoting Linux in developing countries.
In the past few years he has worked as a software engineer for Cisco Systems in Silicon Valley, where he focused on Layer 2 switching,
high availability, and network security.
Colophon
Our look is the result of reader comments, our own experimentation, and feedback from distribution channels. Distinctive covers
complement our distinctive approach to technical topics, breathing personality and life into potentially dry subjects.
Philip Dangler was the production editor, and Audrey Doyle was the copyeditor for Understanding Linux Network Internals. Sada Preisch
proofread the book. Mary Brady and Colleen Gorman provided quality control. Rachel Monaghan, Lydia Onofrei, and Laurel Ruma
provided production assistance. Angela Howard wrote the index.
Karen Montgomery designed the cover of this book, based on a series design by Hanna Dyer and Edie Freedman. The cover image is a
19th-century engraving from Men: A Pictorial Archive from 19th Century Sources. Karen Montgomery produced the cover layout with
Adobe InDesign CS using Adobe's ITC Garamond font.
David Futato designed the interior layout. The chapter opening images are from Men: A Pictorial Archive from 19th Century Sources.
This book was converted by Keith Fahlgren to FrameMaker 5.5.6 with a format conversion tool created by Erik Ray, Jason McIntosh,
Neil Walls, and Mike Sierra that uses Perl and XML technologies. The text font is Linotype Birka; the heading font is Adobe Myriad
Condensed; and the code font is LucasFont's TheSans Mono Condensed. The illustrations that appear in the book were produced by
Robert Romano, Jessamyn Read, and Lesley Borash using Macromedia FreeHand MX and Adobe Photoshop CS. The tip and warning
icons were drawn by Christopher Bing.
