Packet Forwarding Technologies (CRC Press, December 2007, ISBN 0-8493-8057-X)

PACKET FORWARDING
TECHNOLOGIES

AU8057_C000.indd i

11/14/2007 6:15:31 PM


OTHER TELECOMMUNICATIONS BOOKS FROM AUERBACH
Architecting the Telecommunication
Evolution: Toward Converged Network
Services
Vijay K. Gurbani and Xian-He Sun
ISBN: 0-8493-9567-4
Business Strategies for the
Next-Generation Network
Nigel Seel
ISBN: 0-8493-8035-9
Chaos Applications in Telecommunications
Peter Stavroulakis
ISBN: 0-8493-3832-8
Context-Aware Pervasive Systems:
Architectures for a New Breed of
Applications
Seng Loke
ISBN: 0-8493-7255-0
Fundamentals of DSL Technology
Philip Golden, Herve Dedieu, Krista S Jacobsen
ISBN: 0-8493-1913-7
Introduction to Mobile Communications:
Technology, Services, Markets
Tony Wakefield, Dave McNally, David Bowler,
Alan Mayne
ISBN: 1-4200-4653-5
IP Multimedia Subsystem: Service
Infrastructure to Converge NGN,
3G and the Internet
Rebecca Copeland
ISBN: 0-8493-9250-0
MPLS for Metropolitan Area Networks
Nam-Kee Tan
ISBN: 0-8493-2212-X
Performance Modeling and Analysis of
Bluetooth Networks: Polling,
Scheduling, and Traffic Control
Jelena Misic and Vojislav B Misic
ISBN: 0-8493-3157-9
A Practical Guide to Content
Delivery Networks
Gilbert Held
ISBN: 0-8493-3649-X

Security in Distributed, Grid, Mobile,
and Pervasive Computing
Yang Xiao
ISBN: 0-8493-7921-0
TCP Performance over
UMTS-HSDPA Systems
Mohamad Assaad and Djamal Zeghlache
ISBN: 0-8493-6838-3

Testing Integrated QoS of VoIP:
Packets to Perceptual Voice Quality
Vlatko Lipovac
ISBN: 0-8493-3521-3
The Handbook of Mobile Middleware
Paolo Bellavista and Antonio Corradi
ISBN: 0-8493-3833-6
Traffic Management in IP-Based
Communications
Trinh Anh Tuan
ISBN: 0-8493-9577-1
Understanding Broadband over
Power Line
Gilbert Held
ISBN: 0-8493-9846-0
Understanding IPTV
Gilbert Held
ISBN: 0-8493-7415-4
WiMAX: A Wireless Technology
Revolution
G.S.V. Radha Krishna Rao, G. Radhamani
ISBN: 0-8493-7059-0
WiMAX: Taking Wireless to the MAX
Deepak Pareek
ISBN: 0-8493-7186-4
Wireless Mesh Networking:
Architectures, Protocols and
Standards
Yan Zhang, Jijun Luo and Honglin Hu
ISBN: 0-8493-7399-9

Wireless Mesh Networks
Gilbert Held
ISBN: 0-8493-2960-4

Resource, Mobility, and Security
Management in Wireless Networks
and Mobile Communications
Yan Zhang, Honglin Hu, and Masayuki Fujise
ISBN: 0-8493-8036-7

AUERBACH PUBLICATIONS
www.auerbach-publications.com
To Order Call: 1-800-272-7737 • Fax: 1-800-374-3401
E-mail:



PACKET FORWARDING
TECHNOLOGIES

WEIDONG WU

New York

London




Auerbach Publications
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2008 by Taylor & Francis Group, LLC
Auerbach is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1
International Standard Book Number-13: 978-0-8493-8057-0 (Hardcover)
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted
with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to
publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of
all materials or for the consequences of their use.
No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or
other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Wu, Weidong.
Packet forwarding technologies / by Weidong Wu.
p. cm.
Includes bibliographical references and index.

ISBN-13: 978-0-8493-8057-0
ISBN-10: 0-8493-8057-X
1. Packet switching (Data transmission) 2. Routers (Computer networks) I. Title.
TK5105.W83 2008
621.39’81--dc22

2007026355

Visit the Taylor & Francis Web site at

and the Auerbach Web site at




Contents
Preface
Acknowledgments
About the Author

Chapter 1 Introduction
1.1 Introduction
1.2 Concept of Routers
1.3 Basic Functionalities of Routers
1.3.1 Route Processing
1.3.2 Packet Forwarding
1.3.3 Router Special Services
1.4 Evolution of Router Architecture
1.4.1 First Generation: Bus-Based Router Architectures with Single Processor
1.4.2 Second Generation: Bus-Based Router Architectures with Multiple Processors
1.4.2.1 Architectures with Route Caching
1.4.2.2 Architectures with Multiple Parallel Forwarding Engines
1.4.3 Third Generation: Switch Fabric-Based Router Architecture
1.4.4 Fourth Generation: Scaling Router Architecture Using Optics
1.5 Key Components of a Router
1.5.1 Linecard
1.5.1.1 Transponder/Transceiver
1.5.1.2 Framer
1.5.1.3 Network Processor
1.5.1.4 Traffic Manager
1.5.1.5 CPU
1.5.2 Network Processor (NP)
1.5.3 Switch Fabric
1.5.3.1 Shared Medium Switch
1.5.3.2 Shared Memory Switch Fabric
1.5.3.3 Distributed Output Buffered Switch Fabric
1.5.3.4 Crossbar Switch
1.5.3.5 Space-Time Division Switch
1.5.4 IP-Address Lookup: A Bottleneck
References

Chapter 2 Concept of IP-Address Lookup and Routing Table
2.1 IP Address, Prefix, and Routing Table
2.2 Concept of IP-Address Lookup
2.3 Matching Techniques
2.3.1 Design Criteria and Performance Requirement
2.4 Difficulty of the Longest-Prefix Matching Problem
2.4.1 Comparisons with ATM Address and Phone Number
2.4.2 Internet Addressing Architecture
2.5 Routing Table Characteristics
2.5.1 Routing Table Structure
2.5.2 Routing Table Growth
2.5.3 Impact of Address Allocation on Routing Table
2.5.3.1 Migration of Address Allocation Policy
2.5.3.2 Impact of Address Allocations on Routing Table Size
2.5.3.3 Impact of Address Allocation on Prefixes with 24-Bit Length
2.5.4 Contributions to Routing Table Growth
2.5.4.1 Multi-Homing
2.5.4.2 Failure to Aggregate
2.5.4.3 Load Balancing
2.5.4.4 Address Fragmentation
2.5.5 Route Update
2.6 Constructing Optimal Routing Tables
2.6.1 Filtering Based on Address Allocation Policies
2.6.1.1 Three Filtering Rules
2.6.1.2 Performance Evaluation
2.6.2 Minimization of the Routing Table with Address Reassignments
2.6.2.1 Case of a Single IP Routing Table
2.6.2.2 General Case
2.6.3 Optimal Routing Table Constructor
2.6.3.1 Description of the Algorithm
2.6.3.2 Improvements
2.6.3.3 Experiments and Results
References

Chapter 3 Classic Schemes
3.1 Linear Search
3.2 Caching
3.2.1 Management Policies
3.2.1.1 Cache Modeling
3.2.1.2 Trace Generation
3.2.1.3 Measurement Results
3.2.1.4 Caching Cost Analysis
3.2.2 Characteristics of Destination Address Locality
3.2.2.1 Locality: Concepts
3.2.2.2 Cache Replacement Algorithms
3.2.2.3 Stack Reference Frequency
3.2.2.4 Analysis of Noninteractive Traffic
3.2.2.5 Cache Design Issues
3.2.3 Discussions
3.3 Binary Trie
3.4 Path-Compressed Trie
3.5 Dynamic Prefix Trie
3.5.1 Definition and Data Structure
3.5.2 Properties of DP-Tries
3.5.3 Algorithms for DP-Tries
3.5.3.1 Insertion
3.5.3.2 Deletion
3.5.3.3 Search
3.5.4 Performance
References

Chapter 4 Multibit Tries
4.1 Level Compression Trie
4.1.1 Level Compression
4.1.2 Representation of LC-Tries
4.1.3 Building LC-Tries
4.1.4 Experiments
4.1.5 Modified LC-Tries
4.2 Controlled Prefix Expansion
4.2.1 Prefix Expansion
4.2.2 Constructing Multibit Tries
4.2.3 Efficient Fixed-Stride Tries
4.2.4 Variable-Stride Tries
4.3 Lulea Algorithms
4.3.1 Level 1 of the Data Structure
4.3.2 Levels 2 and 3 of the Data Structure
4.3.3 Growth Limitations in the Current Design
4.3.4 Performance
4.4 Elevator Algorithm
4.4.1 Elevator-Stairs Algorithm
4.4.2 log W-Elevators Algorithm
4.4.3 Experiments
4.5 Block Trees
4.5.1 Construction of Block Trees
4.5.2 Lookup
4.5.3 Updates
4.5.4 Stockpiling
4.5.5 Worst-Case Performance
4.5.6 Experiments
4.6 Multibit Tries in Hardware
4.6.1 Stanford Hardware Trie
4.6.2 Tree Bitmap
4.6.3 Tree Bitmap Optimizations
4.6.4 Hardware Reference Design
References

Chapter 5 Pipelined Multibit Tries
5.1 Fast Incremental Updates for the Pipelined Fixed-Stride Tries
5.1.1 Pipelined Lookups Using Tries
5.1.2 Forwarding Engine Model and Assumption
5.1.3 Routing Table and Route Update Characteristics
5.1.4 Constructing Pipelined Fixed-Stride Tries
5.1.5 Reducing Write Bubbles
5.1.5.1 Separating Out Updates to Short Routes
5.1.5.2 Node Pullups
5.1.5.3 Eliminating Excess Writes
5.1.5.4 Caching Deleted SubTrees
5.1.6 Summary and Discussion
5.2 Two-Phase Algorithm
5.2.1 Problem Statements
5.2.2 Computing MMS(W − 1, k)
5.2.3 Computing T(W − 1, k)
5.2.4 Faster Two-Phase Algorithm for k = 2, 3
5.2.5 Partitioning Scheme
5.2.6 Experimental Results
5.3 Pipelined Variable-Stride Multibit Tries
5.3.1 Construction of Optimal PVST
5.3.2 Mapping onto a Pipeline Architecture
5.3.3 Experimental Results
References

Chapter 6 Efficient Data Structures for Bursty Access Patterns
6.1 Table-Driven Schemes
6.1.1 Table-Driven Models
6.1.2 Dynamic Programming Algorithm
6.1.3 Lagrange Approximation Algorithm
6.2 Near-Optimal Scheme with Bounded Worst-Case Performance
6.2.1 Definition
6.2.2 Algorithm MINDPQ
6.2.3 Depth-Constrained Weight Balanced Tree
6.2.4 Simulation
6.3 Dynamic Biased Skip List
6.3.1 Regular Skip List
6.3.2 Biased Skip List
6.3.2.1 Data Structure
6.3.2.2 Search Algorithm
6.3.3 Dynamic BSL
6.3.3.1 Constructing Data Structure
6.3.3.2 Dynamic Self-Adjustment
6.3.3.3 Lazy Updating Scheme
6.3.3.4 Experimental Results
6.4 Collection of Trees for Bursty Access Patterns
6.4.1 Prefix and Range
6.4.2 Collection of Red-Black Trees (CRBT)
6.4.3 Biased Skip Lists with Prefix Trees (BSLPT)
6.4.4 Collection of Splay Trees
6.4.5 Experiments
References

Chapter 7 Caching Technologies
7.1 Suez Lookup Algorithm
7.1.1 Host Address Cache
7.1.1.1 HAC Architecture
7.1.1.2 Network Address Routing Table
7.1.1.3 Simulations
7.1.2 Host Address Range Cache
7.1.3 Intelligent HARC
7.1.3.1 Index Bit Selection
7.1.3.2 Comparisons between IHARC and HARC
7.1.3.3 Selective Cache Invalidation
7.2 Prefix Caching Schemes
7.2.1 Liu’s Scheme
7.2.1.1 Prefix Cache
7.2.1.2 Prefix Memory
7.2.1.3 Experiments
7.2.2 Reverse Routing Cache (RRC)
7.2.2.1 RRC Structure
7.2.2.2 Handling Parent Prefixes
7.2.2.3 Updating RRC
7.2.2.4 Performance Evaluation
7.3 Multi-Zone Caches
7.3.1 Two-Zone Full Address Cache
7.3.2 Multi-Zone Pipelined Cache
7.3.2.1 Architecture of MPC
7.3.2.2 Search in MPC
7.3.2.3 Outstanding Miss Buffer
7.3.2.4 Lookup Table Transformation
7.3.2.5 Performance Evaluation
7.3.3 Design Method of Multi-Zone Cache
7.3.3.1 Design Model
7.3.3.2 Two-Zone Design
7.3.3.3 Optimization Tableau
7.4 Cache-Oriented Multistage Structure
7.4.1 Bi-Directional Multistage Interconnection
7.4.2 COMS Operations
7.4.3 Cache Management
7.4.4 Details of SEs
7.4.5 Routing Table Partitioning
References

Chapter 8 Hashing Schemes
8.1 Binary Search on Hash Tables
8.1.1 Linear Search of Hash Tables
8.1.2 Binary Search of Hash Tables
8.1.3 Precomputation to Avoid Backtracking
8.1.4 Refinements to Basic Scheme
8.1.4.1 Asymmetric Binary Search
8.1.4.2 Mutating Binary Search
8.1.5 Performance Evaluation
8.2 Parallel Hashing in Prefix Length
8.2.1 Parallel Architecture
8.2.2 Simulation
8.3 Multiple Hashing Schemes
8.3.1 Multiple Hash Function
8.3.2 Multiple Hashing Using Cyclic Redundancy Code
8.3.3 Data Structure
8.3.4 Searching Algorithms
8.3.5 Update and Expansion to IPv6
8.3.6 Performance Comparison
8.4 Using Bloom Filter
8.4.1 Standard Bloom Filter
8.4.2 Counting Bloom Filter
8.4.3 Basic Configuration of LPM Using Bloom Filter
8.4.4 Optimization
8.4.4.1 Asymmetric Bloom Filters
8.4.4.2 Direct Lookup Array
8.4.4.3 Reducing the Number of Filters
8.4.5 Fast Hash Table Using Extended Bloom Filter
8.4.5.1 Basic Fast Hash Table
8.4.5.2 Pruned Fast Hash Table
8.4.5.3 Shared-Node Fast Hash Table
References

Chapter 9 TCAM-Based Forwarding Engine
9.1 Content-Address Memory
9.1.1 Basic Architectural Elements
9.1.2 Binary versus Ternary CAMs
9.1.3 Longest-Prefix Match Using TCAM
9.2 Efficient Updating on the Ordered TCAM
9.2.1 Algorithm for the Prefix-Length Ordering Constraint
9.2.2 Algorithm for the Chain-Ancestor Ordering Constraint (CAO_OPT)
9.2.3 Level-Partitioning Technology
9.3 VLMP Technique to Eliminate Sorting
9.3.1 VLMP Forwarding Engine Architecture
9.3.2 Search Algorithm
9.3.2.1 First Stage
9.3.2.2 Second Stage
9.3.3 Performance of VLMP Architecture
9.4 Power-Efficient TCAM
9.4.1 Pruned Search and Paged-TCAM
9.4.1.1 Pruned Search
9.4.1.2 Paged TCAM
9.4.2 Heuristic Partition Techniques
9.4.2.1 Bit-Selection Architecture
9.4.2.2 Trie-Based Table Partitioning
9.4.2.3 Experiments
9.4.2.4 Route Updating
9.4.3 Compaction Techniques
9.4.3.1 Mask Extension
9.4.3.2 Prefix Aggregation and Expansion
9.4.3.3 EaseCAM: A Two-Level Paged-TCAM Architecture
9.4.4 Algorithms for Bursty Access Pattern
9.4.4.1 Static Architecture
9.4.4.2 Dynamic Architecture
9.4.4.3 Discussions
9.5 A Distributed TCAM Architecture
9.5.1 Analysis of Routing Tables
9.5.2 Distributed Memory (TCAM) Organization
9.5.3 LBBTC Algorithm
9.5.3.1 Mathematical Model
9.5.3.2 Adjusting Algorithm
9.5.4 Analysis of the Power Efficiency
9.5.5 Complete Implementation Architecture
9.5.5.1 Index Logic
9.5.5.2 Priority Selector (Adaptive Load Balancing Logic)
9.5.5.3 Ordering Logic
9.5.6 Performance Analysis
References

Chapter 10 Routing-Table Partitioning Technologies
10.1 Prefix and Interval Partitioning
10.1.1 Partitioned Binary Search Table
10.1.1.1 Encoding Prefixes as Ranges
10.1.1.2 Recomputation
10.1.1.3 Insertion into a Modified Binary Search Table
10.1.1.4 Multiway Binary Search: Exploiting the Cache Line
10.1.1.5 Performance Measurements
10.1.2 Multilevel and Interval Partitioning
10.1.2.1 Multilevel Partitioning
10.1.2.2 Interval Partitioning
10.1.2.3 Experimental Results
10.2 Port-Based Partitioning
10.2.1 IFPLUT Algorithm
10.2.1.1 Primary Lookup Table Transformation
10.2.1.2 Partition Algorithm Based on Next Hops
10.2.2 IFPLUT Architecture
10.2.2.1 Basic Architecture
10.2.2.2 Imbalance Distribution of Prefixes
10.2.2.3 Concept of Search Unit
10.2.2.4 Memory Assignment Scheme
10.2.2.5 Selector Block
10.2.2.6 IFPLUT Updates
10.2.2.7 Implementation Using TCAM
10.2.2.8 Design Optimization
10.2.3 Experimental Results
10.3 ROT-Partitioning
10.3.1 Concept of ROT-Partitioning
10.3.2 Generalization of ROT-Partition
10.3.3 Complexity Analysis
10.3.4 Results of ROT-Partitioning
10.3.4.1 Storage Sizes
10.3.4.2 Worst-Case Lookup Times
10.4 Comb Extraction Scheme
10.4.1 Splitting Rule
10.4.2 Comparison Set
10.4.3 Implementation Using Binary Trie
References

Index


Preface
This book mainly targets high-speed packet networking. As Internet traffic grows exponentially,
there is a great need to build multi-terabit Internet protocol (IP) routers. The forwarding engine
is the most important part of a high-speed router.
Packet forwarding technologies have been investigated intensively for almost two
decades, but very few textbooks describe them adequately. Many engineers and students have
to search for technical papers and read them in an ad hoc manner. This book is the first that explains
packet forwarding concepts and implementation technologies in broad scope and great depth.
This book addresses the data structures, algorithms, and architectures used to implement high-speed
routers. The basic concepts of packet forwarding are described and new technologies are discussed.
The book will be a practical guide to aid understanding of IP routers.
We have done our best to accurately describe packet forwarding technologies. If you find any
errors, please let us know by email. We will correct them in future editions.

Audience
This book can be used as a reference by industry professionals whose work is related to IP networks
and router design. It is also intended to help engineers at network equipment vendors and Internet
service providers understand the key concepts of high-speed packet forwarding. This book will
also serve as a good text for senior and graduate students in electrical engineering, computer
engineering, and computer science. Using it, students will understand the technology trends in IP
networks so that they can better position themselves when they graduate and look for jobs in the
high-speed networking field.

Organization of the Book
The book is organized as follows:
Chapter 1 introduces the basic concept and functionalities of the IP router. It also discusses the
evolution of the IP router and the characteristics of its key components.

Chapter 2 explains the background of IP-address lookup by briefly describing the evolution of
the Internet addressing architecture, the characteristics of the routing table, and the complexity of
IP-address lookup. It discusses the design criteria and the performance requirements of high-speed
routers.
Chapter 3 introduces basic schemes, such as linear search, cache replacement algorithms, binary
trie, path-compressed trie, dynamic prefix trie, and others. We describe the problems of the
algorithms proposed before 1996.
Chapter 4 discusses the multibit trie, in which the search operation requires simultaneous
inspection of several bits. We describe the principles involved in constructing an efficient multibit
trie and examine some schemes in detail.
Chapter 5 discusses the pipelined ASIC architecture that can produce significant savings in
cost, complexity, and space for the high-end router.
Chapter 6 discusses dynamic data structures for bursty access patterns. We examine the
designs of the data structures and show how to improve the throughput by tuning them according to
lookup biases.
Chapter 7 introduces the advanced caching techniques that speed up packet forwarding. We
discuss the impact of traffic locality, cache size, and the replacement algorithm on the miss ratio.

Chapter 8 discusses the improved hash schemes that can be used for Internet address lookups.
We examine the binary search of hash tables, parallel hashing, multiple hashing, and the use of
Bloom filters.
Chapter 9 discusses the forwarding engine based on TCAM. We examine route update
algorithms and power-efficient schemes.
Chapter 10 discusses the partitioning techniques based on the properties of the forwarding
table.

Acknowledgments
This book could not have been published without the help of many people. We thank Pankaj
Gupta, Srinivasan Venkatachary, Sartaj Sahni, Geoff Huston, Isaac Keslassy, Mikael Degermark,
Will Eatherton, Haoyu Song, Marcel Waldvogel, Soraya Kasnavi, Vincent C. Gaudet, H. Jonathan
Chao, Vittorio Bilo, Michele Flammini, Ernst W. Biersack, Willibald Doeringer, Gunnar Karlsson,
Rama Sangireddy, Mikael Sundstrom, Anindya Basu, Girija Narlikar, Gene Cheung, Funda Ergun,
Tzi-cker Chiueh, Mehrdad Nourani, Nian-Feng Tzeng, Hyesook Lim, Andrei Broder, Michael
Mitzenmacher, Sarang Dharmapurikar, Masayoshi Kobayashi, Samar Sharma, V.C. Ravikumar,
Rabi Mahapatra, Kai Zheng, B. Lampson, Haibin Lu, Yiqiang Q. Zhao, and others.
We would like to thank Jianxun Chen and Xiaolong Zhang (Wuhan University of Science and
Technology) for their support and encouragement. Weidong Wu wants to thank his wife and his
child for their love, support, patience, and perseverance.

About the Author
Weidong Wu received his PhD in electronics and information engineering from Huazhong
University of Science and Technology, China. In 2006, he joined Wuhan University of Science and
Technology. His research involves algorithms to improve Internet router performance, network
management, network security, and traffic engineering.

Chapter 1 Introduction


1.1 Introduction
The Internet comprises a mesh of routers interconnected by links, in which routers forward
packets to their destinations, and physical links transport packets from one router to another.
Because the Internet is scalable and distributed, ever more users connect to it and ever more
demanding applications run over it. This success has led to exponential growth in traffic
volume, stimulating an unprecedented
demand for the capacity of the core network. The trend of such exponential growth is not
expected to slow down, mainly because data-centric businesses and consumer networking
applications continue to drive global demand for broadband access solutions. This means that
packets have to be transmitted and forwarded at higher and higher rates. To keep pace with
Internet traffic growth, researchers are continually exploring transmission and forwarding
technologies.
Advances in fiber throughput and optical transmission technologies have enabled operators to
deploy capacity in a dramatic fashion. For example, dense wavelength division multiplexing
(DWDM) equipment can multiplex the signals of 300 channels of 11.6 Gbit/s to achieve a total
capacity of more than 3.3 Tbit/s on a single fiber and transmit them over 7000 km [1]. In the
future, DWDM networks will widely support 40 Gbit/s (OC-768) for each channel, and link
capacities are keeping pace with the demand for bandwidth.
Historically, network traffic has doubled every year [2], and the speed of optical transmission
(such as DWDM) has doubled every seven months [3]. However, the capacity of routers has doubled only every
18 months [3], lagging behind both network traffic and the increasing speed of optical transmission.
Therefore, the router has become the bottleneck of the Internet.
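The gap implied by these doubling periods can be made concrete with a little arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and the doubling periods are simply the figures quoted above, assumed to hold steadily over the whole interval.

```python
# Doubling periods quoted in the text, in months (assumed constant over time).
TRAFFIC_DOUBLING_MONTHS = 12   # network traffic [2]
ROUTER_DOUBLING_MONTHS = 18    # router capacity [3]


def growth_factor(months: float, doubling_period: float) -> float:
    """Multiplicative growth after `months`, given a doubling period in months."""
    return 2.0 ** (months / doubling_period)


def traffic_to_router_gap(years: float) -> float:
    """Ratio of traffic growth to router-capacity growth after `years`."""
    months = 12.0 * years
    return (growth_factor(months, TRAFFIC_DOUBLING_MONTHS)
            / growth_factor(months, ROUTER_DOUBLING_MONTHS))


if __name__ == "__main__":
    for years in (3, 6):
        print(f"after {years} years the gap is {traffic_to_router_gap(years):.1f}x")
```

Under these assumptions the gap itself doubles every three years, which is one way to read the claim that the router is the bottleneck.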
In the rest of this chapter, we briefly describe the router including the basic concept, its functionalities, architecture, and key components.

1.2 Concept of Routers
The Internet can be described as a collection of networks interconnected by routers using a set
of communications standards known as the Transmission Control Protocol/Internet Protocol
(TCP/IP) suite. TCP/IP is a layered model with four logical levels: the application layer, the transport
layer, the network layer, and the data link layer. Each layer provides a set of services that can be
used by the layer above [4]. The network layer provides the services needed for internetworking,
that is, the transfer of data from one network to another. Routers operate at the network layer and
are sometimes called IP routers.
Routers knit together the constituent networks of the global Internet, creating the illusion of a
unified whole. In the Internet, a router generally connects with a set of input links through which a
packet can come in and a set of output links through which a packet can be sent out. Each packet
contains a destination IP address; the packet has to follow a path through the Internet to its destination.
Once a router receives a packet at an input link, it must determine the appropriate output link by
looking at the destination address of the packet. The packet is transferred router by router so that
it eventually ends up at its destination. Therefore, the primary functionality of the router is to
transfer packets from a set of input links to a set of output links. This is true for most of the packets,
but there are also packets received at the router that require special treatment by the router itself.

1.3 Basic Functionalities of Routers
Generally, routers consist of the following basic components: several network interfaces to the
attached networks, processing module(s), buffering module(s), and an internal interconnection unit
(or switch fabric). Typically, packets are received at an inbound network interface, processed by the
processing module and, possibly, stored in the buffering module. Then, they are forwarded through
the internal interconnection unit to the outbound interface that transmits them to the next hop on
their journey to the final destination. The aggregate packet rate of all attached network interfaces
needs to be processed, buffered, and relayed. Therefore, the processing and memory modules may be
replicated either fully or partially on the network interfaces to allow for concurrent operations.
A generic architecture of an IP router is given in Figure 1.1. Figure 1.1a shows the basic architecture of a typical router: the controller card [which holds the central processing unit (CPU)], the
router backplane, and interface cards. The CPU in the router typically performs such functions as
path computations, routing table maintenance, and reachability propagation. It runs whichever
routing protocols are needed in the router. The interface cards consist of adapters that perform
inbound and outbound packet forwarding (and may even cache routing table entries or have extensive packet processing capabilities). The router backplane is responsible for transferring packets
between the cards. The basic functionalities in an IP router can be categorized as: route processing,
packet forwarding, and router special services. The two key functionalities are route processing (i.e.,
path computation, routing table maintenance, and reachability propagation) and packet forwarding, shown in Figure 1.1b. We discuss the three functionalities in more detail subsequently.

1.3.1 Route Processing
Routing protocols are the means by which routers gain information about the network. Routing
protocols map network topology and store their view of that topology in the routing table. Thus,
route processing includes routing table construction and maintenance using routing protocols,
such as the Routing Information Protocol (RIP) and Open Shortest Path First (OSPF) [5–7]. The
routing table consists of routing entries that specify the destination and the next-hop router
through which the packets should be forwarded to reach the destination. Route calculation consists
of determining a route to the destination: network, subnet, network prefix, or host.

Figure 1.1 Generic architecture of a router: (a) basic architecture; (b) routing components. (From Aweya, J., Journal of Systems Architecture, 46, 6, 2000. With permission.)

In static routing, routing table entries are created by default when an interface is configured (for directly connected interfaces), added explicitly by, for example, the route command (normally
from a system bootstrap file), or created by an Internet Control Message Protocol (ICMP) redirect
(usually when the wrong default is used) [8]. Once configured, the network paths do not change.
With static routing, a router may issue an alarm when it recognizes that a link has gone down, but
it will not automatically reconfigure the routing table to reroute traffic around the disabled link.
Static routing, used in LANs over limited distances, essentially requires the network manager to
configure the routing table by hand. Thus, static routing is fine if the network is small, there is a single
connection point to other networks, and there are no redundant routes (where a backup route can
be used if a primary route fails). Dynamic routing is normally used if any of these three conditions
does not hold.
Dynamic routing, used in internetworking across wide area networks, automatically reconfigures
the routing table and recalculates the least expensive path. In this case, routers broadcast advertisement packets (signifying their presence) to all network nodes and communicate with other routers
about their network connections, the cost of connections, and their load levels. Convergence, or
reconfiguration of the routing tables, must occur quickly, before routers with incorrect information
misroute data packets into dead ends. Some dynamic routers can also rebalance the traffic load.

The use of dynamic routing does not change the way an IP forwarding engine performs routing
at the IP layer. What changes is the information placed in the routing table—instead of coming
from the route commands in bootstrap files, the routes are added and deleted dynamically by a
routing protocol, as routes change over time. The routing protocol adds a routing policy to the
system, choosing which routes to place in the routing table. If the protocol finds multiple routes to
a destination, the protocol chooses which route is the best, and which one to insert in the table.
If the protocol finds that a link has gone down, it can delete the affected routes or add alternate
routes that bypass the problem.
A network (or several networks administered as a whole) can be defined as an autonomous system. A network owned by a corporation, an Internet Service Provider (ISP), or a university
campus often defines an autonomous system. There are two principal types of routing protocol: those
that operate within an autonomous system, the Interior Gateway Protocols (IGPs), and those that
operate between autonomous systems, the Exterior Gateway Protocols (EGPs). Within an autonomous system, any protocol may be used for discovering, propagating, and validating routes. Each
autonomous system can be independently administered and must make routing information
available to other autonomous systems. The major IGPs include RIP, OSPF, and Intermediate System
to Intermediate System (IS–IS). EGPs include the Exterior Gateway Protocol (EGP) itself and the Border Gateway Protocol (BGP).

1.3.2 Packet Forwarding
In this section, we briefly review the forwarding process in IPv4 routers. More details of the forwarding requirements are given in Ref. [9]. A router receives an IP packet on one of its interfaces
and then forwards the packet out of another of its interfaces (or possibly more than one, if the
packet is a multicast packet), based on the contents of the IP header. As the packet is forwarded
hop by hop, the packet’s (original) network layer header (IP header) remains relatively unchanged,
containing the complete set of instructions on how to forward the packet (IP tunneling may call
for prepending the packet with other IP headers in the network). However, the data-link headers
and physical-transmission schemes may change radically at each hop to match the changing
media types.
When the router receives a packet from one of its attached network segments, it
verifies the contents of the IP header by checking the protocol version, header length, packet
length, and header checksum fields. The protocol version must be equal to 4 for IPv4, for which
the header length must be greater than or equal to the minimum IP header size (20 bytes). The
length of the IP packet, expressed in bytes, must also be larger than the minimum header size. In
addition, the router checks that the entire packet has been received by checking the IP packet
length against the size of the received Ethernet packet, for example, in the case where the interface
is attached to an Ethernet network. To verify that none of the fields of the header have been corrupted, the 16-bit ones-complement checksum of the entire IP header is calculated and verified to
be equal to 0xFFFF. If any of these basic checks fail, the packet is deemed to be malformed and is discarded without sending an error indication back to the packet’s originator.
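The checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a production parser: the function names (header_is_valid, ones_complement_sum16, build_test_header) are hypothetical, the byte offsets follow the standard IPv4 header layout, and the received buffer is assumed to be whatever the link layer delivered, possibly with trailing padding.

```python
import struct

MIN_IPV4_HEADER_BYTES = 20


def ones_complement_sum16(data: bytes) -> int:
    """Return the 16-bit ones-complement sum of an even-length byte string."""
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                      # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def header_is_valid(packet: bytes) -> bool:
    """Apply the basic sanity checks to a received IPv4 packet."""
    if len(packet) < MIN_IPV4_HEADER_BYTES:
        return False
    version = packet[0] >> 4
    header_len = (packet[0] & 0x0F) * 4     # header length field counts 32-bit words
    (total_len,) = struct.unpack("!H", packet[2:4])
    if version != 4:
        return False
    if header_len < MIN_IPV4_HEADER_BYTES or header_len > len(packet):
        return False
    if total_len < header_len or total_len > len(packet):
        return False                        # inconsistent lengths or truncated packet
    # A header whose checksum field is correct sums to 0xFFFF.
    return ones_complement_sum16(packet[:header_len]) == 0xFFFF


def build_test_header(ttl: int = 64) -> bytes:
    """Build an option-free 20-byte header with a correct checksum (test helper)."""
    fields = (0x45, 0, 20, 0, 0, ttl, 17, 0,
              bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]))
    header = struct.pack("!BBHHHBBH4s4s", *fields)
    checksum = ~ones_complement_sum16(header) & 0xFFFF
    return header[:10] + struct.pack("!H", checksum) + header[12:]
```

A caller would simply drop any packet for which header_is_valid returns False, without generating an error message, as the text describes.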
Next, the router verifies that the time-to-live (TTL) field is greater than 1. The purpose of the
TTL field is to make sure that packets do not circulate forever when there are routing loops. The
host sets the packet’s TTL field to be greater than or equal to the maximum number of router hops
expected on the way to the destination. Each router decrements the TTL field by 1 when forwarding; when the TTL field is decremented to 0, the packet is discarded, and an ICMP Time Exceeded
message is sent back to the host. On decrementing the TTL, the router must update the packet’s
header checksum. RFC 1624 [10] contains implementation techniques for computing the IP
checksum. Because a router often changes only the TTL field (decrementing it by 1), it can incrementally update the checksum when it forwards a received packet, instead of calculating the
checksum over the entire IP header again.
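As a hedged illustration of the incremental update, the sketch below applies the RFC 1624 relation HC' = ~(~HC + ~m + m'), where m is the old 16-bit word holding the TTL and protocol fields and m' is the new one. The function names are hypothetical, the header is assumed to be a mutable bytearray in the standard IPv4 layout, and full_checksum is included only to cross-check the result.

```python
def ones_complement_add16(a: int, b: int) -> int:
    """Add two 16-bit values with end-around carry."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def full_checksum(header: bytes) -> int:
    """Recompute the header checksum from scratch, skipping the checksum field."""
    total = 0
    for i in range(0, len(header), 2):
        if i == 10:                          # bytes 10-11 hold the checksum itself
            continue
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def decrement_ttl(header: bytearray) -> bool:
    """Decrement the TTL in place and patch the checksum incrementally.

    Returns False when the packet must be discarded instead of forwarded
    (the caller would then send the ICMP message).
    """
    if header[8] <= 1:
        return False
    old_word = (header[8] << 8) | header[9]  # TTL and protocol share one 16-bit word
    header[8] -= 1
    new_word = (header[8] << 8) | header[9]
    old_sum = (header[10] << 8) | header[11]
    acc = ones_complement_add16(~old_sum & 0xFFFF, ~old_word & 0xFFFF)
    acc = ones_complement_add16(acc, new_word)
    header[10:12] = (~acc & 0xFFFF).to_bytes(2, "big")
    return True
```

Only three 16-bit additions are needed per packet, instead of a pass over all ten (or more) header words.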
The router then looks at the destination IP address. The address indicates a single destination
host (unicast), a group of destination hosts (multicast), or all hosts on a given network segment
(broadcast). Unicast packets are discarded if they were received as data-link broadcasts or as multicasts; otherwise, multiple routers may attempt to forward the packet, possibly contributing to a
broadcast storm. In packet forwarding, the destination IP address is used as a key for the routing
table lookup. The best-matching routing table entry is returned, indicating whether to forward
the packet and, if so, the interface to forward the packet out of and the IP address of the next IP
router (if any) in the packet’s path. The next-hop IP address is used at the output interface to
determine the link address of the packet, in case the link is shared by multiple parties [such as an
Ethernet, Token Ring, or Fiber Distributed Data Interface (FDDI) network], and is consequently
not needed if the output connects to a point-to-point link.
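To make the notion of a best-matching entry concrete, the following Python sketch performs a longest-prefix match by linear search over a small table. It is a deliberately naive illustration: the Route type, the lookup function, and the sample prefixes and interface names are all hypothetical, and real forwarding engines use far faster structures than a list scan.

```python
import ipaddress
from typing import Iterable, NamedTuple, Optional


class Route(NamedTuple):
    prefix: ipaddress.IPv4Network
    next_hop: Optional[ipaddress.IPv4Address]  # None: destination is directly attached
    interface: str


def lookup(table: Iterable[Route], destination: str) -> Optional[Route]:
    """Return the matching route with the longest prefix, or None if none matches."""
    addr = ipaddress.IPv4Address(destination)
    best: Optional[Route] = None
    for route in table:
        if addr in route.prefix:
            if best is None or route.prefix.prefixlen > best.prefix.prefixlen:
                best = route
    return best


# Hypothetical table: a default route, a /16, and a more specific attached /24.
TABLE = [
    Route(ipaddress.IPv4Network("0.0.0.0/0"), ipaddress.IPv4Address("203.0.113.1"), "if0"),
    Route(ipaddress.IPv4Network("198.51.0.0/16"), ipaddress.IPv4Address("203.0.113.2"), "if1"),
    Route(ipaddress.IPv4Network("198.51.100.0/24"), None, "if2"),
]

if __name__ == "__main__":
    for dest in ("198.51.100.7", "198.51.7.7", "192.0.2.9"):
        print(dest, "->", lookup(TABLE, dest).interface)
```

When next_hop is None in this sketch, the destination address itself would be resolved to a link address on a shared medium.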
In addition to making forwarding decisions, the forwarding process is responsible for classifying
packets for quality of service (QoS) control and access filtering. Flows can be identified based on source IP address, destination IP address, TCP/UDP port numbers, and the IP type
of service (TOS) field. Classification can even be based on higher layer packet attributes.
If the packet is too large to be sent out of the outgoing interface in one piece [i.e., the packet
length is greater than the outgoing interface’s Maximum Transmission Unit (MTU)], the router
attempts to split the packet into smaller fragments. Fragmentation, however, can affect performance
adversely [11]. The host may instead wish to prevent fragmentation by setting the Don’t Fragment
(DF) bit in the fragmentation field. In this case, the router does not fragment the packet, but instead
drops it and sends an ICMP Destination Unreachable (subtype fragmentation needed and DF set)
message back to the host. The host uses this message to calculate the minimum MTU along the
packet’s path [12], which in turn is used to size future packets.
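The fragmentation decision can be sketched as follows. This is a simplified illustration under stated assumptions: plan_fragments is a hypothetical name, every fragment is assumed to carry a header of the same length (the copying rules for IP options are ignored), and fragment offsets are expressed in 8-byte units as in the IPv4 header.

```python
from typing import List, Optional, Tuple

# (offset in 8-byte units, data bytes carried, more-fragments flag)
Fragment = Tuple[int, int, bool]


def plan_fragments(data_len: int, header_len: int, mtu: int,
                   dont_fragment: bool) -> Optional[List[Fragment]]:
    """Plan how a packet is sent over a link with the given MTU.

    Returns None when the packet must be dropped because it is too large
    and the DF bit is set (the caller would send the ICMP error).
    """
    if header_len + data_len <= mtu:
        return [(0, data_len, False)]        # fits in one piece
    if dont_fragment:
        return None
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments: List[Fragment] = []
    offset = 0
    while offset < data_len:
        size = min(max_data, data_len - offset)
        more = offset + size < data_len
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments


if __name__ == "__main__":
    # A 4000-byte packet (20-byte header, 3980 data bytes) over a 1500-byte MTU.
    print(plan_fragments(3980, 20, 1500, dont_fragment=False))
```

With the DF bit set, the same call returns None, which corresponds to the drop-and-notify behavior that path MTU discovery relies on.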
The router then prepends the appropriate data-link header for the outgoing interface. The IP
address of the next hop is converted to a data-link address, usually using the Address Resolution
Protocol (ARP) [13] or a variant of ARP, such as Inverse ARP [14] for Frame Relay subnets. The
router then sends the packet to the next hop, where the process is repeated.
An application can also modify the handling of its packets by extending the IP headers of its
packets with one or more IP options. IP options are used infrequently for regular data packets,
because most Internet routers are heavily optimized for forwarding packets having no options.
Most IP options (such as the record-route and timestamp options) are used to aid in statistics collection, but do not affect a packet’s path. However, the strict-source route and the loose-source
route options can be used by an application to control the path its packets take. The strict-source
route option is used to specify the exact path that the packet will take, router by router. The utility
of a strict-source route is limited by the maximum size of the IP header (60 bytes), which limits to 9
the number of hops specified by the strict-source route option. The loose-source route is used to
specify a set of intermediate routers (again, up to 9) through which the packet must go on the way
to its destination. Loose-source routing is used mainly for diagnostic purposes, for instance, as an
aid to debugging Internet routing problems.
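The limit of 9 hops follows from simple arithmetic on the header layout. The sketch below assumes the standard source-route option format of one type byte, one length byte, and one pointer byte followed by 4-byte addresses; the constant and function names are hypothetical.

```python
MAX_IPV4_HEADER_BYTES = 60        # maximum header size quoted in the text
MIN_IPV4_HEADER_BYTES = 20        # fixed part of the header
SOURCE_ROUTE_FIXED_BYTES = 3      # option type, option length, pointer
IPV4_ADDRESS_BYTES = 4


def max_source_route_hops() -> int:
    """Number of router addresses that fit in a single source-route option."""
    option_space = MAX_IPV4_HEADER_BYTES - MIN_IPV4_HEADER_BYTES   # 40 bytes
    return (option_space - SOURCE_ROUTE_FIXED_BYTES) // IPV4_ADDRESS_BYTES


if __name__ == "__main__":
    print(max_source_route_hops())
```

Of the 40 bytes available for options, 37 remain for addresses, and only nine whole 4-byte addresses fit.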

1.3.3 Router Special Services

Besides dynamically finding the paths for packets to take toward their destinations, routers also
implement other functions. Anything beyond core routing functions falls into this category, for
example, authentication and access services, such as packet filtering for security/firewall purposes.
Companies often put a router between their company network and the Internet and then configure
the router to prevent unauthorized access to the company’s resources from the Internet. This
configuration may consist of certain patterns (e.g., source and destination address and TCP port)
whose matching packets should not be forwarded or of more complex rules to deal with protocols
that vary their port numbers over time, such as the File Transfer Protocol (FTP). Such routers are
called firewalls. Similarly, ISPs often configure their routers to verify the source address in all packets
received from the ISP’s customers. This foils certain security attacks and makes other attacks easier
to trace back to their source. Similarly, ISPs providing dial-in access to their routers typically use
Remote Authentication Dial-In User Service (RADIUS) [15] to verify the identity of the person
dialing in.
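A simple pattern-based filter of the kind described above might look like the following Python sketch. It is only an illustration: the Rule type, the first-match policy, the default-allow behavior, and the sample addresses are all assumptions, and stateful handling of protocols such as FTP is not attempted.

```python
import ipaddress
from typing import Iterable, NamedTuple, Optional


class Rule(NamedTuple):
    src: ipaddress.IPv4Network
    dst: ipaddress.IPv4Network
    dst_port: Optional[int]       # None matches any destination port
    allow: bool


def permitted(rules: Iterable[Rule], src: str, dst: str, dst_port: int,
              default: bool = True) -> bool:
    """Return the verdict of the first matching rule, or `default` if none matches."""
    s = ipaddress.IPv4Address(src)
    d = ipaddress.IPv4Address(dst)
    for rule in rules:
        if s in rule.src and d in rule.dst and rule.dst_port in (None, dst_port):
            return rule.allow
    return default


def source_is_plausible(customer_prefix: str, src: str) -> bool:
    """Source-address check an ISP might apply to packets from a customer link."""
    return ipaddress.IPv4Address(src) in ipaddress.IPv4Network(customer_prefix)


# Hypothetical policy: block Telnet (TCP port 23) into the company network.
RULES = [
    Rule(ipaddress.IPv4Network("0.0.0.0/0"),
         ipaddress.IPv4Network("192.0.2.0/24"), 23, False),
]
```

Packets for which permitted returns False would not be forwarded; everything else proceeds to the normal lookup.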
Often, other functions less directly related to packet forwarding also get incorporated into IP
routers. Examples of these nonforwarding functions include network management components,
such as Simple Network Management Protocol (SNMP) and Management Information Bases
(MIBs). Routers also play an important role in TCP/IP congestion control algorithms. When an IP
network is congested, routers cannot forward all the packets they receive. By simply discarding
some of their received packets, routers provide feedback to TCP congestion control algorithms,
such as the TCP slow-start algorithm [16,17]. Early Internet routers simply discarded excess packets instead of queuing them onto already full transmit queues; these routers are termed drop-tail
gateways. However, this discard behavior was found to be unfair, favoring applications that send
larger and more bursty data streams. Modern Internet routers employ more sophisticated, and
fairer, drop algorithms, such as Random Early Detection (RED) [18].
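The core idea of RED can be sketched briefly: the router tracks a moving average of the queue length and drops arriving packets with a probability that rises as that average grows between two thresholds. The Python below is a simplified sketch, not the complete algorithm of [18]; the function names and parameter values are illustrative assumptions, and the refinement that spaces drops out using a count of packets since the last drop is omitted.

```python
import random


def update_average(avg: float, queue_len: int, weight: float = 0.002) -> float:
    """Exponentially weighted moving average of the instantaneous queue length."""
    return (1.0 - weight) * avg + weight * queue_len


def drop_probability(avg: float, min_th: float, max_th: float, max_p: float) -> float:
    """Drop probability as a function of the average queue length."""
    if avg < min_th:
        return 0.0                           # short queue: never drop
    if avg >= max_th:
        return 1.0                           # persistent congestion: always drop
    return max_p * (avg - min_th) / (max_th - min_th)


def should_drop(avg: float, min_th: float = 5.0, max_th: float = 15.0,
                max_p: float = 0.1) -> bool:
    """Randomly decide whether to drop an arriving packet."""
    return random.random() < drop_probability(avg, min_th, max_th, max_p)
```

Because drops begin before the queue is full and are spread randomly over arrivals, bursty senders are penalized less abruptly than with drop-tail behavior.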
Algorithms also have been developed that allow routers to organize their transmit queues
so as to give resource guarantees to certain classes of traffic or to specific applications. These
queuing or link scheduling algorithms include Weighted Fair Queuing (WFQ) [19] and Class
Based Queuing (CBQ) [20]. A protocol called Resource Reservation Protocol (RSVP) [21] has
been developed that allows hosts to dynamically signal to routers which applications should
get special queuing treatment. However, RSVP has not yet been deployed, with some people
arguing that queuing preference could more simply be indicated by using the TOS bits in the
IP header [22,23].
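To make the idea of weighted link scheduling concrete, here is a minimal sketch of a finish-tag scheduler. It is a self-clocked approximation of weighted fair queuing, not the exact algorithm of Ref. [19]: each packet gets a finish tag equal to its start tag plus its size divided by the flow's weight, and the packet with the smallest tag is sent next. The class name, flow names, and weights are hypothetical.

```python
import heapq
from itertools import count


class FairQueue:
    """Finish-tag scheduler approximating weighted fair queuing."""

    def __init__(self, weights):
        self.weights = dict(weights)                  # flow -> relative share
        self.last_finish = {f: 0.0 for f in self.weights}
        self.virtual_time = 0.0                       # tag of the last packet sent
        self._heap = []                               # (finish, seq, flow, size)
        self._seq = count()                           # tie-breaker: arrival order

    def enqueue(self, flow, size):
        # A packet starts after the flow's previous packet, or "now" if the flow was idle.
        start = max(self.virtual_time, self.last_finish[flow])
        finish = start + size / self.weights[flow]
        self.last_finish[flow] = finish
        heapq.heappush(self._heap, (finish, next(self._seq), flow, size))

    def dequeue(self):
        """Send the queued packet with the smallest finish tag."""
        finish, _, flow, size = heapq.heappop(self._heap)
        self.virtual_time = finish
        return flow, size

    def __len__(self):
        return len(self._heap)
```

With weights of 3 for one flow and 1 for another, and equal packet sizes, the higher-weight flow's tags advance one third as fast, so it is served roughly three packets for each packet of the other flow while both are backlogged.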

Some vendors allow collection of traffic statistics on their routers: for example, how many packets and bytes are forwarded per receiving and transmitting interface on the router. These statistics
are used for future capacity planning. They can also be used by ISPs to implement usage-based
charging schemes for their customers.
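As a rough illustration of such statistics, the sketch below keeps packet and byte counters per receiving and transmitting interface and derives a usage-based charge from them. The class, the interface names, and the flat per-megabyte pricing are assumptions, not a description of any vendor's accounting feature.

```python
from collections import defaultdict


class InterfaceStats:
    """Per-interface packet and byte counters (hypothetical structure)."""

    def __init__(self):
        # interface name -> {"packets": n, "bytes": n}
        self.received = defaultdict(lambda: {"packets": 0, "bytes": 0})
        self.transmitted = defaultdict(lambda: {"packets": 0, "bytes": 0})

    def record_forwarded(self, in_if, out_if, size):
        """Count one forwarded packet of `size` bytes on both interfaces."""
        self.received[in_if]["packets"] += 1
        self.received[in_if]["bytes"] += size
        self.transmitted[out_if]["packets"] += 1
        self.transmitted[out_if]["bytes"] += size

    def usage_charge(self, customer_if, price_per_megabyte):
        """Charge for all bytes received from and sent to a customer interface."""
        total = (self.received[customer_if]["bytes"]
                 + self.transmitted[customer_if]["bytes"])
        return total / 1_000_000 * price_per_megabyte
```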
Therefore, IP routers’ functions can be classified into two types: datapath functions and
control functions. Datapath functions are performed on every packet that passes through the
router. These include forwarding decisions, switching through the backplane, and output link
scheduling. Datapath functions are most often implemented in special-purpose hardware, called a
forwarding engine.
Control functions include system configuration, management, and exchange of routing table
information with neighboring routers. These are performed relatively infrequently. The route
controller exchanges topology information with other routers and constructs a routing table based
on a routing protocol (e.g., RIP, OSPF, and BGP). It can also create a forwarding table for the
forwarding engine. Because control functions are not processed for each arriving packet, speed is
not critical, and they are implemented in software.
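The split between the two kinds of functions can be sketched in a few lines of Python. The control side runs rarely and reduces routing information to a forwarding table. The datapath side runs for every packet and only performs a longest-prefix-match lookup. The function names, the `(prefix, next_hop, metric)` route format, and the linear scan are assumptions chosen for brevity; a real forwarding engine would use a much faster lookup structure.

```python
import ipaddress


def build_forwarding_table(routes):
    """Control function: keep the lowest-metric next hop for each prefix.

    `routes` is an iterable of (prefix, next_hop, metric) tuples, a
    hypothetical stand-in for what a routing protocol would supply.
    """
    best = {}
    for prefix, next_hop, metric in routes:
        net = ipaddress.ip_network(prefix)
        if net not in best or metric < best[net][1]:
            best[net] = (next_hop, metric)
    # Longest prefixes first, so the first match in a scan is the longest match.
    return sorted(
        ((net, hop) for net, (hop, _metric) in best.items()),
        key=lambda entry: entry[0].prefixlen,
        reverse=True,
    )


def lookup(table, dst):
    """Datapath function: longest-prefix match for one destination address."""
    addr = ipaddress.ip_address(dst)
    for net, hop in table:
        if addr in net:
            return hop
    return None  # no matching route, not even a default


ROUTES = [
    ("0.0.0.0/0", "A", 1),
    ("10.0.0.0/8", "B", 5),
    ("10.1.0.0/16", "C", 2),
    ("10.1.0.0/16", "D", 7),  # worse metric for the same prefix; ignored
]
TABLE = build_forwarding_table(ROUTES)
```

Here `build_forwarding_table` is called only when routing information changes, whereas `lookup` is the per-packet operation whose speed determines forwarding performance.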
Therefore, the state of a router is maintained by its control functions, whereas the per-packet
performance of a router is determined by its datapath functions. In this book, we will focus only
on datapath functions (the forwarding engine) and will not cover control functions, such as system
configuration, management, routing mechanisms, and routing protocols. For further information
on routing protocols, see Refs. [24–27].
