
The Tomes of Delphi

Algorithms and Data
Structures

Julian Bucknall
Wordware Publishing, Inc.
Library of Congress Cataloging-in-Publication Data
Bucknall, Julian
Tomes of Delphi: algorithms and data structures / by Julian Bucknall.
p. cm.
Includes bibliographical references and index.
ISBN 1-55622-736-1 (pbk. : alk. paper)
1. Computer software—Development. 2. Delphi (Computer file). 3. Computer
algorithms. 4. Data structures (Computer science) I. Title.
QA76.76.D47 .B825 2001 2001033258
005.1 dc21 CIP
© 2001, Wordware Publishing, Inc.
Code © 2001, Julian Bucknall
All Rights Reserved
2320 Los Rios Boulevard
Plano, Texas 75074
No part of this book may be reproduced in any form or by
any means without permission in writing from
Wordware Publishing, Inc.
Printed in the United States of America
ISBN 1-55622-736-1
10 9 8 7 6 5 4 3 2 1
0105
Delphi is a trademark of Inprise Corporation.
Other product names mentioned are used for identification purposes only and may be trademarks of their respective companies.
All inquiries for volume purchases of this book should be addressed to Wordware Publishing, Inc., at the
above address. Telephone inquiries may be made by calling:
(972) 423-0090
For Donna and the Greek cats


Contents
Introduction x
Chapter 1 What is an Algorithm? 1
What is an Algorithm? 1
Analysis of Algorithms 3
The Big-Oh Notation 6
Best, Average, and Worst Cases 8
Algorithms and the Platform 8
Virtual Memory and Paging 9
Thrashing 10
Locality of Reference 11
The CPU Cache 12
Data Alignment 12
Space Versus Time Tradeoffs 14
Long Strings 16
Use const 17
Be Wary of Automatic Conversions 17
Debugging and Testing 18
Assertions 19
Comments 22
Logging 22
Tracing 22
Coverage Analysis 23
Unit Testing 23
Debugging 25
Summary 26
Chapter 2 Arrays 27
Arrays 27

Array Types in Delphi 28
Standard Arrays 28
Dynamic Arrays 32
New-style Dynamic Arrays 40
TList Class, an Array of Pointers 41
Overview of the TList Class 41
TtdObjectList Class 43
Arrays on Disk 49
Summary 62
Chapter 3 Linked Lists, Stacks, and Queues 63
Singly Linked Lists 63
Linked List Nodes 65
Creating a Singly Linked List 65
Inserting into and Deleting from a Singly Linked List 65
Traversing a Linked List 68
Efficiency Considerations 69
Using a Head Node 69
Using a Node Manager 70
The Singly Linked List Class 76
Doubly Linked Lists 84
Inserting and Deleting from a Doubly Linked List 85
Efficiency Considerations 88
Using Head and Tail Nodes 88
Using a Node Manager 88
The Doubly Linked List Class 88
Benefits and Drawbacks of Linked Lists 96
Stacks 97
Stacks Using Linked Lists 97
Stacks Using Arrays 100

Example of Using a Stack 103
Queues 105
Queues Using Linked Lists 106
Queues Using Arrays 109
Summary 113
Chapter 4 Searching 115
Compare Routines 115
Sequential Search 118
Arrays 118
Linked Lists 122
Binary Search 124
Arrays 124
Linked Lists 126
Inserting into Sorted Containers 129
Summary 131
Chapter 5 Sorting 133
Sorting Algorithms 133
Shuffling a TList 136
Sort Basics 138
Slowest Sorts 138
Bubble Sort 138
Shaker Sort 140
Selection Sort 142
Insertion Sort 144
Fast Sorts 147
Shell Sort 147
Comb Sort 150
Fastest Sorts 152

Merge Sort 152
Quicksort 161
Merge Sort with Linked Lists 176
Summary 181
Chapter 6 Randomized Algorithms 183
Random Number Generation 184
Chi-Squared Tests 185
Middle-Square Method 188
Linear Congruential Method 189
Testing 194
The Uniformity Test 195
The Gap Test 195
The Poker Test 197
The Coupon Collector’s Test 198
Results of Applying Tests 200
Combining Generators 201
Additive Generators 203
Shuffling Generators 205
Summary of Generator Algorithms 207
Other Random Number Distributions 208
Skip Lists 210
Searching through a Skip List 211
Insertion into a Skip List 215
Deletion from a Skip List 218
Full Skip List Class Implementation 219
Summary 225
Chapter 7 Hashing and Hash Tables 227
Hash Functions 228
Simple Hash Function for Strings 230
The PJW Hash Functions 230

Collision Resolution with Linear Probing 232
Advantages and Disadvantages of Linear Probing 233
Deleting Items from a Linear Probe Hash Table 235
The Linear Probe Hash Table Class 237
Other Open-Addressing Schemes 245
Quadratic Probing 246
Pseudorandom Probing 246
Double Hashing 247
Collision Resolution through Chaining 247
Advantages and Disadvantages of Chaining 248
The Chained Hash Table Class 249
Collision Resolution through Bucketing 259
Hash Tables on Disk 260
Extendible Hashing 261
Summary 276
Chapter 8 Binary Trees 277
Creating a Binary Tree 279
Insertion and Deletion with a Binary Tree 279
Navigating through a Binary Tree 281
Pre-order, In-order, and Post-order Traversals 282
Level-order Traversals 288
Class Implementation of a Binary Tree 289
Binary Search Trees 295
Insertion with a Binary Search Tree 298
Deletion from a Binary Search Tree 300
Class Implementation of a Binary Search Tree 303
Binary Search Tree Rearrangements 304
Splay Trees 308

Class Implementation of a Splay Tree 309
Red-Black Trees 312
Insertion into a Red-Black Tree 314
Deletion from a Red-Black Tree 319
Summary 329
Chapter 9 Priority Queues and Heapsort 331
The Priority Queue 331
First Simple Implementation 332
Second Simple Implementation 335
The Heap 337
Insertion into a Heap 338
Deletion from a Heap 338
Implementation of a Priority Queue with a Heap 340
Heapsort 345
Floyd’s Algorithm 345
Completing Heapsort 346
Extending the Priority Queue 348
Re-establishing the Heap Property 349
Finding an Arbitrary Item in the Heap 350
Implementation of the Extended Priority Queue 350
Summary 356
Chapter 10 State Machines and Regular Expressions 357
State Machines 357
Using State Machines: Parsing 357
Parsing Comma-Delimited Files 363
Deterministic and Non-deterministic State Machines 366
Regular Expressions 378
Using Regular Expressions 380

Parsing Regular Expressions 380
Compiling Regular Expressions 387
Matching Strings to Regular Expressions 399
Summary 407
Chapter 11 Data Compression 409
Representations of Data 409
Data Compression 410
Types of Compression 410
Bit Streams 411
Minimum Redundancy Compression 415
Shannon-Fano Encoding 416
Huffman Encoding 421
Splay Tree Encoding 435
Dictionary Compression 445
LZ77 Compression Description 445
Encoding Literals Versus Distance/Length Pairs 448
LZ77 Decompression 449
LZ77 Compression 456
Summary 467
Chapter 12 Advanced Topics 469
Readers-Writers Algorithm 469
Producers-Consumers Algorithm 478
Single Producer, Single Consumer Model 478
Single Producer, Multiple Consumer Model 486
Finding Differences between Two Files 496
Calculating the LCS of Two Strings 497
Calculating the LCS of Two Text Files 511
Summary 514
Epilogue 515
References 516

Index 518
Introduction
You’ve just picked this book up in the bookshop, or you’ve bought it, taken it
home and opened it, and now you’re wondering…
Why a Book on Delphi Algorithms?
Although there are numerous books on algorithms in the bookstores, few of
them go beyond the standard Computer Science 101 course to approach algorithms from a practical perspective. The code that is shown in the book is to
illustrate the algorithm in question, and generally no consideration is given to
real-life, drop-in-and-use application of the technique being discussed. Even
worse, from the viewpoint of the commercial programmer, many are text-
books to be used in a college or university course and hence some of the more
interesting topics are left as exercises for the reader, with little or no answers.
Of course, the vast majority of them don’t use Delphi, Kylix, or Pascal. Some
use pseudocode, some C, some C++, some the language du jour; and the
most celebrated and referenced algorithms book uses an assembly language
that doesn’t even exist (the MIX assembly language in The Art of Computer
Programming [11,12,13]—see the references section). Indeed, those books
that do have the word “practical” in their titles are for C, C++, or Java. Is
that such a problem? After all, an algorithm is an algorithm is an algorithm;
surely, it doesn’t matter how it’s demonstrated, right? Why bother buying and
reading one based on Delphi?
Delphi is, I contend, unique amongst the languages and environments used in
application development today. Firstly, like Visual Basic, Delphi is an environment for developing applications rapidly, for either 16-bit or 32-bit Windows,
or, using Kylix, for Linux. With dexterous use of the mouse, components rain
on forms like rice at a wedding. Many double-clicks later, together with a little typing of code, the components are wedded together, intricately and
intimately, with event handlers, hopefully producing a halfway decent-looking
application.
Secondly, like C++, Delphi can get close to the metal, easily accessing the
various operating system APIs. Sometimes, Borland produces units to access
APIs and sells them with Delphi itself; sometimes, programmers have to pore
over C header files in an effort to translate them into Delphi (witness the Jedi project). In either case, Delphi can do the job
and manipulate the OS subsystems to its own advantage.
Delphi programmers do tend to split themselves into two camps: applications
programmers and systems programmers. Sometimes you’ll find programmers
who can do both jobs. The link between the two camps that both sets of programmers must come into contact with and be aware of is the world of
algorithms. If you program for any length of time, you’ll come to the point
where you absolutely need to code a binary search. Of course, before you
reach that point, you’ll need a sort routine to get the data in some kind of
order for the binary search to work properly. Eventually, you might start using
a profiler, identify a problem bottleneck in TStringList, and wonder what
other data structure could do the job more efficiently.
Algorithms are the lifeblood of the work we do as programmers. Beginner
programmers are often afraid of formal algorithms; I mean, until you are
used to it, even the word itself can seem hard to spell! But consider this: a
program can be defined as an algorithm for getting information out of the
user and producing some kind of output for her.
The standard algorithms have been developed and refined by computer scien-
tists for use in the programming trenches by the likes of you and me.
Mastering the basic algorithms gives you a handle on your craft and on the
language you use. For example, if you know about hash tables, their strengths
and weaknesses, what they are used for and why, and have an implementa-
tion you could use at a moment’s notice, then you will look at the design of
the subsystem or application you’re currently working on in a new light, and
identify places where you could profitably use one. If sorts hold no terrors for
you, you understand how they work, and you know when to use a selection
sort versus a quicksort, then you’ll be more likely to code one in your application, rather than try and twist a standard Delphi component to your needs
(for example, a modern horror story: I remember hearing about someone
who used a hidden TListBox component, adding a bunch of strings, and then
setting the Sorted property to true to get them in order).
“OK,” I hear you say, “writing about algorithms is fine, but why bother with
Delphi or Kylix?”
By the way, let’s set a convention early on; otherwise I shall be writing the
phrase “Delphi or Kylix” an awful lot. When I say “Delphi,” I really mean
either Delphi or Kylix. Kylix was, after all, known for much of its pre-release
life as “Delphi” for Linux. In this book, then, “Delphi” means either Delphi for
Windows or Kylix for Linux.
So, why Delphi? Well, two reasons: the Object Pascal language and the operating system. Delphi’s language has several constructs that are not available
in other languages, constructs that make encapsulating efficient algorithms
and data structures easier and more natural. Things like properties, for example. Exceptions for when unforeseen errors occur. Although it is perfectly
possible to code standard algorithms in Delphi without using these Delphi-
specific language constructs, it is my contention that we miss out on the
beauty and efficiency of the language if we do. We miss out on the ability to
learn about the ins and outs of the language. In this book, we shall deliberately be using the breadth of the Object Pascal language in Delphi—I’m not
concerned that Java programmers who pick up this book may have difficulty
translating the code. The cover says Delphi, and Delphi it will be.
And the next thing to consider is that algorithms, as traditionally taught, are
generic, at least as far as CPUs and operating systems are concerned. They
can certainly be optimized for the Windows environment, or souped up for
Linux. They can be made more efficient for the various varieties of Pentium
processor we use, with the different types of memory caches we have, with
the virtual memory subsystem in the OS, and so on. This book pays particular
attention to these efficiency gains. We won’t, however, go as far as coding
everything in Assembly language, optimized for the pipelined architecture of
modern processors—I have to draw the line somewhere!
So, all in all, the Delphi community does have need for an algorithms book,
and one geared for their particular language, operating system, and proces-
sor. This is such a book. It was not translated from another book for another
language; it was written from scratch by an author who works with Delphi
every day of his life, someone who writes library software for a living and
knows about the intricacies of developing commercial ready-to-run routines,
classes, and tools.
What Should I Know?
This book does not attempt to teach you Delphi programming. You will need
to know the basics of programming in Delphi: creating new projects, how to
write code, compiling, debugging, and so on. I warn you now: there are no
components in this book. You must be familiar with classes, procedure and
method references, untyped pointers, the ubiquitous TList, and streams as
encapsulated by Delphi’s TStream family. You must have some understanding
of object-oriented concepts such as encapsulation, inheritance, polymorphism, and delegation. The object model in Delphi shouldn’t scare you!
Having said that, a lot of the concepts described in this book are simple in the
extreme. A beginner programmer should find much in the book to teach him
or her the basics of standard algorithms and data structures. Indeed, looking
at the code should teach such a programmer many tips and tricks of the
advanced programmer. The more advanced structures can be left for a rainy
day, or when you think you might need them.
So, essentially, you need to have been programming in Delphi for a while.
Every now and then you need some kind of data structure beyond what TList
and its family can give you, but you’re not sure what’s available, or even how
to use it if you found one. Or, you want a simple sort routine, but the only
reference book you can find has code written in C++, and to be honest you’d
rather watch paint dry than translate it. Or, you want to read an algorithms
book where performance and efficiency are just as prominent as the description of the algorithm. This book is for you.
Which Delphi Do I Need?
Are you ready for this? Any version. With the exception of the section discuss-
ing dynamic arrays using Delphi 4 or above and Kylix in Chapter 2, and parts
of Chapter 12, and little pieces here and there, the code will compile and run
with any version of Delphi. Apart from the small amount of the version-
specific code I have just mentioned, I have tested all code in this book with all
versions of Delphi and with Kylix.
You can therefore assume that all code printed in this book will work with
every version of Delphi. Some code listings are version-specific though, and
have been so noted.
What Will I Find, and Where?
This book is divided into 12 chapters and a reference section.

Chapter 1 lays out some ground rules. It starts off by discussing performance.
We’ll look at measurement of the efficiency of algorithms, starting out with
the big-Oh notation, continuing with timing of the actual run time of algorithms, and finishing with the use of profilers. We shall discuss data
representation efficiency in regard to modern processors and operating systems, especially memory caches, paging, and virtual memory. After that, the
chapter will talk about testing and debugging, topics that tend to be glossed
over in many books, but that are, in fact, essential to all programmers.
Chapter 2 covers arrays. We’ll look at the standard language support for
arrays, including dynamic arrays; we’ll discuss the TList class; and we’ll create a class that encapsulates an array of records. Another specialized array is
the string, so we’ll take a look at that too.
Chapter 3 introduces linked lists, both the singly and doubly linked varieties.
We’ll see how to create stacks and queues by implementing them with both
singly linked lists and arrays.
Chapter 4 talks about searching algorithms, especially the sequential and the
binary search algorithms. We’ll see how binary search helps us to insert items
into a sorted array or linked list.
Chapter 5 covers sorting algorithms. We will look at various types of sorting
methods: bubble, shaker, selection, insertion, Shell sort, quicksort, and merge
sort. We’ll also sort arrays and linked lists.
Chapter 6 discusses algorithms that create or require random numbers. We’ll
see pseudorandom number generators (PRNGs) and show a remarkable
sorted data structure called a skip list, which uses a PRNG in order to help
balance the structure.

Chapter 7 considers hashing and hash tables, why they’re used, and what
benefits and drawbacks they have. Several standard hashing algorithms are
introduced. One problem that occurs with hash tables is collisions; we shall
see how to resolve this by using a couple of types of probing and also by
chaining.
Chapter 8 presents binary trees, a very important data structure in wide gen-
eral use. We’ll look at how to build and maintain a binary tree and how to
traverse the nodes in the tree. We’ll also address the problem of unbalanced trees created
by inserting data in sorted order. A couple of balancing algorithms will be
shown: splay trees and red-black trees.
Chapter 9 deals with priority queues and, in doing so, shows us the heap
structure. We’ll consider the important heap operations, bubble up and trickle
down, and look at how the heap structure gives us a sort algorithm for free:
the heapsort.
Chapter 10 provides information about state machines and how they can be
used to solve a certain class of problems. After some introductory examples
with finite deterministic state machines, the chapter considers regular expressions, how to parse them and compile them to a finite non-deterministic state
machine, and then apply the state machine to accept or reject strings.
Chapter 11 squeezes in some data compression techniques. Algorithms such
as Shannon-Fano, Huffman, Splay, and LZ77 will be shown.
Chapter 12 includes a variety of advanced topics that may whet your appetite
for researching algorithms and structures. Of course, they will still be useful
for your programming requirements.
Finally, there is a reference section listing references to help you find out
more about the algorithms described in this book; these references not only
include other algorithms books but also academic papers and articles.

What Are the Typographical Conventions?
Normal text is written in this font, at this size. Normal text is used for discussions, descriptions, and diversions.
Code listings are written in this font, at this size.
Emphasized words or phrases, new words about to be defined, and variables
will appear in italic.
Dotted throughout the text are World Wide Web URLs and e-mail addresses
which are italicized and underlined.
Every now and then there will be a note like this. It’s designed to bring out
some important point in the narrative, a warning, or a caution.
What Are These Bizarre $IFDEFs in the Code?
The code for this book has been written, with certain noted exceptions, to
compile with Delphi 1, 2, 3, 4, 5, and 6, as well as with Kylix 1. (Later compilers will be supported as and when they come out; please see my Web site for the latest information.) Even with my best
efforts, there are sometimes going to be differences in my code between the
different versions of Delphi and Kylix.
The answer is, of course, to $IFDEF the code, to have certain blocks compile
with certain compilers but not others. Borland supplied us with the official
WINDOWS, WIN32, and LINUX compiler defines for the platform, and the
VERnnn compiler defines for the compiler version.
To solve this problem, every source file for this book has an include at the
top:
{$I TDDefine.inc}
This include file defines human-legible compiler defines for the various compilers. Here’s the list:
DelphiN define for a particular Delphi version, N = 1,2,3,4,5,6
DelphiNPlus define for a particular Delphi version or later, N = 1,2,3,4,5,6
KylixN define for a particular Kylix version, N = 1
KylixNPlus define for a particular Kylix version or later, N = 1
HasAssert define if compiler supports Assert
I also make the assumption that every compiler except Delphi 1 has support
for long strings.
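As an illustration of how these defines get used (this fragment is not from the book's own source; the routine name and messages are invented, and only TDDefine.inc and the defines listed above come from the text):

{$I TDDefine.inc}

uses
  SysUtils;

procedure tdValidateCount(aCount : integer);
begin
  {use Assert where the compiler supports it, otherwise fall back to an exception}
  {$IFDEF HasAssert}
  Assert(aCount >= 0, 'aCount must not be negative');
  {$ELSE}
  if aCount < 0 then
    raise Exception.Create('aCount must not be negative');
  {$ENDIF}
end;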
What about Bugs?
This book is a book of human endeavor, written, checked, and edited by
human beings. To quote Alexander Pope in An Essay on Criticism, “To err is
human, to forgive, divine.” This book will contain misstatements of facts,
grammatical errors, spelling mistakes, bugs, whatever, no matter how hard I
try going over it with Fowler’s Modern English Usage, a magnifying glass, and
a fine-toothed comb. For a technical book like this, which presents hard facts
permanently printed on paper, this could be unforgivable.
Hence, I shall be maintaining an errata list on my Web site, together with any
bug fixes to the code. Also on the site you’ll find other articles that go into
greater depth on certain topics than this book. You can always find the latest
errata and fixes there. If you do find an error, I would be grateful if you
would send me the details by e-mail. I can then fix it and update the Web site.
Acknowledgments
There are several people without whom this book would never have been
completed. I’d like to present them in what might be termed historical order,
the order of their influence on me.
The first two are a couple of gentlemen I’ve never met or spoken to, and yet
who managed to open my eyes to and kindle my enthusiasm for the world of
algorithms. If they hadn’t, who knows where I might be now and what I
might be doing. I’m speaking of Donald Knuth and Robert Sedgewick. In
fact, it was the latter’s Algorithms [20] that started me off, it being the first
algorithms book I ever bought, back when I was just getting into Turbo
Pascal. Donald Knuth needs no real introduction. His masterly The Art of Com-
puter Programming [11,12,13] remains at the top of the algorithms tree; I
first used it at Kings College, University of London while working toward my
B.Sc. Mathematics degree.
Fast forwarding a few years, Kim Kokkonen is the next person I would like to
thank. He gave me my job at TurboPower Software (turbopower.com) and gave me the opportunity to learn more computer science than
I’d ever dreamt of before. A big thank you, of course, to all TurboPower’s
employees and those TurboPower customers I’ve gotten to know over the
years. I’d also like to thank Robert DelRossi, our president, for encouraging
me in this endeavor.
Next is a small company, now defunct, called Natural Systems. In 1993, they
produced a product called Data Structures for Turbo Pascal. I bought it, and,
in my opinion, it wasn’t very good. Oh, it worked fine, but I just didn’t agree
with its design or implementation and it just wasn’t fast enough. It drove me
to write my freeware EZSTRUCS library for Borland Pascal 7, from which I
derived EZDSL, my well-known freeware data structures library for Delphi.
This effort was the first time I’d really gotten to understand data structures,
since sometimes it is only through doing that you get to learn.
Thanks also to Chris Frizelle, the editor and owner of The Delphi Magazine. He had the foresight to allow me to
pontificate on various algorithms in his inestimable magazine, finally
succumbing to giving me my own monthly column: Algorithms Alfresco. Without him and his support, this book might have been written, but it certainly
wouldn’t have been as good. I certainly recommend a subscription to The
Delphi Magazine, as it remains, in my view, the most in-depth, intelligent reference for Delphi programmers. Thanks to all my readers, as well, for their
suggestions and comments on the column.
Next to last, thanks to all the people at Wordware (wordware.com), including my editors, publisher Jim Hill, and developmental editor Wes Beckwith. Jim was a bit dubious at first when I proposed publishing a
book on algorithms, but he soon came round to my way of thinking and has
been very supportive during its gestation. I’d also like to give my warmest
thanks to my tech editors: Steve Teixeira, the co-author of the tome on how
to get the best out of Delphi, Delphi n Developer’s Guide (where, at the time of
writing, n = 5), and my friend Anton Parris.
Finally, my thanks and my love go to my wife, Donna (she chivvied me to
write this book in the first place). Without her love, enthusiasm, and encour-
agement, I’d have given up ages ago. Thank you, sweetheart. Here’s to the
next one!
Julian M. Bucknall
Colorado Springs, April 1999 to February 2001
Chapter 1
What is an Algorithm?
For a book on algorithms, we have to make sure that we know what we are
going to be discussing. As we’ll see, one of the main reasons for understanding and researching algorithms is to make our applications faster. Oh, I’ll
agree that sometimes we need algorithms that are more space efficient rather
than speed efficient, but in general, it’s performance we crave.
Although this book is about algorithms and data structures and how to imple-
ment them in code, we should also discuss some of the procedural algorithms
as well: how to write our code to help us debug it when it goes wrong, how
to test our code, and how to make sure that changes in one place don’t break
something elsewhere.
What is an Algorithm?
As it happens, we use algorithms all the time in our programming careers, but
we just don’t tend to think of them as algorithms: “They’re not algorithms, it’s
just the way things are done.”
An algorithm is a step-by-step recipe for performing some calculation or process. This is a pretty loose definition, but once you understand that
algorithms are nothing to be afraid of per se, you’ll recognize and use them
without further thought.
Go back to your elementary school days, when you were learning addition.
The teacher would write on the board a sum like this:
  45
  17 +
and then ask you to add them up. You had been taught how to do this: start
with the units column and add the 5 and the 7 to make 12, put the 2 under
the units column, and then carry 1 above the 4.
  1
  45
  17 +
   2
You’d then add the carried 1, the 4 and the other 1 to make 6, which you’d
then write underneath the tens column. And, you’d have arrived at the concentrated answer: 62.
Notice that what you had been taught was an algorithm to perform this and
any similar addition. You were not taught how to add 45 and 17 specifically
but were instead taught a general way of adding two numbers. Indeed, pretty
soon, you could add many numbers, with lots of digits, by applying the same
algorithm. Of course, in those days, you weren’t told that this was an algorithm; it was just how you added up numbers.
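Purely as an illustration (this is not code from the book), the same column-by-column process can be written out in Delphi; the function name is invented for the example:

{ Add two non-negative integers a column at a time, units first,
  carrying into the next column, just as in the schoolbook method. }
function AddByColumns(A, B : integer) : integer;
var
  Carry, ColumnSum, ColumnValue : integer;
begin
  Result := 0;
  Carry := 0;
  ColumnValue := 1;
  while (A <> 0) or (B <> 0) or (Carry <> 0) do begin
    ColumnSum := (A mod 10) + (B mod 10) + Carry;
    Result := Result + (ColumnSum mod 10) * ColumnValue;
    Carry := ColumnSum div 10;
    A := A div 10;
    B := B div 10;
    ColumnValue := ColumnValue * 10;
  end;
end;

Calling AddByColumns(45, 17) walks through exactly the steps above: 5 plus 7 gives 12, so 2 is written down and 1 carried; then the carried 1 plus 4 plus 1 gives 6, for a result of 62.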
In the programming world we tend to think of algorithms as being complex
methods to perform some calculation. For example, if we have an array of
customer records and we want to find a particular one (say, John Smith), we
might read through the entire array, element by element, until we either
found the John Smith one or reached the end of the array. This seems an
obvious way of doing it and we don’t think of it being an algorithm, but it
is—it’s known as a sequential search.
There might be other ways of finding “John Smith” in our hypothetical array.
For example, if the array were sorted by last name, we could use the binary
search algorithm to find John Smith. We look at the middle element in the
array. Is it John Smith? If so, we’re done. If it is less than John Smith (by “less
than,” I mean earlier in alphabetic sequence), then we can assume that John
Smith is in the first half of the array. If greater than, it’s in the latter half of
the array. We can then do the same thing again, that is, look at the middle
item and select the portion of the array that should have John Smith, slicing
and dicing the array into smaller and smaller parts, until we either find it or
the bit of the array we have left is empty.
Well, that algorithm certainly seems much more complicated than our original sequential search. The sequential search could be done with a nice simple
For loop with a call to Break at the right moment; the code for the binary
search would need a lot more calculations and local variables. So it might
seem that sequential search is faster, just because it’s simpler to code.
Enter the world of algorithm analysis where we do experiments and try and
formulate laws about how different algorithms actually work.
Analysis of Algorithms
Let’s look at the two possible searches for “John Smith” in an array: the
sequential search and the binary search. We’ll implement both algorithms and
then play with them in order to ascertain their performance attributes. Listing
1.1 is the simple sequential search.
Listing 1.1: Sequential search for a name in an array

function SeqSearch(aStrs : PStringArray; aCount : integer;
                   const aName : string5) : integer;
var
  i : integer;
begin
  for i := 0 to pred(aCount) do
    if CompareText(aStrs^[i], aName) = 0 then begin
      Result := i;
      Exit;
    end;
  Result := -1;
end;
Listing 1.2 shows the more complex binary search. (At the present time we
won’t go into what is happening in this routine—we discuss the binary search
algorithm in detail in Chapter 4.)
Listing 1.2: Binary search for a name in an array

function BinarySearch(aStrs : PStringArray; aCount : integer;
                      const aName : string5) : integer;
var
  L, R, M : integer;
  CompareResult : integer;
begin
  L := 0;
  R := pred(aCount);
  while (L <= R) do begin
    M := (L + R) div 2;
    CompareResult := CompareText(aStrs^[M], aName);
    if (CompareResult = 0) then begin
      Result := M;
      Exit;
    end
    else if (CompareResult < 0) then
      L := M + 1
    else
      R := M - 1;
  end;
  Result := -1;
end;
Just by looking at both routines it’s very hard to make a judgment about
performance. In fact, this is a philosophy that we should embrace whole-
heartedly: it can be very hard to tell how speed efficient some code is just by
looking at it. The only way we can truly find out how fast code is, is to run it.
Nothing else will do. Whenever we have a choice between algorithms, as we
do here, we should test and time the code under different environments, with
different inputs, in order to ascertain which algorithm is better for our needs.
The traditional way to do this timing is with a profiler. The profiler program
loads up our test application and then accurately times the various routines
we’re interested in. My advice is to use a profiler as a matter of course in all
your programming projects. It is only with a profiler that you can truly determine where your application spends most of its time, and hence which
routines are worth your spending time on optimization tasks.
The company I work for, TurboPower Software Company, has a professional
profiler in its Sleuth QA Suite product. I’ve tested all of the code in this book
under both StopWatch (the name of the profiling program in Sleuth QA
Suite) and under CodeWatch (the resource and memory leak debugger in the
suite). However, even if you do not have a profiler, you can still experiment
and time routines; it’s just a little more awkward, since you have to embed
calls to time routines in your code. Any profiler worth buying does not alter
your code; it does its magic by modifying the executable in memory at run
time.
For this experiment with searching algorithms, I wrote the test program to do
its own timing. Essentially, the code grabs the system time at the start of the
code being timed and gets it again at the end. From these two values it can
calculate the time taken to perform the task. Actually, with modern faster
machines and the low resolution of the PC clock, it’s usually beneficial to time
several hundred calls to the routine, from which we can work out an average.
(By the way, this program was written for 32-bit Delphi and will not compile
with Delphi 1 since it allocates arrays on the heap that are greater than
Delphi 1’s 64 KB limit.)
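The test program itself is not listed here, but the approach just described can be sketched as follows. This is an illustrative fragment rather than the book's actual timing code, and it assumes the 32-bit GetTickCount function from the Windows unit:

uses
  Windows;

type
  TTimedProc = procedure;

{ Return the average number of milliseconds per call by timing many
  calls in a row; averaging compensates for the clock's low resolution. }
function AverageMSecsPerCall(aProc : TTimedProc; aIterations : integer) : double;
var
  StartTime, EndTime : DWORD;
  i : integer;
begin
  StartTime := GetTickCount;   {grab the system time at the start}
  for i := 1 to aIterations do
    aProc;                     {the routine being timed}
  EndTime := GetTickCount;     {grab the system time at the end}
  Result := (EndTime - StartTime) / aIterations;
end;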
I ran the performance experiments in several different forms. First, I timed
how long it took to find “Smith” in arrays containing 100, 1,000, 10,000, and
100,000 elements, using both algorithms and making sure that a “Smith” element was present. For the next series of tests, I timed how long it took to find
“Smith” in the same set of arrays with both algorithms, but this time I
ensured that “Smith” was not present. Table 1.1 shows the results of my tests.
Table 1.1: Timing sequential and binary searches

                  Fail    Success
Sequential
   100            0.14      0.10
   1,000          1.44      1.05
   10,000        15.28     10.84
   100,000      149.42    106.35
Binary
   100            0.01      0.01
   1,000          0.01      0.01
   10,000         0.02      0.02
   100,000        0.03      0.02
As you can see, the timings make for some very interesting reading. The time
taken to perform a sequential search is proportional to the number of ele-
ments in the array. We say that the execution characteristics of sequential
search are linear.
However, the binary search statistics are somewhat more difficult to charac-
terize. Indeed, it even seems as if we’re falling into a timing resolution
problem because the algorithm is so fast. The relationship between the time
taken and the number of elements in the array is no longer a simple linear
one. It seems to be something much less than this, and something that is not
brought out by these tests.
I reran the tests and scaled the binary timings by a factor of 100.
Table 1.2: Retiming binary searches

                  Fail    Success
   100            0.89      0.57
   1,000          1.47      1.46
   10,000         2.06      2.06
   100,000        2.50      2.41
Here we get a much more impressive set of data. You can see that increasing
the number of elements tenfold results in a run time that's increased
by a constant amount (roughly half a unit). This is a logarithmic relationship:
the time taken to do a binary search is proportional to the logarithm of the
number of elements in the array.
(This can be a little hard to see for a non-mathematician. Recall from your
school days that one way to multiply two numbers is to calculate their logarithms, add them, and then calculate the anti-logarithm to give the answer.
Since we are multiplying by a factor of 10 in these profiling tests, it would be
equivalent to adding a constant when viewed logarithmically. Exactly the case
we see in the test results: we’re adding half a unit every time.)
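To make that relationship concrete (using base-10 logarithms purely for illustration): log(1,000) = 3, log(10,000) = 4, and log(100,000) = 5. Each tenfold increase in n adds exactly 1 to log(n), so a run time proportional to log(n) grows by a fixed amount per step; with a constant of proportionality of roughly half a unit, that is just the pattern Table 1.2 shows.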
So, what have we learned as a result of this experiment? As a first lesson, we
have learned that the only way to understand the performance characteristics
of an algorithm is to actually time it.
In general, the only way to see the efficiency of a piece of code is to time it.
That applies to everything you write, whether you’re using a well-known
algorithm or you’ve devised one to suit the current situation. Don’t guess,
measure.
As a lesser lesson, we have also seen that sequential search is linear in nature,
whereas binary search is logarithmic. If we were mathematically inclined, we
could then take these statistical results and prove them as theorems. In this
book, however, I do not want to overburden the text with a lot of mathemat-
ics; there are plenty of college textbooks that could do it much better than I.
The Big-Oh Notation
We need a compact notation to express the performance characteristics we
measure, rather than having to say things like “the performance of algorithm
X is proportional to the number of items cubed,” or something equally verbose. Computer science already has such a scheme; it’s called the big-Oh
notation.

For this notation, we work out the mathematical function of n, the number of
items, to which the algorithm’s performance is proportional, and say that the
algorithm is a O(f(n)) algorithm, where f(n) is some function of n. We read
this as “big-Oh of f(n)”, or, less rigorously, as “proportional to f(n).”
For example, our experiments showed us that sequential search is a O(n)
algorithm. Binary search, on the other hand, is a O(log(n)) algorithm. Since
log(n) < n for all positive n, we could say that binary search is always faster
than sequential search; however, in a moment, I will give you a couple of
warnings about taking conclusions from the big-Oh notation too far.
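To get a feel for what these orders of growth mean in practice (the figures are chosen purely for illustration), consider an array of about 1,000,000 elements: a O(n) sequential search may have to examine all million elements in the worst case, whereas a O(log(n)) binary search needs only about 20 comparisons, since 2 raised to the power 20 is roughly 1,000,000.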