Tải bản đầy đủ (.pdf) (92 trang)

Data Structures and Algorithms in Java 4th phần 5 ppsx

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (1.72 MB, 92 trang )

methods of the array list ADT on such a representation. Is it better to store the
entries in L by increasing indices or not?
C-6.17
There is a simple, but inefficient, algorithm, called bubble-sort, for sorting a
sequence S of n comparable elements. This algorithm scans the sequence n−1
times, where, in each scan, the algorithm compares the current element with the
next one and swaps them if they are out of order. Give a pseudo-code
description of bubble-sort that is as efficient as possible assuming S is
implemented with a doubly linked list. What is the running time of this
algorithm?
C-6.18
Answer Exercise C-6.17 assuming S is implemented with an array list.
C-6.19
A useful operation in databases is the natural join. If we view a database as a
list of ordered pairs of objects, then the natural join of databases A and B is the
list of all ordered triples (x,y,z) such that the pair (x,y) is in A and the pair (y,z) is
in B. Describe and analyze an efficient algorithm for computing the natural join
of a list A of n pairs and a list B of m pairs.
C-6.20
When Bob wants to send Alice a message M on the Internet, he breaks M into n
data packets, numbers the packets consecutively, and injects them into the
network. When the packets arrive at Alice's computer, they may be out of order,
so Alice must assemble the sequence of n packets in order before she can be
sure she has the entire message. Describe an efficient scheme for Alice to do
this. What is the running time of this algorithm?
C-6.21
Given a list L of n positive integers, each represented with k = logn + 1 bits,
describe an O(n)-time method for finding a k-bit integer not in L.
C-6.22
Argue why any solution to the previous problem must run in Ω(n) time.
C-6.23


Given a list L of n arbitrary integers, design an O(n)-time method for finding an
integer that cannot be formed as the sum of two integers in L.

370
C-6.24
Isabel has an interesting way of summing up the values in an array A of n
integers, where n is a power of two. She creates an array B of half the size of A
and sets B[i] = A[2i] +A[2i+ 1], for i = 0,1, , (n/2) − 1. If B has size 1, then she
outputs B[0]. Otherwise, she replaces A with B, and repeats the process. What is
the running time of her algorithm?
Projects
P-6.1
Implement the array list ADT by means of an extendable array used in a circular
fashion, so that insertions and deletions at the beginning and end of the array list
run in constant time.
P-6.2
Implement the array list ADT using a doubly linked list. Show experimentally
that this implementation is worse than the array-based approach.
P-6.3
Write a simple text editor, which stores and displays a string of characters using
the list ADT, together with a cursor object that highlights a position in this
string. Your editor should support the following operations:

left: Move cursor left one character (do nothing if at text end).

right: Move cursor right one character (do nothing if at text end).

cut: Delete the character right of the cursor (do nothing at text end).

paste c: Insert the character c just after the cursor.

P-6.4
Implement a phased favorites list. A phase consists of N accesses in the list, for
a given parameter N. During a phase, the list should maintain itself so that
elements are ordered by decreasing access counts during that phase. At the end

371
of a phase, it should clear all the access counts and start the next phase.
Experimentally, determine what are the best values of N for various list sizes.
P-6.5
Write a complete adapter class that implements the sequence ADT using a
java.util.ArrayList object.
P-6.6
Implement the favorites list application using an array list instead of a list.
Compare it experimentally to the list-based implementation.
Chapter Notes
The concept of viewing data structures as collections (and other principles of object-
oriented design) can be found in object-oriented design books by Booch [14], Budd
[17], Golberg and Robson [40], and Liskov and Guttag [69]. Lists and iterators are
pervasive concepts in the Java Collections Framework. Our node list ADT is derived
from the "position" abstraction introduced by Aho, Hopcroft, and Ullman [5], and the
list ADT of Wood [100]. Implementations of lists via arrays and linked lists are
discussed by Knuth [62].















372




Chapter 7 Trees

Contents
7.1
General Trees
266
7.1.1
Tree Definitions and Properties
267
7.1.2
The Tree Abstract Data Type

373
270
7.1.3
Implementing a Tree
271
7.2
Tree Traversal Algorithms

273
7.2.1
Depth and Height
273
7.2.2
Preorder Traversal
276
7.2.3
Postorder Traversal
279
7.3
Binary Trees
282
7.3.1
The Binary Tree ADT
284
7.3.2
A Binary Tree Interface in Java
284

374
7.3.3
Properties of Binary Trees
285
7.3.4
A Linked Structure for Binary Trees
287
7.3.5
An Array-List Representation of a Binary Tree
296

7.3.6
Traversals of Binary Trees
298
7.3.7
The Template Method Pattern
305
7.4
Exercises
309
java.datastructures.net
7.1 General Trees
Productivity experts say that breakthroughs come by thinking "nonlinearly." In this
chapter, we discuss one of the most important nonlinear data structures in
computing—trees. Tree structures are indeed a breakthrough in data organization, for
they allow us to implement a host of algorithms much faster than when using linear
data structures, such as list. Trees also provide a natural organization for data, and
consequently have become ubiquitous structures in file systems, graphical user
interfaces, databases, Web sites, and other computer systems.

375
It is not always clear what productivity experts mean by "nonlinear" thinking, but
when we say that trees are "nonlinear," we are referring to an organizational
relationship that is richer than the simple "before" and "after" relationships between
objects in sequences. The relationships in a tree are hierarchical, with some objects
being "above" and some "below" others. Actually, the main terminology for tree data
structures comes from family trees, with the terms "parent," "child," "ancestor," and
"descendent" being the most common words used to describe relationships. We show
an example of a family tree in Figure 7.1.
Figure 7.1: A family tree showing some
descendents of Abraham, as recorded in Genesis,

chapters 25–36.

7.1.1 Tree Definitions and Properties
A tree is an abstract data type that stores elements hierarchically. With the
exception of the top element, each element in a tree has a parent element and zero
or more children elements. A tree is usually visualized by placing elements inside
ovals or rectangles, and by drawing the connections between parents and children
with straight lines. (See Figure 7.2.) We typically call the top element the root of
the tree, but it is drawn as the highest element, with the other elements being
connected below (just the opposite of a botanical tree).
Figure 7.2: A tree with 17 nodes representing the
organization of a fictitious corporation. The root stores

376
Electronics R'Us. The children of the root store R&D,
Sales, Purchasing, and Manufacturing. The internal
nodes store Sales, International, Overseas, Electronics
R'Us, and Manufacturing.

Formal Tree Definition
Formally, we define a tree T as a set of nodes storing elements such that the nodes
have a parent-child relationship, that satisfies the following properties:
• If T is nonempty, it has a special node, called the root of T, that has no
parent.
• Each node v of T different from the root has a unique parent node w;
every node with parent w is a child of w.
Note that according to our definition, a tree can be empty, meaning that it doesn't
have any nodes. This convention also allows us to define a tree recursively, such
that a tree T is either empty or consists of a node r, called the root of T, and a
(possibly empty) set of trees whose roots are the children of r.

Other Node Relationships

377
Two nodes that are children of the same parent are siblings. A node v is external
if v has no children. A node v is internal if it has one or more children. External
nodes are also known as leaves.
Example 7.1: In most operating systems, files are organized hierarchically
into nested directories (also called folders), which are presented to the user in the
form of a tree. (See Figure 7.3.) More specifically, the internal nodes of the tree
are associated with directories and the external nodes are associated with regular
files. In the UNIX and Linux operating systems, the root of the tree is
appropriately called the "root directory," and is represented by the symbol "/."
Figure 7.3: Tree representing a portion of a file
system.

A node u is an ancestor of a node v if u = v or u is an ancestor of the parent of v.
Conversely, we say that a node v is a descendent of a node u if u is an ancestor of
v. For example, in Figure 7.3
, cs252/ is an ancestor of papers/, and pr3 is a
descendent of cs016/. The subtree of T rooted at a node v is the tree consisting
of all the descendents of v in T (including v itself). In Figure 7.3
, the subtree
rooted at cs016/ consists of the nodes cs016/, grades, homeworks/,
programs/, hw1, hw2, hw3, pr1, pr2, and pr3.

378
Edges and Paths in Trees
An edge of tree T is a pair of nodes (u, v) such that u is the parent of v, or vice
versa. A path of T is a sequence of nodes such that any two consecutive nodes in
the sequence form an edge. For example, the tree in Figure 7.3

contains the path
(cs252/, projects/, demos/, market).
Example 7.2: The inheritance relation between classes in a Java program
forms a tree. The root, java. lang. Object, is an ancestor of all other
classes. Each class, C, is a descendent of this root and is the root of a subtree of
the classes that extend C. Thus, there is a path from C to the root,
java.lang.Object, in this inheritance tree.
Ordered Trees
A tree is ordered if there is a linear ordering defined for the children of each node;
that is, we can identify the children of a node as being the first, second, third, and
so on. Such an ordering is usually visualized by arranging siblings left to right,
according to their ordering. Ordered trees typically indicate the linear order
among siblings by listing them in the correct order.
Example 7.3: The components of a structured document, such as a book, are
hierarchically organized as a tree whose internal nodes are parts, chapters, and
sections, and whose external nodes are paragraphs, tables, figures, and so on.
(See Figure 7.4.) The root of the tree corresponds to the book itself. We could, in
fact, consider expanding the tree further to show paragraphs consisting of
sentences, sentences consisting of words, and words consisting of characters.
Such a tree is an example of an ordered tree, because there is a well-defined
ordering among the children of each node.
Figure 7.4: An ordered tree associated with a book.


379
7.1.2 The Tree Abstract Data Type
The tree ADT stores elements at positions, which, as with positions in a list, are
defined relative to neighboring positions. The positions in a tree are its nodes, and
neighboring positions satisfy the parent-child relationships that define a valid tree.
Therefore, we use the terms "position" and "node" interchangeably for trees. As

with a list position, a position object for a tree supports the method:
element(): return the object stored at this position.
The real power of node positions in a tree, however, comes from the accessor
methods of the tree ADT that return and accept positions, such as the following:
root():
return the tree's root; an error occurs if the tree is empty.
parent (v):
return the parent of v; an error occurs if v is the root.
children(v):
return an iterable collection containing the children of node v.
If a tree T is ordered, then the iterable collection, children(v), stores the children of
v in order. If v is an external node, then children(v) is empty.
In addition to the above fundamental accessor methods, we also include the
following query methods:
isInternal(v):
Test whether node v is internal.
isExternal(v):
Test whether node v is external.
isRoot(v):
Test whether node v is the root.
These methods make programming with trees easier and more readable, since we
can use them in the conditionals of if statements and while loops, rather than
using a nonintuitive conditional.

380
There are also a number of generic methods a tree should probably support that are
not necessarily related to its tree structure, including the following:
size():
return the number of nodes in the tree.
isEmpty():

Test whether the tree has any nodes or not.
iterator():
return an iterator of all the elements stored at nodes of the tree.
positions():
return an iterable collection of all the nodes of the tree.
replace(v,e):
Replace with e and return the element stored at node v.
Any method that takes a position as an argument should generate an error condition
if that position is invalid. We do not define any specialized update methods for trees
here. Instead, we prefer to describe different tree update methods in conjunction
with specific applications of trees in subsequent chapters. In fact, we can imagine
several kinds of tree update operations beyond those given in this book.
7.1.3 Implementing a Tree
The Java interface shown in Code Fragment 7.1 represents the tree ADT. Error
conditions are handled as follows: Each method that can take a position as an
argument, may throw an InvalidPositionException, to indicate that the
position is invalid. Method parent throws a BoundaryViolationException
if it is called on the root. Method root throws an EmptyTreeException if it is
called on an empty tree.
Code Fragment 7.1: Java interface Tree
representing the tree ADT. Additional update methods
may be added, depending on the application. We do
not include such methods in the interface, however.

381

A Linked Structure for General Trees
A natural way to realize a tree T is to use a linked structure, where we represent
each node v of T by a position object (see Figure 7.5a
) with the following fields:

A reference to the element stored at v, a link to the parent of v, and a some kind of
collection (for example, a list or array) to store links to the children of v. If v is the
root of T, then the parent field of v is null. Also, we store a reference to the root of
T and the number of nodes of T in internal variables. This structure is
schematically illustrated in Figure 7.5b
.

382
Figure 7.5: The linked structure for a general tree:
(a) the position object associated with a node; (b) the
portion of the data structure associated with a node
and its children.

Table 7.1 summarizes the performance of the implementation of a general tree
using a linked structure. The analysis is left as an exercise (C-7.25), but we note
that, by using a collection to store the children of each node v, we can implement
children(v) simply by returning a reference to this collection.
Table 7.1: Running times of the methods of an n-
node general tree implemented with a linked
structure. We let c
v
denote the number of children of a
node v. The space usage is O(n).
Operation
Time
size, isEmpty
O(1)
iterator, positions
O(n)


383
replace
O(1)
root, parent
O(1)
children(v)
O(c
v
)
isInternal, isExternal, isRoot
O(1)
7.2 Tree Traversal Algorithms
In this section, we present algorithms for performing traversal computations on a tree
by accessing it through the tree ADT methods.
7.2.1 Depth and Height
Let v be a node of a tree T. The depth of v is the number of ancestors of v,
excluding v itself. For example, in the tree of Figure 7.2, the node storing
International has depth 2. Note that this definition implies that the depth of the root
of T is 0.
The depth of a node v can also be recursively defined as follows:
• If v is the root, then the depth of v is 0
• Otherwise, the depth of v is one plus the depth of the parent of v.
Based on this definition, we present a simple, recursive algorithm, depth, in Code
Fragment 7.2, for computing the depth of a node v in T. This method calls itself
recursively on the parent of v, and adds 1 to the value returned. A simple Java
implementation of this algorithm is shown in Code Fragment 7.3
.
Code Fragment 7.2: Algorithm for computing the
depth of a node v in a tree T.


384

Code Fragment 7.3: Method depth written in Java.

The running time of algorithm depth(T, v) is O(d
v
), where d
v
denotes the depth of
the node v in the tree T, because the algorithm performs a constant-time recursive
step for each ancestor of v. Thus, algorithm depth (T, v) runs in O(n) worst-case
time, where n is the total number of nodes of T, since a node of T may have depth n
− 1 in the worst case. Although such a running time is a function of the input size, it
is more accurate to characterize the running time in terms of the parameter d
v
, since
this parameter can be much smaller than n.
Height
The height of a node v in a tree T is also defined recursively:
• If v is an external node, then the height of v is 0
• Otherwise, the height of v is one plus the maximum height of a child of v.
The height of a nonempty tree T is the height of the root of T. For example, the
tree of Figure 7.2 has height 4. In addition, height can also be viewed as follows.
Proposition 7.4: The height of a nonempty tree T is equal to the maximum
depth of an external node of T.
We leave the justification of this fact to an exercise (R-7.6). We present here an
algorithm, height1, shown in Code Fragment 7.4 and implemented in Java in
Code Fragment 7.5
, for computing the height of a nonempty tree T based on the
proposition above and the algorithm depth from Code Fragment 7.2

.

385
Code Fragment 7.4: Algorithm height1 for
computing the height of a nonempty tree T. Note that
this algorithm calls algorithm depth (Code Fragment
7.2).

Code Fragment 7.5: Method height1 written in
Java. Note the use of the max method of class
java.lang. Math.

Unfortunately, algorithm height1 is not very efficient. Since height1 calls
algorithm depth (v) on each external node v of T, the running time of height1 is
given by O(n + Σ
v
(1 + d
v
)), where n is the number of nodes of T, d
v
is the depth
of node v, and E is the set of external nodes of T. In the worst case, the sumΣ
v
(1
d
+
v
) is proportional to n
2
. (See Exercise C-7.6.) Thus, algorithm height1 runs in

O(n
2
) time.
Algorithm height2, shown in Code Fragment 7.6 and implemented in Java in
Code Fragment 7.7, computes the height of tree T in a more efficient manner by
using the recursive definition of height.

386
Code Fragment 7.6: Algorithm height2 for
computing the height of the subtree of tree T rooted
at a node v.

Code Fragment 7.7: Method height2 written in
Java.

Algorithm height2 is more efficient than height1 (from Code Fragment 7.4
).
The algorithm is recursive, and, if it is initially called on the root of T, it will
eventually be called on each node of T. Thus, we can determine the running time
of this method by summing, over all the nodes, the amount of time spent at each
node (on the nonrecursive part). Processing each node in children(v) takes O(c
v
)
time, where c
v
denotes the number of children of node v. Also, the while loop
has c
v
iterations and each iteration of the loop takes O(1) time plus the time for
the recursive call on a child of v. Thus, algorithm height2 spends O(1 + c

v
)
time at each node v, and its running time is O(Σ
v
(1 + c
v
)). In order to complete the
analysis, we make use of the following property.
Proposition 7.5: Let T be a tree with n nodes, and let c
v
denote the
number of children of a node v of T. Then, summing over the vertices in T, Σ
v
c
v
=
n − 1.

387
Justification: Each node of T, with the exception of the root, is a child of
another node, and thus contributes one unit to the above sum.

By Proposition 7.5
, the running time of algorithm height2, when called on the
root of T, is O(n), where n is the number of nodes of T.
7.2.2 Preorder Traversal
A traversal of a tree T is a systematic way of accessing, or "visiting," all the nodes
of T. In this section, we present a basic traversal scheme for trees, called preorder
traversal. In the next section, we will study another basic traversal scheme, called
postorder traversal.

In a preorder traversal of a tree T, the root of T is visited first and then the subtrees
rooted at its children are traversed recursively. If the tree is ordered, then the
subtrees are traversed according to the order of the children. The specific action
associated with the "visit" of a node v depends on the application of this traversal,
and could involve anything from incrementing a counter to performing some
complex computation for v. The pseudo-code for the preorder traversal of the
subtree rooted at a node v is shown in Code Fragment 7.8. We initially call this
algorithm with preorder(T,T.root()).
Code Fragment 7.8: Algorithm preorder for
performing the preorder traversal of the subtree of a
tree T rooted at a node v.

The preorder traversal algorithm is useful for producing a linear ordering of the
nodes of a tree where parents must always come before their children in the
ordering. Such orderings have several different applications. We explore a simple
instance of such an application in the next example.
Figure 7.6: Preorder traversal of an ordered tree,
where the children of each node are ordered from left
to right.

388

Example 7.6: The preorder traversal of the tree associated with a document, as
in Example 7.3, examines an entire document sequentially, from beginning to end.
If the external nodes are removed before the traversal, then the traversal examines
the table of contents of the document. (See Figure 7.6
.)
The preorder traversal is also an efficient way to access all the nodes of a tree. To
justify this, let us consider the running time of the preorder traversal of a tree T with
n nodes under the assumption that visiting a node takes O(1) time. The analysis of

the preorder traversal algorithm is actually similar to that of algorithm height2
(Code Fragment 7.7
), given in Section 7.2.1. At each node v, the nonrecursive part
of the preorder traversal algorithm requires time O(1 + c
v
), where c
v
is the number
of children of v. Thus, by Proposition 7.5, the overall running time of the preorder
traversal of T is O(n).
Algorithm toStringPreorder(T, v), implemented in Java in Code
Fragment 7.9, performs a preorder printing of the subtree of a node v of T, that is, it
performs the preorder traversal of the subtree rooted at v and prints the element
stored at a node when the node is visited. Recall that, for an ordered tree T, method
T.children(v) returns an iterable collection that accesses the children of v in
order.
Code Fragment 7.9: Method toStringPreorder(T, v)
that performs a preorder printing of the elements in the
subtree of node v of T.


389
There is an interesting application of the preorder traversal algorithm that produces
a string representation of an entire tree. Let us assume again that for each element e
stored in tree T, calling e.toString() returns a string associated with e. The
parenthetic string representation P(T) of tree T is recursively defined as follows. If
T consists of a single node v, then
P(T) = v.element().toString().
Otherwise,
P(T) = v.element().toString() + " (" + P(T

1
) + "," + ··· + ", "+ P(T
k
) +")",
where v is the root of T and T
1
, T
2
, , T
k
are the subtrees rooted at the children of v,
which are given in order if T is an ordered tree.
Note that the above definition of P(T) is recursive. Also, we are using "+" here to
denote string concatenation. The parenthetic representation of the tree of Figure 7.2

is shown in Figure 7.7.
Figure 7.7: Parenthetic representation of the tree of
Figure 7.2
. Indentation, line breaks and spaces have
been added for clarity.

Note that, technically speaking, there are some computations that occur between
and after the recursive calls at a node's children in the above algorithm. We still
consider this algorithm to be a preorder traversal, however, since the primary action
of printing a node's contents occurs prior to the recursive calls.
The Java method parentheticRepresentation, shown in Code Fragment
7.10, is a variation of method toStringPreorder (Code Fragment 7.9). It
implements the definition given above to output a parenthetic string representation
of a tree T. As with the method toStringPreorder, the method
parentheticRepresentation makes use of the toString method that

is defined for every Java object. In fact, we can view this method as a kind of
toString() method for tree objects.

390
Code Fragment 7.10: Algorithm
parentheticRepresentation. Note the use of the +
operator to concatenate two strings.

We explore a modification to Code Fragment 7.10 in Exercise R-7.9, to display a
tree in a fashion more closely matching that given in Figure 7.7.
7.2.3 Postorder Traversal
Another important tree traversal algorithm is the postorder traversal. This
algorithm can be viewed as the opposite of the preorder traversal, because it
recursively traverses the subtrees rooted at the children of the root first, and then
visits the root. It is similar to the preorder traversal, however, in that we use it to
solve a particular problem by specializing an action associated with the "visit" of a
node v. Still, as with the preorder traversal, if the tree is ordered, we make recursive
calls for the children of a node v according to their specified order. Pseudo-code for
the postorder traversal is given in Code Fragment 7.11
.
Code Fragment 7.11: Algorithm postorder for
performing the postorder traversal of the subtree of a
tree T rooted at a node v.


391
The name of the postorder traversal comes from the fact that this traversal method
will visit a node v after it has visited all the other nodes in the subtree rooted at v.
(See Figure 7.8.)
Figure 7.8: Postorder traversal of the ordered tree

of Figure 7.6
.

The analysis of the running time of a postorder traversal is analogous to that of a
preorder traversal. (See Section 7.2.2.) The total time spent in the nonrecursive
portions of the algorithm is proportional to the time spent visiting the children of
each node in the tree. Thus, a postorder traversal of a tree T with n nodes takes O(n)
time, assuming that visiting each node takes O(1) time. That is, the postorder
traversal runs in linear time.
As an example of postorder traversal, we show a Java method
toStringPostorder in Code Fragment 7.12, which performs a postorder
traversal of a tree T. This method prints the element stored at a node when it is
visited.
Code Fragment 7.12: Method
toStringPostorder(T, v) that performs a
postorder printing of the elements in the subtree of
node v of T. The method implicitly calls toString on
elements, when they are involved in a string
concatenation operation.

392

The postorder traversal method is useful for solving problems where we wish to
compute some property for each node v in a tree, but computing that property for v
requires that we have already computed that same property for v's children. Such an
application is illustrated in the following example.
Example 7.7: Consider a file system tree T, where external nodes represent
files and internal nodes represent directories (Example 7.1). Suppose we want to
compute the disk space used by a directory, which is recursively given by the sum
of:

• The size of the directory itself
• The sizes of the files in the directory
• The space used by the children directories.
(See Figure 7.9.) This computation can be done with apostorder traversal of tree T.
After the subtrees of an internal node v have been traversed, we compute the space
used by v by adding the sizes of the directory v itself and of the files contained in v
to the space used by each internal child of v, which was computed by the recursive
postorder traversals of the children of v.
A Recursive Java Method for Computing Disk Space
Motivated by Example 7.7
, Java method diskSpace, shown in Code Fragment 7.13,
performs a postorder traversal of a file-system tree T, printing the name and disk
space used by the directory associated with each internal node of T. When called on
the root of tree T, diskSpace runs in time O(n), where n is the number of nodes of T,
provided the auxiliary methods name and size take O(1) time.
Figure 7.9: The tree of Figure 7.3 representing a file
system, showing the name and size of the associated
file/directory inside each node, and the disk space used
by the associated directory above each internal node.

393

Code Fragment 7.13: Method diskSpace prints
the name and disk space used by the directory
associated with each internal node of a file-system tree.
This method calls the auxiliary methods name and size,
which should be defined to return the name and size of
the file/directory associated with a node.

Other Kinds of Traversals


394

×