Data Structures and Algorithms in Java, 4th Edition (Part 2)

means of an unsorted or sorted list, respectively. We
assume that the list is implemented by a doubly linked
list. The space requirement is O(n).
Method            Unsorted List    Sorted List
size, isEmpty     O(1)             O(1)
insert            O(1)             O(n)
min, removeMin    O(n)             O(1)
Java Implementation
In Code Fragments 8.6 and 8.8, we show a Java implementation of a priority
queue based on a sorted node list. This implementation uses a nested class, called
MyEntry, to implement the Entry interface (see Section 6.5.1). We do not
show auxiliary method checkKey(k), which throws an
InvalidKeyException if key k cannot be compared with the comparator of
the priority queue. Class DefaultComparator, which realizes a comparator
using the natural ordering, is shown in Code Fragment 8.7.
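The sorted-list insertion logic at the heart of SortedListPriorityQueue can be sketched as follows. This is a simplified stand-alone version built on java.util.LinkedList rather than the book's node-list ADT; the class and method names here are illustrative, not the book's exact code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.ListIterator;
import java.util.Map.Entry;

/** Simplified sketch of a priority queue backed by a sorted doubly linked
 *  list (illustrative names, not the book's exact API). */
class SortedListPQ<K, V> {
    private final LinkedList<Entry<K, V>> list = new LinkedList<>();
    private final Comparator<K> comp;

    public SortedListPQ(Comparator<K> comp) { this.comp = comp; }

    public int size() { return list.size(); }
    public boolean isEmpty() { return list.isEmpty(); }

    /** O(n): scan from the front for the first key greater than k. */
    public void insert(K k, V v) {
        Entry<K, V> e = new SimpleEntry<>(k, v);
        ListIterator<Entry<K, V>> it = list.listIterator();
        while (it.hasNext()) {
            if (comp.compare(it.next().getKey(), k) > 0) {
                it.previous();   // step back in front of the larger key
                break;
            }
        }
        it.add(e);               // insert at the cursor position
    }

    /** O(1): the minimum is always at the front of the sorted list. */
    public Entry<K, V> min() { return list.getFirst(); }
    public Entry<K, V> removeMin() { return list.removeFirst(); }
}
```

Note how the scan makes insert cost O(n) while min and removeMin are O(1), matching the "Sorted List" column of the table above.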
Code Fragment 8.6: Portions of the Java class
SortedListPriorityQueue, which implements the
PriorityQueue interface. The nested class MyEntry
implements the Entry interface. (Continues in Code
Fragment 8.8.)




Code Fragment 8.7: Java class
DefaultComparator that implements a comparator
using the natural ordering and is the default
comparator for class SortedListPriorityQueue.
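Since the fragment itself is not reproduced in this excerpt, a comparator realizing the natural ordering can be sketched as follows (a standard formulation; the book's exact code may differ in details such as the exception declaration):

```java
import java.util.Comparator;

/** Comparator that falls back on the elements' natural ordering
 *  (a standard formulation; not the book's verbatim fragment). */
class DefaultComparator<E> implements Comparator<E> {
    @SuppressWarnings("unchecked")
    public int compare(E a, E b) throws ClassCastException {
        return ((Comparable<E>) a).compareTo(b);  // delegate to compareTo
    }
}
```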


Code Fragment 8.8: Portions of the Java class
SortedListPriorityQueue, which implements the
PriorityQueue interface. (Continued from Code
Fragment 8.6.)

8.2.3 Selection-Sort and Insertion-Sort
Recall the PriorityQueueSort scheme introduced in Section 8.1.4. We are
given an unsorted sequence S containing n elements, which we sort using a priority
queue P in two phases. In Phase 1 we insert all the elements into P and in Phase 2
we repeatedly remove the elements from P using the removeMin() method.
Selection-Sort
If we implement P with an unsorted list, then Phase 1 of PriorityQueueSort
takes O(n) time, for we can insert each element in O(1) time. In Phase 2, the
running time of each removeMin operation is proportional to the size of P.
Thus, the bottleneck computation is the repeated "selection" of the minimum
element in Phase 2. For this reason, this algorithm is better known as selection-
sort. (See Figure 8.1.)
As noted above, the bottleneck is in Phase 2 where we repeatedly remove an entry

with smallest key from the priority queue P. The size of P starts at n and
incrementally decreases with each removeMin until it becomes 0. Thus, the first
removeMin operation takes time O(n), the second one takes time O(n − 1), and
so on, until the last (nth) operation takes time O(1). Therefore, the total time
needed for the second phase is

O(n + (n − 1) + ⋯ + 2 + 1).

By Proposition 4.3, we have 1 + 2 + ⋯ + n = n(n + 1)/2. Thus, Phase 2 takes time O(n^2), as does the entire selection-sort algorithm.
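Under these assumptions, selection-sort can be sketched directly on a list standing in for the unsorted priority queue P (names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of selection-sort via an unsorted-list "priority queue":
 *  Phase 1 copies each element in O(1); Phase 2 repeatedly selects and
 *  removes the minimum, which dominates at O(n^2) total. */
class SelectionSort {
    public static List<Integer> sort(List<Integer> s) {
        List<Integer> p = new ArrayList<>(s);   // Phase 1: insert all into P
        List<Integer> out = new ArrayList<>();
        while (!p.isEmpty()) {                  // Phase 2: repeated removeMin
            int minIdx = 0;
            for (int i = 1; i < p.size(); i++)  // O(size of P) selection scan
                if (p.get(i) < p.get(minIdx)) minIdx = i;
            out.add(p.remove(minIdx));
        }
        return out;
    }
}
```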
Figure 8.1: Execution of selection-sort on sequence
S = (7,4,8,2,5,3,9).


Insertion-Sort
If we implement the priority queue P using a sorted list, then we improve the
running time of Phase 2 to O(n), for each operation removeMin on P now takes
O(1) time. Unfortunately, Phase 1 now becomes the bottleneck for the running
time, since, in the worst case, each insert operation takes time proportional to the
size of P. This sorting algorithm is therefore better known as insertion-sort (see
Figure 8.2), for the bottleneck in this sorting algorithm involves the repeated
"insertion" of a new element at the appropriate position in a sorted list.
Figure 8.2: Execution of insertion-sort on sequence
S = (7,4,8,2,5,3,9). In Phase 1, we repeatedly remove
the first element of S and insert it into P, by scanning
the list implementing P, until we find the correct place
for this element. In Phase 2, we repeatedly perform
removeMin operations on P, each of which returns
the first element of the list implementing P, and we

add the element at the end of S.

Analyzing the running time of Phase 1 of insertion-sort, we note that it is

O(1 + 2 + ⋯ + (n − 1) + n).

Again, by recalling Proposition 4.3, Phase 1 runs in O(n^2) time, and hence, so does the entire insertion-sort algorithm.
Alternatively, we could change our definition of insertion-sort so that we insert
elements starting from the end of the priority-queue list in Phase 1, in which case
performing insertion-sort on a sequence that is already sorted would run in O(n)
time. Indeed, the running time of insertion-sort in this case is O(n + I), where I is
the number of inversions in the sequence, that is, the number of pairs of elements
that start out in the input sequence in the wrong relative order.
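The end-scanning variant just described can be sketched as follows; each shift in the inner loop removes exactly one inversion, which is why the running time is O(n + I):

```java
/** Insertion-sort sketch that inserts each element by scanning backward
 *  from the end of the sorted prefix; runs in O(n + I) time, where I is
 *  the number of inversions, hence O(n) on an already-sorted input. */
class InsertionSort {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int cur = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > cur) {  // each shift fixes one inversion
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = cur;                 // drop cur into its sorted place
        }
    }
}
```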
8.3 Heaps
The two implementations of the PriorityQueueSort scheme presented in the
previous section suggest a possible way of improving the running time for priority-
queue sorting. For one algorithm (selection-sort) achieves a fast running time for
Phase 1, but has a slow Phase 2, whereas the other algorithm (insertion-sort) has a
slow Phase 1, but achieves a fast running time for Phase 2. If we can somehow
balance the running times of the two phases, we might be able to significantly speed
up the overall running time for sorting. This is, in fact, exactly what we can achieve
using the priority-queue implementation discussed in this section.

An efficient realization of a priority queue uses a data structure called a heap. This
data structure allows us to perform both insertions and removals in logarithmic time,

which is a significant improvement over the list-based implementations discussed in
Section 8.2.
The fundamental way the heap achieves this improvement is to abandon
the idea of storing entries in a list and take the approach of storing entries in a binary
tree instead.
8.3.1 The Heap Data Structure
A heap (see Figure 8.3) is a binary tree T that stores a collection of entries at its
nodes and that satisfies two additional properties: a relational property defined in
terms of the way keys are stored in T and a structural property defined in terms of
the nodes of T itself. We assume that a total order relation on the keys is given, for
example, by a comparator.
The relational property of T, defined in terms of the way keys are stored, is the
following:
Heap-Order Property: In a heap T, for every node v other than the root, the key
stored at v is greater than or equal to the key stored at v's parent.
As a consequence of the heap-order property, the keys encountered on a path from
the root to an external node of T are in nondecreasing order. Also, a minimum key
is always stored at the root of T. This is the most important key and is informally
said to be "at the top of the heap"; hence, the name "heap" for the data structure. By
the way, the heap data structure defined here has nothing to do with the memory
heap (Section 14.1.2) used in the run-time environment supporting a programming
language like Java.
If we define our comparator to indicate the opposite of the standard total order
relation between keys (so that, for example, compare(3,2) > 0), then the root of the
heap stores the largest key. This versatility comes essentially "for free" from our
use of the comparator pattern. By defining the minimum key in terms of the
comparator, the "minimum" key with a "reverse" comparator is in fact the largest.
Figure 8.3: Example of a heap storing 13 entries
with integer keys. The last node is the one storing entry
(8, W).



Thus, without loss of generality, we assume that we are always interested in the
minimum key, which will always be at the root of the heap.
For the sake of efficiency, as will become clear later, we want the heap T to have as
small a height as possible. We enforce this requirement by insisting that the heap T
satisfy an additional structural property: it must be complete. Before we define this
structural property, we need some definitions. We recall from Section 7.3.3 that
level i of a binary tree T is the set of nodes of T that have depth i. Given nodes v and
w on the same level of T, we say that v is to the left of w if v is encountered before
w in an inorder traversal of T. That is, there is a node u of T such that v is in the left
subtree of u and w is in the right subtree of u. For example, in the binary tree of
Figure 8.3, the node storing entry (15,K) is to the left of the node storing entry (7,
Q). In a standard drawing of a binary tree, the "to the left of" relation is visualized
by the relative horizontal placement of the nodes.
Complete Binary Tree Property: A heap T with height h is a complete binary tree if levels 0, 1, 2, …, h − 1 of T have the maximum number of nodes possible (namely, level i has 2^i nodes, for 0 ≤ i ≤ h − 1) and in level h − 1, all the internal nodes are to the left of the external nodes and there is at most one node with one child, which must be a left child.
By insisting that a heap T be complete, we identify another important node in a
heap T, other than the root, namely, the last node of T, which we define to be the
right-most, deepest external node of T (see Figure 8.3).
The Height of a Heap
Let h denote the height of T. Another way of defining the last node of T is that it
is the node on level h such that all the other nodes of level h are to the left of it.
Insisting that T be complete also has an important consequence, as shown in Proposition 8.5.


Proposition 8.5: A heap T storing n entries has height h = ⌊log n⌋.

Justification: From the fact that T is complete, we know that the number of nodes of T is at least

1 + 2 + 4 + ⋯ + 2^(h−1) + 1 = 2^h − 1 + 1 = 2^h.

This lower bound is achieved when there is only one node on level h. In addition, also following from T being complete, we have that the number of nodes of T is at most

1 + 2 + 4 + ⋯ + 2^h = 2^(h+1) − 1.

This upper bound is achieved when level h has 2^h nodes. Since the number of nodes is equal to the number n of entries, we obtain

2^h ≤ n and n ≤ 2^(h+1) − 1.

Thus, by taking logarithms of both sides of these two inequalities, we see that

h ≤ log n and log(n + 1) − 1 ≤ h.

Since h is an integer, the two inequalities above imply that h = ⌊log n⌋.

Proposition 8.5
has an important consequence, for it implies that if we can
perform update operations on a heap in time proportional to its height, then those
operations will run in logarithmic time. Let us therefore turn to the problem of
how to efficiently perform various priority queue methods using a heap.
8.3.2 Complete Binary Trees and Their Representation
Let us discuss more about complete binary trees and how they are represented.

The Complete Binary Tree ADT
As an abstract data type, a complete binary tree T supports all the methods of the binary tree ADT (Section 7.3.1), plus the following two methods:
add(o): Add to T and return a new external node v storing element o
such that the resulting tree is a complete binary tree with last node v.
remove(): Remove the last node of T and return its element.
Using only these update operations guarantees that we will always have a
complete binary tree. As shown in Figure 8.4, there are two cases for the effect of
an add or remove. Specifically, for an add, we have the following (remove is
similar).
• If the bottom level of T is not full, then add inserts a new node on the

bottom level of T, immediately after the right-most node of this level (that is,
the last node); hence, T's height remains the same.
• If the bottom level is full, then add inserts a new node as the left child of
the left-most node of the bottom level of T; hence, T's height increases by one.
Figure 8.4: Examples of operations add and remove
on a complete binary tree, where w denotes the node
inserted by add or deleted by remove. The trees
shown in (b) and (d) are the results of performing add
operations on the trees in (a) and (c), respectively.
Likewise, the trees shown in (a) and (c) are the results
of performing remove operations on the trees in (b)
and (d), respectively.


The Array List Representation of a Complete Binary Tree
The array-list binary tree representation (Section 7.3.5) is especially suitable for a
complete binary tree T. We recall that in this implementation, the nodes of T are
stored in an array list A such that node v in T is the element of A with index equal
to the level number p(v) of v, defined as follows:
• If v is the root of T, then p(v) = 1.
• If v is the left child of node u, then p(v) = 2p(u).
• If v is the right child of node u, then p(v) = 2p(u) + 1.
With this implementation, the nodes of T have contiguous indices in the range
[1,n] and the last node of T is always at index n, where n is the number of nodes
of T. Figure 8.5
shows two examples illustrating this property of the last node.
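The three level-numbering rules translate into simple index arithmetic (an illustrative helper; the book folds this logic into its tree class):

```java
/** Level-number arithmetic for the array-list representation
 *  (root at index 1), as defined by the three rules above. */
class LevelNumbers {
    public static int parent(int p) { return p / 2; }  // integer division
    public static int left(int p)   { return 2 * p; }
    public static int right(int p)  { return 2 * p + 1; }
    public static boolean isRoot(int p) { return p == 1; }
}
```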
Figure 8.5: Two examples showing that the last node w of a heap with n nodes has level number n: (a) heap T1 with more than one node on the bottom level; (b) heap T2 with one node on the bottom level; (c) array-list representation of T1; (d) array-list representation of T2.


The simplifications that come from representing a complete binary tree T with an
array list aid in the implementation of methods add and remove. Assuming that no
array expansion is necessary, methods add and remove can be performed in O(1)
time, for they simply involve adding or removing the last element of the array list.
Moreover, the array list associated with T has n + 1 elements (the element at index
0 is a place-holder). If we use an extendable array that grows and shrinks for the
implementation of the array list (Section 6.1.4 and Exercise C-6.2), the space used
by the array-list representation of a complete binary tree with n nodes is O(n) and
operations add and remove take O(1) amortized time.
Java Implementation of a Complete Binary Tree
We represent the complete binary tree ADT in interface
CompleteBinaryTree shown in Code Fragment 8.9. We provide a Java class
ArrayListCompleteBinaryTree that implements the
CompleteBinaryTree interface with an array list and supports methods add
and remove in O(1) time in Code Fragments 8.10–8.12.

Code Fragment 8.9: Interface CompleteBinaryTree
for a complete binary tree.

Code Fragment 8.10: Class ArrayListCompleteBinaryTree implementing interface CompleteBinaryTree using a java.util.ArrayList. (Continues in Code Fragment 8.11.)

Code Fragment 8.11: Class
ArrayListCompleteBinaryTree implementing
the complete binary tree ADT. (Continues in Code
Fragment 8.12.)



Code Fragment 8.12: Class
ArrayListCompleteBinaryTree implementing
the complete binary tree ADT. Methods children and
positions are omitted. (Continued from Code
Fragment 8.11.)




8.3.3 Implementing a Priority Queue with a Heap
We now discuss how to implement a priority queue using a heap. Our heap-based
representation for a priority queue P consists of the following (see Figure 8.6):
• heap, a complete binary tree T whose internal nodes store entries so that
the heap-order property is satisfied. We assume T is implemented using an array
list, as described in Section 8.3.2. For each internal node v of T, we denote the key
of the entry stored at v as k(v).
• comp, a comparator that defines the total order relation among the keys.
With this data structure, methods size and isEmpty take O(1) time, as usual. In
addition, method min can also be easily performed in O(1) time by accessing the
entry stored at the root of the heap (which is at index 1 in the array list).
Insertion
Let us consider how to perform insert on a priority queue implemented with a
heap T. To store a new entry (k,x) into T we add a new node z to T with operation
add so that this new node becomes the last node of T and stores entry (k,x).
After this action, the tree T is complete, but it may violate the heap-order
property. Hence, unless node z is the root of T (that is, the priority queue was
empty before the insertion), we compare key k(z) with the key k(u) stored at the
parent u of z. If k(z) ≥ k(u), the heap-order property is satisfied and the algorithm
terminates. If instead k(z) < k(u), then we need to restore the heap-order property,
which can be locally achieved by swapping the entries stored at z and u. (See
Figure 8.7c and d.) This swap causes the new entry (k,x) to move up one level.
Again, the heap-order property may be violated, and we continue swapping, going
up in T until no violation of the heap-order property occurs. (See Figure 8.7e and
h.)
Figure 8.6: Illustration of the heap-based
implementation of a priority queue.



Figure 8.7: Insertion of a new entry with key 2 into the heap of Figure 8.6: (a) initial heap; (b) after performing operation add; (c and d) swap to locally restore the heap-order property; (e and f) another swap; (g and h) final swap.


The upward movement of the newly inserted entry by means of swaps is
conventionally called up-heap bubbling. A swap either resolves the violation of
the heap-order property or propagates it one level up in the heap. In the worst
case, up-heap bubbling causes the new entry to move all the way up to the root of
heap T. (See Figure 8.7.) Thus, in the worst case, the number of swaps performed
in the execution of method insert is equal to the height of T, that is, it is
logn by Proposition 8.5.
Removal

Let us now turn to method removeMin of the priority queue ADT. The
algorithm for performing method removeMin using heap T is illustrated in
Figure 8.8.
We know that an entry with the smallest key is stored at the root r of T (even if
there is more than one entry with smallest key). However, unless r is the only
internal node of T, we cannot simply delete node r, because this action would
disrupt the binary tree structure. Instead, we access the last node w of T, copy its
entry to the root r, and then delete the last node by performing operation remove
of the complete binary tree ADT. (See Figure 8.8a
and b.)
Down-Heap Bubbling after a Removal

We are not necessarily done, however, for, even though T is now complete, T may
now violate the heap-order property. If T has only one node (the root), then the
heap-order property is trivially satisfied and the algorithm terminates. Otherwise,
we distinguish two cases, where r denotes the root of T:
• If r has no right child, let s be the left child of r.
• Otherwise (r has both children), let s be a child of r with the smallest key.
If k(r) ≤ k(s), the heap-order property is satisfied and the algorithm terminates. If
instead k(r) > k(s), then we need to restore the heap-order property, which can be
locally achieved by swapping the entries stored at r and s. (See Figure 8.8c
and d.)
(Note that we shouldn't swap r with s's sibling.) The swap we perform restores the
heap-order property for node r and its children, but it may violate this property at
s; hence, we may have to continue swapping down T until no violation of the
heap-order property occurs. (See Figure 8.8e and h.)
This downward swapping process is called down-heap bubbling. A swap either
resolves the violation of the heap-order property or propagates it one level down
in the heap. In the worst case, an entry moves all the way down to the bottom
level. (See Figure 8.8.) Thus, the number of swaps performed in the execution of method removeMin is, in the worst case, equal to the height of heap T, that is, it is ⌊log n⌋ by Proposition 8.5.

Figure 8.8: Removal of the entry with the smallest
key from a heap: (a and b) deletion of the last node,
whose entry gets stored into the root; (c and d) swap
to locally restore the heap-order property; (e and f)
another swap; (g and h) final swap.



Analysis
Table 8.3 shows the running time of the priority queue ADT methods for the heap
implementation of a priority queue, assuming that two keys can be compared in
O(1) time and that the heap T is implemented with either an array list or linked
structure.

Table 8.3: Performance of a priority queue realized
by means of a heap, which is in turn implemented with
an array list or linked structure. We denote with n the
number of entries in the priority queue at the time a
method is executed. The space requirement is O(n).
The running time of operations insert and removeMin is amortized for the array-list implementation of the heap and worst case for the linked representation.
Operation         Time
size, isEmpty     O(1)
min               O(1)
insert            O(log n)
removeMin         O(log n)
In short, each of the priority queue ADT methods can be performed in O(1) or in
O(log n) time, where n is the number of entries at the time the method is executed.
The analysis of the running time of the methods is based on the following:
• The heap T has n nodes, each storing a reference to an entry.
• Operations add and remove on T take either O(1) amortized time (array-list representation) or O(log n) worst-case time (linked representation).
• In the worst case, up-heap and down-heap bubbling perform a number of
swaps equal to the height of T.
• The height of heap T is O(log n), since T is complete (Proposition 8.5).

We conclude that the heap data structure is a very efficient realization of the
priority queue ADT, independent of whether the heap is implemented with a
linked structure or an array list. The heap-based implementation achieves fast
running times for both insertion and removal, unlike the list-based priority queue
implementations. Indeed, an important consequence of the efficiency of the heap-
based implementation is that it can speed up priority-queue sorting to be much
faster than the list-based insertion-sort and selection-sort algorithms.
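This O(n log n) priority-queue sort can be observed with the JDK's own heap-based java.util.PriorityQueue, which supports exactly the insert (add) and removeMin (poll) pattern described here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Priority-queue sort using the JDK's heap-based java.util.PriorityQueue:
 *  Phase 1 performs n inserts at O(log n) each, Phase 2 performs n
 *  removeMin operations at O(log n) each, for O(n log n) total. */
class HeapSortDemo {
    public static List<Integer> sort(List<Integer> s) {
        PriorityQueue<Integer> p = new PriorityQueue<>();
        for (int x : s) p.add(x);           // Phase 1: insert
        List<Integer> out = new ArrayList<>();
        while (!p.isEmpty()) out.add(p.poll());  // Phase 2: removeMin
        return out;
    }
}
```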
8.3.4 A Java Heap Implementation
A Java implementation of a heap-based priority queue is shown in Code Fragments 8.13–8.15. To aid in modularity, we delegate the maintenance of the structure of the
heap itself to a complete binary tree.
Code Fragment 8.13: Class HeapPriorityQueue,
which implements a priority queue with a heap. A
nested class MyEntry is used for the entries of the
priority queue, which form the elements in the heap
tree. (Continues in Code Fragment 8.14.)


Code Fragment 8.14: Methods min, insert and
removeMin and some auxiliary methods of class
HeapPriorityQueue. (Continues in Code Fragment 8.15.)

