B-trees
Andreas Kaltenbrunner, Lefteris Kellis & Dani Martí

B-trees, A. Kaltenbrunner, L. Kellis & D. Martí


What are B-trees?
• B-trees are balanced search trees: height = O(log n) in the worst case.
• They were designed to work well on Direct Access secondary storage devices
(magnetic disks).
• Similar to red-black trees, but show better performance on disk I/O operations.
• B-trees (and variants like B+ and B* trees) are widely used in database systems.


Motivation
Data structures on secondary storage:
• Memory capacity in a computer system consists broadly of two parts:
1. Primary memory: uses memory chips.
2. Secondary storage: based on magnetic disks.
• Magnetic disks are cheaper and have higher capacity.
• But they are much slower because they have moving parts.
B-trees try to read as much information as possible in every disk access operation.


An example
The 21 English consonants as keys of a B-tree:

                          M
            D H                       Q T X
   B C    F G    J K L        N P    R S    V W    Y Z

• Every internal node x containing n[x] keys has n[x] + 1 children.
• All leaves are at the same depth in the tree.


B-tree: definition
A B-tree T is a rooted tree (with root root[T]) with the following properties:
• Every node x has four fields:
1. The number of keys currently stored in node x, n[x].
2. The n[x] keys themselves, stored in nondecreasing order:
   key_1[x] ≤ key_2[x] ≤ · · · ≤ key_{n[x]}[x].
3. A boolean value leaf[x], which is True if x is a leaf and False if x is an internal node.
4. n[x] + 1 pointers c_1[x], c_2[x], . . . , c_{n[x]+1}[x] to its children.
   (Leaf nodes have no children, so their c_i are undefined.)
• Representing pointers and keys in a node:

   c_1 | key_1 | c_2 | key_2 | · · · | c_n | key_n | c_{n+1}


B-tree: definition (II)
Properties (cont.):
• The keys key_i[x] separate the ranges of keys stored in each subtree: if k_i is any key
  stored in the subtree with root c_i[x], then:
  k_1 ≤ key_1[x] ≤ k_2 ≤ key_2[x] ≤ · · · ≤ key_{n[x]}[x] ≤ k_{n[x]+1}.
• All leaves have the same depth, which is the tree's height h.
• There are upper and lower bounds on the number of keys in a node.
  To specify these bounds we use a fixed integer t ≥ 2, the minimum degree of the B-tree:
  – lower bound: every node other than the root must have at least t − 1 keys
    =⇒ every internal node other than the root has at least t children.
  – upper bound: every node can contain at most 2t − 1 keys
    =⇒ every internal node has at most 2t children.
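
The properties above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the names BTreeNode and check_btree are hypothetical, nodes live in memory rather than on disk, and indices are 0-based. It verifies the key bounds, the key ordering, the separation of subtree ranges, and that all leaves sit at the same depth.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BTreeNode:
    """In-memory stand-in for one disk page (hypothetical layout)."""
    keys: List[int] = field(default_factory=list)              # key_1 .. key_n, sorted
    children: List["BTreeNode"] = field(default_factory=list)  # c_1 .. c_{n+1}; empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


def check_btree(node: BTreeNode, t: int, lo: Optional[int] = None,
                hi: Optional[int] = None, is_root: bool = True) -> int:
    """Assert the B-tree properties below `node` and return the subtree height."""
    n = len(node.keys)
    assert n <= 2 * t - 1, "more than 2t - 1 keys"
    assert is_root or n >= t - 1, "fewer than t - 1 keys in a non-root node"
    assert all(a <= b for a, b in zip(node.keys, node.keys[1:])), "keys out of order"
    assert all((lo is None or lo <= k) and (hi is None or k <= hi)
               for k in node.keys), "key outside the range allowed by the parent"
    if node.leaf:
        return 0
    assert len(node.children) == n + 1, "internal node needs n + 1 children"
    bounds = [lo] + node.keys + [hi]  # child i may only hold keys in [bounds[i], bounds[i + 1]]
    heights = {check_btree(c, t, bounds[i], bounds[i + 1], False)
               for i, c in enumerate(node.children)}
    assert len(heights) == 1, "leaves at different depths"
    return 1 + heights.pop()


# A small valid tree with minimum degree t = 3.
example = BTreeNode([7, 13, 16, 23], [
    BTreeNode([1, 3, 4, 5]), BTreeNode([10, 11]), BTreeNode([14, 15]),
    BTreeNode([18, 19, 20, 21, 22]), BTreeNode([24, 26]),
])
print(check_btree(example, t=3))  # 1
```

A real implementation would store each node in one disk page; the in-memory lists here only illustrate the invariants.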


The height of a B-tree (I)
Example (worst case): a B-tree of height 3 containing the minimum possible number
of keys.

depth   number of nodes   keys per node
0       1                 1
1       2                 t − 1
2       2t                t − 1
3       2t^2              t − 1

The root has 2 children; every other internal node has t children.
In total the tree holds 1 + (t − 1)(2 + 2t + 2t^2) = 2t^3 − 1 keys.


The height of a B-tree (II)
• The number of disk accesses is proportional to the height of the B-tree.
• The worst-case height of a B-tree with n ≥ 1 keys is
  h ≤ log_t((n + 1)/2) = O(log_t n).
• Main advantage of B-trees compared to red-black trees:
  The base of the logarithm, t, can be much larger.
  =⇒ B-trees save a factor of about log t over red-black trees in the number of
     nodes examined in tree operations.
  =⇒ The number of disk accesses is substantially reduced.
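
The height bound follows from counting the fewest keys a tree of height h can hold: 1 key in the root plus t − 1 keys in each of the 2t^(d−1) nodes at depth d ≥ 1, which sums to 2t^h − 1. The short Python sketch below is an illustration added here, with hypothetical function names min_keys and max_height; it evaluates that count and the resulting maximum height for two values of t.

```python
def min_keys(t: int, h: int) -> int:
    """Fewest keys a B-tree of minimum degree t and height h can hold."""
    # Root: 1 key. Depth d >= 1: 2 * t**(d - 1) nodes with t - 1 keys each.
    return 1 + (t - 1) * sum(2 * t ** (d - 1) for d in range(1, h + 1))


def max_height(n: int, t: int) -> int:
    """Largest h with min_keys(t, h) <= n, i.e. floor(log_t((n + 1) / 2)), for n >= 1."""
    h = 0
    while min_keys(t, h + 1) <= n:
        h += 1
    return h


print(min_keys(2, 3))             # 15, which equals 2 * 2**3 - 1
print(max_height(10**9, 2))       # 28
print(max_height(10**9, 1000))    # 2
```

With a billion keys, raising t from 2 to 1000 cuts the worst-case height from 28 to 2, which is the saving in disk accesses the slide refers to.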



Basic operations on B-trees
Details of the following operations:
• B-Tree-Search
• B-Tree-Create
• B-Tree-Insert
• B-Tree-Delete
Conventions:
• The root of the B-tree is always in main memory (a Disk-Read on the root is never required).
• Any node passed as a parameter must already have had a Disk-Read operation performed
  on it.
The procedures presented are all top-down algorithms: they start at
the root of the tree and never need to back up.

Searching a B-tree (I)
2 inputs: x, pointer to the root node of a subtree,
          k, a key to be searched for in that subtree.
function B-Tree-Search(x, k) returns (y, i) such that key_i[y] = k, or nil
    i ← 1
    while i ≤ n[x] and k > key_i[x]
        do i ← i + 1
    if i ≤ n[x] and k = key_i[x]
        then return (x, i)
    if leaf[x]
        then return nil
        else Disk-Read(c_i[x])
             return B-Tree-Search(c_i[x], k)

At each internal node x we make an (n[x] + 1)-way branching decision.
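
The search procedure translates almost line by line into Python. This is a sketch under stated assumptions, not the slides' own code: Node and btree_search are hypothetical names, nodes are kept in memory, indices are 0-based, and the Disk-Read call is reduced to a comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def btree_search(x: Node, k: int) -> Optional[Tuple[Node, int]]:
    """Return (node, index) with node.keys[index] == k, or None if k is absent."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1                      # skip keys smaller than k
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if not x.children:              # leaf reached without finding k
        return None
    # A disk-backed tree would issue Disk-Read(c_i[x]) here.
    return btree_search(x.children[i], k)


root = Node([7, 13, 16, 23], [
    Node([1, 3, 4, 5]), Node([10, 11]), Node([14, 15]),
    Node([18, 19, 20, 21, 22]), Node([24, 26]),
])
node, i = btree_search(root, 20)
print(node.keys, i)                 # [18, 19, 20, 21, 22] 2
print(btree_search(root, 12))       # None
```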


Searching a B-tree (II)
• Number of disk pages accessed by B-Tree-Search:
  O(h) = O(log_t n)   (Θ(h) whenever the search has to reach a leaf)
• The while loop within each node takes O(t) time, so the total CPU time is
  O(th) = O(t log_t n)
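
The O(t) factor comes from the linear scan inside each node. Because the keys of a node are sorted, the scan can be swapped for a binary search; this is a variation on the procedure above, not something the slides present. It brings the per-node cost down to O(log t) and the total CPU time to O(log t · log_t n) = O(log n), while the number of disk accesses stays the same. A minimal sketch using Python's standard bisect module, with find_slot as a hypothetical helper name and 0-based indices:

```python
from bisect import bisect_left
from typing import List, Tuple


def find_slot(keys: List[int], k: int) -> Tuple[int, bool]:
    """Return (i, found): i is the first index with keys[i] >= k.

    This is the same index the while loop of B-Tree-Search stops at,
    so i is also the child to descend into when k is not found.
    """
    i = bisect_left(keys, k)
    return i, i < len(keys) and keys[i] == k


print(find_slot([7, 13, 16, 23], 16))   # (2, True)
print(find_slot([7, 13, 16, 23], 20))   # (3, False)
```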


Creating an empty B-tree
B-Tree-Create(T)
    x ← Allocate-Node()
    leaf[x] ← true
    n[x] ← 0
    Disk-Write(x)
    root[T] ← x

• Allocate-Node() allocates one disk page to be used as a new node.
• B-Tree-Create requires O(1) disk operations and O(1) CPU time.



Splitting a node in a B-tree (I)
• Inserting a key into a B-tree is more complicated than into a binary search tree.
• Splitting a full node y (2t − 1 keys) is the fundamental operation during insertion.
• The node is split around its median key key_t[y] into 2 nodes with t − 1 keys each.
• The median key moves up into y's parent (which has to be nonfull).
• If y is the root node, the tree height grows by 1.

Example (t = 4):

Before the split:
    x:               · · · 14 23 · · ·        (key_{i−1}[x] = 14, key_i[x] = 23)
    y = c_i[x]:      16 17 18 19 20 21 22

After the split:
    x:               · · · 14 19 23 · · ·     (key_{i−1}[x] = 14, key_i[x] = 19, key_{i+1}[x] = 23)
    y = c_i[x]:      16 17 18
    z = c_{i+1}[x]:  20 21 22


Splitting a node in a B-tree (II)
3 inputs: x, a nonfull internal node,
          i, an index,
          y, a node such that y = c_i[x] is a full child of x.
B-Tree-Split-Child(x, i, y)
    z ← Allocate-Node()
    leaf[z] ← leaf[y]
    n[z] ← t − 1
    for j ← 1 to t − 1
        do key_j[z] ← key_{j+t}[y]
    if not leaf[y]
        then for j ← 1 to t
                 do c_j[z] ← c_{j+t}[y]
    n[y] ← t − 1
    for j ← n[x] + 1 downto i + 1
        do c_{j+1}[x] ← c_j[x]
    c_{i+1}[x] ← z
    for j ← n[x] downto i
        do key_{j+1}[x] ← key_j[x]
    key_i[x] ← key_t[y]
    n[x] ← n[x] + 1
    Disk-Write(y)
    Disk-Write(z)
    Disk-Write(x)

The CPU time used by B-Tree-Split-Child is Θ(t), due to the loops; it performs O(1) disk operations.
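
The split can be sketched in Python with list slicing doing the work of the copy loops. This is an illustration only: Node and split_child are hypothetical names, indices are 0-based, nodes stay in memory (so the three Disk-Write calls are omitted), and the two outer leaves in the usage example are made-up filler, since the slide shows only part of node x.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def split_child(x: Node, i: int, t: int) -> None:
    """Split the full child x.children[i] around its median key."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "child must be full"
    assert len(x.keys) < 2 * t - 1, "parent must be nonfull"
    median = y.keys[t - 1]
    # z takes the upper t - 1 keys and, for an internal node, the upper t children.
    z = Node(keys=y.keys[t:], children=y.children[t:])
    y.keys = y.keys[:t - 1]
    y.children = y.children[:t]
    x.keys.insert(i, median)        # the median moves up into the parent
    x.children.insert(i + 1, z)     # z becomes the right neighbour of y


# The example from the previous slide (t = 4); the outer leaves are hypothetical.
x = Node([14, 23], [Node([10, 11, 12]),
                    Node([16, 17, 18, 19, 20, 21, 22]),
                    Node([30, 31, 32])])
split_child(x, 1, t=4)
print(x.keys)                           # [14, 19, 23]
print(x.children[1].keys)               # [16, 17, 18]
print(x.children[2].keys)               # [20, 21, 22]
```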


Inserting a key into a B-tree (I)
• The key is always inserted in a leaf node
• Inserting is done in a single pass down the tree
• Requires O(h) = O(logt n) disk accesses
• Requires O(th) = O(t logt n) CPU time
• Uses B-Tree-Split-Child to guarantee that recursion never descends to a full
node


Inserting a key into a B-tree (II)
2 inputs: T, the B-tree,
          k, the key to insert.
B-Tree-Insert(T, k)
    r ← root[T]
    if n[r] = 2t − 1
        then s ← Allocate-Node()
             root[T] ← s
             leaf[s] ← false
             n[s] ← 0
             c_1[s] ← r
             B-Tree-Split-Child(s, 1, r)
             B-Tree-Insert-Nonfull(s, k)
        else B-Tree-Insert-Nonfull(r, k)

Uses B-Tree-Insert-Nonfull to insert key k into nonfull node x


Inserting a key into a nonfull node of a B-tree
B-Tree-Insert-Nonfull(x, k)
    i ← n[x]
    if leaf[x]
        then while i ≥ 1 and k < key_i[x]
                 do key_{i+1}[x] ← key_i[x]
                    i ← i − 1
             key_{i+1}[x] ← k
             n[x] ← n[x] + 1
             Disk-Write(x)
        else while i ≥ 1 and k < key_i[x]
                 do i ← i − 1
             i ← i + 1
             Disk-Read(c_i[x])
             if n[c_i[x]] = 2t − 1
                 then B-Tree-Split-Child(x, i, c_i[x])
                      if k > key_i[x]
                          then i ← i + 1
             B-Tree-Insert-Nonfull(c_i[x], k)
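
Put together, the insertion procedures fit in a small Python class. This is a sketch that follows the pseudocode, under a few assumptions: the names BTree, Node and levels are hypothetical, nodes are in-memory objects, indices are 0-based, and the disk operations are left out. The usage part loads the initial tree of the worked example below by hand and inserts 2, 17, 12 and 6 in turn.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


class BTree:
    def __init__(self, t: int) -> None:
        self.t = t
        self.root = Node()                      # B-Tree-Create: an empty leaf

    def _split_child(self, x: Node, i: int) -> None:
        t, y = self.t, x.children[i]
        z = Node(y.keys[t:], y.children[t:])    # upper half goes to the new node
        x.keys.insert(i, y.keys[t - 1])         # median moves up
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:t - 1], y.children[:t]

    def insert(self, k: int) -> None:
        r = self.root
        if len(r.keys) == 2 * self.t - 1:       # full root: grow the tree by one level
            s = Node(children=[r])
            self.root = s
            self._split_child(s, 0)
            r = s
        self._insert_nonfull(r, k)

    def _insert_nonfull(self, x: Node, k: int) -> None:
        i = len(x.keys)
        while i > 0 and k < x.keys[i - 1]:
            i -= 1                              # i = number of keys <= k
        if not x.children:
            x.keys.insert(i, k)                 # leaf: place k in sorted position
            return
        if len(x.children[i].keys) == 2 * self.t - 1:
            self._split_child(x, i)             # never descend into a full node
            if k > x.keys[i]:
                i += 1
        self._insert_nonfull(x.children[i], k)


def levels(root: Node) -> list:
    """Keys of all nodes grouped by depth (for display only)."""
    out, frontier = [], [root]
    while frontier:
        out.append([n.keys for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return out


tree = BTree(t=3)
tree.root = Node([7, 13, 16, 23], [
    Node([1, 3, 4, 5]), Node([10, 11]), Node([14, 15]),
    Node([18, 19, 20, 21, 22]), Node([24, 26]),
])
for key in (2, 17, 12, 6):
    tree.insert(key)
print(levels(tree.root)[0])   # [[16]]
print(levels(tree.root)[1])   # [[3, 7, 13], [20, 23]]
```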

Inserting a key - Examples (I)
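Initial tree (t = 3):
    root:    [7 13 16 23]
    leaves:  [1 3 4 5] [10 11] [14 15] [18 19 20 21 22] [24 26]

2 inserted (simple insertion into a leaf):
    root:    [7 13 16 23]
    leaves:  [1 2 3 4 5] [10 11] [14 15] [18 19 20 21 22] [24 26]

17 inserted (to the previous one; the full leaf [18 19 20 21 22] is split and 20 moves up into the root):
    root:    [7 13 16 20 23]
    leaves:  [1 2 3 4 5] [10 11] [14 15] [17 18 19] [21 22] [24 26]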


Inserting a key - Examples (II)

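Initial tree (t = 3):
    root:    [7 13 16 20 23]
    leaves:  [1 2 3 4 5] [10 11] [14 15] [17 18 19] [21 22] [24 26]

12 inserted (the full root is split first, so the tree grows in height by 1):
    root:           [16]
    node [7 13]:    leaves [1 2 3 4 5] [10 11 12] [14 15]
    node [20 23]:   leaves [17 18 19] [21 22] [24 26]

6 inserted (to the previous one; the full leaf [1 2 3 4 5] is split and 3 moves up):
    root:           [16]
    node [3 7 13]:  leaves [1 2] [4 5 6] [10 11 12] [14 15]
    node [20 23]:   leaves [17 18 19] [21 22] [24 26]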



Deleting a Key from a B-tree
• Similar to insertion, with the addition of a couple of special cases
• Key can be deleted from any node.
• More complicated procedure, but similar performance figures: O(h) disk accesses,
O(th) = O(t logt n) CPU time
• Deleting is done in a single pass down the tree, but needs to return to the node with
the deleted key if it is an internal node
• In the latter case, the key is first moved down to a leaf. Final deletion always takes
place on a leaf


Deleting a Key — Cases I
• There are 3 distinct cases for deletion.
• Let k be the key to be deleted and x the node currently being examined. Then the cases are:
1. If key k is in node x and x is a leaf, simply delete k from x.
2. If key k is in node x and x is an internal node, there are three cases to consider:
   (a) If the child y that precedes k in node x has at least t keys (more than the
       minimum), then find the predecessor k′ of k in the subtree rooted at y. Recursively
       delete k′ and replace k with k′ in x.
   (b) Symmetrically, if the child z that follows k in node x has at least t keys, find the
       successor k′ of k in the subtree rooted at z, then delete and replace as before.
       Note that finding k′ and deleting it can be performed in a single downward pass.
   (c) Otherwise, if both y and z have only t − 1 keys (the minimum number), merge k and
       all of z into y, so that both k and the pointer to z are removed from x. y now
       contains 2t − 1 keys; recursively delete k from y.


Deleting a Key — Cases II
3. If key k is not present in an internal node x, determine the root c_i[x] of the appropriate
   subtree that must contain k (if k is in the tree at all). If c_i[x] has only t − 1 keys,
   execute step 3a or 3b as necessary to ensure that we descend to a node containing at
   least t keys. Finally, recurse on the appropriate child of x.
   (a) If c_i[x] has only t − 1 keys but has an immediate sibling with at least t keys, give
       c_i[x] an extra key by moving a key from x down into c_i[x], moving a key from c_i[x]'s
       immediate left or right sibling up into x, and moving the appropriate child pointer
       from the sibling into c_i[x].
   (b) If c_i[x] and each of its immediate siblings have t − 1 keys, merge c_i[x] with one sibling.
       This involves moving a key down from x into the new merged node to become
       the median key for that node.


Deleting a Key — Case 1
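Initial tree (t = 3):
    root:           [16]
    node [3 7 13]:  leaves [1 2] [4 5 6] [10 11 12] [14 15]
    node [20 23]:   leaves [17 18 19] [21 22] [24 26]

6 deleted:
    root:           [16]
    node [3 7 13]:  leaves [1 2] [4 5] [10 11 12] [14 15]
    node [20 23]:   leaves [17 18 19] [21 22] [24 26]

• The first and simplest case deletes the key directly from a leaf. The leaf [4 5] is left with t − 1 keys.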



Deleting a Key — Cases 2a, 2b
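Initial tree:
    root:           [16]
    node [3 7 13]:  leaves [1 2] [4 5] [10 11 12] [14 15]
    node [20 23]:   leaves [17 18 19] [21 22] [24 26]

13 deleted:
    root:           [16]
    node [3 7 12]:  leaves [1 2] [4 5] [10 11] [14 15]
    node [20 23]:   leaves [17 18 19] [21 22] [24 26]

• Case 2a is illustrated. The predecessor of 13 (the key 12), which lies in the preceding child of x,
  is moved up and takes 13's position. The preceding child had a key to spare in this
  case. Case 2b is symmetric and uses the successor from the following child.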

Deleting a Key — Case 2c
Initial tree:
    root:           [16]
    node [3 7 12]:  leaves [1 2] [4 5] [10 11] [14 15]
    node [20 23]:   leaves [17 18 19] [21 22] [24 26]

7 deleted:
    root:           [16]
    node [3 12]:    leaves [1 2] [4 5 10 11] [14 15]
    node [20 23]:   leaves [17 18 19] [21 22] [24 26]

• Here, both the preceding and the following child have t − 1 keys, the minimum
  allowed. 7 is first pushed down between the two children, which merge into one leaf
  [4 5 7 10 11], and is then removed from that leaf.