B-trees
Andreas Kaltenbrunner, Lefteris Kellis & Dani Martí
B-trees, A. Kaltenbrunner, L. Kellis & D. Martí 1
What are B-trees?
• B-trees are balanced search trees: height = O(log n) in the worst case.
• They were designed to work well on Direct Access secondary storage devices
(magnetic disks).
• Similar to red-black trees, but show better performance on disk I/O operations.
• B-trees (and variants like B+ and B* trees) are widely used in database systems.
Motivation
Data structures on secondary storage:
• Memory capacity in a computer system consists broadly of 2 parts:
1. Primary memory: uses memory chips.
2. Secondary storage: based on magnetic disks.
• Magnetic disks are cheaper and have higher capacity.
• But they are much slower because they have moving parts.
B-trees try to read as much information as possible in every disk access operation.
An example
The 21 English consonants as keys of a B-tree:
  root:     [M]
  children: [D H] [Q T X]
  leaves:   [B C] [F G] [J K L] | [N P] [R S] [V W] [Y Z]
• Every internal node x containing n[x] keys has n[x] + 1 children.
• All leaves are at the same depth in the tree.
B-tree: definition
A B-tree T is a rooted tree (with root root[T]) with the following properties:
• Every node x has four fields:
1. The number of keys currently stored in node x, n[x].
2. The n[x] keys themselves, stored in nondecreasing order:
key_1[x] ≤ key_2[x] ≤ ... ≤ key_{n[x]}[x].
3. A boolean value leaf[x], which is True if x is a leaf and False if x is an internal node.
4. n[x] + 1 pointers c_1[x], c_2[x], ..., c_{n[x]+1}[x] to its children.
(Leaf nodes have no children, so their c_i fields are undefined.)
• Representing pointers and keys in a node:
c_1 | key_1 | c_2 | key_2 | ... | c_n | key_n | c_{n+1}
B-tree: definition (II)
Properties (cont):
• The keys key_i[x] separate the ranges of keys stored in each subtree: if k_i is any key
stored in the subtree with root c_i[x], then:
k_1 ≤ key_1[x] ≤ k_2 ≤ key_2[x] ≤ ... ≤ key_{n[x]}[x] ≤ k_{n[x]+1}.
• All leaves have the same depth, which is the tree's height h.
• There are upper and lower bounds on the number of keys in a node.
To specify these bounds we use a fixed integer t ≥ 2, the minimum degree of the
B-tree:
– lower bound: every node other than the root must have at least t − 1 keys
⇒ every internal node other than the root has at least t children.
– upper bound: every node can contain at most 2t − 1 keys
⇒ every internal node has at most 2t children.
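The properties above can be turned into a small checker. The following is a minimal in-memory sketch, not the slides' disk-based representation: the names BTreeNode, inorder and check_btree are illustrative assumptions, n[x] is taken to be len(keys), and leaf[x] is derived from an empty child list.

```python
from dataclasses import dataclass, field


@dataclass
class BTreeNode:
    """One node; n[x] is len(keys), leaf[x] is derived from the child list."""
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self):
        return not self.children


def inorder(x):
    """Yield every key of the subtree rooted at x from left to right."""
    if x.leaf:
        yield from x.keys
        return
    for i, child in enumerate(x.children):
        yield from inorder(child)
        if i < len(x.keys):
            yield x.keys[i]


def check_btree(x, t, is_root=True):
    """Return the height of the subtree at x; raise AssertionError on a violation."""
    n = len(x.keys)
    assert n <= 2 * t - 1, "more than 2t - 1 keys"
    assert is_root or n >= t - 1, "fewer than t - 1 keys in a non-root node"
    if is_root:
        # Nondecreasing in-order traversal covers both the ordering inside
        # each node and the separation of subtrees by the keys.
        keys = list(inorder(x))
        assert keys == sorted(keys), "keys do not separate the subtrees"
    if x.leaf:
        return 0
    assert len(x.children) == n + 1, "internal node needs n[x] + 1 children"
    heights = {check_btree(c, t, is_root=False) for c in x.children}
    assert len(heights) == 1, "leaves are not all at the same depth"
    return heights.pop() + 1


# The consonant tree from the example slide.
example = BTreeNode(["M"], [
    BTreeNode(["D", "H"], [BTreeNode(["B", "C"]), BTreeNode(["F", "G"]),
                           BTreeNode(["J", "K", "L"])]),
    BTreeNode(["Q", "T", "X"], [BTreeNode(["N", "P"]), BTreeNode(["R", "S"]),
                                BTreeNode(["V", "W"]), BTreeNode(["Y", "Z"])]),
])
print(check_btree(example, t=2))  # prints 2, the height of the example tree
```

The consonant tree satisfies the bounds for t = 2 (between 1 and 3 keys per non-root node), so the check passes and reports height 2.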
The height of a B-tree (I)
Example (worst case): a B-tree of height 3 containing the minimum possible number
of keys.
[Figure: the root holds 1 key; every other node holds t − 1 keys, and every internal node other than the root has t children.]
depth   number of nodes
0       1
1       2
2       2t
3       2t^2
Inside each node x, the figure shows the number of keys n[x] it contains.
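The per-depth node counts of this worst-case tree generalize to any height h: at depth i ≥ 1 there are at least 2t^(i−1) nodes, each holding at least t − 1 keys, while the root holds at least one key. A short derivation of the resulting lower bound on the number of keys n, written as a sketch in LaTeX:

```latex
\begin{aligned}
n &\ge 1 + (t-1)\sum_{i=1}^{h} 2t^{\,i-1}
   = 1 + 2(t-1)\,\frac{t^{h}-1}{t-1}
   = 2t^{h} - 1, \\
t^{h} &\le \frac{n+1}{2}
\quad\Longrightarrow\quad
h \le \log_t \frac{n+1}{2}.
\end{aligned}
```

For the height-3 example this gives at least 2t^3 − 1 keys, which is 15 keys when t = 2.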
The height of a B-tree (II)
• Number of disk accesses proportional to the height of the B-tree.
• The worst-case height of a B-tree with n ≥ 1 keys is
h ≤ log_t((n + 1)/2) = O(log_t n).
• Main advantage of B-trees compared to red-black trees:
The base of the logarithm, t, can be much larger.
⇒ B-trees save a factor of about log t over red-black trees in the number of
nodes examined in tree operations.
⇒ The number of disk accesses is substantially reduced.
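To get a feeling for how much the base t matters, the bound can be evaluated numerically. This is a small sketch; the function name max_btree_height is an illustrative assumption, and it uses integer arithmetic on the inequality n ≥ 2t^h − 1 rather than floating-point logarithms.

```python
def max_btree_height(n, t):
    """Largest h with 2 * t**h - 1 <= n, i.e. floor(log_t((n + 1) / 2)).

    Assumes n >= 1 keys and minimum degree t >= 2.
    """
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


# One billion keys: minimum degree 2 versus minimum degree 1001.
print(max_btree_height(10**9, 2))     # prints 28
print(max_btree_height(10**9, 1001))  # prints 2
```

With t = 1001 a tree holding a billion keys has height at most 2, so any key is reached after reading at most two nodes below the root.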
Basic operations on B-trees
Details of the following operations:
• B-Tree-Search
• B-Tree-Create
• B-Tree-Insert
• B-Tree-Delete
Conventions:
• Root of B-tree is always in main memory (Disk-Read on the root is never required)
• Any node passed as a parameter must already have had a Disk-Read operation performed
on it.
Procedures presented are all top down algorithms (no need to back up) starting at
the root of the tree.
Searching a B-tree (I)
2 inputs: x, a pointer to the root node of a subtree,
          k, a key to be searched for in that subtree.
function B-Tree-Search(x, k) returns (y, i) such that key_i[y] = k, or nil
    i ← 1
    while i ≤ n[x] and k > key_i[x]
        do i ← i + 1
    if i ≤ n[x] and k = key_i[x]
        then return (x, i)
    if leaf[x]
        then return nil
        else Disk-Read(c_i[x])
             return B-Tree-Search(c_i[x], k)
At each internal node x we make an (n[x] + 1)-way branching decision.
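The search procedure translates almost line by line into Python. The sketch below assumes an in-memory tree, so Disk-Read disappears; the Node class and the name btree_search are illustrative, and indices are 0-based, unlike the 1-based pseudocode.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty list means leaf


def btree_search(x, k):
    """Return (node, index) with node.keys[index] == k, or None if k is absent."""
    i = 0
    # Find the first key that is not smaller than k.
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if not x.children:          # reached a leaf without finding k
        return None
    # In a disk-based tree, Disk-Read(c_i[x]) would happen here.
    return btree_search(x.children[i], k)
```

On the consonant tree from the example slide, searching for K descends from the root M to the node D H and then to the leaf J K L, where K is found at index 1.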
Searching a B-tree (II)
• Number of disk pages accessed by B-Tree-Search:
O(h) = O(log_t n), with Θ(h) reached in the worst case (e.g. an unsuccessful search)
• The while loop takes O(t) time within each node, so the total CPU time is
O(th) = O(t log_t n)
Creating an empty B-tree
B-Tree-Create(T )
x ← Allocate-Node()
leaf[x] ← true
n[x] ← 0
Disk-Write(x)
root[T] ← x
• Allocate-Node() allocates one disk page to be used as a new node
• B-Tree-Create requires O(1) disk operations and O(1) CPU time
Splitting a node in a B-tree (I)
• Inserting a key into a B-tree is more complicated than inserting into a binary search tree.
• Splitting a full node y (2t − 1 keys) is the fundamental operation during insertion.
• The node is split around its median key key_t[y] into 2 nodes with t − 1 keys each.
• The median key moves up into y's parent (which has to be nonfull).
• If y is the root node, the tree height grows by 1.
[Figure: splitting a full node with t = 4]
Before: x = [... 14 23 ...]      with key_{i−1}[x] = 14, key_i[x] = 23
        y = c_i[x] = [16 17 18 19 20 21 22]
After:  x = [... 14 19 23 ...]   with key_{i−1}[x] = 14, key_i[x] = 19, key_{i+1}[x] = 23
        y = c_i[x] = [16 17 18]
        z = c_{i+1}[x] = [20 21 22]
Splitting a node in a B-tree (II)
3 inputs: x, a nonfull internal node,
          i, an index,
          y, a node such that y = c_i[x] is a full child of x.
B-Tree-Split-Child(x, i, y)
    z ← Allocate-Node()
    leaf[z] ← leaf[y]
    n[z] ← t − 1
    for j ← 1 to t − 1
        do key_j[z] ← key_{j+t}[y]
    if not leaf[y]
        then for j ← 1 to t
                 do c_j[z] ← c_{j+t}[y]
    n[y] ← t − 1
    for j ← n[x] + 1 downto i + 1
        do c_{j+1}[x] ← c_j[x]
    c_{i+1}[x] ← z
    for j ← n[x] downto i
        do key_{j+1}[x] ← key_j[x]
    key_i[x] ← key_t[y]
    n[x] ← n[x] + 1
    Disk-Write(y)
    Disk-Write(z)
    Disk-Write(x)
CPU time used by B-Tree-Split-Child is Θ(t) due to the loops
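With Python lists the shifting loops of B-Tree-Split-Child collapse into slices and list.insert. The following is a sketch under the assumption of an in-memory tree (no Disk-Write calls); Node and split_child are illustrative names, the child is looked up from x instead of being passed in, and the index i is 0-based.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty list means leaf


def split_child(x, i, t):
    """Split the full child y = x.children[i] (2t - 1 keys) around its median."""
    y = x.children[i]
    median = y.keys[t - 1]
    # z receives the t - 1 largest keys and, if y is internal, its t last children.
    z = Node(keys=y.keys[t:], children=y.children[t:])
    # y keeps the t - 1 smallest keys and its first t children.
    y.keys = y.keys[:t - 1]
    y.children = y.children[:t]
    # The median moves up into x, and z becomes the child right after y.
    x.keys.insert(i, median)
    x.children.insert(i + 1, z)


# The split from the figure (t = 4); the two outer leaves are made-up fillers.
x = Node([14, 23], [Node([10, 11, 12]),
                    Node([16, 17, 18, 19, 20, 21, 22]),
                    Node([30, 31, 32])])
split_child(x, 1, 4)
print(x.keys)                         # prints [14, 19, 23]
print([c.keys for c in x.children])
```

The list insertions still move O(t) elements, so the Θ(t) CPU cost of the pseudocode is unchanged.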
Inserting a key into a B-tree (I)
• The key is always inserted in a leaf node
• Inserting is done in a single pass down the tree
• Requires O(h) = O(log_t n) disk accesses
• Requires O(th) = O(t log_t n) CPU time
• Uses B-Tree-Split-Child to guarantee that recursion never descends to a full
node
Inserting a key into a B-tree (II)
2 inputs: T, the B-tree,
          k, the key to insert.
B-Tree-Insert(T, k)
    r ← root[T]
    if n[r] = 2t − 1
        then s ← Allocate-Node()
             root[T] ← s
             leaf[s] ← false
             n[s] ← 0
             c_1[s] ← r
             B-Tree-Split-Child(s, 1, r)
             B-Tree-Insert-Nonfull(s, k)
        else B-Tree-Insert-Nonfull(r, k)
Uses B-Tree-Insert-Nonfull to insert key k into nonfull node x
Inserting a key into a nonfull node of a B-tree
B-Tree-Insert-Nonfull(x, k)
    i ← n[x]
    if leaf[x]
        then while i ≥ 1 and k < key_i[x]
                 do key_{i+1}[x] ← key_i[x]
                    i ← i − 1
             key_{i+1}[x] ← k
             n[x] ← n[x] + 1
             Disk-Write(x)
        else while i ≥ 1 and k < key_i[x]
                 do i ← i − 1
             i ← i + 1
             Disk-Read(c_i[x])
             if n[c_i[x]] = 2t − 1
                 then B-Tree-Split-Child(x, i, c_i[x])
                      if k > key_i[x]
                          then i ← i + 1
             B-Tree-Insert-Nonfull(c_i[x], k)
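Putting B-Tree-Insert, B-Tree-Insert-Nonfull and B-Tree-Split-Child together gives a compact single-pass insertion. The code below is a sketch for an in-memory tree with 0-based indices; the names Node, BTree, split_child and insert_nonfull are illustrative assumptions, and the disk operations of the pseudocode are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty list means leaf


def split_child(x, i, t):
    """Split the full child x.children[i] around its median key."""
    y = x.children[i]
    z = Node(keys=y.keys[t:], children=y.children[t:])
    median = y.keys[t - 1]
    y.keys, y.children = y.keys[:t - 1], y.children[:t]
    x.keys.insert(i, median)
    x.children.insert(i + 1, z)


def insert_nonfull(x, k, t):
    """Insert k into the subtree rooted at x, which must not be full."""
    i = len(x.keys)
    while i > 0 and k < x.keys[i - 1]:
        i -= 1                      # i ends as the number of keys <= k
    if not x.children:              # leaf: place k at its sorted position
        x.keys.insert(i, k)
        return
    if len(x.children[i].keys) == 2 * t - 1:
        split_child(x, i, t)        # never descend into a full node
        if k > x.keys[i]:           # the median moved up to x.keys[i]
            i += 1
    insert_nonfull(x.children[i], k, t)


class BTree:
    def __init__(self, t):
        self.t = t
        self.root = Node()          # B-Tree-Create: an empty leaf

    def insert(self, k):
        r = self.root
        if len(r.keys) == 2 * self.t - 1:
            # Full root: grow the tree by one level, then split the old root.
            self.root = Node(children=[r])
            split_child(self.root, 0, self.t)
        insert_nonfull(self.root, k, self.t)
```

Starting from a tree with root [7 13 16 23] and leaves [1 3 4 5], [10 11], [14 15], [18 19 20 21 22], [24 26] (t = 3), inserting 2, 17, 12 and 6 in that order produces a root [16] with children [3 7 13] and [20 23].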
Inserting a key - Examples (I)
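Initial tree (t = 3):
  root:   [7 13 16 23]
  leaves: [1 3 4 5] [10 11] [14 15] [18 19 20 21 22] [24 26]
2 inserted:
  root:   [7 13 16 23]
  leaves: [1 2 3 4 5] [10 11] [14 15] [18 19 20 21 22] [24 26]
17 inserted (to the previous one):
  root:   [7 13 16 20 23]
  leaves: [1 2 3 4 5] [10 11] [14 15] [17 18 19] [21 22] [24 26]
The full leaf [18 19 20 21 22] is split; its median key 20 moves up into the root.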
Inserting a key - Examples (II)
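Initial tree (t = 3):
  root:   [7 13 16 20 23]
  leaves: [1 2 3 4 5] [10 11] [14 15] [17 18 19] [21 22] [24 26]
12 inserted:
  root:     [16]
  children: [7 13] [20 23]
  leaves:   [1 2 3 4 5] [10 11 12] [14 15] | [17 18 19] [21 22] [24 26]
The full root is split first, so the tree grows in height by 1.
6 inserted (to the previous one):
  root:     [16]
  children: [3 7 13] [20 23]
  leaves:   [1 2] [4 5 6] [10 11 12] [14 15] | [17 18 19] [21 22] [24 26]
The full leaf [1 2 3 4 5] is split; its median key 3 moves up into its parent.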
Deleting a Key from a B-tree
• Similar to insertion, with the addition of a couple of special cases
• Key can be deleted from any node.
• More complicated procedure, but similar performance figures: O(h) disk accesses,
O(th) = O(t log_t n) CPU time
• Deleting is done in a single pass down the tree, but needs to return to the node with
the deleted key if it is an internal node
• In the latter case, the key is first moved down to a leaf. Final deletion always takes
place on a leaf
Deleting a Key — Cases I
• We consider 3 distinct cases for deletion
• Let k be the key to be deleted and x the node currently being examined. Then the cases are:
1. If key k is in node x and x is a leaf, simply delete k from x
2. If key k is in node x and x is an internal node, there are three cases to consider:
(a) If the child y that precedes k in node x has at least t keys (more than the
minimum), then find the predecessor key k′ of k in the subtree rooted at y. Recursively
delete k′ and replace k with k′ in x
(b) Symmetrically, if the child z that follows k in node x has at least t keys, find the
successor k′ of k in the subtree rooted at z, then delete and replace as before. Note that
finding k′ and deleting it can be performed in a single downward pass
(c) Otherwise, if both y and z have only t − 1 (minimum number) keys, merge k and
all of z into y, so that both k and the pointer to z are removed from x. y now
contains 2t − 1 keys, and k is subsequently deleted from y recursively
Deleting a Key — Cases II
3. If key k is not present in internal node x, determine the root c_i[x] of the appropriate
subtree that must contain k (if k is in the tree at all). If c_i[x] has only t − 1 keys, execute
whichever of the following two cases applies, to ensure that we descend to a node containing
at least t keys. Finally, recurse on the appropriate child of x
(a) If c_i[x] has only t − 1 keys but has an immediate sibling with at least t keys, give c_i[x]
an extra key by moving a key from x down into c_i[x], moving a key from c_i[x]'s immediate
left or right sibling up into x, and moving the appropriate child pointer from the sibling
into c_i[x]
(b) If c_i[x] and both of its immediate siblings have t − 1 keys, merge c_i[x] with one sibling.
This involves moving a key down from x into the new merged node to become
the median key for that node.
Deleting a Key — Case 1
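Initial tree (t = 3):
  root:     [16]
  children: [3 7 13] [20 23]
  leaves:   [1 2] [4 5 6] [10 11 12] [14 15] | [17 18 19] [21 22] [24 26]
6 deleted:
  root:     [16]
  children: [3 7 13] [20 23]
  leaves:   [1 2] [4 5] [10 11 12] [14 15] | [17 18 19] [21 22] [24 26]
• The first and simplest case involves deleting the key from a leaf. t − 1 keys remain in that leaf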
Deleting a Key — Cases 2a, 2b
Initial tree (t = 3):
  root:     [16]
  children: [3 7 13] [20 23]
  leaves:   [1 2] [4 5] [10 11 12] [14 15] | [17 18 19] [21 22] [24 26]
13 deleted:
  root:     [16]
  children: [3 7 12] [20 23]
  leaves:   [1 2] [4 5] [10 11] [14 15] | [17 18 19] [21 22] [24 26]
• Case 2a is illustrated. The predecessor of 13, which is the key 12 in the child preceding 13
in x, is moved up and takes 13's position. The preceding child had a key to spare in this
case
Deleting a Key — Case 2c
Initial tree (t = 3):
  root:     [16]
  children: [3 7 12] [20 23]
  leaves:   [1 2] [4 5] [10 11] [14 15] | [17 18 19] [21 22] [24 26]
7 deleted:
  root:     [16]
  children: [3 12] [20 23]
  leaves:   [1 2] [4 5 10 11] [14 15] | [17 18 19] [21 22] [24 26]
• Here, both the preceding and the following child have t − 1 keys, the minimum
allowed. 7 is first pushed down between the two children, which are merged into one leaf
[4 5 7 10 11], and is subsequently removed from that leaf