Distributed Database
Management Systems
Lecture 34
In the previous lecture
• Concluded Data Localization
• Query Optimization
– Components: Search space, cost
model, search strategy
– SS consists of eq. query trees
– SSts could be static, dynamic or
randomized
– Cost model sees response and total
times…
• Query Optimization
–Transmission cost is the most
important
–Another major factor is size of
interm tables
–Database statistics are used to
evaluate size of iterm. Tables
–Selectivity factor, card, size are
some major figures
In today’s Lecture
• Query Optimization
• Centralized QO
–Best access path
–Join Processing
• QO in Distributed Environment.
• Input to Optimizer is a
query tree generated
through QD
• Single relation queries:
executed according to
the best access path
• Queries involving Joins
in three steps
1.Determine the possible
ordering of joins2.Determine the cost of
each ordering
3.Choose the join ordering
with minimal cost
• Cost model assigns
(estimates) costs to
operations based on
cardinality of operands
• Obtained from
Database statistics-
• Two major steps in
Optimization Algorithm
–Best access path for
individual relation with pred
–The best join ordering; two
possibilities.
1- Nested loops
• for each tuple of external
relation (card(n1))
• for each tuple of internal
relation (card(n2))
–join two tuples if the join predicate
is true
• Complexity: n1∗n2 (no index)
• Merge join
• Sorted relations
• Merge relations
• Complexity: n1 + n2 if
relations are
previously sorted
• Example: Select eName
From EMP, ASG, PROJ
Where
EMP.eNo = ASG.eNo &
PROJ.pNo = ASG.pNo &
pName = ‘CAD/CAM’
• EMP has an index on eNo
• ASG has an index on pNo
• PROJ has an index on
pNo and an index on
pName
ASG
eNo
EMP
pNo
PROJ
1- Choose the best access
paths to each relation
• EMP: sequential scan (no
selection on EMP)
• ASG: sequential scan (no
selection on ASG)
• PROJ: index on pName
(there is a selection on
PROJ based on pName)
2- Determine the best join
ordering
–Total 3! orderings are
possible
–Rather than computing for
all, some of them are pruned
–Shown in the tree, next page
EMP
⋈
ASG
⋈
ASG
E
EMP
A
SG EMP x PROJMP ASG
OJ
PROJ
⋈
⋈
PROJ
A
PR
SG PROJ x EMP
(ASG⋈EMP )⋈PROJ (PROJ
MP
⋈ASG)⋈E
Join Ordering in
Fragmented Queries
• It is important
• Assumptions• Two alternatives
–Join Ordering
–Replaced by Semi-Joins
• Former is more difficult.
Join Ordering
• Two relations: move the
smaller relation to the
other site
If size(R) < size(S)
S
R
If size(S) < size(R)
• More than 2 relations
–Calculate all possible
costs
–Requires to compute size
of intermediate tables
–Difficult! Lets see why
–EMP ⋈ ASG ⋈ PROJ
Site 2
ASG
eNo
EMP
Site 1
pNo
PROJ
Site 3
• Strategy 1:
EMPsite2, site2 computes
EMP’= EMP ⋈ ASGsite3
computes EMP’ ⋈ PROJ
Site 2
ASG
eNo
EMP
Site 1
pNo
PROJ
Site 3
• Strategy 2:
ASGsite1, site1 computes
EMP’= EMP ⋈ ASGsite3
computes EMP’ ⋈ PROJ
Site 2
ASG
eNo
EMP
Site 1
pNo
PROJ
Site 3
• Strategy 3:
ASGsite3, site3 computes
ASG’= PROJ ⋈ ASGsite1
computes EMP ⋈ ASG’
Site 2
ASG
eNo
EMP
Site 1
pNo
PROJ
Site 3
• Strategy 4:
PROJsite2, site2 computes
PROJ’= PROJ ⋈ ASGsite1
computes EMP ⋈ PROJ’
Site 2
ASG
eNo
EMP
Site 1
pNo
PROJ
Site 3
• Strategy 5:
EMP, PROJsite2, site2
computes PROJ ⋈ ASG ⋈
EMP
Which one to Choose
• We need to know
–Size of operand tables
–Estimate interm tables’ size
• Computing all possibilities
could be lengthy
• Heuristic: Consider only
the size of tables-