Tải bản đầy đủ (.pdf) (26 trang)

Distributed Database Management Systems: Lecture 34

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (82.21 KB, 26 trang )

Distributed Database
Management Systems
Lecture 34


In the previous lecture
• Concluded Data Localization
• Query Optimization

– Components: Search space, cost
model, search strategy
– SS consists of eq. query trees
– SSts could be static, dynamic or
randomized
– Cost model sees response and total
times…


• Query Optimization
–Transmission cost is the most
important
–Another major factor is size of
interm tables
–Database statistics are used to
evaluate size of iterm. Tables
–Selectivity factor, card, size are
some major figures


In today’s Lecture
• Query Optimization


• Centralized QO
–Best access path
–Join Processing

• QO in Distributed Environment.


• Input to Optimizer is a
query tree generated
through QD
• Single relation queries:
executed according to
the best access path
• Queries involving Joins
in three steps


1.Determine the possible
ordering of joins2.Determine the cost of
each ordering
3.Choose the join ordering
with minimal cost


• Cost model assigns
(estimates) costs to
operations based on
cardinality of operands
• Obtained from
Database statistics-



• Two major steps in
Optimization Algorithm
–Best access path for
individual relation with pred
–The best join ordering; two
possibilities.


1- Nested loops
• for each tuple of external
relation (card(n1))
• for each tuple of internal
relation (card(n2))
–join two tuples if the join predicate
is true

• Complexity: n1∗n2 (no index)


• Merge join
• Sorted relations
• Merge relations
• Complexity: n1 + n2 if
relations are
previously sorted


• Example: Select eName

From EMP, ASG, PROJ
Where
EMP.eNo = ASG.eNo &
PROJ.pNo = ASG.pNo &
pName = ‘CAD/CAM’


• EMP has an index on eNo
• ASG has an index on pNo
• PROJ has an index on
pNo and an index on
pName
ASG

eNo
EMP

pNo
PROJ


1- Choose the best access
paths to each relation
• EMP: sequential scan (no
selection on EMP)
• ASG: sequential scan (no
selection on ASG)
• PROJ: index on pName
(there is a selection on
PROJ based on pName)



2- Determine the best join
ordering
–Total 3! orderings are
possible
–Rather than computing for
all, some of them are pruned
–Shown in the tree, next page


EMP



ASG



ASG
E
EMP
A
SG EMP x PROJMP ASG
OJ

PROJ






PROJ
A
PR
SG PROJ x EMP

(ASG⋈EMP )⋈PROJ (PROJ
MP

⋈ASG)⋈E


Join Ordering in
Fragmented Queries


• It is important
• Assumptions• Two alternatives
–Join Ordering
–Replaced by Semi-Joins

• Former is more difficult.


Join Ordering
• Two relations: move the
smaller relation to the
other site
If size(R) < size(S)

S

R

If size(S) < size(R)


• More than 2 relations
–Calculate all possible
costs
–Requires to compute size
of intermediate tables
–Difficult! Lets see why
–EMP ⋈ ASG ⋈ PROJ


Site 2
ASG

eNo
EMP

Site 1

pNo

PROJ

Site 3


• Strategy 1:
EMPsite2, site2 computes
EMP’= EMP ⋈ ASGsite3
computes EMP’ ⋈ PROJ


Site 2
ASG

eNo
EMP

Site 1

pNo

PROJ

Site 3

• Strategy 2:
ASGsite1, site1 computes
EMP’= EMP ⋈ ASGsite3
computes EMP’ ⋈ PROJ


Site 2
ASG

eNo

EMP

Site 1

pNo

PROJ

Site 3

• Strategy 3:
ASGsite3, site3 computes
ASG’= PROJ ⋈ ASGsite1
computes EMP ⋈ ASG’


Site 2
ASG

eNo
EMP

Site 1

pNo

PROJ

Site 3


• Strategy 4:
PROJsite2, site2 computes
PROJ’= PROJ ⋈ ASGsite1
computes EMP ⋈ PROJ’


Site 2
ASG

eNo
EMP

Site 1

pNo

PROJ

Site 3

• Strategy 5:
EMP, PROJsite2, site2
computes PROJ ⋈ ASG ⋈
EMP


Which one to Choose

• We need to know


–Size of operand tables
–Estimate interm tables’ size

• Computing all possibilities
could be lengthy
• Heuristic: Consider only
the size of tables-


×