
Distributed Database Management Systems: Lecture 30


In the previous lecture
• Locking-based concurrency control (CC)
• Timestamp-ordering-based CC
• Concluded transaction management (TM)


In this lecture
• Basic concepts of query optimization
• Query processing (QP) in centralized and distributed DBs


Introduction
• SQL is one of the success factors of the RDBMS
• The query processor transforms complex queries into concise and simple ones


• Query processing is a critical performance issue


• QP is a complex problem, especially in a DDBS environment


• The main function of QP is to transform an SQL query into an equivalent relational algebra query (a lower-level language)
• The transformation must achieve both correctness and efficiency


• Correctness is straightforward, since well-defined transformation rules exist
• Efficiency is harder: an SQL query can have many equivalent expressions in relational algebra


• Consider the tables
• EMP(eNo, eName, title)
• ASG(eNo, pNo, resp, dur)
• PROJ(pNo, pName, budget, loc)
• Query: Get the names of employees who are managing a project


• SELECT eName
  FROM EMP, ASG
  WHERE EMP.eNo = ASG.eNo
  AND resp = 'Manager'


• Π_eName (σ_{resp='Manager' ∧ EMP.eNo=ASG.eNo} (EMP × ASG))
• Π_eName (EMP ⋈_eNo (σ_{resp='Manager'} (ASG)))
• The second expression obviously needs fewer computing resources, since it avoids the Cartesian product
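
The difference between the two expressions can be made concrete with a small sketch. The rows below are made up for illustration, and the function names plan_cartesian and plan_select_then_join are hypothetical; the point is only that both plans return the same names while building very different numbers of intermediate tuples.

```python
from itertools import product

# Toy instances of EMP(eNo, eName, title) and ASG(eNo, pNo, resp, dur).
# All rows are invented for this illustration.
EMP = [
    ("E1", "Ann", "Engineer"),
    ("E2", "Bob", "Analyst"),
    ("E3", "Cy", "Engineer"),
]
ASG = [
    ("E1", "P1", "Manager", 12),
    ("E2", "P1", "Analyst", 24),
    ("E3", "P2", "Manager", 6),
]


def plan_cartesian(emp, asg):
    """Apply the selection on top of the full Cartesian product."""
    pairs = list(product(emp, asg))  # |EMP| * |ASG| intermediate tuples
    names = {e[1] for e, a in pairs if a[2] == "Manager" and e[0] == a[0]}
    return names, len(pairs)


def plan_select_then_join(emp, asg):
    """Select managers first, then join the small result with EMP on eNo."""
    managers = [a for a in asg if a[2] == "Manager"]
    by_eno = {e[0]: e for e in emp}  # lookup table on the join attribute
    joined = [(by_eno[a[0]], a) for a in managers if a[0] in by_eno]
    names = {e[1] for e, _ in joined}
    return names, len(managers) + len(joined)


names_1, work_1 = plan_cartesian(EMP, ASG)
names_2, work_2 = plan_select_then_join(EMP, ASG)
```

On this toy data the first plan materializes every EMP/ASG pair, while the second only touches the manager rows, and the gap widens quickly as the tables grow.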


• Centralized QP has to choose the best query execution plan
• Distributed QP is more complex; it also involves selecting the sites at which the query is executed



• The same query in a DDBS
• Suppose EMP and ASG are horizontally fragmented (HF) as
• EMP1 = σ_{eNo ≤ 'E3'} (EMP)
• EMP2 = σ_{eNo > 'E3'} (EMP)
• ASG1 = σ_{eNo ≤ 'E3'} (ASG)
• ASG2 = σ_{eNo > 'E3'} (ASG)
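
As a minimal sketch of the fragmentation above, each fragment is just a selection on eNo. The helper name fragment and the sample rows are assumptions made for illustration; the comparison relies on plain string ordering, which only behaves as intended while the identifiers have a single digit.

```python
def fragment(relation, predicate):
    """Return the horizontal fragment of `relation` selected by `predicate`."""
    return [t for t in relation if predicate(t)]


# Invented sample rows for EMP(eNo, eName, title).
EMP = [
    ("E1", "Ann", "Engineer"),
    ("E2", "Bob", "Analyst"),
    ("E3", "Cy", "Engineer"),
    ("E4", "Dee", "Programmer"),
    ("E5", "Eli", "Analyst"),
]

# String comparison is an assumption that holds for E1..E9 only.
EMP1 = fragment(EMP, lambda t: t[0] <= "E3")
EMP2 = fragment(EMP, lambda t: t[0] > "E3")

# The fragments are disjoint and their union reconstructs EMP.
reconstructed = sorted(EMP1 + EMP2)
```

Because the two predicates are complementary, every tuple lands in exactly one fragment, which is what lets the union at the result site rebuild the original relation.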


• Further suppose that fragments ASG1, ASG2, EMP1 and EMP2 are stored at sites 1, 2, 3 and 4 respectively, and that the result is expected at site 5


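Strategy 1 (figure: operations performed at the fragment sites)
• Site 1: ASG1' = σ_{resp='Manager'} (ASG1), sent to site 3
• Site 2: ASG2' = σ_{resp='Manager'} (ASG2), sent to site 4
• Site 3: EMP1' = EMP1 ⋈_eNo ASG1', sent to site 5
• Site 4: EMP2' = EMP2 ⋈_eNo ASG2', sent to site 5
• Site 5: result = EMP1' ∪ EMP2'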

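Strategy 2 (figure: all fragments shipped to the result site)
• Site 1 sends ASG1, site 2 sends ASG2, site 3 sends EMP1 and site 4 sends EMP2 to site 5
• Site 5: result = (EMP1 ∪ EMP2) ⋈_eNo σ_{resp='Manager'} (ASG1 ∪ ASG2)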


Let us assume
• size(EMP) = 400
• size(ASG) = 1000
• tuple access cost = 1 unit
• tuple transfer cost = 10 units
• There are 20 managers
• Data is distributed evenly across all sites



Strategy 1
• Produce ASG': 20 * 1 = 20
• Transfer ASG' to the sites of EMP: 20 * 10 = 200
• Produce EMP': (10 + 10) * 1 * 2 = 40
• Transfer EMP' to the result site: 20 * 10 = 200
• Total = 460


Strategy 2
• Transfer EMP to site 5: 400 * 10 = 4000
• Transfer ASG to site 5: 1000 * 10 = 10000
• Produce ASG' by selecting ASG: 1000 * 1 = 1000
• Join EMP and ASG': 400 * 20 * 1 = 8000
• Total = 23000
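
The arithmetic for both strategies can be replayed with a short sketch. The constants come from the assumptions stated earlier; the function names are hypothetical, and the join term of Strategy 2 (every EMP tuple compared with every one of the 20 manager tuples) is an assumption chosen because it reproduces the 8000 figure.

```python
SIZE_EMP = 400        # tuples in EMP
SIZE_ASG = 1000       # tuples in ASG
TUPLE_ACCESS = 1      # cost units per tuple access
TUPLE_TRANSFER = 10   # cost units per tuple transfer
MANAGERS = 20         # ASG tuples with resp = 'Manager'


def strategy_1_cost():
    """Select and join at the fragment sites, ship only the small results."""
    produce_asg = MANAGERS * TUPLE_ACCESS         # 10 + 10 selected tuples
    transfer_asg = MANAGERS * TUPLE_TRANSFER      # ASG' to the EMP sites
    produce_emp = MANAGERS * TUPLE_ACCESS * 2     # join at sites 3 and 4
    transfer_emp = MANAGERS * TUPLE_TRANSFER      # EMP' to the result site
    return produce_asg + transfer_asg + produce_emp + transfer_emp


def strategy_2_cost():
    """Ship every fragment to site 5 and do all the work there."""
    transfer_emp = SIZE_EMP * TUPLE_TRANSFER
    transfer_asg = SIZE_ASG * TUPLE_TRANSFER
    produce_asg = SIZE_ASG * TUPLE_ACCESS           # scan all of ASG
    join = SIZE_EMP * MANAGERS * TUPLE_ACCESS       # assumed nested-loop join
    return transfer_emp + transfer_asg + produce_asg + join


cost_1 = strategy_1_cost()
cost_2 = strategy_2_cost()
ratio = cost_2 / cost_1
```

Under these assumptions Strategy 2 costs 50 times as much as Strategy 1, and most of that difference is the 14000 units spent shipping whole relations.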


Query Optimization
• An important aspect of QP
• Goal: minimize resource consumption
• Total cost = I/O cost + CPU cost + communication cost
• Only the first two apply in a centralized DB


• Communication cost dominates in a WAN
• It is less dominant in a LAN, so the total cost should be considered there
• QO can also aim to maximize throughput


Operators' Complexity
• Select, Project (without duplicate elimination): O(n)
• Project (with duplicate elimination), Group: O(n log n)
• Join, Semi-Join, Division, Set Operators: O(n log n)
• Cartesian Product: O(n^2)
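
To illustrate where these complexity classes come from, here is a minimal sketch of three of the operators. The function names and rows are invented for illustration, and sorting is only one possible way to eliminate duplicates; it is used here because it matches the O(n log n) bound listed above.

```python
def project(relation, idx):
    """Projection without duplicate elimination: one pass, O(n)."""
    return [t[idx] for t in relation]


def project_distinct(relation, idx):
    """Projection with duplicate elimination by sorting: O(n log n)."""
    values = sorted(t[idx] for t in relation)
    result = []
    for v in values:
        # After sorting, duplicates are adjacent, so one comparison suffices.
        if not result or result[-1] != v:
            result.append(v)
    return result


def cartesian_product(r, s):
    """Every tuple of r paired with every tuple of s: O(n^2) for equal sizes."""
    return [(a, b) for a in r for b in s]


# Invented rows for ASG(eNo, pNo, resp, dur).
ASG = [
    ("E1", "P1", "Manager", 12),
    ("E2", "P1", "Analyst", 24),
    ("E3", "P2", "Manager", 6),
]

all_resp = project(ASG, 2)
distinct_resp = project_distinct(ASG, 2)
pairs = cartesian_product(ASG, ASG)
```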


Characterization of
Query Processors


• Types of Optimization
–Exhaustive search: compute the cost of each strategy to find the optimal one
–May be very costly when there are many options and many fragments
–Heuristics: restrict the search space instead of costing every strategy

• Optimization Timing
–Static: during compilation
• Sizes of intermediate tables are not always known
• Cost is justified by repeated execution

–Dynamic: during execution
• Sizes of intermediate tables are known
• Re-optimization may be required

• Statistics
–Relation/Fragment: cardinality, size of a tuple, fraction of tuples participating in a join with another relation
–Attribute: cardinality of the domain, actual number of distinct values
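
As a rough illustration of how such statistics could be gathered, the sketch below computes them over small in-memory relations. The helper names and the rows are assumptions made for this example; a real system would keep these figures in its catalog rather than recompute them per query.

```python
def cardinality(relation):
    """Number of tuples in the relation or fragment."""
    return len(relation)


def distinct_values(relation, idx):
    """Actual number of distinct values of the attribute at position idx."""
    return len({t[idx] for t in relation})


def join_fraction(r, r_idx, s, s_idx):
    """Fraction of tuples of r that have a join partner in s."""
    s_keys = {t[s_idx] for t in s}
    return sum(1 for t in r if t[r_idx] in s_keys) / len(r)


# Invented rows for EMP(eNo, eName, title) and ASG(eNo, pNo, resp, dur).
EMP = [
    ("E1", "Ann", "Engineer"),
    ("E2", "Bob", "Analyst"),
    ("E3", "Cy", "Engineer"),
    ("E4", "Dee", "Programmer"),
]
ASG = [
    ("E1", "P1", "Manager", 12),
    ("E2", "P1", "Analyst", 24),
    ("E2", "P2", "Analyst", 6),
    ("E3", "P2", "Manager", 10),
]

asg_card = cardinality(ASG)
asg_distinct_eno = distinct_values(ASG, 0)
asg_distinct_resp = distinct_values(ASG, 2)
emp_join_fraction = join_fraction(EMP, 0, ASG, 0)
```

Here three of the four EMP tuples find a partner in ASG, so an optimizer using this statistic would estimate that a quarter of EMP drops out of the join.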

