Distributed Database Management Systems
Lecture 17
In this Lecture
• Continue with vertical fragmentation (VF)
– Information Requirement
– Attribute affinities
Virtual University of Pakistan
Replicating key attributes in several fragments does not violate the disjointness condition.
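The claim can be checked mechanically: disjointness only has to hold for the non-key attributes. The sketch below assumes fragments are modelled as Python sets of attribute names. The helper `is_disjoint` and the two fragments of PROJ(jNo, jName, budget, loc) are hypothetical and chosen only for illustration.

```python
def is_disjoint(fragments: list[set[str]], key: set[str]) -> bool:
    """Hypothetical check of the disjointness condition for vertical fragments.

    Key attributes are ignored because every fragment is expected to repeat them.
    """
    seen: set[str] = set()
    for frag in fragments:
        non_key = frag - key          # key attributes may be replicated freely
        if seen & non_key:            # a non-key attribute occurs in two fragments
            return False
        seen |= non_key
    return True


key = {"jNo"}
# Hypothetical split of PROJ(jNo, jName, budget, loc) into two fragments.
proj1 = {"jNo", "budget"}
proj2 = {"jNo", "jName", "loc"}

print(is_disjoint([proj1, proj2], key))                # True: only the key is shared
print(is_disjoint([proj1, proj2 | {"budget"}], key))   # False: budget is duplicated
```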
Vertical Fragmentation
Information Requirements
• The basic goal of VF is access efficiency.
• The information requirement is application-based.
• Attribute affinities are obtained from more primitive usage data.
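• 80-20 rule: the most active 20% of queries account for about 80% of data accesses, so usage data need only be collected for those queries.
• Attribute usage values: given a set of queries Q = {q1, q2, …, qq} that will run on the relation R[A1, A2, …, An], an attribute usage value is defined for every query qi and attribute Aj.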
Attribute Usage Value
$$use(q_i, A_j) = \begin{cases} 1 & \text{if attribute } A_j \text{ is referenced by query } q_i \\ 0 & \text{otherwise} \end{cases}$$
The vector use(qi, •) can be defined accordingly.
PROJ(jNo, jName, budget, loc)
q1: SELECT BUDGET FROM PROJ WHERE JNO=Value
q2: SELECT JNAME, BUDGET FROM PROJ
q3: SELECT JNAME FROM PROJ WHERE LOC=Value
q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC=Value
Let A1 = jNo, A2 = jName, A3 = budget, A4 = loc.
Attribute Usage Matrix
     A1  A2  A3  A4
q1    1   0   1   0
q2    0   1   1   0
q3    0   1   0   1
q4    0   0   1   1
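The matrix can be derived directly from the definition of use(qi, Aj). In this sketch the referenced attributes of each query (SELECT list plus WHERE clause) are written out by hand rather than parsed from SQL; the function names `use` and `usage_matrix` are illustrative.

```python
ATTRIBUTES = ["jNo", "jName", "budget", "loc"]  # A1..A4

# Attributes referenced by q1..q4, listed manually from the SQL above.
QUERIES = {
    "q1": {"budget", "jNo"},
    "q2": {"jName", "budget"},
    "q3": {"jName", "loc"},
    "q4": {"budget", "loc"},
}


def use(referenced: set[str], attribute: str) -> int:
    """use(q_i, A_j): 1 if the query references the attribute, 0 otherwise."""
    return 1 if attribute in referenced else 0


def usage_matrix(queries: dict[str, set[str]], attributes: list[str]) -> list[list[int]]:
    """One row per query, one column per attribute."""
    return [[use(refs, a) for a in attributes] for refs in queries.values()]


aum = usage_matrix(QUERIES, ATTRIBUTES)
for name, row in zip(QUERIES, aum):
    print(name, row)   # q1 [1, 0, 1, 0] ... q4 [0, 0, 1, 1]
```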
• The attribute usage matrix (AUM) does not represent the query frequency at different sites.
• The attribute affinity between two attributes Ai and Aj, aff(Ai, Aj), of a relation R(A1, A2, …, An) with respect to the set of applications Q = {q1, q2, …, qq} is defined as:
$$aff(A_i, A_j) = \sum_{k \,\mid\, use(q_k, A_i) = 1 \,\wedge\, use(q_k, A_j) = 1} \; \sum_{\forall S_l} ref_l(q_k)\, acc_l(q_k)$$
where $ref_l(q_k)$ is the number of accesses to attributes (Ai, Aj) for each execution of qk at site Sl, and $acc_l(q_k)$ is the application access frequency measure of qk at site Sl.
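The double sum translates into a short loop: skip every query that does not use both attributes, then add ref times acc over all sites. This is a sketch; the matrix layout (rows are queries, columns are attributes or sites), the 0-based indices, and the toy numbers are assumptions made for illustration.

```python
from typing import Sequence

Matrix = Sequence[Sequence[int]]


def aff(use: Matrix, ref: Matrix, acc: Matrix, i: int, j: int) -> int:
    """Attribute affinity aff(A_i, A_j) with 0-based attribute indices.

    use[k][a]: use(q_k, A_a)
    ref[k][l]: ref_l(q_k), accesses per execution of q_k at site S_l
    acc[k][l]: acc_l(q_k), access frequency of q_k at site S_l
    """
    total = 0
    for k, row in enumerate(use):
        if row[i] == 1 and row[j] == 1:                          # q_k uses A_i and A_j
            total += sum(r * a for r, a in zip(ref[k], acc[k]))  # sum over sites S_l
    return total


# Toy data (made up): two queries, two attributes, two sites.
use_toy = [[1, 1], [1, 0]]
ref_toy = [[1, 1], [2, 2]]
acc_toy = [[2, 3], [4, 0]]

print(aff(use_toy, ref_toy, acc_toy, 0, 1))  # 5: only q_0 uses both attributes
```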
Attribute Usage Matrix
     A1  A2  A3  A4
q1    1   0   1   0
q2    0   1   1   0
q3    0   1   0   1
q4    0   0   1   1

Access Frequency Matrix
     S1  S2  S3
q1   15  20  10
q2    5   0   0
q3   25  25  25
q4    3   0   0
acc1(q1) = 15, acc2(q1) = 20, acc3(q1) = 10
acc1(q2) = 5, acc2(q2) = 0, acc3(q2) = 0
acc1(q3) = 25, acc2(q3) = 25, acc3(q3) = 25
acc1(q4) = 3, acc2(q4) = 0, acc3(q4) = 0
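Taking $ref_l(q_k) = 1$ for every query and site:
$$aff(A_3, A_4) = \sum_{k=4} \sum_{l=1}^{3} ref_l(q_k)\, acc_l(q_k) = 3 \cdot 1 + 0 + 0 = 3$$
since q4 is the only query that accesses both A3 and A4.
aff(A1, A2) = 0, since no query accesses both A1 and A2.
$$aff(A_2, A_2) = \underbrace{(5 \cdot 1 + 0 + 0)}_{q_2} + \underbrace{(25 \cdot 1 + 25 \cdot 1 + 25 \cdot 1)}_{q_3} = 5 + 75 = 80$$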
Attribute affinity matrix (AA)
     A1  A2  A3  A4
A1   45   0  45   0
A2    0  80   5  75
A3   45   5  53   3
A4    0  75   3  78
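The whole AA matrix follows from the usage and access frequency matrices of the example. The sketch assumes, as the worked values do, that $ref_l(q_k) = 1$ for every query and site, so each qualifying query contributes the sum of its access frequencies; the names `USE`, `ACC`, and `affinity_matrix` are illustrative.

```python
# Usage matrix: rows q1..q4, columns A1..A4.
USE = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
# Access frequencies acc_l(q_k): rows q1..q4, columns S1..S3.
ACC = [
    [15, 20, 10],
    [5, 0, 0],
    [25, 25, 25],
    [3, 0, 0],
]
REF = 1  # assumed ref_l(q_k) for every query and site


def affinity_matrix(use: list[list[int]], acc: list[list[int]], ref: int = REF) -> list[list[int]]:
    n = len(use[0])
    aa = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            aa[i][j] = sum(
                ref * sum(acc_k)                      # sum over sites S_l
                for use_k, acc_k in zip(use, acc)
                if use_k[i] == 1 and use_k[j] == 1    # q_k uses both A_i and A_j
            )
    return aa


for row in affinity_matrix(USE, ACC):
    print(row)
```

Note that AA is symmetric and its diagonal entry aff(Ai, Ai) is the total access frequency of all queries that use Ai.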
Clustering Algorithm
• VF is based on identifying groups of attributes using the attribute affinity matrix AA.
• Vertical clustering is based on the Bond Energy Algorithm (BEA), which takes AA as input and identifies groups of similar items.
• Attributes with large affinity values are grouped together, and attributes with low affinity values are grouped together.
• BEA takes AA as input and generates the clustered affinity matrix CA.
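This lecture does not spell out BEA's steps. The sketch below follows the formulation commonly given in distributed-database textbooks, which is an assumption here: bond(Ax, Ay) = Σz aff(Az, Ax)·aff(Az, Ay), the first two columns are placed as given, and each further column Ak goes into the gap (Ai, Aj) that maximises cont(Ai, Ak, Aj) = 2·bond(Ai, Ak) + 2·bond(Ak, Aj) − 2·bond(Ai, Aj), with bonds to the empty boundary columns taken as 0. The function names are hypothetical.

```python
from typing import Optional

AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(aa: list[list[int]], x: Optional[int], y: Optional[int]) -> int:
    """Assumed bond(A_x, A_y); None stands for a boundary column with bond 0."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa: list[list[int]], i: Optional[int], k: int, j: Optional[int]) -> int:
    """Assumed net contribution of placing column A_k between A_i and A_j."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)


def bea(aa: list[list[int]]) -> tuple[list[int], list[list[int]]]:
    """Return a column order and the matrix permuted into that order (CA)."""
    n = len(aa)
    order = list(range(min(n, 2)))            # first two columns placed as-is
    for k in range(2, n):
        best_pos, best_cont = 0, None
        for pos in range(len(order) + 1):     # every gap, including both ends
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            c = cont(aa, left, k, right)
            if best_cont is None or c > best_cont:
                best_pos, best_cont = pos, c
        order.insert(best_pos, k)
    ca = [[aa[r][c] for c in order] for r in order]  # rows follow the columns
    return order, ca


order, ca = bea(AA)
print([f"A{i + 1}" for i in order])  # ['A1', 'A3', 'A2', 'A4']
```

Ties are broken by keeping the leftmost gap, and because the first two columns are fixed, the result can depend on the initial attribute order.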
Global Affinity Measure (AM)
• The affinity measure is a single value calculated from the positions of the elements of AA and the values of their neighboring elements.
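$$AM = \sum_{i=1}^{n} \sum_{j=1}^{n} aff(A_i, A_j)\,\big[\,aff(A_i, A_{j-1}) + aff(A_i, A_{j+1}) + aff(A_{i-1}, A_j) + aff(A_{i+1}, A_j)\,\big]$$
where the boundary values are zero:
$$aff(A_0, A_j) = aff(A_i, A_0) = aff(A_{n+1}, A_j) = aff(A_i, A_{n+1}) = 0$$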
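A direct transcription of the AM formula, using 0-based indices and treating any neighbour outside the matrix as one of the zero boundary entries. `PERMUTED` is the AA matrix with rows and columns reordered to A1, A3, A2, A4; it is included only to show that placing large values next to each other raises AM. The function name is illustrative.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]
# Same affinities, rows and columns permuted to A1, A3, A2, A4.
PERMUTED = [
    [45, 45, 0, 0],
    [45, 53, 5, 3],
    [0, 5, 80, 75],
    [0, 3, 75, 78],
]


def affinity_measure(aa: list[list[int]]) -> int:
    n = len(aa)

    def at(i: int, j: int) -> int:
        # aff(A_0, .), aff(., A_0), aff(A_{n+1}, .), aff(., A_{n+1}) are all 0
        return aa[i][j] if 0 <= i < n and 0 <= j < n else 0

    return sum(
        aa[i][j] * (at(i, j - 1) + at(i, j + 1) + at(i - 1, j) + at(i + 1, j))
        for i in range(n)
        for j in range(n)
    )


print(affinity_measure(AA))        # 7532
print(affinity_measure(PERMUTED))  # 68660
```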