
MINISTRY OF EDUCATION AND TRAINING
VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY


GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY


---


Do Thi Lan Anh


MINING DECISION LAWS ON THE DATA BLOCK


Major: Computer science
Code: 9 48 01 01


SUMMARY OF COMPUTER SCIENCE DOCTORAL THESIS



The thesis was completed at Graduate University of Science and Technology, Vietnam Academy of Science and Technology.


Scientific supervisor: Assoc. Prof. Dr. Trinh Dinh Thang


Reviewer 1: Assoc. Prof. Dr. Nguyen Huu Quynh
Reviewer 2: Assoc. Prof. Dr. Do Nang Toan
Reviewer 3: Assoc. Prof. Dr. Pham Van Cuong


The thesis will be defended before the Academy-level PhD Thesis Evaluation Council, meeting at Graduate University of Science and Technology, Vietnam Academy of Science and Technology, at ... o'clock ..., date ... month ... year 202...



The thesis can be found at:



INTRODUCTION

1. The urgency of the thesis


Mining decision laws is the process of determining decision laws on a given decision table, serving the object classification problem. It is one of the popular data mining techniques and has been studied by many domestic and foreign experts, both on the relational model and on its extended models.

Research worldwide and in Vietnam aims at finding meaningful knowledge, especially laws, on different data models and along different research directions. Approaching the data block model in order to track laws occurring in a process that changes over time and over periods is the contribution this thesis seeks to make.


2. The objective of the thesis


The thesis focuses on solving three problems:

- To find decision laws on the data block and the block's slices.

- To find decision laws between object groups on the block when index attribute values change, particularly when smoothing or roughing attribute values.

- To find decision laws between object groups on the block when adding or removing the block's elements.


3. Layout of the thesis



Chapter 1 presents the basic concepts of the data block, data mining, mining decision laws and equivalence relations.

Chapter 2 presents two research results: the first is the MDLB algorithm for finding decision laws on the block and the block's slices; the second is the MDLB_VAC algorithm for finding decision laws on the block when attribute values change. In addition, it gives theoretical results on block mining, analyzes the complexity of the proposed algorithms and tests them experimentally.

Chapter 3 builds a model for increasing or decreasing the object set of decision blocks; it proposes two incremental algorithms, MDLB_OSC1 and MDLB_OSC2, for finding decision laws when the block's object set changes, and reports experiments.


CHAPTER 1. SOME BASIC KNOWLEDGE


1.1. Data mining


1.1.1. Definition of data mining


Data mining is the main stage in the process of discovering knowledge in databases. The output of this process is latent knowledge extracted from the data, which supports forecasting and decision-making in business, management and production activities.



1.1.2. Some data mining techniques

- Classification.



1.2.1. Information system


Definition 1.1 (Information system)

An information system is a quadruple S = (U, A, V, f), where U is a finite, non-empty set of objects (U is also known as the universe); A is a finite, non-empty set of attributes; V = ⋃_{a∈A} V_a is the set of values, where V_a is the value set of the attribute a ∈ A; and f is the information function f: U × A → V such that ∀a ∈ A, ∀u ∈ U: f(u, a) ∈ V_a.


1.2.2. Indiscernibility relation

Given an information system S = (U, A, V, f), each attribute subset P ⊆ A determines a binary relation on U, denoted IND(P), defined as follows:

IND(P) = {(u, v) ∈ U × U | u(a) = v(a), ∀a ∈ P}.

IND(P) is called an indiscernibility relation; it is an equivalence relation and partitions U into equivalence classes, with the partition denoted U/P.
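For illustration, the partition U/P induced by IND(P) can be computed by grouping objects on their value tuples over P. The following is a minimal sketch only; the map-based object representation and the attribute names are assumptions for illustration, not constructs from the thesis:

```java
import java.util.*;

public class IndiscernibilityDemo {
    /** Partition the universe into equivalence classes of IND(P):
     *  u ~ v iff u(a) = v(a) for every attribute a in P. */
    static Collection<List<Map<String, Object>>> classesOf(
            List<Map<String, Object>> universe, List<String> p) {
        Map<List<Object>, List<Map<String, Object>>> classes = new LinkedHashMap<>();
        for (Map<String, Object> u : universe) {
            List<Object> key = new ArrayList<>();
            for (String a : p) key.add(u.get(a));   // value tuple of u over P
            classes.computeIfAbsent(key, k -> new ArrayList<>()).add(u);
        }
        return classes.values();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> universe = List.of(
                Map.of("fever", "high", "cough", "yes"),
                Map.of("fever", "high", "cough", "no"),
                Map.of("fever", "low",  "cough", "no"));
        // The first two objects agree on {fever}, so U/P has two classes.
        System.out.println(classesOf(universe, List.of("fever")).size()); // prints 2
    }
}
```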


1.2.3. Decision table


A decision table is a special information system in which the attribute set A is divided into two disjoint non-empty sets C and D (A = C ∪ D, C ∩ D = ∅), respectively called the conditional attribute set C and the decision attribute set D.

The decision table is denoted by DS = (U, C ∪ D, V, f), or simply DS = (U, C ∪ D).


1.2.4. Decision law

Definition 1.4 (Decision law)

Given a decision table DS = (U, C ∪ D), suppose U/C = {C_1, C_2, ..., C_m} and U/D = {D_1, D_2, ..., D_n} are the partitions generated by C and D. For C_i ∈ U/C, D_j ∈ U/D, a decision law is presented as: C_i → D_j, i = 1..m, j = 1..n.



1.3. The data block model

1.3.1. The block


Definition 1.8

Let R = (id; A_1, A_2, ..., A_n), where id is a non-empty finite index set and each A_i (i = 1..n) is an attribute with a corresponding value domain dom(A_i). A block r on R, denoted r(R), consists of a finite number of elements, each of which is a family of mappings from the index set id to the value domains of the attributes A_i (i = 1..n):

t ∈ r(R) ⟺ t = {t_i : id → dom(A_i)}, i = 1..n.

The block is denoted by r(R) or r(id; A_1, A_2, ..., A_n); when no confusion can arise we simply write r.
1.3.2. The block's slice

Let R = (id; A_1, A_2, ..., A_n) and let r(R) be a block over R. For each x ∈ id we denote by r(R_x) the block with R_x = ({x}; A_1, A_2, ..., A_n) such that:

t_x ∈ r(R_x) ⟺ t_x = {t_i^x = t_i}, i = 1..n, for some t ∈ r(R), t = {t_i : id → dom(A_i)}, i = 1..n, where t_i^x(x) = t_i(x), i = 1..n.

Then r(R_x) is called a slice of the block r(R) at point x, sometimes denoted r_x.

Here, for simplicity, we use the notation: x^(i) = (x; A_i); id^(i) = {x^(i) | x ∈ id}. We call the x^(i) (x ∈ id, i = 1..n) the index attributes of the block scheme R = (id; A_1, A_2, ..., A_n).
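To make the construction concrete, an element t = {t_i : id → dom(A_i)} and its restriction to a slice can be sketched as below; this is only an illustrative representation under assumed string-valued index points, not the thesis's implementation:

```java
import java.util.*;

/** One element t of a block r(R): a family of mappings t_i : id -> dom(A_i), i = 1..n. */
class BlockElement {
    // maps.get(i) sends each index point x in id to this element's value of A_i at x
    private final List<Map<String, Object>> maps;

    BlockElement(List<Map<String, Object>> maps) { this.maps = maps; }

    /** Value of the index attribute x(i) = (x; A_i) on this element. */
    Object valueAt(String x, int i) { return maps.get(i).get(x); }

    /** Restriction t_x of this element to one index point x, i.e. its image
     *  in the slice r(R_x) with R_x = ({x}; A_1, ..., A_n). */
    BlockElement sliceAt(String x) {
        List<Map<String, Object>> restricted = new ArrayList<>();
        for (Map<String, Object> ti : maps) {
            restricted.add(Collections.singletonMap(x, ti.get(x)));
        }
        return new BlockElement(restricted);
    }
}
```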


1.3.3. Relational algebra on the block

The relational algebra on the block includes the operations: union, intersection, subtraction, Cartesian product, Cartesian product with index set, projection, selection, join, and division.

1.4. Conclusion of chapter 1


Chapter 1 of the thesis presents an overview of data mining, data mining techniques, knowledge of mining decision laws, equivalence classes, etc. The last part of the chapter presents the basic concepts of the data block model: blocks, block's slices, and relational algebra on blocks. This knowledge is the basis for the issues presented in the next chapters.


CHAPTER 2. MINING DECISION LAWS ON THE DATA BLOCK WITH VARIABLE ATTRIBUTE VALUES


2.1. Some concepts built on the block

2.1.1. Information block


Definition 2.1

Let R = (id; A_1, A_2, ..., A_n) be a block scheme and r a block over R. Then the information block is a quadruple IB = (U, A, V, f), where U is the set of objects of r, called the object space; A = ⋃_{i=1..n} id^(i) is the set of index attributes of the objects; V = ⋃_{x^(i)∈A} V_{x^(i)}, where V_{x^(i)} is the set of values of the objects corresponding to the index attribute x^(i); and f is an information function f: U × A → V satisfying: ∀u ∈ U, ∀x^(i) ∈ A, f(u, x^(i)) ∈ V_{x^(i)}.


2.1.2. Indiscernibility relation on the block

Definition 2.3

Let IB = (U, A, V, f) be an information block. For each index attribute set P ⊆ A we define an equivalence relation, denoted IND(P), as follows:

IND(P) = {(u, v) ∈ U × U | ∀x^(i) ∈ P: f(u, x^(i)) = f(v, x^(i))},

called the indiscernibility relation on the block.
2.1.3. Decision block

Definition 2.5

Let IB = (U, A, V, f) be an information block with object space U and A = ⋃_{i=1..n} id^(i). Suppose A is divided into two sets C and D such that:

C = {x^(i) | x ∈ id, i = 1..k},  D = {x^(i) | x ∈ id, i = k+1..n}.

Then the information block IB is called a decision block, denoted DB = (U, C ∪ D, V, f), where C is the conditional index attribute set and D is the decision index attribute set.


2.1.4. Decision laws on the block and slice

Definition 2.7

Let DB = (U, C ∪ D) be a decision block with object space U, where C = {x^(i) | x ∈ id, i = 1..k}, D = {x^(i) | x ∈ id, i = k+1..n}, and C^x = {x^(i) | i = 1..k}, D^x = {x^(i) | i = k+1..n}, x ∈ id. Then:

U/C = {C_1, C_2, ..., C_m},  U/C^x = {C^x_1, C^x_2, ..., C^x_{t_x}},
U/D = {D_1, D_2, ..., D_k},  U/D^x = {D^x_1, D^x_2, ..., D^x_{h_x}},

are, correspondingly, the partitions generated by C, C^x, D, D^x. A decision law on the block is denoted by:

C_i → D_j, i = 1..m, j = 1..k,

and on the slice at point x it is denoted by:

C^x_i → D^x_j, i = 1..t_x, j = 1..h_x.
Definition 2.8

Let DB = (U, C ∪ D) be a decision block, C_i ∈ U/C, D_j ∈ U/D, C^x_p ∈ U/C^x, D^x_q ∈ U/D^x, i = 1..m, j = 1..k, p ∈ {1, 2, ..., t_x}, q ∈ {1, 2, ..., h_x}, x ∈ id. Then the support, accuracy and coverage of the decision law C_i → D_j on the block are:

- Support: Sup(C_i, D_j) = |C_i ∩ D_j|,

- Accuracy: Acc(C_i, D_j) = |C_i ∩ D_j| / |C_i|,

- Coverage: Cov(C_i, D_j) = |C_i ∩ D_j| / |D_j|.
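As a made-up numeric illustration (not data from the thesis): if |C_i| = 4, |D_j| = 6 and |C_i ∩ D_j| = 3, then Sup(C_i, D_j) = 3, Acc(C_i, D_j) = 3/4 and Cov(C_i, D_j) = 3/6 = 1/2.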
Definition 2.9

Let DB = (U, C ∪ D) be a decision block, and let C_i ∈ U/C, D_j ∈ U/D be the conditional and decision equivalence classes generated by C and D respectively, with C_i → D_j a decision law on the block DB, i = 1..m, j = 1..k.

- If Acc(C_i → D_j) = 1 then C_i → D_j is called a certain decision law.

- If 0 < Acc(C_i → D_j) < 1 then C_i → D_j is called an uncertain decision law.
Definition 2.10

Let DB = (U, C ∪ D) be a decision block, and let C_i ∈ U/C, D_j ∈ U/D, i = 1..m, j = 1..k, be the conditional and decision equivalence classes generated by C and D respectively; let α, β be two given thresholds (α, β ∈ (0,1)). If Acc(C_i, D_j) ≥ α and Cov(C_i, D_j) ≥ β, then C_i → D_j is called a meaningful decision law.


2.2. The algorithm for mining decision laws on the data block and block's slice (MDLB)


The MDLB algorithm consists of the following steps (a sketch follows the list):

- Step 1: Determine the conditional and decision equivalence classes on the block (on the slice).

- Step 2: Calculate the support matrix on the block (on the slice).

- Step 3: Calculate the accuracy matrix and the coverage matrix.

- Step 4: Find the decision laws on the block.
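The following Java sketch illustrates one possible reading of these four steps, reusing the grouping idea from Section 1.2.2; the data representation and the thresholds alpha, beta are illustrative assumptions, not the thesis's implementation:

```java
import java.util.*;

public class MdlbSketch {
    /** Steps 1-4 of MDLB on one block: partitions, support, accuracy/coverage, laws. */
    static void mine(List<Map<String, Object>> u, List<String> c, List<String> d,
                     double alpha, double beta) {
        // Step 1: conditional and decision equivalence classes U/C and U/D
        List<Set<Integer>> uc = partition(u, c), ud = partition(u, d);
        for (int i = 0; i < uc.size(); i++) {
            for (int j = 0; j < ud.size(); j++) {
                // Step 2: support Sup(Ci, Dj) = |Ci ∩ Dj|
                Set<Integer> inter = new HashSet<>(uc.get(i));
                inter.retainAll(ud.get(j));
                int sup = inter.size();
                if (sup == 0) continue;
                // Step 3: accuracy and coverage
                double acc = (double) sup / uc.get(i).size();
                double cov = (double) sup / ud.get(j).size();
                // Step 4: keep the meaningful laws (Acc >= alpha, Cov >= beta)
                if (acc >= alpha && cov >= beta)
                    System.out.printf("C%d -> D%d (Sup=%d, Acc=%.2f, Cov=%.2f)%n",
                                      i + 1, j + 1, sup, acc, cov);
            }
        }
    }

    /** Partition object indices by their value tuple over the attribute set p. */
    static List<Set<Integer>> partition(List<Map<String, Object>> u, List<String> p) {
        Map<List<Object>, Set<Integer>> cls = new LinkedHashMap<>();
        for (int k = 0; k < u.size(); k++) {
            List<Object> key = new ArrayList<>();
            for (String a : p) key.add(u.get(k).get(a));
            cls.computeIfAbsent(key, x -> new LinkedHashSet<>()).add(k);
        }
        return new ArrayList<>(cls.values());
    }
}
```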


2.3. Mining decision laws on the block when index attribute values change


Definition 2.11 (Smoothing an index attribute value on the block)

Let DB = (U, C ∪ D, V, f) be a decision block with object space U, a ∈ C ∪ D, and V_a the set of existing values of the index attribute a. Suppose Z = {x_s ∈ U | f(x_s, a) = z} is the set of objects whose value on the index attribute a is z. If Z is partitioned into two sets W and Y such that Z = W ∪ Y, W ∩ Y = ∅, with W = {x_p ∈ U | f(x_p, a) = w, w ∉ V_a} and Y = {x_q ∈ U | f(x_q, a) = y, y ∉ V_a}, then we say the value z of the index attribute a is smoothed into the two new values w and y.



Definition 2.12 (Roughing index attribute values on the block)

Let DB = (U, C ∪ D, V, f) be a decision block with object space U, a ∈ C ∪ D, and V_a the set of existing values of the index attribute a. Suppose f(x_p, a) = w and f(x_q, a) = y are respectively the values of x_p and x_q on the index attribute a (p ≠ q). If at some time we have f(x_p, a) = f(x_q, a) = z (z ∉ V_a), then we say the two values w, y of a are roughened into the new value z.
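As a made-up illustration (not from the thesis data): a fever attribute whose value "high" is smoothed could split Z into the objects given the two new values "moderately high" and "very high"; roughing is the inverse step, merging those two values back into one new value.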


Theorem 2.1

Let DB = (U, C ∪ D, V, f) be a decision block with object space U, a ∈ C ∪ D, and V_a the set of existing values of the index attribute a. Then two equivalence classes E_p, E_q (E_p, E_q ∈ U/E, E ∈ {C, D}) are roughened into a new equivalence class E_s if and only if ∀a_j ≠ a: f(E_p, a_j) = f(E_q, a_j).
Theorem 2.2

Let DB = (U, C ∪ D, V, f) be a decision block with object space U, a ∈ C ∪ D, and V_a the set of existing values of the index attribute a. Then an equivalence class E_s (E_s ∈ U/E, E ∈ {C, D}) is smoothed into two new equivalence classes E_p, E_q if and only if we can put f(E_p, a) = w, f(E_q, a) = y and E_p ∪ E_q = E_s, with w, y ∉ V_a, w ≠ y.



Theorem 2.3

Let DB = (U, C ∪ D, V, f) be a decision block and α, β two given thresholds (α, β ∈ (0,1)). If C_i → D_j is a meaningful decision law on the decision block, then it is also a meaningful decision law on any slice of the decision block at x ∈ id.
2.3.1. Smoothing and roughening the conditional equivalence classes on the decision block and on the slice

Proposition 2.3

Let DB = (U, C ∪ D, V, f) be a decision block, a = x^(i) ∈ C, and V_a the set of existing values of the conditional index attribute a; suppose the value z of a is smoothed into the two new values w and y.

If the conditional equivalence class C_s ∈ U/C (f(C_s, a) = z) is smoothed into two new conditional equivalence classes C_p, C_q (f(C_p, a) = w, f(C_q, a) = y, with w, y ∉ V_a), then on the slice r_x there exists an equivalence class C^x_i satisfying C_s ⊆ C^x_i that is also smoothed into two new conditional equivalence classes C^x_i' and C^x_i'' satisfying C_p ⊆ C^x_i', C_q ⊆ C^x_i'' (f(C^x_i', a) = w, f(C^x_i'', a) = y).

Proposition 2.5

Let DB = (U, C ∪ D, V, f) be a decision block, a = x^(i) ∈ C, and V_a the set of existing values of the conditional index attribute a; suppose the values w and y of a are roughened into the new value z.

If two conditional equivalence classes C_p, C_q ∈ U/C (f(C_p, a) = w, f(C_q, a) = y) are roughened into a new conditional equivalence class C_s ∈ U/C (f(C_s, a) = z), then on the slice r_x the two conditional equivalence classes C^x_i, C^x_j satisfying C_p ⊆ C^x_i, C_q ⊆ C^x_j are also roughened into a new conditional equivalence class C^x_k satisfying C_s ⊆ C^x_k.


2.3.2. Smoothing and roughening the decision equivalence classes on the decision block and on the slice

Proposition 2.7

Let DB = (U, C ∪ D, V, f) be a decision block, a = x^(i) ∈ D, and V_a the set of existing values of the decision index attribute a; suppose the value z of a is smoothed into the two new values w and y.

If the decision equivalence class D_s ∈ U/D (f(D_s, a) = z) is smoothed into two new decision equivalence classes D_p, D_q (f(D_p, a) = w, f(D_q, a) = y, with w, y ∉ V_a), then on the slice r_x there exists a decision equivalence class D^x_i satisfying D_s ⊆ D^x_i that is also smoothed into two new decision equivalence classes D^x_i' and D^x_i'' satisfying D_p ⊆ D^x_i', D_q ⊆ D^x_i'' (f(D^x_i', a) = w, f(D^x_i'', a) = y).


Proposition 2.9

Let DB = (U, C ∪ D, V, f) be a decision block, a = x^(i) ∈ D, and V_a the set of existing values of the decision index attribute a; suppose the values w and y of a are roughened into the new value z.

If two decision equivalence classes D_p, D_q ∈ U/D (f(D_p, a) = w, f(D_q, a) = y) are roughened into a new decision equivalence class D_s ∈ U/D (f(D_s, a) = z), then on the slice r_x the two decision equivalence classes D^x_i, D^x_j satisfying D_p ⊆ D^x_i, D_q ⊆ D^x_j are also roughened into a new decision equivalence class D^x_k satisfying D_s ⊆ D^x_k.


2.3.4. The algorithm for mining decision laws when smoothing or roughening index attribute values on the block and the slice (MDLB_VAC)

The MDLB_VAC algorithm consists of the following steps (a sketch of the incremental support update follows the list):

- Step 1: Calculate the support matrix Sup(C, D) of the original block.

- Step 2: Incrementally calculate the support matrix Sup(C', D') on the block after roughening/smoothing the value of the index attribute.

- Step 3: Calculate the accuracy matrix Acc(C', D') and the coverage matrix Cov(C', D') after roughening/smoothing the value of the index attribute, from the matrix Sup(C', D').

- Step 4: Find the decision laws on the block.
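For Step 2, one consequence of the definitions can be exploited: since Sup(C, D) = |C ∩ D| and roughening merges two disjoint classes C_p, C_q into C_s = C_p ∪ C_q, the support row of C_s is simply the sum of the two old rows (smoothing is the inverse split). The sketch below shows this row merge; it is an assumed reading of the incremental step, not the thesis's exact procedure:

```java
public class SupRowMerge {
    /** Roughening merges classes p and q (p != q) of the conditional partition:
     *  C_s = C_p ∪ C_q is disjoint, so Sup(C_s, D_j) = Sup(C_p, D_j) + Sup(C_q, D_j). */
    static int[][] mergeRows(int[][] sup, int p, int q) {
        int[][] out = new int[sup.length - 1][];
        for (int i = 0, r = 0; i < sup.length; i++) {
            if (i == q) continue;               // row q is folded into row p
            out[r] = sup[i].clone();
            if (i == p)
                for (int j = 0; j < sup[p].length; j++)
                    out[r][j] = sup[p][j] + sup[q][j];
            r++;
        }
        return out;
    }
}
```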


2.4. Complexity of the Sup matrix algorithms on the block and on the slice

2.6. Conclusion

This chapter presents the first results of the thesis: building some basic concepts for mining decision laws on the block; on that basis, a number of related properties, propositions and theorems were stated and proved. Further:

- Building the MDLB algorithm to find decision laws on the block and slice.

- Proposing and proving some results on the relationship between roughening and smoothing the values of a conditional or decision attribute on the block and slice; at the same time, proposing the MDLB_VAC algorithm to calculate the support matrices on the block and slice and to find decision laws when the value of an index attribute changes.


CHAPTER 3. MINING DECISION LAWS ON THE BLOCK WITH CHANGED OBJECT SET


3.1. Model of adding and removing objects on the block and slice


Proposition 3.1

Let DB = (U, C ∪ D, V, f) be a decision block, and let AN and DM be the sets of objects added to and removed from the decision block DB; N_ij and M_ij denote the numbers of added and removed objects falling into C_i ∩ D_j. Then we have:

Acc(C', D') = (Acc(C'_i, D'_j))_ij, with i = 1..m+p, j = 1..h+q, and

Acc(C'_i, D'_j) =
  (|C_i ∩ D_j| + N_ij - M_ij) / (|C_i| + Σ_{j'=1..h+q} N_ij' - Σ_{j'=1..h} M_ij'),  i = 1..m, j = 1..h,
  N_ij / (|C_i| + Σ_{j'=1..h+q} N_ij' - Σ_{j'=1..h} M_ij'),  i = 1..m, j = h+1..h+q,
  N_ij / Σ_{j'=1..h+q} N_ij',  i = m+1..m+p, j = 1..h+q.

Proposition 3.3

Let DB = (U, C ∪ D, V, f) be a decision block, and let AN and DM be the sets of objects added to and removed from the decision block DB. Then we have:

Cov(C', D') = (Cov(C'_i, D'_j))_ij of size (m+p) × (h+q), with i = 1..m+p, j = 1..h+q, and

Cov(C'_i, D'_j) =
  (|C_i ∩ D_j| + N_ij - M_ij) / (|D_j| + Σ_{i'=1..m+p} N_i'j - Σ_{i'=1..m} M_i'j),  i = 1..m, j = 1..h,
  N_ij / (|D_j| + Σ_{i'=1..m+p} N_i'j - Σ_{i'=1..m} M_i'j),  i = m+1..m+p, j = 1..h,
  N_ij / Σ_{i'=1..m+p} N_i'j,  i = 1..m+p, j = h+1..h+q.
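As a made-up check of the first case of Proposition 3.1: with |C_1| = 10, |C_1 ∩ D_1| = 6, two added objects falling into C_1 ∩ D_1 (N_11 = 2) and no other additions or removals, Acc(C'_1, D'_1) = (6 + 2) / (10 + 2) = 2/3.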


3.2. Incrementally calculating Acc and Cov when adding and removing objects on the decision block

3.2.1. Adding an object x to the decision block

Case 1: A new conditional class and a new decision class are created.

Acc(C'_{m+1}, D'_{h+1}) = 1 and Cov(C'_{m+1}, D'_{h+1}) = 1,
∀j = 1..h: Acc(C'_{m+1}, D'_j) = Cov(C'_{m+1}, D'_j) = 0,
∀i = 1..m: Acc(C'_i, D'_{h+1}) = Cov(C'_i, D'_{h+1}) = 0.

Otherwise, ∀i = 1..m, ∀j = 1..h: Acc(C'_i, D'_j) = Acc(C_i, D_j) and Cov(C'_i, D'_j) = Cov(C_i, D_j).


Case 2: Only a new conditional class is created.

Acc(C'_{m+1}, D'_{j*}) = 1 and Cov(C'_{m+1}, D'_{j*}) = 1 / (|D_{j*}| + 1).

If k ≠ j* then Acc(C'_{m+1}, D'_k) = Cov(C'_{m+1}, D'_k) = 0.

If i ≠ m+1 then Acc(C'_i, D'_{j*}) = Acc(C_i, D_{j*}) and Cov(C'_i, D'_{j*}) = |C_i ∩ D_{j*}| / (|D_{j*}| + 1).

Otherwise, ∀i ≠ m+1, ∀j ≠ j*: Acc(C'_i, D'_j) = Acc(C_i, D_j) and Cov(C'_i, D'_j) = Cov(C_i, D_j).


Case 3: Only a new decision class is created.

Acc(C'_{i*}, D'_{h+1}) = 1 / (|C_{i*}| + 1) and Cov(C'_{i*}, D'_{h+1}) = 1.

If i ≠ i* then Acc(C'_i, D'_{h+1}) = Cov(C'_i, D'_{h+1}) = 0.

If k ≠ h+1 then Acc(C'_{i*}, D'_k) = |C_{i*} ∩ D_k| / (|C_{i*}| + 1) and Cov(C'_{i*}, D'_k) = Cov(C_{i*}, D_k).

Otherwise, ∀i ≠ i*, ∀j ≠ h+1: Acc(C'_i, D'_j) = Acc(C_i, D_j) and Cov(C'_i, D'_j) = Cov(C_i, D_j).


Case 4: No new conditional class or decision class is created; the object falls into the existing classes C_{i*} and D_{j*}.

Acc(C'_{i*}, D'_{j*}) = (|C_{i*} ∩ D_{j*}| + 1) / (|C_{i*}| + 1) and Cov(C'_{i*}, D'_{j*}) = (|C_{i*} ∩ D_{j*}| + 1) / (|D_{j*}| + 1).

- If k ≠ j* then Acc(C'_{i*}, D'_k) = |C_{i*} ∩ D_k| / (|C_{i*}| + 1) and Cov(C'_{i*}, D'_k) = Cov(C_{i*}, D_k).

- If u ≠ i* then Acc(C'_u, D'_{j*}) = Acc(C_u, D_{j*}) and Cov(C'_u, D'_{j*}) = |C_u ∩ D_{j*}| / (|D_{j*}| + 1).

- If i ≠ i* and j ≠ j* then Acc(C'_i, D'_j) = Acc(C_i, D_j) and Cov(C'_i, D'_j) = Cov(C_i, D_j).
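In Case 4 the new object touches only row i* of Acc and column j* of Cov, so it suffices to patch three counts. A minimal sketch maintaining the raw counts |C_i|, |D_j| and |C_i ∩ D_j| follows; the class layout is an illustrative assumption, not the thesis code:

```java
class IncrementalAccCov {
    int[] cSize;    // |C_i|
    int[] dSize;    // |D_j|
    int[][] inter;  // |C_i ∩ D_j|

    /** Case 4: the added object joins the existing classes C_{i*} and D_{j*}. */
    void addObject(int iStar, int jStar) {
        cSize[iStar]++; dSize[jStar]++; inter[iStar][jStar]++;
    }

    /** Removing an object of C_{i*} ∩ D_{j*} (Section 3.2.2) is the symmetric update. */
    void removeObject(int iStar, int jStar) {
        cSize[iStar]--; dSize[jStar]--; inter[iStar][jStar]--;
    }

    // Acc and Cov are then read off the counts, matching the case formulas above.
    double acc(int i, int j) { return (double) inter[i][j] / cSize[i]; }
    double cov(int i, int j) { return (double) inter[i][j] / dSize[j]; }
}
```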


3.2.2. Removing an object x from the decision block

Suppose the removed object belongs to the classes C_{i*} and D_{j*}. Then:

Acc(C'_{i*}, D'_{j*}) = (|C_{i*} ∩ D_{j*}| - 1) / (|C_{i*}| - 1) and Cov(C'_{i*}, D'_{j*}) = (|C_{i*} ∩ D_{j*}| - 1) / (|D_{j*}| - 1).

- If k ≠ j* then Acc(C'_{i*}, D'_k) = |C_{i*} ∩ D_k| / (|C_{i*}| - 1) and Cov(C'_{i*}, D'_k) = Cov(C_{i*}, D_k).

- If u ≠ i* then Acc(C'_u, D'_{j*}) = Acc(C_u, D_{j*}) and Cov(C'_u, D'_{j*}) = |C_u ∩ D_{j*}| / (|D_{j*}| - 1).

3.3. The algorithm for mining decision laws by incrementally calculating the Acc and Cov matrices after adding and removing objects (MDLB_OSC1)

- Step 1: Calculate the accuracy matrix Acc(C, D) and the coverage matrix Cov(C, D) of the block before adding and removing objects.

- Step 2: Incrementally calculate the accuracy matrix Acc(C', D') and the coverage matrix Cov(C', D') after adding and removing objects.

- Step 3: Remove the rows/columns in the matrices Acc(C', D') and Cov(C', D') that have value 0.

- Step 4: Generate the decision laws on the block.


3.4. Complexity of the algorithm for mining decision laws by incrementally calculating the Acc and Cov matrices after adding and removing objects on the decision block

Proposition 3.5: The complexity of the algorithm determining Acc and Cov is O(|U|²).

Proposition 3.6: The complexity of incrementally calculating Acc and Cov when adding N objects is O(N·|U|²).

Proposition 3.7: The complexity of incrementally calculating Acc and Cov when removing M objects is O(M·|U|²).

Proposition 3.8: The complexity of deleting the rows/columns of the Acc and Cov matrices that have value 0 is O(|U|²).
3.5. Incrementally calculating Sup when adding and removing objects on the decision block

When adding N objects and removing M objects, we have:

Sup(C'_i, D'_j) = Sup(C_i, D_j) + N_ij - M_ij, i = 1..m+p, j = 1..h+q,

where M_ij = 0 and Sup(C_i, D_j) = 0 for the newly created classes, i.e. for i = m+1..m+p or j = h+1..h+q.


3.6. The algorithm for mining decision laws by incrementally calculating the Sup matrix after adding and removing objects (MDLB_OSC2)

The MDLB_OSC2 algorithm consists of the following steps (a sketch follows the list):

- Step 1: Calculate Sup(C, D) of the block before adding and removing objects.

- Step 2: Incrementally calculate the support matrix Sup(C', D') after adding and removing objects.

- Step 3: Delete the rows/columns in Sup(C', D') that have value 0.

- Step 4: Calculate Acc(C', D') and Cov(C', D') from the values of Sup(C', D').

- Step 5: Generate the decision laws on the block.
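Step 2 can be sketched as below, assuming all matrices are padded to size (m+p) × (h+q) so that Sup and M are zero on the rows/columns of the newly created classes, as in Section 3.5; this is an illustrative sketch, not the thesis code:

```java
public class MdlbOsc2Sketch {
    /** Step 2 of MDLB_OSC2: Sup(C'_i, D'_j) = Sup(C_i, D_j) + N_ij - M_ij. */
    static int[][] updateSup(int[][] sup, int[][] n, int[][] m) {
        int[][] out = new int[sup.length][sup[0].length];
        for (int i = 0; i < sup.length; i++)
            for (int j = 0; j < sup[0].length; j++)
                out[i][j] = sup[i][j] + n[i][j] - m[i][j];
        return out;
    }
}
```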


3.7. Complexity of the algorithm for mining decision laws by incrementally calculating the Sup matrix after adding and removing objects on the decision block

Proposition 3.9: The complexity of incrementally calculating the Sup matrix when adding N objects is O(N·|U|).

Proposition 3.10: The complexity of incrementally calculating the Sup matrix when removing M objects is O(M·|U|).

Proposition 3.11: The complexity of incrementally calculating the Sup matrix to find the decision laws when adding N objects is O(|U|²).


Proposition 3.13: The complexity of incrementally calculating the Sup matrix when removing M objects on the block's slice at x ∈ id is O(M·|U|).


3.10. Experimental algorithms

3.10.1. Experimental objectives

(1) Evaluate the execution of the MDLB and MDLB_VAC algorithms.

(2) Evaluate the execution of the MDLB_OSC1 and MDLB_OSC2 algorithms; in addition, compare the execution time of the MDLB_OSC1 algorithm with that of the MDLB_OSC2 algorithm.
3.10.2. Experimental data

The experiments were performed on 3 data sets taken from Pediatrics Departments A and B of Bach Mai Hospital 2, from March 10, 2020 to March 14, 2020. The data were collected and pre-processed; each data set includes 3 conditional index attributes, namely the disease symptoms fever, cough and runny nose, and 2 decision index attributes, namely the treatment regimen and the fever virus level, monitored over 4 days.


The numbers of elements of the data sets are:

  Database name       BVBM2KNA   BVBM2KNB   KID PATIENT FEVER VIRUS
  Number of objects   160        1360       939



3.10.3. Experimental environment

The algorithms were programmed in Java. The experimental environment is a PC with an Intel(R) Core(TM) i5 2.5 GHz configuration, 4 GB RAM, and Windows 7 OS.


3.10.4. Experimental results

After running the 3 algorithms on the data sets, we obtained the following results:

- With problem 1: finding the decision laws on the block and slice:

Figure 3.4: Found decision laws on the block

- With problem 2: finding decision laws on the block and slice when smoothing or roughening index attribute values:

Figure 3.5: Relationship between the number of decision laws ...

Figure 3.8: Calculating the matrices Sup, Acc, Cov before and after smoothing

Figure 3.10: Calculating the matrices Sup, Acc, Cov before and after roughening

Figure 3.11: Found decision laws after smoothing and roughening attribute values


- With problem 3: finding the decision laws on the block and slice when adding or removing objects:

+ Results of the MDLB_OSC2 algorithm:

Comment: the two methods give the same rule set from the same source set; they differ only in execution time.


3.11. Conclusion

This chapter built the model of adding and removing objects on the decision block; the propositions for incrementally calculating the Acc, Cov and Sup matrices have been demonstrated. Based on these, two algorithms for finding decision laws on the block and slice were proposed:

- The algorithm MDLB_OSC1 incrementally calculates the matrices Acc and Cov to find the decision laws on the block and slice.

- The algorithm MDLB_OSC2 incrementally calculates the matrix Sup to find the decision laws.

The chapter ends with a comparison of the two proposed algorithms and the experimental setup.


CONCLUSION

1) Main results of the thesis

The thesis focuses on the problem of mining decision laws on the block in several cases, with the following main results:

- Built a model of mining decision laws on the data block, with the definitions, theorems and propositions proved.

- Proposed three algorithms to find decision laws on the data block in the following cases: fixed block data; changing index attribute values; and changing object set of the data block.
2) Future research of the thesis

- Continue to study mining decision laws in further cases: blocks whose attribute set changes, incomplete data, etc.

- Mining decision laws on chains of decision blocks linked together (similar to blockchain technology).


NEW FINDINGS OF THE DOCTORAL DISSERTATION

- Built a model of mining decision laws on the data block, with the definitions, theorems and propositions proved.

- Proposed three algorithms to find decision laws on the data block in the following cases: fixed block data; changing index attribute values; and changing object set of the data block.


LIST OF WORKS OF THE AUTHOR

1. Thang Trinh Dinh, Tuyen Tran Minh, Lan Anh Do Thi, "Mining decision laws on data block has variable attribute values", Proceedings of the 19th National Conference: Selected Problems of Information Technology & Communication, Hanoi, 01-02/10/2016, pp. 163-169.

2. Thang Trinh Dinh, Tuyen Tran Minh, Lan Anh Do Thi, Quyen Nguyen Thi, "Some results on the reclaim of decision laws on the data block has variable attribute values", Proceedings of the 10th National Conference on Fundamental and Applied Research of Information Technology (FAIR'10), Da Nang, 17-18/08/2017, pp. 623-632.

3. Thang Trinh Dinh, Lan Anh Do Thi, "Some algorithms to determine support matrix on the data block has variable attribute values", Proceedings of the 21st National Conference: Selected Problems of Information Technology & Communication, Thanh Hoa, 27-28/07/2018, pp. 216-225.

4. ... Advanced Research in Computer Science, Volume 10, Issue 2, March-April 2019.

5. Lan Anh Do Thi, Thang Trinh Dinh, "An incremental method for calculating Acc and Cov of decision laws on the data block has the object set changed", Journal on Research, Development and Application of Information Technology & Communication, Journal of Science and Technology of the Ministry of Information and Communications, Vol. 2019, No. 1, 2019, pp. 1-10.

