
ASSOCIATION RULE MINING ON DISTRIBUTED DATA BASED ON THE MAPREDUCE MODEL


POSTS AND TELECOMMUNICATIONS INSTITUTE OF TECHNOLOGY

TRẦN THỊ LỊCH
MASTER'S THESIS IN ENGINEERING
Major: Computer Science
Code: 60.48.01.01
SCIENTIFIC SUPERVISOR:
ASSOC. PROF. DR. TRẦN ĐÌNH QUẾ
HANOI, 2014


1.  tài
 (Data Mining ) 














 , 












- 
       

 kinh








 , 








 













 g, 






 























         
    
    



        



     
           



         


           




 Khai 




















.

  , 
















.

 









































 . 



 adoop, 








MapReduce, 







.
 




, 






 




p.

 

 , 























.
  , 








MapReduce, Hadoop.
 

 
 . 











 
Eclipse.
 










































.
4


1: 























,
 , 





 , quy
.
2: 


























.
Ch3: 









MAPREDUCE
 ,
 , 







 . 








.
4: TRONG



 , 

 , 



















.








1.1 










?
1.2 








1. Làm 






1.3 



quy


1.3.6 

 
 
 


 Phân lo
 
 
 
1.6 













1.8 






  -    



2: 







P
2.1 




2.1.1 
Definition: The support of the rule X → Y is the fraction of transactions in the database D that contain both X and Y:
$$\mathrm{Support}(X \rightarrow Y) = \frac{|\{T \in D : X \cup Y \subseteq T\}|}{|D|}$$
Definition: The confidence of the rule X → Y is the fraction of transactions containing X that also contain Y:
$$\mathrm{Confidence}(X \rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)}$$
Rules of interest must satisfy both a minimum support and a minimum confidence threshold.
Itemsets whose support is at least min_sup are called frequent itemsets.
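As an illustration of the two measures defined above, the following minimal Python sketch computes support and confidence over a small in-memory database. The function names, the `TRANSACTIONS` toy data, and the choice to represent transactions as frozensets are assumptions made for this example; they are not taken from the thesis.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[Transaction]) -> float:
    """support(X ∪ Y) / support(X); returns 0.0 when X never occurs."""
    support_x = support(x, transactions)
    if support_x == 0.0:
        return 0.0
    return support(frozenset(x) | frozenset(y), transactions) / support_x


# Hypothetical toy database (assumption, for illustration only).
TRANSACTIONS: List[Transaction] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
```

On this toy data, {bread, milk} appears in 2 of 4 transactions, and bread appears in 3 of 4, so the rule bread → milk has support 0.5 and confidence 2/3.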
2.1.2 








2.
- 
- 
- .

Find all rules X → Y such that support(X → Y) >= minsup and confidence(X → Y) >= minconf.
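The rule-finding condition above can be sketched in Python as follows. This is only a hedged illustration: it assumes the supports of all relevant itemsets have already been computed and are passed in as a dictionary, and the names `generate_rules` and `SUPPORTS` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def generate_rules(supports: Dict[FrozenSet[str], float],
                   minsup: float, minconf: float) -> List[Rule]:
    """Return (X, Y, support, confidence) for every rule X -> Y that
    meets both thresholds. `supports` maps itemsets to their support."""
    rules: List[Rule] = []
    for itemset, sup in supports.items():
        if len(itemset) < 2 or sup < minsup:
            continue
        # Try every non-empty proper subset of the itemset as the left side.
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                x = frozenset(lhs)
                support_x = supports.get(x)
                if not support_x:
                    continue  # support of X unknown or zero: skip
                conf = sup / support_x
                if conf >= minconf:
                    rules.append((x, itemset - x, sup, conf))
    return rules


# Hypothetical precomputed supports (assumption, for illustration only).
SUPPORTS: Dict[FrozenSet[str], float] = {
    frozenset({"bread"}): 0.75,
    frozenset({"milk"}): 0.75,
    frozenset({"bread", "milk"}): 0.5,
}
```

With minsup = 0.5, both bread → milk and milk → bread have confidence 0.5 / 0.75, so they pass a minconf of 0.6 but fail a minconf of 0.7.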

 


           


 
 
 
2.2 
















2.2.1 

















2.2.2 Thu
2.2.3 





















2.4 





sau:
 C          
         làm
quen  Apriori.
 
         

 
   n  
.











3: 








MAPREDUCE
3.1 










3.1.1 










         
        



 

 

3.1.2 



?







           
reduce


map(k1, v1) -> list(k2, v2)



       
The reduce function processes the intermediate (key, value) pairs that share the same key.
Formally, this function can be described as follows:
reduce(k2, list(v2)) -> list(v3)
Here k2 is the common key of an intermediate group, list(v2) is the list of values in that group, and list(v3) is the list of values of type v3 returned by reduce. Because reduce is applied to many independent intermediate groups, the reduce calls can again be executed in parallel with one another.
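To make the two signatures map(k1, v1) -> list(k2, v2) and reduce(k2, list(v2)) -> list(v3) concrete, here is a minimal single-process Python sketch. It is not Hadoop code: `run_mapreduce`, `item_map`, and `sum_reduce` are hypothetical names, and the grouping step that a real framework performs across machines is simulated with a dictionary.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def run_mapreduce(records: Iterable[Tuple[Hashable, object]],
                  map_fn: Callable[[Hashable, object], List[Tuple[Hashable, object]]],
                  reduce_fn: Callable[[Hashable, List[object]], List[object]],
                  ) -> Dict[Hashable, List[object]]:
    """Apply map_fn to every (k1, v1), group the emitted (k2, v2) pairs
    by k2, then apply reduce_fn to each group independently."""
    groups: Dict[Hashable, List[object]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    # Each group is independent, so these calls could run in parallel.
    return {k2: reduce_fn(k2, values) for k2, values in groups.items()}


def item_map(transaction_id, items):
    """map(k1, v1) -> list(k2, v2): emit (item, 1) for each item."""
    return [(item, 1) for item in items]


def sum_reduce(item, counts):
    """reduce(k2, list(v2)) -> list(v3): total count for one item."""
    return [sum(counts)]
```

For example, `run_mapreduce([(1, ["a", "b"]), (2, ["a"])], item_map, sum_reduce)` groups the emitted pairs by item and sums each group.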
3.1.3 




3.1.4 








e

 
 
ph
 
     
.
      

 



3.2 


















3.2.1 



?
1) Hadoop là 

2) 
.
3.2.2 .

3.2.4 
3.3 Hadoop Distributed File System (HDFS)
3.3.1 




3.4 






          
.
 

     
 (HDFS).

4: 





ph


T        mô
        
Analysis .

           

Problem 1

Problem 2:







// (1) Map transaction t in data source to all Map nodes;
C_1 = {size 1 frequent items};
// (2) min_support = num / total items;
L_1 = {size 1 frequent items >= min_support};
for (k = 1; L_k != ∅; k++) do begin
    // (3) Sort and remove duplicated items from L_k
    C_{k+1} = L_k join_sort L_k;
    for each transaction t in data source with C_{k+1} do
        // (4) Count the number of occurrences of C_{k+1} in t
    // (5) Find L_{k+1} from C_{k+1} that satisfies min_support
    L_{k+1} = {size k+1 frequent items >= min_support};
    end
end
return ∪_k L_k;
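The level-wise loop in the pseudocode above can be sketched in plain Python as follows. This is a sequential, single-machine illustration rather than the thesis implementation: the function name `apriori`, the toy database `DB`, and the choice to express min_support as a fraction of transactions are assumptions, and the subset-pruning step is the standard Apriori refinement, which the pseudocode does not spell out.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Return every frequent itemset together with its support."""
    n = len(transactions)
    if n == 0:
        return {}
    # L1: count single items and keep the frequent ones.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    frequent = dict(current)
    k = 1
    while current:
        # Join step: merge pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Prune step: every k-subset of a candidate must be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k)):
                candidates.add(union)
        # Count how many transactions contain each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {c: v / n for c, v in counts.items() if v / n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy database (assumption, for illustration only).
DB: List[Itemset] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
```

With `min_support = 0.5` on this data, the three single items and the pairs {bread, milk} and {bread, butter} are frequent, while {milk, butter} is not, so the candidate triple is pruned and the loop stops.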



MapReduce




Step 1: Read each transaction of the input data and generate the set of items:
    (<V_1>, <V_2>, ..., <V_n>) where <V_n>: (v_n1, v_n2, ..., v_nm)
Step 2: Sort all item sets <V_n> and generate the sorted data sets <U_n>:
    (<U_1>, <U_2>, ..., <U_n>) where <U_n>: (u_n1, u_n2, ..., u_nm)
Step 3: Loop while <U_n> has a next element;
    // Note: each list U_n is processed separately.
    3.1: Loop for each item from u_n1 to u_nm of <U_n> with NUM_OF_PAIRS
        3.a: Generate the data set <Y_n>: (y_n1, y_n2, ..., y_nl);
             y_nl: (u_nx, u_ny) is the list of pairs of (u_n1, u_n2, ..., u_nm) where u_nx ≠ u_ny
        3.b: Increment the number of occurrences of y_nl;
             // Note: (key, value) = (y_nl, number of occurrences)
    3.2: End of the For loop
Step 4: End of the While loop
The generated data set is the input of the Reducer phase:
    (key, <value>) = (y_nl, <number of occurrences>)
Figure 4.2. MBA Algorithm for Mapper
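The mapper steps in Figure 4.2 can be sketched in Python as below. This is a simplified illustration under stated assumptions: the pair size is fixed at 2 (the figure parameterizes it as NUM_OF_PAIRS), duplicate items within one transaction are dropped, and each pair is emitted with a count of 1 so that the summing is left to the reducer. The name `mba_map` is hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def mba_map(transaction: Iterable[str]) -> List[Tuple[Pair, int]]:
    """Emit ((item_x, item_y), 1) for every pair of distinct items.

    Sorting first (Step 2) ensures that the same two items always
    produce the same key, whatever order they had in the transaction.
    """
    items = sorted(set(transaction))
    return [((x, y), 1) for x, y in combinations(items, 2)]
```

For instance, `mba_map(["milk", "bread", "butter"])` yields the keys ("bread", "butter"), ("bread", "milk") and ("butter", "milk"), each with value 1. Because of the sort, a transaction listing the same items in another order produces identical keys.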




1. Read the (y_nl, <number of occurrences>) data from multiple nodes.
2. Add the values for y_nl to have (y_nl, total number of occurrences)
Figure 4.3. MBA Algorithm for Reducer
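The two reducer steps in Figure 4.3 amount to summing the counts that arrive for each key. A minimal Python sketch, assuming the mapper output from all nodes has been collected into one list of (pair, count) tuples; the name `mba_reduce` is hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def mba_reduce(emitted: Iterable[Tuple[Pair, int]]) -> Dict[Pair, int]:
    """Sum the occurrence counts of each item pair."""
    totals: Dict[Pair, int] = defaultdict(int)
    for pair, count in emitted:
        totals[pair] += count
    return dict(totals)
```

In a real MapReduce job the framework groups the values by key before calling the reducer; here the grouping and the summing are folded into one loop for brevity.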



The complexity is O(k t n); with Map/Reduce it is O(k t n/p).
          







  .










.

          
nhau:
            

 


 

 là 

cùng nhau.



File config.txt contains
File transa.txt: 

The output is the file output.txt




Dataset    Apriori algorithm    MapReduce
csdl1      34.607 s             3.2 s
csdl2      37.063 s             4 s
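As a worked reading of the timings above, the ratio of the two running times gives the observed speedup of the MapReduce version over the traditional Apriori run. The comma in the reported figures is read here as a decimal separator, which is an assumption consistent with the 0 to 40 axis of the chart.

```latex
\[
\text{speedup}_1 = \frac{34.607\ \text{s}}{3.2\ \text{s}} \approx 10.8,
\qquad
\text{speedup}_2 = \frac{37.063\ \text{s}}{4\ \text{s}} \approx 9.3
\]
```

Both measurements therefore show roughly a tenfold reduction in running time.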



[Chart: APRIORI_TT vs. APRIORI_MR on csdl1 and csdl2; axis from 0 to 40]



[Chart: Apriori_TT vs. Apriori_MR; series 1000 and 600; axis from 0 to 80]



       






Hình 



 
       
.
 

 According to Figure 4.8 
         

MapReduce b        




 .
 


      



4.5 



.
           
chính sau:
 
       

 toán
      
        
Hadoop in Java.
 

