
Advanced Review

The lattice-based approaches for mining association rules: a review
Tuong Le1,2 and Bay Vo3,4*
The traditional methods for mining association rules (ARs) include two phases: mining frequent itemsets (FIs)/frequent closed itemsets (FCIs)/frequent maximal itemsets (FMIs) and generating ARs from the FIs/FCIs/FMIs. Lattice-based approaches (LBAs) for mining ARs are newer approaches that also include two phases: building a frequent itemset lattice (FIL)/frequent closed itemset lattice (FCIL) and generating ARs from the lattice. The total mining time of LBAs for mining ARs is lower than that of the traditional methods. Moreover, the most important advantage of LBAs is that the algorithms build the lattice only once and can then mine ARs with many different confidences or many different minimum supports (the thresholds have to be greater than or equal to the threshold used to build the lattice) without mining FIs/FCIs again. In this article, we describe a number of existing LBAs for mining ARs on static databases, including lattice building and rule generation. In addition, in today's online systems, data often change through operations such as insert, delete, and update. Hence, a number of LBAs for mining ARs on dynamic databases are also described. Finally, the complexity of the LBAs for mining ARs is discussed in detail. © 2016 John Wiley & Sons, Ltd
How to cite this article:
WIREs Data Mining Knowl Discov 2016, 6:140–151 (Volume 6, July/August 2016). doi: 10.1002/widm.1181

*Correspondence to:
1 Division of Data Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
2 Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
3 Faculty of Information Technology, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
4 College of Electronics and Information Engineering, Sejong University, Seoul, Republic of Korea
Conflict of interest: The authors have declared no conflicts of interest for this article.

INTRODUCTION

Data mining is the process of analyzing data to find knowledge for use in intelligent systems. It covers many problems, such as mining association rules (ARs), classification,1–6 clustering,7,8 text mining,9 and their applications.2,10 Mining ARs, including ARs, minimal non-redundant association rules (MNARs), and most generalization association rules (MGARs), is a model widely used in market basket analysis, online e-commerce such as Amazon and Alibaba, and several other recommendation systems.
Traditional approaches for mining ARs consist of two steps: mining frequent itemsets (FIs)/frequent closed itemsets (FCIs)/frequent maximal itemsets (FMIs),11,12,13 and generating rules from those itemsets. Several variants of FIs have been proposed, such as high utility itemsets (itemsets whose utility satisfies a given threshold),14–27 top-k high utility itemsets (the k itemsets with the highest utility),28 weighted patterns (patterns with weighted items),29–31 erasable itemsets (itemsets that can be eliminated without greatly affecting the factory's profit),32–34 and weighted erasable patterns (erasable itemsets that take the distinct weight of each item into account).35,36 In addition, several types of representations that limit the number of FIs have been proposed, such as FCIs,37–41 FMIs,42–47 top-k FIs,48,49 top-rank-k FIs,50,51 and FIs with constraints.52 In traditional approaches for mining ARs, researchers usually focus on the first phase (mining FIs/FCIs/FMIs). However, the second phase (rule generation) takes a lot of time when the number of FIs/FCIs/FMIs is large. Therefore, lattice-based approaches (LBAs) for mining ARs have been proposed to overcome this weakness. Generally, these approaches build a frequent itemset lattice (FIL) or a frequent closed itemset lattice (FCIL) in the first phase. In the second phase, they only traverse the lattice to generate ARs. Because generating rules from a lattice has lower complexity than traditional approaches such as Apriori or hash tables, the total mining time of LBAs for mining ARs is lower than that of the traditional methods. Moreover, the largest advantage of LBAs for mining ARs is that the algorithms build the lattice only once and can then mine ARs with many different confidences or many different minimum supports (the thresholds have to be greater than or equal to the threshold used to build the lattice) without mining FIs/FCIs/FMIs again. Therefore, LBAs are widely used to mine ARs nowadays.

In addition, in today's online systems, especially e-commerce systems, data often change through operations such as add, delete, and update, which raises the need to adapt AR mining methods to these new requirements. There have been several studies on mining patterns/rules on dynamic databases. In this article, we review LBAs for mining ARs, including the lattice building and rule generation phases. Furthermore, a number of LBAs for mining ARs on dynamic databases are surveyed. Next, the complexity of LBAs for mining ARs is analyzed. Finally, some challenges of LBAs and their potential applications in the near future are introduced.
The rest of the article is organized as follows. The section “Classical Approaches for Mining ARs” presents the classical approaches for mining FIs/FCIs and mining ARs/MNARs/MGARs. In the section “The FIL/FCIL Building,” we report the existing approaches for building FILs and FCILs. Next, the section “LBAs for Mining ARs” presents a number of LBAs for mining ARs. Some incremental LBAs for mining ARs are subsequently presented in the section “LBAs for Mining ARs on Dynamic Databases.” Then, the section “Complexity Analysis” analyzes the complexity of LBAs for mining ARs. The conclusion is presented in the section “Conclusion and Future Researches.”

CLASSICAL APPROACHES FOR MINING ARs
Consider a database (DB) consisting of n transactions, each of which contains a number of items. The transaction database DBe presented in Table 1 is used as an example for illustrative purposes throughout this article.

TABLE 1 | A Transaction Database (DBe) Example
Transaction   Items
1             A, C, T, W
2             C, D, W
3             A, C, T, W
4             A, C, D, W
5             A, C, D, T, W, E
6             C, D, T, E
The support of an itemset X, denoted by σ(X), is the number of transactions in DB that contain all items of X. An itemset X is an FI if and only if σ(X) ≥ ⌈minSup × n⌉, in which minSup is a user-given minimum support threshold. Currently, there are many algorithms for mining FIs, which may be divided into three main groups: (1) Methods that use a candidate generate-and-test strategy: they generate frequent 1-itemsets, which are then used to generate candidate 2-itemsets, and so on until no more candidates can be generated. Apriori53 and BitTableFI54 are exemplar algorithms. (2) Methods that adopt a divide-and-conquer strategy: they compress DB into a tree structure and mine FIs from this tree by using a divide-and-conquer strategy. FP-Growth55 and FP-Growth*56 are exemplar algorithms. (3) Methods that use a hybrid approach: these methods use vertical data formats to compress DB and also mine FIs by using a divide-and-conquer strategy. Eclat,57 dEclat,58 Index-BitTableFI,59 DBVFI,60 PrePost,61 FIN,62 NSFI,63 and PrePost+64 are some examples.
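The support definition and the ⌈minSup × n⌉ threshold can be checked on DBe with the short Python sketch below. It enumerates candidate itemsets by brute force rather than implementing any of the algorithms named above, and the names DB_E, support, and frequent_itemsets are illustrative only; itemsets are written as alphabetically sorted strings (so the lattice node AWC appears as ACW).

```python
from itertools import combinations
from math import ceil

# DBe from Table 1: transaction ID -> set of items
DB_E = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W", "E"},
    6: {"C", "D", "T", "E"},
}


def support(itemset, db):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for items in db.values() if set(itemset) <= items)


def frequent_itemsets(db, min_sup):
    """Return {itemset: support} for all X with sigma(X) >= ceil(min_sup * n)."""
    threshold = ceil(min_sup * len(db))
    all_items = sorted(set().union(*db.values()))
    result = {}
    for k in range(1, len(all_items) + 1):
        level = {}
        for combo in combinations(all_items, k):
            s = support(combo, db)
            if s >= threshold:
                level["".join(combo)] = s
        if not level:  # no frequent k-itemset, so no frequent (k+1)-itemset either
            break
        result.update(level)
    return result


fis = frequent_itemsets(DB_E, 0.5)
print(len(fis), fis["AW"], fis["ACW"])  # 19 FIs; both AW and ACW have support 4
```

With minSup = 50%, the threshold is ⌈0.5 × 6⌉ = 3, so item E (support 2) is excluded while, for example, ACTW (support 3) is kept.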
An FI is called an FCI if none of its supersets has the same support. For instance, consider DBe and minSup = 50%. The itemsets AW and ACW are both FIs because σ(AW) = σ(ACW) = 4 > ⌈minSup × n⌉ = ⌈50% × 6⌉ = 3. However, AW is not an FCI because its superset ACW has the same support. Of the two, only ACW is an FCI. Most of the previously proposed algorithms for mining FCIs can be categorized as (1) generate-and-test, (2) divide-and-conquer, or (3) hybrid methods. The generate-and-test (Apriori-based) approach uses a level-wise search to mine FCIs. A well-known algorithm is Close.65 The divide-and-conquer approach uses compact data structures to mine FCIs efficiently. Examples are CLOSET39 and CLOSET+.56 The hybrid approaches integrate the previous two. Typically, the database is first transformed into a vertical or compressed data format. The approach then uses pruning properties to quickly prune nonclosed itemsets. Examples are CHARM, dCHARM,58 DBV-Miner,41 DCI_PLUS,66 and NAFCP.37
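A minimal sketch of the closedness test, assuming the FIs are already available (they are recomputed here by brute force so the block runs on its own). It keeps an FI only if no frequent proper superset has the same support; looking only among FIs is enough because a superset with equal support is itself frequent. The function names are hypothetical and are not taken from Close, CLOSET, or CHARM.

```python
from itertools import combinations
from math import ceil

# DBe from Table 1: transaction ID -> set of items
DB_E = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W", "E"},
    6: {"C", "D", "T", "E"},
}


def frequent_itemsets(db, min_sup):
    """Brute-force {itemset: support} for every FI (tiny examples only)."""
    threshold = ceil(min_sup * len(db))
    items = sorted(set().union(*db.values()))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(1 for t in db.values() if set(combo) <= t)
            if s >= threshold:
                result["".join(combo)] = s
    return result


def closed_itemsets(fis):
    """Keep X only if no proper superset of X in fis has the same support."""
    return {
        x: s
        for x, s in fis.items()
        if not any(set(x) < set(y) and s == t for y, t in fis.items())
    }


fcis = closed_itemsets(frequent_itemsets(DB_E, 0.5))
print(sorted(fcis))
```

On DBe with minSup = 50%, AW is discarded because ACW has the same support, while ACW itself is kept.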
An AR is an implication expression of the form X → Y, where X and Y are disjoint itemsets, i.e., X ∩ Y = ∅. The strength of an AR can be measured in terms of its confidence. The confidence c of a rule determines how frequently items in Y appear in transactions that contain X: c(X → Y) = σ(X ∪ Y)/σ(X). Each frequent k-itemset XY can produce up to 2^k − 2 ARs, ignoring rules that have empty antecedents or consequents (∅ → XY or XY → ∅). An AR can be extracted by partitioning the itemset XY into two nonempty subsets, X and Y, such that X → Y satisfies the confidence threshold (minConf). Note that all such rules already meet the support threshold because they are generated from an FI. Because rule generation from FIs is simple, there are relatively few studies on this stage; many studies have focused on the stage of mining FIs/FCIs. Agrawal and Srikant53 introduced the following property to reduce the search space: if c(ABC → D) < minConf, then c(AB → CD) < minConf and c(AC → BD) < minConf, because σ(AB) ≥ σ(ABC) and σ(AC) ≥ σ(ABC), so moving items from the antecedent to the consequent cannot increase the confidence. An algorithm based on this property has been proposed to efficiently mine ARs from the FIs/FCIs generated in stage 1, and it is still used to mine ARs from FIs/FCIs.
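The pruning property above can be illustrated with a small Python sketch that generates rules from a single frequent itemset. Antecedents are tried from largest to smallest, and any antecedent that is a subset of an antecedent whose rule already failed is skipped without computing its confidence. This is an illustrative reconstruction in the spirit of the Apriori rule-generation step, not the authors' code; subset_supports and rules_from_itemset are hypothetical names.

```python
from itertools import combinations

# DBe from Table 1: transaction ID -> set of items
DB_E = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W", "E"},
    6: {"C", "D", "T", "E"},
}


def subset_supports(itemset, db):
    """Support of every nonempty subset of itemset, keyed by frozenset."""
    items = sorted(itemset)
    return {
        frozenset(c): sum(1 for t in db.values() if set(c) <= t)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
    }


def rules_from_itemset(itemset, sup, min_conf):
    """All rules X -> (XY minus X) from one FI XY with confidence >= min_conf.

    If X -> (XY minus X) fails, every proper subset X' of X is skipped,
    because sigma(X') >= sigma(X) means its confidence cannot be larger.
    """
    full = frozenset(itemset)
    failed, rules = [], []
    for size in range(len(full) - 1, 0, -1):
        for ante in map(frozenset, combinations(sorted(full), size)):
            if any(ante < bad for bad in failed):
                continue  # pruned without computing the confidence
            conf = sup[full] / sup[ante]
            if conf >= min_conf:
                rules.append(("".join(sorted(ante)), "".join(sorted(full - ante)), conf))
            else:
                failed.append(ante)
    return rules


sup = subset_supports("ACTW", DB_E)
for lhs, rhs, conf in rules_from_itemset("ACTW", sup, 1.0):
    print(f"{lhs} -> {rhs} (conf = {conf:.2f})")
```

For ACTW with minConf = 100%, the antecedents ACW and CT fail, so the sketch skips AC, AW, CW, and every single-item antecedent without evaluating them.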
Let X be an FCI. An itemset Y is a generator of X if and only if Y ⊂ X and σ(X) = σ(Y). For example, AW is a generator of ACW because AW ⊂ ACW and σ(AW) = σ(ACW) = 4. Similarly, A and AC are also generators of ACW. Let G(X) be the set of X's generators. A generator Y ∈ G(X) is a minimal generator if and only if Y does not have any proper subset in G(X). For example, G(ACW) = {A, AC, AW}; therefore, the set of minimal generators of ACW is mG(ACW) = {A}. An association rule R1: X1 → Y1 is an MNAR if there is no other AR R2: X2 → Y2 with σ(X1 ∪ Y1) = σ(X2 ∪ Y2), c(R1) = c(R2), X2 ⊆ X1, and Y1 ⊆ Y2. Two kinds of MNARs are obtained: (1) exact rules (confidence = 100%), which have the form X0 → X\X0, where X is an FCI and X0 ∈ mG(X); and (2) approximate rules (confidence < 100%), which have the form X0 → Y\X0, in which X and Y are FCIs, X0 ∈ mG(X), and X ⊂ Y.
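The generator definitions can be checked on the ACW example with the brute-force sketch below. Following that example, only nonempty proper subsets are considered as generators, so under this reading a single-item FCI such as C has no generator; the function names are illustrative assumptions.

```python
from itertools import combinations

# DBe from Table 1: transaction ID -> set of items
DB_E = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W", "E"},
    6: {"C", "D", "T", "E"},
}


def support(itemset, db):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in db.values() if set(itemset) <= t)


def generators(x, db):
    """G(X): nonempty proper subsets Y of X with sigma(Y) == sigma(X)."""
    target = support(x, db)
    return {
        "".join(c)
        for k in range(1, len(x))
        for c in combinations(sorted(x), k)
        if support(c, db) == target
    }


def minimal_generators(x, db):
    """mG(X): generators of X that have no proper subset in G(X)."""
    g = generators(x, db)
    return {y for y in g if not any(set(z) < set(y) for z in g)}


print(sorted(generators("ACW", DB_E)), minimal_generators("ACW", DB_E))
```

The output reproduces G(ACW) = {A, AC, AW} and mG(ACW) = {A}.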
Assume that there are two rules R1: X1 → Y1 and R2: X2 → Y2. Rule R1 is said to be more general than R2 (R1 ≼ R2) if and only if X1 ⊆ X2 and Y2 ⊆ Y1. Let R = {R1, R2, …, Rn} be the set of rules that satisfy the minSup and minConf conditions, and let σ(R) = σ(X ∪ Y) denote the support of a rule R: X → Y. A rule Ri is said to have higher precedence than another rule Rj, denoted as Ri > Rj, if Ri ≼ Rj and one of the following conditions holds: (1) c(Ri) > c(Rj); (2) c(Ri) = c(Rj) and σ(Ri) > σ(Rj). Let RMG be the set of MGARs of R: RMG = {Rj ∈ R | ¬∃ Ri ∈ R: Ri > Rj}.
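The precedence relation and the definition of RMG translate directly into a filter over a list of rules. The sketch below is a literal transcription of these definitions for illustration (the Rule type and helper names are hypothetical). It scores three hand-picked rules on DBe: A → W and AT → W both have 100% confidence, but A → W is more general and has higher support, so AT → W is removed, while C → W is not comparable with either rule and is kept.

```python
from typing import FrozenSet, NamedTuple

# DBe from Table 1: transaction ID -> set of items
DB_E = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W", "E"},
    6: {"C", "D", "T", "E"},
}


class Rule(NamedTuple):
    lhs: FrozenSet[str]
    rhs: FrozenSet[str]
    conf: float
    sup: int  # sigma(R) = sigma(lhs ∪ rhs)


def support(items, db):
    return sum(1 for t in db.values() if set(items) <= t)


def make_rule(lhs, rhs, db):
    s = support(set(lhs) | set(rhs), db)
    return Rule(frozenset(lhs), frozenset(rhs), s / support(lhs, db), s)


def more_general(r1, r2):
    """R1 ≼ R2: X1 ⊆ X2 and Y2 ⊆ Y1."""
    return r1.lhs <= r2.lhs and r2.rhs <= r1.rhs


def precedes(ri, rj):
    """Ri > Rj: Ri ≼ Rj and (higher confidence, or equal confidence and higher support)."""
    return more_general(ri, rj) and (
        ri.conf > rj.conf or (ri.conf == rj.conf and ri.sup > rj.sup)
    )


def most_general(rules):
    """RMG = {Rj in R | no Ri in R with Ri > Rj}."""
    return [rj for rj in rules if not any(precedes(ri, rj) for ri in rules)]


R = [make_rule("A", "W", DB_E), make_rule("AT", "W", DB_E), make_rule("C", "W", DB_E)]
for r in most_general(R):
    print("".join(sorted(r.lhs)), "->", "".join(sorted(r.rhs)), round(r.conf, 3), r.sup)
```

A rule never precedes itself here, because neither the strict confidence comparison nor the strict support comparison can hold for identical rules.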

THE FIL/FCIL BUILDING
LBAs for mining ARs are divided into two phases: (1) building the lattice and (2) generating ARs from the lattice. This section presents the existing approaches for building lattices. Some of the existing approaches for mining ARs from the lattices are then introduced in the section “LBAs for Mining ARs.”

The FIL Building
In 2009, Vo and Le67 proposed an algorithm (FIL-Building-2009) for building an FIL (FIL-2009) directly from the database (Table 2). In FIL-2009, each node in the lattice is a tuple ⟨X, Tidset, Children⟩, where X is a k-itemset, Tidset is the set of IDs of the transactions containing X, and Children = {Y | Y is a (k + 1)-itemset and X ⊂ Y}. FIL-2009 built for DBe in Table 1 with minSup = 50% is presented in Figure 1.
Although mining ARs from FIL-2009 is very effective, FIL-2009 is not an effective structure for mining MNARs. Therefore, in 2011, Vo and Le68 extended the structure of FIL-2009 (into FIL-2011) by adding one field indicating whether or not a lattice node is a minimal generator and another field indicating whether or not a lattice node is an FCI. These values are determined directly during lattice building. The structure is then used to mine MNARs effectively, as presented in the section “LBAs for Mining ARs.” With DBe in Table 1 and minSup = 50%, FIL-2011 is presented in Figure 2. In the figure, bold nodes and dashed nodes indicate FCIs and minimal generators, respectively.
TABLE 2 | Existing Approaches for Building Frequent Itemset Lattice (FIL)
No   Name of Algorithm    Ref.   Year   Structure
1    FIL-Building-2009    67     2009   FIL-2009
2    FIL-Building-2011    68     2011   FIL-2011
3    TFIL                 69     2014   FIL-2014

FIGURE 1 | FIL-2009 for DBe with minSup = 50%.

FIGURE 2 | FIL-2011 for DBe with minSup = 50%.

When a node XA in FIL-2009 (or FIL-2011) is created, FIL-Building-2009 (or FIL-Building-2011) has to find all the nodes that are children of XA in order to update the lattice. This process first visits all children of X (Y ∈ X.Children). For each Y, the process visits all children of Y (YB ∈ Y.Children). For each YB, if XA ⊂ YB, the process adds YB to the children of XA (YB ∈ XA.Children). Considering FIL-2009 in Figure 1, when the algorithm creates the node TC, it has to consider all the child nodes of T, which consist of AT and TW. Next, the algorithm has to consider all the child nodes of AT and TW, which are {ATW, ATC} and {ATW}, respectively. However, considering the child nodes of TW does not find any child node of TC: the node ATW has already been examined through AT, which makes considering the child nodes of TW unnecessary. To overcome this weakness, Vo et al.69
proposed a new structure for an FIL (FIL-2014) and the TFIL algorithm for building FIL-2014. Each node in the lattice has the form ⟨Itemset, Tidset, ChildrenEC, ChildrenL⟩, in which ChildrenEC contains the child nodes based on the equivalence class feature associated with Itemset, and ChildrenL contains the child nodes based on the lattice feature associated with Itemset. Because this algorithm does not scan all the child nodes of XA to update the lattice, the time TFIL needs to build FIL-2014 is less than the time FIL-Building-2009 needs to build FIL-2009 and the time FIL-Building-2011 needs to build FIL-2011. For DBe in Table 1 and minSup = 50%, FIL-2014 is presented in Figure 3.
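The FIL-2009 node layout ⟨X, Tidset, Children⟩ can be sketched as follows. This is a naive level-wise builder written for illustration, not FIL-Building-2009, FIL-Building-2011, or TFIL: it joins k-itemsets that share a (k − 1)-prefix, intersects their tidsets, and then links every k-subset of each new (k + 1)-itemset to it, so each node's Children list is exactly the set of its frequent (k + 1)-supersets. Node and build_fil are hypothetical names.

```python
from dataclasses import dataclass, field
from itertools import combinations
from math import ceil

# DBe from Table 1: transaction ID -> set of items
DB_E = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W", "E"},
    6: {"C", "D", "T", "E"},
}


@dataclass
class Node:
    itemset: str                 # k-itemset, items kept in sorted order
    tidset: frozenset            # IDs of transactions that contain the itemset
    children: list = field(default_factory=list)  # frequent (k+1)-supersets

    @property
    def support(self):
        return len(self.tidset)


def build_fil(db, min_sup):
    """Naive FIL: tidset-intersection joins, then parent -> child links."""
    threshold = ceil(min_sup * len(db))
    root = Node("", frozenset(db))
    level = []
    for item in sorted(set().union(*db.values())):
        tids = frozenset(t for t, items in db.items() if item in items)
        if len(tids) >= threshold:
            level.append(Node(item, tids))
    root.children = list(level)
    nodes = {n.itemset: n for n in level}
    while level:
        nxt = {}
        for a, b in combinations(level, 2):
            # join two k-itemsets that share the same (k-1)-prefix
            if a.itemset[:-1] != b.itemset[:-1]:
                continue
            tids = a.tidset & b.tidset
            if len(tids) >= threshold:
                name = a.itemset + b.itemset[-1]
                nxt[name] = Node(name, tids)
        # every k-subset of a new (k+1)-itemset is frequent, so it is already a node
        for name, node in nxt.items():
            for i in range(len(name)):
                nodes[name[:i] + name[i + 1:]].children.append(node)
        nodes.update(nxt)
        level = sorted(nxt.values(), key=lambda n: n.itemset)
    return root, nodes


root, nodes = build_fil(DB_E, 0.5)
print([c.itemset for c in nodes["A"].children], nodes["ACTW"].support)
```

Because the children links are added only after a whole level has been generated, this sketch never needs the child-of-child scan discussed above; it simply pays for it with repeated subset lookups instead.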

FIGURE 3 | FIL-2014 for DBe with minSup = 50%.

The FCIL Building
In 2005, Zaki and Hsiao58 proposed CHARM-L to create an FCIL (FCIL-2005) (Table 3). The FCIL-2005 created by CHARM-L for DBe with minSup = 50% is shown in Figure 4. However, MNARs and MGARs cannot be generated directly from FCIL-2005: mining them from FCIL-2005 requires a level-wise approach to generate the generators, which is inefficient in terms of mining time.
TABLE 3 | Existing Approaches for Building Frequent Closed Itemset Lattice (FCIL)
No   Name of Algorithm    Ref.   Year   Name of FCIL
1    CHARM-L              58     2005   FCIL-2005
2    FCIL-Building-2013   70     2013   FCIL-2013
3    Snow-Touch           71     2014   FCIL-2013

In 2013, Vo et al.70 proposed FCIL-Building-2013 to build an FCIL (FCIL-2013) effectively. First, FCIs with their minimal generators are mined using MG-CHARM.67 Then, the FCIL-Building-2013 algorithm inserts the FCIs into FCIL-2013 with O(n × k) complexity, where n is the number of FCIs and k is the average number of child nodes in the lattice. Since k ≪ n, the FCIL-Building-2013 algorithm is efficient. The FCIL-2013 created by FCIL-Building-2013 on DBe with minSup = 50% is shown in Figure 5.

In 2014, Szathmary et al.71 proposed Snow-Touch, a novel computation schema for iceberg lattices with generators. First, FCI computation is delegated to the Charm algorithm. Then, the frequent generators (FGs) are extracted by Talky-G. Next, these two methods, together with an FG-to-FCI matching technique, form the Touch algorithm. Finally, the precedence (order) relation among the FCIs, together with their FGs, is retrieved by the Snow algorithm, using a duality result from hypergraph theory. The result of Snow-Touch is the same as that of FCIL-Building-2013. On DBe with minSup = 50%, this result is shown in Figure 5.

FIGURE 4 | FCIL-2005 for DBe with minSup = 50%.

FIGURE 5 | FCIL-2013 for DBe with minSup = 50%.
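A brute-force way to obtain the parent/child links of an FCIL from a set of FCIs is to connect each FCI to its immediate closed supersets (those with no FCI strictly in between). The sketch below does exactly that on DBe. It runs in roughly cubic time in the number of FCIs, so it is only a stand-in for the O(n × k) insertion of FCIL-Building-2013, and it does not attach the minimal generators that FCIL-2013 stores; the function names are illustrative.

```python
from itertools import combinations
from math import ceil

# DBe from Table 1: transaction ID -> set of items
DB_E = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W", "E"},
    6: {"C", "D", "T", "E"},
}


def frequent_closed_itemsets(db, min_sup):
    """Brute-force {FCI: support}; adequate only for tiny examples like DBe."""
    threshold = ceil(min_sup * len(db))
    items = sorted(set().union(*db.values()))
    fis = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(1 for t in db.values() if set(combo) <= t)
            if s >= threshold:
                fis["".join(combo)] = s
    return {x: s for x, s in fis.items()
            if not any(set(x) < set(y) and s == t for y, t in fis.items())}


def build_fcil(fcis):
    """Map each FCI (and the root "") to its immediate closed supersets."""
    def covers(x, y):
        # y is an immediate successor of x: x ⊂ y with no FCI strictly in between
        sx, sy = set(x), set(y)
        return sx < sy and not any(sx < set(z) < sy for z in fcis)

    lattice = {x: sorted(y for y in fcis if covers(x, y)) for x in fcis}
    lattice[""] = sorted(x for x in fcis if not any(set(z) < set(x) for z in fcis))
    return lattice


fcis = frequent_closed_itemsets(DB_E, 0.5)
lattice = build_fcil(fcis)
for node in sorted(lattice):
    print(repr(node), "->", lattice[node])
```

On DBe, the root has the single child C, and C links to CD, CT, and CW.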


LBAs FOR MINING ARs
Besides traditional ARs, other types of rules have been proposed, namely MNARs and MGARs. Table 4 lists the existing LBAs for mining ARs, MNARs, and MGARs.

LBA for Mining ARs
At first, LBA-ARs-2009 (Vo and Le67) traverses all child nodes Lc of the root of FIL-2009 and calls a recursive function to traverse all nodes in the lattice (marking visited nodes by turning a flag on). This function uses a queue (Ω) to traverse all child nodes of Lc (and marks all visited nodes to avoid processing them twice). For each child node L of Lc, the algorithm computes the confidence of the rule Lc → L\Lc from the information stored in the nodes. If the confidence is ≥ minConf, the algorithm adds this rule to the results.

For example, LBA-ARs-2009 uses FIL-2009 for DBe with minSup = 50% (Figure 1) to generate ARs. Let minConf = 100%. Considering the first child node of the root, A, we have Ω = {AT, AW, AC} (Figure 6).

1. Let L = AC, the last element of Ω. We have c(A → C) = σ(AC)/σ(A) = 4/4 = minConf. Hence, this rule is added to the results.
2. Let L = AW, the second element of Ω. We have c(A → W) = σ(AW)/σ(A) = 4/4 = minConf. Hence, this rule is added to the results.
3. Let L = AT, the first element of Ω. We have c(A → T) = σ(AT)/σ(A) = 3/4 < minConf. Hence, the rule A → T is not added to the results.

Next, LBA-ARs-2009 proceeds recursively to generate all rules in the lattice.
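The traversal described above can be sketched in Python as follows, assuming the FIL is stored as two dictionaries (itemset → support and itemset → child itemsets) built naively from DBe. For a node Lc, the sketch walks its descendants breadth-first with a visited set and emits Lc → L\Lc whenever σ(L)/σ(Lc) ≥ minConf. Two details are choices of this sketch rather than of LBA-ARs-2009: it takes elements from the front of the queue (the resulting rule set does not depend on the order), and it stops descending below a failing node, which is safe because supports can only decrease further down the lattice. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import ceil

# DBe from Table 1: transaction ID -> set of items
DB_E = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W", "E"},
    6: {"C", "D", "T", "E"},
}


def build_fil(db, min_sup):
    """Naive FIL as two dicts: itemset -> support and itemset -> child itemsets."""
    threshold = ceil(min_sup * len(db))
    items = sorted(set().union(*db.values()))
    sup = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(1 for t in db.values() if set(combo) <= t)
            if s >= threshold:
                sup["".join(combo)] = s
    children = {x: sorted(y for y in sup if len(y) == len(x) + 1 and set(x) < set(y))
                for x in sup}
    return sup, children


def rules_from_node(lc, sup, children, min_conf):
    """Rules Lc -> (L minus Lc) for descendants L of Lc, visited breadth-first."""
    rules, visited = [], set()
    queue = deque(children[lc])
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)  # flag: each descendant is evaluated at most once
        conf = sup[node] / sup[lc]
        if conf >= min_conf:
            rhs = "".join(i for i in node if i not in lc)
            rules.append((lc, rhs, conf))
            # supports only shrink further down, so descend only below passing nodes
            queue.extend(children[node])
    return rules


sup, children = build_fil(DB_E, 0.5)
print(rules_from_node("A", sup, children, 1.0))
```

On node A with minConf = 100%, it returns A → C and A → W as in the example, rejects A → T, and additionally finds A → CW one level deeper.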

TABLE 4 | Existing Lattice-based Approaches (LBAs) for Mining Association Rules (ARs)
No   Name of Algorithm   Authors (Year)        Type of ARs
1    LBA-ARs-2009        Vo and Le67 (2009)    ARs
2    LBA-ARs-IM-2011     Vo and Le60 (2011)    ARs (with interestingness measures)
3    MNARs-FCIL          Vo and Le68 (2011)    MNARs
4    MGARs-FCIL          Vo et al.70 (2013)    MGARs
MGARs, most generalization association rules; MNARs, minimal non-redundant association rules.

FIGURE 6 | Association rules generation on node A.

LBA for Mining ARs with Interestingness Measures
After building FIL-2009, LBA-ARs-IM-2011 (Vo and Le60) creates HT-FIs (a hash table of FIs) with two levels of keys: (1) the first level uses the length of the itemset as the key; (2) for itemsets of the same length, the algorithm uses hash tables whose keys are computed as Σy∈Y y, where Y is the itemset whose support needs to be determined.

At first, LBA-ARs-IM-2011 traverses all child nodes Lc of the root of FIL-2009 and calls a recursive function to traverse all nodes in the lattice (marking visited nodes by turning a flag on). This function uses a queue to traverse all child nodes of Lc (and marks all visited nodes to avoid processing them twice). For each child node L of Lc, the algorithm computes the measure value with the function vm(n, σ(Lc), λ(L\Lc), σ(L)), where n is the number of transactions, σ(Lc) is the support of Lc, σ(L) is the support of L, and λ(L\Lc) is the support of L\Lc, retrieved from the |L\Lc|-th level of the hash table. A number of such measures are shown in Table 5. In practice, the number of generated rules is very large; therefore, a threshold on vm (minVM) is used to shrink the rule set, and a rule is added to the results only if its vm value is ≥ minVM.

For example, LBA-ARs-IM-2011 uses FIL-2009 for DBe with minSup = 50% (Figure 1) to generate ARs with the Lift measure. Let minVM = 1.2. Considering the first child node of the root, A, we have Ω = {AT, AW, AC} (Figure 7).

1. Let L = AC, the last element of Ω. We have vm(A → C) = (4 × 6)/(4 × 6) = 1 < minVM. Hence, this rule is not added to the results.
2. Let L = AW, the second element of Ω. We have vm(A → W) = (4 × 6)/(4 × 5) = 1.2 = minVM. Hence, this rule is added to the results.
3. Let L = AT, the first element of Ω. We have vm(A → T) = (3 × 6)/(4 × 4) = 1.125 < minVM. Hence, this rule is not added to the results.

Next, LBA-ARs-IM-2011 proceeds recursively to generate all rules in the lattice.

FIGURE 7 | Association rules with interestingness measures generation on node A.

TABLE 5 | Value of Some Measures with Rule X → Y
(n: number of transactions; nX, nY, nXY: supports of X, Y, and X ∪ Y)
No   Measures          Equations
1    Confidence        nXY / nX
2    Cosine            nXY / √(nX × nY)
3    Lift              (nXY × n) / (nX × nY)
4    Rule interest     nXY − (nX × nY) / n
5    Laplace           (nXY + 1) / (nX + 2)
6    Jaccard           nXY / (nX + nY − nXY)
7    Phi-coefficient   (nXY × n − nX × nY) / √(nX × nY × (n − nX) × (n − nY))

LBA for Mining MNARs
First, MNARs-FCIL traverses all child nodes Lc of the root of FIL-2011, in which each node has one field marking whether or not it is a minimal generator (mG) and another field indicating whether or not it is closed. It then calls a function to traverse all nodes in the lattice (recursively, marking visited nodes by turning a flag on). This function uses a queue to traverse all child nodes of Lc (and marks all visited nodes to avoid processing them twice). For each child node L of Lc (where Lc is a minimal generator), the algorithm computes c(Lc → L\Lc); if L is an FCI and c(Lc → L\Lc) ≥ minConf, then the algorithm adds Lc → L\Lc to the results.

For example, MNARs-FCIL uses FIL-2011 for DBe with minSup = 50% (Figure 2) to generate MNARs. Let minConf = 100%. Consider the first child node of the root, A. Because A is a minimal generator and c(A → WC) = σ(AWC)/σ(A) = 4/4 = minConf, the rule A → WC is added to the results (Figure 8). Next, MNARs-FCIL recursively performs this process to generate all of the MNARs in the lattice.

FIGURE 8 | Minimal non-redundant association rules generation on node A.
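A compact sketch of MNAR generation over an FIL carrying the two FIL-2011 flags. The closed flag is computed from the children (a node is closed when every child has strictly smaller support); for the minimal-generator flag, the sketch assumes that a node is flagged when every immediate nonempty subset has strictly larger support, with single items always flagged. For each flagged node Lc, it walks the descendants and emits Lc → L\Lc for every closed L that meets minConf. The helper names are hypothetical, and the descendant pruning is an addition of this sketch.

```python
from collections import deque
from itertools import combinations
from math import ceil

# DBe from Table 1: transaction ID -> set of items
DB_E = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W", "E"},
    6: {"C", "D", "T", "E"},
}


def build_fil(db, min_sup):
    """Naive FIL as two dicts: itemset -> support and itemset -> child itemsets."""
    threshold = ceil(min_sup * len(db))
    items = sorted(set().union(*db.values()))
    sup = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(1 for t in db.values() if set(combo) <= t)
            if s >= threshold:
                sup["".join(combo)] = s
    children = {x: sorted(y for y in sup if len(y) == len(x) + 1 and set(x) < set(y))
                for x in sup}
    return sup, children


def mark_nodes(sup, children):
    """FIL-2011-style flags: is the node closed, and is it a minimal generator?"""
    closed = {x for x in sup if all(sup[y] < sup[x] for y in children[x])}
    # assumption: flagged when every immediate nonempty subset has larger support
    min_gen = {x for x in sup
               if len(x) == 1 or all(sup[x.replace(i, "")] > sup[x] for i in x)}
    return closed, min_gen


def mnars(sup, children, min_conf):
    closed, min_gen = mark_nodes(sup, children)
    rules = []
    for lc in sorted(min_gen):
        visited, queue = set(), deque(children[lc])
        while queue:
            node = queue.popleft()
            if node in visited:
                continue
            visited.add(node)
            conf = sup[node] / sup[lc]
            if conf < min_conf:
                continue  # descendants have even smaller support
            if node in closed:
                rules.append((lc, "".join(i for i in node if i not in lc), conf))
            queue.extend(children[node])
    return rules


sup, children = build_fil(DB_E, 0.5)
for lhs, rhs, conf in mnars(sup, children, 1.0):
    print(f"{lhs} -> {rhs} ({conf:.2f})")
```

With minConf = 100% on DBe, the rule found from node A is A → CW, which is the A → WC rule of the example with its items in alphabetical order.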

LBA for Mining MGARs
In 2013, Vo et al.70 proposed an LBA for mining MGARs (MGARs-FCIL) based on the following theorem: given three nodes l1, l2, and l3 in FCIL-2013, if l1 is the parent node of l2, l2 is the parent node of l3, and σ(l2)/σ(l1) < minConf, then σ(l3)/σ(l1) < minConf (this holds because σ(l3) ≤ σ(l2)). According to this theorem, if a lattice node {Y} is a child node of {X} in the FCIL and σ(Y)/σ(X) < minConf, then the child nodes of {Y} cannot form rules with {X}.

In detail, MGARs-FCIL first traverses all the FCIs. For each FCI C, it initializes the RHS (right-hand side) to ∅ and generates rules from the minimal generators of C to C. The algorithm then uses a queue (Ω) to traverse all the child nodes of {C} (and marks all visited nodes to avoid processing them twice). For each child node Ls of {C}, the confidence of all rules of the form X0 → Ls\X0 (X0 ∈ mG(C)) is calculated. If the confidence satisfies minConf and Ls is not marked, then Ls is added to Ω for generating all rules from C to Ls, and Ls is marked to avoid processing it again. This process is repeated to generate all rules in the lattice.
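The MGARs-FCIL procedure can be sketched on a brute-force FCIL of DBe as follows. For each FCI C, it first emits the exact rules X0 → C\X0 from the minimal generators of C, then walks the descendants of C breadth-first, marking visited nodes and emitting X0 → Ls\X0 when σ(Ls)/σ(C) ≥ minConf; when a node fails, the theorem above allows the walk to stop below it. This is an illustrative reconstruction, not the published implementation: mG(C) is computed here with Y ⊆ C so that a single-item FCI such as C is its own generator, and no further filtering toward RMG is applied. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import ceil

# DBe from Table 1: transaction ID -> set of items
DB_E = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W", "E"},
    6: {"C", "D", "T", "E"},
}


def closed_lattice(db, min_sup):
    """Brute force: supports of FIs, the FCIs, mG(C) per FCI, and FCIL children."""
    threshold = ceil(min_sup * len(db))
    items = sorted(set().union(*db.values()))
    sup = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(1 for t in db.values() if set(combo) <= t)
            if s >= threshold:
                sup["".join(combo)] = s
    fcis = [x for x in sup if not any(set(x) < set(y) and sup[x] == sup[y] for y in sup)]
    mg, kids = {}, {}
    for c in fcis:
        gens = [y for y in sup if set(y) <= set(c) and sup[y] == sup[c]]
        mg[c] = sorted(y for y in gens if not any(set(z) < set(y) for z in gens))
        kids[c] = sorted(d for d in fcis if set(c) < set(d)
                         and not any(set(c) < set(e) < set(d) for e in fcis))
    return sup, fcis, mg, kids


def minus(a, b):
    return "".join(i for i in a if i not in b)


def mgars_fcil(sup, fcis, mg, kids, min_conf):
    rules = []
    for c in sorted(fcis):
        # exact rules: each minimal generator X0 of C -> C minus X0
        rules += [(x0, minus(c, x0), 1.0) for x0 in mg[c] if x0 != c]
        marked, queue = set(), deque(kids[c])
        while queue:
            ls = queue.popleft()
            if ls in marked:
                continue
            marked.add(ls)
            conf = sup[ls] / sup[c]  # equals sigma(Ls)/sigma(X0), as sigma(X0) = sigma(C)
            if conf < min_conf:
                continue  # theorem: no descendant of Ls can form a rule with C
            rules += [(x0, minus(ls, x0), conf) for x0 in mg[c]]
            queue.extend(kids[ls])
    return rules


sup, fcis, mg, kids = closed_lattice(DB_E, 0.5)
for lhs, rhs, conf in mgars_fcil(sup, fcis, mg, kids, 0.75):
    print(f"{lhs} -> {rhs} ({conf:.3f})")
```

With minConf = 75%, it produces, among others, the approximate rules C → W (confidence 5/6) and W → AC (confidence 4/5) in addition to the exact rules.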

LBAs FOR MINING ARs ON DYNAMIC DATABASES
In 2014, Vo et al.69 proposed two effective approaches for maintaining an FIL with dynamically inserted data based on the pre-large and tidset/diffset concepts. The pre-large concept72 uses a safety threshold f = ⌊(SU − SL) × |D| / (1 − SU)⌋ to reduce the need to rescan the original database when maintaining ARs, where SU is the upper threshold, SL is the lower threshold, and |D| is the number of transactions in the original database D. When the number of new transactions is less than or equal to f, the algorithm does not need to rescan the original database. The FIL built with the pre-large concept is called a PFIL (pre-large FIL). In tidset-based maintenance of a PFIL (TMPFIL) and diffset-based maintenance of a PFIL (DMPFIL), each increment is processed as follows. (1) If the original database is empty, the algorithm uses the lower threshold SL to build a PFIL and recalculates the safety threshold f for the incremental database D′. (2) If the number of transactions in the incremental database D′ is larger than f, the algorithm uses SL to build a PFIL and recalculates the safety threshold f for D + D′. (3) If the number of transactions in the incremental database D′ is less than or equal to f, the algorithm updates the PFIL without scanning the database. (4) The original database is updated as D = D + D′. The experimental results69 show that DMPFIL outperforms both TMPFIL and the batch approach in terms of the execution time required to build an FIL (Table 6).
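The safety threshold and the per-increment decision in steps (1) to (4) can be sketched as below. Exact rational arithmetic (fractions.Fraction) avoids floating-point rounding inside the floor; the thresholds SU = 1/2 and SL = 2/5 in the usage lines are arbitrary example values, and the action strings are hypothetical labels rather than anything defined by TMPFIL or DMPFIL. Following the text, the size of each increment D′ is compared with f.

```python
from fractions import Fraction
from math import floor


def safety_threshold(s_upper, s_lower, db_size):
    """f = floor((S_U - S_L) * |D| / (1 - S_U)), computed with exact fractions."""
    s_upper, s_lower = Fraction(s_upper), Fraction(s_lower)
    if not 0 <= s_lower < s_upper < 1:
        raise ValueError("expected 0 <= S_L < S_U < 1")
    return floor((s_upper - s_lower) * db_size / (1 - s_upper))


def handle_increment(db_size, new_size, s_upper, s_lower):
    """Steps (1)-(4) as described in the text: returns (action, |D + D'|, new f)."""
    if db_size == 0:
        action = "build PFIL with S_L"            # step (1)
    elif new_size > safety_threshold(s_upper, s_lower, db_size):
        action = "rebuild PFIL with S_L"          # step (2)
    else:
        action = "update PFIL without rescanning"  # step (3)
    total = db_size + new_size                    # step (4): D = D + D'
    return action, total, safety_threshold(s_upper, s_lower, total)


print(safety_threshold("1/2", "2/5", 100))
print(handle_increment(100, 15, "1/2", "2/5"))
print(handle_increment(100, 25, "1/2", "2/5"))
```

With |D| = 100, f = ⌊(1/2 − 2/5) × 100 / (1 − 1/2)⌋ = 20, so an increment of 15 transactions is absorbed without rescanning, while an increment of 25 triggers a rebuild.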
In 2014, La et al.73 proposed MFCIL-2014 for maintaining an FCIL with dynamically inserted data based on the pre-large concept. The algorithm works as follows. (1) Build the initial FCIL-2005 with CHARM-L58 using SL. (2) Build an index table for the initial lattice from step 1. (3) Add transactions, one by one, to the lattice with the improved CLICL75 algorithm while the number of inserted transactions is lower than f. (4) Rescan the entire database when the number of inserted transactions reaches f, and then go back to step 1. Experimental results73 show that MFCIL-2014 outperforms CLICL in terms of both execution time and memory usage.

TABLE 6 | Existing Lattice-based Approaches (LBAs) for Mining Association Rules (ARs) on Dynamic Databases
No   Name of Algorithm        Ref.   Year   Structure   Actions
1    TMPFIL and DMPFIL        69     2014   FIL         Inserted data
2    MFCIL-2014               73     2014   FCIL        Inserted data
3    TiFU-FIL and DiFU-FIL    74     2015   FIL         Deleted data
FCIL, frequent closed itemset lattice; FIL, frequent itemset lattice.
In 2015, based on the pre-large and tidset/diffset concepts, Vo et al.74 proposed two algorithms (TiFU-FIL and DiFU-FIL) for updating a PFIL with transaction deletion. The experimental results74 show that both approaches outperform the batch-mode algorithm in building a PFIL, with the diffset-based approach (DiFU-FIL) being more efficient than the tidset-based approach (TiFU-FIL).
Current incremental approaches are mainly based on the pre-large concept, which requires rescanning the database when the number of inserted or updated transactions exceeds a safety threshold. Thus, it is important to find methods that can mine ARs without rescanning the database, toward mining ARs on data streams.

COMPLEXITY ANALYSIS
The complexity of mining FIs/FCIs in the worst case is O(2^|I|), where |I| is the number of items in the database. The complexity of building an FIL from the database67–69 is O(2^|I| × k), where k is the average number of all subsets of all FIs. In addition, the complexity of building an FCIL from the set of FCIs70 in the worst case is O(n × k), where n is the number of FCIs and k is the average number of all subsets of all FCIs. Therefore, the complexity of building an FCIL from the database70 is O(2^|I| + n × k). Fortunately, Vo et al.70 showed that k ≪ n and n ≪ 2^|I| in most databases; therefore, the overall computational complexity of FIL/FCIL building67–70 is O(2^|I|), the same as that of mining FIs/FCIs.

For the generation of ARs/MNARs/MGARs from the built lattices, the complexity is O(n × k), where n is the number of nodes in the FIL/FCIL and k is the average number of all subsets of all FCIs, with k ≪ n. Meanwhile, mining ARs from FIs/FCIs requires O(n²). Therefore, LBAs for mining ARs/MNARs/MGARs are especially effective when users need to mine ARs/MNARs/MGARs with many different confidences or many different minimum support thresholds (the thresholds have to be greater than or equal to the threshold used to build the lattice).

CONCLUSION AND FUTURE RESEARCHES

LBAs for mining ARs are new approaches that comprise of two phrases: FIL/FCIL building and generating ARs from the lattice. Total mining time of LBAs
for mining ARs outperforms the traditional methods
for mining ARs, especially when the number of FIs/FCIs is large. In this article, we survey the existing
LBAs for mining ARs on static and dynamic databases. First, we present some of the building method
of lattice on static databases including FIL and FCIL.
Next, methods using LBAs for generating traditional
ARs, MNARs and MGARs from FIL/FCIL are presented. Then, maintenance FIL/FCIL approaches
toward mining ARs for dynamic databases including
inserted and deleted transactions are surveyed.
In reality, that the number of FIs/FCIs is often
large and end users just interested in a small set concerning a certain number of issues. Therefore, mining
FIs/FCIs with constraints are proposed. However,
FIL/FCIL with constrains building is still an open
challenge.
Although methods for maintaining an FIL with
inserted and deleted transactions are proposed, a
general method that facilitates to maintain an FIL
with inserted, deleted and updated data is quite necessary. For FCIL, the study on methods for maintaining an FCIL with deleted and updated transactions,
and as well as a general method for maintaining an
FCIL with all operations are necessary.
In addition, current incremental approaches are mainly based on the pre-large concept. These methods must rescan the database when the number of inserted or updated transactions exceeds a safety threshold. Thus, it is crucial to investigate methods that can mine ARs without rescanning the database, as a step toward mining ARs on data streams. Finally, examining LBAs for mining ARs on quantitative databases is also a potential research direction.
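The safety threshold behind the pre-large rescanning condition can be made concrete. As we understand the pre-large concept of Hong et al. (ref. 72), it keeps an upper support threshold S_u (the usual minimum support) and a lower threshold S_l < S_u, and it tolerates up to f = ⌊(S_u − S_l)·d / (1 − S_u)⌋ newly inserted transactions for an original database of d transactions. The reasoning is that an itemset that was small in the original database has a count below S_l·d. Even if every one of the t ≤ f new transactions contains it, its support (count + t)/(d + t) stays below S_u, so it cannot become large without a rescan. The Python sketch below assumes this insertion-only bound. `prelarge_safety_bound` and `needs_rescan` are hypothetical names, not part of any surveyed algorithm.

```python
import math
from fractions import Fraction


def prelarge_safety_bound(d, s_upper, s_lower):
    """Number of inserted transactions f that can be absorbed without rescanning.

    Assumed insertion bound of the pre-large concept (Hong et al., ref. 72):
        f = floor((S_u - S_l) * d / (1 - S_u))
    d is the size of the original database, S_u the upper (minimum support)
    threshold, and S_l the lower (pre-large) threshold.
    """
    # Parse the thresholds via str() so that 0.3 becomes exactly 3/10 and the
    # floor is not thrown off by binary floating-point error.
    su, sl = Fraction(str(s_upper)), Fraction(str(s_lower))
    if not 0 <= sl < su < 1:
        raise ValueError("thresholds must satisfy 0 <= S_l < S_u < 1")
    return math.floor((su - sl) * d / (1 - su))


def needs_rescan(d, inserted_so_far, s_upper, s_lower):
    """True once the accumulated insertions exceed the safety bound."""
    return inserted_so_far > prelarge_safety_bound(d, s_upper, s_lower)


print(prelarge_safety_bound(10_000, 0.5, 0.3))  # 4000
print(needs_rescan(10_000, 4_001, 0.5, 0.3))    # True
```

A wider gap between S_u and S_l lets more insertions accumulate before the database has to be rescanned. The cost is that more pre-large itemsets must be stored between rescans, which is the trade-off that motivates the search for rescan-free methods noted above.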


REFERENCES
1. Menardi G, Torelli N. Training and assessing classification rules with imbalanced data. Data Min Knowl Discov 2014, 28:92–122.
2. Nassirtoussi AK, Aghabozorgi SR, Wah TY, Ngo DCL. Text mining for market prediction: a systematic review. Expert Syst Appl 2014, 41:7653–7670.
3. Niemann U, Völzke H, Kühn JP, Spiliopoulou M. Learning and inspecting classification rules from longitudinal epidemiological data to identify predictive features on hepatic steatosis. Expert Syst Appl 2014, 41:5405–5415.
4. Nguyen TTL, Vo B, Hong TP, Hoang CT. Classification based on association rules: a lattice-based approach. Expert Syst Appl 2012, 39:11357–11366.
5. Shinde S, Kulkarni U. Extracting classification rules from modified fuzzy min–max neural network for data with mixed attributes. Appl Soft Comput 2016, 40:364–378.
6. Wang X, Liu X, Pedrycz W, Zhu X, Hu G. Mining axiomatic fuzzy set association rules for classification problems. Eur J Oper Res 2012, 218:202–210.
7. Le HS. A novel kernel fuzzy clustering algorithm for Geo-Demographic Analysis. Inform Sci 2015, 317:202–223.
8. Mai TS, He X, Feng J, Plant C, Böhm C. Anytime density-based clustering of complex data. Knowl Inf Syst 2015, 45:319–355.
9. Indurkhya N. Emerging directions in predictive text mining. Data Min Knowl Discov 2015, 5:155–164.
10. Vairavasundaram S, Varadharajan V, Vairavasundaram I, Ravi L. Data mining-based tag recommendation system: an overview. Data Min Knowl Discov 2015, 5:87–112.
11. Fariha A, Ahmed CF, Leung CK, Samiullah M, Pervin S, Cao L. A new framework for mining frequent interaction patterns from meeting databases. Eng Appl Artif Intel 2015, 45:103–118.
12. Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu CW, Tseng VS. SPMF: a Java open-source pattern mining library. J Mach Learn Res 2014, 15:3389–3393.
13. Hacene MR, Huchard M, Napoli A, Valtchev P. Relational concept analysis: mining concept lattices from multi-relational data. Ann Math Artif Intell 2013, 67:81–108.
14. Lan GC, Hong TP, Tseng VS. An efficient projection-based indexing approach for mining high utility itemsets. Knowl Inf Syst 2014, 38:85–107.
15. Lin CW, Lan GC, Hong TP. Mining high utility itemsets for transaction deletion in a dynamic database. Intell Data Anal 2015, 19:43–55.
16. Lin CW, Hong TP, Lan GC, Wong JW, Lin WY. Incrementally mining high utility patterns based on pre-large concept. Appl Intell 2014, 40:343–357.
17. Lin JCW, Gan W, Hong TP. A fast updated algorithm to maintain the discovered high-utility itemsets for transaction modification. Adv Eng Inform 2015, 29:562–574.
18. Lin JCW, Gan W, Hong TP, Tseng VS. Efficient algorithms for mining up-to-date high-utility patterns. Adv Eng Inform 2015, 29:648–661.
19. Song W, Liu Y, Li J. Mining high utility itemsets by dynamically pruning the tree structure. Appl Intell 2014, 40:29–43.
20. Song W, Liu Y, Li J. BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. Int J Data Warehouse Min 2014, 10:1–15.
21. Tseng VS, Wu CW, Fournier-Viger P, Yu PS. Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Trans Knowl Data Eng 2015, 27:726–739.
22. Yun U, Ryang H. Incremental high utility pattern mining with static and dynamic databases. Appl Intell 2015, 42:323–352.
23. Zhang X, Deng ZH. Mining summarization of high utility itemsets. Knowl-Based Syst 2015, 84:67–77.
24. Kim D, Yun U. Efficient mining of high utility patterns with considering of rarity and length. Appl Intell, in press, doi:10.1007/s10489-015-0750-2.
25. Ryang H, Yun U, Ryu K. Fast algorithm for high utility pattern mining with the sum of item quantities. Intell Data Anal 2016, 20:395–415.
26. Ryang H, Yun U, Ryu K. Discovering high utility itemsets with multiple minimum supports. Intell Data Anal 2014, 18:1027–1047.
27. Yun U, Ryang H, Ryu K. High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Syst Appl 2014, 41:3861–3878.
28. Tseng VS, Wu CW, Fournier-Viger P, Yu PS. Efficient algorithms for mining top-k high utility itemsets. IEEE Trans Knowl Data Eng 2016, 28:54–67.
29. Lee G, Yun U, Ryang H. An uncertainty-based approach: frequent itemset mining from uncertain data with different item importance. Knowl-Based Syst 2015, 90:239–256.
30. Yun U, Pyun G, Yoon E. Efficient mining of robust closed weighted sequential patterns without information loss. Int J Artif Intell Tools 2015, 24:1–28.
31. Yun U, Yoon E. An efficient approach for mining weighted approximate closed frequent patterns considering noise constraints. Int J Uncertainty Fuzziness Knowl Based Syst 2014, 22:879–912.



32. Deng ZH, Xu X. Fast mining erasable itemsets using
NC_sets. Expert Syst Appl 2012, 39:4453–4463.
33. Le T, Vo B, Nguyen G. A survey of erasable itemset
mining algorithms. Data Min Knowl Discov 2014,
4:356–379.
34. Nguyen G, Le T, Vo B, Le B. EIFDD: an efficient
approach for erasable itemset mining of very dense
datasets. Appl Intell 2015, 43:85–94.
35. Lee G, Yun U, Ryang H. Mining weighted erasable
patterns by using underestimated constraint-based
pruning technique. J Intell Fuzzy Syst 2015,
28:1145–1157.
36. Yun U, Lee G. Sliding window based weighted erasable stream pattern mining for stream data applications.
Future Gener Comput Syst 2016, 59:1–20.
37. Le T, Vo B. An N-list-based algorithm for mining frequent closed patterns. Expert Syst Appl 2015,
42:6648–6657.
38. Liu XB, Zhai K, Pedrycz W. An improved association
rules mining method. Expert Syst Appl 2012,
39:1362–1374.
39. Pei J, Han J, Mao R. CLOSET: an efficient algorithm for mining frequent closed itemsets. In: 5th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2000, 21–30.
40. Tran A, Truong CT, Le B. Simultaneous mining of frequent closed itemsets and their generators: foundation
and algorithm. Eng Appl Artif Intel 2014, 36:64–80.
41. Vo B, Hong TP, Le B. DBV-Miner: a dynamic bit-vector approach for fast mining frequent closed itemsets. Expert Syst Appl 2012, 39:7196–7206.
42. Burdick D, Calimlim M, Flannick J, Gehrke J, Yiu T.
MAFIA: a maximal frequent itemset algorithm. IEEE
Trans Knowl Data Eng 2005, 17:1490–1504.
43. Gouda K, Zaki MJ. GenMax: an efficient algorithm for mining maximal frequent itemsets. Data Min Knowl Discov 2005, 11:223–242.
44. Yun U, Lee G. Incremental mining of weighted maximal frequent itemsets from dynamic databases. Expert
Syst Appl 2016, 54:304–327.
45. Lee G, Yun U, Ryu K. Sliding window based weighted
maximal frequent pattern mining over data streams.
Expert Syst Appl 2014, 41:694–708.
46. Yun U, Lee G, Ryu K. Mining maximal frequent patterns by considering weight conditions over data
streams. Knowl-Based Syst 2014, 55:49–65.
47. Yun U, Ryu K. Efficient mining of maximal correlated
weight frequent patterns. Intell Data Anal 2013,
17:917–939.
48. Pyun G, Yun U. Mining top-k frequent patterns with
combination reducing techniques. Appl Intell 2014,
41:76–98.
49. Ryang H, Yun U. Top-k high utility pattern mining
with effective threshold raising strategies. Knowl-Based
Syst 2015, 76:109–126.


50. Deng ZH. Fast mining top-rank-k frequent patterns by using node-lists. Expert Syst Appl 2014, 41:1763–1768.
51. Huynh TLQ, Vo B, Le B. An efficient and effective algorithm for mining top-rank-k frequent patterns. Expert Syst Appl 2015, 42:156–164.
52. Duong VH, Truong TC, Vo B. An efficient method for
mining frequent itemsets with double constraints. Eng
Appl Artif Intel 2014, 27:148–154.
53. Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Proceedings of VLDB Conference, 1994, 487–499.
54. Dong J, Han M. BitTableFI: an efficient mining frequent itemsets algorithm. Knowl-Based Syst 2007,
20:329–335.
55. Han J, Pei J, Yin Y. Mining frequent patterns without
candidate generation. In: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, 2000, 1–12.
56. Grahne G, Zhu J. Fast algorithms for frequent itemset
mining using FP-trees. IEEE Trans Knowl Data Eng
2005, 17:1347–1362.
57. Zaki MJ, Parthasarathy S, Ogihara M, Li W. New algorithms for fast discovery of association rules. In: 3rd International Conference on Knowledge Discovery and Data Mining (KDD), 1997, 283–286.
58. Zaki MJ, Hsiao CJ. Efficient algorithms for mining
closed itemsets and their lattice structure. IEEE Trans
Knowl Data Eng 2005, 17:462–478.
59. Song W, Yang B, Xu Z. Index-BitTableFI: an
improved algorithm for mining frequent itemsets.
Knowl-Based Syst 2008, 21:507–513.
60. Vo B, Le B. Interestingness measures for association
rules: combination between lattice and hash tables.
Expert Syst Appl 2011, 38:11630–11640.
61. Deng Z, Wang Z, Jiang JJ. A new algorithm for fast
mining frequent itemsets using N-lists. Sci China
Inform Sci 2012, 55:2008–2030.

62. Deng ZH, Lv SL. Fast mining frequent itemsets using
nodesets. Expert Syst Appl 2014, 41:4505–4512.
63. Vo B, Le T, Coenen F, Hong TP. Mining frequent itemsets using the N-list and subsume concepts. Int J Mach Learn Cybern 2016, 7:253–265.
64. Deng ZH, Lv SL. PrePost+: an efficient N-lists-based algorithm for mining frequent itemsets via children–parent equivalence pruning. Expert Syst Appl 2015, 42:5424–5432.
65. Pasquier N, Bastide Y, Taouil R, Lakhal L. Efficient
mining of association rules using closed itemset lattices. Inf Syst 1999, 24:25–46.
66. Sahoo J, Das AK, Goswami A. An effective association
rule mining scheme using a new generic basis. Knowl
Inf Syst 2015, 43:127–156.
67. Vo B, Le B. Mining traditional association rules using frequent itemsets lattice. In: 39th International Conference on Computers & Industrial Engineering, 2009, 1401–1406.
68. Vo B, Le B. Mining minimal non-redundant association rules using frequent itemsets lattice. Int J Intell
Syst Technol Appl 2011, 10:92–106.
69. Vo B, Le T, Hong TP, Le B. An effective approach for maintenance of pre-large-based frequent-itemset lattice in incremental mining. Appl Intell 2014, 41:759–775.
70. Vo B, Hong TP, Le B. A lattice-based approach for mining most generalization association rules. Knowl-Based Syst 2013, 45:20–30.
71. Szathmary L, Valtchev P, Napoli A, Godin R, Boc A, Makarenkov V. A fast compound algorithm for mining generators, closed itemsets, and computing links between equivalence classes. Ann Math Artif Intell 2014, 70:81–105.
72. Hong TP, Wang CY, Tao YH. A new incremental data mining algorithm using pre-large itemsets. Intell Data Anal 2001, 5:111–129.
73. La PT, Le B, Vo B. Incrementally building frequent closed itemset lattice. Expert Syst Appl 2014, 41:2703–2712.
74. Vo B, Le T, Hong TP, Le B. Fast updated frequent-itemset lattice for transaction deletion. Data Knowl Eng 2015, 96:78–89.
75. Gupta A, Bhatnagar V, Kumar N. Mining closed itemsets in data stream using formal concept analysis. In: 12th International Conference on Data Warehousing and Knowledge Discovery, 2010, 285–296.

