
600 Noa Ruschin Rimini and Oded Maimon
In the next phase of the research we plan to further develop the proposed scheme, which
is based on fractal representation, to account for online changes in monitored processes. We
plan to suggest a novel type of online interactive SPC chart that enables dynamic inspection
of non-linear state-dependent processes.
The presented algorithmic framework is applicable to many practical domains, for exam-
ple visual analysis of the effect of operation sequence on product quality (see Ruschin-Rimini
et al., 2009), visual analysis of customers' action history, visual analysis of product defect-code
history, and more.
The developed application was utilized by the General Motors research labs located in
Bangalore, India, for visual analysis of vehicle failure history.
References
Arbel, R. and Rokach, L., Classifier evaluation under limited resources, Pattern Recognition
Letters, 27(14): 1619–1631, 2006, Elsevier.
Barnsley M., Fractals Everywhere, Academic Press, Boston, 1988
Barnsley, M., Hurd L. P., Fractal Image Compression, A. K. Peters, Boston, 1993
Cohen S., Rokach L., Maimon O., Decision Tree Instance Space Decomposition with
Grouped Gain-Ratio, Information Science, Volume 177, Issue 17, pp. 3592-3612, 2007.
Da Cunha C., Agard B., and Kusiak A., Data mining for improvement of product quality,
International Journal of Production Research, 44(18-19), pp. 4027-4041, 2006
Falconer K., Techniques in Fractal geometry, John Wiley & Sons, 1997
Jeffrey H. J., Chaos game representation of genetic sequences, Nucleic Acids Res., vol. 18,
pp. 2163 – 2170, 1990
Keim D. A., Information Visualization and Visual Data mining, IEEE Transactions of Visu-
alization and Computer Graphics, Vol. 7, No. 1, pp. 100-107, 2002
Maimon O., and Rokach, L. Data Mining by Attribute Decomposition with semiconductors
manufacturing case study, in Data Mining for Design and Manufacturing: Methods and
Applications, D. Braha (ed.), Kluwer Academic Publishers, pp. 311–336, 2001.
Moskovitch R, Elovici Y, Rokach L, Detection of unknown computer worms based on behav-
ioral classification of the host, Computational Statistics and Data Analysis, 52(9):4544–
4566, 2008.


Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993
Rokach L., Mining manufacturing data using genetic algorithm-based feature set decompo-
sition, Int. J. Intelligent Systems Technologies and Applications, 4(1):57-78, 2008.
Rokach L., Genetic algorithm-based feature set partitioning for classification problems,
Pattern Recognition, 41(5):1676–1700, 2008.
Rokach, L., Decomposition methodology for classification tasks: a meta decomposer frame-
work, Pattern Analysis and Applications, 9:257–271, 2006.
Rokach, L. and Maimon, O., Theory and applications of attribute decomposition, IEEE In-
ternational Conference on Data Mining, IEEE Computer Society Press, pp. 473–480,
2001.
Rokach L. and Maimon O., Feature Set Decomposition for Decision Trees, Journal of Intel-
ligent Data Analysis, Volume 9, Number 2, 2005b, pp 131–158.
Rokach L. and Maimon O., Data mining for improving the quality of manufacturing: A
feature set decomposition approach, Journal of Intelligent Manufacturing, 17:285–299,
2006.
29 Visual Analysis of Sequences Using Fractal Geometry 601
Rokach, L. and Maimon, O. and Arbel, R., Selective voting-getting more for less in sensor
fusion, International Journal of Pattern Recognition and Artificial Intelligence 20 (3)
(2006), pp. 329–350.
Rokach, L. and Maimon, O. and Averbuch, M., Information Retrieval System for Medical
Narrative Reports, Lecture Notes in Artificial intelligence 3055, page 217-228 Springer-
Verlag, 2004.
Rokach L., Maimon O. and Lavi I., Space Decomposition In Data Mining: A Clustering Ap-
proach, Proceedings of the 14th International Symposium On Methodologies For Intel-
ligent Systems, Maebashi, Japan, Lecture Notes in Computer Science, Springer-Verlag,
2003, pp. 24–31.
Rokach L., Romano R. and Maimon O., Mining manufacturing databases to discover the
effect of operation sequence on the product quality, Journal of Intelligent Manufacturing,
2008
Ruschin-Rimini N., Maimon O. and Romano R., Visual Analysis of Quality-related Manu-
facturing Data Using Fractal Geometry, working paper submitted for publication, 2009.
Weiss C. H., Visual Analysis of Categorical Time Series, Statistical Methodology 5, pp. 56-
71, 2008

30
Interestingness Measures - On Determining What Is
Interesting
Sigal Sahar
Department of Computer Science,
Tel-Aviv University, Israel

Summary. As the size of databases increases, the sheer number of patterns mined from them can
easily overwhelm users of the KDD process. Users run the KDD process because they are
overloaded by data. To be successful, the KDD process needs to extract interesting patterns
from large masses of data. In this chapter we examine methods of tackling this challenge: how
to identify interesting patterns.
Key words: Interestingness Measures, Association Rules
Introduction
According to (Fayyad et al., 1996) “Knowledge Discovery in Databases (KDD) is the non-
trivial process of identifying valid, novel, potentially useful, and ultimately understandable
patterns in data.” Mining algorithms primarily focus on discovering patterns in data, for exam-
ple, the Apriori algorithm (Agrawal and Shafer, 1996) outputs the exhaustive list of association
rules that have at least the predefined support and confidence thresholds. Interestingness dif-
ferentiates between the “valid, novel, potentially useful and ultimately understandable” mined
association rules and those that are not—differentiating the interesting patterns from those that
are not interesting. Thus, determining what is interesting, or interestingness, is a critical part
of the KDD process. In this chapter we review the main approaches to determining what is
interesting.
Figure 30.1 summarizes the three main types of interestingness measures, or approaches to
determining what is interesting. Subjective interestingness explicitly relies on users’ specific

needs and prior knowledge. Since what is interesting to any user is ultimately subjective, these
subjective interestingness measures will have to be used to reach any complete solution of
determining what is interesting. (Silberschatz and Tuzhilin, 1996) differentiate between sub-
jective and objective interestingness. Objective interestingness refers to measures of interest
“where interestingness of a pattern is measured in terms of its structure and the underlying
data used in the discovery process” (Silberschatz and Tuzhilin, 1996) but requires user intervention
to select which of these measures to use and to initialize them. Impartial interestingness,
introduced in (Sahar, 2001), refers to measures of interest that can be applied automatically
O. Maimon, L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed.,
DOI 10.1007/978-0-387-09823-4_30, © Springer Science+Business Media, LLC 2010
Fig. 30.1. Types of Interestingness Approaches. (The figure names the approaches discussed
below: expert/grammar, rule-by-rule classification, and interest via what is not interesting for
subjective interestingness, and ranking patterns, pruning & constraints, and summarization for
objective interestingness.)
to the output of any association rule mining algorithm to reduce the number of not-interesting
rules independently of the domain, task and users.
30.1 Definitions and Notations
Let Λ be a set of attributes over the boolean domain. Λ is the superset of all attributes we
discuss in this chapter. An itemset I is a set of attributes: I ⊆ Λ. A transaction is a subset of
attributes of Λ that have the boolean value TRUE. We will refer to the set of transactions over
Λ as a database. If exactly s% of the transactions in the database contain an itemset I then we
say that I has support s, and express the support of I as P(I). Given a support threshold s we
will call itemsets that have at least support s large or frequent.

Let A and B be two sets of attributes such that A, B ⊆ Λ and A ∩ B = ∅. Let D be a set of
transactions over Λ. Following the definition in (Agrawal and Shafer, 1996), an association
rule A→B is defined to have support s% and confidence c% in D if s% of the transactions in
D contain A ∪ B and c% of the transactions that contain A also contain B. For convenience,
in an association rule A→B we will refer to A as the assumption and B as the consequent of
the rule. We will express the support of A→B as P(A ∪ B). We will express the confidence of
A→B as P(B|A) and denote it confidence(A→B). (Agrawal and Shafer, 1996) presents
an elegantly simple algorithm to mine the exhaustive list of association rules that have at least
predefined support and confidence thresholds from a boolean database.
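The definitions above can be sketched directly in code (the toy database and attribute names are invented for illustration):

```python
# Toy boolean database: each transaction is the set of attributes that are TRUE.
database = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, db):
    """Fraction of transactions containing every attribute of `itemset`, i.e. P(I)."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(assumption, consequent, db):
    """P(consequent | assumption) for the association rule assumption -> consequent."""
    return support(set(assumption) | set(consequent), db) / support(assumption, db)

print(support({"bread"}, database))               # 0.75: three of the four transactions
print(confidence({"bread"}, {"milk"}, database))  # 2/3 of bread transactions also contain milk
```

With a support threshold of 50%, the itemset {bread} would be called large (frequent), while {bread, butter} (support 50%) sits exactly at the threshold.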
30.2 Subjective Interestingness
What is interesting to users is ultimately subjective; what is interesting to one user may be
known or irrelevant, and therefore not interesting, to another user. To determine what is sub-
jectively interesting, users’ domain knowledge—or at least the portion of it that pertains to the

data at hand—needs to be incorporated into the solution. In this section we review the three
main approaches to this problem.
30.2.1 The Expert-Driven Grammatical Approach
In the first and most popular approach, the domain knowledge required to subjectively de-
termine which rules are interesting is explicitly described through a predefined grammar. In
this approach a domain expert is expected to express, using the predefined grammar, what is, or
what is not, interesting. This approach was introduced by (Klemettinen et al., 1994), who were
the first to apply subjective interestingness, and many other applications followed. (Klemetti-
nen et al., 1994) define pattern templates that describe the structure of interesting association
rules through inclusive templates and the structure of not-interesting rules using restrictive
templates. (Liu et al., 1997) present a formal grammar that allows the expression of imprecise
or vague domain knowledge, the General Impressions. (Srikant et al., 1997) introduce into
the mining process user defined constraints, including taxonomical constraints, in the form of
boolean expressions, and (Ng et al., 1998) introduce user constraints as part of an architecture
that supports exploratory association rule mining. (Padmanabhan and Tuzhilin, 2000) use a
set of predefined user beliefs in the mining process to output a minimal set of unexpected as-
sociation rules with respect to that set of beliefs. (Adomavicius and Tuzhilin, 1997) define an
action hierarchy to determine which association rules are actionable; actionability is an
aspect of being subjectively interesting. (Adomavicius and Tuzhilin, 2001, Tuzhilin and Ado-
mavicius, 2002) iteratively apply expert-driven validation operators to incorporate subjective
interestingness in the personalization and bioinformatics domains.
In some cases the required domain knowledge can be obtained from a pre-existing knowl-
edge base, thus eliminating the need to engage directly with a domain expert to acquire it.
For example, in (Basu et al., 2001) the WordNet lexical knowledge-base is used to measure
the novelty—an indicator of interest—of an association rule by assessing the dissimilarity be-
tween the assumption and the consequent of the rule. An example of a domain where such
a knowledge base exists naturally is when detecting rule changes over time, as in (Liu et al.,
2001a). In many domains, these knowledge-bases are not readily available. In those cases the
success of this approach is conditioned on the availability of a domain expert willing and able

to complete the task of defining all the required domain knowledge. This is no easy task: the
domain expert may unintentionally neglect to define some of the required domain knowledge,
some of it may not be applicable across all cases, and some of it may change over time. Acquiring such
a domain expert for the duration of the task is often costly and sometimes unfeasible. But
given the domain knowledge required, this approach can output the small set of subjectively
interesting rules.
30.2.2 The Rule-By-Rule Classification Approach
In the second approach, taken in (Subramonian, 1998), the required domain knowledge base is
constructed by classifying rules from prior mining sessions. This approach does not depend on
the availability of domain experts to define the domain knowledge, but does require very in-
tensive user interaction of a mundane nature. Although the knowledge base can be constructed
incrementally, this, as the author says, can be a tedious process.
30.2.3 Interestingness Via What Is Not Interesting Approach
The third approach, introduced by (Sahar, 1999), capitalizes on an inherent aspect in the inter-
estingness task: the majority of the mined association rules are not interesting. In this approach
a user is iteratively presented with simple rules, with only one attribute in their assumption and
one attribute in the consequent, for classification. These rules are selected so that a single user
classification of a rule can imply that a large number of the mined association rules are also
not-interesting. The advantages of this approach are that it is simple so that a naive user can
use it without depending on a domain expert to provide input, that it very quickly, with only
a few questions, can eliminate a significant portion of the not-interesting rules, and that it cir-
cumvents the need to define why a rule is interesting. However, this approach is used only to
reduce the size of the interestingness problem by substantially decreasing the number of po-
tentially interesting association rules, rather than pinpointing the exact set of interesting rules.
This approach has been integrated into the mining process in (Sahar, 2002b).
30.3 Objective Interestingness
The domain knowledge needed in order to apply subjective interestingness criteria is difficult
to obtain. Although subjective interestingness is needed to reach the short list of interesting
patterns, much can be done without explicitly using domain knowledge. The application of

objective interestingness measures depends only on the structure of the data and the patterns
extracted from it; some user intervention will still be required to select the measure to be used,
etc. In this section we review the three main types of objective interestingness measures.
30.3.1 Ranking Patterns
To rank association rules according to their interestingness, a mapping, f, is introduced from
the set of mined rules, Ω, to the domain of real numbers:

f : Ω → ℜ.    (30.1)
The number an association rule is mapped to is an indication of how interesting this rule is;
the larger the number a rule is mapped to, the more interesting the rule is assumed to be. Thus,
the mapping imposes an order, or ranking, of interest on a set of association rules.
Ranking rules according to their interest has been suggested in the literature as early
as (Piatetsky-Shapiro, 1991). (Piatetsky-Shapiro, 1991) introduced the first three principles
of interestingness evaluation criteria, as well as a simple mapping that could satisfy them:
P-S(A→B) = P(A∪B) − P(A)·P(B). Since then many different mappings, or rankings, have
been proposed as measures of interest. Many definitions of such mappings, as well as their
empirical and theoretical evaluations, can be found in (Klösgen, 1996, Bayardo Jr. and Agrawal,
1999, Sahar and Mansour, 1999, Hilderman and Hamilton, 2000, Hilderman and Hamilton,
2001, Tan et al., 2002). The work on the principles introduced by (Piatetsky-Shapiro, 1991)
has been expanded by (Major and Mangano, 1995, Kamber and Shinghal, 1996). (Tan et al.,
2002) extends the studies of the properties and principles of the ranking criteria. (Hilderman
and Hamilton, 2001) provide a very thorough review and study of these criteria, and introduce
an interestingness theory for them.
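As a sketch, the P-S mapping above can be used to rank a handful of rules (the rule names and probability values are invented for illustration):

```python
# Each rule is a tuple (name, P(A ∪ B), P(A), P(B)); the values are made up.
rules = [
    ("bread -> milk",   0.50, 0.75, 0.75),
    ("bread -> butter", 0.50, 0.75, 0.50),
    ("milk -> butter",  0.25, 0.75, 0.50),
]

def ps_measure(p_ab, p_a, p_b):
    """Piatetsky-Shapiro measure: P(A ∪ B) - P(A)·P(B)."""
    return p_ab - p_a * p_b

# The mapping imposes a ranking: larger value, more interesting.
ranked = sorted(rules, key=lambda r: ps_measure(*r[1:]), reverse=True)
for name, *probs in ranked:
    print(f"{name}: P-S = {ps_measure(*probs):+.4f}")
```

Note that a rule whose joint support equals the product of the marginals maps to zero, matching the intuition that statistically independent attributes are not interesting.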
30.3.2 Pruning and Application of Constraints
The mapping in Equation 30.1 can also be used as a pruning technique: prune as not-interesting

all the association rules that are mapped to an interest score lower than a user-defined thresh-
old. Note that in this section we only refer to pruning and application of constraints performed
using objective interestingness measures, and not subjective ones, such as removing rules if
they contain, or do not contain, certain attributes.
Additional methods can be used to prune association rules without requiring the use of
an interest mapping. Statistical tests such as the χ² test are used for pruning in (Brin et al.,
1997, Liu et al., 1999, Liu et al., 2001b). These tests have parameters that need to be initialized.
A collection of pruning methods is described in (Shah et al., 1999).
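As an illustration of such statistical pruning, the Pearson χ² statistic for a rule A→B can be computed from the 2×2 contingency table of A against B (the counts and significance threshold below are illustrative; choosing the threshold is exactly the initialization the text mentions):

```python
def chi_square_2x2(n_ab, n_a_notb, n_nota_b, n_nota_notb):
    """Pearson chi-square statistic for the 2x2 contingency table of A vs. B."""
    n = n_ab + n_a_notb + n_nota_b + n_nota_notb
    row_a, row_nota = n_ab + n_a_notb, n_nota_b + n_nota_notb
    col_b, col_notb = n_ab + n_nota_b, n_a_notb + n_nota_notb
    stat = 0.0
    for observed, row, col in [
        (n_ab, row_a, col_b), (n_a_notb, row_a, col_notb),
        (n_nota_b, row_nota, col_b), (n_nota_notb, row_nota, col_notb),
    ]:
        expected = row * col / n  # expected count under independence of A and B
        stat += (observed - expected) ** 2 / expected
    return stat

# Prune rules whose statistic falls below the 95% critical value for 1 d.f. (~3.841).
stat = chi_square_2x2(60, 40, 40, 60)
print(stat, stat > 3.841)  # 8.0 True: A and B are dependent at this significance level
```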
Another type of pruning is the constraint-based approach of (Bayardo Jr. et al., 1999). To
produce a more concise list of rules as the output of the mining process, the algorithm of (Ba-
yardo Jr. et al., 1999) only mines rules that comply with the usual constraints of minimum
support and confidence thresholds as well as with two new constraints. The first constraint is a
user-specified consequent (subjective interestingness). The second, unprecedented, constraint
is of a user-specified minimum confidence improvement threshold. Only rules whose confi-
dence is at least the minimum confidence improvement threshold greater than the confidence
of any of their simplifications are outputted; a simplification of a rule is formed by removing
one or more attributes from its assumption.
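The minimum confidence improvement constraint can be sketched as a post-hoc check (the rule representation, helper names, and confidence values are illustrative; the actual algorithm of (Bayardo Jr. et al., 1999) enforces the constraint during mining):

```python
from itertools import combinations

def simplifications(assumption):
    """All proper subsets of the assumption, formed by removing one or more attributes."""
    attrs = sorted(assumption)
    for k in range(len(attrs)):
        for subset in combinations(attrs, k):
            yield frozenset(subset)

def passes_improvement(rule, conf, all_confidences, min_improvement):
    """Keep a rule only if its confidence beats every known simplification's
    confidence by at least `min_improvement`."""
    assumption, consequent = rule
    for simpler in simplifications(assumption):
        simpler_conf = all_confidences.get((simpler, consequent))
        if simpler_conf is not None and conf < simpler_conf + min_improvement:
            return False
    return True

# Hypothetical confidences for rules with consequent {"milk"}.
confidences = {
    (frozenset(), frozenset({"milk"})): 0.60,
    (frozenset({"bread"}), frozenset({"milk"})): 0.62,
    (frozenset({"bread", "butter"}), frozenset({"milk"})): 0.80,
}

rule = (frozenset({"bread", "butter"}), frozenset({"milk"}))
print(passes_improvement(rule, 0.80, confidences, 0.05))  # True: 0.80 beats 0.62 and 0.60 by >= 0.05
```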
30.3.3 Summarization of Patterns
Several distinct methods fall under the summarization approach. (Aggarwal and Yu, 1998)
introduce a redundancy measure that summarizes all the rules at the predefined support and
confidence levels very compactly by using more “complex” rules. The preference for complex
rules is formally defined as follows: a rule C→D is redundant with respect to A→B if (1)
A∪B = C∪D and A ⊂ C, or (2) C∪D ⊂ A∪B and A ⊆ C. A different type of summary that
favors less “complex” rules was introduced by (Liu et al., 1999). (Liu et al., 1999) provide a
summary of association rules with a single-attribute consequent using a subset of “direction-
setting” rules, rules that represent the direction a group of non-direction-setting rules follows.

The direction is calculated using the χ² test, which is also used to prune the mined rules prior
to the discovery of direction-setting rules. (Liu et al., 2000) present a summary that simplifies
the discovered rules by providing an overall picture of the relationships in the data and their
exceptions. (Zaki, 2000) introduces an approach to mining only the non-redundant association
rules from which all the other rules can be inferred. (Zaki, 2000) also favors “less-complex”
rules, defining a rule C→D to be redundant if there exists another rule A→B such that A ⊆ C
and B ⊆ D and both rules have the same confidence.
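The redundancy definition of (Zaki, 2000) can be sketched as a direct check (the rule representation and the toy rule set are illustrative):

```python
def is_redundant_zaki(rule, conf, all_rules):
    """(Zaki, 2000): a rule C -> D is redundant if some other mined rule A -> B
    with the same confidence satisfies A ⊆ C and B ⊆ D.
    `all_rules` maps (assumption, consequent) frozenset pairs to confidences."""
    c, d = rule
    for (a, b), other_conf in all_rules.items():
        if (a, b) != (c, d) and a <= c and b <= d and other_conf == conf:
            return True
    return False

rules = {
    (frozenset({"x"}), frozenset({"y"})): 0.9,
    (frozenset({"x", "z"}), frozenset({"y"})): 0.9,
}
# The longer rule is redundant: x -> y has the same confidence with A ⊆ C, B ⊆ D.
print(is_redundant_zaki((frozenset({"x", "z"}), frozenset({"y"})), 0.9, rules))  # True
```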
(Adomavicius and Tuzhilin, 2001) introduce summarization through similarity based rule
grouping. The similarity measure is specified via an attribute hierarchy, organized by a domain
expert who also specifies a level of rule aggregation in the hierarchy, called a cut. The asso-
ciation rules are then mapped to aggregated rules by mapping to the cut, and the aggregated
rules form the summary of all the mined rules.
(Toivonen et al., 1995) suggest clustering rules “that make statements about the same
database rows [...]” using a simple distance measure, and introduce an algorithm to compute
rule covers as short descriptions of large sets of rules. For this approach to work without los-
ing any information, (Toivonen et al., 1995) make a monotonicity assumption, restricting the
databases on which the algorithm can be used. (Sahar, 2002a) introduce a general clustering
framework for association rules to facilitate the exploration of masses of mined rules by au-
tomatically organizing them into groups according to similarity. To simplify interpretation of
the resulting clusters, (Sahar, 2002a) also introduces a data-inferred, concise representation of
the clusters, the ancestor coverage.
30.4 Impartial Interestingness
To determine what is interesting, users need to first determine which interestingness measures
to use for the task. Determining interestingness according to different measures can result in
different sets of rules outputted as interesting. This dependence of the output of the interesting-
ness analysis on the interestingness measure used is clear when domain knowledge is applied

explicitly, in the case of the subjective interestingness measures (Section 30.2). When domain
knowledge is applied implicitly, this dependence may not be as clear, but it still exists. As (Sa-
har, 2001) shows, objective interestingness measures depend implicitly on domain knowledge.
This dependence is manifested during the selection of the objective interestingness measure
to be used, and, when applicable, during its initialization (for pruning and constraints) and the
interpretation of the results (for summarization).
(Sahar, 2001) introduces a new type of interestingness measure, as part of an interest-
ingness framework, that can be applied automatically to eliminate a portion of the rules that
is not interesting, as in Figure 30.2. This type of interestingness is called impartial interest-
ingness because it is domain-independent, task-independent, and user-independent, making
it impartial to all considerations affecting other interestingness measures. Since the impartial
interestingness measures do not require any user intervention, they can be applied sequentially
and automatically, directly following the Data Mining process, as depicted in Figure 30.2. The
impartial interestingness measures preprocess the mined rules to eliminate those rules that are
not interesting regardless of the domain, task and user, and so they form the Interestingness
PreProcessing Step. This step is followed by Interestingness Processing, which includes the
application of objective (when needed) and subjective interestingness criteria.

Fig. 30.2. Framework for Determining Interestingness. (The figure shows a pipeline: rules
outputted by a data-mining algorithm enter Interestingness PreProcessing, which applies
impartial criteria and includes several techniques; the Interestingness-PreProcessed rules then
enter Interestingness Processing, which applies objective and subjective criteria such as
pruning and summarization, yielding the interesting rules.)
To be able to define impartial measures, (Sahar, 2001) assumes that the goal of the in-
terestingness analysis on a set of mined rules is to find a subset of interesting rules, rather
than to infer from the set of mined rules rules that have not been mined that could po-
tentially be interesting. An example of an impartial measure (Overfitting, (Sahar, 2001))
is the deletion of all rules r = A∪C→B if there exists another mined rule r′ = A→B such
that confidence(r′) ≥ confidence(r).
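The Overfitting measure can be sketched as a filter over mined rules (the dictionary representation and confidence values are illustrative):

```python
def overfitting_filter(rules):
    """Delete every rule r = (A ∪ C) -> B for which a mined rule r' = A -> B exists
    with confidence(r') >= confidence(r).
    `rules` maps (assumption, consequent) frozenset pairs to confidence values."""
    kept = {}
    for (assumption, consequent), conf in rules.items():
        dominated = any(
            other_assumption < assumption  # proper subset: r' = A -> B with A ⊂ A ∪ C
            and other_consequent == consequent
            and other_conf >= conf
            for (other_assumption, other_consequent), other_conf in rules.items()
        )
        if not dominated:
            kept[(assumption, consequent)] = conf
    return kept

rules = {
    (frozenset({"a"}), frozenset({"b"})): 0.8,
    (frozenset({"a", "c"}), frozenset({"b"})): 0.7,   # overfit: the simpler rule is at least as confident
    (frozenset({"a", "d"}), frozenset({"b"})): 0.95,  # kept: it improves on the simpler rule
}
kept = overfitting_filter(rules)
print(len(kept))  # 2
```

Because the check needs only the rules themselves and their confidences, it can run with no user input, which is what makes the measure impartial.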
30.5 Concluding Remarks
Characterizing what is interesting is a difficult problem, primarily because what is interest-
ing is ultimately subjective. Numerous attempts have been made to formulate these qualities,
ranging from evidence and simplicity to novelty and actionability, with no formal definition
for “interestingness” emerging so far. In this chapter we reviewed the three main approaches
to tackling the challenge of discovering which rules are interesting under certain assumptions.
Some of the interestingness measures reviewed have been incorporated
into the mining process as opposed to being applied after the mining process.
(Spiliopoulou and Roddick, 2000) discuss the advantages of processing the set of rules
after the mining process, and introduce the concept of higher order mining, showing that
rules with higher order semantics can be extracted by processing the mined results. (Hipp
and Güntzer, 2002) argue that pushing constraints into the mining process “[...] is based on an
understanding of KDD that is no longer up-to-date” as KDD is an iterative discovery process
rather than “pure hypothesis investigation”. There is no consensus on whether it is advisable
to push constraints into the mining process. An optimal solution is likely to be produced
through a balanced combination of these approaches; some interestingness measures (such
as the impartial ones) can be pushed into the mining process without overfitting its output
to match the subjective interests of only a small audience, permitting further interestingness
analysis that will tailor it to each user’s subjective needs.
Data Mining algorithms output patterns. Interestingness discovers the potentially inter-
esting patterns. To be successful, the KDD process needs to extract the interesting patterns
from large masses of data. That makes interestingness a very important capability in the ex-
tremely data-rich environment in which we live. It is likely that our environment will continue
to inundate us with data, making determining interestingness critical for success.
References
Adomavicius, G. and Tuzhilin, A. (1997). Discovery of actionable patterns in databases:
The action hierarchy approach. In Proceedings of the Third International Conference
on Knowledge Discovery and Data Mining, pages 111–114, Newport Beach, CA, USA.
AAAI Press.
Adomavicius, G. and Tuzhilin, A. (2001). Expert-driven validation of rule-based user models
in personalization applications. Data Mining and Knowledge Discovery, 5(1/2):33–58.
Aggarwal, C. C. and Yu, P. S. (1998). A new approach to online generation of association
rules. Technical Report RC 20899, IBM T. J. Watson Research Center.
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A. I. (1996). Advances
in Knowledge Discovery and Data Mining, chapter 12: Fast Discovery of Association
Rules, pages 307–328. AAAI Press/The MIT Press, Menlo Park, California.
Basu, S., Mooney, R. J., Pasupuleti, K. V., and Ghosh, J. (2001). Evaluating the novelty of
text-mined rules using lexical knowledge. In Proceedings of the Seventh ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pages 233–238,

San Francisco, CA, USA.
Bayardo Jr., R. J. and Agrawal, R. (1999). Mining the most interesting rules. In Proceedings
of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pages 145–154, San Diego, CA.
