References
A. Abraham, C. Grosan and V. Ramos (Eds.) (2006), Swarm Intelligence and Data Mining,
Studies in Computational Intelligence, Springer Verlag, Germany, pages 270, ISBN: 3-
540-34955-3.
Ahmed MN, Yaman SM, Mohamed N, Farag AA and Moriarty TA (2002) Modified fuzzy
c-means algorithm for bias field estimation and segmentation of MRI data. IEEE Trans
Med Imaging, 21, pp. 193-199.
Azzag H, Guinot C and Venturini G (2006) Data and text mining with hierarchical clustering
ants, in Swarm Intelligence in Data Mining, Abraham A, Grosan C and Ramos V (Eds),
Springer, pp. 153-186.
Bandyopadhyay S and Maulik U (2000) Genetic clustering for automatic evolution of clus-
ters and application to image classification, Pattern Recognition, 35, pp. 1197-1208.
Beni G and Wang J (1989) Swarm intelligence in cellular robotic systems. In NATO Ad-
vanced Workshop on Robots and Biological Systems, Il Ciocco, Tuscany, Italy.
Bensaid AM, Hall LO, Bezdek JC and Clarke LP (1996) Partially supervised clustering for
image segmentation. Pattern Recognition, vol. 29, pp. 859-871.
Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. New York:
Plenum.
Bonabeau E, Dorigo M and Theraulaz G (1999) Swarm Intelligence: From Natural to Artifi-
cial Systems. Oxford University Press, New York.
Brucker P (1978) On the complexity of clustering problems. In Beckmann M and Kunzi
HP (Eds.), Optimization and Operations Research, Lecture Notes in Economics and
Mathematical Systems, Berlin, Springer, vol.157, pp. 45-54.
Calinski RB and Harabasz J (1975) A dendrite method for cluster analysis, Communications
in Statistics, pp. 1-27.
Chou CH, Su MC, and Lai E (2004) A new cluster validity measure and its application to
image compression, Pattern Analysis and Applications 7(2), 205-220.
Clark MC, Hall LO, Goldgof DB, Clarke LP, Velthuizen RP and Silbiger MS (1994) MRI
segmentation using fuzzy clustering techniques. IEEE Eng Med Biol, 13, pp. 730-742.
Clerc M and Kennedy J (2002) The particle swarm - explosion, stability, and convergence in a
multidimensional complex space. IEEE Transactions on Evolutionary Computation, 6(1),
pp. 58-73.
Couzin ID, Krause J, James R, Ruxton GD, Franks NR (2002) Collective Memory and Spa-
tial Sorting in Animal Groups, Journal of Theoretical Biology, 218, pp. 1-11
Cui X and Potok TE (2005) Document Clustering Analysis Based on Hybrid PSO+Kmeans
Algorithm, Journal of Computer Sciences (Special Issue), ISSN 1549-3636, pp. 27-33.
Das S, Abraham A, and Konar A (2008) Automatic Kernel Clustering with Multi-Elitist
Particle Swarm Optimization Algorithm, Pattern Recognition Letters, Elsevier Science,
Volume 29, pp. 688-699.
Davies DL and Bouldin DW (1979) A cluster separation measure, IEEE Transactions on
Pattern Analysis and Machine Intelligence, 1, 224-227.
Deb K, Pratap A, Agarwal S, and Meyarivan T (2002) A fast and elitist multiobjective ge-
netic algorithm: NSGA-II, IEEE Trans. on Evolutionary Computation, Vol.6, No.2, April
2002.
Deneubourg JL, Goss S, Franks N, Sendova-Franks A, Detrain C and Chretien L (1991) The
dynamics of collective sorting: Robot-like ants and ant-like robots. In Meyer JA and
Wilson SW (Eds.) Proceedings of the First International Conference on Simulation of
Adaptive Behaviour: From Animals to Animats 1, pp. 356-363. MIT Press, Cambridge,
MA.
Dorigo M, Maniezzo V and Colorni A (1996), The ant system: Optimization by a colony of
cooperating agents, IEEE Trans. Systems Man and Cybernetics Part B, vol. 26.
Dorigo M and Gambardella LM (1997) Ant colony system: A cooperative learning approach
to the traveling salesman problem, IEEE Trans. Evolutionary Computing, vol. 1, pp.
53-66.
Duda RO and Hart PE (1973) Pattern Classification and Scene Analysis. John Wiley and
Sons, USA.
Dunn JC (1974) Well separated clusters and optimal fuzzy partitions. J. Cybern. 4, 95-104.
Eberhart RC and Shi Y (2001) Particle swarm optimization: Developments, applications and
resources, In Proceedings of IEEE International Conference on Evolutionary Computa-
tion, vol. 1, pp. 81-86.
Evangelou IE, Hadjimitsis DG, Lazakidou AA, Clayton C (2001) Data Mining and Knowl-
edge Discovery in Complex Image Data using Artificial Neural Networks, Workshop on
Complex Reasoning and Geographical Data, Cyprus.
Everitt BS (1993) Cluster Analysis. Halsted Press, Third Edition.
Falkenauer E (1998) Genetic Algorithms and Grouping Problems, John Wiley and Son,
Chichester.
Forgy EW (1965) Cluster Analysis of Multivariate Data: Efficiency versus Interpretability of
classification, Biometrics, 21.
Frigui H and Krishnapuram R (1999) A Robust Competitive Clustering Algorithm with Ap-
plications in Computer Vision, IEEE Transactions on Pattern Analysis and Machine In-
telligence 21 (5), pp. 450-465.
Gath I and Geva A (1989) Unsupervised optimal fuzzy clustering. IEEE Transactions on
PAMI, 11, pp. 773-781.
Girolami M (2002) Mercer kernel-based clustering in feature space. IEEE Trans. Neural
Networks 13(3), 780-784.
Goldberg DE (1989) Genetic Algorithms in Search, Optimization and Machine Learning,
Addison-Wesley, Reading, MA.
Grosan C, Abraham A and Chis M (2006) Swarm Intelligence in Data Mining, in Swarm
Intelligence in Data Mining, Abraham A, Grosan C and Ramos V (Eds), Springer, pp.
1-16.
Hall LO, Özyurt IB and Bezdek JC (1999) Clustering with a genetically optimized approach,
IEEE Trans. Evolutionary Computing 3 (2), pp. 103-112.
Handl J, Knowles J and Dorigo M (2003) Ant-based clustering: a comparative study of its
relative performance with respect to k-means, average link and 1D-som. Technical Re-
port TR/IRIDIA/2003-24. IRIDIA, Universite Libre de Bruxelles, Belgium
Handl J and Meyer B (2002) Improved ant-based clustering and sorting in a document re-
trieval interface. In Proceedings of the Seventh International Conference on Parallel
Problem Solving from Nature (PPSN VII), volume 2439 of LNCS, pp. 913-923. Springer-
Verlag, Berlin, Germany.
Hertz T, Bar-Hillel A and Weinshall D (2006) Learning a Kernel Function for Classifi-
cation with Small Training Samples. In Proceedings of the 23rd International
Conference on Machine Learning, Pittsburgh, PA.
Hoe K, Lai W, and Tai T (2002) Homogenous ants for web document similarity modeling and
categorization. In Proceedings of the Third International Workshop on Ant Algorithms
(ANTS 2002), volume 2463 of LNCS, pp. 256-261. Springer-Verlag, Berlin, Germany.
Holland JH (1975) Adaptation in Natural and Artificial Systems, University of Michigan
Press, Ann Arbor.
Huang Z and Ng MK (1999) A fuzzy k-modes algorithm for clustering categorical data.
IEEE Trans. Fuzzy Systems 7 (4), 446-452.
Jain AK, Murty MN and Flynn PJ (1999) Data clustering: a review, ACM Computing Sur-
veys, vol. 31, no. 3, pp. 264-323.
Kanade PM and Hall LO (2003) Fuzzy Ants as a Clustering Concept. In Proceedings of
the 22nd International Conference of the North American Fuzzy Information Processing
Society (NAFIPS'03), pp. 227-232.
Kaufman, L and Rousseeuw, PJ (1990) Finding Groups in Data: An Introduction to Cluster
Analysis. John Wiley & Sons, New York.
Kennedy J, Eberhart R and Shi Y (2001) Swarm Intelligence, Morgan Kaufmann Academic
Press.
Kennedy J and Eberhart R (1995) Particle swarm optimization, In Proceedings of IEEE In-
ternational conference on Neural Networks, pp. 1942-1948.
Kim DW, Lee KY, Lee D and Lee KH (2005) A kernel-based subtractive clustering method.
Pattern Recognition Letters 26(7), 879-891.
Kohonen T (1995) Self-Organizing Maps, Springer Series in Information Sciences, Vol 30,
Springer-Verlag.
Konar A (2005) Computational Intelligence: Principles, Techniques and Applications,
Springer.
Krause J and Ruxton GD (2002) Living in Groups. Oxford: Oxford University Press.

Kuntz P, Snyers D and Layzell P (1998) A stochastic heuristic for visualising graph clusters
in a bi-dimensional space prior to partitioning. Journal of Heuristics, 5(3), pp. 327-351.
Kuntz P and Snyers D (1994) Emergent colonization and graph partitioning. In Proceed-
ings of the Third International Conference on Simulation of Adaptive Behaviour: From
Animals to Animats 3, pp. 494-500. MIT Press, Cambridge, MA.
Kuntz P and Snyers D (1999) New results on an ant-based heuristic for highlighting the or-
ganization of large graphs. In Proceedings of the 1999 Congress on Evolutionary Com-
putation, pp. 1451-1458. IEEE Press, Piscataway, NJ.
Leung Y, Zhang J and Xu Z (2000) Clustering by Scale-Space Filtering, IEEE Transactions
on Pattern Analysis and Machine Intelligence 22 (12), pp. 1396-1410.
Lewin B (1995) Genes VII. Oxford University Press, New York, NY.
Lillesand T and Kiefer R (1994) Remote Sensing and Image Interpretation, John Wiley &
Sons, USA.
Lumer E and Faieta B (1994) Diversity and Adaptation in Populations of Clustering Ants. In
Proceedings of the Third International Conference on Simulation of Adaptive Behavior:
From Animals to Animats 3, pp. 499-508. MIT Press, Cambridge, MA.
Lumer E and Faieta B (1995) Exploratory database analysis via self-organization, Unpub-
lished manuscript.
MacQueen J (1967) Some methods for classification and analysis of multivariate observa-
tions, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and
Probability, pp. 281-297.
Major PF, Dill LM (1978) The three-dimensional structure of airborne bird flocks. Behav-
ioral Ecology and Sociobiology, 4, pp. 111-122.
Mao J and Jain AK (1995) Artificial neural networks for feature extraction and multivariate
data projection. IEEE Trans. Neural Networks, vol. 6, 296-317.
Millonas MM (1994) Swarms, phase transitions, and collective intelligence, In Langton CG
Ed., Artificial Life III, Addison Wesley, Reading, MA.
Mitchell T (1997) Machine Learning. McGraw-Hill, Inc., New York, NY.

Mitra S, Pal SK and Mitra P (2002) Data mining in soft computing framework: A survey,
IEEE Transactions on Neural Networks, Vol. 13, pp. 3-14.
Monmarche N, Slimane M and Venturini G (1999) AntClass: discovery of clusters in nu-
meric data by a hybridization of an ant colony with the k-means algorithm. Internal
Report No. 213, E3i, Laboratoire d'Informatique, Universite de Tours.
Moskovitch R, Elovici Y, Rokach L (2008) Detection of unknown computer worms based
on behavioral classification of the host, Computational Statistics and Data Analysis,
52(9):4544–4566.
Ng R and Han J (1994) Efficient and effective clustering method for spatial data mining.
In: Proc. 1994 International Conf. Very Large Data Bases (VLDB'94), Santiago, Chile,
September, pp. 144-155.
Omran M, Salman A and Engelbrecht AP (2002) Image Classification using Particle Swarm
Optimization. In Conference on Simulated Evolution and Learning, volume 1, pp.
370-374.
Omran M, Engelbrecht AP and Salman A (2005) Particle Swarm Optimization Method
for Image Clustering. International Journal of Pattern Recognition and Artificial Intelli-
gence, 19(3), pp. 297-322.
Omran M, Salman A and Engelbrecht AP (2005) Dynamic Clustering using Particle Swarm
Optimization with Application in Unsupervised Image Classification. Fifth World En-
formatika Conference (ICCI 2005), Prague, Czech Republic.
Pakhira MK, Bandyopadhyay S, and Maulik U (2004) Validity index for crisp and fuzzy
clusters, Pattern Recognition, 37, 487-501.
Pal NR, Bezdek JC and Tsao ECK (1993) Generalized clustering networks and Kohonen's
self-organizing scheme. IEEE Trans. Neural Networks, vol. 4, 549-557.
Partridge BL, Pitcher TJ (1980) The sensory basis of fish schools: relative role of lateral line
and vision. Journal of Comparative Physiology, 135, pp. 315-325.
Partridge BL (1982) The structure and function of fish schools. Scientific American, 245, pp.
90-99.
Paterlini S and Krink T (2006) Differential Evolution and Particle Swarm Optimization in
Partitional Clustering. Computational Statistics and Data Analysis, vol. 50, pp. 1220-
1247.
Paterlini S and Minerva T (2003) Evolutionary Approaches for Cluster Analysis. In Bonarini
A, Masulli F and Pasi G (eds.) Soft Computing Applications. Springer-Verlag, Berlin.
167-178.
Pirooznia M and Deng Y (2006) SVM Classifier: a comprehensive Java interface for support
vector machine classification of microarray data. In Proc. of Symposium of Computations
in Bioinformatics and Bioscience (SCBB06), Hangzhou, China.
Ramos V, Muge F and Pina P (2002) Self-Organized Data and Image Retrieval as a Con-
sequence of Inter-Dynamic Synergistic Relationships in Artificial Ant Colonies. Soft
Computing Systems: Design, Management and Applications, 87, pp. 500-509.
Ramos V and Merelo JJ (2002) Self-organized stigmergic document maps: Environments
as a mechanism for context learning. In Proceedings of the First Spanish Conference
on Evolutionary and Bio-Inspired Algorithms (AEB 2002), pp. 284-293. Centro Univ.
Merida, Merida, Spain.
Rao MR (1971) Cluster Analysis and Mathematical Programming. Journal of the American
Statistical Association, Vol. 22, pp. 622-626.
Ratnaweera A and Halgamuge SK (2004) Self-organizing hierarchical particle swarm opti-
mizer with time-varying acceleration coefficients, In IEEE Trans. on Evolutionary Com-
putation 8(3): 240-254.
Rokach L (2006), Decomposition methodology for classification tasks: a meta decomposer
framework, Pattern Analysis and Applications, 9:257-271.
Rokach L and Maimon O (2001) Theory and applications of attribute decomposition, IEEE
International Conference on Data Mining, IEEE Computer Society Press, pp. 473-480.
Rokach L and Maimon O (2005), Clustering Methods, Data Mining and Knowledge Discov-
ery Handbook, Springer, pp. 321-352.
Rosenberger C and Chehdi K (2000) Unsupervised clustering method with optimal estima-
tion of the number of clusters: Application to image segmentation, in Proc. IEEE Inter-
national Conference on Pattern Recognition (ICPR), vol. 1, Barcelona, pp. 1656-1659.

Sarkar M, Yegnanarayana B and Khemani D (1997) A clustering algorithm using an evolu-
tionary programming-based approach, Pattern Recognition Letters, 18, pp. 975-986.
Schölkopf B and Smola AJ (2002) Learning with Kernels. The MIT Press, Cambridge.
Selim SZ and Alsultan K (1991) A simulated annealing algorithm for the clustering problem.
Pattern Recognition, 24(10), pp. 1003-1008.
Shi Y and Eberhart RC (1999) Empirical study of particle swarm optimization, In Pro-
ceedings of the IEEE International Conference on Evolutionary Computation, Vol. 3, pp. 101-106.
Storn R and Price K (1997) Differential evolution - a simple and efficient heuristic for
global optimization over continuous spaces, Journal of Global Optimization, 11(4),
pp. 341-359.
Tsang W and Kwong S (2006) Ant Colony Clustering and Feature Extraction for Anomaly
Intrusion Detection, in Swarm Intelligence in Data Mining, Abraham A, Grosan C and
Ramos V (Eds), Springer, pp. 101-121.
Vapnik VN (1998) Statistical Learning Theory. Wiley, New York.
Wang X, Wang Y and Wang L (2004) Improving fuzzy c-means clustering based on feature-
weight learning. Pattern Recognition Letters, vol. 25, pp. 1123-1132.
Xiao X, Dow ER, Eberhart RC, Miled ZB and Oppelt RJ (2003) Gene Clustering Using
Self-Organizing Maps and Particle Swarm Optimization, Proc of the 17th International
Symposium on Parallel and Distributed Processing (PDPS ’03), IEEE Computer Society,
Washington DC.
Xu R and Wunsch D (2005) Survey of clustering algorithms, IEEE Transactions on Neural
Networks, Vol. 16(3): 645-678.
Xu R and Wunsch D (2008) Clustering, IEEE Press Series on Computational Intelligence,
USA.
Zahn CT (1971) Graph-theoretical methods for detecting and describing gestalt clusters,
IEEE Transactions on Computers C-20, 68-86.
Zhang T, Ramakrishnan R and Livny M (1997) BIRCH: A New Data Clustering Algorithm
and Its Applications, Data Mining and Knowledge Discovery, vol. 1, no. 2, pp. 141-182.
Zhang DQ and Chen SC (2003) Clustering incomplete data using kernel-based fuzzy c-
means algorithm. Neural Process Letters 18, 155-162.

Zhang R and Rudnicky AI (2002) A large scale clustering scheme for kernel k-means. In:
The Sixteenth International Conference on Pattern Recognition, pp. 289-292.
van den Bergh F and Engelbrecht AP (2001) Effects of swarm size on cooperative particle
swarm optimizers, In Proceedings of GECCO-2001, San Francisco CA, 892-899.
van der Merwe DW and Engelbrecht AP (2003) Data clustering using particle swarm opti-
mization. In: Proceedings of the 2003 IEEE Congress on Evolutionary Computation, pp.
215-220, Piscataway, NJ: IEEE Service Center
24
Using Fuzzy Logic in Data Mining

Lior Rokach
Department of Information System Engineering, Ben-Gurion University, Israel

Summary. In this chapter we discuss how fuzzy logic extends the envelope of the main data
mining tasks: clustering, classification, regression and association rules. We begin by pre-
senting a formulation of data mining using fuzzy logic attributes. Then, for each task,
we provide a survey of the main algorithms and a detailed description (i.e. pseudo-code) of
the most popular algorithms. This chapter will not, however, discuss neuro-fuzzy techniques
in depth, assuming that a dedicated chapter will be devoted to this issue.
24.1 Introduction
There are two main types of uncertainty in supervised learning: statistical and cognitive. Sta-
tistical uncertainty deals with the random behavior of nature, and all existing data mining tech-
niques can handle the uncertainty that arises (or is assumed to arise) in the natural world from
statistical variations or randomness. While these techniques may be appropriate for measuring
the likelihood of a hypothesis, they say nothing about the meaning of the hypothesis.
Cognitive uncertainty, on the other hand, deals with human cognition. Cognitive uncer-
tainty can be further divided into two sub-types: vagueness and ambiguity.
Ambiguity arises in situations with two or more alternatives such that the choice between
them is left unspecified. Vagueness arises when there is a difficulty in making a precise dis-
tinction in the world.
Fuzzy set theory, first introduced by Zadeh in 1965, deals with cognitive uncertainty and
seeks to overcome many of the problems found in classical set theory.
For example, a major problem faced by researchers of control theory is that a small change
in input results in a major change in output. This throws the whole control system into an un-
stable state. In addition, there was the problem that the representation of subjective knowl-
edge was artificial and inaccurate. Fuzzy set theory is an attempt to confront these difficulties
and in this chapter we show how it can be used in data mining tasks.
24.2 Basic Concepts of Fuzzy Set Theory
In this section we present some of the basic concepts of fuzzy logic. The main focus, however,
is on those concepts used in the induction process when dealing with data mining. Since fuzzy
set theory and fuzzy logic are much broader than the narrow perspective presented here, the
interested reader is encouraged to read (Zimmermann, 2005).
24.2.1 Membership function
In classical set theory, a certain element either belongs or does not belong to a set. Fuzzy set
theory, on the other hand, permits the gradual assessment of the membership of elements in
relation to a set.
Definition 1. Let U be a universe of discourse, representing a collection of objects denoted
generically by u. A fuzzy set A in a universe of discourse U is characterized by a membership
function $\mu_A$ which takes values in the interval [0, 1], where $\mu_A(u) = 0$ means that u is
definitely not a member of A and $\mu_A(u) = 1$ means that u is definitely a member of A.
The above definition can be illustrated with the vague set of Young. In this case the set U is
the set of people. To each person in U, we assign a degree of membership in the fuzzy set
Young. The membership function answers the question "to what degree is person u young?".
The easiest way to do this is with a membership function based on the person's age. For
example, Figure 24.1 presents the following membership function:

$$\mu_{Young}(u) = \begin{cases} 0 & age(u) > 32 \\ 1 & age(u) < 16 \\ \frac{32 - age(u)}{16} & \text{otherwise} \end{cases} \quad (24.1)$$
Fig. 24.1. Membership function for the young set (x-axis: Age; y-axis: Young membership).
Given this definition, John, who is 18 years old, has a degree of youth of 0.875. Philip, 20
years old, has a degree of youth of 0.75. Unlike probability theory, degrees of membership do
not have to add up to 1 across all objects, and therefore either many or few objects in the set
may have high membership. However, an object's membership in a set (such as "young") and
the set's complement ("not young") must still sum to 1.
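To make the arithmetic concrete, here is a minimal Python sketch of the membership function in Equation 24.1 (the function name and the test ages are illustrative, not part of the original formulation):

```python
def mu_young(age: float) -> float:
    """Degree of membership in the fuzzy set Young, per Equation 24.1."""
    if age > 32:
        return 0.0
    if age < 16:
        return 1.0
    return (32 - age) / 16

# John is 18: (32 - 18) / 16 = 0.875; Philip is 20: (32 - 20) / 16 = 0.75
print(mu_young(18))  # 0.875
print(mu_young(20))  # 0.75
```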
The main difference between classical set theory and fuzzy set theory is that the latter
admits partial set membership. A classical or crisp set, then, is a fuzzy set that restricts its
membership values to {0,1}, the endpoints of the unit interval. Membership functions can be
used to represent a crisp set. For example, Figure 24.2 presents a crisp membership function
defined as:
$$\mu_{CrispYoung}(u) = \begin{cases} 0 & age(u) > 22 \\ 1 & age(u) \le 22 \end{cases} \quad (24.2)$$
Fig. 24.2. Membership function for the crisp young set (x-axis: Age; y-axis: membership).
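A crisp set can thus be coded as a membership function that only ever returns the endpoints 0 and 1. The sketch below (names and ages illustrative) contrasts the hard cut at age 22 in Equation 24.2 with the graded function of Equation 24.1:

```python
def mu_crisp_young(age: float) -> float:
    """Crisp membership per Equation 24.2: exactly 0 or 1, hard cut at 22."""
    return 1.0 if age <= 22 else 0.0

# Ages 21 and 23 fall on opposite sides of the hard boundary, even though
# Equation 24.1 would grade them almost identically (0.6875 vs. 0.5625).
print(mu_crisp_young(21), mu_crisp_young(23))  # 1.0 0.0
```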
In regular classification problems, we assume that each instance takes one value for each
attribute and that each instance is classified into only one of the mutually exclusive classes. To
illustrate how fuzzy logic can help data mining tasks, we introduce the problem of modelling
the preferences of TV viewers. In this problem there are 3 input attributes:
A = {Time of Day,Age Group,Mood}
and each attribute has the following values:
• dom(Time of Day)={Morning,Noon,Evening,Night}
• dom(Age Group)={Young,Adult}
• dom(Mood)={Happy,Indifferent,Sad,Sour,Grumpy}
The classification can be the movie genre that the viewer would like to watch, such as
C = {Action,Comedy,Drama}.
All the attributes are vague by definition. For example, people's feelings of happiness, in-
difference, sadness, sourness and grumpiness are vague, without any crisp boundaries between
them. Although the vagueness of "Age Group" or "Time of Day" can be avoided by indicating
the exact age or exact time, a rule induced with a crisp decision tree may then have an artificial
crisp boundary, such as "IF Age < 16 THEN action movie". But how about someone who is
17 years of age? Should this viewer definitely not watch an action movie? The viewer's pre-
ferred genre may still be vague. For example, the viewer may be in a mood for both comedy
and drama movies. Moreover, the association of movies with genres may also be vague. For
instance, the movie "Lethal Weapon" (starring Mel Gibson and Danny Glover) is considered
to be both a comedy and an action movie.
Fuzzy concepts can be introduced into a classical problem if at least one of the input at-
tributes is fuzzy or if the target attribute is fuzzy. In the example described above, both input
and target attributes are fuzzy. Formally, the problem is defined as follows (Yuan and Shaw,
1995):
Each class $c_j$ is defined as a fuzzy set on the universe of objects U. The membership
function $\mu_{c_j}(u)$ indicates the degree to which object u belongs to class $c_j$. Each
attribute $a_i$ is defined as a linguistic attribute which takes linguistic values from
$dom(a_i) = \{v_{i,1}, v_{i,2}, \ldots, v_{i,|dom(a_i)|}\}$. Each linguistic value $v_{i,k}$ is
also a fuzzy set defined on U. The membership $\mu_{v_{i,k}}(u)$ specifies the degree to which
object u's attribute $a_i$ is $v_{i,k}$. Recall that the membership of a linguistic value can be
subjectively assigned or transferred from numerical values by a membership function defined
on the range of the numerical value.
Typically, before one can incorporate fuzzy concepts into a data mining application, an
expert is required to provide the fuzzy sets for the quantitative attributes, along with their
corresponding membership functions. Alternatively, the appropriate fuzzy sets can be determined
using fuzzy clustering.
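To illustrate this formulation, the following Python sketch encodes one linguistic attribute of the TV-viewer example, Age Group, as a set of membership functions over the numeric age range; the breakpoints reuse Equation 24.1, and treating Adult as the complement of Young is an assumption made here purely for illustration:

```python
# Each linguistic value in dom(Age Group) = {Young, Adult} is a fuzzy set,
# represented by a membership function over the attribute's numeric range.
def mu_young(age: float) -> float:
    if age > 32:
        return 0.0
    if age < 16:
        return 1.0
    return (32 - age) / 16

def mu_adult(age: float) -> float:
    # Assumption: Adult is modeled as the complement of Young, so the two
    # membership degrees sum to 1 for every age.
    return 1.0 - mu_young(age)

age_group = {"Young": mu_young, "Adult": mu_adult}

# Degree to which a 17-year-old viewer's Age Group is each linguistic value:
print({value: f(17) for value, f in age_group.items()})
# {'Young': 0.9375, 'Adult': 0.0625}
```

A crisp rule such as "IF Age < 16 THEN action movie" would treat this 17-year-old as definitely not young, whereas the fuzzy encoding still assigns a high degree of youth (0.9375).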
24.2.2 Fuzzy Set Operations
Like classical set theory, fuzzy set theory includes the operations of union, intersection, com-
plement and inclusion, but also includes operations that have no classical counterpart, such as
the modifiers concentration and dilation, and the connective fuzzy aggregation. Definitions of
fuzzy set operations are provided in this section.
Definition 2. The membership function of the union of two fuzzy sets A and B with membership
functions $\mu_A$ and $\mu_B$ respectively is defined as the maximum of the two individual
membership functions:

$$\mu_{A \cup B}(u) = \max\{\mu_A(u), \mu_B(u)\} \quad (24.3)$$

Definition 3. The membership function of the intersection of two fuzzy sets A and B with
membership functions $\mu_A$ and $\mu_B$ respectively is defined as the minimum of the two
individual membership functions:

$$\mu_{A \cap B}(u) = \min\{\mu_A(u), \mu_B(u)\} \quad (24.4)$$

Definition 4. The membership function of the complement of a fuzzy set A with membership
function $\mu_A$ is defined as the negation of the specified membership function:

$$\mu_{\overline{A}}(u) = 1 - \mu_A(u) \quad (24.5)$$
To illustrate these fuzzy operations, we elaborate on the previous example. Recall that
John has a degree of youth of 0.875. Additionally, John's happiness degree is 0.254. Thus, the
membership of John in the set Young ∪ Happy would be max(0.875, 0.254) = 0.875, and his
membership in Young ∩ Happy would be min(0.875, 0.254) = 0.254.
It is possible to chain operators together, thereby constructing quite complicated sets. It
is also possible to derive many interesting sets from chains of rules built up from simple
operators. For example, John's membership in the set $\overline{Young} \cup Happy$ would be
$\max(1 - 0.875, 0.254) = 0.254$.
The usage of the max and min operators for defining fuzzy union and fuzzy intersection,
respectively, is very common. However, it is important to note that these are not the only
definitions of union and intersection suited to fuzzy set theory.
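A minimal Python sketch of Definitions 2-4, reproducing John's numbers from the example above (the variable names are illustrative):

```python
def f_union(mu_a: float, mu_b: float) -> float:
    """Fuzzy union (Definition 2): maximum of the two membership degrees."""
    return max(mu_a, mu_b)

def f_intersection(mu_a: float, mu_b: float) -> float:
    """Fuzzy intersection (Definition 3): minimum of the two degrees."""
    return min(mu_a, mu_b)

def f_complement(mu_a: float) -> float:
    """Fuzzy complement (Definition 4): one minus the membership degree."""
    return 1.0 - mu_a

john_young, john_happy = 0.875, 0.254
print(f_union(john_young, john_happy))                # 0.875
print(f_intersection(john_young, john_happy))         # 0.254
# Chained operators: membership in (NOT Young) union Happy
print(f_union(f_complement(john_young), john_happy))  # 0.254
```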
Definition 5. The fuzzy subsethood S(A,B) measures the degree to which A is a subset of B:

$$S(A,B) = \frac{M(A \cap B)}{M(A)} \quad (24.6)$$

where M(A) is the cardinality measure of a fuzzy set A and is defined as

$$M(A) = \sum_{u \in U} \mu_A(u) \quad (24.7)$$

The subsethood can be used to measure the truth level of classification rules. For
example, given a classification rule such as "IF Age is Young AND Mood is Happy THEN
Comedy", we calculate S(Young ∩ Happy, Comedy) in order to measure the truth level
of the classification rule.
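The following sketch computes Equations 24.6 and 24.7 over a small made-up universe of three viewers; all membership values are invented purely for illustration:

```python
# Hypothetical membership degrees of three viewers in the fuzzy sets
# Young, Happy and Comedy (values invented for illustration only).
universe = {
    "John":   {"Young": 0.875, "Happy": 0.254, "Comedy": 0.30},
    "Philip": {"Young": 0.750, "Happy": 0.900, "Comedy": 0.80},
    "Anna":   {"Young": 0.200, "Happy": 0.600, "Comedy": 0.10},
}

def M(degrees):
    """Cardinality of a fuzzy set: the sum of memberships (Equation 24.7)."""
    return sum(degrees)

# A = Young AND Happy (min intersection), B = Comedy.
A = [min(v["Young"], v["Happy"]) for v in universe.values()]
B = [v["Comedy"] for v in universe.values()]

# S(A, B) = M(A intersect B) / M(A) (Equation 24.6): the truth level of
# the rule "IF Age is Young AND Mood is Happy THEN Comedy".
S = M(min(a, b) for a, b in zip(A, B)) / M(A)
print(round(S, 3))  # 0.917 for these invented values
```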
24.3 Fuzzy Supervised Learning
In this section we survey supervised methods that incorporate fuzzy sets. Supervised meth-
ods are methods that attempt to discover the relationship between input attributes and a target
attribute (sometimes referred to as a dependent variable). The relationship discovered is repre-
sented in a structure referred to as a model. Usually models describe and explain phenomena,
which are hidden in the dataset and can be used for predicting the value of the target attribute
knowing the values of the input attributes.
It is useful to distinguish between two main supervised models: classification models
(classifiers) and regression models. Regression models map the input space into a real-valued
domain. For instance, a regressor can predict the demand for a certain product given its char-
acteristics. On the other hand, classifiers map the input space into pre-defined classes. For
instance, classifiers can be used to classify mortgage consumers as good (fully pay back the
mortgage on time) and bad (delayed payback).
Fuzzy set theoretic concepts can be incorporated at the input, output, or into the backbone
of the classifier. The data can be presented in fuzzy terms and the output decision may be
provided as fuzzy membership values. In this chapter we will concentrate on fuzzy decision
trees.
24.3.1 Growing Fuzzy Decision Tree
A decision tree is a predictive model which can be used to represent classifiers. Decision trees
are frequently used in applied fields such as finance, marketing, engineering, medicine and
security (Moskovitch et al., 2008). In the opinion of many researchers, decision trees gained
popularity mainly due to their simplicity and transparency. Decision trees are self-explanatory:
there is no need to be an expert in data mining in order to follow a certain decision tree.
