Khoo, Li-Pheng et al "RClass*: A Prototype Rough-Set and Genetic Algorithms Enhanced
Multi-Concept Classification System for Manufacturing Diagnosis"
Computational Intelligence in Manufacturing Handbook
Edited by Jun Wang et al
Boca Raton: CRC Press LLC,2001
©2001 CRC Press LLC
19
RClass*: A Prototype Rough-Set and Genetic Algorithms Enhanced Multi-Concept Classification System for Manufacturing Diagnosis
19.1 Introduction
19.2 Basic Notions
19.3 A Prototype Multi-Concept Classification System
19.4 Validation of RClass*
19.5 Application of RClass* to Manufacturing Diagnosis
19.6 Conclusions
19.1 Introduction
Inductive learning or classification of objects from large-scale empirical data sets is an important research
area in artificial intelligence (AI). In recent years, many techniques have been developed to perform
inductive learning. Among them, the decision tree learning technique is the most popular. Using such a
technique, Quinlan [1992] has successfully developed the Inductive Dichotomizer 3 (ID3), and its later
versions C4.5 and C5.0 (See 5.0) in 1986, 1992, and 1997, respectively. Essentially, decision support is
based on human knowledge about a specific part of a real or abstract world. If the knowledge is gained
by experience, decision rules can possibly be induced from the empirical training data obtained.
In reality, due to various reasons, empirical data often has the property of granularity and may be
incomplete, imprecise, or even conflicting. For example, in diagnosing a manufacturing system, the
opinions of two engineers can be different, or even contradictory. Some earlier inductive learning systems
such as the once prevailing decision tree learning system, the ID3, are unable to deal with imprecise and
inconsistent information present in empirical training data [Khoo et al., 1999]. Thus, the ability to handle
imprecise and inconsistent information has become one of the most important requirements for a
classification system.
Li-Pheng Khoo
Nanyang Technological University
Lian-Yin Zhai
Nanyang Technological University
Many theories, techniques, and algorithms have been developed to deal with the analysis of imprecise
or inconsistent data in recent years. The most successful ones are fuzzy set theory and Dempster–Shafer
theory of evidence. On the other hand, rough set theory, which was introduced by Pawlak [1982] in the
early 1980s, is a new mathematical tool that can be employed to handle uncertainty and vagueness.
Basically, rough set theory handles inconsistent information using two approximations, namely the upper and
lower approximations. Such a technique is different from fuzzy set theory or Dempster–Shafer theory of
evidence. Furthermore, rough set theory focuses on the discovery of patterns in inconsistent data sets
obtained from information sources [Slowinski and Stefanowski, 1989; Pawlak, 1996] and can be used as
the basis to perform formal reasoning under uncertainty, machine learning, and rule discovery [Ziarko,
1994; Pawlak, 1984; Yao et al., 1997]. Compared to other approaches in handling uncertainty, rough set
theory has its unique advantages [Pawlak, 1996, 1997]. It does not require any preliminary or additional
information about the empirical training data such as probability distribution in statistics; the basic
probability assignment in the Dempster–Shafer theory of evidence; or grades of membership in fuzzy
set theory [Pawlak et al., 1995]. Besides, rough set theory is more justified in situations where the set of
empirical or experimental data is too small to employ standard statistical methods [Pawlak, 1991].
In less than two decades, rough set theory has rapidly established itself in many real-life applications
such as medical diagnosis [Slowinski, 1992], control algorithm acquisition and process control [Mrozek,
1992], and structural engineering [Arciszewski and Ziarko, 1990]. However, most literature related to
inductive learning or classification using rough set theory is limited to a binary concept, such as
yes or no in decision making or positive or negative in classification of objects.
Genetic algorithms (GAs) are stochastic and evolutionary search techniques based on the principles
of biological evolution, natural selection, and genetic recombination. GAs have received much attention
from researchers working on optimization and machine learning [Goldberg, 1989]. Basically, GA-based
learning techniques take advantage of the unique search engine of GAs to perform machine learning or
to glean probable decision rules from their search space. This chapter describes the work that leads to the
development of RClass*, a prototype multi-concept classification system for manufacturing diagnosis.
RClass* is based on a hybrid technique that combines the strengths of rough set theory, genetic algorithms, and
Boolean algebra. In the following sections, the basic notions of rough set theory and GAs are presented.
Details of RClass*, its validation, and a case study using the prototype system are also described.
19.2 Basic Notions
19.2.1 Rough Set Theory
Numerous applications of rough set theory have proven its robustness in dealing with uncertainty
and vagueness, and many researchers have attempted to combine it with other inductive learning techniques
to achieve better results. Yasdi [1995] combined rough set theory with neural networks to deal with
learning from imprecise training data. Khoo et al. [1999] developed RClass, a prototype system based
on rough sets and a decision-tree learning methodology, and the predecessor of RClass*, for inductive
learning under noisy environments.
Approximation space and the lower and upper approximations of a set form two important notions
of rough set theory. The approximation space of a rough set is the classification of the domain of interest
into disjoint categories [Pawlak, 1991]. Such a classification refers to the ability to characterize all the
classes in a domain. The upper and lower approximations represent the classes of indiscernible objects
that possess sharp descriptions on concepts but with no sharp boundaries. The basic philosophy behind
rough set theory is based on equivalence relations or indiscernibility in the classification of objects. Rough
set theory employs a so-called information table to describe objects. The information about the objects
is represented in a structure known as an information system, which can be viewed as a table with its
rows and columns corresponding to objects and attributes, respectively (Table 19.1). For example, an
information system ($S$) can be expressed as a 4-tuple as follows:

$$S = \langle U, Q, V, \rho \rangle$$
where $U$ is the universe, which contains a finite set of objects; $Q$ is a finite set of attributes;
$V = \bigcup_{q \in Q} V_q$, where $V_q$ is the domain of the attribute $q$; and
$\rho : U \times Q \rightarrow V$ is the information function such that $\rho(x, q) \in V_q$ for every
$q \in Q$ and $x \in U$. Any pair $(q, v)$, where $q \in Q$ and $v \in V_q$, is called a descriptor in $S$.
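As a minimal sketch, the 4-tuple can be mirrored in Python using the data of Table 19.1. The names `TABLE`, `rho`, `domain`, and `is_descriptor` are illustrative assumptions and are not taken from RClass*.

```python
# Illustrative sketch only: all names are assumptions, not part of RClass*.
# Rows follow Table 19.1: object -> values of attributes (q1, q2, d).
TABLE = {
    "x1": (1, 0, 0), "x2": (1, 1, 1), "x3": (1, 2, 1), "x4": (0, 0, 0),
    "x5": (0, 1, 0), "x6": (0, 2, 1), "x7": (0, 1, 1), "x8": (0, 2, 0),
    "x9": (1, 0, 0), "x10": (0, 0, 0),
}
Q = ("q1", "q2", "d")   # finite set of attributes
U = tuple(TABLE)        # universe: finite set of objects


def rho(x, q):
    """Information function rho: U x Q -> V."""
    return TABLE[x][Q.index(q)]


def domain(q):
    """Domain V_q: every value that attribute q takes over U."""
    return {rho(x, q) for x in U}


def is_descriptor(q, v):
    """A pair (q, v) is a descriptor when q is in Q and v is in V_q."""
    return q in Q and v in domain(q)


print({q: sorted(domain(q)) for q in Q})
print(rho("x1", "q1"), is_descriptor("q1", 1), is_descriptor("q1", 2))
```

Deriving each domain from the table, rather than declaring it separately, keeps $V$ consistent with the recorded data by construction.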
Table 19.1 shows a typical information system used for rough set analysis, with $x_i$ ($i = 1, 2, \ldots, 10$)
representing objects of the set $U$ to be classified; $q_i$ ($i = 1, 2$) denoting the condition attributes; and
$d$ representing the decision attribute. As a result, the $q_i$ and $d$ form the set of attributes, $Q$.
More specifically,

$$U = \{x_1, x_2, \ldots, x_{10}\}; \quad Q = \{q_1, q_2, d\}; \quad \text{and} \quad V = \{V_{q_1}, V_{q_2}, V_d\} = \{\{0, 1\}, \{0, 1, 2\}, \{0, 1\}\}.$$

A typical information function, $\rho(x_1, q_1)$, can be expressed as

$$\rho(x_1, q_1) = \{1\}.$$

Any attribute-value pair such as $(q_1, 1)$ is called a descriptor in $S$.

TABLE 19.1 A Typical Information System Used by Rough Set Theory

Objects   Attributes   Decisions
U         q1    q2     d
x1        1     0      0
x2        1     1      1
x3        1     2      1
x4        0     0      0
x5        0     1      0
x6        0     2      1
x7        0     1      1
x8        0     2      0
x9        1     0      0
x10       0     0      0

Indiscernibility is one of the most important concepts in rough set theory. It is caused by imprecise
information about the observed objects. The indiscernibility relation ($R$) is an equivalence relation on
the set $U$ and can be defined in the following manner:
For $x, y \in U$ and $P \subseteq Q$, $x$ and $y$ are indiscernible by the set of attributes $P$ in $S$ if they
take the same value on every attribute in $P$. Mathematically, it can be expressed as follows:

$$x \hat{P} y \quad \text{if} \quad \rho(x, q) = \rho(y, q) \quad \text{for every } q \in P.$$

For example, using the information system given in Table 19.1, objects $x_5$ and $x_7$ are indiscernible by
the set of attributes $P = \{q_1, q_2\}$. The relation can be expressed as $x_5 \hat{P} x_7$ because the information
functions for the two objects are identical and are given by
$$\rho(x_5, \{q_1, q_2\}) = \rho(x_7, \{q_1, q_2\}) = \{0, 1\}.$$

Hence, it is not possible to distinguish one from another using the attribute set $\{q_1, q_2\}$.
The equivalence classes of the relation $\hat{P}$ are known as P-elementary sets in $S$. Particularly, when $P = Q$,
these Q-elementary sets are known as the atoms in $S$. In an information system, concepts can be represented
by the decision-elementary sets. For example, using the information system depicted in Table 19.1, the
$\{q_1\}$-elementary sets, atoms, and concepts can be expressed as follows:

$\{q_1\}$-elementary sets
$E_1 = \{x_1, x_2, x_3, x_9\}$ for $\rho(x, q_1) = \{1\}$
$E_2 = \{x_4, x_5, x_6, x_7, x_8, x_{10}\}$ for $\rho(x, q_1) = \{0\}$

Atoms
$A_1 = \{x_1, x_9\}$, $A_2 = \{x_2\}$, $A_3 = \{x_3\}$, $A_4 = \{x_4, x_{10}\}$,
$A_5 = \{x_5\}$, $A_6 = \{x_6\}$, $A_7 = \{x_7\}$, $A_8 = \{x_8\}$

Concepts
$C_1 = \{x_1, x_4, x_5, x_8, x_9, x_{10}\} \Rightarrow$ Class = 0 ($d = 0$)
$C_2 = \{x_2, x_3, x_6, x_7\} \Rightarrow$ Class = 1 ($d = 1$)

Table 19.1 shows that objects $x_5$ and $x_7$ are indiscernible by condition attributes $q_1$ and $q_2$. Furthermore,
they possess different decision attributes. This implies that there exists a conflict (or inconsistency) between
objects $x_5$ and $x_7$. Similarly, another conflict also exists between objects $x_6$ and $x_8$.
Rough set theory offers a means to deal with inconsistency in information systems. For a concept ($C$),
the greatest definable set contained in the concept is known as the lower approximation of $C$ ($\underline{R}(C)$). It
represents the set of objects ($Y$) on $U$ that can be certainly classified as belonging to concept $C$ by the set
of attributes, $R$, such that

$$\underline{R}(C) = \bigcup \{ Y \in U/R : Y \subseteq C \},$$

where $U/R$ represents the set of all atoms in the approximation space $(U, R)$. On the other hand, the
least definable set containing concept $C$ is called the upper approximation of $C$ ($\overline{R}(C)$). It represents the
set of objects ($Y$) on $U$ that can be possibly classified as belonging to concept $C$ by the set of attributes
$R$ such that

$$\overline{R}(C) = \bigcup \{ Y \in U/R : Y \cap C \neq \emptyset \},$$

where $U/R$ represents the set of all atoms in the approximation space $(U, R)$. Elements belonging only
to the upper approximation compose the boundary region ($BN_R$) or the doubtful area. Mathematically, a
boundary region can be expressed as

$$BN_R(C) = \overline{R}(C) - \underline{R}(C).$$

A boundary region contains a set of objects that cannot be certainly classified as belonging to or not
belonging to concept $C$ by a set of attributes, $R$. Such a concept, $C$, is called a rough set. In other words,
rough sets are sets having non-empty boundary regions.
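The definitions of the elementary sets and of the lower approximation, upper approximation, and boundary region can be sketched in Python on the data of Table 19.1. This is an illustrative sketch under stated assumptions: the function names `elementary_sets` and `approximations` are hypothetical and do not describe how RClass* is implemented.

```python
# Illustrative sketch only: function names are assumptions, not the RClass* code.
from collections import defaultdict

# Table 19.1: object -> values of attributes (q1, q2, d).
TABLE = {
    "x1": (1, 0, 0), "x2": (1, 1, 1), "x3": (1, 2, 1), "x4": (0, 0, 0),
    "x5": (0, 1, 0), "x6": (0, 2, 1), "x7": (0, 1, 1), "x8": (0, 2, 0),
    "x9": (1, 0, 0), "x10": (0, 0, 0),
}
ATTRS = ("q1", "q2", "d")


def elementary_sets(attrs):
    """Partition U into classes of objects that are indiscernible by `attrs`."""
    classes = defaultdict(set)
    for x, row in TABLE.items():
        key = tuple(row[ATTRS.index(a)] for a in attrs)
        classes[key].add(x)
    return list(classes.values())


def approximations(concept, attrs):
    """Return (lower, upper, boundary) of `concept` with respect to `attrs`."""
    lower, upper = set(), set()
    for y in elementary_sets(attrs):
        if y <= concept:    # Y is contained in C: certainly belongs to C
            lower |= y
        if y & concept:     # Y intersects C: possibly belongs to C
            upper |= y
    return lower, upper, upper - lower


# Concepts are the elementary sets of the decision attribute d.
C1 = {x for x, row in TABLE.items() if row[2] == 0}
lower, upper, boundary = approximations(C1, ("q1", "q2"))
print(sorted(lower), sorted(boundary))
```

For concept $C_1$ the sketch returns the lower approximation $\{x_1, x_4, x_9, x_{10}\}$ and the boundary region $\{x_5, x_6, x_7, x_8\}$, the two conflicting pairs noted in the text.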
Using the information system shown in Table 19.1 again, the upper and lower approximations of
concepts $C_1$ ($d = 0$) and $C_2$ ($d = 1$) can be easily obtained. For example, the
lower approximation of concept $C_1$ ($d = 0$) is given by

$$\underline{R}(C_1) = \{x_1, x_4, x_9, x_{10}\};$$

and its upper approximation is denoted as

$$\overline{R}(C_1) = \{x_1, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}\}.$$

Thus, the boundary region of concept $C_1$ is given by

$$BN_R(C_1) = \overline{R}(C_1) - \underline{R}(C_1) = \{x_5, x_6, x_7, x_8\}.$$

As for concept $C_2$ ($d = 1$), the approximations can be similarly obtained as follows:

$$\underline{R}(C_2) = \{x_2, x_3\};$$
$$\overline{R}(C_2) = \{x_2, x_3, x_5, x_6, x_7, x_8\}; \text{ and}$$
$$BN_R(C_2) = \overline{R}(C_2) - \underline{R}(C_2) = \{x_5, x_6, x_7, x_8\}.$$

As already mentioned, rough set theory offers a powerful means to deal with inconsistency in an
information system. The upper and lower approximations make it possible to mathematically describe
classes of indiscernible objects that possess sharp descriptions on concepts but with no sharp boundaries.
For example, universe $U$ (Table 19.1) consists of ten objects and can be described using two concepts,
namely "d = 0" and "d = 1." As already mentioned, two conflicts, namely objects $x_5$ and $x_7$, and objects
$x_6$ and $x_8$, exist in the data set. These conflicts cause the objects to be indiscernible and constitute doubtful
areas, which are denoted by $BN_R(0)$ or $BN_R(1)$, respectively (Figure 19.1). The lower approximation of
concept "0" is given by object set $\{x_1, x_4, x_9, x_{10}\}$, which forms the certain training data set of concept "0."
On the other hand, the upper approximation is represented by object set $\{x_1, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}\}$, which
contains the possible training data set of concept "0." Concept "1" can be similarly interpreted.

FIGURE 19.1 Basic notions of rough sets. (The universe $U$ is divided into the lower approximations $\underline{R}(0)$ and $\underline{R}(1)$, the upper approximations $\overline{R}(0)$ and $\overline{R}(1)$ of concepts "0" and "1," and the shared boundary region $BN_R(0) = BN_R(1)$.)

19.2.2 Genetic Algorithms
As already mentioned, GAs are stochastic and evolutionary search techniques based on the principles of
biological evolution, natural selection, and genetic recombination. They simulate the principle of "survival
of the fittest" in a population of potential solutions known as chromosomes. Each chromosome
represents one possible solution to the problem or a rule in a classification. The population evolves over
time through a process of competition whereby the fitness of each chromosome is evaluated using a
fitness function. During each generation, a new population of chromosomes is formed in two steps. First,
the chromosomes in the current population are selected to reproduce on the basis of their relative fitness.
Second, the selected chromosomes are recombined using idealized genetic operators, namely crossover
and mutation, to form a new set of chromosomes that are to be evaluated as the new solution of the
problem. GAs are conceptually simple but computationally powerful. They are used to solve a wide
variety of problems, particularly in the areas of optimization and machine learning [Grefenstette, 1994;
Davis, 1991].
Figure 19.2 shows the flow of a typical GA program. It begins with a population of chromosomes
either generated randomly or gleaned from some known domain knowledge. Subsequently, it proceeds
to evaluate the fitness of all the chromosomes, select good chromosomes for reproduction, and produce
the next generation of chromosomes. More specifically, each chromosome is evaluated according to a
given performance criterion or fitness function, and assigned a fitness score. Using the fitness value attained
by each chromosome, good chromosomes are selected to undergo reproduction. Reproduction involves
the creation of offspring using two operators, namely crossover and mutation (Figure 19.3). By randomly
selecting a common crossover site on two parent chromosomes, two new chromosomes are produced.
During the process of reproduction, mutation may take place. For example, the binary value of bit 2 in
Figure 19.3 has been changed from 0 to 1. The above process of fitness evaluation, chromosome selection,
and reproduction of the next generation of chromosomes continues for a predetermined number of
generations or until an acceptable performance level is reached.

FIGURE 19.2 A typical GA program flow. (Start; generation of a random population of chromosomes; computation of the fitness of individual chromosomes; selection of chromosomes with good fitness; reproduction of the next generation of chromosomes/population; if the limit on the number of generations has not been reached, the cycle repeats; otherwise, the program ends.)
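The GA flow described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the toy fitness function (the number of 1 bits), the population size, the mutation rate, and all function names are chosen for illustration only and are not those of RClass*. The parent bit strings in the crossover example are the ones shown in Figure 19.3.

```python
# Illustrative sketch only: the fitness function, the rates, and all names
# are assumptions chosen for the example.
import random


def fitness(chrom):
    """Toy fitness: number of 1 bits in the chromosome."""
    return sum(chrom)


def crossover(p1, p2, site):
    """One-point crossover: swap the tails of two parents at a common site."""
    return p1[:site] + p2[site:], p2[:site] + p1[site:]


def mutate(chrom, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in chrom]


def run_ga(n_bits=12, pop_size=20, generations=40, rate=0.02, seed=0):
    """Evolve a population for a fixed number of generations; return the best."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):    # stop when the generation limit is reached
        # Selection: pick parents in proportion to relative fitness
        # (the +1 keeps every weight positive).
        weights = [fitness(c) + 1 for c in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # Reproduction: crossover on consecutive pairs, then mutation.
        pop = []
        for a, b in zip(parents[::2], parents[1::2]):
            site = rng.randint(1, n_bits - 1)
            for child in crossover(a, b, site):
                pop.append(mutate(child, rate, rng))
        best = max(pop + [best], key=fitness)
    return best


# Parents 101100 and 001010 with the crossover site after the third bit.
print(crossover([1, 0, 1, 1, 0, 0], [0, 0, 1, 0, 1, 0], 3))
print(fitness(run_ga()))
```

Fitness-proportional selection is only one of several common selection schemes; it is used here because the text describes selection "on the basis of their relative fitness."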
19.3 A Prototype Multi-Concept Classification System
19.3.1 Twin-Concept and Multi-Concept Classification
The basic principle of rough set theory is founded on a twin-concept classification [Pawlak, 1982]. For
example, in the information system shown in Table 19.1, an object belongs either to “0” or “1.” However,
binary-concept classification, in reality, has limited application. This is because in most situations, objects
can be classified into more than two classes. For example, in describing the vibration experienced by
rotary machinery such as a turbine in a power plant or a pump in a chemical refinery, it is common to
use more than two states such as normal, slight vibration, mild vibration, and abnormal, rather than just
normal or abnormal to describe the condition. As a result, the twin-concept classification of rough set
theory needs to be generalized in order to handle multi-concept problems. Based on rough set theory,
Grzymala-Busse [1992] developed an inductive learning system called LERS to deal with inconsistency
in training data. Basically, LERS is able to perform multi-concept classification. However, as observed by
Grzymala-Busse [1992], LERS becomes impractical when it encounters a large training data set. This can
possibly be attributed to the complexity of its computational algorithm. Furthermore, the rules induced
by LERS are relatively complex and difficult to interpret.
19.3.2 The Prototype System: RClass*
19.3.2.1 The Approach
RClass* adopts a hybrid approach that combines the basic notions of rough set theory, the unique
searching engine of GAs, and Boolean algebraic operations to carry out multi-concept classification. It
possesses the ability of
FIGURE 19.3 Genetic operators. (Crossover: chromosomes 101100 and 001010 exchange segments at a common crossover site after the third bit, giving new chromosomes 101010 and 001100. Mutation: 100001 before mutation becomes 110001 after mutation.)