MINISTRY OF EDUCATION AND TRAINING

MINISTRY OF NATIONAL DEFENCE

ACADEMY OF MILITARY SCIENCE AND TECHNOLOGY

VU DINH MINH

RESEARCH ON DEVELOPMENT OF SEMI-SUPERVISED
CLUSTERING ALGORITHMS USING FUZZY MIN-MAX NEURAL
NETWORK AND THEIR APPLICATIONS

Specialization: Mathematical Foundation for Informatics
Code: 9 46 01 10

SUMMARY OF PhD THESIS IN MATHEMATICS

Hanoi, 2019


This thesis has been completed at:
ACADEMY OF MILITARY SCIENCE AND TECHNOLOGY

Scientific supervisors:
1. Assoc. Prof. Dr Le Ba Dung
2. Dr Nguyen Doan Cuong

Reviewer 1: Assoc. Prof. Dr Bui Thu Lam, Military Technical Academy
Reviewer 2: Assoc. Prof. Phung Trung Nghia, Thai Nguyen University
Reviewer 3: Dr Nguyen Do Van, Academy of Military Science and Technology

The thesis was defended at the Doctoral Evaluating Council at Academy level, held at the Academy of Military Science and Technology at ..... date ……., 2019

The thesis can be found at:
- The library of Academy of Military Science and Technology
- Vietnam National Library.



INTRODUCTION
1. The necessity of the thesis
Fuzzy semi-supervised clustering is an extension of fuzzy clustering that uses prior knowledge to increase the quality of the clusters. The prior knowledge, also known as additional information, is intended to guide, monitor and control the clustering process.
The fuzzy min-max neural network (FMNN) model proposed by Patrick K. Simpson combines the advantages of fuzzy logic, artificial neural networks and fuzzy min-max theory to solve classification and clustering problems. FMNN is an incremental learning model based on fuzzy hyperboxes, which gives it the ability to process large data sets.
Liver disease diagnosis based on liver enzyme test results can be formulated as a pattern recognition problem, for which FMNN is considered an effective approach. One reason FMNN is used in disease diagnostic support is its ability to generate very simple if…then decision rules: each FMNN hyperbox is transformed into a rule described by quantizing the min and max values of the data attributes.
However, FMNN still has many shortcomings that limit its practical application. Research on FMNN has focused on several major directions: improving the network structure, optimizing parameters, reducing the number of hyperboxes in the network, improving the learning method, and combining FMNN with other methods to improve quality.
Based on a study of the development of FMNN, and in order to improve its efficiency, the thesis focuses on proposing and improving methods that use semi-supervised learning. In the new methods presented in the thesis, the additional information is defined as the labels assigned to part of the data to guide and monitor the clustering process. This is a new approach that earlier methods have not considered.


2. Objectives of the research
1) Develop an advanced fuzzy semi-supervised clustering algorithm based on label spreading, in which the additional information is a small percentage of labeled samples.
2) Propose a novel combined semi-supervised clustering model that defines the additional information automatically. In this research, the additional information is a labeled part of the samples given to the fuzzy semi-supervised clustering algorithm.
3) Develop a fuzzy clustering algorithm that takes the distribution of the data into account.
4) Apply the fuzzy min-max neural network to extract fuzzy if...then decision rules in the design of a liver disease diagnostic support system based on liver enzyme test results.
3. Object and scope of the research
The thesis focuses on the following issues:
- An overview of the fuzzy min-max neural network and its variations.
- An analysis of the limitations and of the solutions used by researchers to overcome them.
- Application of the fuzzy min-max neural network, with extraction of fuzzy if...then decision rules, to disease diagnosis.
4. Research methods
The thesis uses a theoretical research method: it studies the FMNN model for data classification and clustering and, on that basis, focuses on the proposed semi-supervised clustering algorithms. The thesis also uses an empirical method based on simulation, combined with analysis, statistics and evaluation of the experimental data.
5. Contribution of the thesis
- Develop the advanced SS-FMM algorithm for fuzzy semi-supervised clustering based on the label spreading process.
- Propose a novel semi-supervised clustering model that combines FMNN and SS-FMM. This model automatically defines the additional information for semi-supervised clustering algorithms.
- Develop a fuzzy clustering algorithm that takes the distribution of the data into account.

6. Structure of the thesis
Apart from the introduction and conclusion, the main content of the thesis consists of three chapters:
- Chapter 1 presents an overview of the research problem, including the basic concepts of FMNN and its extensions and the basic algorithms used in the research. From the general characteristics and limitations of these extensions, it sets out the direction of the research that follows.
- Chapter 2 presents proposals for improving the learning method of FMNN using a semi-supervised model for data clustering. The additional information is a labeled part of the samples in the training data set; labels from this part of the data are then spread to the unlabeled samples. A fuzzy semi-supervised clustering model combined with FMNN defines the additional information automatically, and this information is used as the input of the fuzzy semi-supervised algorithm. The chapter also presents a data clustering model in the fuzzy min-max neural network that takes the distribution of the data into account.
- Chapter 3 presents the application of the proposed models to generate fuzzy if...then decision rules in a liver disease diagnosis support system on a real data set.
Chapter 1: Overview of fuzzy min-max neural network
1.1. Fundamental knowledge of fuzzy min-max neural network
* Hyperbox membership function
The membership function bj(A,Bj) measures the degree to which sample A belongs to hyperbox Bj. It is defined by Eq. (1.2) or Eq. (1.3) below.


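$$
b_j(A, B_j) = \frac{1}{2n} \sum_{i=1}^{n} \Big[ \max\big(0,\, 1 - \max(0,\, \gamma \min(1,\, a_i - w_{ji}))\big) + \max\big(0,\, 1 - \max(0,\, \gamma \min(1,\, v_{ji} - a_i))\big) \Big] \tag{1.2}
$$

$$
b_j(A, B_j) = \frac{1}{n} \sum_{i=1}^{n} \big[ 1 - f(a_i - w_{ji}, \gamma) - f(v_{ji} - a_i, \gamma) \big] \tag{1.3}
$$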
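The two membership functions can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation. It assumes that γ is the sensitivity parameter and that f is the ramp threshold function of Simpson's original formulation (f(x, γ) = 1 if xγ > 1, xγ if 0 ≤ xγ ≤ 1, and 0 if xγ < 0), which this summary does not define. The function names are hypothetical.

```python
from typing import Sequence


def ramp(x: float, gamma: float) -> float:
    """Assumed ramp threshold f(x, gamma): x*gamma clipped to [0, 1]."""
    return min(1.0, max(0.0, x * gamma))


def membership_1_2(a: Sequence[float], v: Sequence[float],
                   w: Sequence[float], gamma: float) -> float:
    """Membership of sample a in the hyperbox [v, w], following Eq. (1.2)."""
    n = len(a)
    total = 0.0
    for a_i, v_i, w_i in zip(a, v, w):
        # Penalty for lying above the max point w_i.
        total += max(0.0, 1.0 - max(0.0, gamma * min(1.0, a_i - w_i)))
        # Penalty for lying below the min point v_i.
        total += max(0.0, 1.0 - max(0.0, gamma * min(1.0, v_i - a_i)))
    return total / (2 * n)


def membership_1_3(a: Sequence[float], v: Sequence[float],
                   w: Sequence[float], gamma: float) -> float:
    """Membership of sample a in the hyperbox [v, w], following Eq. (1.3)."""
    n = len(a)
    return sum(
        1.0 - ramp(a_i - w_i, gamma) - ramp(v_i - a_i, gamma)
        for a_i, v_i, w_i in zip(a, v, w)
    ) / n
```

Both functions return 1 for a sample inside the hyperbox. Outside it, the value falls off at a rate set by γ.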

* Fuzzy min-max neural network structure
FMNN uses a feedforward neural network structure: a two-layer structure (Fig. 1.4) for unsupervised learning and a three-layer structure (Fig. 1.5) for supervised learning.

Fig. 1.4 Two-layer neural network model


Fig. 1.5 Three-layer neural network model

* Overlapping between hyperboxes
The FMNN algorithm creates and modifies hyperboxes in n-dimensional space. If an expansion creates overlap between hyperboxes, the contraction process is performed to eliminate it. Hyperboxes Bj and Bk overlap if one of the following four cases occurs:
- Case 1: max of Bj overlaps min of Bk
- Case 2: min of Bj overlaps max of Bk
- Case 3: Bk is contained within Bj
- Case 4: Bj is contained within Bk
If Bj and Bk overlap, the hyperboxes are contracted along the corresponding dimension to eliminate the overlap:
- Case 1. If $v_{ji} < v_{ki} < w_{ji} < w_{ki}$, then:

$$v_{ki}^{new} = w_{ji}^{new} = \frac{v_{ki}^{old} + w_{ji}^{old}}{2}$$

- Case 2. If $v_{ki} < v_{ji} < w_{ki} < w_{ji}$, then:

$$v_{ji}^{new} = w_{ki}^{new} = \frac{v_{ji}^{old} + w_{ki}^{old}}{2}$$

- Case 3. If $v_{ji} < v_{ki} < w_{ki} < w_{ji}$, consider the following cases:
+ If $(w_{ki} - v_{ji}) < (w_{ji} - v_{ki})$, then: $v_{ji}^{new} = w_{ki}^{old}$
+ If $(w_{ki} - v_{ji}) > (w_{ji} - v_{ki})$, then: $w_{ji}^{new} = v_{ki}^{old}$
- Case 4. If $v_{ki} < v_{ji} < w_{ji} < w_{ki}$, consider the following cases:
+ If $(w_{ki} - v_{ji}) < (w_{ji} - v_{ki})$, then: $w_{ki}^{new} = v_{ji}^{old}$
+ If $(w_{ki} - v_{ji}) > (w_{ji} - v_{ki})$, then: $v_{ki}^{new} = w_{ji}^{old}$
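The four contraction cases can be sketched for a single dimension i as follows. This is a minimal illustration under the assumption that the cases follow Simpson's standard contraction rules with strict inequalities. The function name is hypothetical, and ties in Cases 3 and 4 are sent to the second branch.

```python
from typing import Tuple


def contract_dimension(vj: float, wj: float,
                       vk: float, wk: float) -> Tuple[float, float, float, float]:
    """Return (vj, wj, vk, wk) after removing the overlap of Bj and Bk
    along one dimension; inputs are returned unchanged if no case applies."""
    if vj < vk < wj < wk:                  # Case 1: max of Bj overlaps min of Bk
        mid = (vk + wj) / 2
        return vj, mid, mid, wk
    if vk < vj < wk < wj:                  # Case 2: min of Bj overlaps max of Bk
        mid = (vj + wk) / 2
        return mid, wj, vk, mid
    if vj < vk < wk < wj:                  # Case 3: Bk contained within Bj
        if (wk - vj) < (wj - vk):
            return wk, wj, vk, wk          # vj_new = wk_old
        return vj, vk, vk, wk              # wj_new = vk_old
    if vk < vj < wj < wk:                  # Case 4: Bj contained within Bk
        if (wk - vj) < (wj - vk):
            return vj, wj, vk, vj          # wk_new = vj_old
        return vj, wj, wj, wk              # vk_new = wj_old
    return vj, wj, vk, wk
```

For example, `contract_dimension(0.125, 0.5, 0.25, 0.75)` falls under Case 1 and moves both touching faces to the midpoint 0.375.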
* The learning algorithm in fuzzy min-max neural network
The learning algorithm of the fuzzy min-max neural network consists only of creating and modifying hyperboxes in the sample space. It has three steps: creation and expansion of hyperboxes, overlap test, and hyperbox contraction. These steps are repeated for every sample in the data set.
1.2. Some researches to improve quality of FMNN
* Adjust size limit of hyperbox
Because the size limit (1.24) is computed by averaging over all dimensions, a hyperbox can exceed the size limit in individual dimensions during network training. To overcome this, D. Ma proposed replacing the size limit function (1.24) with formula (1.29), which is compared in every dimension.

$$\theta(A, B_j) = \frac{1}{n}\sum_{i=1}^{n}\big(\max(w_{ji}, a_i) - \min(v_{ji}, a_i)\big) \tag{1.24}$$

$$\theta(A_h, B_j) = \max_{i=1,\dots,n}\big(\max(w_{ji}, a_{hi}) - \min(v_{ji}, a_{hi})\big) \tag{1.29}$$
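The difference between the two size-limit functions can be seen in a short sketch. It assumes that a candidate expansion is accepted when the computed size does not exceed a user-defined limit, called `theta_max` here. That name, the function names and the `<=` comparison are assumptions for illustration.

```python
from typing import Sequence


def size_mean(a: Sequence[float], v: Sequence[float], w: Sequence[float]) -> float:
    """Average extent of the expanded hyperbox over all dimensions, as in (1.24)."""
    n = len(a)
    return sum(max(w_i, a_i) - min(v_i, a_i) for a_i, v_i, w_i in zip(a, v, w)) / n


def size_max(a: Sequence[float], v: Sequence[float], w: Sequence[float]) -> float:
    """Largest extent of the expanded hyperbox in any dimension, as in (1.29)."""
    return max(max(w_i, a_i) - min(v_i, a_i) for a_i, v_i, w_i in zip(a, v, w))


# Hypothetical hyperbox and sample: the expansion is wide in dimension 1 only.
v, w, a = (0.25, 0.25), (0.5, 0.5), (1.0, 0.25)
theta_max = 0.5
accept_by_mean = size_mean(a, v, w) <= theta_max   # extents 0.75 and 0.25 average to 0.5
accept_by_max = size_max(a, v, w) <= theta_max     # 0.75 exceeds the limit
```

In this example the averaged test accepts the expansion even though one side of the hyperbox grows to 0.75, while the per-dimension test rejects it.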

* Modify FMNN structure to manage overlapping areas
The FMCN (Fuzzy Min-max neural network classifier with Compensatory Neurons) and DCFMN (Data-Core-Based Fuzzy Min–Max Neural Network) models overcome the problems caused by hyperbox contraction by creating additional hyperboxes. Rather than contracting the hyperboxes, FMCN and DCFMN use these hyperboxes to manage each overlapping area separately.
* Improve learning method in FMNN
The semi-supervised models GFMM (General Fuzzy Min-Max) and RFMN (Reflex Fuzzy Min-max Neural network) use additional information in the form of labels that accompany some of the input patterns. GFMM and RFMN use this prior knowledge to monitor and guide clustering.
1.5. Conclusion of Chapter 1
Chapter 1 presented an overview of research on FMNN and its development trends, and synthesized and compared studies on improving the structure and algorithm of FMNN.
The following chapters present proposals on some open issues in the development of FMNN and the application of FMNN to support disease diagnosis.
Chapter 2: The development of semi-supervised clustering algorithm using fuzzy min-max neural network
This chapter presents three algorithms that improve the learning method, together with the experimental results used to evaluate them. The novel models include:
- An improved semi-supervised learning method, SS-FMM, published in [3].
- A novel semi-supervised clustering model that combines FMNN and SS-FMM, published in [5].
- A fuzzy clustering algorithm that takes the distribution of the data into account. In addition, the algorithm uses a set of additional rules in the training process. These results were published in [2, 4].
2.1. SS-FMM semi-supervised fuzzy clustering algorithm
The GFMM model and its modified version (RFMN) have the advantage of using prior information to monitor the clustering process, thereby improving clustering quality. However, both GFMM and RFMN can produce hyperboxes that are not labeled. When GFMM or RFMN creates a new hyperbox from an unlabeled sample, the new hyperbox is itself unlabeled. It then waits for a labeled sample, whose label it takes. Unlabeled hyperboxes can therefore remain when no labeled sample reaches them. Figure 2.1 illustrates a case in which GFMM and RFMN produce unlabeled hyperboxes.

[Figure: hyperbox U and hyperbox V]

Fig. 2.1 Failed hyperboxes of GFMM and RFMN
Where: V is a hyperbox created from labeled samples or whose label has been adjusted by labeled samples; U is a hyperbox created from unlabeled samples whose label has not been adjusted.

The SS-FMM algorithm overcomes this disadvantage of GFMM and RFMN. SS-FMM prevents the creation of unlabeled hyperboxes by using a limit threshold β. The initial threshold is defined by the user, but the algorithm can redefine it during training to fit the data. The framework diagram is shown in Figure 2.2.
When creating a new hyperbox from an unlabeled pattern, SS-FMM creates the hyperbox only if the pattern satisfies the β criterion defined in (2.2).


$$\max\{E(A_h, B_j) : j = 1, \dots, q\} \ge \beta \tag{2.2}$$
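The β criterion can be sketched as a small helper. It assumes that the values E(Ah, Bj) for the q existing hyperboxes have already been computed and that the criterion reads "the largest E is at least β". The function name and the choice to return the index of the best-matching hyperbox are illustrative assumptions.

```python
from typing import Optional, Sequence


def best_hyperbox_for_unlabeled(similarities: Sequence[float],
                                beta: float) -> Optional[int]:
    """Return the index j of the hyperbox with the largest E(A_h, B_j)
    if that value reaches the threshold beta, otherwise None.

    None means the unlabeled pattern does not yet satisfy the criterion,
    so no new hyperbox is created for it at this point.
    """
    if not similarities:
        return None
    j = max(range(len(similarities)), key=lambda idx: similarities[idx])
    return j if similarities[j] >= beta else None
```

With the hypothetical values `[0.5, 0.875, 0.25]` and `beta = 0.75`, the helper returns index 1, the hyperbox whose label would be given to the new hyperbox. With `beta = 0.9` it returns `None`.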

SS-FMM operates under a label spreading scheme to label the hyperboxes created from unlabeled samples. The algorithm generates hyperboxes from labeled data samples and spreads the labels from these labeled hyperboxes to the hyperboxes created from unlabeled samples. SS-FMM then merges all hyperboxes with the same label into a full cluster.
[Flowchart. Input: D and the threshold parameters. For each input pattern {Ah, dh} ∈ D: check whether Ah belongs to a hyperbox Bj ∈ B, or whether a hyperbox Bj can be expanded to cover Ah; if so, expand Bj and, when dh = 0, assign dh the label of Bj. Otherwise, if dh ≠ 0, create a new hyperbox Hnew with label dh; if dh = 0, create Hnew with the label of Bj only when criterion (2.2) holds. Hyperboxes are then tested for overlap and contracted if necessary, and processed patterns are removed from D. When Snew = Sold after a pass over the data, β is updated. The loop stops when D is empty; C is then calculated according to (1.7). Output: B, C.]

Fig. 2.2 General diagram of SS-FMM algorithm.
* Complexity evaluation of the SS-FMM algorithm
The SS-FMM algorithm has a time complexity of O(M(M(M-1)/2+NK)), where M is the total number of samples in the training data set, N is the number of attributes of a data sample, and K is the total number of hyperboxes generated in the SS-FMM network.


2.2. Combined fuzzy semi-supervised clustering algorithm (SCFMN)
In SS-FMM, each generated hyperbox is a cluster, and many small hyperboxes are used to classify the samples on cluster boundaries. However, when the value of the parameter $\theta_{\max}$ decreases, the number of hyperboxes in the network increases, and so does the complexity of the algorithm. SS-FMM also requires a certain ratio of labeled samples in the training set.
To overcome these limitations of SS-FMM, SCFMN uses different values of the $\theta_{\max}$ parameter in two stages to improve clustering results with fewer hyperboxes. The values $\theta^{1}_{\max}$ and $\theta^{2}_{\max}$ are the maximum sizes of the large and small hyperboxes, respectively. In the first stage, SCFMN generates hyperboxes and labels the samples that belong fully to them. In the second stage, SCFMN spreads the labels from the hyperboxes created in the first stage to the hyperboxes created from unlabeled samples. Large and small hyperboxes with the same label form a full cluster.
Figure 2.3 illustrates the idea of using large hyperboxes at the centers of clusters together with smaller hyperboxes at the boundaries. The hyperboxes are drawn in 2-dimensional space for a data set consisting of two clusters. B denotes a large hyperbox, G a small hyperbox (dashed line) obtained from labeled samples, and R a small hyperbox (dot-cross line) obtained from unlabeled samples.
[Figure: two clusters in 2-dimensional space, with samples marked * and +, covered by a large hyperbox B and small hyperboxes G and R]

Fig. 2.3 SCFMN uses the large and small hyperboxes.
2.2.2. Methodology of SCFMN algorithm
Figure 2.5 shows the general diagram of the SCFMN algorithm.
* The complexity of the SCFMN algorithm
SCFMN has a time complexity of O(KN(M(K+1)+1)+M(M-1)/2), where M is the total number of samples in the training data set, N is the number of attributes of a data sample, and K is the total number of hyperboxes generated in the SCFMN network.
[Flowchart in two phases. Phase 1: Additional information. Hyperboxes Bj are created and expanded from the input patterns Ah ∈ D, with overlap test and hyperbox contraction; patterns that belong to a hyperbox receive its label, giving D = D1 ∪ D2. Phase 2: Apply SS-FMM for data clustering. Labeled patterns create hyperboxes G, unlabeled patterns that satisfy the β criterion create hyperboxes R, and the process ends when all data are labeled.]

Fig. 2.5 General diagram of SCFMN algorithm.
2.3. CFMNN fuzzy min-max clustering algorithm based on data cluster center
The value of the FMNN membership function stops decreasing once a sample is far away from the hyperbox. To overcome this disadvantage, CFMNN relies on the distances between the samples and the centroids of the corresponding hyperboxes. The centroid distance is used when a sample is far from the hyperbox and its membership is less than 0.6, the region in which the membership function value no longer decreases. Apart from its min and max points, each hyperbox therefore has a center, defined as in (2.8).





$$c_{ji} = \frac{v_{ji} + w_{ji}}{2} \tag{2.8}$$

The measure E(Ah, Bj), based on the Euclidean distance between the input pattern Ah and the center of hyperbox j, is given by (2.9):

$$E(A_h, B_j) = 1 - \frac{1}{n}\sqrt{\sum_{i=1}^{n}\left(c_{ji} - a_{hi}\right)^{2}} \tag{2.9}$$

For each sample Ah that satisfies the size limit condition (1.24) but whose membership value is bj < 0.6, the distance to each hyperbox center is calculated and compared. The sample is assigned to the closest hyperbox.
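The centroid-based assignment can be sketched as follows. This is an illustration only. It assumes that E is one minus the Euclidean distance to the center divided by n (the square root is assumed from the description of E as a Euclidean distance), and that the closest hyperbox is the one with the largest E. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[Sequence[float], Sequence[float]]  # (min point v, max point w)


def centroid(v: Sequence[float], w: Sequence[float]) -> List[float]:
    """Center of a hyperbox: the midpoint of its min and max points, as in (2.8)."""
    return [(v_i + w_i) / 2 for v_i, w_i in zip(v, w)]


def center_measure(a: Sequence[float], v: Sequence[float], w: Sequence[float]) -> float:
    """Assumed reading of E(A_h, B_j) in (2.9): 1 minus the scaled distance to the center."""
    c = centroid(v, w)
    distance = math.sqrt(sum((c_i - a_i) ** 2 for c_i, a_i in zip(c, a)))
    return 1.0 - distance / len(a)


def closest_hyperbox(a: Sequence[float], boxes: Sequence[Box]) -> int:
    """Index of the hyperbox whose center is nearest to the sample a."""
    return max(range(len(boxes)), key=lambda j: center_measure(a, *boxes[j]))
```

For a sample at `(0.25, 0.25)` and the two hypothetical hyperboxes `((0.0, 0.0), (0.5, 0.5))` and `((0.5, 0.5), (1.0, 1.0))`, the sample sits exactly on the first center, so the first hyperbox is chosen.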
* Complexity of the CFMNN algorithm
The CFMNN algorithm has a time complexity of O(MKN), where M is the total number of samples in the training data set, N is the number of attributes of a data sample, and K is the total number of hyperboxes generated in CFMNN.
2.4. Experiment and evaluation
* Experimental method
To evaluate the performance of the proposed algorithms, experiments were performed on benchmark data sets.
The objective of the experiments is to evaluate the improvement in performance and the number and distribution of hyperboxes when the value of the parameter $\theta_{\max}$ is changed in the SS-FMM, CFMNN and SCFMN algorithms. The experiments also evaluate the ability to reduce the number of hyperboxes.
Accuracy and CCC (Cophenetic Correlation Coefficient) are used to evaluate the performance of the algorithms and to compare them with other algorithms. Accuracy is calculated by (2.12) and CCC by (2.13).
Details of the experimental results are presented in Tables 2.2 to 2.14 and Figures 2.9 to 2.20.
* Experimental results

[Figure panels: (a) Spiral, (b) Aggregation, (c) Jain, (d) Flame, (e) Pathbased, (f) R15]

Fig. 2.9 Graphical distribution of hyperboxes on data sets


[Figure panels: (a) to (d)]

Fig. 2.10 Accuracy obtained when changing the ratio of labeled samples of SS-FMM.


[Figure panels: (a) R15 data set, (b) Jain data set, (c) Iris data set, (d) Flame data set]

Fig. 2.11 Accuracy obtained when changing $\theta_{\max}$ of SS-FMM and SCFMN


[Figure panels: (a) Jain data set, (b) Flame data set, (c) Iris data set, (d) R15 data set]

Fig. 2.17 NoH obtained when changing $\theta_{\max}$ of SS-FMM and SCFMN

The experimental results show that:



- Accuracy decreases when the ratio of labeled samples decreases, but by much less than the decrease in the ratio of labeled samples in the training set.
- Accuracy decreases when $\theta_{\max}$ increases. Accuracy also decreases when $\theta_{\max}$ is too small, so $\theta_{\max}$ affects the performance of the algorithm.
- The total number of hyperboxes decreases when $\theta_{\max}$ increases.
* Comparison of the proposed algorithms with other algorithms
Table 2.7 compares the Accuracy of GFMM, RFMN and SS-FMM on the Iris data set.
Table 2.7 Accuracy (%) for different ratios of labeled samples
Ratio of labeled samples   GFMM   RFMN   SS-FMM
2%                         36     52     94
10%                        49     83     96
50%                        84     92     97

Table 2.8 compares the Accuracy of GFMM, RFMN and SS-FMM on a set of experimental data sets. The ratio of labeled samples in the training set is 10%.
Table 2.8 Accuracy (%) obtained by using SS-FMM, GFMM and RFMN on different data sets
Data set      GFMM    RFMN    SS-FMM
Aggregation   48.25   79.56   98.86
Flame         49.74   84.47   98.75
Jain          56.32   85.35   100
Spiral        55.19   82.61   100
Pathbased     52.47   82.52   98.72
R15           48.28   84.78   99.50
Iris          49.36   83.92   96.00
ThyroidNew    51.83   80.12   91.69
Wine          52.54   80.73   93.33
Table 2.9 Comparison of Accuracy (%) obtained by using SCFMN, CFMNN, FMNN and MFMM
Data set   FMNN    MFMM    CFMNN   SCFMN
Flame      85.13   91.78   91.25   99.17
Jain       86.07   91.18   91.20   100
R15        87.24   93.54   93.76   99.50
Iris       86.97   93.01   92.77   95.98
Wine       85.58   93.12   92.83   94.35
PID        68.35   70.08   70.49   74.58

Table 2.10 Comparison of CCC obtained by using SCFMN, CFMNN, MFMN and MFMM
Data set   MFMM   MFMN   CFMNN   SCFMN
Glass      0.94   0.94   0.93    0.94
Iris       -      0.97   0.97    0.98
Wine       0.83   -      0.84    0.89

Table 2.11 Comparison of Time (s) obtained by using SCFMN, CFMNN, FMNN and MFMM
Data set   FMNN      MFMM      CFMNN     SCFMN
Flame      0.483     0.532     0.487     0.876
Jain       0.635     0.724     0.648     0.923
R15        0.701     0.798     0.712     0.967
Iris       0.215     0.231     0.221     0.623
Wine       0.274     0.283     0.276     0.692
PID        525.132   732.945   543.675   913.657

Fig. 2.19 Accuracy comparison chart of SCFMN and CFMNN with FMNN and MFMM

Fig. 2.20 NoH comparison chart of SCFMN with some other methods

2.5. Conclusion of Chapter 2
Chapter 2 presents the following improvements to the FMNN algorithm:
- An improved semi-supervised learning method (SS-FMM) in which part of the training data is labeled and labels are spread to the rest. The learning algorithm in SS-FMM uses the information contained in both labeled and unlabeled data for training. SS-FMM performs well even with a low ratio of labeled samples. This proposal was published in [3].
- A novel combined semi-supervised clustering model (SCFMN). SCFMN uses a semi-supervised learning method in which the additional information is defined automatically. It uses large hyperboxes at the center of each cluster to minimize the number of hyperboxes, and small hyperboxes at the boundaries between clusters to increase clustering performance. This proposal was published in [5].
- An improved algorithm, CFMNN, that takes the distribution of the data into account. In the prediction and adjustment stages, assignment to a hyperbox does not depend entirely on the membership degree, especially when the sample is far away from the hyperbox. In addition, CFMNN uses a new set of 10 rules to adjust hyperboxes during training. This proposal was published in [2, 4].
Chapter 3: Application of Fuzzy min-max neural network in
supporting liver disease diagnosis
3.1. Liver disease diagnosis methods
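* Diagnosis using APRI
APRI is calculated by formula (3.1):

$$\mathrm{APRI} = \frac{\mathrm{AST}/\mathrm{ULN}}{\mathrm{PLT}} \times 100 \tag{3.1}$$

* Diagnosis using FIB-4
FIB-4 is calculated by formula (3.2):

$$\text{FIB-4} = \frac{\mathrm{Age} \times \mathrm{AST}}{\mathrm{PLT} \times \sqrt{\mathrm{ALT}}} \tag{3.2}$$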
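Both indices are simple arithmetic, as the following sketch shows. It assumes the usual conventions, which this summary does not state: AST and ALT in U/L, ULN as the upper limit of normal for AST, PLT as the platelet count in 10^9/L, and the standard FIB-4 form with the square root of ALT in the denominator. The function names and the input values are hypothetical and are not patient data.

```python
import math


def apri(ast: float, uln: float, plt: float) -> float:
    """APRI as in (3.1): (AST / ULN) / PLT * 100."""
    return (ast / uln) / plt * 100


def fib4(age: float, ast: float, plt: float, alt: float) -> float:
    """FIB-4 as in (3.2): (Age * AST) / (PLT * sqrt(ALT))."""
    return (age * ast) / (plt * math.sqrt(alt))


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
example_apri = apri(ast=80.0, uln=40.0, plt=100.0)            # (80/40)/100*100
example_fib4 = fib4(age=50.0, ast=80.0, plt=100.0, alt=64.0)  # 4000/(100*8)
```

With these inputs APRI evaluates to 2 and FIB-4 to 5. The clinical cut-off values used for diagnosis are not given in this summary.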

3.2. Liver disease diagnosis support using fuzzy min-max neural network
* Problem modeling
CDS (Cirrhosis Diagnosis System) is a diagnostic model for liver disease that combines fuzzy min-max theory, artificial neural networks and a fuzzy inference method to build a decision support system from liver enzyme test data. The CDS model in the liver disease diagnostic support system is shown in Figure 3.1.
* Model analysis
- CDS combines a data clustering algorithm with decision-making methods for liver disease diagnosis.
- CDS offers a way to combine a clustering algorithm using FMNN with a decision-making system. This is significant for the liver disease diagnosis problem in particular and for the field of medical informatics in general.


[Flowchart: Begin → Liver enzyme test data → Extract and select features → Fuzzy min-max neural network training (expansion of hyperbox, hyperbox overlap test, hyperbox contraction) → Pruning hyperboxes → Generating the rules → Disease summary table from the test results → Diagnostic → End]

Fig. 3.1. Liver disease diagnostic support system by CDS
* Pruning hyperbox using the HCF index
Each hyperbox is associated with an HCF (Hyperbox Confidence Factor) that measures its usage level. Hyperboxes with an HCF index lower than the threshold are pruned.
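The pruning step amounts to a threshold filter. The sketch below assumes that the HCF values have already been computed, since their formula is not given in this summary. The function name, the hyperbox identifiers and the HCF values are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def prune_hyperboxes(hyperboxes: Sequence[T], hcf: Sequence[float],
                     threshold: float) -> List[T]:
    """Keep only the hyperboxes whose HCF is not lower than the threshold."""
    if len(hyperboxes) != len(hcf):
        raise ValueError("each hyperbox needs exactly one HCF value")
    return [box for box, score in zip(hyperboxes, hcf) if score >= threshold]


# Hypothetical example: B2 is used too rarely and is pruned.
kept = prune_hyperboxes(["B1", "B2", "B3"], [0.9, 0.1, 0.5], threshold=0.5)
```

A higher threshold leaves fewer hyperboxes and therefore fewer decision rules.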
* Decision rule extracting
Each hyperbox generates a fuzzy decision rule. The min and max values are quantized into Q levels, where Q is the number of fuzzy partitions in the quantized rule. Each input value is assigned to a quantization point given by (3.8):

$$A_q = \frac{q - 1}{Q - 1} \tag{3.8}$$

Fuzzy if…then rules are defined by (3.9):

$$\text{Rule } R_j: \text{ If } x_{p1} \text{ is } A_q \text{ and } \dots \text{ and } x_{pn} \text{ is } A_q \text{ then } x_p \text{ is } C_j \tag{3.9}$$
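The rule extraction step can be sketched as follows. The sketch assumes that attribute values are normalized to [0, 1], that each min or max value is mapped to the nearest quantization point A_q (the summary does not state the mapping), and that Q is at least 2. The function names and the rule wording are hypothetical.

```python
from typing import Sequence


def quantize(x: float, Q: int) -> int:
    """Index q (1..Q) of the quantization point A_q = (q-1)/(Q-1) nearest to x.
    Ties go to the lower level."""
    return min(range(1, Q + 1), key=lambda q: abs(x - (q - 1) / (Q - 1)))


def hyperbox_to_rule(v: Sequence[float], w: Sequence[float],
                     label: int, Q: int) -> str:
    """Turn the min point v and max point w of a hyperbox into an if...then rule."""
    parts = []
    for i, (v_i, w_i) in enumerate(zip(v, w), start=1):
        lo, hi = quantize(v_i, Q), quantize(w_i, Q)
        levels = str(lo) if lo == hi else f"{lo}-{hi}"
        parts.append(f"A{i} is {levels}")
    return "If " + " and ".join(parts) + f" then class is {label}"


rule = hyperbox_to_rule(v=(0.0, 0.3), w=(0.1, 0.7), label=2, Q=4)
```

With Q = 4 the quantization points are 0, 1/3, 2/3 and 1, so this hypothetical hyperbox becomes "If A1 is 1 and A2 is 2-3 then class is 2", the same level-range notation that appears in the rule tables.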


3.3. Experiment and evaluation
* Experimental data sets
Information on liver disease data is shown in Table 3.3. This
information is extracted from the medical records related to the test
results and disease diagnosis from doctors.
* Objectives of experiments
- To evaluate the ability of improving the performance.
- To evaluate the number of hyperboxes before and after the pruning process.
- To evaluate the decision rules, computation time.
* Measurements and evaluation criteria
Measurements include Accuracy, AccSe, AccSp, NPV, PPV,
Jaccard, Rand, FM, NoH.
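The classification measures in this list can be computed from a binary confusion matrix, as in the sketch below. It assumes that AccSe and AccSp denote sensitivity and specificity, which the summary does not define. The pair-counting measures (Jaccard, Rand, FM) and NoH are not covered. The function name and the counts are hypothetical.

```python
from typing import Dict


def diagnostic_measures(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion-matrix counts.
    Assumes every denominator is non-zero."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # assumed meaning of AccSe
        "specificity": tn / (tn + fp),   # assumed meaning of AccSp
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts for illustration only.
measures = diagnostic_measures(tp=40, fp=10, tn=30, fn=20)
```

For these counts, accuracy is 0.7, sensitivity is about 0.667, specificity is 0.75, PPV is 0.8 and NPV is 0.6.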
* Experimental results
Details of the experimental results are presented in Tables 3.4 to 3.15 and Figures 3.2 to 3.10.


[Figure panels: (a) SS-FMM, (b) SCFMN]

Fig. 3.5 Accuracy of SCFMN and SS-FMM when changing $\theta_{\max}$ on the liver disease data set

Fig. 3.6 NoH of SCFMN and SS-FMM when changing $\theta_{\max}$ on the real data set
Table 3.9. Fuzzy rules on Cirrhosis dataset generated by SCFMN
Rule   IF: A1   A2    A3    Then   CF
1      1        1     2-3   2      0.300
2      1-3      1     2-3   1      0.114
3      1-2      1     3-4   1      0.075
4      3-4      1-2   1     1      0.039
5      1-3      1-4   1-2   1      0.834
6      1        1     1-4   2      0.43
Table 3.13. An example of diagnostic results using SCFMN on the real data set
If: A1   A2   A3      A4      A5    A6      A7     A8      A9      A10   Then (C)
81       0    97.1    104.1   3.1   154.4   36.7   27.3    10.1    37    1
53       0    94.1    100.9   3.1   266.4   25.2   37.6    10.7    28    1
53       0    87.9    94.3    3.1   249.0   23.5   35.1    10.0    28    1
81       0    86.1    92.3    3.1   136.9   32.5   24.2    9.0     37    1
24       1    592.3   200.6   3.0   195.6   38.3   359.5   139.3   39    1
37       0    568.6   208.7   2.7   82.6    27.5   65.3    15.3    23    1
46       1    60.4    57.0    1.1   87.8    37.4   19.0    3.5     18    0
57       0    60.5    45.4    1.3   196.2   39.2   12.1    3.5     29    0
57       0    60.5    45.4    1.3   196.4   39.2   12.1    3.5     29    0

3.4. Conclusion of Chapter 3
Chapter 3 presents the application of the proposed models to the design of a support system for diagnosing liver disease from liver enzyme test data.
The proposed models were implemented on the liver disease data set. The results show that the proposed models give good predictions. In particular, they can extract fuzzy if...then decision rules whose quantitative values are the min and max points of the fuzzy hyperboxes. The results were evaluated through several measurements, and these experimental results also confirm the correctness of the theoretical proposals.
CONCLUSION
From the research contents, the thesis has achieved the following
results:
* Main results:
- Propose an improved semi-supervised learning algorithm (SS-FMM) in which the additional information is the labeled part of the training data and labels are spread to the rest. The algorithm gradually forms and corrects the hyperboxes (clusters) during training. Labeled samples are processed first to form hyperboxes, and their labels are then spread to the hyperboxes formed from unlabeled training samples. Learning in SS-FMM uses the information contained in both labeled and unlabeled data. SS-FMM performs well even with a low ratio of labeled samples. This proposal was published in [3].
- Propose a fuzzy semi-supervised clustering model that combines SS-FMM and FMNN. The proposed model uses a semi-supervised learning method in which the additional information is defined automatically by the algorithm. The algorithm uses large hyperboxes at the center of each cluster to minimize the number of hyperboxes, and small hyperboxes at the boundaries between clusters to increase clustering performance. This proposal was published in [5].
- Propose an improved algorithm, CFMNN, that takes the distribution of the data into account. During the prediction and adjustment phase, assignment to a hyperbox does not depend entirely on the membership degree, especially when the sample is far away from the hyperbox. In addition, the CFMNN

