
DSpace at VNU: On the performance evaluation of intuitionistic vector similarity measures for medical diagnosis


Journal of Intelligent & Fuzzy Systems 31 (2016) 1597–1608
DOI:10.3233/JIFS-151654
IOS Press

On the performance evaluation of intuitionistic vector similarity measures for medical diagnosis^1
Le Hoang Son^{a,∗} and Pham Hong Phong^b
^a VNU University of Science, Vietnam National University, Hanoi, Vietnam
^b National University of Civil Engineering, Hanoi, Vietnam

Abstract. The intuitionistic fuzzy recommender system (IFRS), recently proposed on the basis of the theories of intuitionistic fuzzy sets and recommender systems, is an efficient tool for medical diagnosis. IFRS uses the intuitionistic fuzzy similarity degree (IFSD), regarded as a generalization of the hard user-based, item-based and rating-based similarity degrees in recommender systems, to calculate the similarity between patients in the system. In this paper, we first extend IFRS by using a new notion, the intuitionistic fuzzy vector (IFV), instead of the existing intuitionistic fuzzy matrix (IFM) in IFRS. Then, the intuitionistic value similarity measure (IvSM) and the intuitionistic vector similarity measure (IVSM) are defined on the basis of the intuitionistic fuzzy vector. Some mathematical properties of these measures are examined, and several IVSM functions are proposed. The performance of these IVSM functions for medical diagnosis is experimentally validated and compared with the existing similarity degrees of IFRS. The paper concludes by recommending the most efficient IVSM function(s) for medical diagnosis.
Keywords: Intuitionistic fuzzy recommender systems, intuitionistic fuzzy vector, intuitionistic vector similarity measure, medical diagnosis, performance evaluation

1. Introduction


Medical diagnosis is an important and necessary process for issuing appropriate medical findings for patients in health care support systems. It involves determining the possible relations between patients and diseases from those between patients and symptoms. The answer of a medical diagnosis for a certain disease is often yes/no, which eventually leads to the final specification of the most likely disease and the appropriate treatments. Medical diagnosis must be accurate, and researchers have shown great interest in improving this accuracy as far as possible.
Recent advances in health care support systems have focused on enhancing the accuracy of medical diagnosis both in theory and in practice [2]. One effort in this direction is an efficient tool, the Intuitionistic Fuzzy Recommender System (IFRS), which was designed based on the theories of intuitionistic fuzzy sets and recommender systems [9]. In IFRS, the intuitionistic fuzzy similarity degree (IFSD) is used as a generalization of the hard user-based, item-based and rating-based similarity degrees to calculate the similarity between patients. A hybrid similarity degree combining IFSD with the degree produced by a picture fuzzy clustering method has been proposed to enhance the accuracy of prediction. These related studies mostly focused on improving the similarity degree of IFRS to ensure high accuracy of the system.

1 The authors are greatly indebted to the editor-in-chief, Prof. Reza Langari, and the anonymous reviewers for their comments and suggestions, which improved the quality and clarity of the paper. This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2014.01.
∗ Corresponding author. Le Hoang Son, VNU University of Science, Vietnam National University, Hanoi, Vietnam. Tel.: +84 904 171 284; E-mail: chinhson2002@gmail.com.
1064-1246/16/$35.00 © 2016 – IOS Press and the authors. All rights reserved
L.H. Son and P.H. Phong / On the performance evaluation of IVSMs for medical diagnosis

In this paper, we propose the intuitionistic fuzzy vector (IFV) as a replacement for the existing intuitionistic fuzzy matrix (IFM) in IFRS. Then, a generalization of the existing multi-criteria IFRS, called the Modified multi-criteria IFRS (MMC-IFRS), which takes the IFV into account, is presented. Two new measures, namely the intuitionistic value similarity measure (IvSM) and the intuitionistic vector similarity measure (IVSM), are defined. Some mathematical properties of these measures are examined, and several IVSM functions are proposed. The performance of these IVSM functions for medical diagnosis is experimentally validated and compared with the existing similarity degrees of IFRS.
The rest of the paper is organized as follows. Section 2 recalls some previous works. Section 3 presents the new contributions of this paper. Section 4 validates the proposed model by experiments. Section 5 gives the conclusions and future works of the paper.

2. Preliminaries
2.1. Related works
Assume that P, S and D are the sets of patients, symptoms and diseases, respectively. Each patient $P_i$ ($i = 1, \ldots, n$) (resp. symptom $S_j$, $j = 1, \ldots, m$) is assumed to have some features (resp. characteristics). For simplicity, we consider a recommender system including one feature of the patient and one characteristic of the symptom, denoted as X and Y, respectively. X and Y both consist of s intuitionistic linguistic labels. Analogously, each disease $D_k$ ($k = 1, \ldots, p$) also contains s intuitionistic linguistic labels. The definition of Multi-criteria Intuitionistic Fuzzy Recommender Systems (MC-IFRS) is then given as follows.
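Definition 1. [9] (Multi-criteria Intuitionistic Fuzzy Recommender Systems, MC-IFRS). The utility function R is a mapping specified on (X, Y) as in Equation (1).

$$
R : X \times Y \to D_1 \times \cdots \times D_p,
$$
$$
\begin{pmatrix} (\mu_{1X}(x), \gamma_{1X}(x)) \\ \cdots \\ (\mu_{sX}(x), \gamma_{sX}(x)) \end{pmatrix}
\times
\begin{pmatrix} (\mu_{1Y}(y), \gamma_{1Y}(y)) \\ \cdots \\ (\mu_{sY}(y), \gamma_{sY}(y)) \end{pmatrix}
\mapsto
\prod_{k=1}^{p}
\begin{pmatrix} (\mu_{1D}(D_k), \gamma_{1D}(D_k)) \\ \cdots \\ (\mu_{sD}(D_k), \gamma_{sD}(D_k)) \end{pmatrix}.
\tag{1}
$$
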

In Equation (1), $(\mu_{iX}(x), \gamma_{iX}(x))$ is the intuitionistic fuzzy value (IFv) of the patient for the i-th linguistic label of feature X, $(\mu_{iY}(y), \gamma_{iY}(y))$ is the IFv of the symptom for the i-th linguistic label of characteristic Y, and $(\mu_{iD}(D_k), \gamma_{iD}(D_k))$ is the IFv of the disease $D_k$ for the i-th linguistic label ($i = 1, \ldots, s$, $k = 1, \ldots, p$).
MC-IFRS is a system that provides the two basic functions below.
a) Prediction: determine the values of $(\mu_{iD}(D_k), \gamma_{iD}(D_k))$, $i = 1, \ldots, s$, $k = 1, \ldots, p$.
b) Recommendation: choose $i^* \in \{1, \ldots, s\}$ which maximizes the expression
$$
\sum_{k=1}^{p} w_k \left( \mu_{iD}(D_k) + \mu_{iD}(D_k)\, \pi_{iD}(D_k) \right),
$$
where $\pi_{iD}(D_k) = 1 - \mu_{iD}(D_k) - \gamma_{iD}(D_k)$ and $w_k \in [0, 1]$ is the weight of $D_k$ satisfying the constraint $\sum_{k=1}^{p} w_k = 1$.
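The recommendation function can be illustrated with a minimal Python sketch. It assumes the predicted IFvs are already available as plain (mu, gamma) tuples; the function name `recommend_label` and the data layout are hypothetical choices made for this illustration, not part of the original system.

```python
def recommend_label(disease_ifvs, weights):
    """Pick the label index i* maximizing sum_k w_k * (mu + mu * pi).

    disease_ifvs[k][i] is the (mu, gamma) pair of disease k for label i.
    weights[k] is the weight w_k of disease k; the weights must sum to 1.
    Returns the 1-based index of the best label and the list of scores.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("disease weights must sum to 1")
    num_labels = len(disease_ifvs[0])
    scores = []
    for i in range(num_labels):
        score = 0.0
        for w, labels in zip(weights, disease_ifvs):
            mu, gamma = labels[i]
            pi = 1.0 - mu - gamma          # hesitation degree
            score += w * (mu + mu * pi)
        scores.append(score)
    best = max(range(num_labels), key=scores.__getitem__)
    return best + 1, scores


# Hypothetical data: two diseases, three labels, equal weights.
ifvs = [
    [(0.8, 0.1), (0.6, 0.3), (0.2, 0.6)],
    [(0.1, 0.8), (0.2, 0.7), (0.5, 0.35)],
]
best_label, label_scores = recommend_label(ifvs, [0.5, 0.5])
```

In this made-up instance the first label obtains the largest score, so it would be the recommended one.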
MC-IFRS could be compressed in a matrix form
as in Definition 2.

Definition 2. [9] An intuitionistic fuzzy matrix (IFM) Z in MC-IFRS is defined as

$$
Z = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1s} \\
b_{21} & b_{22} & \cdots & b_{2s} \\
c_{31} & c_{32} & \cdots & c_{3s} \\
\cdots & \cdots & \cdots & \cdots \\
c_{t1} & c_{t2} & \cdots & c_{ts}
\end{pmatrix}.
\tag{2}
$$

In Equation (2), $t = p + 2$, where $p \in \mathbb{N}^*$ is the number of diseases in Definition 1. The value $s \in \mathbb{N}^*$ is the number of intuitionistic linguistic labels. $a_{1i}$, $b_{2i}$, $c_{hi}$ ($h = 3, \ldots, t$, $i = 1, \ldots, s$) are the IFvs consisting of the membership and non-membership values as in Definition 1: $a_{1i} = (\mu_{iX}(x), \gamma_{iX}(x))$, $b_{2i} = (\mu_{iY}(y), \gamma_{iY}(y))$ and
$$
c_{hi} = (\mu_{iD}(D_{h-2}), \gamma_{iD}(D_{h-2})), \quad i = 1, \ldots, s, \; h = 3, \ldots, t.
$$
Each row from the third one to the last in Equation (2) is related to a given disease.
Based on IFM, the intuitionistic fuzzy similarity matrix (IFSM) and the intuitionistic fuzzy similarity degree (IFSD) were defined as in Definitions 3 and 4.
Definition 3. [9] Suppose that $Z_1$ and $Z_2$ are two IFMs in MC-IFRS. The intuitionistic fuzzy similarity matrix (IFSM) between $Z_1$ and $Z_2$ is defined as follows.

$$
\tilde{S} = \begin{pmatrix}
\tilde{S}_{11} & \tilde{S}_{12} & \cdots & \tilde{S}_{1s} \\
\tilde{S}_{21} & \tilde{S}_{22} & \cdots & \tilde{S}_{2s} \\
\tilde{S}_{31} & \tilde{S}_{32} & \cdots & \tilde{S}_{3s} \\
\cdots & \cdots & \cdots & \cdots \\
\tilde{S}_{t1} & \tilde{S}_{t2} & \cdots & \tilde{S}_{ts}
\end{pmatrix},
$$

where $\tilde{S}_{1i} = sim(a_{1i}^{(1)}, a_{1i}^{(2)})$, $\tilde{S}_{2i} = sim(b_{2i}^{(1)}, b_{2i}^{(2)})$ and $\tilde{S}_{hi} = sim(c_{hi}^{(1)}, c_{hi}^{(2)})$, $i = 1, \ldots, s$, $h = 3, \ldots, t$. sim is a measure specifying the similarity between two intuitionistic values $u = (\mu_u, \gamma_u)$ and $v = (\mu_v, \gamma_v)$,

$$
sim(u, v) = 1 - \frac{1 - \exp\left(-\frac{1}{2}\left(\sqrt{|\mu_u - \mu_v|} + \sqrt{|\gamma_u - \gamma_v|}\right)\right)}{1 - \exp(-1)}.
\tag{3}
$$

Definition 4. [9] Suppose that $Z_1$ and $Z_2$ are two IFMs in MC-IFRS. The intuitionistic fuzzy similarity degree (IFSD) between $Z_1$ and $Z_2$ is

$$
SIM(Z_1, Z_2) = \alpha \sum_{i=1}^{s} w_{1i} \tilde{S}_{1i} + \beta \sum_{i=1}^{s} w_{2i} \tilde{S}_{2i} + \chi \sum_{h=3}^{t} \sum_{i=1}^{s} w_{hi} \tilde{S}_{hi},
\tag{4}
$$

where $\tilde{S}$ is the IFSM between $Z_1$ and $Z_2$. $W = (w_{ji})$ ($j = 1, \ldots, t$, $i = 1, \ldots, s$) is the weight matrix of the IFSM between $Z_1$ and $Z_2$ satisfying

$$
\sum_{i=1}^{s} w_{ji} = 1, \quad j = 1, \ldots, t, \qquad \alpha + \beta + \chi = 1.
$$

IFSD is used to calculate the similarity between patients in the system, and to predict possible diseases for a patient. Clearly, the better the IFSD is, the higher the accuracy the health care support system can achieve. Thus, in another recent paper [11], the authors defined a hybrid similarity degree between IFSD and the degree produced by a picture fuzzy clustering method [8] to enhance the accuracy of prediction, as in Definition 5.

Definition 5. [11] Let us denote the IFSD in Equation (4) as $SIM_{history}(a, b)$. The hybrid similarity degree is then calculated as

$$
SIM(a, b) = (1 - \lambda)\, SIM_{history}(a, b) + \lambda\, SIM_{group}(a, b),
$$

where $\lambda \in [0, 1]$ is an adjustable coefficient, and $SIM_{group}(a, b)$ is the similarity degree from the picture fuzzy clustering [8] as in the equations below.

$$
SIM_{group}(a, b) = \frac{1}{NC} \sum_{i=1}^{NC} \left( P(a, i) - \bar{P}(a) \right) \left( P(b, i) - \bar{P}(b) \right),
$$
$$
P(j, k) = 1 - \frac{CS(j, k)}{\max_i \{CS(i, k)\}},
$$
$$
CS(j, k) = 1 - \frac{\sum_i X_{ji} V_{ki}}{\lVert X_j \rVert \, \lVert V_k \rVert},
$$

where $P(j, k)$ is the possibility of patient j belonging to cluster k, and $CS(j, k)$ is the counter similarity between patient j and cluster k, with $X_j$ and $V_k$ being patient j and the center of cluster k, respectively. $\bar{P}(a)$ is the mean value of $P(a, k)$, $k = 1, \ldots, NC$. NC is the number of groups used in the picture fuzzy clustering method DPFCM [8].

2.2. Some remarks
The methods recalled in sub-section 2.1 achieved better accuracy than related ones, such as the standalone algorithms of intuitionistic fuzzy sets [4, 7, 10] and of recommender systems, e.g. [3, 5]. These studies mostly focused on improving the similarity degree of IFRS to ensure high accuracy of the system. Note that the most important assumption in IFRS is that the numbers of intuitionistic linguistic labels in the features of the patients, in the characteristics of the symptoms and in the diseases are all the same, denoted as s (see the lines before Definition 1). In practical applications, this assumption may not hold, which makes it difficult to apply the IFSD of Definition 4 and the hybrid similarity degree of Definition 5. This motivates us to extend MC-IFRS in Definition 1 and the corresponding similarity degrees to the new context. Thus, in this paper we first extend MC-IFRS by using a new notion, the intuitionistic fuzzy vector (IFV), instead of the existing intuitionistic fuzzy matrix (IFM) in IFRS. Then, the intuitionistic value similarity measure (IvSM) and the intuitionistic vector similarity measure (IVSM) are defined on the basis of the IFV. Some mathematical properties of these measures are examined, and several IVSM functions are proposed. The performance of these IVSM functions for medical diagnosis is experimentally validated and compared with the existing similarity degrees of IFRS. The paper concludes by recommending the most efficient IVSM function(s) for medical diagnosis. Hence, the contributions of this paper are important not only for the theoretical aspects of recommender systems but also for their application to health care support systems.

3. The proposed method
In this section, we first propose a new MC-IFRS, called the Modified MC-IFRS (MMC-IFRS), to handle the problem of different numbers of intuitionistic linguistic labels in the features of patients, the characteristics of symptoms and the diseases (sub-section 3.1). An illustrative example of MMC-IFRS and the conversion of MMC-IFRS to the intuitionistic fuzzy vector (IFV) are also given there. Secondly, we define the intuitionistic value similarity measure (IvSM) and the intuitionistic vector similarity measure (IVSM), together with some mathematical properties, in sub-section 3.2. Several IVSM functions to be validated in the experiments are also proposed in that sub-section.
3.1. Modified multi-criteria intuitionistic fuzzy recommender system
Recall that P, S and D are the sets of patients, symptoms and diseases, with cardinalities n, m and p, respectively. Each patient $P_i$ ($i = 1, \ldots, n$) is assumed to have N features $X_1, \ldots, X_N$. Each feature $X_e$ consists of $r_e$ linguistic labels ($e = 1, \ldots, N$). Each symptom $S_j$ ($j = 1, \ldots, m$) is assumed to have M characteristics $Y_1, \ldots, Y_M$. Each characteristic $Y_f$ consists of $s_f$ linguistic labels ($f = 1, \ldots, M$). Each disease $D_g$ contains $t_g$ intuitionistic linguistic labels ($g = 1, \ldots, p$).

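Definition 6. (Modified Multi-criteria Intuitionistic Fuzzy Recommender Systems, MMC-IFRS). The utility function R is a mapping:

$$
R : \left( \prod_{e=1}^{N} X_e \right) \times \left( \prod_{f=1}^{M} Y_f \right) \to \prod_{g=1}^{p} D_g,
$$
$$
\prod_{e=1}^{N}
\begin{pmatrix} (\mu_{1X_e}(x_e), \gamma_{1X_e}(x_e)) \\ \cdots \\ (\mu_{r_e X_e}(x_e), \gamma_{r_e X_e}(x_e)) \end{pmatrix}
\times
\prod_{f=1}^{M}
\begin{pmatrix} (\mu_{1Y_f}(y_f), \gamma_{1Y_f}(y_f)) \\ \cdots \\ (\mu_{s_f Y_f}(y_f), \gamma_{s_f Y_f}(y_f)) \end{pmatrix}
\mapsto
\prod_{g=1}^{p}
\begin{pmatrix} (\mu_{1D}(D_g), \gamma_{1D}(D_g)) \\ \cdots \\ (\mu_{t_g D}(D_g), \gamma_{t_g D}(D_g)) \end{pmatrix},
\tag{5}
$$

where $(\mu_{xX_e}(x_e), \gamma_{xX_e}(x_e))$ is the IFv of the patient for the x-th linguistic label of the feature $X_e$ ($x = 1, \ldots, r_e$, $e = 1, \ldots, N$), $(\mu_{yY_f}(y_f), \gamma_{yY_f}(y_f))$ is the IFv of the symptom for the y-th linguistic label of the characteristic $Y_f$ ($y = 1, \ldots, s_f$, $f = 1, \ldots, M$), and $(\mu_{zD}(D_g), \gamma_{zD}(D_g))$ is the IFv of the disease $D_g$ for the z-th linguistic label ($z = 1, \ldots, t_g$, $g = 1, \ldots, p$). MMC-IFRS provides two basic functions:
a) Prediction: determine the values of $(\mu_{zD}(D_g), \gamma_{zD}(D_g))$, $z = 1, \ldots, t_g$, $g = 1, \ldots, p$.
b) Recommendation: choose $z^* \in \{1, \ldots, t_g\}$ which maximizes the expression
$$
\sum_{g=1}^{p} w_g \left( \mu_{zD}(D_g) + \mu_{zD}(D_g)\, \pi_{zD}(D_g) \right),
$$
where $\pi_{zD}(D_g) = 1 - \mu_{zD}(D_g) - \gamma_{zD}(D_g)$ and $w_g \in [0, 1]$ is the weight of $D_g$ satisfying the constraint $\sum_{g=1}^{p} w_g = 1$.
It is obvious that MMC-IFRS in Definition 6 is a generalization of MC-IFRS in Definition 1. Consider the example below, which illustrates the new definition for medical diagnosis.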
Example 1. In a medical diagnosis system, there are 4 patients. The feature X is “Age”, consisting of 5 linguistic labels: “VL=very low”, “L=low”, “M=medium”, “H=high”, “VH=very high”. By using the trapezoidal intuitionistic fuzzy numbers (TIFNs) [1], characterized by $(a_1, a_2, a_3, a_4; a_1', a_4')$ with $a_1' \le a_1 \le a_2 \le a_3 \le a_4 \le a_4'$, the membership (non-membership) functions of patients for the linguistic labels of the feature X are:

$$
\mu_{VL}(x) = \begin{cases} 1, & x \le 10 \\ (20 - x)/10, & 10 < x \le 20 \\ 0, & x > 20 \end{cases}
\qquad
\gamma_{VL}(x) = \begin{cases} 0, & x \le 10 \\ (x - 10)/10, & 10 < x \le 20 \\ 1, & x > 20 \end{cases}
$$
$$
\mu_{L}(x) = \begin{cases} 0, & x \le 10, \; x > 40 \\ (x - 10)/10, & 10 < x \le 20 \\ 1, & 20 < x \le 30 \\ (40 - x)/10, & 30 < x \le 40 \end{cases}
\qquad
\gamma_{L}(x) = \begin{cases} 1, & x \le 10, \; x > 40 \\ (20 - x)/10, & 10 < x \le 20 \\ 0, & 20 < x \le 30 \\ (x - 30)/10, & 30 < x \le 40 \end{cases}
$$
$$
\mu_{M}(x) = \begin{cases} 0, & x \le 30, \; x > 60 \\ (x - 30)/10, & 30 < x \le 40 \\ 1, & 40 < x \le 50 \\ (60 - x)/10, & 50 < x \le 60 \end{cases}
\qquad
\gamma_{M}(x) = \begin{cases} 1, & x \le 30, \; x > 60 \\ (40 - x)/10, & 30 < x \le 40 \\ 0, & 40 < x \le 50 \\ (x - 50)/10, & 50 < x \le 60 \end{cases}
$$
$$
\mu_{H}(x) = \begin{cases} 0, & x \le 50, \; x > 80 \\ (x - 50)/10, & 50 < x \le 60 \\ 1, & 60 < x \le 70 \\ (80 - x)/10, & 70 < x \le 80 \end{cases}
\qquad
\gamma_{H}(x) = \begin{cases} 1, & x \le 50, \; x > 80 \\ (60 - x)/10, & 50 < x \le 60 \\ 0, & 60 < x \le 70 \\ (x - 70)/10, & 70 < x \le 80 \end{cases}
$$
$$
\mu_{VH}(x) = \begin{cases} 0, & x \le 70 \\ (x - 70)/10, & 70 < x \le 80 \\ 1, & x > 80 \end{cases}
\qquad
\gamma_{VH}(x) = \begin{cases} 1, & x \le 70 \\ (80 - x)/10, & 70 < x \le 80 \\ 0, & x > 80 \end{cases}
$$

Based on the membership and non-membership functions, we calculate the information of patients as follows.
Al (18): VL (0.2, 0.8), L (0.8, 0.2), M (0, 1), H (0, 1), VH (0, 1),
Bob (39): VL (0, 1), L (0.1, 0.9), M (0.9, 0.1), H (0, 1), VH (0, 1),
Joe (53): VL (0, 1), L (0, 1), M (0.7, 0.3), H (0.3, 0.7), VH (0, 1),
Ted (74): VL (0, 1), L (0, 1), M (0.6, 0.4), H (0.4, 0.6), VH (0, 1).
The symptom’s characteristic Y is “Temperature”, including three linguistic labels: “C=cold”, “M=medium”, “H=hot”. Similarly, the membership (non-membership) functions of the symptom for the linguistic labels of the characteristic are defined using TIFNs as follows.

$$
\mu_{C}(x) = \begin{cases} 1, & x \le 5 \\ (20 - x)/15, & 5 < x \le 20 \\ 0, & x > 20 \end{cases}
\qquad
\gamma_{C}(x) = \begin{cases} 0, & x \le 5 \\ (x - 5)/15, & 5 < x \le 20 \\ 1, & x > 20 \end{cases}
$$
$$
\mu_{M}(x) = \begin{cases} 0, & x \le 5, \; x > 40 \\ (x - 5)/15, & 5 < x \le 20 \\ 1, & 20 < x \le 35 \\ (40 - x)/5, & 35 < x \le 40 \end{cases}
\qquad
\gamma_{M}(x) = \begin{cases} 1, & x \le 5, \; x > 40 \\ (20 - x)/15, & 5 < x \le 20 \\ 0, & 20 < x \le 35 \\ (x - 35)/5, & 35 < x \le 40 \end{cases}
$$
$$
\mu_{H}(x) = \begin{cases} 0, & x \le 35 \\ (x - 35)/5, & 35 < x \le 40 \\ 1, & x > 40 \end{cases}
\qquad
\gamma_{H}(x) = \begin{cases} 1, & x \le 35 \\ (40 - x)/5, & 35 < x \le 40 \\ 0, & x > 40 \end{cases}
$$

The information of the symptom is shown as follows.
4°C: C (1, 0), M (0, 1), H (0, 1),
16°C: C (0.267, 0.733), M (0.733, 0.267), H (0, 1),
39°C: C (0, 1), M (0.2, 0.8), H (0.8, 0.2),
25°C: C (0, 1), M (1, 0), H (0, 1).
The diseases ($D_1$, $D_2$) are “Flu” and “Headache”, where $D_1$ contains four linguistic labels: “L1=Level 1”, “L2=Level 2”, “L3=Level 3” and “L4=Level 4”, and $D_2$ contains six linguistic labels: “L1=Level 1”, “L2=Level 2”, “L3=Level 3”, “L4=Level 4”, “L5=Level 5” and “L6=Level 6”. We would like to verify which ages of users and types of temperature are likely to cause the diseases of flu and headache. This setting is an MMC-IFRS, described in Table 1. In this table, the cells marked with asterisks are the intuitionistic fuzzy values $(\mu_{zD}(D_g), \gamma_{zD}(D_g))$ ($z = 1, \ldots, t_g$, $g = 1, 2$) that need to be predicted.

Table 1
An MMC-IFRS for medical diagnosis, with ∗ being the values to be predicted
Age | Temperature | Flu | Headache
Al (18): VL (.2, .8), L (.8, .2), M (0, 1), H (0, 1), VH (0, 1) | 4°C: C (1, 0), M (0, 1), H (0, 1) | L1 (.8, .1), L2 (.6, .3), L3 (.2, .6), L4 (.1, .9) | L1 (.1, .8), L2 (.2, .7), L3 (.5, .35), L4 (.6, .2), L5 (.4, .5), L6 (.3, .55)
Bob (39): VL (0, 1), L (.1, .9), M (.9, .1), H (0, 1), VH (0, 1) | 39°C: C (0, 1), M (.2, .8), H (.8, .2) | L1 (.4, .5), L2 (.6, .2), L3 (.3, .6), L4 (.1, .9) | L1 (0, .9), L2 (.2, .75), L3 (.4, .55), L4 (.55, .35), L5 (.7, .2), L6 (.6, .3)
Joe (53): VL (0, 1), L (0, 1), M (.7, .3), H (.3, .7), VH (0, 1) | 16°C: C (.267, .733), M (.733, .267), H (0, 1) | L1 (0, 1), L2 (.2, .7), L3 (.4, .5), L4 (1, 0) | L1 (0, .9), L2 (.4, .6), L3 (.4, .45), L4 (.7, .2), L5 (.3, .6), L6 (.1, .85)
Ted (74): VL (0, 1), L (0, 1), M (.6, .4), H (.4, .6), VH (0, 1) | 25°C: C (0, 1), M (1, 0), H (0, 1) | L1 (∗, ∗), L2 (∗, ∗), L3 (∗, ∗), L4 (∗, ∗) | L1 (∗, ∗), L2 (∗, ∗), L3 (∗, ∗), L4 (∗, ∗), L5 (∗, ∗), L6 (∗, ∗)

A compressed form of MMC-IFRS is given in Definition 7.
Definition 7. An intuitionistic fuzzy vector (IFV) in MMC-IFRS is defined as follows.
$$
V = (v_1, v_2, \ldots, v_K),
$$
where $K = K_1 + K_2 + K_3$, $K_1 = \sum_{e=1}^{N} r_e$, $K_2 = \sum_{f=1}^{M} s_f$, $K_3 = \sum_{g=1}^{p} t_g$. The first $K_1$ elements of V are
$$
a_{11}, \ldots, a_{1r_1}, \ldots, a_{e1}, \ldots, a_{er_e}, \ldots, a_{N1}, \ldots, a_{Nr_N},
$$
where $a_{ex}$ is the IFv of the patient for the x-th linguistic label of feature $X_e$ ($x = 1, \ldots, r_e$, $e = 1, \ldots, N$). The next $K_2$ elements of V are
$$
b_{11}, \ldots, b_{1s_1}, \ldots, b_{f1}, \ldots, b_{fs_f}, \ldots, b_{M1}, \ldots, b_{Ms_M},
$$
where $b_{fy}$ is the IFv of the symptom for the y-th linguistic label of characteristic $Y_f$ ($y = 1, \ldots, s_f$, $f = 1, \ldots, M$). The last $K_3$ elements of V are
$$
c_{11}, \ldots, c_{1t_1}, \ldots, c_{g1}, \ldots, c_{gt_g}, \ldots, c_{p1}, \ldots, c_{pt_p},
$$
where $c_{gz}$ is the IFv of the disease $D_g$ for the z-th linguistic label ($z = 1, \ldots, t_g$, $g = 1, \ldots, p$).
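As an illustration of Definition 7, the following Python sketch flattens one row of Table 1 into an IFV. The function name `build_ifv` and the nested-list layout are assumptions made for this example only.

```python
def build_ifv(patient_features, symptom_characteristics, disease_labels):
    """Concatenate all (mu, gamma) pairs into one flat vector.

    Each argument is a list of blocks (one block per feature,
    characteristic or disease); each block is a list of (mu, gamma) pairs.
    The order follows Definition 7: patient, then symptom, then diseases.
    """
    vector = []
    for group in (patient_features, symptom_characteristics, disease_labels):
        for block in group:
            vector.extend(block)
    return vector


# Row of patient Al in Table 1: one feature (Age, 5 labels), one
# characteristic (Temperature, 3 labels), two diseases (4 and 6 labels).
age = [(0.2, 0.8), (0.8, 0.2), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
temperature = [(1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
flu = [(0.8, 0.1), (0.6, 0.3), (0.2, 0.6), (0.1, 0.9)]
headache = [(0.1, 0.8), (0.2, 0.7), (0.5, 0.35), (0.6, 0.2), (0.4, 0.5), (0.3, 0.55)]

ifv_al = build_ifv([age], [temperature], [flu, headache])
```

For this row, K1 = 5, K2 = 3 and K3 = 4 + 6 = 10, so the resulting vector has K = 18 elements.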
3.2. Intuitionistic value similarity measure and intuitionistic vector similarity measure

In the following definition, θ denotes the set of all intuitionistic fuzzy values (IFvs).
Definition 8. (Intuitionistic value similarity measure, IvSM) Let R be the set of all real numbers. A function sim : θ × θ → R is called an intuitionistic value similarity measure (IvSM) if it satisfies the following conditions:
(A1) sim (u, v) = sim (v, u), for all u, v ∈ θ;
(A2) 0 ≤ sim (u, v) ≤ 1, for all u, v ∈ θ;
(A3) sim (u, v) = 1 ⇔ u = v, for all u, v ∈ θ;
(A4) If u ≤ v ≤ w, then sim (u, v) ≥ sim (u, w) and sim (v, w) ≥ sim (u, w), for all u, v, w ∈ θ (u ≤ v means $\mu_u \le \mu_v$ and $\gamma_u \ge \gamma_v$).
Theorem 1. For all u, v ∈ θ, we define:
$$
sim_1(u, v) = 1 - \frac{1}{2}\left( |\mu_u - \mu_v| + |\gamma_u - \gamma_v| \right);
\tag{6}
$$
$$
sim_2(u, v) = \frac{\min\{\mu_u, \mu_v\} + \min\{\gamma_u, \gamma_v\}}{\max\{\mu_u, \mu_v\} + \max\{\gamma_u, \gamma_v\}};
\tag{7}
$$
$$
sim_3(u, v) = \frac{\exp\left(-\frac{1}{2}\left( |\mu_u - \mu_v| + |\gamma_u - \gamma_v| \right)\right) - \exp(-1)}{1 - \exp(-1)};
\tag{8}
$$
$$
sim_4(u, v) = \frac{\exp\left(-\frac{1}{2}\left( \sqrt{|\mu_u - \mu_v|} + \sqrt{|\gamma_u - \gamma_v|} \right)\right) - \exp(-1)}{1 - \exp(-1)}.
\tag{9}
$$
Then, $sim_1$, $sim_2$, $sim_3$ and $sim_4$ are IvSMs. Notice that to avoid the denominator being zero, we set $0/0 = 1$ in the definition of $sim_2$.
Proof. We consider $sim_1$; the remaining cases are proved by analogous calculations.
(A1) and (A3) are straightforward.
(A2) We have $0 \le |\mu_u - \mu_v| + |\gamma_u - \gamma_v| \le 2$. It follows that $0 \le sim_1(u, v) \le 1$.
(A4) We prove $sim_1(u, v) \ge sim_1(u, w)$ under the condition $u \le v \le w$. By the definition of the relation ≤ on IFvs, we get $\mu_u \le \mu_v \le \mu_w$ and $\gamma_u \ge \gamma_v \ge \gamma_w$, which implies
$$
\begin{aligned}
sim_1(u, v) &= 1 - \frac{1}{2}\left( (\mu_v - \mu_u) + (\gamma_u - \gamma_v) \right) \\
&\ge 1 - \frac{1}{2}\left( (\mu_w - \mu_u) + (\gamma_u - \gamma_w) \right) \\
&= 1 - \frac{1}{2}\left( |\mu_w - \mu_u| + |\gamma_w - \gamma_u| \right) \\
&= sim_1(u, w).
\end{aligned}
$$
By a similar argument, we get $sim_1(v, w) \ge sim_1(u, w)$.
In the following definition, Θ denotes the set of all intuitionistic fuzzy vectors (IFVs) of length K in MMC-IFRS.
Definition 9. (Intuitionistic vector similarity measure, IVSM) Let SIM : Θ × Θ → R. SIM is called an intuitionistic vector similarity measure (IVSM) if it satisfies the following conditions:
(B1) SIM (U, V) = SIM (V, U), for all U, V ∈ Θ;
(B2) 0 ≤ SIM (U, V) ≤ 1, for all U, V ∈ Θ;
(B3) SIM (U, V) = 1 ⇔ U = V, for all U, V ∈ Θ;
(B4) If U ≤ V ≤ T, then SIM (U, V) ≥ SIM (U, T) and SIM (V, T) ≥ SIM (U, T), for all U, V, T ∈ Θ (let $U = (u_1, \ldots, u_K)$, $V = (v_1, \ldots, v_K)$; U ≤ V means $u_\ell \le v_\ell$ for all $\ell = 1, \ldots, K$).
Definition 10. Let U, V ∈ Θ, let sim be an IvSM, and let $W = (w_1, \ldots, w_K)$ be a weight vector satisfying $w_\ell \ge 0$ ($\ell = 1, \ldots, K$) and $\sum_{\ell=1}^{K} w_\ell = 1$. We define:
1) The quadric intuitionistic fuzzy similarity degree between U and V:
$$
SIM_Q(U, V) = \left( \sum_{\ell=1}^{K} w_\ell \left( sim(u_\ell, v_\ell) \right)^2 \right)^{\frac{1}{2}}.
\tag{10}
$$
2) The arithmetic intuitionistic fuzzy similarity degree between U and V:
$$
SIM_A(U, V) = \sum_{\ell=1}^{K} w_\ell \, sim(u_\ell, v_\ell).
\tag{11}
$$
3) The geometric intuitionistic fuzzy similarity degree between U and V:
$$
SIM_G(U, V) = \prod_{\ell=1}^{K} \left( sim(u_\ell, v_\ell) \right)^{w_\ell}.
\tag{12}
$$
4) The harmonic intuitionistic fuzzy similarity degree between U and V:
$$
SIM_H(U, V) = \left( \sum_{\ell=1}^{K} \frac{w_\ell}{sim(u_\ell, v_\ell)} \right)^{-1}.
\tag{13}
$$
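The measures of Theorem 1 and Definition 10 can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: all function names are hypothetical, IFvs are plain (mu, gamma) tuples, and `sim4` assumes that Equation (9) uses square roots of the absolute differences. The harmonic degree returns 0 when a positively weighted similarity is 0, which is a convention chosen here to avoid division by zero.

```python
import math


def _distance(u, v, transform=lambda d: d):
    """Half the sum of the (transformed) absolute differences of mu and gamma."""
    return 0.5 * (transform(abs(u[0] - v[0])) + transform(abs(u[1] - v[1])))


def _exp_scaled(d):
    """Map a distance d in [0, 1] to [0, 1] via (e^-d - e^-1) / (1 - e^-1)."""
    return (math.exp(-d) - math.exp(-1.0)) / (1.0 - math.exp(-1.0))


def sim1(u, v):
    return 1.0 - _distance(u, v)


def sim2(u, v):
    den = max(u[0], v[0]) + max(u[1], v[1])
    if den == 0.0:
        return 1.0  # convention 0/0 = 1
    return (min(u[0], v[0]) + min(u[1], v[1])) / den


def sim3(u, v):
    return _exp_scaled(_distance(u, v))


def sim4(u, v):
    # Assumption: square roots of the absolute differences.
    return _exp_scaled(_distance(u, v, math.sqrt))


def _pairs(U, V, w, sim):
    if not (len(U) == len(V) == len(w)):
        raise ValueError("U, V and w must have the same length")
    return [(wl, sim(ul, vl)) for ul, vl, wl in zip(U, V, w)]


def sim_q(U, V, w, sim=sim1):
    return math.sqrt(sum(wl * s * s for wl, s in _pairs(U, V, w, sim)))


def sim_a(U, V, w, sim=sim1):
    return sum(wl * s for wl, s in _pairs(U, V, w, sim))


def sim_g(U, V, w, sim=sim1):
    # math.prod requires Python 3.8 or later.
    return math.prod(s ** wl for wl, s in _pairs(U, V, w, sim))


def sim_h(U, V, w, sim=sim1):
    total = 0.0
    for wl, s in _pairs(U, V, w, sim):
        if wl == 0.0:
            continue
        if s == 0.0:
            return 0.0  # limit value as one similarity tends to zero
        total += wl / s
    return 1.0 / total


# Hypothetical vectors of length K = 3 with weights summing to 1.
U = [(0.2, 0.8), (0.8, 0.2), (0.0, 1.0)]
V = [(0.0, 1.0), (0.1, 0.9), (0.9, 0.1)]
w = [0.5, 0.3, 0.2]
q, a, g, h = (f(U, V, w) for f in (sim_q, sim_a, sim_g, sim_h))
```

On this toy input the four degrees come out in decreasing order from quadric to harmonic, as expected from the classical ordering of weighted power means.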

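Theorem 2. Let U, V ∈ Θ. We have $SIM_Q(U, V) \ge SIM_A(U, V) \ge SIM_G(U, V) \ge SIM_H(U, V)$.
Proof. The proof uses classical inequalities: the Cauchy-Schwarz and the weighted AM-GM inequalities. For example, we consider $SIM_Q(U, V) \ge SIM_A(U, V)$. Using the Cauchy-Schwarz inequality,
$$
\left( \sum_{\ell=1}^{K} x_\ell^2 \right) \left( \sum_{\ell=1}^{K} y_\ell^2 \right) \ge \left( \sum_{\ell=1}^{K} x_\ell y_\ell \right)^2,
$$
for all $(x_1, \ldots, x_K)$, $(y_1, \ldots, y_K) \in \mathbb{R}^K$, we have
$$
\begin{aligned}
\sum_{\ell=1}^{K} w_\ell \left( sim(u_\ell, v_\ell) \right)^2
&= \left( \sum_{\ell=1}^{K} \left( w_\ell^{1/2} \right)^2 \right) \left( \sum_{\ell=1}^{K} \left( w_\ell^{1/2} \, sim(u_\ell, v_\ell) \right)^2 \right) \\
&\ge \left( \sum_{\ell=1}^{K} w_\ell^{1/2} \, w_\ell^{1/2} \, sim(u_\ell, v_\ell) \right)^2 \\
&= \left( \sum_{\ell=1}^{K} w_\ell \, sim(u_\ell, v_\ell) \right)^2.
\end{aligned}
$$
That means $\left( SIM_Q(U, V) \right)^2 \ge \left( SIM_A(U, V) \right)^2$, or $SIM_Q(U, V) \ge SIM_A(U, V)$.
Theorem 3. Assume that $w_\ell > 0$ for all $\ell = 1, \ldots, K$. Then $SIM_Q$, $SIM_A$, $SIM_G$ and $SIM_H$ are IVSMs.
Proof. Obviously, $SIM_Q$, $SIM_A$, $SIM_G$ and $SIM_H$ satisfy (B1).
(B2) Let U, V ∈ Θ. By Theorem 2, it is sufficient to prove that $SIM_Q(U, V) \le 1$. From $sim(u_\ell, v_\ell) \le 1$ for all $\ell = 1, \ldots, K$, we obtain
$$
SIM_Q(U, V) = \left( \sum_{\ell=1}^{K} w_\ell \left( sim(u_\ell, v_\ell) \right)^2 \right)^{\frac{1}{2}} \le \left( \sum_{\ell=1}^{K} w_\ell \right)^{\frac{1}{2}} = 1.
$$
(B3) Let U, V ∈ Θ. Since $w_\ell > 0$ and $sim(u_\ell, v_\ell) \le 1$ for all $\ell$, it is easy to show that
$$
SIM_Q(U, V) = 1 \Leftrightarrow U = V.
$$
By Theorem 2 and (B2), if one of the values $SIM_A(U, V)$, $SIM_G(U, V)$ and $SIM_H(U, V)$ equals 1, then $SIM_Q(U, V)$ equals 1, hence U = V. Conversely, U = V gives $sim(u_\ell, v_\ell) = 1$ for all $\ell$, so all four values equal 1. Thus $SIM_A$, $SIM_G$ and $SIM_H$ also satisfy (B3).
(B4) The condition U ≤ V ≤ T yields that
$$
sim(u_\ell, v_\ell) \ge sim(u_\ell, t_\ell), \quad \forall \ell = 1, \ldots, K.
$$
Thus, $w_\ell \left( sim(u_\ell, v_\ell) \right)^2 \ge w_\ell \left( sim(u_\ell, t_\ell) \right)^2$ for all $\ell = 1, \ldots, K$. Hence
$$
\left( \sum_{\ell=1}^{K} w_\ell \left( sim(u_\ell, v_\ell) \right)^2 \right)^{\frac{1}{2}} \ge \left( \sum_{\ell=1}^{K} w_\ell \left( sim(u_\ell, t_\ell) \right)^2 \right)^{\frac{1}{2}},
$$
or $SIM_Q(U, V) \ge SIM_Q(U, T)$. The remaining parts of the proof are analogous.
Definition 11. Let SIM be an IVSM. The formulas to predict the values of the linguistic labels of the patient $P^*$ for the diseases $D_g$ ($g = 1, \ldots, p$) in MMC-IFRS are:
$$
\mu_{zD}^{P^*}(D_g) = \frac{\sum_{v=1}^{n} SIM(P^*, P_v) \times \mu_{zD}^{P_v}(D_g)}{\sum_{v=1}^{n} SIM(P^*, P_v)},
$$
$$
\gamma_{zD}^{P^*}(D_g) = \frac{\sum_{v=1}^{n} SIM(P^*, P_v) \times \gamma_{zD}^{P_v}(D_g)}{\sum_{v=1}^{n} SIM(P^*, P_v)},
$$
for all $z \in \{1, \ldots, t_g\}$, $g = 1, \ldots, p$.
Theorem 4. For all $z \in \{1, \ldots, t_g\}$, $g = 1, \ldots, p$ and every patient $P^*$, $\left( \mu_{zD}^{P^*}(D_g), \gamma_{zD}^{P^*}(D_g) \right)$ is an IFv.
Proof. It is easily seen that $\mu_{zD}^{P^*}(D_g) \ge 0$ and $\gamma_{zD}^{P^*}(D_g) \ge 0$. Moreover,
$$
\mu_{zD}^{P^*}(D_g) + \gamma_{zD}^{P^*}(D_g) = \frac{\sum_{v=1}^{n} SIM(P^*, P_v) \times \left( \mu_{zD}^{P_v}(D_g) + \gamma_{zD}^{P_v}(D_g) \right)}{\sum_{v=1}^{n} SIM(P^*, P_v)}.
$$
Since $\left( \mu_{zD}^{P_v}(D_g), \gamma_{zD}^{P_v}(D_g) \right)$ is an IFv, we have $\mu_{zD}^{P_v}(D_g) + \gamma_{zD}^{P_v}(D_g) \le 1$. Thus, $\mu_{zD}^{P^*}(D_g) + \gamma_{zD}^{P^*}(D_g) \le 1$.

4. Evaluation
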
4.1. Experimental design
In this part, we describe the experimental environment as follows.
Experimental tools: We have implemented 16 variants of the prediction algorithm for medical diagnosis in the PHP programming language by matching each IvSM function in Equations (6–9) with each IVSM function in Equations (10–13). Notice that the variant combining Equations (9, 11) is exactly the IFSD function of MC-IFRS defined in Equations (3–4) of Definitions 3 and 4, respectively. Thus, IFSD is a special case of the IVSM functions proposed in this work. The variants are denoted A1 to A16, with A1 combining Equations (6, 10), A2 combining Equations (6, 11), ..., and A16 combining Equations (9, 13). A14 is replaced with the IFSD function [9] as explained above. Notice that the hybrid similarity degree [11] described in Definition 5 is just a derivative of IFSD supplemented with information from a picture fuzzy clustering method; for an accurate comparison between the original similarity degrees, it is not considered here. Further hybridization between the IVSM functions and the degree from a picture fuzzy clustering method is considered in another work. These algorithms are executed on a PC with an Intel(R) Core(TM) 2 Duo CPU T6400 @ 2.00GHz and 2GB RAM. The results are taken as the average value of 50 runs.

Table 2
The MAE values of the variants by k-fold cross validation
k | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8
2 | 0.49778 | 0.49593 | 0.48619 | 0.4946 | 0.49781 | 0.49569 | 0.48365 | 0.49275
3 | 0.49649 | 0.4948 | 0.48123 | 0.49524 | 0.49657 | 0.49456 | 0.4761 | 0.49368
4 | 0.49269 | 0.49075 | 0.47834 | 0.49241 | 0.49266 | 0.49046 | 0.47035 | 0.49125
5 | 0.49117 | 0.48895 | 0.47588 | 0.49131 | 0.49115 | 0.48867 | 0.47284 | 0.4898
6 | 0.4918 | 0.4896 | 0.47414 | 0.49297 | 0.49179 | 0.48931 | 0.46134 | 0.49144
7 | 0.49461 | 0.49245 | 0.47895 | 0.49384 | 0.49462 | 0.49212 | 0.47172 | 0.49192
8 | 0.49481 | 0.4923 | 0.47481 | 0.49705 | 0.49484 | 0.49212 | 0.47531 | 0.49599
9 | 0.49092 | 0.48866 | 0.47362 | 0.4899 | 0.49098 | 0.48843 | 0.46213 | 0.48804
10 | 0.49012 | 0.48793 | 0.4726 | 0.49059 | 0.49017 | 0.48776 | 0.46502 | 0.48898

k | A9 | A10 | A11 | A12 | A13 | A14 (IFSD) | A15 | A16 | Average
2 | 0.49779 | 0.49573 | 0.48508 | 0.49318 | 0.49798 | 0.49621 | 0.4867 | 0.4953 | 0.49327
3 | 0.49647 | 0.49462 | 0.48031 | 0.49418 | 0.49662 | 0.49499 | 0.48148 | 0.49578 | 0.49144
4 | 0.49268 | 0.49057 | 0.47735 | 0.49149 | 0.49287 | 0.49098 | 0.47864 | 0.49315 | 0.48792
5 | 0.49115 | 0.48872 | 0.47471 | 0.4903 | 0.49138 | 0.48921 | 0.47614 | 0.49191 | 0.48646
6 | 0.49179 | 0.48938 | 0.47313 | 0.49197 | 0.49204 | 0.48987 | 0.47432 | 0.49382 | 0.48617
7 | 0.4946 | 0.4922 | 0.47783 | 0.49241 | 0.49489 | 0.49278 | 0.47932 | 0.49453 | 0.4893
8 | 0.49482 | 0.49213 | 0.47373 | 0.49629 | 0.49512 | 0.49268 | 0.47519 | 0.49794 | 0.4897
9 | 0.49095 | 0.48849 | 0.47258 | 0.48856 | 0.4912 | 0.48903 | 0.47417 | 0.49093 | 0.48491
10 | 0.49015 | 0.48778 | 0.47154 | 0.48952 | 0.49038 | 0.48826 | 0.47294 | 0.49154 | 0.48471

Evaluation indices: Mean Absolute Error (MAE) and the computational time.
Datasets: The benchmark medical diagnosis dataset HEART from the UCI Machine Learning Repository [12], consisting of 270 patients characterized by 13 attributes. This dataset was also used for the experiments in [9, 11].
Cross validation: The cross-validation method for the experiments is k-fold validation with k from 2 to 10. Besides k-fold validation, random experiments with testing sets of 10 to 100 random elements are also performed. In order to validate the results with accurate classes, the intuitionistic defuzzification method of [1], as in Example 1, is used in the experimental algorithms.
Parameter setting: The weights of the degrees are set as in [9, 11].
Objective: To validate the performance of the IVSM functions in terms of accuracy through the evaluation indices.
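To make the evaluated pipeline more concrete, the prediction rule of Definition 11 and the MAE index can be sketched in Python. The sketch is an assumption-laden illustration: the function names are hypothetical, the similarity values are made up, and MAE is taken in its standard form (mean of absolute differences between predicted and actual values), since the exact quantities compared in the experiments are not spelled out here.

```python
def predict_ifv(similarities, neighbour_ifvs):
    """Similarity-weighted average of the neighbours' (mu, gamma) pairs.

    similarities[v] plays the role of SIM(P*, P_v); neighbour_ifvs[v] is
    the (mu, gamma) pair of patient P_v for one label of one disease.
    """
    total = sum(similarities)
    if total == 0.0:
        raise ValueError("at least one similarity must be positive")
    mu = sum(s * m for s, (m, _) in zip(similarities, neighbour_ifvs)) / total
    gamma = sum(s * g for s, (_, g) in zip(similarities, neighbour_ifvs)) / total
    return mu, gamma


def mean_absolute_error(predicted, actual):
    """Standard MAE between two equally long sequences of numbers."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Label L1 of "Flu" for Al, Bob and Joe (Table 1); similarities are invented.
flu_l1 = [(0.8, 0.1), (0.4, 0.5), (0.0, 1.0)]
sims_to_ted = [0.5, 0.3, 0.2]
mu_pred, gamma_pred = predict_ifv(sims_to_ted, flu_l1)

# Toy MAE on crisp yes/no outcomes (1 = disease present).
mae = mean_absolute_error([1, 0, 1], [1, 1, 0])
```

Because each neighbour pair sums to at most 1, the predicted pair does too, in line with Theorem 4.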
4.2. Assessment
In Tables 2 and 3, we illustrate the MAE values
and the computational time of the variants by k-fold
cross validation respectively. From Table 2, we calculate the average MAE values of variants by the

numbers of folds. This Tab. shows the MAE values
of the A7 variant is the best among all. Besides A7,
other variants such as A3, A11 and A15 should be
used for the best MAE values of the algorithm. It is
clear that a large number of folds do not correspond
to the better MAE value of algorithm. For the sake
of both the computational time and MAE values, the
number of folds should be selected within the range
[8, 10] especially when it is equal to 9, the average
and the best MAE values of all variants are 0.484 and
0.462 respectively, which hold the best trials among
all. In Table 3, the average computational time of all
variants by various numbers of folds are illustrated.
Apparently, the processing time of these algorithms
is from 0.68 to 1.44 seconds (sec). Furthermore,


1606

L.H. Son and P.H. Phong / On the performance evaluation of IVSMs for medical diagnosis
Table 3
The computational time of the variants by k-fold cross validation with the best values being marked as bold (sec)

k

A1

A2

A3


A4

A5

A6

A7

A8

2
3
4
5
6
7
8
9
10

0.62368
1.02375
1.0243
0.96545
0.88094
0.79568
0.72382
0.68045
0.61986


0.53082
0.86802
0.85899
0.81039
0.74145
0.67237
0.61374
0.56868
0.51953

0.6361
1.04342
1.03488
0.97614
0.89037
0.80626
0.7345
0.68627
0.62579

0.64182
1.05719
1.0515
1.18887
0.91322
0.97744
0.75013
0.69806
0.63991


1.11738
1.81597
1.80621
1.70328
1.55427
1.40876
1.2764
1.19186
1.09598

0.99672
1.65204
1.6414
1.55404
1.41456
1.28346
1.16276
1.08773
0.99844

1.04728
1.73033
1.72061
1.71526
1.63634
1.43476
1.48867
1.55723
1.11612


1.11162
1.85574
1.84013
1.73646
1.58472
1.43311
1.2996
1.22066
1.11776

A9
0.74955
1.23906
1.23272
1.16188
1.06067
0.96116
0.87313
0.81649
0.74981

A10

A11

A12

A13


A14 (IFSD)

A15

A16

Average

0.65647
1.08841
1.08387
1.02141
0.93403
0.84421
0.76648
0.7164
0.65793

0.73759
1.21238
1.20848
1.14285
1.04405
0.94368
0.85827
0.80007
0.73414

0.77081
1.26825

1.27078
1.20027
1.08885
0.98992
0.89568
0.83835
0.76777

0.87228
1.44838
1.44172
1.36625
1.24264
1.16437
1.02154
0.96022
0.88082

0.87036
1.38778
1.89513
2.20693
1.39472
1.16454
0.94968
0.90273
0.8214

0.88133
1.46329

1.45699
1.37967
1.25416
1.13861
1.03229
0.96385
0.88472

0.87031
1.45111
1.44352
1.36779
1.24258
1.12948
1.02651
0.95893
0.8783

0.81963
1.35032
1.3757
1.34356
1.17985
1.07174
0.96707
0.9155
0.81927

Table 4
The MAE values of the variants by random experiments

Dataset  A1       A2       A3       A4       A5       A6       A7       A8
10       0.49872  0.4966   0.48595  0.50024  0.49872  0.49635  0.4801   0.49907
20       0.49422  0.49222  0.48074  0.49817  0.49421  0.49202  0.47099  0.49757
30       0.49167  0.49049  0.4841   0.491    0.4917   0.49038  0.47765  0.48979
40       0.49055  0.48833  0.47383  0.49299  0.49057  0.48815  0.4648   0.4921
50       0.49372  0.49261  0.4824   0.49411  0.49365  0.49241  0.48028  0.49356
60       0.49296  0.49106  0.47807  0.49316  0.49295  0.49081  0.46979  0.49182
70       0.49359  0.4916   0.47891  0.49389  0.49362  0.49139  0.47604  0.49268
80       0.49602  0.49409  0.48219  0.49624  0.49606  0.49392  0.47761  0.49488
90       0.49624  0.49446  0.4847   0.49567  0.49626  0.49425  0.48173  0.494

Dataset  A9       A10      A11      A12      A13      A14 (IFSD)  A15      A16      Average
10       0.4987   0.49641  0.48488  0.49953  0.49894  0.49685     0.48616  0.50106  0.49489
20       0.4942   0.49205  0.47994  0.49777  0.49438  0.49238     0.4807   0.4985   0.49063
30       0.49168  0.49038  0.48348  0.49027  0.4919   0.49081     0.48444  0.49165  0.48884
40       0.49056  0.48819  0.47282  0.49242  0.49072  0.48854     0.47398  0.49357  0.48576
50       0.49366  0.49245  0.48193  0.49382  0.49378  0.4927      0.48258  0.49432  0.4905
60       0.49295  0.49089  0.47711  0.4922   0.49321  0.49138     0.47843  0.49402  0.48818
70       0.4936   0.49143  0.47799  0.49303  0.49384  0.49191     0.47919  0.49459  0.48921
80       0.49604  0.49395  0.48128  0.49542  0.4963   0.49444     0.48258  0.49715  0.49176
90       0.49624  0.49427  0.48379  0.49462  0.49646  0.49473     0.48501  0.49632  0.49242

A2 is the best variant in terms of the computational
time.
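The statement about A2 can be checked directly against the values reported in Table 3. The following Python sketch is illustrative only and is not part of the original experiments: it transcribes the table and looks up the fastest variant for each number of folds, as well as the variants with the lowest mean time over all folds. The names `FOLDS`, `TIMES`, `fastest_per_fold` and `rank_by_mean_time` are our own.

```python
# Computational time (sec) transcribed from Table 3.
# Each list holds the values for k = 2, 3, ..., 10 folds.
FOLDS = list(range(2, 11))
TIMES = {
    "A1":  [0.62368, 1.02375, 1.0243, 0.96545, 0.88094, 0.79568, 0.72382, 0.68045, 0.61986],
    "A2":  [0.53082, 0.86802, 0.85899, 0.81039, 0.74145, 0.67237, 0.61374, 0.56868, 0.51953],
    "A3":  [0.6361, 1.04342, 1.03488, 0.97614, 0.89037, 0.80626, 0.7345, 0.68627, 0.62579],
    "A4":  [0.64182, 1.05719, 1.0515, 1.18887, 0.91322, 0.97744, 0.75013, 0.69806, 0.63991],
    "A5":  [1.11738, 1.81597, 1.80621, 1.70328, 1.55427, 1.40876, 1.2764, 1.19186, 1.09598],
    "A6":  [0.99672, 1.65204, 1.6414, 1.55404, 1.41456, 1.28346, 1.16276, 1.08773, 0.99844],
    "A7":  [1.04728, 1.73033, 1.72061, 1.71526, 1.63634, 1.43476, 1.48867, 1.55723, 1.11612],
    "A8":  [1.11162, 1.85574, 1.84013, 1.73646, 1.58472, 1.43311, 1.2996, 1.22066, 1.11776],
    "A9":  [0.74955, 1.23906, 1.23272, 1.16188, 1.06067, 0.96116, 0.87313, 0.81649, 0.74981],
    "A10": [0.65647, 1.08841, 1.08387, 1.02141, 0.93403, 0.84421, 0.76648, 0.7164, 0.65793],
    "A11": [0.73759, 1.21238, 1.20848, 1.14285, 1.04405, 0.94368, 0.85827, 0.80007, 0.73414],
    "A12": [0.77081, 1.26825, 1.27078, 1.20027, 1.08885, 0.98992, 0.89568, 0.83835, 0.76777],
    "A13": [0.87228, 1.44838, 1.44172, 1.36625, 1.24264, 1.16437, 1.02154, 0.96022, 0.88082],
    "A14": [0.87036, 1.38778, 1.89513, 2.20693, 1.39472, 1.16454, 0.94968, 0.90273, 0.8214],  # IFSD
    "A15": [0.88133, 1.46329, 1.45699, 1.37967, 1.25416, 1.13861, 1.03229, 0.96385, 0.88472],
    "A16": [0.87031, 1.45111, 1.44352, 1.36779, 1.24258, 1.12948, 1.02651, 0.95893, 0.8783],
}


def fastest_per_fold(times, folds):
    """Return {k: name of the variant with the smallest time at that k}."""
    return {
        k: min(times, key=lambda name: times[name][i])
        for i, k in enumerate(folds)
    }


def rank_by_mean_time(times):
    """Return variant names sorted by mean time over all folds, fastest first."""
    return sorted(times, key=lambda name: sum(times[name]) / len(times[name]))


fastest = fastest_per_fold(TIMES, FOLDS)
by_mean_time = rank_by_mean_time(TIMES)

print(fastest)            # fastest variant for each k
print(by_mean_time[:5])   # five fastest variants by mean time
```

With the values as transcribed above, A2 is the fastest variant for every k from 2 to 10, and the five variants with the lowest mean time are A2, A1, A3, A10 and A4.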
In order to validate the efficiency of the variants, we
conducted experiments with another cross-validation
method. Tables 4 and 5 show the MAE values
and the computational time of the variants by random experiments, respectively. The remarks about the
superiority of A7 and other variants such as A3, A11 and
A15 still hold. The results clearly show
that the ideal cardinality of the testing set should be
40 or in the range [20, 60].
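The choice of 40 can be read off the "Average" row of Table 4. As a minimal sketch, assuming that row is the right summary to compare cardinalities, the snippet below picks the cardinality with the smallest average MAE. The names `CARDINALITIES`, `AVERAGE_MAE` and `best_cardinality` are our own and do not come from the original experiments.

```python
# "Average" row of Table 4: mean MAE over the 16 variants for each
# cardinality of the testing set used in the random experiments.
CARDINALITIES = [10, 20, 30, 40, 50, 60, 70, 80, 90]
AVERAGE_MAE = [0.49489, 0.49063, 0.48884, 0.48576, 0.4905,
               0.48818, 0.48921, 0.49176, 0.49242]


def best_cardinality(cardinalities, maes):
    """Return the (cardinality, MAE) pair with the smallest MAE."""
    return min(zip(cardinalities, maes), key=lambda pair: pair[1])


best = best_cardinality(CARDINALITIES, AVERAGE_MAE)
print(best)  # (40, 0.48576)
```

A lower MAE means a smaller average absolute gap between predicted and actual values, so the minimum of this row identifies the most accurate setting on average.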

5. Conclusions
In this paper, we concentrated on improving the
accuracy of medical diagnosis in the health care
support system. We have shown that the Intuitionistic
Fuzzy Recommender System (IFRS) and the hybridization
between IFRS and a picture fuzzy clustering method are
efficient tools to achieve the desired goal. Nonetheless,
both of these methods rely on an important assumption in
IFRS, namely that the numbers of intuitionistic linguistic



Table 5
The computational time of the variants by random experiments (sec)

Dataset  A1       A2       A3       A4       A5       A6       A7       A8
10       0.27702  0.22733  0.27488  0.27221  0.46429  0.43112  0.48753  0.46789
20       0.50971  0.42011  0.50733  0.49151  0.83186  0.7901   0.87638  0.86537
30       0.70282  0.57596  0.69197  0.67851  1.14413  1.08339  1.19806  1.18301
40       0.84714  0.70389  0.82985  0.82521  1.39252  1.32589  1.466    1.44347
50       0.9358   0.80834  0.93538  0.94585  1.58242  1.50208  1.66725  1.63351
60       0.96953  0.86197  1.00034  1.00412  1.70361  1.62146  1.78988  1.75908
70       1.0147   0.89381  1.04289  1.04741  1.76746  1.6842   1.85808  1.83507
80       1.03242  0.90212  1.05397  1.05864  1.7912   1.7068   1.89346  1.857
90       1.002    0.88085  1.02964  1.03186  1.74536  1.66037  1.84491  1.80743

Dataset  A9       A10      A11      A12      A13      A14 (IFSD)  A15      A16      Average
10       0.33304  0.28602  0.33092  0.32278  0.37994  0.33401     0.37816  0.37537  0.35266
20       0.61239  0.52829  0.60442  0.59332  0.69138  0.61902     0.69674  0.69527  0.64583
30       0.82699  0.72495  0.81761  0.81626  0.9456   0.84867     0.95819  0.95693  0.88457
40       1.01144  0.88395  0.99718  0.99415  1.16695  1.0418      1.17251  1.16493  1.07922
50       1.1541   1.00826  1.12683  1.1277   1.32588  1.17979     1.33447  1.32474  1.22453
60       1.24604  1.07822  1.21263  1.21261  1.42422  1.26936     1.43551  1.42471  1.31333
70       1.2845   1.12394  1.26343  1.26634  1.48078  1.31853     1.49154  1.48255  1.36595
80       1.30123  1.14231  1.27956  1.28096  1.49851  1.33657     1.51599  1.49864  1.38434
90       1.26806  1.10832  1.24964  1.24248  1.46627  1.30283     1.47189  1.46174  1.34835

labels in the features of patients, in the characteristics
of symptoms and in the diseases are the same, which is
unrealistic in practical applications. Therefore, in this
paper we proposed a new term, the intuitionistic fuzzy
vector (IFV), to replace the existing intuitionistic fuzzy
matrix (IFM) in IFRS. Then, a generalization of the
existing multi-criteria IFRS, called the Modified
multi-criteria IFRS (MMC-IFRS), that takes the IFV into
account was presented. Two new measures, namely the
intuitionistic value similarity measure (IvSM) and the
intuitionistic vector similarity measure (IVSM), were
defined. Some mathematical properties of these new terms
were examined, and several IVSM functions were proposed.
The performances of these IVSM functions for medical
diagnosis were experimentally validated and compared with
the existing similarity degrees of IFRS. Our contributions
include the definition of the new system for medical
diagnosis, some interesting theorems and the performance
evaluation of 16 IVSM functions.
The findings from the research are summarized as follows.
Firstly, the modification did improve the accuracy of the
system. The MAE values of several new IVSM functions are
better than that of IFSD, which is currently used in IFRS
to predict possible diseases for patients. The results in
Tables 2 to 5 demonstrate that the values of IFSD in the
variant A14 are worse than some values of other variants.
Secondly, the best variants in terms of accuracy are A7,
A3, A11 and A15. Thirdly, the best variants in terms of
computational time are A1, A2, A3, A4 and A10. Fourthly,
the ideal number of folds for validation should be selected
within the range [8, 9], especially 9. Fifthly, the ideal
cardinality of the testing set should be 40 or in the range
[20, 60]. Lastly, all variants are stable across various
cross-validation methods. These findings would help
researchers choose appropriate algorithms and variants for
specific purposes in the health care support system.
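The accuracy ranking above can be reproduced from the MAE values in Table 4. The sketch below is illustrative and rests on one assumption of ours: variants are compared by their mean MAE over the nine testing-set cardinalities, which is one reasonable way to summarize the table but is not stated in the experiments. The names `MAE`, `mean_mae`, `accuracy_ranking` and `better_than_ifsd` are our own.

```python
# MAE values transcribed from Table 4 (random experiments).
# Each list holds the values for testing-set cardinalities 10, 20, ..., 90.
MAE = {
    "A1":  [0.49872, 0.49422, 0.49167, 0.49055, 0.49372, 0.49296, 0.49359, 0.49602, 0.49624],
    "A2":  [0.4966, 0.49222, 0.49049, 0.48833, 0.49261, 0.49106, 0.4916, 0.49409, 0.49446],
    "A3":  [0.48595, 0.48074, 0.4841, 0.47383, 0.4824, 0.47807, 0.47891, 0.48219, 0.4847],
    "A4":  [0.50024, 0.49817, 0.491, 0.49299, 0.49411, 0.49316, 0.49389, 0.49624, 0.49567],
    "A5":  [0.49872, 0.49421, 0.4917, 0.49057, 0.49365, 0.49295, 0.49362, 0.49606, 0.49626],
    "A6":  [0.49635, 0.49202, 0.49038, 0.48815, 0.49241, 0.49081, 0.49139, 0.49392, 0.49425],
    "A7":  [0.4801, 0.47099, 0.47765, 0.4648, 0.48028, 0.46979, 0.47604, 0.47761, 0.48173],
    "A8":  [0.49907, 0.49757, 0.48979, 0.4921, 0.49356, 0.49182, 0.49268, 0.49488, 0.494],
    "A9":  [0.4987, 0.4942, 0.49168, 0.49056, 0.49366, 0.49295, 0.4936, 0.49604, 0.49624],
    "A10": [0.49641, 0.49205, 0.49038, 0.48819, 0.49245, 0.49089, 0.49143, 0.49395, 0.49427],
    "A11": [0.48488, 0.47994, 0.48348, 0.47282, 0.48193, 0.47711, 0.47799, 0.48128, 0.48379],
    "A12": [0.49953, 0.49777, 0.49027, 0.49242, 0.49382, 0.4922, 0.49303, 0.49542, 0.49462],
    "A13": [0.49894, 0.49438, 0.4919, 0.49072, 0.49378, 0.49321, 0.49384, 0.4963, 0.49646],
    "A14": [0.49685, 0.49238, 0.49081, 0.48854, 0.4927, 0.49138, 0.49191, 0.49444, 0.49473],  # IFSD
    "A15": [0.48616, 0.4807, 0.48444, 0.47398, 0.48258, 0.47843, 0.47919, 0.48258, 0.48501],
    "A16": [0.50106, 0.4985, 0.49165, 0.49357, 0.49432, 0.49402, 0.49459, 0.49715, 0.49632],
}

# Mean MAE per variant over all testing-set cardinalities.
mean_mae = {name: sum(values) / len(values) for name, values in MAE.items()}

# Variants ordered from most accurate (lowest mean MAE) to least accurate.
accuracy_ranking = sorted(mean_mae, key=mean_mae.get)

# Variants whose mean MAE is strictly lower than that of IFSD (A14).
better_than_ifsd = [name for name in accuracy_ranking
                    if mean_mae[name] < mean_mae["A14"]]

print(accuracy_ranking[:4])
print(better_than_ifsd)
```

Under this summary, the four most accurate variants are A7, A3, A11 and A15, with A7 first, and each of them has a lower mean MAE than IFSD (A14).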
Further work on this research could proceed in the
following directions. Firstly, a variation of the IVSM
function based algorithm that tackles the deficiency in
processing small-sized real datasets should be studied.
Secondly, a hybrid algorithm with a picture fuzzy
clustering method could be considered to enhance the
accuracy. Thirdly, more theoretical analyses of the
MMC-IFRS, especially its adaptation to other fuzzy
operators such as t-norm and t-conorm, could be carried
out. Lastly, the variants in this paper could be applied
to other problems, e.g. time series forecasting and
nowcasting. These future works will enrich the knowledge
of deploying advanced fuzzy recommender systems for
practical problems.

References
[1] G. Albeanu and F.L. Popentiu-Vladicescu, Intuitionistic
fuzzy methods in software reliability modelling, Journal of
Sustainable Energy 1(1) (2010), 30–34.



[2] R. Basu, U. Fevrier-Thomas and K. Sartipi, Incorporating
hybrid CDSS in primary care practice management, McMaster eBusiness Research Centre (MeRC), DeGroote School of
Business, McMaster University, Canada, 2011.
[3] D.A. Davis, N.V. Chawla, N. Blumm, N. Christakis and A.L.
Barabási, Predicting individual disease risk based on medical history, in: Proceedings of the 17th ACM Conference on
Information and Knowledge Management, New York, NY,
USA, 2008, pp. 769–778.
[4] S.K. De, R. Biswas and A.R. Roy, An application of intuitionistic fuzzy sets in medical diagnosis, Fuzzy Sets and Systems
117(2) (2001), 209–213.
[5] S. Hassan and Z. Syed, From netflix to heart attacks: Collaborative filtering in medical datasets, in: Proceedings of the
1st ACM International Health Informatics Symposium, New
York, NY, USA, 2010, pp. 128–134.
[6] E.C. Kyriacou, C.S. Pattichis and M.S. Pattichis, An
overview of recent health care support systems for eEmergency and mHealth applications, in: Proceedings of the
Annual International Conference of the IEEE Engineering
in Medicine and Biology Society, Minneapolis, Minnesota,
USA, 2009, pp. 1246–1249.

[7] A.E. Samuel and M. Balamurugan, Fuzzy max-min composition technique in medical diagnosis, Applied Mathematical
Sciences 6(35) (2012), 1741–1746.
[8] L.H. Son, DPFCM: A novel distributed picture fuzzy clustering method on picture fuzzy sets, Expert Systems with
Applications 42(1) (2015), 51–66.
[9] L.H. Son and N.T. Thong, Intuitionistic fuzzy recommender
systems: An effective tool for medical diagnosis, Knowledge-Based Systems 74 (2015), 133–150.
[10] E. Szmidt and J. Kacprzyk, A similarity measure for
intuitionistic fuzzy sets and its application in supporting
medical diagnostic reasoning, in: Artificial Intelligence and
Soft Computing-ICAISC, Springer Berlin Heidelberg, 2004,
pp. 388–393.
[11] N.T. Thong and L.H. Son, HIFCF: An effective hybrid model
between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis, Expert Systems
with Applications 42(7) (2015), 3682–3701.
[12] University of California (2007). UCI Repository of
Machine Learning Databases. Available at: http://archive.ics.uci.edu/ml/


