
DSpace at VNU: HIFCF: An effective hybrid model between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis


Expert Systems with Applications 42 (2015) 3682–3701

Contents lists available at ScienceDirect

Expert Systems with Applications
journal homepage: www.elsevier.com/locate/eswa

HIFCF: An effective hybrid model between picture fuzzy clustering
and intuitionistic fuzzy recommender systems for medical diagnosis
Nguyen Tho Thong, Le Hoang Son ⇑
VNU University of Science, Vietnam National University, Hanoi, Viet Nam

Article info

Article history:
Available online 31 December 2014
Keywords:
Fuzzy sets
Hybrid Intuitionistic Fuzzy Collaborative
Filtering
Intuitionistic fuzzy recommender systems
Medical diagnosis
Picture fuzzy clustering

Abstract
The health care support system is a special type of recommender system that plays an important role in medical sciences nowadays. Such systems often provide a medical diagnosis function that uses the historic clinical symptoms of patients to produce a list of possible diseases accompanied by membership values. The most likely disease in that list is then determined from clinicians' experience, expressed through a specific defuzzification method. An important issue in the health care support system is increasing the accuracy of the medical diagnosis function, which involves the cooperation of fuzzy systems and recommender systems in the sense that uncertain behaviors of symptoms and the clinicians' experience are represented by fuzzy memberships whilst the determination of the possible diseases is conducted by the prediction capability of recommender systems. Intuitionistic fuzzy recommender systems (IFRS) are such a combination, which results in better accuracy of prediction than the relevant methods constructed on either traditional fuzzy sets or recommender systems only. Based upon the observation that the calculation of similarity in IFRS could be enhanced by integrating the possibility of patients belonging to clusters specified by a fuzzy clustering method, in this paper we propose a novel hybrid model between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis, called HIFCF (Hybrid Intuitionistic Fuzzy Collaborative Filtering). Experimental results reveal that HIFCF obtains better accuracy than IFCF and the standalone methods of intuitionistic fuzzy sets such as De, Biswas & Roy, Szmidt & Kacprzyk, Samuel & Balamurugan and recommender systems, e.g. Davis et al. and Hassan & Syed. The significance and impact of the new method contribute not only to the theoretical aspects of recommender systems but also to their applicable roles in health care support systems.
© 2014 Elsevier Ltd. All rights reserved.

1. Introduction
In recent years, the health care support system, or clinical decision support system, has emerged as an important tool in medical sciences to assist clinicians in decision making, especially in medical diagnosis, which specifies which diseases could be found from a list of measured symptoms of a patient as well as the most likely disease among them. Physicians, nurses and other healthcare professionals use the health care support system to prepare a diagnosis and to review the diagnosis as a means of improving the final
result. According to Basu, Fevrier-Thomas, and Sartipi (2011),
Foster, McGregor, and El-Masri (2005), and Kyriacou, Pattichis,
and Pattichis (2009), the health care support system can be defined
⇑ Corresponding author at: 334 Nguyen Trai, Thanh Xuan, Hanoi, Viet Nam. Tel.: +84 904 171 284.
E-mail addresses: (N.T. Thong), sonlh@vnu.edu.vn, (L.H. Son).
0957-4174/© 2014 Elsevier Ltd. All rights reserved.

as computer applications that support and assist clinicians in
improved decision-making by providing evidence-based knowledge with respect to patient data. This type of computer-based system consists of three components: a language system, a knowledge
system and a problem processing system. It is able to handle complex problems, applying domain-specific expertise to assess the
consequences of executing its recommendations. There are two main types of health care support system (Rouse, 2014). The first one uses a knowledge base, applies rules to patient data using an inference engine and displays the results to the end user. Systems without a knowledge base, on the other hand, rely on machine learning to analyze clinical data (Fig. 1). Machine learning methods examine patients' medical history in conjunction with relevant clinical research and are able to predict potential events ranging from drug interactions to disease symptoms. In the medical diagnosis process, characteristics of an individual patient are matched to a computerized clinical


N.T. Thong, L.H. Son / Expert Systems with Applications 42 (2015) 3682–3701

knowledge base, and patient-specific assessments and recommendations are then presented to the clinician or the patient for a decision (Rajalakshmi, Mohan, & Babu, 2011).
An important issue in the health care support system is increasing the accuracy of the medical diagnosis. Previous research concentrated on improving the machine learning methods and knowledge systems that appear in Phase 2 of the medical diagnosis process in Fig. 1. A brief summary follows:
• A hybrid evolutionary algorithm between genetic programming and genetic algorithms (Tan, Yu, Heng, & Lee, 2003).
• Genetic algorithm (Anbarasi, Anupriya, & Iyengar, 2010).
• The combination of a type-2 fuzzy logic with genetic algorithm (Hosseini, Ellis, Mazinani, & Dehmeshki, 2011).
• An evolutionary artificial neural network approach based on the Pareto differential evolution algorithm augmented with local search (Abbass, 2002).
• Neuro-fuzzy inference system – CANFIS (Parthiban & Subramanian, 2008).
• Complex modular neural network (Kala, Janghel, Tiwari, & Shukla, 2011).
• Bayesian networks (Gevaert, De Smet, Timmerman, Moreau, & De Moor, 2006; Roberts, Kahn, & Haddawy, 1995).
• Hierarchical Association Rule Model – HARM (McCormick, Rudin, & Madigan, 2011).
• C4.5 Rule-PANE, which combines an artificial neural network ensemble with rule induction (Zhou & Jiang, 2003).
• Support vector machines (Kampouraki, Vassis, Belsis, & Skourlas, 2013).
However, these methods often fail to achieve high prediction accuracy on real medical diagnosis datasets. This is because the relations between the patients and the symptoms and between the symptoms and the diseases (Fig. 1) are often vague, imprecise and uncertain. For instance, doctors could be faced with patients who are likely to have personal problems and/or mental disorders, so that crucial signs and symptoms are missing, incomplete or vague even though the patients' medical histories and physical examinations are available for the diagnosis. Even if patient information is clearly provided, giving an accurate evaluation of the given symptoms/diseases is another challenge requiring well-trained, highly experienced physicians. This evidence raises the need for fuzzy sets or their extensions to model the problem and to assist the techniques that improve the accuracy of diagnosis. The definition of a fuzzy set is stated below.

Fig. 1. The medical diagnosis process of the health care support system. (Flow diagram with four phases, Phase 1 to Phase 4, leading from Clinical Data (Patients-Symptoms) through Knowledge System / Machine Learning and Consequent Rules to Results (Patients-Diseases).)


Definition 1. A Fuzzy Set (FS) (Zadeh, 1965) in a non-empty set $X$ is a function

\[ \mu : X \to [0, 1], \quad x \mapsto \mu(x), \tag{1} \]

where $\mu(x)$ is the membership degree of each element $x \in X$. A fuzzy set can be alternately defined as,

\[ A = \{ \langle x, \mu(x) \rangle \mid x \in X \}. \tag{2} \]

An extension of FS that is widely applied to the medical prognosis
problem is Intuitionistic Fuzzy Set (IFS), which is defined as follows.

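Definition 2. An Intuitionistic Fuzzy Set (IFS) (Atanassov, 1986) in a non-empty set $X$ is,

\[ \tilde{A} = \{ \langle x, \mu_{\tilde{A}}(x), \gamma_{\tilde{A}}(x) \rangle \mid x \in X \}, \tag{3} \]

where $\mu_{\tilde{A}}(x)$ and $\gamma_{\tilde{A}}(x)$ are the membership and non-membership degrees of each element $x \in X$, respectively, satisfying

\[ \mu_{\tilde{A}}(x), \gamma_{\tilde{A}}(x) \in [0, 1], \quad \forall x \in X, \tag{4} \]

\[ 0 \le \mu_{\tilde{A}}(x) + \gamma_{\tilde{A}}(x) \le 1, \quad \forall x \in X. \tag{5} \]

The intuitionistic fuzzy index of an element, showing the non-determinacy, is denoted as,

\[ \pi_{\tilde{A}}(x) = 1 - \left( \mu_{\tilde{A}}(x) + \gamma_{\tilde{A}}(x) \right), \quad \forall x \in X. \tag{6} \]

When $\pi_{\tilde{A}}(x) = 0$ for all $x \in X$, the IFS reduces to the FS of Zadeh.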
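Constraints (4) and (5) and the index in Eq. (6) can be illustrated with a minimal Python sketch. The class name `IFV` and its field names are hypothetical choices made for this illustration; they do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IFV:
    """One intuitionistic fuzzy value (hypothetical helper, not from the paper)."""
    mu: float      # membership degree
    gamma: float   # non-membership degree

    def __post_init__(self):
        # Constraint (4): both degrees lie in [0, 1].
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.gamma <= 1.0):
            raise ValueError("mu and gamma must lie in [0, 1]")
        # Constraint (5): the degrees sum to at most 1 (small float tolerance).
        if self.mu + self.gamma > 1.0 + 1e-12:
            raise ValueError("mu + gamma must not exceed 1")

    @property
    def pi(self) -> float:
        # Eq. (6): intuitionistic fuzzy index (degree of non-determinacy).
        return 1.0 - (self.mu + self.gamma)
```

For instance, `IFV(0.8, 0.1).pi` is the part of the unit interval left undecided by the two degrees, and a value with `mu + gamma == 1` behaves like an ordinary fuzzy membership.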
Various studies utilizing FS and IFS for the medical diagnosis process can be found in the literature. De, Biswas, and Roy (2001) extended Sanchez's approach with the notion of intuitionistic fuzzy set theory for medical diagnosis. The information on symptoms – patients and symptoms – diseases is fuzzified by intuitionistic fuzzy memberships, and the possibilities of acquired diseases are calculated based on those membership values and intuitionistic fuzzy relations. Szmidt and Kacprzyk (2001), Szmidt and Kacprzyk (2003), Szmidt and Kacprzyk (2004) used the concept of intuitionistic fuzzy sets to express new aspects of imperfect information between the sets of symptoms and diagnoses and defined a new similarity measure between intuitionistic fuzzy sets for applications in medical diagnostic reasoning. Khatibi and Montazer (2009) employed five similarity measures of fuzzy sets and intuitionistic fuzzy sets to handle uncertainty in medical pattern recognition. The experimental results showed that both fuzzy sets and intuitionistic fuzzy sets have powerful capabilities to cope with the uncertainty in medical pattern recognition problems, but intuitionistic fuzzy sets, especially with the measures of Hausdorff and Mitchell, yield a better detection rate as a result of more accurate modeling, at the price of more computational cost. Own (2009) studied the switching relation between

type-2 fuzzy sets and intuitionistic fuzzy sets to deal with the
vagueness and insufficient information. Moein, Monadjemi, and
Moallem (2009) offered a hybrid fuzzy-neural automatic system
for medical diagnosis without concerning about how to calculate
the best membership function for each fuzzy data. Neog and Sut
(2011) introduced a matrix representation of fuzzy soft set and
extended Sanchez’s approach for medical diagnosis using the
notion of fuzzy soft complement. Xiao et al. (2012) proposed the
concept of D–S generalized fuzzy soft sets by combining Dempster–Shafer theory of evidence and generalized fuzzy soft sets. A
new method of evaluation based on D–S generalized fuzzy soft sets
was presented and applied to the medical diagnosis. Agarwal,
Hanmandlu, and Biswas (2011) introduced a generalized intuitionistic fuzzy soft set and a new scoring function to compare two



intuitionistic fuzzy numbers for multi-criteria medical diagnosis.
Meenakshi and Kaliraja (2011) presented a method that extends
Sanchez’s approach for medical diagnosis through the arithmetic
mean of an interval valued fuzzy matrix, which is a simpler technique than that of using intuitionistic fuzzy sets. Ahn, Han, Oh,
and Lee (2011) developed an interview chart with interval fuzzy
degrees based on the relation between symptoms and diseases
(three types of headache), and utilized the interval-valued intuitionistic fuzzy weighted arithmetic average operator to aggregate
fuzzy information from the symptoms. A measure based on distance between interval-valued intuitionistic fuzzy sets for medical
diagnosis was also presented. Samuel and Balamurugan (2012) proposed a new technique named intuitionistic fuzzy max–min composition to study Sanchez's approach for medical diagnosis. Shinoj and John (2012) introduced a new concept, namely intuitionistic fuzzy multisets, which combine the intuitionistic fuzzy sets and the fuzzy multisets of Yager. Intuitionistic fuzzy multisets are characterized by the count membership and the count non-membership functions, and when the sum of these functions is equal to one, intuitionistic fuzzy multisets reduce to intuitionistic fuzzy sets. Intuitionistic fuzzy multisets are used to model the symptoms at various timestamps. Other recent works
could be found in Ahn (2014), Bora, Bora, Neog, and Sut (2014),
Bourgani, Stylios, Manis, and Georgopoulos (2014), Das and Kar
(2014), Muthuvijayalakshmi, Kumar, and Venkatesan (2014),
Nguyen, Khosravi, Creighton, and Nahavandi (2014), Sanz, Galar,
Jurio, Brugos, Pagola, et al. (2014), Shanmugasundaram and
Seshaiah (2014), Sharaf-El-Deen, Moawad, and Khalifa (2014).
The limitations of the relevant research utilizing FS and IFS for the medical diagnosis process are as follows. Firstly, these works calculate the relation between the patients and the diseases solely from the relations between the patients – the symptoms and the symptoms – the diseases. In practical cases where the relation between the patients – the symptoms or the symptoms – the diseases is missing, those works cannot be applied. This happens in reality since clinicians may not accurately express the membership and non-membership degrees of symptoms to diseases or vice versa. Secondly, the information on previous diagnoses of patients cannot be utilized. That is to say, even when a patient already has some records in the patients-diseases database, the calculation of the next records of this patient is made solely on the basis of the relations between the patients – the symptoms and the symptoms – the diseases. Historic diagnoses of patients are not taken into account, so the accuracy of diagnosis may suffer as a result. Thirdly, the determination of the most likely disease depends on the defuzzification method. For instance, De et al. (2001) used the hybrid function of membership and non-membership values for the defuzzification, Samuel and Balamurugan (2012) relied on the reduction matrix from $W_{PD}$, and Szmidt and Kacprzyk (2001), Szmidt and Kacprzyk (2003), Szmidt and Kacprzyk (2004), Khatibi and Montazer (2009) and Shinoj and John (2012) employed distance functions. A determination that is independent of the defuzzification method should be investigated for the stable performance of the algorithm.
For these reasons, a combination of fuzzy sets and a machine learning method is a good choice to eliminate the disadvantages of the relevant works using FS and IFS. Recommender Systems – RS (Ricci, Rokach, & Shapira, 2011) are such a machine learning method: they give users a predicted "rating" or "preference" for an item, thus helping them to choose the appropriate item among numerous possibilities. This kind of expert system is now popular in numerous application fields such as personalized systems for books, documents, images, movies, music, shopping and TV programs. Recommender Systems have been applied to medical diagnosis. Davis, Chawla, Blumm, Christakis, and Barabási (2008) proposed CARE, a Collaborative Assessment and Recommendation Engine, which relies only on a patient's medical history in order to predict future disease risks and combines collaborative filtering methods with clustering to predict each patient's greatest disease risks based on their own medical history and that of similar patients. An iterative version of CARE, called ICARE, that incorporates ensemble concepts for improved performance was also introduced. These systems required no specialized information and provided predictions for medical conditions of all kinds in a single run. Hassan and Syed (2010) employed a collaborative filtering framework, expressed in Eq. (7), that assessed patient risk both by matching new cases to historical records and by matching patient demographics to adverse outcomes, so that it could achieve a higher predictive accuracy for both sudden cardiac death and recurrent myocardial infarction than popular classification approaches such as logistic regression and support vector machines.
\[ R(a, i^{*}) = \bar{r}_a + \frac{\sum_{b \in U \setminus \{a\}} SIM(a, b) \times (r_{b,i^{*}} - \bar{r}_b)}{\sum_{b \in U \setminus \{a\}} |SIM(a, b)|}, \tag{7} \]

where $a, b$ are patients and $i^{*}$ is the considered disease. The similarity between two patients, $SIM(a, b)$, is calculated by the Pearson coefficient from the demographic information of patients. $R(a, i^{*})$ and $r_{b,i^{*}}$ are the possibilities of acquiring disease $i^{*}$ of patients $a$ and $b$, respectively. $\bar{r}_a$ and $\bar{r}_b$ are the average possibilities of acquiring all diseases of patients $a$ and $b$, respectively. More works on the applications of RS to medical diagnosis can be found in Duan, Street, and Xu (2011), Meisamshabanpoor and Mahdavi (2012), West and Marion (2014) and our previous works in Cuong, Son, and Chau (2010), Son, Cuong, Lanzi, and Thong (2012), Son, Lanzi, Cuong, and Hung (2012), Son, Cuong, and Long (2013), Son, Linh, and Long (2014), Thong and Son (2014), Son (2014a), Son (2014b), Son (2014c), Son (2015) and Son and Thong (2015).
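As an illustration of Eq. (7), the following Python sketch computes the predicted possibility for one patient and one disease. It is a sketch under stated assumptions, not the implementation of Hassan and Syed (2010): the function name `predict_possibility`, the nested-dictionary layout of `ratings`, and the convention of skipping patients with no recorded value for the disease are all choices made here. The similarity is passed in as a callable, whereas the paper computes it as a Pearson coefficient on demographic data.

```python
from statistics import mean
from typing import Callable, Dict

Ratings = Dict[str, Dict[str, float]]  # patient -> disease -> possibility


def predict_possibility(a: str, disease: str, ratings: Ratings,
                        sim: Callable[[str, str], float]) -> float:
    """Eq. (7): mean-centred, similarity-weighted prediction for patient a."""
    r_a = mean(ratings[a].values())          # average possibility of patient a
    num = 0.0
    den = 0.0
    for b, records in ratings.items():
        # Assumption: only other patients with a recorded value contribute.
        if b == a or disease not in records:
            continue
        s = sim(a, b)
        num += s * (records[disease] - mean(records.values()))
        den += abs(s)
    # Fall back to the patient's own average when no neighbour is usable.
    return r_a if den == 0.0 else r_a + num / den
```

The mean-centring term $(r_{b,i^{*}} - \bar{r}_b)$ removes each neighbour's overall level, so a neighbour only shifts the prediction by how far the disease deviates from that neighbour's own average.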

The standalone RS methods such as the works of Davis et al. (2008), Hassan and Syed (2010), Duan et al. (2011), Meisamshabanpoor and Mahdavi (2012) and West and Marion (2014) are effective only with crisp datasets, not fuzzy ones. Moreover, they work only if the historic diagnoses of patients are provided for the prediction, and their accuracy of diagnosis depends on the defuzzification method. Therefore, a cooperation of fuzzy systems and recommender systems is regarded as an effective strategy to overcome the drawbacks of the approaches that use FS and IFS only, in the sense that uncertain behaviors of symptoms and the clinicians' experience are represented by fuzzy memberships whilst the determination of the possible diseases is conducted by the prediction capability of recommender systems. Intuitionistic fuzzy recommender systems – IFRS (Son & Thong, 2015) are such a combination, which results in better accuracy of prediction than the relevant standalone methods constructed on either traditional fuzzy sets or recommender systems only. That work was the first effort to initiate fuzzy-based recommender systems for the health care support system. It proposed new definitions of single-criterion IFRS (SC-IFRS) and multi-criteria IFRS (MC-IFRS) that extend the definition of RS by taking into account a feature of a user and a characteristic of an item expressed by intuitionistic linguistic labels. Next, it presented new definitions of the intuitionistic fuzzy matrix (IFM), which is a representation of SC-IFRS and MC-IFRS in matrix format, and of the intuitionistic fuzzy composition matrix (IFCM) of two IFMs with the intersection/union operation, and used them to design new similarity degrees of IFMs such as the intuitionistic fuzzy similarity matrix (IFSM) and the intuitionistic fuzzy similarity degree (IFSD). From these similarity functions, a novel method called Intuitionistic Fuzzy Collaborative Filtering (IFCF) was presented for the medical diagnosis problem.



IFCF has been validated on benchmark medical diagnosis datasets from the UCI Machine Learning Repository in terms of the accuracy of diagnosis and showed better performance than the standalone methods of FS and RS.
The motivation and contributions of this paper are as follows. IFCF used IFSD to calculate the similarity between two patients. This measure is the generalization of the hard user-based, item-based and rating-based similarity degrees in RS (Ricci et al., 2011). Nonetheless, IFSD could be enhanced by integrating the possibility of patients belonging to clusters specified by a fuzzy clustering method. That is to say, if we know which group a new patient belongs to, then the similarities of this patient with others in the group should be given a high influence in the calculation of IFSD. Therefore, in this paper we propose a novel hybrid model between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis, called Hybrid Intuitionistic Fuzzy Collaborative Filtering (HIFCF). HIFCF makes use of a recent picture fuzzy clustering method, namely the Distributed Picture Fuzzy Clustering Method – DPFCM (Son, 2015), to classify the patients into groups according to the relations information of patients. Then, the possibility of a patient belonging to a certain cluster is used to calculate the similarity degrees between users. These are added to IFSD to give the final similarity between patients. The new hybrid algorithm HIFCF will be validated experimentally on a benchmark UCI Machine Learning Repository dataset and compared with the relevant methods in terms of accuracy. The rest of the paper is organized as follows. Section 2 presents the new algorithm HIFCF. Section 3 validates the proposed model by experiments. Section 4 gives the conclusions and future works of the paper.

2. The proposed method
In this section, we firstly recall some principal terms and algorithms of Intuitionistic fuzzy recommender system – IFRS (Son & Thong, 2015) especially the Intuitionistic Fuzzy Collaborative Filtering – IFCF algorithm in Section 2.1. Secondly, we recall one of the best recently-published picture fuzzy clustering methods namely Distributed Picture Fuzzy Clustering Method – DPFCM (Son, 2015) used to classify the patients into some groups according to their relations information in Section 2.2. Thirdly, the main contribution of the paper regarding a novel hybrid model between DPFCM and IFRS for medical diagnosis so-called Hybrid Intuitionistic Fuzzy Collaborative Filtering (HIFCF) is presented in Section 2.3. Lastly, some theoretical analyses of the new algorithm are made in Section 2.4.

2.1. Intuitionistic fuzzy recommender system
Firstly, the definition of medical diagnosis under the light of intuitionistic fuzzy sets is described as follows.

Definition 3 (Medical diagnosis (Son & Thong, 2015)). Given three lists: $P = \{P_1, \ldots, P_n\}$, $S = \{S_1, \ldots, S_m\}$ and $D = \{D_1, \ldots, D_k\}$ where $P$ is a list of patients, $S$ a list of symptoms and $D$ a list of diseases, respectively. Three values $n, m, k \in N^{+}$ are the numbers of patients, symptoms and diseases, respectively. The relation between the patients and the symptoms is characterized by the set $R_{PS} = \{R_{PS}(P_i, S_j) \mid \forall i = 1, \ldots, n; \forall j = 1, \ldots, m\}$ where $R_{PS}(P_i, S_j)$ shows the level that patient $P_i$ acquires symptom $S_j$ and is represented by either a numeric value or a (intuitionistic) fuzzy value depending on the domain of the problem. Analogously, the relation between the symptoms and the diseases is expressed as $R_{SD} = \{R_{SD}(S_i, D_j) \mid \forall i = 1, \ldots, m; \forall j = 1, \ldots, k\}$ where $R_{SD}(S_i, D_j)$ reflects the possibility that symptom $S_i$ would lead to disease $D_j$. The medical diagnosis problem aims to determine the relation between the patients and the diseases described by the set $R_{PD} = \{R_{PD}(P_i, D_j) \mid \forall i = 1, \ldots, n; \forall j = 1, \ldots, k\}$ where $R_{PD}(P_i, D_j)$ is either 0 or 1 showing that patient $P_i$ acquires disease $D_j$ or not. The medical diagnosis problem can be shortly represented by the implication $\{R_{PS}, R_{SD}\} \to R_{PD}$.

Definition 4 (Single-criterion intuitionistic fuzzy recommender systems – SC-IFRS (Son & Thong, 2015)). The utility function $R$ is a mapping specified on $(X, Y)$ as follows.

\[
\begin{aligned}
& R : X \times Y \to D, \\
& \left\langle \begin{matrix} (\mu_{1X}(x), \gamma_{1X}(x)), \\ (\mu_{2X}(x), \gamma_{2X}(x)), \\ \vdots \\ (\mu_{sX}(x), \gamma_{sX}(x)) \end{matrix} \right\rangle
\times
\left\langle \begin{matrix} (\mu_{1Y}(y), \gamma_{1Y}(y)), \\ (\mu_{2Y}(y), \gamma_{2Y}(y)), \\ \vdots \\ (\mu_{sY}(y), \gamma_{sY}(y)) \end{matrix} \right\rangle
\to
\left\langle \begin{matrix} (\mu_{1D}(D), \gamma_{1D}(D)), \\ (\mu_{2D}(D), \gamma_{2D}(D)), \\ \vdots \\ (\mu_{sD}(D), \gamma_{sD}(D)) \end{matrix} \right\rangle
\end{aligned}
\tag{8}
\]

where $\mu_{iX}(x) \in [0, 1]$ (resp. $\gamma_{iX}(x) \in [0, 1]$), $\forall i \in \{1, \ldots, s\}$ is the membership (resp. non-membership) value of the patient to the linguistic label $i$th of feature $X$. $\mu_{jY}(y) \in [0, 1]$ (resp. $\gamma_{jY}(y) \in [0, 1]$), $\forall j \in \{1, \ldots, s\}$ is the membership (resp. non-membership) value of the symptom to the linguistic label $j$th of characteristic $Y$. Finally, $\mu_{lD}(D) \in [0, 1]$ (resp. $\gamma_{lD}(D) \in [0, 1]$), $\forall l \in \{1, \ldots, s\}$ is the membership (resp. non-membership) value of disease $D$ to the linguistic label $l$th. SC-IFRS provides two basic functions:

(a) Prediction: determine the values of $(\mu_{lD}(D), \gamma_{lD}(D))$, $\forall l \in \{1, \ldots, s\}$;
(b) Recommendation: choose $i^{*} \in [1, s]$ satisfying $i^{*} = \arg\max_{i = 1, \ldots, s} \{ \mu_{iD}(D) + \mu_{iD}(D)(1 - \mu_{iD}(D) - \gamma_{iD}(D)) \}$.

Definition 5 (Multi-criteria intuitionistic fuzzy recommender systems – MC-IFRS (Son & Thong, 2015)). The utility function $R$ is a mapping specified on $(X, Y)$ below.

\[
\begin{aligned}
& R : X \times Y \to D_1 \times \cdots \times D_k, \\
& \left\langle \begin{matrix} (\mu_{1X}(x), \gamma_{1X}(x)), \\ (\mu_{2X}(x), \gamma_{2X}(x)), \\ \vdots \\ (\mu_{sX}(x), \gamma_{sX}(x)) \end{matrix} \right\rangle
\times
\left\langle \begin{matrix} (\mu_{1Y}(y), \gamma_{1Y}(y)), \\ (\mu_{2Y}(y), \gamma_{2Y}(y)), \\ \vdots \\ (\mu_{sY}(y), \gamma_{sY}(y)) \end{matrix} \right\rangle
\to
\left\langle \begin{matrix} (\mu_{1D}(D_1), \gamma_{1D}(D_1)), \\ (\mu_{2D}(D_1), \gamma_{2D}(D_1)), \\ \vdots \\ (\mu_{sD}(D_1), \gamma_{sD}(D_1)) \end{matrix} \right\rangle
\times \cdots \times
\left\langle \begin{matrix} (\mu_{1D}(D_k), \gamma_{1D}(D_k)), \\ (\mu_{2D}(D_k), \gamma_{2D}(D_k)), \\ \vdots \\ (\mu_{sD}(D_k), \gamma_{sD}(D_k)) \end{matrix} \right\rangle
\end{aligned}
\tag{9}
\]

MC-IFRS is the system that provides two basic functions below.

(a) Prediction: determine the values of $(\mu_{lD}(D_i), \gamma_{lD}(D_i))$, $\forall l \in \{1, \ldots, s\}$, $\forall i \in \{1, \ldots, k\}$;
(b) Recommendation: choose $i^{*} \in [1, s]$ satisfying $i^{*} = \arg\max_{i = 1, \ldots, s} \left\{ \sum_{j=1}^{k} w_j \left( \mu_{iD}(D_j) + \mu_{iD}(D_j)(1 - \mu_{iD}(D_j) - \gamma_{iD}(D_j)) \right) \right\}$ where $w_j \in [0, 1]$ is the weight of $D_j$ satisfying the constraint $\sum_{j=1}^{k} w_j = 1$.
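The recommendation functions of Definitions 4 and 5 can be sketched in Python as follows. The function names `score`, `recommend_sc` and `recommend_mc`, and the use of plain `(mu, gamma)` tuples, are assumptions made for this illustration only. Label indices are returned 1-based to match $i^{*} \in [1, s]$.

```python
from typing import List, Sequence, Tuple

Value = Tuple[float, float]  # (membership mu, non-membership gamma)


def score(mu: float, gamma: float) -> float:
    """Defuzzified score mu + mu * (1 - mu - gamma) used in the arg max."""
    return mu + mu * (1.0 - mu - gamma)


def recommend_sc(labels: Sequence[Value]) -> int:
    """Definition 4(b): 1-based index of the best linguistic label."""
    scores = [score(mu, gamma) for mu, gamma in labels]
    return scores.index(max(scores)) + 1


def recommend_mc(per_disease: Sequence[Sequence[Value]],
                 weights: Sequence[float]) -> int:
    """Definition 5(b): per_disease[j][i] is the value of label i+1 for D_{j+1}."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    s = len(per_disease[0])
    totals: List[float] = [
        sum(w * score(*labels[i]) for w, labels in zip(weights, per_disease))
        for i in range(s)
    ]
    return totals.index(max(totals)) + 1
```

The score adds to the membership a share of the hesitation margin $1 - \mu - \gamma$ proportional to $\mu$, so two labels with equal membership are separated by their non-membership. Ties are resolved here in favour of the lowest index, which is a choice of this sketch and is not specified in the definitions.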



A representation of MC-IFRS in the matrix format is demonstrated as follows.

Definition 6 (Son & Thong, 2015). An intuitionistic fuzzy matrix (IFM) $Z$ in MC-IFRS is defined as,

\[
Z = \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1s} \\
b_{21} & b_{22} & \ldots & b_{2s} \\
c_{31} & c_{32} & \ldots & c_{3s} \\
c_{41} & c_{42} & \ldots & c_{4s} \\
\ldots & \ldots & \ldots & \ldots \\
c_{t1} & c_{t2} & \ldots & c_{ts}
\end{pmatrix}.
\tag{10}
\]

In Eq. (10), $t = k + 2$ where $k \in N$ is the number of diseases in Definition 5. The value $s \in N^{+}$ is the number of intuitionistic linguistic labels. $a_{1i}, b_{2i}, c_{hi}$, $\forall h \in \{3, \ldots, t\}$, $\forall i \in \{1, \ldots, s\}$ are the intuitionistic fuzzy values (IFV) consisting of the membership and non-membership values as in Definition 5. $a_{1i} = (\mu_{iX}(x), \gamma_{iX}(x))$, $\forall i \in \{1, \ldots, s\}$ represents the IFV value of the patient to the linguistic label $i$th of feature $X$. $b_{2i} = (\mu_{iY}(y), \gamma_{iY}(y))$, $\forall i \in \{1, \ldots, s\}$ stands for the IFV value of the symptom to the linguistic label $i$th of characteristic $Y$. $c_{hi} = (\mu_{iD}(D_{h-2}), \gamma_{iD}(D_{h-2}))$, $\forall i \in \{1, \ldots, s\}$, $\forall h \in \{3, \ldots, t\}$ is the IFV value of the disease to the linguistic label $i$th. Each line from the third one to the last in Eq. (10) is related to a given disease.

Definition 7 (Son & Thong, 2015). Suppose that $Z_1$ and $Z_2$ are two IFM in MC-IFRS. The intuitionistic fuzzy similarity matrix (IFSM) between $Z_1$ and $Z_2$ is defined as follows.

\[
\tilde{S} = \begin{pmatrix}
\tilde{S}_{11} & \tilde{S}_{12} & \ldots & \tilde{S}_{1s} \\
\tilde{S}_{21} & \tilde{S}_{22} & \ldots & \tilde{S}_{2s} \\
\tilde{S}_{31} & \tilde{S}_{32} & \ldots & \tilde{S}_{3s} \\
\tilde{S}_{41} & \tilde{S}_{42} & \ldots & \tilde{S}_{4s} \\
\ldots & \ldots & \ldots & \ldots \\
\tilde{S}_{t1} & \tilde{S}_{t2} & \ldots & \tilde{S}_{ts}
\end{pmatrix},
\tag{11}
\]

where,

\[
\tilde{S}_{1i} = 1 - \frac{1 - \exp\left( -\frac{1}{2} \left( \left| \sqrt{\mu^{(1)}_{iX}(x)} - \sqrt{\mu^{(2)}_{iX}(x)} \right| + \left| \sqrt{\gamma^{(1)}_{iX}(x)} - \sqrt{\gamma^{(2)}_{iX}(x)} \right| \right) \right)}{1 - \exp(-1)}, \quad \forall i \in \{1, \ldots, s\},
\tag{12}
\]

\[
\tilde{S}_{2i} = 1 - \frac{1 - \exp\left( -\frac{1}{2} \left( \left| \sqrt{\mu^{(1)}_{iY}(y)} - \sqrt{\mu^{(2)}_{iY}(y)} \right| + \left| \sqrt{\gamma^{(1)}_{iY}(y)} - \sqrt{\gamma^{(2)}_{iY}(y)} \right| \right) \right)}{1 - \exp(-1)}, \quad \forall i \in \{1, \ldots, s\},
\tag{13}
\]

\[
\tilde{S}_{hi} = 1 - \frac{1 - \exp\left( -\frac{1}{2} \left( \left| \sqrt{\mu^{(1)}_{iD}(D_{h-2})} - \sqrt{\mu^{(2)}_{iD}(D_{h-2})} \right| + \left| \sqrt{\gamma^{(1)}_{iD}(D_{h-2})} - \sqrt{\gamma^{(2)}_{iD}(D_{h-2})} \right| \right) \right)}{1 - \exp(-1)}, \quad \forall i \in \{1, \ldots, s\}, \forall h \in \{3, \ldots, t\}.
\tag{14}
\]

Definition 8 (Son & Thong, 2015). Suppose that $Z_1$ and $Z_2$ are two IFM in MC-IFRS. The intuitionistic fuzzy similarity degree (IFSD) between $Z_1$ and $Z_2$ is

\[
SIM(Z_1, Z_2) = \alpha \sum_{i=1}^{s} w_{1i} \tilde{S}_{1i} + \beta \sum_{i=1}^{s} w_{2i} \tilde{S}_{2i} + \chi \sum_{h=3}^{t} \sum_{i=1}^{s} w_{hi} \tilde{S}_{hi},
\tag{15}
\]

where $\tilde{S}$ is the IFSM between $Z_1$ and $Z_2$. $W = (w_{ij})$ $(\forall i \in \{1, \ldots, t\}, \forall j \in \{1, \ldots, s\})$ is the weight matrix of IFSM between $Z_1$ and $Z_2$ satisfying,

\[
\sum_{i=1}^{s} w_{1i} = 1, \quad \sum_{i=1}^{s} w_{2i} = 1, \quad \sum_{i=1}^{s} w_{hi} = 1, \quad \forall h \in \{3, \ldots, t\},
\tag{16}
\]

\[
\alpha + \beta + \chi = 1.
\tag{17}
\]

Definition 9 (Son & Thong, 2015). The formulas to predict the values of linguistic labels of patient $P_u$ $(\forall u \in \{1, \ldots, n\})$ to symptom $S_j$ $(\forall j \in \{1, \ldots, m\})$ according to diseases $(D_1, D_2, \ldots, D_k)$ in MC-IFRS are:

\[
\mu^{P_u}_{iD}(D_j) = \frac{\sum_{v=1}^{n} SIM(P_u, P_v) \times \mu^{P_v}_{iD}(D_j)}{\sum_{v=1}^{n} SIM(P_u, P_v)}, \quad \forall i \in \{1, \ldots, s\}, \forall j \in \{1, \ldots, k\}, \forall u \in \{1, \ldots, n\},
\tag{18}
\]

\[
\gamma^{P_u}_{iD}(D_j) = \frac{\sum_{v=1}^{n} SIM(P_u, P_v) \times \gamma^{P_v}_{iD}(D_j)}{\sum_{v=1}^{n} SIM(P_u, P_v)}, \quad \forall i \in \{1, \ldots, s\}, \forall j \in \{1, \ldots, k\}, \forall u \in \{1, \ldots, n\}.
\tag{19}
\]

Table 1
The relation between the patients and the symptoms.

P      Temperature   Headache     Stomach_pain   Cough        Chest_pain
Ram    (0.8, 0.1)    (0.6, 0.1)   (0.2, 0.8)     (0.6, 0.1)   (0.1, 0.6)
Mari   (0, 0.8)      (0.4, 0.4)   (0.6, 0.1)     (0.1, 0.7)   (0.1, 0.8)
Sugu   (0.8, 0.1)    (0.8, 0.1)   (0, 0.6)       (0.2, 0.7)   (0, 0.5)
Somu   (0.6, 0.1)    (0.5, 0.4)   (0.3, 0.4)     (0.7, 0.2)   (0.3, 0.4)

Table 2
The training dataset with * being the values to be predicted.

P      Viral_Fever   Malaria      Typhoid      Stomach      Chest
Ram    (0.4, 0.1)    (0.7, 0.1)   (0.6, 0.1)   (0.2, 0.4)   (0.2, 0.6)
Mari   (0.3, 0.5)    (0.2, 0.6)   (0.4, 0.4)   (0.6, 0.1)   (0.1, 0.7)
Sugu   *             *            *            *            *
Somu   *             *            *            *            *

Table 3
The extracted SC-IFRS dataset with * being the values to be predicted.

P: Ram
  S: ⟨Temperature (0.8, 0.1), Headache (0.6, 0.1), Stomach pain (0.2, 0.8), Cough (0.6, 0.1), Chest pain (0.1, 0.6)⟩
  D: ⟨Viral fever (0.4, 0.1), Malaria (0.7, 0.1), Typhoid (0.6, 0.1), Stomach problem (0.2, 0.4), Chest problem (0.2, 0.6)⟩
P: Mari
  S: ⟨Temperature (0.0, 0.8), Headache (0.4, 0.4), Stomach pain (0.6, 0.1), Cough (0.1, 0.7), Chest pain (0.1, 0.8)⟩
  D: ⟨Viral fever (0.3, 0.5), Malaria (0.2, 0.6), Typhoid (0.4, 0.4), Stomach problem (0.6, 0.1), Chest problem (0.1, 0.7)⟩
P: Sugu
  S: ⟨Temperature (0.8, 0.1), Headache (0.8, 0.1), Stomach pain (0.0, 0.6), Cough (0.2, 0.7), Chest pain (0.0, 0.5)⟩
  D: *
P: Somu
  S: ⟨Temperature (0.6, 0.1), Headache (0.5, 0.4), Stomach pain (0.3, 0.4), Cough (0.7, 0.2), Chest pain (0.3, 0.4)⟩
  D: *

Table 4
The recommended diseases.

P      Viral_Fever   Malaria   Typhoid   Stomach   Chest
Sugu   0.5537        0.6552    0.4032    0.504     0.122
Somu   0.5358        0.6552    0.4068    0.4446    0.122



Example 1. We illustrate the steps of IFCF by an example in Son and Thong (2015). Assume that the system has four patients P = {Ram, Mari, Sugu, Somu}, five symptoms S = {Temperature, Headache, Stomach-pain, Cough, Chest-pain} and five diseases D = {Viral-Fever, Malaria, Typhoid, Stomach, Heart}. The relation between the patients and the symptoms is illustrated in Table 1. The training dataset is shown in Table 2, where the values marked * are to be predicted. Following Definition 4, we extract the results in Table 3 from those in Tables 1 and 2.

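From Definition 8 and $\alpha = 0$, $\beta = \chi = 1/2$, $w_{1i} = w_{2i} = w_{3i} = 0.2$, the IFSD between Sugu (Somu) and Ram & Mari are shown below.

\[ IFSD(\text{Sugu}, \text{Ram}) = 0.87, \tag{20} \]

\[ IFSD(\text{Sugu}, \text{Mari}) = 0.57, \tag{21} \]

\[ IFSD(\text{Somu}, \text{Ram}) = 0.83, \tag{22} \]

\[ IFSD(\text{Somu}, \text{Mari}) = 0.58. \tag{23} \]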
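The similarity entries of Definition 7, the weighted degree of Definition 8 and the weighted-average prediction of Definition 9 can be sketched in Python as below. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function names, the list-of-rows layout for an IFM, and the restriction of the prediction to neighbours with known values are all choices made here. How rows with values still to be predicted enter the similarity is not spelled out in this section, so the sketch only handles fully specified matrices.

```python
import math
from typing import Sequence, Tuple

Value = Tuple[float, float]  # (membership mu, non-membership gamma)


def ifv_similarity(p: Value, q: Value) -> float:
    """One IFSM entry, following the common form of Eqs. (12)-(14)."""
    d = (abs(math.sqrt(p[0]) - math.sqrt(q[0]))
         + abs(math.sqrt(p[1]) - math.sqrt(q[1])))
    return 1.0 - (1.0 - math.exp(-0.5 * d)) / (1.0 - math.exp(-1.0))


def ifsd(z1: Sequence[Sequence[Value]], z2: Sequence[Sequence[Value]],
         weights: Sequence[Sequence[float]],
         coeffs: Tuple[float, float, float]) -> float:
    """Eq. (15): rows 0 and 1 are the feature and symptom rows, the rest diseases."""
    alpha, beta, chi = coeffs
    row_sims = [
        sum(w * ifv_similarity(p, q) for w, p, q in zip(w_row, r1, r2))
        for w_row, r1, r2 in zip(weights, z1, z2)
    ]
    return alpha * row_sims[0] + beta * row_sims[1] + chi * sum(row_sims[2:])


def predict(sims: Sequence[float], neighbour_values: Sequence[float]) -> float:
    """Eqs. (18)/(19): similarity-weighted mean of the neighbours' degrees."""
    return sum(s * v for s, v in zip(sims, neighbour_values)) / sum(sims)
```

The normalisation by $1 - \exp(-1)$ maps the entry to 1 for identical values and to 0 for the extreme pair $(1, 0)$ versus $(0, 1)$. The same `predict` function is applied once to the membership degrees and once to the non-membership degrees.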

Table 5
The pseudo-code of DPFCM.

Distributed Picture Fuzzy Clustering Method (DPFCM)

I:    – Data X whose number of elements (N) in r dimensions
      – Number of clusters: $C$
      – Number of peers: $P + 1$
      – Fuzzifier $m$
      – Threshold $\varepsilon > 0$
      – Parameters: $\gamma, \alpha_1, \alpha_2, \alpha$, maxIter
O:    $\{V_{ljh} \mid l = 1, \ldots, P; j = 1, \ldots, C; h = 1, \ldots, r\}$; $\{(u_{lkj}, \eta_{lkj}, \xi_{lkj}) \mid l = 1, \ldots, P; k = 1, \ldots, Y_l; j = 1, \ldots, C\}$; $\{w_{ljh} \mid l = 1, \ldots, P; j = 1, \ldots, C; h = 1, \ldots, r\}$.

DPFCM
1S:   Initialization:
      – Set the number of iterations: $t = 0$
      – Set $\Delta_{lijh}(t) = \theta_{lijh}(t) = 0$ ($\forall i \ne l$; $i, l = 1, \ldots, P$; $j = 1, \ldots, C$; $h = 1, \ldots, r$)
      – Randomize $\{(u_{lkj}(t), \eta_{lkj}(t), \xi_{lkj}(t)) \mid l = 1, \ldots, P; k = 1, \ldots, Y_l; j = 1, \ldots, C\}$ satisfying (31)
      – Set $w_{ljh}(t) = 1/r$ ($l = 1, \ldots, P$; $j = 1, \ldots, C$; $h = 1, \ldots, r$)
2S:   Calculate cluster centers $V_{ljh}(t)$ ($l = 1, \ldots, P$; $j = 1, \ldots, C$; $h = 1, \ldots, r$) from $(u_{lkj}(t), \eta_{lkj}(t), \xi_{lkj}(t))$, $w_{ljh}(t)$ and $\theta_{lijh}(t)$ by (39)
3S:   Calculate attribute-weights $w_{ljh}(t + 1)$ ($l = 1, \ldots, P$; $j = 1, \ldots, C$; $h = 1, \ldots, r$) from $(u_{lkj}(t), \eta_{lkj}(t), \xi_{lkj}(t))$, $V_{ljh}(t)$ and $\Delta_{lijh}(t)$ by (41)
4S:   Send $\{\Delta_{lijh}(t), \theta_{lijh}(t), V_{ljh}(t), w_{ljh}(t + 1) \mid i, l = 1, \ldots, P; i \ne l; k = 1, \ldots, Y_l; j = 1, \ldots, C\}$ to Master
5M:   Calculate $\{\Delta_{lijh}(t + 1), \theta_{lijh}(t + 1) \mid i, l = 1, \ldots, P; i \ne l; k = 1, \ldots, Y_l; j = 1, \ldots, C\}$ by (38) and (40) and send them to Slave peers
6S:   Calculate cluster centers $V_{ljh}(t + 1)$ ($l = 1, \ldots, P$; $j = 1, \ldots, C$; $h = 1, \ldots, r$) from $(u_{lkj}(t), \eta_{lkj}(t), \xi_{lkj}(t))$, $w_{ljh}(t + 1)$ and $\theta_{lijh}(t + 1)$ by (39)
7S:   Calculate positive degrees $\{u_{lkj}(t + 1) \mid l = 1, \ldots, P; k = 1, \ldots, Y_l; j = 1, \ldots, C\}$ from $(\eta_{lkj}(t), \xi_{lkj}(t))$, $w_{ljh}(t + 1)$ and $V_{ljh}(t + 1)$ by (37)
8S:   Compute neutral degrees $\{\eta_{lkj}(t + 1) \mid l = 1, \ldots, P; k = 1, \ldots, Y_l; j = 1, \ldots, C\}$ from $(u_{lkj}(t + 1), \xi_{lkj}(t))$, $w_{ljh}(t + 1)$ and $V_{ljh}(t + 1)$ by (42)
9S:   Calculate refusal degrees $\{\xi_{lkj}(t + 1) \mid l = 1, \ldots, P; k = 1, \ldots, Y_l; j = 1, \ldots, C\}$ from $(u_{lkj}(t + 1), \eta_{lkj}(t + 1))$, $w_{ljh}(t + 1)$ and $V_{ljh}(t + 1)$ by (43)
10S:  If $\max_l \{ \max \{ \|u_{lkj}(t + 1) - u_{lkj}(t)\|, \|\eta_{lkj}(t + 1) - \eta_{lkj}(t)\|, \|\xi_{lkj}(t + 1) - \xi_{lkj}(t)\| \} \} < \varepsilon$ or $t >$ maxIter then stop the algorithm. Otherwise set $t = t + 1$ and return to Step 3S.

S: Operations in Slave peers.
M: Operations in the Master peer.

Fig. 2. The working flow of the hybrid model – HIFCF.



Fig. 3. MAE values of algorithms by 2-fold cross validation.

Fig. 4. MAE values of algorithms by 3-fold cross validation.

Next, we use Definition 9 to calculate the predictive IFM results of Sugu and Somu.

$$\mathrm{Disease}(\mathrm{Sugu}) = \left\langle \begin{array}{l} \text{Viral fever}(0.49, 0.38),\ \text{Malaria}(0.52, 0.22),\ \text{Typhoid}(0.36, 0.52), \\ \text{Stomach problem}(0.40, 0.34),\ \text{Chest problem}(0.10, 0.68) \end{array} \right\rangle, \tag{24}$$

$$\mathrm{Disease}(\mathrm{Somu}) = \left\langle \begin{array}{l} \text{Viral fever}(0.47, 0.39),\ \text{Malaria}(0.52, 0.22),\ \text{Typhoid}(0.36, 0.51), \\ \text{Stomach problem}(0.39, 0.47),\ \text{Chest problem}(0.10, 0.68) \end{array} \right\rangle. \tag{25}$$

Based on the recommendation function of Definition 4 and Eqs. (24) and (25), we recommend the disease from which each of those patients suffers the most, as in Table 4. From this table, we conclude that Sugu and Somu both suffer from Malaria.

Fig. 5. MAE values of algorithms by 4-fold cross validation.

Fig. 6. MAE values of algorithms by 5-fold cross validation.

2.2. Distributed Picture Fuzzy Clustering Method

Son (2015) has proposed a novel Distributed Picture Fuzzy Clustering Method on picture fuzzy sets, called DPFCM. Firstly, we recall the definition of picture fuzzy sets.

Definition 10. A Picture Fuzzy Set (PFS) (Cuong & Kreinovich, 2013) in a non-empty set $X$ is

$$\dot{A} = \{\langle x, \mu_{\dot{A}}(x), \eta_{\dot{A}}(x), \gamma_{\dot{A}}(x) \rangle \mid x \in X\}, \tag{26}$$

where $\mu_{\dot{A}}(x)$ is the positive degree of each element $x \in X$, $\eta_{\dot{A}}(x)$ is the neutral degree and $\gamma_{\dot{A}}(x)$ is the negative degree, satisfying the constraints

$$\mu_{\dot{A}}(x), \eta_{\dot{A}}(x), \gamma_{\dot{A}}(x) \in [0, 1], \quad \forall x \in X, \tag{27}$$

$$0 \le \mu_{\dot{A}}(x) + \eta_{\dot{A}}(x) + \gamma_{\dot{A}}(x) \le 1, \quad \forall x \in X. \tag{28}$$

The refusal degree of an element is calculated as $\xi_{\dot{A}}(x) = 1 - (\mu_{\dot{A}}(x) + \eta_{\dot{A}}(x) + \gamma_{\dot{A}}(x))$, $\forall x \in X$. When $\xi_{\dot{A}}(x) = 0$, a PFS reduces to an intuitionistic fuzzy set (IFS) (Atanassov, 1986), and when $\eta_{\dot{A}}(x) = \xi_{\dot{A}}(x) = 0$, a PFS reduces to a fuzzy set (FS) (Zadeh, 1965).
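To make the constraints of Definition 10 concrete, the following Python sketch models a single element of a picture fuzzy set. The class name `PictureFuzzyElement`, its methods and the numerical tolerance are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PictureFuzzyElement:
    mu: float     # positive degree
    eta: float    # neutral degree
    gamma: float  # negative degree

    def __post_init__(self):
        degrees = (self.mu, self.eta, self.gamma)
        # Eq. (27): every degree lies in [0, 1].
        if any(d < 0.0 or d > 1.0 for d in degrees):
            raise ValueError("each degree must lie in [0, 1]")
        # Eq. (28): the degrees sum to at most 1 (small float tolerance).
        if sum(degrees) > 1.0 + 1e-12:
            raise ValueError("degrees must sum to at most 1")

    @property
    def refusal(self):
        """Refusal degree: 1 - (mu + eta + gamma)."""
        return 1.0 - (self.mu + self.eta + self.gamma)

    def is_intuitionistic(self, tol=1e-12):
        """True when the refusal degree vanishes (the IFS case)."""
        return abs(self.refusal) <= tol


element = PictureFuzzyElement(mu=0.5, eta=0.2, gamma=0.2)
ifs_like = PictureFuzzyElement(mu=0.6, eta=0.0, gamma=0.4)
```

Here `element` has a refusal degree of about 0.1, while `ifs_like` has none and therefore behaves like an intuitionistic fuzzy element.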
Fig. 7. MAE values of algorithms by 6-fold cross validation.

Fig. 8. MAE values of algorithms by 7-fold cross validation.

In DPFCM, the communication model is the facilitator (Master–Slave) model, which has one Master peer and $P$ Slave peers; each Slave peer is allowed to communicate with the Master only. The original dataset $X$ consists of $N$ data points in $r$ dimensions, and each Slave peer holds a subset of it. We call these subsets $Y_j$ $(j = 1,\dots,P)$, with $\bigcup_{j=1}^{P} Y_j = X$ and $\sum_{j=1}^{P} |Y_j| = N$. The number of dimensions in a subset is exactly the same as that in the original dataset. The clustering problem is to divide the dataset $X$ into $C$ groups satisfying the objective function below.

$$\sum_{l=1}^{P} \sum_{k=1}^{Y_l} \sum_{j=1}^{C} \left( \frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}} \right)^{m} \sum_{h=1}^{r} w_{ljh} \|X_{lkh} - V_{ljh}\|^{2} + \gamma \sum_{l=1}^{P} \sum_{j=1}^{C} \sum_{h=1}^{r} w_{ljh} \log w_{ljh} \to \min, \tag{29}$$

where $u_{lkj}$, $\eta_{lkj}$ and $\xi_{lkj}$ are the positive, the neutral and the refusal degrees of the $k$th data point with respect to the $j$th cluster in the $l$th Slave peer, which reflects clustering on the PFS of Definition 10. $w_{ljh}$ is the attribute-weight of the $h$th attribute for the $j$th cluster in the $l$th Slave peer. $V_{ljh}$ is the center of the $j$th cluster in the $l$th Slave peer according to the $h$th attribute. $X_{lkh}$ is the $k$th data point of the $l$th Slave peer according to the $h$th attribute. $m$ and $\gamma$ are the fuzzifier and a positive scalar, respectively. The constraints for (29) are shown below.

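$$u_{lkj}, \eta_{lkj}, \xi_{lkj} \in [0, 1], \tag{30}$$

$$u_{lkj} + \eta_{lkj} + \xi_{lkj} \le 1, \tag{31}$$

$$\sum_{j=1}^{C} \left( \frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}} \right) = 1, \tag{32}$$

$$\sum_{j=1}^{C} \left( \eta_{lkj} + \frac{\xi_{lkj}}{C} \right) = 1, \tag{33}$$

$$\sum_{h=1}^{r} w_{ljh} = 1, \tag{34}$$

$$V_{ljh} = V_{ijh}, \quad (\forall i \ne l;\ i, l = 1,\dots,P), \tag{35}$$

$$w_{ljh} = w_{ijh}, \quad (\forall i \ne l;\ i, l = 1,\dots,P). \tag{36}$$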
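The objective in Eq. (29) is a sum of per-peer contributions, so it can be evaluated one Slave peer at a time. The Python sketch below computes the contribution of a single peer. The function name `peer_objective`, the nested-list layout and the toy data are assumptions made for illustration; the toy degrees were chosen so that constraints (31) to (34) hold.

```python
import math


def peer_objective(X, V, W, u, eta, xi, m, gamma):
    """Contribution of one Slave peer to the objective in Eq. (29).

    X[k][h]: data points, V[j][h]: cluster centers, W[j][h]: attribute
    weights, u/eta/xi[k][j]: positive, neutral and refusal degrees.
    """
    clustering_term = 0.0
    for k, point in enumerate(X):
        for j, center in enumerate(V):
            ratio = u[k][j] / (1.0 - eta[k][j] - xi[k][j])
            weighted_dist = sum(
                W[j][h] * (point[h] - center[h]) ** 2
                for h in range(len(point))
            )
            clustering_term += ratio ** m * weighted_dist
    # Entropy-like regularizer on the attribute weights (skips w = 0).
    entropy_term = sum(w * math.log(w) for row in W for w in row if w > 0.0)
    return clustering_term + gamma * entropy_term


# Toy peer: two points, two clusters, two attributes (hypothetical data).
X = [[0.0, 0.0], [1.0, 1.0]]
V = [[0.0, 0.0], [1.0, 1.0]]
W = [[0.5, 0.5], [0.5, 0.5]]
u = [[0.3, 0.1], [0.1, 0.3]]
eta = [[0.4, 0.4], [0.4, 0.4]]
xi = [[0.2, 0.2], [0.2, 0.2]]

J = peer_objective(X, V, W, u, eta, xi, m=2, gamma=1.0)
```

Summing `peer_objective` over all Slave peers would give the full value of Eq. (29). The consensus constraints (35) and (36) couple the peers and are not checked by this sketch.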

The clustering model in Eqs. (29)–(36) relies on the principles of the PFS and the facilitator model. By using the Lagrangian method and the Picard iteration, the optimal solutions of this model are obtained as in Eqs. (37)–(43).

$$u_{lkj} = \frac{1 - \eta_{lkj} - \xi_{lkj}}{\sum_{i=1}^{C} \left( \frac{\sum_{h=1}^{r} w_{ljh} \|X_{lkh} - V_{ljh}\|^{2}}{\sum_{h=1}^{r} w_{lih} \|X_{lkh} - V_{lih}\|^{2}} \right)^{\frac{1}{m-1}}}, \quad (\forall l = 1,\dots,P;\ k = 1,\dots,Y_l;\ j = 1,\dots,C), \tag{37}$$



Fig. 9. MAE values of algorithms by 8-fold cross validation.

Fig. 10. MAE values of algorithms by 9-fold cross validation.

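$$\theta_{lijh} = \theta_{lijh} + \alpha_1 (V_{ljh} - V_{ijh}), \quad (\forall i \ne l;\ i, l = 1,\dots,P;\ j = 1,\dots,C;\ h = 1,\dots,r), \tag{38}$$

$$V_{ljh} = \frac{\sum_{k=1}^{Y_l} \left( \frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}} \right)^{m} w_{ljh} X_{lkh} - \sum_{i=1, i \ne l}^{P} \theta_{lijh}}{\sum_{k=1}^{Y_l} \left( \frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}} \right)^{m} w_{ljh}}, \quad (\forall l = 1,\dots,P;\ j = 1,\dots,C;\ h = 1,\dots,r), \tag{39}$$

$$w_{ljh} = \frac{\exp\left( -\frac{1}{\gamma} \left( \sum_{k=1}^{Y_l} \left( \frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}} \right)^{m} \|X_{lkh} - V_{ljh}\|^{2} + \gamma + 2 \sum_{i=1, i \ne l}^{P} \Delta_{lijh} \right) \right)}{\sum_{h'=1}^{r} \exp\left( -\frac{1}{\gamma} \left( \sum_{k=1}^{Y_l} \left( \frac{u_{lkj}}{1 - \eta_{lkj} - \xi_{lkj}} \right)^{m} \|X_{lkh'} - V_{ljh'}\|^{2} + \gamma + 2 \sum_{i=1, i \ne l}^{P} \Delta_{lijh'} \right) \right)}, \quad (\forall l = 1,\dots,P;\ j = 1,\dots,C;\ h = 1,\dots,r), \tag{41}$$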

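$$\Delta_{lijh} = \Delta_{lijh} + \alpha_2 (w_{ljh} - w_{ijh}), \quad (\forall i \ne l;\ i, l = 1,\dots,P;\ j = 1,\dots,C;\ h = 1,\dots,r), \tag{40}$$

$$\eta_{lkj} = 1 - \xi_{lkj} + \frac{\frac{C-1}{C} \sum_{i=1}^{C} \xi_{lki}}{\sum_{i=1}^{C} \left( \frac{u_{lkj}}{u_{lki}} \cdot \frac{\sum_{h=1}^{r} w_{lih} \|X_{lkh} - V_{lih}\|^{2}}{\sum_{h=1}^{r} w_{ljh} \|X_{lkh} - V_{ljh}\|^{2}} \right)^{\frac{1}{m+1}}}, \quad (\forall l = 1,\dots,P;\ k = 1,\dots,Y_l;\ j = 1,\dots,C), \tag{42}$$

$$\xi_{lkj} = 1 - (u_{lkj} + \eta_{lkj}) - \left( 1 - (u_{lkj} + \eta_{lkj})^{\alpha} \right)^{1/\alpha}, \quad (\forall l = 1,\dots,P;\ k = 1,\dots,Y_l;\ j = 1,\dots,C). \tag{43}$$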
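Two of the update rules above are simple enough to sketch directly: the positive degrees of Eq. (37) and the refusal degree of Eq. (43). The Python sketch below handles a single data point in a single peer. The function names, the list-based inputs and the sample numbers are hypothetical, and the sketch assumes that the data point does not coincide with any cluster center, so no distance is zero.

```python
def positive_degrees(dists, eta, xi, m):
    """Eq. (37) for one data point.

    dists[j] is the weighted squared distance of the point to cluster j,
    i.e. sum_h w_ljh * ||X_lkh - V_ljh||^2 (assumed strictly positive).
    """
    C = len(dists)
    exponent = 1.0 / (m - 1.0)
    return [
        (1.0 - eta[j] - xi[j])
        / sum((dists[j] / dists[i]) ** exponent for i in range(C))
        for j in range(C)
    ]


def refusal_degree(u, eta, alpha):
    """Eq. (43): refusal degree from the positive and neutral degrees."""
    s = u + eta
    return 1.0 - s - (1.0 - s ** alpha) ** (1.0 / alpha)


# Hypothetical point that is three times farther from cluster 1 than cluster 0.
dists = [1.0, 3.0]
eta = [0.4, 0.4]
xi = [0.2, 0.2]

u_new = positive_degrees(dists, eta, xi, m=2)
xi_new = refusal_degree(u_new[0], eta[0], alpha=0.5)
```

With a common value of $1 - \eta - \xi$ across clusters, as in this toy input, the ratios $u_{lkj} / (1 - \eta_{lkj} - \xi_{lkj})$ produced by Eq. (37) sum to one, which is constraint (32).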



Fig. 11. MAE values of algorithms by 10-fold cross validation.

Fig. 12. The computational time of algorithms by various folds.

Details of the DPFCM algorithm (Son, 2015) are given in the pseudo-code of Table 5.
Experiments on benchmark datasets from the UCI Machine Learning Repository showed that DPFCM improves on the clustering quality of the relevant algorithms (Son, 2015).
2.3. Hybrid Intuitionistic Fuzzy Collaborative Filtering
Consider the IFSD function in Eq. (15) of Definition 8. IFSD could be enhanced by integrating information on the possibility of patients belonging to the clusters produced by a fuzzy clustering method. That is to say, if we know which group a new patient belongs to, then the similarities between this patient and the others in that group should be given a high influence in the calculation of IFSD. Thus, in the new algorithm, Hybrid Intuitionistic Fuzzy Collaborative Filtering (HIFCF), DPFCM as expressed in Table 5 is used to classify the patients into $N_C$ groups according to the relation information of the patients. The relations are determined by the relationship between patients and symptoms as in Table 1, and they are defuzzified into crisp values before the clustering algorithm is applied. For crisp datasets such as HEART (University of California, 2007), which is used later in the experiments, the relations are specified by all attributes of the dataset except the class attribute. In this algorithm, the parameters are set for medical diagnosis as follows: $P = 0$, $\gamma = \alpha_1 = \alpha_2 = 1$, $\alpha = 0.5$, $\mathrm{maxIter} = 1000$, $m = 2$, $\varepsilon = 0.001$. Then, the possibility of a patient belonging to a certain cluster, expressed in Eqs. (44) and (45), is used to calculate the similarity degrees between users.



Fig. 13. MAE values of algorithms with the cardinalities of testing being 10.

Fig. 14. MAE values of algorithms with the cardinalities of testing being 20.

$$\mathrm{PRO}(j, k) = 1 - \frac{CS(j, k)}{\max_{i} CS(i, k)}, \tag{44}$$

$$CS(j, k) = 1 - \frac{\sum_{i} X_j^{i} V_k^{i}}{\|X_j\| \, \|V_k\|}, \tag{45}$$

where $\mathrm{PRO}(j, k)$ is the possibility of patient $j$ belonging to cluster $k$, and $CS(j, k)$ is the counter-similarity between patient $j$ and cluster $k$ calculated in Eq. (45), with $X_j$ and $V_k$ being patient $j$ and the center of cluster $k$, respectively. Based upon the $\{\mathrm{PRO}(j, k)\}$ information, we calculate the similarity degrees between users in Eq. (46).

$$\mathrm{SIM}_{group}(a, b) = \frac{1}{N_C} \sum_{i=1}^{N_C} \left( \mathrm{PRO}(a, i) - \overline{\mathrm{PRO}(a)} \right) \left( \mathrm{PRO}(b, i) - \overline{\mathrm{PRO}(b)} \right), \tag{46}$$

where $\overline{\mathrm{PRO}(a)}$ is the mean value of $\{\mathrm{PRO}(a, k)\}$, $\forall k = 1,\dots,N_C$. Let us denote the IFSD degree in Eq. (15) as $\mathrm{SIM}_{history}(a, b)$. The final similarity degree is calculated in Eq. (47).

$$\mathrm{SIM}(a, b) = \mathrm{SIM}_{history}(a, b) \times (1 - \lambda) + \mathrm{SIM}_{group}(a, b) \times \lambda, \tag{47}$$

where $\lambda \in [0, 1]$ is an adjustable coefficient. The similarity degrees between users derived from the picture fuzzy clustering are thus supplemented into IFSD to give the final similarity between patients, so that the integrated approach achieves higher prediction accuracy than the existing ones. The overall process is illustrated in Fig. 2.

2.4. Theoretical analyses of the HIFCF algorithm
We clearly recognize the following advantages of HIFCF.

(a) HIFCF gives better prediction accuracy than the other relevant methods. It supplements the similarity degree with one derived from the picture fuzzy clustering method, so that patients analogous to the considered one are better reflected.
(b) HIFCF does not take much more computational time than IFCF and the other existing methods. The clustering algorithm is executed only once to determine the groups of users, so HIFCF is only a little slower than IFCF and the other relevant algorithms.
(c) The parameters are easy to control through the adjustable coefficient $\lambda \in [0, 1]$. That is to say, we can reduce the impact of a specific similarity degree by using this coefficient.
(d) HIFCF is easy to implement and inherits from the existing methods. It is inherited mostly from IFCF but uses a different similarity degree. The steps of the algorithm are clear and simple, so they can be implemented quickly.
(e) HIFCF fosters more advanced work on designing general similarity degrees for IFRS to achieve high prediction accuracy, as in this research.

Fig. 15. MAE values of algorithms with the cardinalities of testing being 30.

Fig. 16. MAE values of algorithms with the cardinalities of testing being 40.
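The similarity blend of Eqs. (44)–(47), including the role of the coefficient $\lambda$ mentioned in item (c), can be sketched in a few lines of Python. All function names and the toy patients and cluster centers below are hypothetical. The sketch assumes crisp, non-zero patient and center vectors and that at least one patient has a non-zero counter-similarity to every cluster.

```python
import math


def counter_similarity(x, v):
    """Eq. (45): one minus the cosine similarity of patient x and center v."""
    dot = sum(a * b for a, b in zip(x, v))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def possibilities(patients, centers):
    """Eq. (44): PRO[j][k] for every patient j and cluster k."""
    cs = [[counter_similarity(x, v) for v in centers] for x in patients]
    max_cs = [max(row[k] for row in cs) for k in range(len(centers))]
    return [[1.0 - row[k] / max_cs[k] for k in range(len(centers))] for row in cs]


def group_similarity(pro_a, pro_b):
    """Eq. (46): covariance-like similarity of two possibility vectors."""
    n_c = len(pro_a)
    mean_a = sum(pro_a) / n_c
    mean_b = sum(pro_b) / n_c
    return sum((pa - mean_a) * (pb - mean_b) for pa, pb in zip(pro_a, pro_b)) / n_c


def final_similarity(sim_history, sim_group, lam):
    """Eq. (47): blend of the IFSD-based and cluster-based similarities."""
    return sim_history * (1.0 - lam) + sim_group * lam


# Three hypothetical patients and two cluster centers in two dimensions.
patients = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
centers = [[1.0, 0.0], [0.0, 1.0]]

pro = possibilities(patients, centers)
same_group = group_similarity(pro[0], pro[0])
different_group = group_similarity(pro[0], pro[1])
blended = final_similarity(0.87, same_group, lam=0.4)
```

Setting `lam` to 0 recovers the IFSD-only similarity, while larger values give more influence to the cluster-based term.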

3. Evaluation
3.1. Experimental design
In this part, we describe the experimental environment as follows.
• Experimental tools: We implemented the proposed hybrid algorithm, HIFCF, in addition to IFCF (Son & Thong, 2015), the typical standalone IFS methods of De et al. (2001), Szmidt and Kacprzyk (2004) and Samuel and Balamurugan (2012), and the RS methods of Davis et al. (2008) and Hassan and Syed (2010), in the PHP programming language, and executed them on a PC with an Intel(R) Core(TM) 2 Duo CPU T6400 @ 2.00 GHz and 2 GB RAM. The results are taken as the average value of 50 runs.
• Evaluation indices: Mean Absolute Error (MAE) and the computational time.
• Datasets: The benchmark medical diagnosis dataset HEART from the UCI Machine Learning Repository (University of California, 2007), consisting of 270 patients characterized by 13 attributes.
• Cross validation: The cross-validation method for the experiments is k-fold validation with k from 2 to 10. Besides k-fold validation, random experiments with testing sets of 10 to 100 random elements are also performed. In order to validate the results against crisp classes, the intuitionistic defuzzification method (Albeanu & Popentiu-Vladicescu, 2010) is used for the experimental algorithms.
• Parameter setting: $N_C = 2$, $\lambda = 0.4$, and the weights in IFSD are set as in Example 1.
• Objective:
  - To evaluate HIFCF in comparison with the relevant algorithms in terms of accuracy through the evaluation indices.
  - To evaluate HIFCF under various parameters.

Fig. 17. MAE values of algorithms with the cardinalities of testing being 50.

Fig. 18. MAE values of algorithms with the cardinalities of testing being 60.
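The evaluation protocol of Section 3.1 can be sketched in Python as follows. The paper does not spell out the MAE formula or how the folds are drawn, so the standard definition of MAE and a seeded random shuffle are assumed here; the function names and the toy labels are hypothetical.

```python
import random


def mean_absolute_error(actual, predicted):
    """Standard MAE: mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal test folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# HEART has 270 patients; each of the 10 folds serves once as the test set.
folds = k_fold_indices(270, 10)

# Toy crisp labels after defuzzification (hypothetical values).
mae = mean_absolute_error([1, 0, 1, 1], [1, 1, 1, 0])
```

For each fold, the remaining indices would form the training set, and the reported MAE would be the average over folds and over repeated runs.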
3.2. Assessment
In this section, we compare HIFCF with the other algorithms in terms of accuracy and computational time for various numbers of folds in the cross-validation method. The results are illustrated in Figs. 3–12.
The MAE values of HIFCF are lower than those of the other algorithms for every number of folds. The average MAE value of HIFCF over the folds is 0.395, whilst those of IFCF, DAVIS, HASSAN, DE, SAMUEL and SZMIDT are 0.491, 0.495, 0.487, 0.481, 0.519 and 0.656, respectively. HIFCF takes 2.68 s on average to process, whereas the other algorithms, in the order listed above, take 2.49, 2.76, 1.18, 0.10, 0.08 and 0.09 s, respectively; HIFCF is therefore slower than all of them except DAVIS. In what follows, we continue to validate these algorithms by the random experiments. The results in Figs. 13–23 reaffirm the findings above, with the average MAE value of HIFCF being smaller than those of the other algorithms. The average MAE values of HIFCF and the other algorithms, in the order above, are 0.404, 0.492, 0.495, 0.488, 0.486, 0.514 and 0.486, respectively. The corresponding average times are 2.63, 1.26, 1.84, 0.26, 0.06, 0.06 and 0.06 s, respectively.

Fig. 19. MAE values of algorithms with the cardinalities of testing being 70.

Fig. 20. MAE values of algorithms with the cardinalities of testing being 80.
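To put the k-fold averages quoted above on a common scale, the following Python sketch computes the relative MAE reduction of HIFCF with respect to each baseline. The dictionary and function names are hypothetical; the numbers are the average MAE values over folds reported in this section.

```python
# Average MAE over folds, as reported in Section 3.2.
fold_mae = {
    "HIFCF": 0.395,
    "IFCF": 0.491,
    "DAVIS": 0.495,
    "HASSAN": 0.487,
    "DE": 0.481,
    "SAMUEL": 0.519,
    "SZMIDT": 0.656,
}


def relative_reduction(baseline, proposed):
    """Fraction by which the proposed MAE is lower than the baseline MAE."""
    return (baseline - proposed) / baseline


reductions = {
    name: relative_reduction(value, fold_mae["HIFCF"])
    for name, value in fold_mae.items()
    if name != "HIFCF"
}

# Baseline against which the improvement is smallest.
closest_baseline = min(reductions, key=reductions.get)
```

A relative measure of this kind is only a reading aid for the reported averages; it does not replace the per-fold comparisons in the figures.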

3.3. Validation of HIFCF by parameters
In this section, we report further experiments with HIFCF for various numbers of clusters $N_C$ and various coefficients $\lambda$. The results are shown in Figs. 24–27. Based on these results, we recommend choosing a small number of clusters, e.g. 3 in Figs. 24 and 25, to obtain the best MAE value with reasonable computational time. Analogously, the coefficient $\lambda$ should be smaller than 0.5, ideally in the ranges [0.1, 0.2] and [0.4, 0.5], as shown in Figs. 26 and 27.




Fig. 21. MAE values of algorithms with the cardinalities of testing being 90.

Fig. 22. MAE values of algorithms with the cardinalities of testing being 100.

4. Conclusions
In this paper, we concentrated on improving the accuracy of prediction in the medical diagnosis function of the health care support system. Medical diagnosis is regarded as determining the diseases that could be inferred from a list of measured symptoms of a patient, as well as the most likely disease among them. Since the relations between patients and symptoms and between symptoms and diseases are often vague, imprecise and uncertain, most machine learning methods fail to achieve high prediction accuracy on real medical diagnosis datasets. For these reasons, combining fuzzy sets with a machine learning method is a good choice for overcoming these disadvantages and those of the relevant works that use standalone fuzzy sets or recommender systems. Intuitionistic fuzzy recommender systems (IFRS) are such a combination, which results in better prediction accuracy than the relevant standalone methods constructed on either fuzzy sets or recommender systems only. IFRS use IFSD to calculate the similarity between two patients. This measure could be enhanced by integrating information on the possibility of patients belonging to the clusters produced by a fuzzy clustering method. Therefore, our contribution in this paper is a novel hybrid model between picture fuzzy clustering and intuitionistic fuzzy recommender systems for medical diagnosis, called Hybrid Intuitionistic Fuzzy Collaborative Filtering (HIFCF), which makes use of a recent picture fuzzy clustering method, DPFCM, to classify the patients into groups according to the relation information of the patients. Then, the possibility of a patient belonging to a certain cluster is used to calculate the similarity degrees between users, which are supplemented into IFSD to give the final similarity between patients. This is the theoretical contribution of the paper to Expert and Intelligent Systems compared with related works.

Fig. 23. The computational time of algorithms by various cardinalities of testing.

Fig. 24. MAE values of HIFCF by various numbers of clusters.
In the experiments, we validated the new hybrid algorithm, HIFCF, on the benchmark medical diagnosis dataset HEART from the UCI Machine Learning Repository, consisting of 270 patients characterized by 13 attributes, under different cross-validation methods and parameter settings. The new algorithm was compared with relevant works such as the intuitionistic fuzzy recommender systems, the standalone methods of intuitionistic fuzzy sets of De, Biswas & Roy, Szmidt & Kacprzyk and Samuel & Balamurugan, and the recommender systems of Davis et al. and Hassan & Syed. The findings from the research are: (i) HIFCF is more accurate than the other relevant methods, with an average mean absolute error of about 0.4; (ii) HIFCF is stable across various cross-validation methods and parameters; (iii) suitable parameters of HIFCF are a small number of clusters and a coefficient $\lambda$ smaller than 0.5; (iv) the computational time of HIFCF is generally larger than those of the other algorithms but is acceptable.
The practical implications of the proposed research work could be interpreted as follows. Firstly, this paper presented a practical method of extending the similarity degree IFSD used in intuitionistic fuzzy recommender systems with additional information on clusters. This could guide a variety of studies involving this kind of extension with other additional information or different types of similarity functions. Secondly, enhancing the accuracy of medical diagnosis by the new hybrid method between picture fuzzy clustering and intuitionistic fuzzy recommender systems, as in this research work, contributes to the development of the health care support system. Thirdly, the theoretical contribution of this paper could expand a minor, application-oriented research direction on picture fuzzy clustering and intuitionistic fuzzy recommender systems.

Fig. 25. The computational time of HIFCF by various numbers of clusters.

Fig. 26. MAE values of HIFCF by various values of lamda.
One of the limitations of this research is the time complexity of the HIFCF algorithm. Even though the execution time of HIFCF is approximately 2.68 s for the given dataset, it should be accelerated for a practical health care support system that uses large and multi-dimensional datasets. Thus, one direction of further work is to investigate the parallel processing of the hybrid model. Another limitation is the capability of the hybrid algorithm when dealing with a new patient. That is to say, if a new patient enters the system, then the clustering algorithm must be re-run on the whole dataset. This could take a lot of time and is not acceptable, especially in the context above. Developing a new semi-supervised hybrid model that takes this situation into account is our second direction of further work. Some remaining research directions are: (i) building fuzzy rules derived from both picture fuzzy clustering and intuitionistic fuzzy recommender systems to make the prediction of items; (ii) proposing a new algorithm to deal with imbalanced medical diagnosis datasets and those with missing values; (iii) applying the hybrid algorithm to other group decision making problems.

Fig. 27. The computational time of HIFCF by various values of lamda.
Acknowledgments
The authors are greatly indebted to the editor-in-chief, Prof. B. Lin, and the anonymous reviewers for their comments and valuable suggestions, which improved the quality and clarity of the paper. This work is sponsored by NAFOSTED under contract No. 102.05-2014.01.
References
Abbass, H. A. (2002). An evolutionary artificial neural networks approach for breast
cancer diagnosis. Artificial Intelligence in Medicine, 25(3), 265–281.
Agarwal, M., Hanmandlu, M., & Biswas, K. K. (2011). Generalized intuitionistic fuzzy
soft set and its application in practical medical diagnosis problem. Proceedings
of 2011 IEEE international conference on fuzzy systems (pp. 2972–2978).
Ahn, J. Y. (2014). A comparison of distance measures for medical diagnosis. ICIC
Express letters. Part B, Applications: An International Journal of Research and
Surveys, 5(3), 871.
Ahn, J. Y., Han, K. S., Oh, S. Y., & Lee, C. D. (2011). An application of interval-valued
intuitionistic fuzzy sets for medical diagnosis of headache. International Journal
of Innovative Computing, Information and Control, 7(5), 2755–2762.

Albeanu, G., & Popentiu-Vladicescu, F. L. (2010). Intuitionistic fuzzy methods in
software reliability modelling. Journal of Sustainable Energy, 1(1), 30–34.
Anbarasi, M., Anupriya, E., & Iyengar, N. C. S. N. (2010). Enhanced prediction of heart
disease with feature subset selection using genetic algorithm. International
Journal of Engineering Science and Technology, 2(10), 5370–5376.
Atanassov, K. T. (1986). Intuitionistic fuzzy sets. Fuzzy Sets and Systems, 20(1),
87–96.
Basu, R., Fevrier-Thomas, U., & Sartipi, K. (2011). Incorporating hybrid CDSS in
primary care practice management. McMaster eBusiness Research Centre.
Bora, M., Bora, B., Neog, T. J., & Sut, D. K. (2014). Intuitionistic fuzzy soft matrix
theory and its application in medical diagnosis. Annals of Fuzzy Mathematics and
Informatics, 7(1), 143–153.
Bourgani, E., Stylios, C. D., Manis, G., & Georgopoulos, V. C. (2014). Time dependent
fuzzy cognitive maps for medical diagnosis. In Artificial intelligence: Methods and
applications (pp. 544–554). Springer International Publishing.
Cuong, B. C., & Kreinovich, V. (2013). Picture fuzzy sets – a new concept for
computational intelligence problems. In Proceedings of 2013 third world congress
on information and communication technologies (pp. 1–6).
Cuong, B. C., Son, L. H., & Chau, H. T. M. (2010). Some context fuzzy clustering
methods for classification problems. In Proceedings of the 2010 ACM symposium
on information and communication technology (pp. 34–40).
Das, S., & Kar, S. (2014). Group decision making in medical system: An intuitionistic
fuzzy soft set approach. Applied Soft Computing, 24, 196–211.

Davis, D. A., Chawla, N. V., Blumm, N., Christakis, N., & Barabási, A. L. (2008).
Predicting individual disease risk based on medical history. In Proceedings of the
17th ACM conference on Information and knowledge management (pp. 769–778).
De, S. K., Biswas, R., & Roy, A. R. (2001). An application of intuitionistic fuzzy sets in
medical diagnosis. Fuzzy Sets and Systems, 117(2), 209–213.
Duan, L., Street, W. N., & Xu, E. (2011). Healthcare information systems: Data mining
methods in the creation of a clinical recommender system. Enterprise
Information Systems, 5(2), 169–181.
Foster, D., McGregor, C., & El-Masri, S. (2005). A survey of agent-based intelligent
decision support systems to support clinical management and research. In
Proceedings of the second international workshop on multi-agent systems for
medicine, computational biology, and bioinformatics (pp. 16–34).
Gevaert, O., De Smet, F., Timmerman, D., Moreau, Y., & De Moor, B. (2006).
Predicting the prognosis of breast cancer by integrating clinical and microarray
data with Bayesian networks. Bioinformatics, 22(14), e184–e190.
Hassan, S., & Syed, Z. (2010). From netflix to heart attacks: Collaborative filtering in
medical datasets. In Proceedings of the first ACM international health informatics
symposium (pp. 128–134).
Hosseini, R., Ellis, T., Mazinani, M., & Dehmeshki, J. (2011). A genetic fuzzy approach
for rule extraction for rule-based classification with application to medical
diagnosis. In Proceedings of the European conference on machine learning and
principles and practice of knowledge discovery in databases (ECML PKDD) (pp. 05–
09).
Kala, R., Janghel, R. R., Tiwari, R., & Shukla, A. (2011). Diagnosis of breast cancer by
modular evolutionary neural networks. International Journal of Biomedical
Engineering and Technology, 7(2), 194–211.
Kampouraki, A., Vassis, D., Belsis, P., & Skourlas, C. (2013). E-doctor: A web based
support vector machine for automatic medical diagnosis. Procedia-Social and
Behavioral Sciences, 73, 467–474.
Khatibi, V., & Montazer, G. A. (2009). Intuitionistic fuzzy set vs. fuzzy set
application in medical pattern recognition. Artificial Intelligence in Medicine,
47(1), 43–52.
Kyriacou, E. C., Pattichis, C. S., & Pattichis, M. S. (2009). An overview of recent health
care support systems for eEmergency and mHealth applications. In Proceedings
of the IEEE annual international conference of the engineering in medicine and
biology society 2009 (pp. 1246–1249).

McCormick, T. H., Rudin, C., & Madigan, D. B. (2011). A hierarchical model for
association rule mining of sequential events: An approach to automated
medical symptom prediction. Annals of Applied Statistics, 1–19.
Meenakshi, A. R., & Kaliraja, M. (2011). An application of interval valued fuzzy
matrices in medical diagnosis. International Journal of Mathematical Analysis,
5(36), 1791–1802.
Meisamshabanpoor & Mahdavi, M. (2012). Implementation of a recommender
system on medical recognition and treatment. International Journal of e-Education, e-Business, e-Management and e-Learning, 2(4), 315–318.
Moein, S., Monadjemi, S. A., & Moallem, P. (2009). A novel fuzzy-neural based
medical diagnosis system. International Journal of Biological and Medical Sciences,
4(3), 146–150.
Muthuvijayalakshmi, M., Kumar, E., & Venkatesan, P. (2014). TB disease diagnosis
using fuzzy max-min composition technique. Fuzzy Systems, 6(1).
Neog, T. J., & Sut, D. K. (2011). An application of fuzzy soft sets in medical diagnosis
using fuzzy soft complement. International Journal of Computer Applications, 33,
30–33.


Nguyen, T., Khosravi, A., Creighton, D., & Nahavandi, S. (2014). Medical diagnosis by
fuzzy standard additive model with wavelets. In Proceedings of the 2014 IEEE
international conference on fuzzy systems (pp. 1937–1944).
Own, C. M. (2009). Switching between type-2 fuzzy sets and intuitionistic fuzzy
sets: An application in medical diagnosis. Applied Intelligence, 31(3), 283–291.
Parthiban, L., & Subramanian, R. (2008). Intelligent heart disease prediction system
using CANFIS and genetic algorithm. International Journal of Biological,
Biomedical and Medical Sciences, 3(3).
Rajalakshmi, K., Mohan, S. C., & Babu, S. D. (2011). Decision support system in
healthcare industry. International Journal of Computer Applications, 26(9), 42–44.
Ricci, F., Rokach, L., & Shapira, B. (2011). Introduction to recommender systems
handbook. US: Springer (pp. 1–35).
Roberts, L. M., Kahn, E., & Haddawy, P. (1995). Development of a Bayesian network
for diagnosis of breast cancer. In Proceedings of the IJCAI-95 workshop on building
probabilistic networks, Montréal, Québec, Canada.
Rouse, M. (2014). Clinical decision support system (CDSS). Available at: searchhealthit.techtarget.com/definition/clinical-decision-support-systemCDSS>.
Samuel, A. E., & Balamurugan, M. (2012). Fuzzy max–min composition technique in
medical diagnosis. Applied Mathematical Sciences, 6(35), 1741–1746.
Sanz, J. A., Galar, M., Jurio, A., Brugos, A., Pagola, M., & Bustince, H. (2014). Medical
diagnosis of cardiovascular diseases using an interval-valued fuzzy rule-based
classification system. Applied Soft Computing, 20, 103–111.
Shanmugasundaram, P., & Seshaiah, C. V. (2014). An application of intuitionistic
fuzzy technique in medical diagnosis. Australian Journal of Basic & Applied
Sciences, 8(9).
Sharaf-El-Deen, D. A., Moawad, I. F., & Khalifa, M. E. (2014). A new hybrid case-based
reasoning approach for medical diagnosis systems. Journal of Medical Systems,
38(2), 1–11.
Shinoj, T. K., & John, S. J. (2012). Intuitionistic fuzzy multi sets and its application in
medical diagnosis. World Academy of Science, Engineering and Technology, 6,
1418–1421.
Son, L. H. (2014a). Enhancing Clustering quality of geo-demographic analysis using
context fuzzy clustering type-2 and particle swarm optimization. Applied Soft
Computing, 22, 566–584.
Son, L. H. (2014b). HU-FCF: A hybrid user-based fuzzy collaborative filtering
method in recommender systems. Expert Systems with Applications, 41(15),
6861–6870.
Son, L. H. (2014c). Optimizing municipal solid waste collection using chaotic
particle swarm optimization in GIS based environments: A case study at
Danang City, Vietnam. Expert Systems with Applications, 41(18), 8062–8074.
Son, L. H. (2015). DPFCM: A novel distributed picture fuzzy clustering method on
picture fuzzy sets. Expert Systems with Applications, 42(1), 51–66.

Son, L. H., Cuong, B. C., Lanzi, P. L., & Thong, N. T. (2012). A novel intuitionistic fuzzy
clustering method for geo-demographic analysis. Expert Systems with
Applications, 39(10), 9848–9859.
Son, L. H., Cuong, B. C., & Long, H. V. (2013). Spatial interaction–modification model
and applications to geo-demographic analysis. Knowledge-Based Systems, 49,
152–170.
Son, L. H., Lanzi, P. L., Cuong, B. C., & Hung, H. A. (2012). Data mining in GIS: A novel
context-based fuzzy geographically weighted clustering algorithm.
International Journal of Machine Learning and Computing, 2(3), 235–238.
Son, L. H., Linh, N. D., & Long, H. V. (2014). A lossless DEM compression for fast
retrieval method using fuzzy clustering and MANFIS neural network.
Engineering Applications of Artificial Intelligence, 29, 33–42.
Son, L. H., & Thong, N. T. (2015). Intuitionistic fuzzy recommender systems: An
effective tool for medical diagnosis. Knowledge-Based Systems, 74, 133–150.
Szmidt, E., & Kacprzyk, J. (2001). Intuitionistic fuzzy sets in some medical
applications. In Computational intelligence. Theory and applications
(pp. 148–151). Berlin, Heidelberg: Springer.
Szmidt, E., & Kacprzyk, J. (2003). An intuitionistic fuzzy set based approach to
intelligent data analysis: An application to medical diagnosis. In Recent advances
in intelligent paradigms and applications (pp. 57–70). Physica-Verlag HD.
Szmidt, E., & Kacprzyk, J. (2004). A similarity measure for intuitionistic fuzzy sets
and its application in supporting medical diagnostic reasoning. In Artificial
intelligence and soft computing – ICAISC 2004 (pp. 388–393). Berlin Heidelberg:
Springer.
Tan, K. C., Yu, Q., Heng, C. M., & Lee, T. H. (2003). Evolutionary computing for
knowledge discovery in medical diagnosis. Artificial Intelligence in Medicine,
27(2), 129–154.
Thong, P. H., & Son, L. H. (2014). A new approach to multi-variables fuzzy
forecasting using picture fuzzy clustering and picture fuzzy rules interpolation
method. In Proceeding of sixth international conference on knowledge and systems
engineering (pp. 679–690).
University of California (2007). UCI Repository of Machine Learning Databases.
Available at: < />
West, T. A., & Marion, D. W. (2014). Current recommendations for the diagnosis and
treatment of concussion in sport: A comparison of three new guidelines. Journal
of Neurotrauma, 31(2), 159–168.
Xiao, Z., Yang, X., Niu, Q., Dong, Y., Gong, K., Xia, S., et al. (2012). A new evaluation
method based on D–S generalized fuzzy soft sets and its application in medical
diagnosis problem. Applied Mathematical Modelling, 36(10), 4592–4604.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–353.
Zhou, Z. H., & Jiang, Y. (2003). Medical diagnosis with C4. 5 rule preceded by
artificial neural network ensemble. IEEE Transactions on Information Technology
in Biomedicine, 7(1), 37–42.


