Adaptive neuro fuzzy network for recommendation

ADAPTIVE NEURO-FUZZY NETWORK FOR RECOMMENDATION

In Partial Fulfillment of the Requirements of the Degree of

MASTER OF INFORMATION TECHNOLOGY MANAGEMENT

In Computer Science and Engineering

By

Mr. Nguyen Duc Anh

ID: MITM05001

International University - Vietnam National University HCMC

March 2015


Under the guidance and approval of the committee, and approved by all its members, this thesis has
been accepted in partial fulfillment of the requirements for the degree.

Approved:

--------------------------------------------Chairperson

------------------------------------------Committee member

--------------------------------------------Committee member

------------------------------------------Committee member

--------------------------------------------Committee member

------------------------------------------Committee member


Acknowledgments

Throughout my thesis and its development, I could not have completed my tasks without the support and encouragement of others.
First, I would like to thank Dr. Duong Trong Hai. He was always by my side, helping me identify the main ideas of this research, which was the most important support I received. He introduced me to data mining, machine learning, and related topics. Moreover, he was always willing to give me helpful advice whenever I had difficulties with my thesis.
I am grateful to my family, who encouraged and motivated me to keep moving forward.
I also want to thank my colleagues and schoolmates, who supported and helped me directly and indirectly.

Plagiarism Statements

I declare that, apart from the acknowledged references, this thesis does not use language, ideas, or other original material from anyone, and that it has not been previously submitted to any other educational or research program or institution. I fully understand that any writing in this thesis that contradicts the above statement will automatically lead to rejection from the MITM program at the International University – Vietnam National University Hochiminh City.

Copyright Statement

This copy of the thesis has been supplied on condition that anyone who consults it is
understood to recognize that its copyright rests with its author and that no quotation from the
thesis and no information derived from it may be published without the author’s prior
consent.
© Nguyen Duc Anh/MITM05001/2012-2015

Table of Contents

Plagiarism Statements
Copyright Statement
This Thesis based on Publications
Abstract
Chapter 1: Introduction
1.1 Motivation
1.2 Goals of the Dissertation
1.3 Overall Approach
1.4 Related Work
1.5 Thesis Outline
Chapter 2: User Behaviors-based CF Using Neuro-Fuzzy Network
2.1 Profile Modeling
2.2 Content-based Filtering Using Neuro-Fuzzy Network
Chapter 3. Experiments
3.1 Dataset Introduction
3.1.1 Overview
3.1.2 Dataset analysis
3.2 Applying ANFIS to the Netflix Dataset
3.2.1 ANFIS Model
3.2.2 Run the testing dataset
3.3 Evaluation Methods
3.4 Practice and Result
3.4.1 Movie 329
3.4.2 Movie 30
3.4.3 Movie 2464
3.4.4 Movie 2848
3.4.5 Movie 2548
3.5 Evaluation results
Chapter 4: Conclusion
References

List of Figures

Fg 2.1.1.1 Profile generation process
Fg 3.1.1.1 Netflix Dataset structure
Fg 3.1.2.1 Rating-scores statistic
Fg 3.1.2.2 Rating-score comparison for the 10 movies with the highest number of ratings
Fg 3.2.1.1 The ANFIS structure
Fg 3.2.1.2 The main workflow of ANFIS
Fg 3.2.1.3 Sample of user profile Level 1
Fg 3.2.1.4 Sample of user profile Level 2
Fg 3.2.1.5 HyperBox dataset and PureBox dataset, before and after clustering by NCP
Fg 3.2.1.6 Samples of PureBox clusters
Fg 3.2.1.7 The Max-Min PureBox
Fg 3.2.1.8 The final result of the user profile building steps
Fg 3.2.1.9 Samples of data used for training by the Perceptron
Fg 3.3.1 Distribution of training set and testing set in the dataset
Fg 3.4.2.1 Comparison between training data set and real dataset of movie 30
Fg 3.4.2.2 Result of 100 samples used to test for movie 30
Fg 3.4.3.1 Comparison between training data set and real dataset of movie 2464
Fg 3.4.3.2 Result of 100 samples used to test for movie 2464
Fg 3.4.4.1 Comparison between training data set and real dataset of movie 2848
Fg 3.4.4.2 Result of 100 samples used to test for movie 2848
Fg 3.4.5.1 Comparison between training data set and real dataset of movie 2548
Fg 3.4.5.2 Result of 100 samples used to test for movie 2548
Fg 3.5.1 MAE and RMSE of movies 2464, 2548, 30, 2848, 329


List of Tables

Table 3.2.1.1 Samples of W computed by the Perceptron for movie 329
Table 3.2.2.1 Predicted rating-scores for 5 user samples, movie 329
Table 3.4.1.1 Comparison between training data set and real dataset of movie 329
Table 3.4.2.1 Comparison between training data set and real dataset of movie 30
Table 3.4.3.1 Comparison between training data set and real dataset of movie 2464
Table 3.4.4.1 Comparison between training data set and real dataset of movie 2848
Table 3.4.5.1 Comparison between training data set and real dataset of movie 2548


This Thesis based on Publications
International Conference Publications (Accepted)
Duc Anh Nguyen and Trong Hai Duong, “Video Recommendation Using Neuro-Fuzzy on Social TV Environment”, International Conference on Computer Science, Applied Mathematics and Applications (ICCSAMA 2015), published in a volume of the series Advances in Intelligent Systems and Computing of Springer Verlag, indexed by ISI Proceedings, DBLP, Ulrich's, EI-Compendex, SCOPUS, Zentralblatt Math, MetaPress, Springerlink. Issues in ISI-SCI journals.
International Journal Publications (Submitted)
Trong Hai Duong and Duc Anh Nguyen, “User Behaviors-based Collaborative
Filtering for Video Recommendation Using Ontology-based Neuro-Fuzzy on Social TV”,
ELSEVIER, 03-2015.


Abstract
Recommendation systems predict which products or items users might be interested in and recommend them. Two common approaches have been proposed for building recommendation systems: content-based filtering (CBF) and collaborative filtering (CF). CBF methods predict a target user’s rating from the descriptions of items the user previously preferred. CF methods, on the other hand, predict a target user’s rating from neighbors’ ratings. In this work, we consider recommendation in the context of Social TV (STV). Watchers/users may share, comment on, rate, or tag the videos in which they are interested. Each video must be watched and rated by many users. Under these assumptions, we propose a novel model-based collaborative filtering method that uses a fuzzy neural network to learn users’ social web behaviors and make video recommendations on STV. We use the Netflix data set to evaluate the proposed method. The results show that the proposed approach is an effective method.
Keywords: ANFIS, Ontology, Smart TV, Video, Recommendation system, and Neural network.


Chapter 1: Introduction
1.1 Motivation
Recommendation is a subclass of information filtering that uses data on past user preferences to predict possible future likes and interests. Several approaches are applied in recommendation systems, such as collaborative-based, demographic-based, content-based, knowledge-based, and hybrid recommendation.
Prior collaborative filtering (CF) methods rely on neighbors’ ratings to predict a target user’s rating. When a user has no neighbors, the quality of traditional CF results degrades. To solve this problem, we propose a novel model-based collaborative filtering method that uses a fuzzy neural network to learn users’ social web behaviors for video recommendation on Social TV (STV).
1.2 Goals of the Dissertation
The goal of this thesis is to solve the lack-of-neighbors problem in traditional CF. To that end, we predict the unknown rating of a target user for a target video by adjusting user profiles and rating-scale values using ANFIS.

1.3 Overall Approach
The idea of the proposed method is to fit users’ social web behavior to their own ratings of a target video. In particular, a user profile is learned from the user’s social web behavior and is represented by a vector. For each target video, we collect the profiles of all users who rated it. Each user’s profile is treated as an input vector of the fuzzy neural network, and his/her corresponding rating-score as its output value. The trained neural network is then used to predict the rating of a user for the target video. We use the Netflix data set to evaluate the proposed method.
1.4 Related Work
The use of online social networks to talk about TV programs and to share opinions with others is increasing. This is reflected in the spread of platforms designed for Social TV [1]. NoTube [1] brings the social web and TV closer to consumers. Social TV can provide users’ social context to personalize TV programs and videos in both content-based and collaborative-based filtering manners. Content-based filtering (CBF) [4] relies on the descriptions of items a target user previously preferred and recommends items whose content is similar to those, without directly relying on the preferences of other users. Collaborative filtering (CF) [5] relies on the rating information of a large group of users and recommends items to a target user based on the items that similar users have preferred in the past, without relying on any information about the items themselves other than their ratings. According to their algorithms, CF methods can be grouped into two types:
(a) Memory-based collaborative filtering methods recommend items that were previously preferred by users who share similar preferences with the target user [6]. These algorithms require all ratings, items, and users to be stored in memory.
(b) Model-based collaborative filtering methods recommend items based on models that are trained on the collection of ratings to identify patterns in the input data [7]. Memory-based collaborative filtering stores the training data in memory and defers processing until a recommendation is requested, as opposed to model-based collaborative filtering, where the system tries to generalize a model from the training data before making recommendations.
The advantage of memory-based methods is that they have fewer parameters to tune, while the disadvantage is that they cannot deal with data scarcity in a principled manner [9].
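
To make the memory-based case concrete, the following is a minimal sketch, not the method of any cited work. It assumes cosine similarity over co-rated movies and a similarity-weighted average as the prediction rule; the function names and the toy ratings are hypothetical. It illustrates the lack-of-neighbors situation: a user who shares no rated movie with anyone gets no prediction.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity over the movies both users have rated (0.0 if none)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def predict_memory_based(ratings, user, movie):
    """Similarity-weighted average of neighbours' ratings; None without neighbours."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim > 0:
            num += sim * other_ratings[movie]
            den += sim
    return num / den if den else None

# Hypothetical toy data: {user: {movie: rating}}
ratings = {
    "u1": {"m1": 5, "m2": 3},
    "u2": {"m1": 4, "m2": 2, "m3": 4},
    "u3": {"m4": 1},  # shares no rated movie with any other user
}
print(predict_memory_based(ratings, "u1", "m3"))  # one neighbour (u2) is available
print(predict_memory_based(ratings, "u3", "m3"))  # no neighbour, so no prediction
```

All ratings stay in memory and the work happens at prediction time, which is the defining trait of the memory-based family described above.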
In Social TV, recommendation systems have been developed to help users access TV programs that match their preferences by learning from viewing history data and by mapping social users’ preferences to TV program attributes [15, 16, 9]. The authors of [9] proposed a hybrid approach combining content-based methods with collaborative filtering for TV program recommendation.
To reduce the computational load of collaborative filtering, the singular value decomposition technique [17] is applied to reduce the dimension of the user-item representation; this low-rank representation is then employed to generate item-based predictions, which has shown good behavior in the TV domain. The authors of [10] proposed a framework for adaptive news recommendation in social media that utilizes users’ comments. Users’ comments are collected to build a topic profile using a weighted graph. To generate the weighted importance of topics, the standard TF/IDF model [11] and a variant of the PageRank algorithm [12] are applied. Once constructed, the topic profile can be used to select relevant news from a collection of news articles in the database through a retrieval module that combines the strengths of two state-of-the-art news retrieval techniques: the time factor [13] and the language model [14].


There is a large body of research on recommendation systems. One example is “Neural Network Modeling for an Intelligent Recommendation System Supporting SRM for Universities in Thailand” [21], by Kanokwan Kongsakun and Chun Che Fung. This work proposes a recommendation system that predicts and recommends appropriate courses for students, thereby increasing their chance of success. The proposal is based on students' historical records and final results. The authors used neural network techniques to find the structures and relationships between the data and the final GPA of freshmen in subjects of interest. The authors [21] concluded that a recommendation system is a useful service.

Another work is “A Hybrid Latent Variable Neural Network Model for Item Recommendation” [22]. Its authors proposed a neural network model with latent input variables, named Latent Neural Network (LNN), as a hybrid of the CF and CBF approaches. The strong point of LNN is that it addresses the cold-start problem, but its complexity means that it requires more time to train than other methods. In addition, LNN is capable of modeling higher-order dependencies and nonlinearities in the data; however, the MovieLens data set, the Netflix data set, and similar data sets are inherently sparse, which is a poor fit for nonlinear models. Thus, their proposal is not well suited to that kind of data.
Another method, proposed by Jianfeng Gao, Patrick Pantel, Michael Gamon, Xiaodong He, and Li Deng [23] in “Modeling Interestingness with Deep Neural Networks”, is a recommendation system that recommends to users a target document they may be interested in, based on analyzing the documents they have read. The authors [23] used two interestingness tasks in their proposal: automatic highlighting and contextual entity search.

Another interesting proposal is “A Hybrid Movie Recommender System Based on Neural Networks” [24], in which the authors proposed a hybrid filtering approach combining CF and CBF. Their model achieved 82% successful recommendations overall, although the authors noted that, strangely, precision falls as a user evaluates more movies. Their explanation is that as a watcher/user keeps evaluating movies, the user may have covered a wide range of movies that share common characteristic features (Kinds, Stars, Synopsis) while being totally different and, consequently, differently evaluated [24].

1.5 Thesis Outline
Chapter 1, the introduction, presents the problem investigated in this research and the scope of the thesis.
Chapter 2, “User Behaviors-based CF Using Neuro-Fuzzy Network”, analyzes in detail the relevant theories applied in this thesis, such as user modeling, ANFIS, TF/IDF, and the Perceptron.
Chapter 3, “Experiments”, describes how the proposed ANFIS is applied to a video recommendation system and introduces the evaluation methods used to evaluate the results. In this thesis, the Netflix data set is used as the sample dataset.
Finally, Chapter 4 presents the conclusion.


Chapter 2: User Behaviors-based CF Using Neuro-Fuzzy Network
2.1 Profile Modeling
A user profile can contain static and dynamic information. The static profile includes permanent user information such as name, age, sex, and educational background, whereas the dynamic profile covers less permanent characteristics such as the user’s current motions and locations; above all, it includes the user’s interests, which change often. Here, we consider a profile containing only user interest, that is, the user’s social web behavior such as posts, comments, shares, ratings, preferences, and tags. The user profile is represented by a weighted vector defined as follows:
Definition 1 (Profile Feature). Let $p_i$ be the profile of a user $u_i$. The profile feature $p_i = \{(c_1, w_1), (c_2, w_2), \ldots, (c_n, w_n)\}$ is a set of pairs of a concept $c_j$ and its weight $w_j$.
The process used to generate a user’s profile is presented in Fg 2.1.1.1. The tf/idf weight (term frequency-inverse document frequency) is a weight often used in information retrieval and text mining. It is a statistical measure of how important a word is to a document in a collection or corpus. Here we use the traditional vector space model (tf/idf) to define the features of the documents [18].
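
As an illustration of how such concept weights might be computed, here is a minimal sketch. It assumes the common formulation tf = relative term frequency and idf = ln(N / df), which the thesis does not spell out; the function name `tfidf_profile` and the token lists are hypothetical.

```python
import math
from collections import Counter

def tfidf_profile(user_docs, corpus):
    """Return {concept: weight} for one user.

    user_docs: token lists produced by the user (posts, comments, tags, ...).
    corpus:    token lists of all documents, used for the document frequency.
    Assumed weighting: (count / total tokens of the user) * ln(N / df).
    """
    tf = Counter(tok for doc in user_docs for tok in doc)
    total = sum(tf.values())
    n_docs = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))
    profile = {}
    for tok, count in tf.items():
        # Concepts that never occur in the corpus get weight 0.
        idf = math.log(n_docs / df[tok]) if df[tok] else 0.0
        profile[tok] = (count / total) * idf
    return profile

# Hypothetical toy corpus of three tokenised documents
corpus = [["action", "hero"], ["action", "comedy"], ["drama"]]
profile = tfidf_profile(corpus[:2], corpus)
print(sorted(profile.items()))
```

In this toy case "action" is frequent for the user but also common in the corpus, so its weight ends up below that of the rarer "hero".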


Fg 2.1.1.1 Profile generation process

2.2 Content-based Filtering Using Neuro-Fuzzy Network
We assume that user $u_i$ is interested in a set $M_i = \{m_1, m_2, \ldots, m_k\}$ of movies (which he/she has watched and rated). The user’s profile can be considered as a feature vector $\bar{x}_i = \{(g_1, w_1), (g_2, w_2), \ldots, (g_n, w_n)\}$, where $g_j$ is a genre from the movies in $M_i$ and $w_j$ is its weight, generated by using the vector space model tf/idf. We assume that each movie $m_j$, $j = 1..k$, can also be of interest to $n$ users $U_j = \{u_1, u_2, \ldots, u_n\}$. The rating-score of the movie $m_j$ with respect to the user $u_i$ is denoted as $y_{ij}$, so the rating-score set of a movie $m_j$ with respect to $U_j$ is denoted by $Y_j = \{y_{1j}, y_{2j}, \ldots, y_{nj}\}$. For each movie $m_j$, we consider a black-box model expressing a mathematical relationship between an input of feature vectors of users in $U_j$, denoted by $X_j = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n\}$, and an output in the rating-score space $Y_j$, based on a given data set of corresponding users as follows: $(\bar{x}_i, y_i)$, $i = 1 \ldots k$, where $\bar{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}]$ is the feature vector of the $i$-th user of the data set, as an input, and $y_i$ is the rating-score from user $u_i$ to movie $m_j$, as an output. This work can be seen as a system-identification process, in which the model works as a mathematical function $f$ expressed by the following mapping:

$$f: X_j \to Y_j, \qquad \bar{x}_i \mapsto y_i = f(\bar{x}_i) \tag{1}$$
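
The per-movie data set described above, pairing each rater's profile vector with the rating given, could be assembled roughly as follows. This is a sketch under assumptions: profiles are taken to be dictionaries of genre weights, ratings to be (user_id, movie_id, score) tuples, and the function name and sample values are hypothetical.

```python
def training_set_for_movie(movie_id, ratings, profiles, genres):
    """Collect (feature vector, rating-score) pairs for one target movie.

    ratings:  iterable of (user_id, movie_id, score) tuples
    profiles: {user_id: {genre: tf/idf weight}}
    genres:   fixed genre order that defines the vector dimensions
    """
    X, y = [], []
    for user_id, mid, score in ratings:
        # Keep only raters of the target movie whose profile is known.
        if mid != movie_id or user_id not in profiles:
            continue
        X.append([profiles[user_id].get(g, 0.0) for g in genres])
        y.append(score)
    return X, y

# Hypothetical sample values
ratings = [("u1", 329, 4), ("u2", 329, 2), ("u1", 30, 5), ("u9", 329, 3)]
profiles = {"u1": {"Action": 0.7, "Drama": 0.1}, "u2": {"Drama": 0.9}}
X, y = training_set_for_movie(329, ratings, profiles, ["Action", "Drama"])
print(X, y)
```

Each row of `X` plays the role of an input vector and the matching entry of `y` the role of the output in mapping (1); one such set is built per target movie.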

In this paper, the above mathematical model is expressed by a fuzzy-neuron
structure (FNS), which is a combination of a fuzzy inference system (FIS) and a
neural network structure (NNS).
Regarding the FIS, it can be summarized as follows. The FIS is built based on the algorithm for establishing an adaptive neuro-fuzzy system, ENFS [3]. Using a data-driven method, the shared features or characteristics of the object are expressed by hyperbox-typed data clusters, which serve as a structure upon which fuzzy sets and membership functions are established to build the FIS. In the FIS, the fuzzy inference rules are built from clauses depicting multi-input single-output (MISO) fuzzy relationships of the following form:

$$R^{(k)}:\ \text{IF } x_{i1} = B_{i1}^{(k)} \text{ and } x_{i2} = B_{i2}^{(k)} \text{ and } \ldots \text{ and } x_{in} = B_{in}^{(k)} \text{ THEN } y_i = y_{ki} \tag{2}$$

where $B_{ij}^{(k)}$ are linguistic variables expressing the result of the clustering process; $\mu_{ki}$, $k = 1 \ldots M$, is the maximum membership value of the $i$-th sample in the $k$-labeled data clusters, which is used to establish the corresponding hyperbox value of this sample; $R^{(k)}$ is the constituent rule; and $y_{ki}$ is the constituent value, which is used to calculate the predicted value of the $i$-th sample.

We consider a set of patterns $T_h$ covered by the $h$-th min-max hyperbox $HB_h$. $HB_h$ is determined by two vertices, the max vertex $\omega_h$ and the min vertex $v_h$, where $\omega_h = [\omega_{h1}, \omega_{h2}, \ldots, \omega_{hn}]$ and $v_h = [v_{h1}, v_{h2}, \ldots, v_{hn}]$, with

$$\omega_{hj} = \max(x_{ij} \mid \bar{x}_i \in T_h), \qquad v_{hj} = \min(x_{ij} \mid \bar{x}_i \in T_h), \qquad j = 1 \ldots n.$$

If $T_h$ consists only of patterns associated with the cluster labeled $m$, then $HB_h$ is considered a pure hyperbox labeled $m$, denoted $pHB_h^{(m)}$. An HB can be considered as a crisp frame on which different types of membership functions (MFs) can be adapted. Here, the original Simpson’s MF is adopted, in which the slope outside the HB is established by the value of the fuzziness parameter $\gamma$:

$$\mu_{pHB_t^{(m)}}(\bar{x}_i) = \frac{1}{n}\sum_{j=1}^{n}\left[1 - f(x_{ij} - \omega_{tj}, \gamma) - f(v_{tj} - x_{ij}, \gamma)\right], \qquad f(x, \gamma) = \begin{cases} 1, & x\gamma > 1 \\ x\gamma, & 0 \le x\gamma \le 1 \\ 0, & x\gamma < 0 \end{cases} \tag{3}$$

where $t = 1 \ldots R_m$; $R_m$ is the number of pure hyperboxes labeled $m$. Several pHBs can be associated with the same cluster label $m$; thus the overall input MF, $\mu_{B^{(m)}}(\bar{x}_i)$, is calculated as follows:

$$\mu_{B^{(m)}}(\bar{x}_i) = \max\{\mu_{pHB_1^{(m)}}(\bar{x}_i), \ldots, \mu_{pHB_{R_m}^{(m)}}(\bar{x}_i)\} \tag{4}$$
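
The hyperbox membership described above can be sketched in a few lines. This is an illustration under assumptions, not the thesis implementation: it assumes the ramp-threshold form of Simpson's membership function (membership 1 inside the box, decreasing linearly outside at a rate set by the fuzziness parameter), and all function names and sample points are hypothetical.

```python
def ramp(x, gamma):
    """Ramp threshold: 0 for x*gamma < 0, x*gamma on [0, 1], 1 above."""
    return min(1.0, max(0.0, x * gamma))

def hyperbox(patterns):
    """Min vertex v and max vertex w of the box covering the given patterns."""
    v = [min(col) for col in zip(*patterns)]
    w = [max(col) for col in zip(*patterns)]
    return v, w

def membership(x, box, gamma):
    """Membership of pattern x in one hyperbox, averaged over dimensions."""
    v, w = box
    return sum(1.0 - ramp(xj - wj, gamma) - ramp(vj - xj, gamma)
               for xj, vj, wj in zip(x, v, w)) / len(x)

def cluster_membership(x, boxes, gamma):
    """Overall membership in a cluster: maximum over its pure hyperboxes."""
    return max(membership(x, box, gamma) for box in boxes)

# Hypothetical two-dimensional example
box_a = hyperbox([[0.2, 0.4], [0.4, 0.6]])      # v = [0.2, 0.4], w = [0.4, 0.6]
box_b = hyperbox([[0.45, 0.45], [0.55, 0.55]])
print(membership([0.3, 0.5], box_a, gamma=4.0))  # inside the box
print(membership([0.5, 0.5], box_a, gamma=4.0))  # 0.1 outside along one axis
print(cluster_membership([0.5, 0.5], [box_a, box_b], gamma=4.0))
```

A larger `gamma` makes the membership fall off faster outside the box, so it controls how crisp the cluster boundaries are.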

The process of the ANFIS can be summarized as follows:
Choose the number of neurons of the hidden layer, $N$.
Step 1. Separate the data set $\{(\bar{x}_i, y_i),\ i = 1..k\}$ (1) to build $M$ data clusters
Using the algorithm for partitioning the data space, PDS [2], the given data set (1) is separated into hyperbox-typed data clusters in the input space and hyperplanes, one per cluster $i = 1..M$, in the output data space, where $M$ is the optimal number of data clusters established by the clustering process.
Step 2. Build a new data set, named NN-set, for training the NN
The NN-set has $k$ samples with input-output pairs as described by (1).
Step 3. Train the NN
The NN-set is used to train the NN based on the Levenberg-Marquardt algorithm.
- Calculate the values of the MFs based on equations (3) and (4);
- The output of the neuro-fuzzy network is calculated by the following equations:

$$y_{ki} = \sum_{j=1}^{n} a_j^{(k)} x_{ij} + a_0^{(k)}, \quad k = 1 \ldots M \tag{5}$$

$$\hat{y}_i = \frac{\sum_{k=1}^{M} \mu_{B^{(k)}}(\bar{x}_i)\, y_{ki}}{\sum_{k=1}^{M} \mu_{B^{(k)}}(\bar{x}_i)} \tag{6}$$

Step 4. Check the stopping condition
Calculate the error $E_N$ between the outputs in the NN-set and the corresponding outputs of the NN.
- If $E_N \le [E]$: the FNS structure based on the NN is chosen;
- If $E_N > [E]$: $N = N + 1$, then return to Step 3.
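
The stopping rule in Step 4 amounts to growing the hidden layer until the training error is within the tolerance. A minimal sketch of that control loop follows. The training itself is abstracted behind a hypothetical callback `train_and_score`, and the upper bound `n_max` is an added safeguard that the procedure above does not mention.

```python
def grow_until_tolerance(train_and_score, tolerance, n_start=1, n_max=50):
    """Increase the hidden-layer size N until the error E_N is within tolerance.

    train_and_score: hypothetical callback that trains a network with N hidden
                     neurons on the NN-set and returns its error E_N.
    n_max:           assumed safety cap so that the loop always terminates.
    """
    for n in range(n_start, n_max + 1):
        error = train_and_score(n)
        if error <= tolerance:   # E_N <= [E]: accept this structure
            return n, error
        # E_N > [E]: N = N + 1 and train again (Step 3)
    raise RuntimeError("tolerance not reached within n_max neurons")

# Stand-in error curve for demonstration only: the error shrinks as N grows.
result = grow_until_tolerance(lambda n: 1.0 / n, tolerance=0.25)
print(result)
```

In a real run the callback would wrap the Levenberg-Marquardt training of Step 3; the lambda here only stands in for it so that the loop can be exercised.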


Chapter 3. Experiments
3.1 Dataset Introduction
3.1.1 Overview
The sample dataset used is the Netflix data set, which contains 14,707,483 ratings given by 459,340 anonymous Netflix customers to 17,770 movies from 1999-11-11 to 2005-12-31. The rating scale has 5 values: 5 is excellent, 4 is very good, 3 is good, 2 is fair, and 1 is poor.

Fg 3.1.1.1 Netflix Dataset structure
There are 2 primary tables, named “rating” and “movie_info”. The table “movie_info” has 9 columns: movie_id, genre, rating, director, writers, star, image_link, host, content_rating.
The table “rating” has 4 columns: User_id, movie_id, rating, date.


3.1.2 Dataset analysis
The customers’ rating-scores are stored in a table named “rating”. It holds 14.7 million user ratings; each record represents a single rating of one movie_id by one user_id, along with the rating score and the date. All ratings are anonymous. The least frequent rating-score is 1, with 632 thousand ratings, followed by 2 with 1.4 million ratings, 5 with 3 million ratings, and 3 with 4 million ratings; the most frequent is 4, with 4.7 million ratings, as shown in Fg 3.1.2.1.
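
A per-score count like the one summarized above can be obtained with a simple tally over the rating records. The sketch below assumes each record is a (user_id, movie_id, rating, date) tuple, following the four columns of the “rating” table; the function name and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def rating_distribution(rows):
    """Count how many records carry each rating-score from 1 to 5.

    rows: iterable of (user_id, movie_id, rating, date) tuples.
    """
    counts = Counter(rating for _, _, rating, _ in rows)
    # Report every score on the 5-value scale, including absent ones.
    return {score: counts.get(score, 0) for score in range(1, 6)}

# Hypothetical sample rows
rows = [
    (1, 329, 4, "2005-01-01"),
    (2, 329, 5, "2005-01-02"),
    (3, 30, 4, "2005-01-03"),
]
distribution = rating_distribution(rows)
print(distribution)
```

Because the function accepts any iterable, the records can be streamed from a file or a database cursor instead of being loaded into memory at once.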

Fg 3.1.2.1 Rating-scores statistic
Fg 3.1.2.2 shows the rating statistics of the 10 movies with the highest number of ratings.


Fg 3.1.2.2 Rating-score comparison for the 10 movies with the highest number of ratings
3.2 Applying ANFIS to the Netflix Dataset
3.2.1 ANFIS Model

Fg 3.2.1.1 The ANFIS structure