
UNIVERSITY OF DANANG
UNIVERSITY OF SCIENCE AND TECHNOLOGY

----------

PHAN PHUONG LAN

RECOMMENDATION SYSTEMS BASED ON
STATISTICAL IMPLICATIVE MEASURES

Specialization: Computer Science
Code: 9480101

DOCTORAL THESIS SUMMARY

Danang – 2019


The dissertation was completed at:
UNIVERSITY OF SCIENCE AND TECHNOLOGY, UNIVERSITY OF DANANG

Academic Instructors:
1. Huynh Xuan Hiep, Assoc. Prof., PhD.
2. Huynh Huu Hung, PhD.

Opponent 1:……………………………..……………
Opponent 2:………………...………...………………
Opponent 3:………………...……...…………………

The dissertation will be defended before the Board of
thesis review. Meeting at: …………………………


At ........ hour ......... day ....... month ....... year ........

The dissertation is available at:
- National Library
- Information and Learning Center, University of Da Nang


PREFACE
1. The urgency of the thesis
The recommendation system (RS) is considered one of the
effective solutions to the information explosion problem
because it can automatically analyze data to predict the ratings
of a user for products, services, etc., and thereby recommend to
that user the list of items with the highest predicted ratings. The
main techniques used to build an RS are content-based,
collaborative filtering, knowledge-based, and hybrid methods. In
particular, collaborative filtering is the most important and
commonly used technique. Proposing and improving
recommendation models to adapt to the diversity of application
areas, the differences in user requirements and the development
of technology has always been a main research direction on RSs.
Applying the statistical implicative analysis (SIA) method to
other research fields is currently a topic of great interest, but
little research links that method to RSs. The existing research still
has some unresolved issues: it builds
models only on binary data and pays no attention to non-binary
data; it evaluates RSs only by the accuracy of the recommended good
items; it uses association rules to make
the recommendation, so the recommendation time may
be long and the computer may be overloaded; and it does not
combine the characteristics of statistical
implicative measures to improve the recommendation accuracy.
Therefore, the PhD thesis "Recommendation systems based
on statistical implicative measures" is conducted to contribute a
small part to the research field on RSs and SIA.


2. Objectives, objects and scope of research of the thesis
2.1. Research objectives
The objective of the thesis is to study and apply the
statistical implicative measures and the collaborative filtering
technique in order to propose recommendation models and to improve
the accuracy of the proposed models. The thesis thereby contributes
to linking the SIA method to the research on RSs.
2.2. Research objects
The two main objects of the study are statistical implicative
measures, and recommendation models based on statistical
implicative measures and the collaborative filtering technique.
2.3. Research scope
The scope of the study is: to gain an understanding of the
statistical implicative measures, the collaborative filtering technique,
and the existing studies on RSs using the SIA method; and to
propose new recommendation models that can be applied to both
binary and non-binary data and that improve the accuracy of
recommendation (the list of good items and the predicted ratings).
3. Research methodology
Literature review and experiment are the two main research
methods used in this thesis.

4. Contribution of the thesis
- Firstly, two new measures developed from statistical
implicative measures: (1) the k nearest neighbors/users based
implicative rating (KnnUIR); and (2) the k nearest neighbors/items
based implicative rating (KnnIIR). These measures are used to
predict the ratings given to items by a user.
- Secondly, three new recommendation models: (1) based on
the statistical implicative measures and association rules; (2)
based on KnnUIR; and (3) based on KnnIIR. The proposed
models can be applied to both binary and non-binary data.
- Thirdly, the Interestingnesslab software tool, which includes
utility functions and the proposed recommendation models. This
tool is developed in the R language and is used for the experiments.
- Fourthly, the DKHP binary dataset, which stores course
registration data. DKHP was collected and is used for evaluating the
accuracy of recommendation.
5. Thesis structure
The thesis is organized into four chapters and six appendices
as follows.
Chapter 1: An overview of statistical implicative measures
and recommendation systems.
Chapter 2: Recommendation based on statistical implicative
measures and association rules.
Chapter 3: Recommendation based on users implicative rating
measure.
Chapter 4: Recommendation based on items implicative
rating measure.

Appendices include: (1) the Interestingnesslab tool and the DKHP
dataset; (2) the algorithms used for developing and evaluating the
proposed recommendation models; and (3) some additional
experiment scenarios.


CHAPTER 1. AN OVERVIEW
1.1. Statistical implicative measures
1.1.1. Definition
Statistical implicative measures (SIM) are measures proposed
by the statistical implicative analysis method. SIMs are used to
detect trends in a binary attribute set or non-binary attribute set.
SIMs are asymmetric, probability based and non-linear measures.
1.1.2. Statistical implicative measures for binary data
1.1.3. Statistical implicative measures for non-binary data
1.2. Statistical implicative ratings
Statistical implicative rating measures are proposed by the
thesis using some existing SIMs. We can consider these measures
as SIMs. Statistical implicative rating measures are used to
predict the rating of a user for an item, thereby contributing to
solving recommendation problems.
1.3. Recommendation based on statistical implicative
analysis
1.3.1. Recommendation systems and research directions
1.3.2. Collaborative filtering technique
1.3.2.1. Memory based methods
1.3.2.2. Model based methods
1.3.3. Evaluating recommendation systems
1.3.3.1. K-fold cross validation method

1.3.3.2. Classification accuracy metrics
1.3.3.3. Predictive accuracy metrics


1.3.3.4. Rank accuracy metrics
1.3.4. Statistical implicative analysis based recommendation
1.3.4.1. Existing recommendation methods
1.3.4.2. Recommendation based on statistical implicative
measures
1.4. Conclusion
Chapter 1 focuses on gaining an understanding of SIMs,
RSs and the accuracy metrics used for evaluating RSs. The thesis
summarizes SIMs (such as implicative intensity, entropic version
of implicative intensity, cohesion, contribution) and identifies
which measures should be used by RSs to improve the
accuracy of the recommendation result. Besides, Chapter 1 also
focuses on the collaborative filtering technique and the accuracy
metrics to be used for building and evaluating recommendation
models. Moreover, Chapter 1 presents the research
directions on RSs as well as the existing research related to RSs
based on statistical implicative analysis, then identifies the scope
of the study and sketches the proposal.


CHAPTER 2. RECOMMENDATION BASED ON
STATISTICAL IMPLICATIVE MEASURES AND
ASSOCIATION RULES
Differing from the existing recommendation models based on
statistical implicative analysis (SIA) and association rules,
the proposed model of this chapter can be applied to both
binary and non-binary data; provides more SIMs (such as
implicative intensity, entropic version of implicative intensity,
cohesion) to make the recommendation; and allows
one of the above measures to be combined with the contribution measure to
improve the accuracy of RSs.
2.1. Statistical implicative rules based model - SIR
The statistical implicative rules based model SIR is developed
on SIMs and association rules. The proposed model SIR is shown
in Figure 2.1. This model consists of:
- A finite set of users $U = \{u_1, u_2, \ldots, u_n\}$.
- A finite set of items (e.g. products, movies, etc.) $I = \{i_1, i_2, \ldots, i_m\}$.
- A rating matrix $R = (r_{jk})_{n \times m}$, where $j = 1..n$ and $k = 1..m$, used for storing the feedback (ratings) of users on
items. In binary form, $r_{jk} = 1$ if user $u_j$ likes the item $i_k$ and
$r_{jk} = 0$ (or $NA$) if $u_j$ does not like/know $i_k$. In non-binary form,
$r_{jk} \in [0,1]$ if $u_j$ rates $i_k$ and $r_{jk} = NA$ if $u_j$ does not rate/know
$i_k$.
- A vector $R_{u_a}$ storing the known ratings of the user $u_a$ who
needs the recommendation: $R_{u_a} = \{r_{u_a k}\}$ where $k = \overline{1, m}$, in
which $r_{u_a k} = NA$ if $u_a$ does not rate $i_k$.
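The components listed above can be mirrored in a few lines of Python. This is only an illustrative sketch: the toy values, the use of NaN to stand in for NA, and the 0.5 threshold used to derive a binary matrix from a non-binary one are assumptions made here, not details given in the thesis.

```python
import math

NA = float("nan")  # stands in for the NA marker used in the text (assumption)

users = ["u1", "u2", "u3"]          # U
items = ["i1", "i2", "i3", "i4"]    # I

# Non-binary form: ratings in [0, 1], NA where the user did not rate the item.
R = [
    [0.8, NA, 0.4, 1.0],
    [NA, 0.6, NA, 0.2],
    [1.0, 0.2, NA, NA],
]

def binarize(matrix, threshold=0.5):
    """Binary form: 1 if the rating is known and >= threshold, else 0.

    The threshold is a hypothetical choice for this sketch.
    """
    return [[0 if math.isnan(r) or r < threshold else 1 for r in row]
            for row in matrix]

def known_items(ratings):
    """Indices k for which the rating r_{u_a k} is known (not NA)."""
    return [k for k, r in enumerate(ratings) if not math.isnan(r)]

# R_{u_a}: known ratings of the user who needs the recommendation.
R_ua = [NA, 0.6, 1.0, NA]
```

Here `binarize(R)` yields the like/not-like view of the same data, and `known_items(R_ua)` lists the positions that the active user has already rated.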



[Diagram omitted. Inputs: (U, I, R) and (u_a, I, R_ua), together with the support threshold s, the confidence threshold c and the maximum length of a rule l. Flow: generating the ruleset $\{a \to b \mid a \in I^k, b \in I, k = \overline{1, l-1}\}$; presenting each rule by the statistical implicative analysis method as $\{a \to b\} = \{n, n_a, n_b, n_{a\bar{b}}\}$; calculating the implicative value $\{a \to b\} = \{v_{a,b}\}$ with the implicative intensity, the entropic version of implicative intensity or the cohesion measure; applying the contribution measure; output: the list of good items to be recommended to u_a. In the improved model, the first three steps are combined simultaneously.]
Figure 2.1: The statistical implicative rules based model.
To reduce the recommendation time, the SIR model in Figure
2.1 is improved by carrying out the following steps simultaneously
(directly): generating the association rules, representing those rules by
the set of four values $\{n, n_a, n_b, n_{a\bar{b}}\}$, and calculating the implicative
value of those rules according to a specific SIM. This can be done
by using and modifying the rchic package.
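To make the step from four counts to an implicative value concrete, the sketch below computes the implicative intensity of a rule a → b under the Gaussian approximation commonly used in the SIA literature. The summary does not spell out the formula, so the definition used here, the reading of the counts (n transactions, n_a containing a, n_b containing b, and n_a_not_b counter-examples containing a but not b) and the function name are assumptions for illustration; this is not the code of the Interestingnesslab tool or of rchic.

```python
import math

def implicative_intensity(n, n_a, n_b, n_a_not_b):
    """Implicative intensity of a -> b from four counts (Gaussian approximation).

    Assumed definition: q = (n_a_not_b - n_a * n_not_b / n) / sqrt(n_a * n_not_b / n)
    and intensity = 1 - Phi(q), where Phi is the standard normal CDF.
    """
    n_not_b = n - n_b
    expected = n_a * n_not_b / n  # counter-examples expected under independence
    if expected == 0:
        raise ValueError("undefined when n_a = 0 or n_b = n")
    q = (n_a_not_b - expected) / math.sqrt(expected)
    # 1 - Phi(q), written with the complementary error function
    return 0.5 * math.erfc(q / math.sqrt(2))

# A hypothetical rule observed with far fewer counter-examples than expected.
rule = {"n": 100, "n_a": 40, "n_b": 60, "n_a_not_b": 4}
intensity = implicative_intensity(**rule)
```

The fewer counter-examples a rule has compared with what independence would predict, the closer its intensity is to 1; when the observed and expected numbers coincide, the value is 0.5.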


2.2. Operation of the statistical implicative rules based
model
The operation of the SIR model includes two stages: building the
filtered ruleset represented according to the SIA method; and
performing the recommendation, as shown in Figure 2.2. To
reduce the recommendation time, we can pre-build the learning
model (offline).
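The second stage can be pictured with a small sketch. It assumes that each filtered rule is stored as a triple (antecedent item set, consequent item, implicative value), that a rule applies when its antecedent is contained in the items already known for the active user, and that an item is scored by the largest value among the rules that apply. These choices, and all names below, are hypothetical and are not taken from the thesis.

```python
def recommend_from_rules(rules, known_items, top_n=3):
    """Return up to top_n items with the highest implicative values.

    rules: iterable of (antecedent frozenset, consequent item, value).
    known_items: set of items already rated/liked by the active user.
    """
    scores = {}
    for antecedent, consequent, value in rules:
        # A rule fires only if the user already has every antecedent item
        # and does not yet have the consequent.
        if antecedent <= known_items and consequent not in known_items:
            scores[consequent] = max(scores.get(consequent, 0.0), value)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# A ruleset that could have been pre-built offline (toy values).
rules = [
    (frozenset({"i1"}), "i3", 0.90),
    (frozenset({"i1", "i2"}), "i4", 0.95),
    (frozenset({"i5"}), "i6", 0.99),
    (frozenset({"i2"}), "i3", 0.70),
]
top_items = recommend_from_rules(rules, known_items={"i1", "i2"}, top_n=2)
```

Because the ruleset is only read at this stage, it can be loaded from a file produced offline, which is the point of pre-building the learning model.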
[Diagram omitted. Inputs: the rating matrix (users u1..un by items i1..im, with NA for unknown ratings) and the ratings of the user ua who requires the recommendation. Building model (online/offline): pre-processing data, generating rules, presenting rules according to SIA, filtering rules. Making recommendation (online): recommending items with the highest implicative values. Output: the list of top N items for ua, e.g. {i1, i13, …, im-2}.]
Figure 2.2: The operational diagram of the SIR model.
2.3. Experiment
2.3.1. Data and tool
Three datasets are used for the experiment: MSWeb,
MovieLens and DKHP (course registration). MSWeb
and DKHP are binary datasets, and MovieLens is a non-binary
dataset.
We developed the Interestingnesslab tool to conduct the
experimental scenarios. Besides, some recommendation models
of the recommenderlab package are used for comparison with the
SIR model. These models are: the association rule based
model (AR); the item based collaborative filtering model (IBCF)
using the Jaccard measure; and the popular model (POPULAR).
The experimental scenarios are run on computers with the
following configurations: (1) Windows 8 OS, 16 GB RAM, and
Intel Pentium G630 2.7GHz processor; and (2) Windows 10 OS,
8 GB RAM, and Intel Core i5-6200U 2.5GHz processor.
2.3.2. Evaluating the SIR model on binary data
The accuracy of the SIR model is compared with that of some
existing models using the 5-fold cross validation method and the
classification accuracy metrics (the Precision - Recall curve, the
ROC curve and the F1 measure, which combines the precision and the
recall). The experimental results show that:
- The simultaneous combination of steps at the learning stage
(in the improved SIR model) reduces the recommendation time.
- The accuracy of the SIR model is the highest when the entropic
version of implicative intensity and the contribution measure are
combined to make the recommendation.
- The accuracy of the SIR model combining the entropic
version of implicative intensity and the contribution measure is
higher than that of the compared recommendation models (AR,
POPULAR, IBCF), especially when the user requiring the
recommendation is not a new user (i.e. the number of items
rated by that user, the number of known ratings, is not too
low).
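For reference, the classification accuracy metrics named in this section can be computed for a single top-N list as sketched below. The function and the toy item identifiers are illustrative assumptions; the experiments reported here rely on the evaluation facilities of the tools mentioned in Section 2.3.1, not on this code.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 of one recommendation list.

    recommended: items returned to the user (the top-N list).
    relevant: held-out items the user actually liked.
    """
    recommended = list(recommended)
    relevant = set(relevant)
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example: 4 recommended items, 3 held-out good items, 2 hits.
p, r, f1 = precision_recall_f1(["i1", "i4", "i7", "i9"], {"i1", "i7", "i8"})
```

Varying the length N of the list and plotting the resulting (recall, precision) pairs gives a Precision - Recall curve; in a k-fold setting the values are averaged over the folds.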


2.3.3. Evaluating the SIR model on non-binary data
- The accuracy of the SIR model is the highest when the
entropic version of implicative intensity and the contribution
measure are combined and the user requires many
recommended items. In practice, a user may be confused by a long
list of recommended items.
- The accuracy of the SIR model is higher than that of POPULAR,
a recommendation model based on the most popular items.
2.4. Conclusion
Chapter 2 proposes the statistical implicative rules based
model SIR, which can be applied to both binary and non-binary data, and
improves the proposed model to reduce the recommendation time.
The ruleset represented by a set of four values can be pre-built
offline and used online when someone needs a recommendation.
The SIR model provides many SIMs and can be expanded by
providing other objective interestingness measures. The SIR
model is coded and integrated in the Interestingnesslab tool. The
accuracy of the SIR model is evaluated: by the classification
accuracy metrics such as the ROC curve, the Precision - Recall curve
and the F1 measure; on two types of data: binary (MSWeb, DKHP)
and non-binary (MovieLens); and according to two groups of
scenarios: internal comparison (using the same SIR model but
different SIMs) and external comparison (the SIR model and
some existing recommendation models: AR, POPULAR and
IBCF). The experimental results show that the SIR model should:
(1) combine the entropic version of implicative intensity with the
contribution measure to make the recommendation; and (2) be used
to build RSs because the accuracy of the SIR model is higher than
that of the compared models.


CHAPTER 3. RECOMMENDATION BASED ON USERS
IMPLICATIVE RATING MEASURE
The SIR model of Chapter 2 uses association rules and
SIMs to recommend the list of good items to users. When the
number of rules is too large, the SIR model and the existing
models that are also based on SIA and association rules
face some disadvantages: the recommendation time may be long
if the learning stage is performed online, and the computer may
be overloaded. Therefore, the thesis focuses on rules
of length 2 to overcome those disadvantages. Besides, the
rating given by $u_a$ (a user who requires the recommendation) to the
item $i$ may be similar to the ratings given to $i$ by the nearest users
(neighbors) of $u_a$. Moreover, each item makes its own contribution to
the relationship between $u_a$ and his/her nearest user $u_j$. As a result, the
thesis combines the above characteristics to improve the
accuracy of recommendation.
3.1. KnnUIR Definition
The k nearest neighbors (i.e. users) based implicative rating
measure $KnnUIR$ is proposed to predict the rating given by a
user $u_a$ to an item $i \in I$. The purpose of this proposal is to
increase the recommendation accuracy. $KnnUIR$, defined by
(3.1), is based on: (1) the number of nearest users of $u_a$, $knn$
(the nearest neighbors $u_j$ are identified by the implicative
intensities of $u_a$ and $u_j$); (2) the ratings given to item $i$
by those neighbors, $r_{u_j i}$; and (3) the typicality of $i$ contributing to
the relationship of $u_a$ and $u_j$, $\gamma(i, u_a \to u_j)$. The value of
$KnnUIR(u_a, i)$ has to be transformed to the range [0, 1], the
same scale as the elements of the rating matrix.

$$KnnUIR(u_a, i) = \sum_{j=1}^{knn} r_{u_j i} \cdot \gamma(i, u_a \to u_j) \qquad (3.1)$$
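Equation (3.1) and the rescaling step can be sketched as follows. The typicality values are taken as given inputs; skipping neighbors who have not rated the item and using min-max rescaling to reach [0, 1] are assumptions of this sketch, since the summary does not say how either is handled. All names are hypothetical.

```python
def knn_uir_raw(neighbor_ratings, typicalities):
    """Raw KnnUIR value for one (user, item) pair, following equation (3.1).

    neighbor_ratings: ratings r_{u_j i} of the knn nearest users for item i
                      (None when a neighbor did not rate the item).
    typicalities: gamma(i, u_a -> u_j) for the same neighbors, same order.
    """
    total = 0.0
    for rating, gamma in zip(neighbor_ratings, typicalities):
        if rating is None:  # assumption: unrated items contribute nothing
            continue
        total += rating * gamma
    return total

def rescale_to_unit(values):
    """Min-max rescaling of raw values to [0, 1] (an assumed transformation)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Three neighbors of u_a; the second one has not rated the item.
raw = knn_uir_raw([1.0, None, 0.5], [0.8, 0.9, 0.4])
# Raw values for three candidate items, rescaled to the rating scale.
predicted = rescale_to_unit([raw, 0.5, 2.0])
```

After rescaling, the predicted values for the candidate items are comparable with the entries of the rating matrix, so the items with the highest values can be recommended directly.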

3.2. Users implicative rating based model - UIR
The users implicative rating based model UIR is developed
by using the proposed KnnUIR measure and the user based
collaborative filtering method. The UIR model shown in Figure
3.1 has the same components as the SIR model. However, this
UIR model not only predicts the rating given by a user to an item
but also recommends the list of top items to a user.
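[Diagram omitted. Inputs: (U, I, R) and (u_a, I, R_ua). Flow: the implicative intensity maps $u_a \times U$ to $\{\varphi(u_a, u_j), j = \overline{1, knn}\}$; the K nearest neighbors/users based implicative rating measure (KnnUIR) maps $u_a \times I$ to the predicted ratings $R'_{u_a}$; output: $Reclist = \{i \mid i \in I, r'_{u_a i} \in TopN\}$.]
Figure 3.1: The users implicative rating based model.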
3.3. Operation of the users implicative rating based model
The operational diagram of the UIR model is presented in
Figure 3.2.


[Diagram omitted. Inputs: the rating matrix (users u1..un by items i1..im, with NA for unknown ratings) and the ratings of the user ua who requires the recommendation. Preparing for calculating the KnnUIR value: pre-processing data; presenting the relationship of ua and uj (where uj ∈ U) according to SIA and calculating the implicative intensity of (ua, uj); finding the k nearest neighbors of ua; calculating the typicality of i contributing to the relationship (ua, uj). Predicting the rating given by ua for i ∈ I using KnnUIR. If a recommendation is required: recommending items with the highest predicted ratings to ua. Outputs: the predicted ratings (r'a1, r'a2, …, r'am) and the list of top N items, e.g. {i1, i13, …, im-2}.]
Figure 3.2: The operational diagram of the UIR model.
3.4. Experiment
3.4.1. Data and tool
The Interestingnesslab tool with the proposed UIR model; the
MSWeb, DKHP and MovieLens datasets; the recommenderlab
package with existing models (POPULAR, IBCF, AR, UBCF,
ALS_Implicit and SVD); and the computers (as described in
Section 2.3.1) are also used for the experiment of this chapter.
3.4.2. Evaluating the UIR model using the classification
accuracy metrics
- The accuracy of the proposed UIR model (via the Precision - Recall curve, the ROC curve and the F1 measure) is higher than that
of the AR, IBCF and POPULAR models but not much higher
than that of the UBCF model.
- The accuracy of the UIR model is lower than that of the SIR
model (Chapter 2) if the user requiring the recommendation is a
new user (given = 1) and the number of nearest users and the number
of good items to be recommended are low.
3.4.3. Evaluating the UIR model using the predictive accuracy
metrics
- Taking into account the contribution of an item to the relationship of two users
increases the recommendation accuracy.
- The accuracy of the proposed UIR model is higher than that
of the UBCF model (i.e. the mean absolute error MAE and the
root mean squared error RMSE are lower) if the user requiring
the recommendation is not a new user. In the opposite case, the
accuracy of the UIR model is still higher than that of the UBCF model
if the number of nearest neighbors used for predicting
ratings is high.
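The two predictive accuracy metrics mentioned above can be sketched as follows; lower values mean better predictions. The handling of unknown ratings (pairs whose actual rating is None are skipped) and the toy values are assumptions of this sketch.

```python
import math

def _known_pairs(actual, predicted):
    """Keep only the (actual, predicted) pairs whose actual rating is known."""
    return [(a, p) for a, p in zip(actual, predicted) if a is not None]

def mae(actual, predicted):
    """Mean absolute error over the known ratings."""
    pairs = _known_pairs(actual, predicted)
    return sum(abs(a - p) for a, p in pairs) / len(pairs)

def rmse(actual, predicted):
    """Root mean squared error over the known ratings."""
    pairs = _known_pairs(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))

# Held-out ratings of one user (None = not rated) and the model's predictions.
actual = [1.0, 0.5, None, 0.0]
predicted = [0.8, 0.5, 0.3, 0.4]
errors = (mae(actual, predicted), rmse(actual, predicted))
```

RMSE penalizes large individual errors more strongly than MAE, which is why the two are usually reported together.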
3.4.4. Evaluating the UIR model using the rank accuracy metrics
The experiment is conducted for the case where the active
user has rated only a few items and requires only a few recommended
items. The experimental result shows that the accuracy of the
proposed UIR model (via the nDCG metric) is higher than that
of the UBCF, ALS_Implicit and SVD models if knn >= 30.
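As a reminder of what the nDCG metric measures, the sketch below uses one common formulation (gain divided by log2 of the position plus one, normalized by the best achievable ordering). The thesis may use a different variant; the function names and toy data are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a ranked list of gains."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))

def ndcg_at_n(recommended, relevance, n):
    """nDCG of the first n recommended items.

    recommended: ranked list of item ids.
    relevance: dict mapping item id -> true gain (e.g. held-out rating).
    """
    gains = [relevance.get(item, 0.0) for item in recommended[:n]]
    ideal = sorted(relevance.values(), reverse=True)[:n]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

relevance = {"i1": 1.0, "i3": 1.0}
best = ndcg_at_n(["i3", "i1", "i7"], relevance, n=3)   # good items ranked first
worse = ndcg_at_n(["i7", "i1", "i3"], relevance, n=3)  # good items pushed down
```

Unlike precision and recall, nDCG rewards placing the good items near the top of the list, which matters when the user asks for only a few recommendations.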
3.5. Conclusion
Chapter 3 proposes a new measure, called KnnUIR, that
predicts a user's rating for an item. KnnUIR is developed from
two SIMs: the typicality and the implicative intensity. KnnUIR
incorporates several factors affecting the predicted ratings, such as
the nearest neighbors, the ratings given by those
neighbors, and the contribution of an item to the relationship between the
user requiring the recommendation and his/her nearest neighbors.
Besides, Chapter 3 proposes a new recommendation model, named UIR, that uses KnnUIR and the user based collaborative
filtering method. The accuracy of the proposed UIR model is
evaluated by: the classification accuracy metrics (for binary
data), the predictive accuracy metrics (for non-binary data) and
the rank accuracy metrics (for both binary and non-binary data);
the group of internal comparison scenarios (UIR and SIR) and
the group of external comparison scenarios (UIR and the existing
models: AR, IBCF, POPULAR, ALS_Implicit, UBCF, SVD).
Experimental results show that the accuracy of the UIR model:
(1) is higher when the contribution of items to the
relationship between a user and his/her neighbor is considered; and (2) is
higher than that of the compared existing models when the number of
known ratings of the user who needs the recommendation is not too
low (i.e. that user is not a new user). Moreover, the experimental
results also show that the accuracy of the UIR model is lower than
that of the proposed SIR model in the case of new users.


CHAPTER 4. RECOMMENDATION BASED ON ITEMS
IMPLICATIVE RATING MEASURE
When predicting the rating given by the user $u_a$ to the item $i$,
we consider the items rated by $u_a$ as the potential
nearest neighbors of $i$. Each nearest neighbor $i_j$ has a different
effect on $i$. This effect can be measured by the interestingness of the
relationship $(i_j, i)$. The confidence measure is used to calculate
the strength of the relationship using the examples $n_{i_j i}$, whereas the
implicative intensity is used for calculating the surprisingness of the
relationship using the counter-examples $n_{i_j \bar{i}}$. If two relationships
$(i_{j_1}, i)$ and $(i_{j_2}, i)$ have the same confidence value, we use the
surprisingness value to distinguish them, and vice versa. Therefore, these two
measures can be combined to clearly distinguish the
effect of each neighbor $i_j$ on $i$. Chapter 4 uses nearest
neighbors as Chapter 3 does, but its neighbors are items; it is also based
on items as Chapter 2 is, but it considers only the relationship between two
items instead of that between a set of items and one item.

4.1. KnnIIR Definition
The k nearest neighbors (i.e. items) based implicative rating
measure $KnnIIR$ is proposed to predict the rating given by a user
$u_a$ to an item $i \in I$, thereby increasing the recommendation
accuracy. $KnnIIR$ is built from the ratings of $u_a$ for the items $i_j$
($i_j$ can be seen as one of the potential nearest neighbors of $i$) and the
strength of the relationship between each neighbor $i_j$ and the item $i$,
using the confidence value $c(i_j, i)$ and one of the SIM values, such
as the implicative intensity $\varphi(i_j, i)$, the cohesion value
$coh(i_j, i)$ or the entropic version of implicative intensity $\phi(i_j, i)$.
As a result, $KnnIIR$ considers not only the examples $n_{i_j i}$ of the
relationship $(i_j, i)$ but also the counter-examples $n_{i_j \bar{i}}$ of
this relationship.

$$KnnIIR(u_a, i) = \sum_{j=1}^{knn} r_{u_a i_j} \cdot v_{i_j i} \qquad (4.1)$$

$$v_{i_j i} = \left[ \begin{array}{l} \varphi(i_j, i) \cdot c(i_j, i) \\ coh(i_j, i) \cdot c(i_j, i) \\ \phi(i_j, i) \cdot c(i_j, i) \end{array} \right. \qquad (4.2)$$

In (4.2), one of the three alternatives is used, depending on the SIM chosen.
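Equation (4.1) can be sketched as follows, taking the values $v_{i_j i}$ of (4.2) as already computed. Choosing the knn neighbors as the rated items with the largest $v_{i_j i}$ is an assumption of this sketch, and the names and toy values are hypothetical.

```python
def knn_iir_raw(user_ratings, v_to_target, knn):
    """Raw KnnIIR value for one (user, item) pair, following equation (4.1).

    user_ratings: dict item i_j -> known rating r_{u_a i_j} of the active user.
    v_to_target: dict item i_j -> v_{i_j i} for the target item i.
    knn: number of nearest items to use.
    """
    # Only items rated by u_a can act as neighbors of the target item.
    candidates = [j for j in user_ratings if j in v_to_target]
    # Assumption: the nearest neighbors are those with the largest v value.
    neighbors = sorted(candidates, key=lambda j: (-v_to_target[j], j))[:knn]
    return sum(user_ratings[j] * v_to_target[j] for j in neighbors)

user_ratings = {"i1": 1.0, "i2": 0.5, "i4": 1.0}          # ratings of u_a
v_to_target = {"i1": 0.6, "i2": 0.9, "i3": 0.8, "i4": 0.1}  # v_{i_j i}
score = knn_iir_raw(user_ratings, v_to_target, knn=2)
```

In this toy case the two neighbors are i2 and i1; i3 has a high value but was not rated by the user, so it cannot contribute.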
4.2. Items implicative rating based model - IIR
The items implicative rating based model IIR is shown in
Figure 4.1.
[Diagram omitted. Inputs: (U, I, R) and (u_a, I, R_ua). Flow: the confidence measure together with the implicative intensity, the entropic version of implicative intensity or the cohesion measure maps $I \times I$ to $V = \{v_{jk} \mid j, k = \overline{1, knn}\}$; the K nearest neighbors/items based implicative rating measure (KnnIIR) maps $u_a \times I$ to the predicted ratings $R'_{u_a}$; output: $Reclist = \{i \mid i \in I, r'_{u_a i} \in TopN\}$.]
Figure 4.1: The items implicative rating based model.
Similar to the models of Chapter 2 and Chapter 3, the
proposed IIR model also has a finite user set, a finite item set, a
rating matrix, a vector with the ratings already given by the user
requiring the recommendation, and a vector with the predicted
ratings. Differing from the models of the previous chapters, the
IIR model uses the item matrix V to store the values $v_{jk}$ used to carry
out the recommendation. Matrix V can be built directly or
indirectly. In the indirect form, we generate a set of rules (similar
to Chapter 2) but consider only rules of length 2, with the
support and confidence thresholds set to 0; we then convert this
ruleset to the item matrix. However, compared to the direct
method, this approach can increase the recommendation time and
depends on the tools used for generating rules. Besides,
the V matrix can be built online or offline. When the number of
items and the size of the dataset are large, the recommendation
time can be shortened if we pre-build the V matrix (offline) and
store it in a file.
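A direct construction of the item matrix for binary data might look like the sketch below, which uses the first alternative of (4.2): implicative intensity times confidence. The intensity formula (Gaussian approximation from the SIA literature), the value 0 used when the intensity is undefined, and the None entries on the diagonal are assumptions of this sketch rather than details stated in the summary.

```python
import math

def implicative_intensity(n, n_a, n_b, n_a_not_b):
    """1 - Phi(q) with q the standardized number of counter-examples (assumed form)."""
    expected = n_a * (n - n_b) / n
    if expected == 0:
        return 0.0  # assumption: treat the undefined case as no implication
    q = (n_a_not_b - expected) / math.sqrt(expected)
    return 0.5 * math.erfc(q / math.sqrt(2))

def build_item_matrix(R):
    """Build V directly from a binary rating matrix R (rows = users, cols = items)."""
    n, m = len(R), len(R[0])
    counts = [sum(row[k] for row in R) for k in range(m)]
    V = [[None] * m for _ in range(m)]
    for j in range(m):
        for k in range(m):
            if j == k or counts[j] == 0:
                continue
            # Counter-examples of i_j -> i_k: users who like i_j but not i_k.
            n_j_not_k = sum(1 for row in R if row[j] == 1 and row[k] == 0)
            confidence = (counts[j] - n_j_not_k) / counts[j]
            phi = implicative_intensity(n, counts[j], counts[k], n_j_not_k)
            V[j][k] = phi * confidence
    return V

R = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
V = build_item_matrix(R)
```

Because V depends only on the rating matrix, it can be computed once offline and stored, for example with `json.dump(V, file)`, then reloaded when a recommendation is requested.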
4.3. Operation of the items implicative rating based model
The operational diagram of the IIR model is depicted in
Figure 4.2.
4.4. Experiment
4.4.1. Data and tool
Chapter 4 also uses the datasets and tool used by the SIR and
UIR models.
4.4.2. Evaluating the IIR model using the classification
accuracy metrics
- Building the item matrix directly can reduce the
recommendation time and does not depend on the tools used for
generating rules.
- The accuracy of the IIR model (via the Precision - Recall curve,
the ROC curve and the F1 measure) is the highest when the
implicative intensity is used for building the item matrix and knn
is the number of items of the dataset.
- The accuracy of the IIR model is higher than that of the
compared recommendation models (AR, POPULAR, IBCF, SIR)
when the user requiring the recommendation is not a new user.
[Diagram omitted. Inputs: the rating matrix (users u1..un by items i1..im, with NA for unknown ratings) and the ratings of the user ua who requires the recommendation. Building the item matrix with knn neighbors: pre-processing data; building the item matrix (items i1..im by items i1..im, with NA on the diagonal); filtering the matrix to obtain knn neighbors. Making the recommendation: predicting ratings using KnnIIR; if a recommendation is required, recommending items with the highest predicted ratings. Outputs: the predicted ratings (r'a1, r'a2, …, r'am) and the list of top N items, e.g. {i1, i13, …, im-2}.]
Figure 4.2: The operational diagram of the IIR model.
4.4.3. Evaluating the IIR model using the predictive accuracy
metrics
- The accuracy of the IIR model (via MAE and RMSE) is the
highest when knn is the number of items of the dataset, and when the
entropic version of implicative intensity is used for building the
item matrix if a user has rated only a few items, or the cohesion
measure otherwise.
- The accuracy of the IIR model is higher than that of the
IBCF model if the user requiring the recommendation has already rated
many items.
4.4.4. Evaluating the IIR model using the rank accuracy
metrics
The accuracy of the IIR model (via nDCG) is higher than that of
the IBCF and ALS_Implicit models if the active user has rated only a few
items and requires only a few recommended items.
4.5. Comparing the proposed models
If the dataset is in binary form, the SIR model is suitable for the case
in which the active user has rated only a few items, whereas the IIR
model fits the other cases. Besides, if the recommendation
time is taken into account, the UIR model can be used instead of
the SIR model. If the data is in non-binary form, the accuracy of the
UIR model is higher than that of the IIR model.
4.6. Conclusion
Chapter 4 proposes a new measure (named KnnIIR)
developed from the relationship of two items to predict ratings;
and the IIR model, which uses the proposed measure to recommend a
list of good items to a user or to predict the rating given by a user
to an item. The proposed IIR model is improved by building the
item matrix directly. This reduces the recommendation time and
avoids the reliance on the tool used for generating rules. The
accuracy of the IIR model is also evaluated on both binary and non-binary data, according to the classification accuracy metrics, the
predictive accuracy metrics and the rank accuracy metrics. The
experimental results show that the IIR model should: (1) use the
implicative intensity if the data is in binary form, or the combination of
the entropic version and the cohesion measure if the data is in non-binary form, to build the item matrix; and (2) be used to build RSs
because of its high accuracy. In addition, the experimental
results also show that: (1) the combination of the
confidence value and the implicative value of two items
improves the recommendation result; and (2) the accuracy of the IIR
model is lower than that of the SIR model in the case of a new user.


CONCLUSION AND FUTURE WORKS
Results of the study
- Identifying the statistical implicative measures to be used for
RSs; then proposing and improving the recommendation model
based on SIMs and association rules to recommend the good
items to users.
- Proposing a new measure KnnUIR based on the nearest
users and some SIMs, and then proposing a new recommendation
model UIR using this measure. The proposed model can predict
the ratings given by a user to items and recommend the good
items to users.
- Proposing a new measure KnnIIR based on the nearest items
and some SIMs, and then proposing a new recommendation
model IIR using the proposed measure.
- Developing the Interestingnesslab tool in the R language, used for
the experiment.
- Collecting a binary dataset DKHP storing the information of
course registration to be used for evaluating the accuracy of
recommendation.
Future work
- Developing a hybrid recommendation model to obtain the
advantages of each proposed model.
- Evaluating the proposed models using other methods to
obtain a fuller evaluation, thereby modifying those models to achieve
higher accuracy.
- Combining with deep learning and reinforcement
learning methods to improve the accuracy of the proposed models.


PUBLISHED ARTICLES
1. Lan Phuong Phan, Nghia Quoc Phan, Vinh Cong Phan, Hung Huu
Huynh, Hiep Xuan Huynh, and Fabrice Guillet, “Classification of
objective interestingness measures”, EAI Endorsed Transactions on
Context-Aware Systems and Applications, Vol. 3, No. 10, pp. 1-13,
2016.
2. Lan Phuong Phan, Nghia Quoc Phan, Ky Minh Nguyen, Hung Huu
Huynh, Hiep Xuan Huynh, and Fabrice Guillet, “Interestingnesslab: A
Framework for Developing and Using Objective Interestingness
Measures”, In Proceeding of The International Conference on
Advances in Information and Communication Technology, Thai
Nguyen, Vietnam, December 12-13, 2016, Springer, pp. 302-311, 2017.
3. Lan Phuong Phan, Ky Minh Nguyen, Hiep Xuan Huynh and Huu
Hung Huynh.“Association-Based Recommender System using
Statistical Implicative Cohesion Measure”. In Proceedings of the
Eighth International Conference on Knowledge and Systems
Engineering (KSE 2016), Ha Noi, Vietnam, October 6-8, 2016, IEEE,
pp. 144 -149, 2016.

4. Lan Phuong Phan, Huu Hung Huynh, Hiep Xuan Huynh, Régis
GRAS. “Systeme de recommandation basé sur des mesures
implicatives fortes”. Dans Actes du 9ème colloque d'Analyse Statistique
Implicative (A.S.I.9), Belfort, France, Octobre 4-7, 2017, Université
Bourgogne Franche-Comté – Besançon, pp. 508-532, 2017.
5. Phan Phương Lan, Huỳnh Hữu Hưng, Huỳnh Xuân Hiệp, “Hệ tư
vấn dựa trên độ đo cường độ hàm ý và trách nhiệm”, In Proceedings of the
10th National Conference on Fundamental and Applied Information
Technology Research (FAIR 2017), Da Nang, Vietnam, August 17-18,
2017, Natural Science and Technology Publishing House, pp.
256-274, 2017.
6. Phan Phương Lan, Huỳnh Hữu Hưng, Huỳnh Xuân Hiệp, “Hệ tư
vấn lọc cộng tác dựa trên các độ đo hàm ý thống kê”, In Proceedings of the
20th National Conference on Electronics, Communications and Information
Technology (REV-ECIT 2017), Ho Chi Minh City, Vietnam, December 14-15,
2017, Science and Technics Publishing House, pp. 200-205, 2017.

