Link prediction in heterogeneous networks and its application to predicting relationships between non-coding RNAs and diseases.

MINISTRY OF EDUCATION AND TRAINING
HANOI NATIONAL UNIVERSITY OF EDUCATION

NGUYEN VAN TINH

LINK PREDICTION IN HETEROGENEOUS INFORMATION
NETWORKS AND ITS APPLICATIONS IN PREDICTING
ASSOCIATIONS BETWEEN NON-CODING RNAS AND DISEASES

DOCTORAL DISSERTATION IN COMPUTER SCIENCE

HANOI-2023


Major: Computer Science
Code: 9480101

DOCTORAL DISSERTATION IN COMPUTER SCIENCE

SUPERVISORS
1. Assoc. Prof. Dr. TRAN DANG HUNG
2. Dr. LE THI TU KIEN

Hanoi-2023


AUTHORSHIP'S DECLARATION

I, NGUYEN VAN TINH, affirm that the dissertation entitled “Link
prediction in heterogeneous information networks and its applications in
predicting associations between non-coding RNAs and diseases” was
completed by myself under the supervision of Assoc. Prof. Dr. Tran Dang Hung
and Dr. Le Thi Tu Kien. I confirm the following:
- This dissertation was carried out during my Ph.D. research at Hanoi National
  University of Education.
- This work has not been submitted for any other degree or qualification at
  Hanoi National University of Education or any other institution.
- Appropriate acknowledgment has been given in the thesis wherever reference
  has been made to other published works.
- The submitted thesis is my own work, except where collaborative work has
  been included. The collaborative contributions have been indicated.
Hanoi, 2023
Ph.D. Student

SUPERVISORS:
1. Assoc. Prof. Dr. TRAN DANG HUNG

2. Dr. LE THI TU KIEN


ACKNOWLEDGEMENT
The dissertation was completed during my Ph.D. course at Hanoi
National University of Education (HNUE). HNUE is a special place where I
gained valuable knowledge and skills on the way to becoming a researcher. I am
grateful to all the people who supported and encouraged me in completing the
dissertation.
Firstly, I would like to thank my advisors, Assoc. Prof. Dr. Tran Dang
Hung and Dr. Le Thi Tu Kien, for their instruction, advice, and encouragement
throughout my Ph.D. course. My dissertation could not have been completed without
my advisors’ scientific direction, encouragement, and support.
Secondly, I wish to thank all members of the Faculty of Information
Technology, HNUE, for their frequent support during my Ph.D. course. I also
wish to thank all my colleagues in the Faculty of Information Technology, Hanoi
University of Industry (HaUI), for their support in professional work during the
Ph.D. course.
Next, I wish to thank Assoc. Prof. Dr. Than Quang Khoat, Hanoi University
of Science and Technology, and Dr. Nguyen Tran Quoc Vinh, Faculty of
Information Technology, The University of Da Nang - University of Science and
Education, for their contributions and suggestions during my Ph.D. course.
I would also like to thank all reviewers for their valuable comments
and suggestions, which helped me complete the dissertation.
Additionally, this work was funded by Gia Lam Urban Development and
Investment Company Limited, Vingroup, and supported by Vingroup Innovation
Foundation (VINIF) under project code VINIF.2019 DA18.
Finally, I would like to express my sincere gratitude to my family and friends
for their continuous support and encouragement to complete the Ph.D. course.
Hanoi, 2023
Ph.D. Student
Nguyen Van Tinh


CONTENTS
AUTHORSHIP'S DECLARATION
ACKNOWLEDGEMENT
CONTENTS
ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
CHAPTER 1. BACKGROUND
1.1. Basic concepts
1.1.1. Heterogeneous information networks
1.1.2. Biological systems
1.1.3. Non-coding RNAs (ncRNAs)
1.2. Link prediction in heterogeneous information networks
1.2.1. Link prediction problem
1.2.2. Link prediction methods
1.2.3. Link prediction applications in biological systems
1.3. Computational methods for predicting associations between non-coding RNAs and diseases
1.3.1. Predicting non-coding RNA-disease association prediction as a link prediction problem
1.3.2. Materials used for ncRNA-disease association prediction
1.3.3. Similarity calculation and network construction
1.3.4. Literature review of computational methods to predict ncRNA-disease associations
1.4. Thesis’s research directions
1.5. Some evaluation methods and metrics to evaluate prediction performance
1.5.1. Cross-validation
1.5.2. Area under ROC Curve (AUC)
1.5.3. Area under Precision-Recall Curve (AUPR)
1.5.4. Checking case studies
1.6. Chapter summary
CHAPTER 2. NCRNA-DISEASE ASSOCIATIONS PREDICTION WITH COLLABORATIVE FILTERING AND RESOURCE ALLOCATION PROCESS ON A TRIPARTITE GRAPH
2.1. Motivations
2.2. Main related works
2.2.1. The item-based collaborative filtering algorithm for ncRNA-disease association prediction
2.2.2. Resource allocation on a tripartite graph
2.3. The proposed model for predicting ncRNA-disease associations based on a collaborative filtering algorithm and a resource allocation process on a tripartite graph
2.4. Employing the proposed model to infer miRNA-disease associations based on collaborative filtering and resource allocation
2.4.1. Detailed description of proposed model's stages in inferring miRNA-disease associations
2.4.2. Proposed method's experiments and results
2.5. Employing the proposed model to predict lncRNA-disease associations based on collaborative filtering and resource allocation
2.5.1. Detailed description of proposed model's stages in predicting lncRNA-disease associations
2.5.2. Proposed method’s experiments and results
2.6. Chapter summary
CHAPTER 3. MIRNA-DISEASE ASSOCIATIONS PREDICTION USING IMPROVED RANDOM WALK WITH RESTART AND INTEGRATING MULTIPLE SIMILARITIES
3.1. Motivation and main related works
3.2. Datasets used in the proposed method
3.2.1. Human miRNA-disease associations
3.2.2. Disease semantic similarity
3.2.3. MiRNA functional similarity
3.3. Proposed method
3.3.1. Proposed method overview
3.3.2. Calculating Gaussian interaction profile kernel similarity for miRNAs and diseases
3.3.3. Calculating Integrated similarity for miRNAs and diseases
3.3.4. Weighted K-nearest known neighbors algorithm
3.3.5. Constructing miRNA similarity-based and disease similarity based heterogeneous networks
3.3.6. Employing improved random walk with restart to predict miRNA-disease associations
3.3.7. Rank the final prediction score of associations to obtain predicted miRNA-disease associations
3.4. Experiments and results
3.4.1. Datasets
3.4.2. Implementing and Estimating time complexity of the proposed method
3.4.3. Performance measures
3.4.4. Performance comparison with other related models
3.4.4. Case studies
3.5. Chapter summary and discussion
CONCLUSION AND FUTURE WORKS
PUBLICATIONS
REFERENCES


ABBREVIATIONS
No   Abbreviation   Meaning
1    AUC            Area under ROC curve
2    AUPR           Area under precision-recall curve
3    CF             Collaborative filtering
4    CNN            Convolutional neural network
5    CRC            Colorectal cancer
6    DAGs           Directed acyclic graphs
7    DBN            Deep belief network
8    FN             False negative
9    FP             False positive
10   FPR            False positive rate
11   GCN            Graph convolutional network
12   GIP            Gaussian interaction profile
13   HCC            Hepatocellular carcinoma
14   HF             Heart failure
15   HIN            Heterogeneous information network
16   lncRNAs        Long non-coding RNAs
17   LOOCV          Leave-one-out cross-validation
18   MF             Matrix factorization
19   miRNAs         Micro RNAs
20   ncRNAs         Non-coding RNAs
21   NMF            Non-negative matrix factorization
22   OAG            Open-angle glaucoma
23   POAG           Primary open-angle glaucoma
24   ROC            Receiver operating characteristic
25   RWR            Random walk with restart
26   SVM            Support vector machine
27   TN             True negative
28   TP             True positive
29   TPR            True positive rate
30   WKNKN          Weighted K-nearest known neighbors


LIST OF TABLES
Table 1.1. Databases containing miRNA-related information and miRNA-disease associations
Table 1.2. Databases containing lncRNA-related information
Table 2.1. Performance comparison with other related models
Table 2.2. Top 40 predicted miRNAs for Prostatic Neoplasms
Table 2.3. Top 40 predicted miRNAs for Heart failure
Table 2.4. Top 40 predicted miRNAs for Glioma
Table 2.5. Top 20 miRNAs for Glaucoma, Open-Angle
Table 2.6. AUC and AUPR values of related methods in comparison
Table 2.7. Top 10 predicted Prostate cancer-related lncRNAs
Table 2.8. Top 10 predicted lncRNAs related to Stomach cancer
Table 3.1. AUC and AUPR One-sample t-test
Table 3.2. Evaluation of index changes in WKNKN algorithm
Table 3.3. AUC and AUPR values of RWRMMDA and other latest methods in comparison
Table 3.4. Top 40 predicted Breast Neoplasms-associated miRNAs
Table 3.5. Top 40 predicted Hepatocellular carcinoma-associated miRNAs
Table 3.6. Top 40 predicted Stomach Neoplasms-associated miRNAs
Table 3.7. Top 10 predicted associations between Lung Neoplasms and miRNAs from the simulated experiment for predicting new disease-related miRNAs
Table 3.8. Top 10 predicted associations for Ovarian Neoplasms and miRNAs from the simulated experiment for predicting new disease-related miRNAs


LIST OF FIGURES
Figure 0.1. The dissertation outline
Figure 1.1. An illustration of HIN with multiple node types and multiple link types
Figure 1.2. An illustration of HIN’s network schema
Figure 1.3. An illustration of a link prediction problem
Figure 1.4. A ROC curve and AUC's illustration
Figure 1.5. An illustration of a Precision-recall curve and AUPR
Figure 2.1. The proposed model's flowchart
Figure 2.2. The datasets and the numbers of data nodes in the proposed method
Figure 2.3. ROC curve and AUC value of the proposed method with γ = 0.9 in one experimental running time
Figure 2.4. Precision-Recall curve and AUPR value of the proposed method with γ = 0.9 in one experimental running time
Figure 2.5. The relationships between the different data sources and the numbers of data nodes used in the proposed method
Figure 2.6. The proposed method's ROC curves and AUC values in 5 running times of experiments with γ = 0.8
Figure 2.7. The proposed method's Precision-Recall curves and AUPR values in 5 running times of experiments with γ = 0.8
Figure 3.1. Illustration of computing miRNA functional similarity
Figure 3.2. The workflow of the proposed method (RWRMMDA)
Figure 3.3. Illustration of the process of weight assignment in disease space and miRNA space
Figure 3.4. The improved RWR process's steps to predict miRNA-disease associations
Figure 3.5. ROC curves and AUC values (a) and PR curves and AUPR values (b) in 5 running times of 5-fold cross-validation experiments
Figure 3.6. ROC curve and AUC value (a) and PR curve and AUPR value (b) under global LOOCV experiment
Figure 3.7. ROC curves and AUC values (a) and Precision-Recall curves and AUPR values (b) in comparison with other related approaches
Figure 3.8. ROC curves and AUC values (a) and Precision-Recall curves and AUPR values (b) in different cases of RWRMMDAs


INTRODUCTION
Today we live in a connected world in which data objects, actors or agents,
and groups of objects or components interact with each other to form large
networks. These networks are complex: they contain multiple types of nodes and
multiple types of interactions. Such networks are called heterogeneous
information networks (HINs). They are rich in semantic information and can be
constructed from multiple data sources. The analysis of HINs has become an
active research area spanning data mining, information retrieval, link
prediction, graph mining, network science, and so forth [1]–[3].
Link prediction is a crucial and active task in HIN analysis. It benefits
researchers and organizations in a variety of fields. Its main objective is to
discover missing links in a network or to forecast links that may appear in the
near future. It has been studied extensively in the literature [4]–[8]. Link
prediction has been applied broadly in domains ranging from social networks to
biological systems. In biological systems, link prediction has been used to
discover relationships or associations among biological objects, such as
disease-phenotype/gene associations, disease-metabolite associations,
drug-protein interactions, drug-miRNA associations, disease-drug associations,
non-coding RNA-disease associations, and so forth. In particular, non-coding
RNAs (ncRNAs) in the human genome were difficult to identify for a long time
and were treated as noise. However, ncRNAs play vital roles in life activities.
Additionally, it has been demonstrated that they have a significant impact on
the occurrence, progression, and development of human diseases. Identifying
relationships between ncRNAs and diseases has opened up opportunities for the
therapy and diagnosis of human diseases. Therefore, ncRNA-disease relationships
have been studied extensively in recent years.
Recently, a large number of experimental methods have been developed to
help determine the relationships between ncRNAs and diseases. However,
discovering potential ncRNA-disease relationships through conventional
biological experiments is costly, time-consuming, and laborious. Therefore,
computational methods for identifying ncRNA-disease associations are required.
Among the types of ncRNAs, two have been studied with particular care and have
attracted a lot of attention from researchers: micro RNAs (miRNAs) and long
non-coding RNAs (lncRNAs). In the past few years, various computational methods
for predicting ncRNA-disease associations have been developed. In practice they
can be divided into the following categories: network-based,
recommendation-based, resource allocation-based, machine learning-based, deep
learning-based, and multi-model and biological information integration-based
methods [9]–[12]. Existing computational methods in each category have brought
substantial benefits in revealing disease-associated ncRNAs and typically
reduce the cost and time of biological experiments. For example, network-based
methods are easy to understand and normally predict quickly. Machine
learning-based methods can effectively learn and derive features of ncRNAs or
diseases. Deep learning-based approaches, with the development of graph neural
networks, have strong learning and prediction abilities for combining network
and biological features. However, some limitations still need to be addressed,
as follows.

Firstly, computational approaches for predicting ncRNA-disease
associations have to deal with the sparse data problem. It stems from the fact
that the number of known ncRNA-disease associations is much smaller than the
number of unknown associations. Hence, it is difficult to obtain a reliable
network that reasonably represents the underlying biological network, which
limits prediction accuracy [11].
Secondly, the sparse data problem causes another issue: positive and
negative samples are unbalanced when computational methods for predicting
ncRNA-disease associations are applied. As a result, the prediction performance
of these methods is not very reliable.
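To make the first two limitations concrete, the following minimal sketch measures how sparse a binary ncRNA-disease association matrix is and how many unknown pairs there are for each known one. The function name `association_stats` and the toy matrix are illustrative assumptions, not data or code from this dissertation.

```python
def association_stats(Y):
    """Return (density, negatives per positive) for a binary association matrix."""
    n_total = sum(len(row) for row in Y)
    n_pos = sum(sum(row) for row in Y)   # known associations (positive samples)
    n_neg = n_total - n_pos              # unknown pairs, usually treated as negatives
    density = n_pos / n_total
    ratio = n_neg / n_pos if n_pos else float("inf")
    return density, ratio


# Hypothetical toy data: 4 ncRNAs (rows) x 5 diseases (columns).
Y = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
density, ratio = association_stats(Y)
print(f"density = {density:.2f}, negatives per positive = {ratio:.1f}")
# prints: density = 0.15, negatives per positive = 5.7
```

Even in this tiny example there are several unlabeled pairs for every known association, and one ncRNA (the third row) has no known association at all.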


Thirdly, the similarity calculation in existing computational methods depends
excessively on known associations between ncRNAs and diseases. This can
introduce noticeable bias into computational models for predicting
ncRNA-disease associations. Therefore, similarity scores from different sources
of biological information need to be fused in a reasonable way to enhance
ncRNA-disease association prediction performance [10].
Fourthly, most existing computational methods are not applicable to isolated
diseases or ncRNAs (miRNAs or lncRNAs), that is, those that have no known
association with any ncRNA or disease in the examined datasets. So, it is
necessary to combine different biological information to improve the capability
of predicting ncRNA-disease associations in isolated cases.
Fifthly, many computational methods have too many parameters that need to be
tuned, which makes ncRNA-disease association prediction difficult to perform.
Researchers therefore need to develop computational methods that are easier to
employ in ncRNA-disease association prediction.
And finally, since more and more biological databases are becoming available,
data from multiple sources need to be fused effectively to enhance the
reliability and performance of prediction.
To date, numerous studies presenting new computational methods for
ncRNA-disease association prediction are published every week in scientific
journals and conferences. Many of them concentrate on solving the
above-mentioned limitations. Additionally, because selecting useful data from
heterogeneous information to build a reliable HIN is still a challenge, there
remains room for researchers to construct a reliable HIN and train a useful
computational method that achieves better performance in ncRNA-disease
association prediction [11]. Future research on computational methods for
ncRNA-disease association prediction can follow the directions below.


Firstly, the sparse data problem needs to be solved to enhance the reliability
of prediction. It can be addressed by selecting reasonable similarity
calculations and network representation methods, as well as reasonable and
meaningful pre-processing algorithms or methods that have already been applied
in other recent studies.
Secondly, future research needs to integrate different biological datasets to
construct more reasonable similarities and to reduce the impact of relying too
much on known ncRNA-disease associations. The performance and reliability of
computational prediction methods can thereby be enhanced.
Thirdly, computational methods from other domains, such as microbe-disease
association prediction, metabolite-disease association prediction, drug-disease
association prediction, drug-target prediction, and so on, can also be applied
to ncRNA-disease association prediction. Therefore, future research can borrow
computational methods from these areas and adapt them to attain better
performance in ncRNA-disease association prediction.
For these reasons, the Ph.D. student selected the topic “Link prediction in
heterogeneous information networks and its applications in predicting
associations between non-coding RNAs and diseases” for this dissertation.
Dissertation objective and research problem
Through this dissertation, the research focuses on proposing computational
methods or models that improve the performance of predicting human non-coding
RNA-disease associations on heterogeneous information networks by solving the
following problems:
- Solving the sparse data problem to improve the accuracy of human
  ncRNA-disease association prediction.
- Fusing multiple types of information from different biological datasets to
  obtain more realistic similarities and to decrease the impact of depending
  excessively on known human ncRNA-disease associations.
- Inheriting computational methods from other domains, such as the prediction
  of microbe-disease associations, drug-disease relationships, drug-target
  interactions and so forth, and improving them to achieve better performance
  in predicting human non-coding RNA-disease associations.
Research questions to be answered:
To solve the above problems and achieve the research objective, the following
research questions need to be answered.
The first question is "How to solve the sparse data problem?". To date,
several methods have been used to reduce the effects of the sparse data
problem. For example, collaborative filtering (CF) algorithms, in the context
of recommender systems, have been used in different studies to mitigate the
sparse data problem in revealing ncRNA-disease associations [13]. The weighted
K-nearest known neighbors (WKNKN) algorithm has been applied in different works
to pre-process data and reduce the number of unknown ncRNA-disease associations
[14]–[17]. It is based on the assumption that some unknown associations in the
datasets used to train the models may be true associations, whose likelihood
can be estimated by measuring the similarity of an ncRNA or disease to other
ncRNAs or diseases, respectively.
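The WKNKN idea described above can be sketched as follows. This is a minimal illustration of the procedure as it is commonly described in the literature, not necessarily the exact variant used in the cited works or in this dissertation. The function name `wknkn`, the parameters `K` (number of neighbors) and `eta` (decay factor), and the toy matrices are all assumptions made for the example.

```python
def wknkn(Y, S_row, S_col, K=2, eta=0.5):
    """Replace zeros in Y by neighbor-weighted association likelihoods in [0, 1].

    Y     : n x m binary association matrix (rows: ncRNAs, columns: diseases)
    S_row : n x n ncRNA similarity matrix
    S_col : m x m disease similarity matrix
    """
    n, m = len(Y), len(Y[0])

    def estimate(profiles, S):
        # Only entities with at least one known association can act as neighbors.
        known = [j for j, p in enumerate(profiles) if any(p)]
        width = len(profiles[0])
        out = []
        for i in range(len(profiles)):
            # K most similar "known" neighbors of entity i, excluding i itself.
            nbrs = sorted((j for j in known if j != i),
                          key=lambda j: S[i][j], reverse=True)[:K]
            z = sum(S[i][j] for j in nbrs)          # normalization term
            est = [0.0] * width
            if z > 0:
                for rank, j in enumerate(nbrs):
                    w = (eta ** rank) * S[i][j] / z  # weight decays with rank
                    for c in range(width):
                        est[c] += w * profiles[j][c]
            out.append(est)
        return out

    by_row = estimate(Y, S_row)                               # n x m
    by_col = estimate([list(c) for c in zip(*Y)], S_col)      # m x n
    # Average both views and never lower a known association.
    return [[max(Y[i][j], (by_row[i][j] + by_col[j][i]) / 2) for j in range(m)]
            for i in range(n)]


# Hypothetical toy data: 3 ncRNAs x 2 diseases; the second ncRNA has no known link.
Y = [[1, 0], [0, 0], [0, 1]]
S_row = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]]
S_col = [[1.0, 0.5], [0.5, 1.0]]
F = wknkn(Y, S_row, S_col, K=2, eta=0.5)
```

In this toy case the second ncRNA, which starts with an all-zero profile, receives non-zero likelihoods that are larger for the disease linked to its most similar neighbor, while the known associations keep the value 1.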
Therefore, the dissertation research can employ the CF or WKNKN
algorithms to solve the sparse data problem, depending on the biological
datasets used. However, integrating the biological datasets to construct
reasonable similarities among various data sources is also an issue. It
depends on the types of biological data selected to measure similarities.
Hence, the next question that the dissertation research has to answer is
"Which types of biological data should be used to obtain reasonable network
representations for predicting associations between non-coding RNAs and
diseases?". Recently, various studies have integrated biological information
from different data sources to measure the similarities among different
biological objects. For instance, Ding et al. [18] used information on
lncRNAs, diseases and genes to build a tripartite graph to forecast
lncRNA-disease associations. Yu et al. [13] relied on information on lncRNAs,
diseases and miRNAs and on a lncRNA-miRNA-disease network to reveal
lncRNA-disease associations. Other works used multi-type biological networks or
multi-omics data to forecast ncRNA-disease associations [19], [20]. Besides
that, each category of computational methods has its own issues. Therefore,
another question that the research has to answer is how to combine or integrate
multiple models and multi-type biological information to effectively overcome
the issues of single models and enhance prediction performance.
Thesis’s research scope and methodology:
To achieve the objective of proposing computational methods or models that
improve the performance of predicting human non-coding RNA-disease
associations on heterogeneous information networks, both theoretical and
experimental methodologies are employed in the thesis.
Theory research
For the theory research, firstly, the author reviewed the literature to obtain
background knowledge of heterogeneous information networks, the link prediction
problem, biological systems, biological objects, link prediction applications
in biological systems, and so on. Secondly, the computational methods for
predicting ncRNA-disease associations were reviewed and analyzed to understand
their strengths and to detect their weaknesses and problems. Finally, new
computational methods were proposed by combining, integrating and improving
different types of biological information and different computational methods
to address the detected weaknesses and problems.
Experimental research
To evaluate the performance of the computational methods proposed in the
dissertation, the author implemented them in the Python programming language
(including the PyCharm IDE, Python libraries…) using different biological
datasets. The prediction performance of the proposed methods was then compared
with that of other related approaches on the same experimental datasets.
Additionally, to support the reliability of the prediction performance, the
author also conducted case studies, checking whether the predicted
ncRNA-disease associations were confirmed in other biological literature or
databases.
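As an illustration of how such a comparison can be quantified, the sketch below computes the two metrics listed in the contents, AUC and AUPR, from a list of true labels and predicted scores. It is a minimal, dependency-free sketch: AUC is computed as the probability that a positive pair is ranked above a negative pair, and AUPR is approximated by average precision, which is one common estimate and may differ from the interpolation used in the dissertation. The function names and the toy labels and scores are assumptions.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at each rank where a positive is found.

    Tied scores are not treated specially in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    return total / hits


# Hypothetical held-out pairs: 1 = known association, 0 = unlabeled pair.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print(round(auc(labels, scores), 4), round(average_precision(labels, scores), 4))
# prints: 0.8333 0.8333
```

In a cross-validation setting, these functions would be applied to the scores of the associations hidden in each fold.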
Thesis’s scientific contributions:
The thesis makes the following scientific contributions:
- Contribution 1: Proposed an improved computational model that combines a
  collaborative filtering algorithm and a resource allocation process on a
  tripartite graph, using multiple types of known associations, to forecast
  ncRNA-disease associations.
- Contribution 2: Proposed a new miRNA-disease association prediction method
  that uses the WKNKN algorithm to pre-process data, decreasing the number of
  unverified associations in the miRNA-disease association dataset, and
  uncovers latent associations between miRNAs and diseases using an improved
  random walk with restart (RWR) algorithm and fusing multiple similarities
  from HINs.
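For readers unfamiliar with RWR, the sketch below shows the standard algorithm on a small weighted network. It is not the improved variant of Contribution 2, whose details are given in Chapter 3. The function name, the restart probability of 0.7, and the toy path graph are assumptions chosen for illustration.

```python
def random_walk_with_restart(A, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Standard RWR: p <- (1 - restart) * W p + restart * p0 until convergence.

    A     : n x n non-negative weight (e.g. similarity) matrix
    seeds : set of node indices where the walker restarts
    """
    n = len(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    # Column-normalize so that each column is a transition distribution.
    W = [[A[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: a path 0 - 1 - 2 - 3, walker restarting at node 0.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
p = random_walk_with_restart(A, seeds={0})
```

The stationary probabilities decrease with distance from the seed node, which is why they can be read as relevance scores when ranking candidate associations.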
Contribution 1 is presented in Chapter 2 of the dissertation. Related
contents of this contribution were published in the Proceedings of KSE2020
([VTN1], Scopus indexed), the BMC Medical Genomics journal in 2021 ([VTN2], ISI
Q2 journal), and the Proceedings of KSE2021 ([VTN3], Scopus indexed).
Contribution 2 is presented in Chapter 3 of the dissertation. Related
contents of this contribution were published in the Scientific Reports journal
(ISI Q1 journal) in 2021 [VTN4].
Thesis’s structure:
The dissertation outline is illustrated in Figure 0.1. It contains an
Introduction, three Chapters, and a Conclusion and Future Works. Each part of
the dissertation is briefly described as follows:
Introduction
In this section, an overview of heterogeneous information networks and link
prediction in heterogeneous information networks is first introduced. Next, the
importance of developing computational methods for ncRNA-disease association
prediction, as well as some limitations in predicting associations between
non-coding RNAs and diseases, are presented. Then, the thesis objective,
research problems and research questions are set out. After that, the thesis
scope and methodology are summarized, followed by the thesis’s scientific
contributions. Finally, the thesis structure is outlined.

Figure 0.1. The dissertation outline
Chapter 1. Background
In this chapter, firstly, some fundamentals of HINs, biological systems, and
non-coding RNAs are provided. Secondly, the problem of link prediction in HINs
and popular link prediction methods are summarized. Thirdly, an overview of
computational methods for ncRNA-disease association prediction, along with
their strengths and weaknesses, is given. Inspired by these strengths and
weaknesses, some research directions of the thesis are drawn. Finally, some
methods and metrics used to evaluate the prediction performance of the models
proposed in the next chapters are presented.
Chapter 2. NcRNA-disease associations prediction with collaborative
filtering and resource allocation process on a tripartite graph
In this chapter, firstly, some fundamentals of the CF algorithm and the
resource allocation process on a tripartite graph are introduced.
Secondly, a new computational model for non-coding RNA-disease association
prediction is proposed, which uses a CF algorithm and a resource allocation
process on a tripartite graph based on multi-type biological objects.
Finally, the newly proposed model is applied to two tasks, miRNA-disease
association prediction and lncRNA-disease association prediction, to
demonstrate that it outperforms other related methods. The proposed model can
also be considered a useful tool for inferring ncRNA-disease associations owing
to its high performance both in inferring potential associations between miRNAs
and diseases and in discovering new associations for new diseases (or ncRNAs)
without any known associations.
Chapter 3. MiRNA–disease associations prediction using improved random
walk with restart and integrating multiple similarities
In this chapter, a method named “Predicting miRNA–disease associations using
improved random walk with restart and integrating multiple similarities” is
presented. The proposed method uses the WKNKN algorithm as a pre-processing
step to address the sparse data issue. It also integrates multiple data sources
to increase prediction reliability. Besides that, it borrows an RWR method from
the microbe-disease association prediction field and improves the RWR process
to uncover latent


