
2015 Seventh International Conference on Knowledge and Systems Engineering

MVRM: A hybrid approach to predict siRNA efficacy

Bui Ngoc Thang
University of Engineering and Technology, Vietnam National University, Hanoi
144 Xuanthuy, Caugiay, Hanoi, Vietnam

Le Sy Vinh
University of Engineering and Technology, Vietnam National University, Hanoi
144 Xuanthuy, Caugiay, Hanoi, Vietnam

Ho Tu Bao
School of Knowledge Science, Japan Advanced Institute of Science and Technology
Abstract: The discovery of RNA interference (RNAi) has opened the way to designing novel drugs for different diseases. Selecting short interfering RNAs (siRNAs) that knock down target genes efficiently is one of the key tasks in studying RNAi. A number of predictive models have been proposed to predict the knockdown efficacy of siRNAs; however, their performance is still far from expectations. This work aims to develop a predictive model that improves siRNA knockdown efficacy prediction. The key idea is to combine the rule-based and the model-based approaches. To this end, views of siRNAs that integrate available siRNA design rules are first learned using an adaptive Fuzzy C-Means (FCM) algorithm. The learned views and other properties of siRNAs are then combined into final representations of siRNAs. The elastic net regression method is employed to learn a predictive model from these final representations. Experiments on benchmark datasets showed that the proposed method achieved stable and accurate results in comparison with other methods.

978-1-4673-8013-3/15 $31.00 © 2015 IEEE
DOI 10.1109/KSE.2015.29

I. INTRODUCTION

RNA interference (RNAi) is a cellular process in which long double-stranded RNA (dsRNA) duplexes or hairpin precursors are cleaved into short interfering RNAs (siRNAs) by the ribonuclease III enzyme Dicer. The siRNAs bind to the RNA-induced silencing complex (RISC) and are unwound into sense and antisense strands; the antisense strands then bind to their complementary target mRNAs and induce their degradation.

In 1998, Fire and Mello discovered the important role of dsRNAs when they studied RNAi in the nematode worm Caenorhabditis elegans (they were awarded the Nobel Prize in Physiology or Medicine in 2006 for their contributions to research on RNAi). Studies following the discovery of RNAi have had an immense impact on biomedical research and have made RNAi a valuable tool for designing novel medical applications [27], [7], [13], [25], [17], [10]. In RNAi research, synthesizing highly effective siRNAs is a crucial task in designing novel drugs for the treatment of diseases such as influenza A virus, HIV, hepatitis B virus, RSV, and cancer. As a consequence, siRNA-based silencing is considered one of the most promising techniques for future therapy, and predicting the knockdown efficacy of siRNAs is an essential problem for effective siRNA selection [39], [40], [28], [31], [32], [33], [34], [35].

A number of algorithms have been proposed to design and predict effective siRNAs. They can be categorized into two approaches: the rule-based approach and the model-based approach [14], [18], [23].

The rule-based approach proposes different rules for generating effective siRNAs. These rules were empirically designed and examined on small datasets. The first rational siRNA design rule was reported by Elbashir et al. [6]. They suggested that siRNAs of 19–21 nt with 2-nt overhangs at the 3' ends can efficiently degrade target mRNAs. Scherer et al. [22] found that thermodynamic properties are important characteristics for designing effective siRNAs that inhibit specific target mRNAs. Soon after, various rational design rules for generating effective siRNAs were proposed [21], [30], [1], [15], [37], [38]. For example, Ui-Tei and colleagues [30] examined 72 siRNAs targeting six genes and discovered four criteria for effective siRNA design: (i) A or U at position 19; (ii) G or C at position 1; (iii) at least five U or A residues in positions 13–19; (iv) no GC stretch longer than 9 nt. Amarzguioui and co-workers [1] analyzed 46 siRNAs targeting genes and reported the following rule of six criteria for effective siRNA design: (i) ΔT3 = T3 − T5, the difference between the number of A/U residues in the three terminal positions at the 3' end and at the 5' end (relative to the sense strand of the siRNA), with ΔT3 > 1 positively correlated; (ii) a G or C residue at position 1, positively correlated; (iii) a U residue at position 1, negatively correlated; (iv) an A residue at position 6, positively correlated; (v) A or U at position 19, positively correlated; (vi) G at position 19, negatively correlated.

However, the rule-based approach is not satisfactory. About 65% of siRNAs generated by these rules failed when experimentally tested: they did not reach 90% inhibition, and nearly 20% of them were inactive [20]. The main reason is that siRNA design rules were empirically derived from small datasets and from siRNAs synthesized for specific genes. Therefore, individual rules are generally poor at designing highly effective siRNAs.

The model-based approach comprises predictive models learned from larger datasets by different machine learning techniques. Predictive models are more accurate and reliable than the rule-based approach [24]. For example, Huesken and co-workers [12] proposed a new algorithm, BIOPREDsi, by applying artificial neural networks to a dataset of 2431 scored siRNAs. This dataset has been widely used as a benchmark to train and test other predictive models such as ThermoComposition21 [24], DSIR [28], i-Score [14] and the Scales models [36]. These predictive models are currently regarded as the best predictors [18], [36]. More recently, Sciabola et al. [23] employed three-dimensional structural information of siRNAs to increase the performance of their model. A stable predictive model called BiLTR [3] was developed to predict the knockdown efficacy of siRNAs.
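As a concrete illustration of the rule-based approach, the four Ui-Tei criteria quoted earlier in this section can be checked mechanically. The sketch below is only an illustration under stated assumptions: positions are taken to be 1-based on a 19-nt sense strand, and the function names `uitei_criteria` and `passes_uitei` are hypothetical, not part of any published tool.

```python
def uitei_criteria(sense: str) -> dict:
    """Check the four Ui-Tei criteria, as summarized in the text, on a 19-nt strand.

    Assumption: positions are 1-based and refer to the sense strand.
    """
    seq = sense.upper().replace("T", "U")
    if len(seq) != 19 or set(seq) - set("ACGU"):
        raise ValueError("expected a 19-nt RNA sequence over A, C, G, U")

    # Longest run of consecutive G/C residues anywhere in the sequence.
    longest_gc = run = 0
    for nt in seq:
        run = run + 1 if nt in "GC" else 0
        longest_gc = max(longest_gc, run)

    return {
        "au_at_19": seq[18] in "AU",                                  # criterion (i)
        "gc_at_1": seq[0] in "GC",                                    # criterion (ii)
        "au_rich_13_19": sum(nt in "AU" for nt in seq[12:19]) >= 5,   # criterion (iii)
        "no_long_gc_stretch": longest_gc <= 9,                        # criterion (iv)
    }


def passes_uitei(sense: str) -> bool:
    """True only if all four criteria hold."""
    return all(uitei_criteria(sense).values())
```

A rule of this kind gives a yes/no answer per criterion, which is exactly the information that the model-based approach has to go beyond in order to predict a continuous efficacy value.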

Although model-based methods are better than rule-based methods, they suffer from some drawbacks. Their performance is still low and unstable. The predictive ability of these models decreases considerably and varies when they are tested on independent datasets, as shown by the performance of 18 current models tested on three independent datasets [23]. Our analyses reveal two main reasons: (1) siRNA datasets were provided by different groups under different protocols in different scenarios [16], [41], so the distributions of these datasets are very different and siRNA data are heterogeneous. (2) The performance of machine learning methods also depends heavily on the choice of data representation (or features) to which they are applied. In previous models, siRNAs were encoded by binary, spectral, tetrahedron, and sequence representations. However, because of the diversity of siRNA distributions and unsuitable measures based on these representations, they can be inappropriate for representing siRNAs in order to build a good model for predicting siRNA efficacy.

In this paper, we develop a hybrid approach, named MVRM, to predict siRNA knockdown efficacy. The method combines design rules and machine learning methods to build a predictive model. To this end, we focus on the representation of siRNAs. Available siRNA design rules are considered as prior background knowledge for generating views to represent siRNAs. Each view captures the characteristics of one siRNA design rule. These views are then learned by exploiting the fuzzy C-means algorithm. A new representation of siRNAs is composed of the learned views and other properties of siRNAs such as melting temperature, molecular weight and thermodynamic values. After transforming siRNAs to the new representation, a predictive model is learned by applying a regularized regression method, Elastic Net, to predict the knockdown efficacy of siRNAs.

Our method is experimentally compared with other methods on benchmark datasets. The experiments show promising results: the performance of MVRM is comparable to or better than that of other methods.

II. METHODS

Our model, MVRM, is a hybrid of the rule-based and the model-based approaches, so it consists of two main phases: learning siRNA views from design rules to build new representations of siRNAs, and building a predictive model from these new representations to predict the knockdown efficacy of siRNAs.

A. Learning siRNA views

We are given a dataset of m siRNA sequences S = {s_1, s_2, ..., s_m} with the same length n. The knockdown efficacy of sequence s_i ∈ S is e_i (i = 1 ... m). A set of k design rules R = {r_1, r_2, ..., r_k} is collected from previous rule-based studies. Learning siRNA views includes four steps: encoding siRNAs by content, encoding rules to views, learning siRNA views, and encoding siRNAs by learned views.

Encoding siRNAs by content: Each siRNA is a sequence of n nucleotides such as "GAAAGGAAUUGUAUAAAUC". There are five well-studied characteristics of an siRNA [26]: (1) GC content; (2) the difference in the number of A/U residues in the 3 terminal nucleotides at the two ends (ΔT); (3) GC stretch; (4), (5) the number of A/U residues at five and seven positions of the 5' end of the antisense strand.

This step encodes each siRNA sequence s_i (i = 1 ... m) by a matrix M_i of size 4 × (n + 5), in which the 4 rows represent the 4 nucleotide types and the (n + 5) columns represent the n nucleotides and the 5 siRNA characteristics. The first n columns represent the n nucleotides, i.e., column c (c = 1 ... n) is a binary vector of size 4 × 1 representing the nucleotide at position c of the siRNA sequence. Specifically, the four nucleotides A, C, G, and U are encoded by the vectors (1, 0, 0, 0)^T, (0, 1, 0, 0)^T, (0, 0, 1, 0)^T and (0, 0, 0, 1)^T, respectively. The last five columns of the matrix represent the five characteristics of the siRNA. They are computed and encoded as vectors as described in Table I. The encoding matrix M of the 19-nucleotide siRNA sequence "GAAAGGAAUUGUAUAAAUC" is shown in Table II.

TABLE I. THE FIVE WELL-STUDIED CHARACTERISTICS OF siRNA SEQUENCES

Property                                 Condition          Encoding
GC content                               From 0.3 to 0.6    (1,0,-1,-1) at column (n + 1)
                                         Otherwise          (0,1,-1,-1) at column (n + 1)
ΔT                                       >= 1               (1,0,-1,-1) at column (n + 2)
                                         Otherwise          (0,1,-1,-1) at column (n + 2)
GC stretch                               >= 9               (1,0,-1,-1) at column (n + 3)
                                         Otherwise          (0,1,-1,-1) at column (n + 3)
A/Us at five positions of the 5' end     >= 3               (1,0,-1,-1) at column (n + 4)
                                         Otherwise          (0,1,-1,-1) at column (n + 4)
A/Us at seven positions of the 5' end    >= 5               (1,0,-1,-1) at column (n + 5)
                                         Otherwise          (0,1,-1,-1) at column (n + 5)

Encoding design rules to views: This step encodes each design rule r_i (i = 1 ... k) by a matrix T_i (view T_i) of size 4 × (n + 5), in which the 4 rows represent the 4 nucleotide types and the (n + 5) columns represent the n nucleotides and the 5 siRNA characteristics. Column j (j = 1 ... n) of the matrix gives the knockdown efficacy of nucleotides A, C, G, and U at position j. The last five columns describe the knockdown efficacy of the five siRNA characteristics.

The knockdown efficacy values in view T_i have to satisfy the constraints of the siRNA design rule. The design rule r_i propositionally describes the occurrence or absence of nucleotides at different positions of effective siRNAs, together with the other siRNA characteristics mentioned above. Thus, if design rule r_i states the occurrence (or absence) of some nucleotides at position j, then their corresponding values in view T_i must be greater (or smaller) than the other values in column j. Similarly, if design rule r_i refers to the j-th characteristic, the corresponding value in column (n + j) of matrix T_i must be greater than the other values in that column.
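The content encoding (one-hot nucleotide columns followed by the five characteristic columns of Table I) can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes that ΔT is the A/U count of the last three sense-strand nucleotides minus that of the first three, that the 5' end of the antisense strand corresponds to the 3' end of the sense strand, and that "GC stretch" means the longest run of consecutive G/C residues. The names `encode_by_content`, `count_au` and `longest_gc_stretch` are hypothetical.

```python
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "U": (0, 0, 0, 1)}
SATISFIED = (1, 0, -1, -1)   # Table I: the condition holds
OTHERWISE = (0, 1, -1, -1)   # Table I: the condition does not hold


def count_au(fragment):
    """Number of A or U residues in a sequence fragment."""
    return sum(nt in "AU" for nt in fragment)


def longest_gc_stretch(seq):
    """Length of the longest run of consecutive G/C residues."""
    best = run = 0
    for nt in seq:
        run = run + 1 if nt in "GC" else 0
        best = max(best, run)
    return best


def encode_by_content(seq):
    """Return the 4 x (n + 5) matrix M of a sense-strand sequence as a list of 4 rows."""
    gc_content = sum(nt in "GC" for nt in seq) / len(seq)
    delta_t = count_au(seq[-3:]) - count_au(seq[:3])   # assumed: 3' end minus 5' end
    conditions = [
        0.3 <= gc_content <= 0.6,      # column n + 1
        delta_t >= 1,                  # column n + 2
        longest_gc_stretch(seq) >= 9,  # column n + 3
        count_au(seq[-5:]) >= 3,       # column n + 4 (assumed: 3' end of the sense strand)
        count_au(seq[-7:]) >= 5,       # column n + 5
    ]
    columns = [ONE_HOT[nt] for nt in seq]
    columns += [SATISFIED if ok else OTHERWISE for ok in conditions]
    # Transpose the list of 4-element columns into 4 rows (A, C, G, U).
    return [list(row) for row in zip(*columns)]
```

Under these assumptions, `encode_by_content("GAAAGGAAUUGUAUAAAUC")` yields a 4 × 24 matrix whose last five columns are three "otherwise" vectors followed by two "satisfied" vectors.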


TABLE II. THE ENCODING MATRIX M OF siRNA SEQUENCE GAAAGGAAUUGUAUAAAUC. THE FIRST 19 COLUMNS ENCODE THE 19 NUCLEOTIDES OF THE siRNA SEQUENCE. THE LAST 5 COLUMNS ENCODE THE 5 CHARACTERISTICS OF THE siRNA SEQUENCE.

Position             1   2   3   4   5  ...  18  19  20  21  22  23  24
siRNA                G   A   A   A   G  ...   U   C
Encoding matrix M    0   1   1   1   0  ...   0   0   0   0   0   1   1
                     0   0   0   0   0  ...   0   1   1   1   1   0   0
                     1   0   0   0   1  ...   0   0  -1  -1  -1  -1  -1
                     0   0   0   0   0  ...   1   0  -1  -1  -1  -1  -1

For example, consider a rule r and its encoding matrix T. Suppose the design rule states that at position 19 nucleotide A is effective and nucleotide C is ineffective. This means that the knockdown efficacy of nucleotide A is larger than that of the other nucleotides and the knockdown efficacy of nucleotide C is smaller than that of the other nucleotides. T[1,19], T[2,19], T[3,19], and T[4,19] are the knockdown efficacies of A, C, G, and U, respectively. The rule at position 19 can be expressed as specific constraints on matrix T as follows:

• T[2,19] − T[1,19] < 0, i.e., A is more effective than C
• T[3,19] − T[1,19] < 0, i.e., A is more effective than G
• T[4,19] − T[1,19] < 0, i.e., A is more effective than U
• T[2,19] − T[3,19] < 0, i.e., C is less effective than G
• T[2,19] − T[4,19] < 0, i.e., C is less effective than U

Let G_i (i = 1 ... k) be the set of specific constraints of rule r_i on matrix T_i, where each constraint in G_i has the form T_i[p, j] − T_i[q, j] < 0 with rows p, q = 1 ... 4 and column j = 1 ... n + 5.

Learning views: The siRNA set {s_1, s_2, ..., s_m} serves as the training set to learn k views (i.e., to optimize k matrices T_1, ..., T_k). Learning views can be considered as a clustering problem [5] in which the k matrices are the centers of k clusters. Each encoding matrix M_i of siRNA s_i is assigned to view T_j with a membership value u_ij (i = 1 ... m; j = 1 ... k). This means that siRNA sequences can be generated by different views with different confidences.

We employ the FCM algorithm [2] with k clusters to optimize the k views (matrices) and the membership values by minimizing the following objective function:

$$ R = \sum_{i=1}^{m} \sum_{j=1}^{k} u_{ij}^{2} \, \lVert M_i - T_j \rVert_{Fro}^{2} \qquad (1) $$

subject to:
1) the k constraint sets G_1, ..., G_k, where G_i is the set of specific constraints of rule r_i on matrix T_i;
2) $\sum_{j=1}^{k} u_{ij} = 1$, i = 1, ..., m,

where ||·||_Fro is the Frobenius norm of a matrix. The membership values and the matrices can be solved by an iterative method in which each column of a matrix is updated while the other columns are kept fixed. The final solution is computed as follows:

$$ u_{ij} = \frac{1}{\sum_{z=1}^{k} \left( \frac{\lVert M_i - T_j \rVert_{Fro}}{\lVert M_i - T_z \rVert_{Fro}} \right)^{2}} \qquad (2) $$

$$ T_j[., c] = \frac{\sum_{i=1}^{m} u_{ij}^{2} \, M_i[., c]}{\sum_{i=1}^{m} u_{ij}^{2}} \qquad (3) $$

where T_j[., c] is the vector corresponding to the c-th column of matrix T_j.

Algorithm 1 describes the two steps: computing the membership values of the encoding matrices and updating the matrices corresponding to the views. These two steps are repeated until the membership values and the views meet the convergence criteria.

Encoding siRNAs by views: To obtain the final representation of siRNAs, the learned views are linearly combined and other properties of siRNAs are added. In particular, nucleotides A, C, G, and U at position c (c = 1 ... n) are represented by the vectors (T_1[1, c], ..., T_k[1, c]), (T_1[2, c], ..., T_k[2, c]), (T_1[3, c], ..., T_k[3, c]), and (T_1[4, c], ..., T_k[4, c]), respectively. If the GC content of an siRNA satisfies its condition (see Table I), it is represented by the vector (T_1[1, n + 1], ..., T_k[1, n + 1]); otherwise, it is represented by (T_1[2, n + 1], ..., T_k[2, n + 1]). The four other characteristics of siRNAs are encoded in a similar way. In short, each siRNA sequence is encoded by a vector of length k × (n + 5). Moreover, five other properties of siRNAs (melting temperature, molecular weight, and three thermodynamic properties: enthalpy, entropy, and free energy) are added to the final representation. They are calculated by the nearest-neighbor method [48]. As a result, each siRNA is encoded by a vector of length k × (n + 5) + 5.

B. Learning a predictive model

This step builds a predictive model using the new representation of siRNAs. The elastic net method [49] is applied to build the model for predicting the knockdown efficacy of siRNAs. This method not only builds the model but also selects the important features that affect the target label. In particular, thanks to the lasso regularization term of the elastic net, the significant variables, i.e., the important characteristics that influence the knockdown efficacy of siRNAs, are detected.

III. EXPERIMENTAL EVALUATION

This section presents an experimental evaluation comparing the proposed method MVRM (multiple view based regression model) with recent methods for siRNA knockdown efficacy prediction on four benchmark datasets:



• The Huesken dataset of 2431 siRNA sequences targeting 34 human and rodent mRNAs, commonly divided into the training set HU train of 2182 siRNAs and the testing set HU test of 249 siRNAs [12].
• The Reynolds dataset of 240 siRNAs [21].
• The Vicker dataset of 76 siRNA sequences targeting two genes [29].
• The Harborth dataset of 44 siRNA sequences targeting one gene [9].

We employed five siRNA design rules (k = 5) to learn five views of siRNAs. Specifically, the five design rules are the Reynolds, Ui-Tei, Amarzguioui, Jagla, and Hsieh rules [21], [30], [1], [15], [11]. The HU train set was used to learn these views and the MVRM model. The other datasets were used for comparative evaluation.
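The model-fitting step of Section II-B (an elastic net regression on the final siRNA representation) can be illustrated with a small, dependency-free sketch. It is a generic coordinate-descent solver, not the authors' implementation. The function names, the `(lam, alpha)` parameterization of the penalty and the omission of an intercept (features and targets are assumed to be centered) are all assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exact zero inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_fit(X, y, lam=0.1, alpha=0.5, n_sweeps=200, tol=1e-10):
    """Coordinate descent for the (assumed) objective
        (1 / 2n) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).
    No intercept is fitted, so X and y are assumed to be centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)                      # y - X @ beta, starting from beta = 0
    for _ in range(n_sweeps):
        largest_change = 0.0
        for j in range(p):
            column = [row[j] for row in X]
            z = sum(v * v for v in column) / n
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(v * (r + v * beta[j]) for v, r in zip(column, residual)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                residual = [r - v * delta for v, r in zip(column, residual)]
                beta[j] = new
                largest_change = max(largest_change, abs(delta))
        if largest_change < tol:
            break
    return beta
```

The lasso part of the penalty is what sets coefficients of uninformative features exactly to zero, which is the feature-selection behaviour the text relies on; in practice `lam` would be chosen by cross validation.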

TABLE III. THE R VALUES OF 18 MODELS AND MVRM ON THREE INDEPENDENT DATA SETS

Algorithm           R_Reynolds    R_Vicker     R_Harborth
                    (244si/7g)    (76si/2g)    (44si/1g)
GPboot [42]         0.55          0.35         0.43
Ui-Tei [30]         0.47          0.58         0.31
Amar [1]            0.45          0.47         0.34
Hsieh [11]          0.03          0.15         0.17
Takasaki [43]       0.03          0.25         0.01
Reynolds 1 [21]     0.35          0.47         0.23
Reynolds 2 [21]     0.37          0.44         0.23
Schwarz [37]        0.29          0.35         0.01
Khvorova [44]       0.15          0.19         0.11
Stockholm 1 [45]    0.05          0.18         0.28
Stockholm 2 [45]    0.00          0.15         0.41
Tree [45]           0.11          0.43         0.06
Luo [46]            0.33          0.27         0.40
i-score [14]        0.54          0.58         0.43
BIOPREDsi [12]      0.53          0.57         0.51
DSIR [28]           0.54          0.49         0.51
Katoh [47]          0.40          0.43         0.44
SVM [23]            0.54          0.52         0.54
BiLTR [3]           0.57          0.58         0.57
MVRM model          0.6           0.614        0.52

Fig. 2. Coefficients of the MVRM model show the importance of 125 features

Algorithm 1 Multi-view Learning
Input: A dataset S = {s_1, s_2, ..., s_m}, where s_i (i = 1 ... m) are siRNA sequences of length n; a set R of k design rules; tMax, the maximum number of iterations.
Output: k matrices (views) T_1, T_2, ..., T_k.
main
  Encode siRNAs by content.
  for r_i in R do
    Form the set of constraints G_i based on r_i.
    Initialize the view T_i satisfying G_i.
  end for
  t = 0 {Iterative step}
  repeat
    t ← t + 1
    {Compute membership values as follows}
    for i = 1 to m do
      for j = 1 to k do
        Compute u_ij^(t) using equation (2)
      end for
    end for
    {Update views as follows}
    for j = 1 to k do
      for c = 1 to n + 5 do
        Compute T_j[., c]^(t) using equation (3)
        if (T_j[., c]^(t) satisfies the constraints G_j) then
          T_j[., c] ← T_j[., c]^(t)
        end if
      end for
    end for
  until ( $\sum_{p=1}^{k} \frac{\lVert T_p^{(t)} - T_p^{(t-1)} \rVert_{Fro}}{\lVert T_p^{(t-1)} \rVert_{Fro}} \le \epsilon_1$ and $\sum_{p=1,q=1}^{p=k,q=m} \frac{|u_{qp}^{(t)} - u_{qp}^{(t-1)}|}{u_{qp}^{(t-1)}} \le \epsilon_2$ ) or (t > tMax)
end main
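The two alternating steps of Algorithm 1 can be sketched in plain Python. This is a simplified illustration rather than the authors' implementation: matrices are stored as lists of columns, constraints are hypothetical triples `(p, q, c)` meaning "entry p is smaller than entry q in column c" (0-based), and the stopping rule is reduced to the largest change in membership values, which is an assumption. The fuzzifier is fixed at 2, as in equations (1) to (3).

```python
import math


def frobenius_distance(A, B):
    """Frobenius norm of A - B; each matrix is a list of columns."""
    return math.sqrt(sum((a - b) ** 2 for ca, cb in zip(A, B) for a, b in zip(ca, cb)))


def memberships(M, views, eps=1e-12):
    """Equation (2): u[i][j] for every encoded siRNA M[i] and every view views[j]."""
    u = []
    for Mi in M:
        d = [max(frobenius_distance(Mi, T), eps) for T in views]   # eps avoids division by 0
        u.append([1.0 / sum((dj / dz) ** 2 for dz in d) for dj in d])
    return u


def satisfies(column, c, constraints):
    """Check every constraint (p, q, col) with col == c, i.e. column[p] < column[q]."""
    return all(column[p] < column[q] for p, q, col in constraints if col == c)


def update_views(M, views, u, constraint_sets):
    """Equation (3); a recomputed column is accepted only if it satisfies the rule constraints."""
    for j, T in enumerate(views):
        weights = [u[i][j] ** 2 for i in range(len(M))]
        total = sum(weights)
        for c in range(len(T)):
            new_col = [sum(w * Mi[c][r] for w, Mi in zip(weights, M)) / total
                       for r in range(len(T[c]))]
            if satisfies(new_col, c, constraint_sets[j]):
                T[c] = new_col


def learn_views(M, views, constraint_sets, t_max=100, tol=1e-6):
    """Alternate the two steps until memberships stabilize; views are updated in place."""
    u = memberships(M, views)
    for _ in range(t_max):
        update_views(M, views, u, constraint_sets)
        new_u = memberships(M, views)
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return views, u
```

Rejecting a column that violates its constraints keeps each view consistent with the design rule it was initialized from, at the cost of possibly leaving that column at its previous value.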

Fig. 1. Upper and lower curves of mean squared error as a function of λ values

The tuning parameter of the objective function of the model was estimated by 10-fold cross validation. Figure 1 shows the curves of the upper and lower bounds of the mean squared error between predicted efficacy and experimental efficacy through cross validation. We used five design rules and five other properties of siRNAs in learning our model, so the final representation has 5 × 24 + 5 = 125 features. After learning the model, 78 important features that influence the knockdown efficacy of siRNAs were chosen. Figure 2 shows the influence of the 125 features. During the learning process, the coefficients of less important features are driven to zero. Based on the coefficients of the MVRM model, important features can easily be selected in order to design effective siRNAs.

The MVRM model was compared with most state-of-the-art methods. For a fair comparison, we carried out experiments on MVRM under the same conditions as reported for the other methods. Concretely, the comparative evaluation is as follows:

1) Comparison of MVRM with BIOPREDsi [12], ThermoComposition21 [24], DSIR [28], SVM [23], and BiLTR [3] when trained on the HU train set and tested on the HU test set. The Pearson correlation coefficients of those five models are 0.66, 0.66, 0.67, 0.80, and 0.67, respectively. The performance of MVRM estimated on the HU test set is 0.66. The performance of the MVRM model is similar to that of the other models but lower than that of the SVM model. The reason is that the SVM model uses positional features and 3D information. The 3D features capture the flexibility and strain of siRNAs, which can be important characteristics for the siRNAs of the HU test set, which were extracted from human NCI-H1299 and HeLa genes and rodent genes [12].

2) Comparison of MVRM with 19 models, including BIOPREDsi, DSIR, SVM, and BiLTR, when all models were trained on the HU train set and tested on the three independent datasets of Reynolds, Vicker and Harborth. The Pearson correlation coefficients of the MVRM model are 0.6, 0.614, and 0.52 when tested on the Reynolds, Vicker and Harborth datasets, respectively. Table III shows that MVRM achieved considerably higher results than the first 17 models. It was better than the SVM and BiLTR models when tested on the first two datasets. MVRM was not as good as BiLTR on the Harborth dataset. However, one limitation of the BiLTR model is the computational cost of training transformation matrices and parameters: it took about 5 days to train BiLTR but only about five minutes to train the MVRM model. Besides that, unlike most other models, the MVRM model produces stable results across the independent siRNA datasets.

In these comparative studies, the performance of MVRM was found to be more stable and higher than that of other models. The reason is that previous siRNA representations can be unsuitable for representing siRNAs provided by different groups under different protocols. In our proposed method, the representation is enriched by incorporating background knowledge of siRNA design rules. Therefore, it can capture the distribution diversity of siRNA data.

As presented in the experimental comparative evaluation, MVRM achieved better results than most other methods in predicting siRNA knockdown efficacy.

IV. CONCLUSION

In this paper, we have proposed a stable and accurate method to predict the knockdown efficacy of siRNA sequences. In the model, to enrich the siRNA representation, views of siRNAs are constructed and learned by incorporating background knowledge of available design rules. By combining these views, an appropriate siRNA representation is developed to represent siRNAs belonging to different distributions that are provided by research groups under different protocols.

The experimental comparative evaluation on commonly used datasets with standard evaluation procedures in different contexts shows that the proposed method achieved promising results. There are some reasons for that. First, it is expensive to experimentally analyze the knockdown efficacy of siRNAs, and thus most available datasets are relatively small, leading to limited results. Second, MVRM has the advantage of incorporating domain knowledge (siRNA design rules) experimentally found from different datasets. Third, MVRM is generic and can easily be extended when new design rules are discovered. When our proposed model was tested on the three independent datasets generated by different empirical experiments, MVRM achieved the best results on the Reynolds and Vicker datasets. Additionally, the performance of the MVRM model is higher than that of the other models except the SVM and BiLTR models when tested on the Harborth dataset (Table III).

ACKNOWLEDGMENT

Bui Ngoc Thang and Le Sy Vinh are financially supported by the Vietnam National Foundation for Science and Technology (102.01-2013.04).

REFERENCES

[1] Amarzguioui M, Prydz H, An algorithm for selection of functional siRNA sequences, Biochem Biophys Res Commun, 316:1050–8, 2004.
[2] Bezdek JC, Ehrlich R, Full W, FCM: The fuzzy c-means clustering algorithm, Computers & Geosciences, 10(2):191–203, 1984.
[3] Bui TN, Ho TB, Tatsuo K, A semi-supervised tensor regression model for siRNA efficacy prediction, BMC Bioinformatics, 16:80, 2015.
[4] Chang PC, Pan WJ, Chen CW, Chen YT, Chu YW, A design engine of siRNA that integrates SVMs prediction and feature filters, Biocatalysis and Agricultural Biotechnology, 1:129–134, 2012.
[5] Chang X., Dacheng T., Chao X., A Survey on Multi-view Learning, CoRR abs/1304.5634, 2013.
[6] Elbashir SM, Lendeckel W, Tuschl T, RNA interference is mediated by 21- and 22-nucleotide RNAs, Genes Dev, 15:188–200, 2001.
[7] Elbashir SM, Harborth J, Lendeckel W, Yalcin A, Klaus W, Tuschl T, Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells, Nature, 411:494–498, 2001.
[8] Gong W, Ren Y, Xu Q, Wang Y, Lin D, Zhou H, Li T, Integrated siRNA design based on surveying of features associated with high RNAi effectiveness, BMC Bioinformatics, 7:516, 2006.
[9] Harborth J, Elbashir SM, Vandenburgh K, Manninga H, Scaringe SA, Weber K, Tuschl T, Sequence, chemical, and structural variation of small interfering RNAs and short hairpin RNAs and the effect on mammalian gene silencing, Antisense Nucleic Acid Drug Dev, 13:83–105, 2003.
[10] Hannon GJ, Rossi JJ, Unlocking the potential of the human genome with RNA interference, Nature, 431:371–378, 2004.
[11] Hsieh AC, Bo R, Manola J, Vazquez F, Bare O, Khvorova A, Scaringe S, Sellers WR, A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants of gene silencing for use in cell-based screens, Nucleic Acids Res, 32:893–901, 2004.
[12] Huesken D, Lange J, Mickanin C, Weiler J, Asselbergs F, Warner J, Mellon B, Engel S, Rosenberg A, Cohen D, Labow M, Reinhardt M, Natt F, Hall J, Design of a Genome-Wide siRNA Library Using an Artificial Neural Network, Nature Biotechnology, 23:995–1001, 2005.
[13] Hutvagner G, McLachlan J, Balint E, Tuschl T, Zamore PD, A cellular function for the RNA interference enzyme Dicer in small temporal RNA maturation, Science, 293:834–838, 2001.
[14] Ichihara M, Murakumo Y, Masuda A, Matsuura T, Asai N, Jijiwa M, Ishida M, Shinmi J, Yatsuya H, Qiao S et al., Thermodynamic instability of siRNA duplex is a prerequisite for dependable prediction of siRNA activities, Nucleic Acids Res, 35:e123, 2007.
[15] Jagla B, Aulner N, Kelly PD, Song D, Volchuk A, Zatorski A, Shum D, Mayer T, De Angelis DA, Ouerfelli O, Rutishauser U, Rothman JE, Sequence characteristics of functional siRNAs, RNA, 11:864–872, 2005.
[16] Klingelhoefer JW, Moutsianas L, Holmes CC, Approximate Bayesian feature selection on a large meta-dataset offers novel insights on factors that effect siRNA potency, Bioinformatics, 25:1594–1601, 2009.
[17] Meister G, Tuschl T, Mechanisms of gene silencing by double-stranded RNA, Nature, 431:343–349, 2004.
[18] Mysara M, Elhefnawi M, Garibaldi JM, MysiRNA: improving siRNA efficacy prediction using a machine-learning model combining multi-tools and whole stacking energy, J Biomed Inform, 45:528–34, 2012.
[19] Qiu S, Lane T, A Framework for Multiple Kernel Support Vector Regression and Its Applications to siRNA Efficacy Prediction, IEEE/ACM Trans Comput Biology Bioinform, 6:190–199, 2009.


[20] Ren Y, Gong W, Xu Q, Zheng X, Lin D, et al., siRecords: an extensive database of mammalian siRNAs with efficacy ratings, Bioinformatics, 22:1027–1028, 2006.
[21] Reynolds A, Leake D, Boese Q, Scaringe S, Marshall WS, Khvorova A, Rational siRNA design for RNA interference, Nat Biotechnol, 22:326–330, 2004.
[22] Scherer LJ, Rossi JJ, Approaches for the sequence-specific knockdown of mRNA, Nat Biotechnol, 21:1457–1465, 2003.
[23] Sciabola S, Cao Q, Orozco M, Faustino I, Stanton RV, Improved nucleic acid descriptors for siRNA efficacy prediction, Nucl Acids Res, 41:1383–1394, 2013.
[24] Shabalina SA, Spiridonov AN, Ogurtsov AY, Computational models with thermodynamic and composition features improve siRNA design, BMC Bioinformatics, 7:65, 2006.
[25] Sudarsana LR, Sarojamma V, Ramakrishna V, Future of RNAi in medicine: a review, World J Med Sci, 2:1–14, 2007.
[26] Takasaki S, Methods for Selecting Effective siRNA Target Sequences Using a Variety of Statistical and Analytical Techniques, Methods Mol Biol, 942:17–55, 2013.
[27] Tuschl T, Zamore PD, Lehmann R, Bartel DP, Sharp PA, Targeted mRNA degradation by double-stranded RNA in vitro, Genes Dev, 13:3191–3197, 1999.
[28] Vert JP, Foveau N, Lajaunie C, Vandenbrouck Y, An accurate and interpretable model for siRNA efficacy prediction, BMC Bioinformatics, 7:520, 2006.
[29] Vickers TA, Koo S, Bennett CF, Crooke ST, Dean NM, Baker BF, Efficient reduction of target RNAs by small interfering RNA and RNase H-dependent antisense agents. A comparative analysis, J Biol Chem, 278:7108–7118, 2003.
[30] Ui-Tei K, Naito Y, Takahashi F, Haraguchi T, Ohki-Hamazaki H, Juni A, Ueda R, Saigo K, Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference, Nucleic Acids Res, 32:936–948, 2004.
[31] Ui-Tei K, Optimal choice of functional and off-target effect-reduced siRNAs for RNAi therapeutics, Front Genet, 4:107, 2013.
[32] Angart P, Vocelle D, Chan C, Walton SP, Design of siRNA therapeutics from the molecular scale, Pharmaceuticals, 6:440–468, 2013.
[33] Gavrilov K, Saltzman WM, Therapeutic siRNA: principles, challenges, and strategies, Yale J Biol Med, 85:187–200, 2012.
[34] Mutisya D, Selvam C, Lunstad BD, Pallan PS, Haas A, Leake D, Egli M, Rozners E, Amides are excellent mimics of phosphate internucleoside linkages and are well tolerated in short interfering RNAs, Nucleic Acids Res, 42(10):6542–51, 2014.
[35] Deng Y, Wang CC, Choy KW, Du Q, Chen J, Wang Q, Li L, Chung TK, Tang T, Therapeutic potentials of gene silencing by RNA interference: principles, challenges, and new strategies, Gene, 538(2):217–27, 2014.
[36] Matveeva O, Nechipurenko Y, Rossi L, Moore B, Ogurtsov AY, Atkins JF, et al., Comparison of approaches for rational siRNA design leading to a new efficient and transparent method, Access, 35:1–10, 2007.
[37] Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD, Asymmetry in the assembly of the RNAi enzyme complex, Cell, 115(2):199–208, 2003.
[38] Khvorova A, Reynolds A, Jayasena SD, Functional siRNAs and miRNAs exhibit strand bias, Cell, 115(2):209–216, 2003.
[39] Schramm G, Ramey R, siRNA design including secondary structure target site prediction, Nature Medicine, 2(8). doi: 10.1038/nmeth780, 2005. (Application Notes).
[40] Hannon GJ, Rossi JJ, Unlocking the potential of the human genome with RNA interference, Nature, 431:371–378, 2004.
[41] Qi L, Han Z, Ruixin Z, Ying X, and Zhiwei C, Reconsideration of in silico siRNA design from a perspective of heterogeneous data integration: problems and solutions, Brief Bioinform, 15:292–305, 2014.
[42] Saetrom P, Predicting the efficacy of short oligonucleotides in antisense and RNAi experiments with boosted genetic programming, Bioinformatics, 20(17):3055–3063, 2004.
[43] Takasaki S, Kotani S, Konagaya A, An effective method for selecting siRNA target sequences in mammalian cells, Cell Cycle, 3(6):790–5, 2004.

[44] Khvorova A, Reynolds A, Jayasena SD, Functional siRNAs and miRNAs exhibit strand bias, Cell, 115:209–216, 2003.
[45] Chalk A, Wahlestedt C, Sonnhammer E, Improved and automated prediction of effective siRNA, Biochem Biophys Res Commun, 319(1):264–274, 2004.
[46] Luo K, Chang D, The gene-silencing efficiency of siRNA is strongly dependent on the local structure of mRNA at the targeted region, Biochem Biophys Res Commun, 318(1):303–310, 2004.
[47] Katoh T, Suzuki T, Specific residues at every third position of siRNA shape its efficient RNAi activity, Nucleic Acids Res, 35:e27, 2007.
[48] SantaLucia Jr. J, A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics, Proceedings of the National Academy of Science USA, 95:1460–1465, 1998.
[49] Zou H, Hastie T, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society, Series B, 67(2):301–320, 2005.
[50] H. Kopka and P. W. Daly, A Guide to LATEX, 3rd ed. Harlow, England: Addison-Wesley, 1999.