Solving the Motif Finding Problem with the Ant Colony Algorithm


University of Engineering and Technology,
Vietnam National University, Hanoi

COMPETITION ENTRY
“STUDENT SCIENTIFIC RESEARCH” AWARD
2012

Title: Solving the Motif Finding Problem with the Ant Colony Algorithm

Students: Nguyễn Mạnh Hà Nam
Nguyễn Hải Linh Nam

Class: K53CA Faculty: Computer Science (KHMT)

Supervisors: Assoc. Prof. Dr. Hoàng Xuân Huấn
MSc. Đỗ Đức Đông


Abstract

A challenging problem in molecular biology is the identification of the
specific binding sites of transcription factors in the promoter regions of
genes, referred to as motifs. This report presents an Ant Colony Optimization
approach that provides promising solutions to the motif finding problem.
The approach searches both the space of starting positions and the space of
motif patterns, which gives it more chances to discover potential motifs. It
has been implemented and tested on several datasets, and its performance was
compared with MEME, a very popular algorithm for finding motifs. Experimental
results show that our approach achieves comparable or better motif accuracy
within a reasonable computational time.


1 Introduction

Finding the location of the common motif, shared by a set of DNA sequences, in each
sequence has become a fundamental problem in bioinformatics with important
applications in locating regulatory sites and drug target identification [14]. The motif
finding problem has been formally considered as a difficult pattern recognition problem.
Most developed motif finding algorithms use either approximate or heuristic techniques
to obtain near optimal solutions at relatively low computational cost. Some of them carry
out the search in the space of possible starting positions, whereas others search in the
space of all possible motifs based on a given model. Recent surveys covering most of
the relevant techniques and approaches for motif finding, as well as several of the
benchmark algorithms included in this work, can be found in [3,12].

Moreover, bio-inspired algorithms and other metaheuristics have also been proposed.
Examples of these algorithms include genetic algorithms [6,8], genetic programming
[15], and simulated annealing [9]. Although these methods have been shown to generate
acceptable results in terms of the quality of the solutions found, the motif finding
problem is still unsolved. One technique that responds to this challenge is a swarm-based
approach built on a natural metaphor, called Ant Colony Optimization (ACO) [5,4]. ACO is
a population-based stochastic search method inspired by the foraging behavior of ant
colonies. This metaheuristic has been used successfully to compute the best-known
solutions for a wide range of combinatorial optimization problems (see [5] for more
details). In [11], an ACO algorithm was developed to find a set of better initial
positions for the Gibbs sampler (GS) [10] in order to improve its efficiency in terms of
computing time and score. However, it does not incorporate any form of heuristic
information. Moreover, a specific ant colony system was used for predicting MHC class II
binders [7].

In this study, we present a new motif finding approach based on ant colony optimization,
called SMMAS. It is derived from the MAX-MIN Ant System [13], one of the best
performing variants of the ACO metaheuristic, and is combined with other techniques, such
as local search, to improve the result. Unlike some other motif finding techniques, SMMAS
searches both in the space of starting positions and in the space of motif patterns.
Due to this feature, it has more chances to find potential motif patterns. Although our
approach is also valid for protein sequences, we apply it only to DNA sequences.

The rest of the paper is organized as follows. In Section 2, the motif finding problem is
formally introduced. Section 3 reviews Ant Colony Optimization. Section 4 presents our
new ACO algorithm for the motif finding problem. Section 5 describes the conducted
experiments. Finally, concluding remarks are given in Section 6.

2 The Motif Finding Problem

2.1 Motif concept
A DNA motif is defined as a nucleic acid sequence pattern that has some biological
significance such as being DNA binding sites for a regulatory protein, i.e., a transcription
factor. Normally, the pattern is fairly short (5 to 20 base-pairs (bp) long) and is known to
recur in different genes or several times within a gene. DNA motifs are often associated
with structural motifs found in proteins. Motifs can occur on both strands of DNA.
Transcription factors indeed bind directly on the double-stranded DNA. Sequences could
have zero, one, or multiple copies of a motif. In addition to the common forms of DNA
motifs two special types of DNA motifs are recognized: palindromic motifs and spaced
dyad (gapped) motifs. A palindromic motif is a subsequence that is exactly the same as
its own reverse complement, e.g., CACGTG. A spaced dyad motif consists of two
smaller conserved sites separated by a spacer (gap). The spacer occurs in the middle of
the motif because the transcription factors bind as a dimer. This means that the
transcription factor is made out of two subunits that have two separate contact points with
the DNA sequence. The parts where the transcription factor binds to the DNA are
conserved but are typically rather small (3–5 bp). These two contact points are separated
by a non-conserved spacer. This spacer is mostly of fixed length but might be slightly
variable.
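The palindromic case can be checked mechanically. The following is a minimal sketch; the helper names are illustrative and not part of the original work, and only the CACGTG example comes from the text above.

```python
# Illustrative helpers (names are assumptions, not from the report).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic_motif(seq: str) -> bool:
    """A motif is palindromic when it equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


print(is_palindromic_motif("CACGTG"))  # True: the example from the text
print(is_palindromic_motif("CACGTA"))  # False
```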

2.2 Finding Motif Problem

Given a set of DNA sequences (promoter region), the motif finding problem is the task of
detecting overrepresented motifs as well as conserved motifs from orthologous sequences
that are good candidates for being transcription factor binding sites. A large number of
algorithms for finding DNA motifs have been developed. Most of these algorithms are
designed to deduce motifs by considering the regulatory region (promoter) of several
coregulated genes from a single genome. It is assumed that coexpression of genes arises
mainly from transcriptional coregulation. As coregulated genes are known to share some
similarities in their regulatory mechanism, possibly at transcriptional level, their promoter
regions might contain some common motifs that are binding sites for transcription
factors. A sensible approach to detect these regulatory elements is to search for
statistically overrepresented motifs in the promoter region of such a set of coexpressed
genes. A statistically overrepresented motif means a motif that occurs more often than
one would expect by chance. Therefore, these algorithms search for overrepresented
motifs in this collection of promoter sequences. However, most of these motif finding
algorithms have been shown to work successfully in yeast and other lower organisms, but
perform significantly worse in higher organisms. To overcome this difficulty recent motif
finding algorithms are taking advantage of cross-species genome comparison or
phylogenetic footprinting. The simple premise underlying phylogenetic footprinting is
that selective pressure causes functional elements to evolve at a slower rate than
non-functional sequences. This means that usually well conserved sites among a set of
orthologous promoter regions are excellent candidates for functional regulatory elements
or motifs. Several motif finding algorithms have been developed based on phylogenetic
footprinting. Most recently, algorithms that integrate DNA sequence data from
coregulated genes and phylogenetic footprinting have significantly improved motif
finding from genomic sequences. Efforts have also focused toward developing algorithms
that incorporate parameters that are useful for motif finding in higher organisms. Stormo
presented an excellent history of development and application of computer algorithms for
DNA motif finding. Since then a remarkably rapid development has occurred in DNA
motif finding algorithms and a large number of DNA motif finding algorithms have been
developed and published.
Figure 1.1: Example of Motif
In bioinformatics, the problem can be stated formally as follows. Given a set of DNA
sequences $S = \{S_1, S_2, \ldots, S_N\}$ of common length $W$, find the promising motif
pattern $X = x_1 x_2 \ldots x_i \ldots x_l$ of length $l$, with $x_i \in \{A, T, C, G\}$,
and the starting locations of its occurrences on all sequences in $S$. The selection of a
particular motif pattern is based on a defined score function that measures the
similarity between the motif pattern and its occurrences. There are several methods for
scoring a motif pattern. Our proposed approach uses consensus score and information
content as score functions. To illustrate how to compute these score functions, consider
a candidate motif pattern that can be generated by choosing a random position from each
sequence. Then, the patterns starting at these positions are aligned to form an
$N \times l$ alignment matrix.
Therefore, the candidate motif pattern can be represented by a count-based profile $C$,
where $C(i, j)$ is the count of nucleotide $i$ in column $j$ of the alignment matrix. Its
corresponding consensus score (CSc) is defined as:

$$CSc = \sum_{j=1}^{l} \max_{i \in \{A,T,C,G\}} C(i, j)$$

The information content (IC) score function can be computed as follows:

$$IC = \sum_{j=1}^{l} \sum_{i \in \{A,T,C,G\}} Q(i, j) \log \frac{Q(i, j)}{B_0(i)}$$

where each element $Q(i, j)$ indicates the frequency of nucleotide $i$ at position $j$
of the motif pattern and $B_0(i)$ denotes its background frequency, i.e. the observed
frequency of nucleotide $i$ over all sequences in the dataset.
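To make the two score functions concrete, here is a small sketch that builds the count profile from chosen starting positions and evaluates both scores. The function names, the toy sequences and the use of a base-2 logarithm are assumptions made for illustration; the text does not state the logarithm base.

```python
import math
from collections import Counter

ALPHABET = "ATCG"


def count_profile(sequences, starts, l):
    """One Counter per column j: counts of each nucleotide in the N x l alignment."""
    columns = [Counter() for _ in range(l)]
    for seq, start in zip(sequences, starts):
        for j, base in enumerate(seq[start:start + l]):
            columns[j][base] += 1
    return columns


def consensus_score(profile):
    """Sum over columns of the count of the most frequent nucleotide."""
    return sum(max(col.values()) for col in profile)


def background(sequences):
    """B_0(i): observed frequency of nucleotide i over all sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {base: counts[base] / total for base in ALPHABET}


def information_content(profile, b0):
    """Sum over columns of Q(i, j) * log2(Q(i, j) / B_0(i)); base 2 is an assumption."""
    ic = 0.0
    for col in profile:
        n = sum(col.values())
        for base, count in col.items():
            q = count / n
            ic += q * math.log2(q / b0[base])
    return ic


# Toy dataset (made up): every sequence contains ACGT.
seqs = ["ACGTACGT", "TTACGTAA", "GGACGTCC"]
profile = count_profile(seqs, starts=[0, 2, 2], l=4)
print(consensus_score(profile))                        # 12 = 3 sequences x 4 columns
print(information_content(profile, background(seqs)))
```

A perfectly conserved alignment reaches the maximum consensus score N × l, which is why the scores reported in Section 5 are bounded by 20 × l for 20 input sequences.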

3 Ant Colony Optimization Algorithm
3.1 ACO algorithms

ACO algorithms make use of simple agents called ants which iteratively construct
candidate solutions to a combinatorial optimization problem. The ants’ solution
construction is guided by (artificial) pheromone trails and problem-dependent heuristic
information. In principle, ACO algorithms can be applied to any combinatorial
optimization problem by defining solution components which the ants use to iteratively
construct candidate solutions and on which they may deposit pheromone (see [4,5] for
more details). An individual ant constructs candidate solutions by starting with an empty
solution and then iteratively adding solution components until a complete candidate
solution is generated. We will call each point at which an ant has to decide which
solution component to add to its current partial solution a choice point. After the solution
construction is completed, the ants give feedback on the solutions they have constructed
by depositing pheromone on solution components which they have used in their solution.
Typically, solution components which are part of better solutions or are used by many
ants will receive a higher amount of pheromone and, hence, will more likely be used by
the ants in future iterations of the algorithm. To avoid the search getting stuck, typically
before the pheromone trails get reinforced, all pheromone trails are decreased by a factor
ρ.


The ants’ solutions are not guaranteed to be optimal with respect to local changes and
hence may be further improved using local search methods. Based on this observation,
the best performing ACO algorithms for many NP-hard static combinatorial problems
are in fact hybrid algorithms combining probabilistic solution construction by a colony of
ants with local search algorithms. In such hybrid algorithms, the ants can be seen as
guiding the local search by constructing promising initial solutions, because ants
preferably use solution components which, earlier in the search, have been contained in
good locally optimal solutions.

In general, all ACO algorithms for static combinatorial problems follow a specific
algorithmic scheme outlined in Fig. 3.1. After the initialization of the pheromone trails
and some parameters, a main loop is repeated until a termination condition (which may
be a certain number of solution constructions or a given CPU-time limit) is met. In the
main loop, first, the ants construct feasible solutions, then the generated solutions are
possibly improved by applying local search, and finally the pheromone trails are updated.
It should be noted that the ACO metaheuristic is more general than the algorithmic
scheme given here.

procedure ACO algorithm for static combinatorial problems
    Set parameters, initialize pheromone trails
    while (termination condition not met) do
        ConstructSolutions
        ApplyLocalSearch % optional
        UpdateTrails
    end
end
Fig. 3.1 Algorithmic skeleton for ACO algorithms applied to static combinatorial
problems.
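The skeleton above can be sketched in Python as a generic loop that takes the three steps as functions. This is only an illustration under stated assumptions: the function name `run_aco`, its parameters, the fixed iteration budget and the toy bit-string problem are all hypothetical and not taken from the report.

```python
import random


def run_aco(construct, local_search, update_trails, score, trails,
            n_ants=10, n_iterations=50, seed=0):
    """Generic ACO loop; higher score means a better solution."""
    rng = random.Random(seed)
    best_so_far = None
    for _ in range(n_iterations):  # termination: fixed iteration budget
        # ConstructSolutions: each ant builds one candidate solution.
        solutions = [construct(trails, rng) for _ in range(n_ants)]
        # ApplyLocalSearch (optional): pass an identity function to skip it.
        solutions = [local_search(s) for s in solutions]
        iteration_best = max(solutions, key=score)
        if best_so_far is None or score(iteration_best) > score(best_so_far):
            best_so_far = iteration_best
        # UpdateTrails: evaporation and deposit are left to the caller.
        update_trails(trails, iteration_best, best_so_far)
    return best_so_far


# Toy usage: maximise the number of ones in an 8-bit string.
# trails[i] is used directly as the probability of choosing bit 1 at position i.
def construct_bits(trails, rng):
    return [1 if rng.random() < t else 0 for t in trails]


def reinforce(trails, iteration_best, best_so_far):
    for i, bit in enumerate(best_so_far):
        trails[i] = 0.9 * trails[i] + 0.1 * bit


best = run_aco(construct_bits, lambda s: s, reinforce, sum, [0.5] * 8)
print(best, sum(best))
```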

Update of Pheromone Trails

In Ant System (AS) [4], after all the ants have constructed their tours, the pheromone
trails are updated. This is done by first lowering the pheromone value on all arcs by a
constant factor, and then adding pheromone on the arcs the ants have crossed in their
tours. Pheromone evaporation is implemented by

$$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} \quad \text{for every arc } (i, j)$$

where $0 < \rho \le 1$ is the pheromone evaporation rate. The parameter $\rho$ is used to
avoid unlimited accumulation of the pheromone trails and it enables the algorithm to
“forget” bad decisions previously taken. In fact, if an arc is not chosen by the ants, its
associated pheromone value decreases exponentially in the number of iterations. After
evaporation, all ants deposit pheromone on the arcs they have crossed in their tour:

$$\tau_{ij} \leftarrow \tau_{ij} + \sum_{k=1}^{m} \Delta\tau^{k}_{ij}$$

where $m$ is the number of ants and $\Delta\tau^{k}_{ij}$ is the amount of pheromone ant
$k$ deposits on the arcs it has visited. It is defined as follows:

$$\Delta\tau^{k}_{ij} = \begin{cases} 1/C^{k} & \text{if arc } (i, j) \text{ belongs to } T^{k} \\ 0 & \text{otherwise} \end{cases}$$

where $C^{k}$, the length of the tour $T^{k}$ built by the k-th ant, is computed as the
sum of the lengths of the arcs belonging to $T^{k}$. By this definition, the better an
ant’s tour is, the more pheromone the arcs belonging to this tour receive. In general,
arcs that are used by many ants and which are part of short tours receive more pheromone
and are therefore more likely to be chosen by ants in future iterations of the algorithm.
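The evaporation and deposit steps of Ant System can be sketched as follows. The dictionary-of-arcs representation and the function name are assumptions made for the example; tours are treated as closed cycles over directed arcs.

```python
def ant_system_update(tau, tours, lengths, rho):
    """Ant System update: evaporate every arc, then let every ant deposit 1/C^k."""
    for arc in tau:                      # evaporation on all arcs
        tau[arc] *= (1.0 - rho)
    for tour, length in zip(tours, lengths):
        # Arcs of the closed tour, including the return to the start city.
        for arc in zip(tour, tour[1:] + tour[:1]):
            tau[arc] += 1.0 / length     # deposit by ant k
    return tau


# Toy instance: 3 cities, all directed arcs start at pheromone 1.0.
tau = {(i, j): 1.0 for i in range(3) for j in range(3) if i != j}
ant_system_update(tau, tours=[[0, 1, 2]], lengths=[10.0], rho=0.5)
print(tau[(0, 1)], tau[(1, 0)])  # the used arc ends higher than the unused one
```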


The relative performance of Ant System (AS) compared with other metaheuristics tends
to decrease dramatically as the size of the test instance increases. Therefore, a
substantial amount of research on ACO has focused on how to improve AS.

3.2 MAX–MIN Ant System

MAX–MIN Ant System (MMAS) (Stützle & Hoos, 1997, 2000; Stützle, 1999)
introduces four main modifications with respect to AS. First, it strongly exploits the best
tours found: only either the iteration-best ant, that is, the ant that produced the best tour in
the current iteration, or the best-so-far ant is allowed to deposit pheromone.
Unfortunately, such a strategy may lead to a stagnation situation in which all the ants
follow the same tour, because of the excessive growth of pheromone trails on arcs of a
good, although suboptimal, tour. To counteract this effect, a second modification
introduced by MMAS is that it limits the possible range of pheromone trail values to the
interval $[\tau_{min}, \tau_{max}]$. Third, the pheromone trails are initialized to the
upper pheromone trail limit, which, together with a small pheromone evaporation rate,
increases the exploration of tours at the start of the search. Finally, in MMAS, pheromone
trails are reinitialized each time the system approaches stagnation or when no improved
tour has been generated for a certain number of consecutive iterations.

Update of Pheromone Trails

After all ants have constructed a tour, pheromones are updated by applying evaporation
as in AS, followed by the deposit of new pheromone as follows:

$$\tau_{ij} \leftarrow \tau_{ij} + \Delta\tau^{best}_{ij}$$

where $\Delta\tau^{best}_{ij} = 1/C^{best}$. The ant which is allowed to add pheromone
may be either the best-so-far, in which case $\Delta\tau^{best}_{ij} = 1/C^{bs}$, or the
iteration-best, in which case $\Delta\tau^{best}_{ij} = 1/C^{ib}$, where $C^{ib}$ is the
length of the iteration-best tour. In general, in MMAS implementations both the
iteration-best and the best-so-far update rules are used, in an alternate way. Obviously,
the choice of the relative frequency with which the two pheromone update rules are applied
has an influence on how greedy the search is: when pheromone updates are always performed
by the best-so-far ant, the search focuses very quickly around $T^{bs}$, whereas when it
is the iteration-best ant that updates pheromones, the number of arcs that receive
pheromone is larger and the search is less directed.

Experimental results indicate that for small TSP instances it may be best to use only
iteration-best pheromone updates, while for large TSPs with several hundreds of cities
the best performance is obtained by giving an increasingly stronger emphasis to the
best-so-far tour. This can be achieved, for example, by gradually increasing the frequency
with which the best-so-far tour $T^{bs}$ is chosen for the trail update (Stützle, 1999).

Pheromone Trail Limits

In MMAS, lower and upper limits $\tau_{min}$ and $\tau_{max}$ on the possible pheromone
values on any arc are imposed in order to avoid search stagnation. In particular, the
imposed pheromone trail limits have the effect of limiting the probability $p_{ij}$ of
selecting a city $j$ when an ant is in city $i$ to the interval $[p_{min}, p_{max}]$, with
$0 < p_{min} \le p_{ij} \le p_{max} \le 1$. Only when an ant $k$ has just one single
possible choice for the next city, that is $|N^{k}_{i}| = 1$, we have
$p_{min} = p_{max} = 1$.

It is easy to show that, in the long run, the upper pheromone trail limit on any arc is
bounded by $1/(\rho C^{*})$, where $C^{*}$ is the length of the optimal tour. Based on
this result, MMAS uses an estimate of this value, $1/(\rho C^{bs})$, to define
$\tau_{max}$: each time a new best-so-far tour is found, the value of $\tau_{max}$ is
updated. The lower pheromone trail limit is set to $\tau_{min} = \tau_{max}/a$, where $a$
is a parameter (Stützle, 1999; Stützle & Hoos, 2000). Experimental results (Stützle,
1999) suggest that, in order to avoid stagnation, the lower pheromone trail limits play a
more important role than $\tau_{max}$. On the other hand, $\tau_{max}$ remains useful
for setting the pheromone values during the occasional trail reinitializations.
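As an illustration of how the trail limits interact with the best-ant update, the sketch below computes the limits from a best-so-far tour length and clamps every trail after the deposit. The function names and the toy values are assumptions; only the rules themselves (evaporation, deposit of 1/C by a single ant, limits 1/(ρC) and τ_max/a) follow the description above.

```python
def trail_limits(best_length, rho, a):
    """tau_max = 1 / (rho * C_bs) and tau_min = tau_max / a."""
    tau_max = 1.0 / (rho * best_length)
    return tau_max / a, tau_max


def mmas_update(tau, best_tour, best_length, rho, tau_min, tau_max):
    """Evaporate all arcs, let only the chosen best ant deposit, then clamp."""
    for arc in tau:
        tau[arc] *= (1.0 - rho)
    for arc in zip(best_tour, best_tour[1:] + best_tour[:1]):
        tau[arc] += 1.0 / best_length
    for arc in tau:
        tau[arc] = min(tau_max, max(tau_min, tau[arc]))
    return tau


# Toy values: rho = 0.02 as quoted in the text, tour length and a are made up.
tau_min, tau_max = trail_limits(best_length=100.0, rho=0.02, a=10.0)
tau = {(i, j): tau_max for i in range(3) for j in range(3) if i != j}
mmas_update(tau, [0, 1, 2], 100.0, 0.02, tau_min, tau_max)
print(tau_min, tau_max, tau[(0, 1)], tau[(1, 0)])
```

With these values the arcs of the best tour stay at the upper limit, while unused arcs decay slowly and can never fall below the lower limit.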

Pheromone Trail Initialization and Reinitialization

At the start of the algorithm, the initial pheromone trails are set to an estimate of the
upper pheromone trail limit. This way of initializing the pheromone trails, in combination
with a small pheromone evaporation parameter, causes a slow increase in the relative
difference in the pheromone trail levels, so that the initial search phase of MMAS is very
explorative.

As a further means of increasing the exploration of paths that have only a small
probability of being chosen, in MMAS pheromone trails are occasionally reinitialized.
Pheromone trail reinitialization is typically triggered when the algorithm approaches the
stagnation behavior (as measured by some statistics on the pheromone trails) or if for a
given number of algorithm iterations no improved tour is found.

MMAS is one of the most studied ACO algorithms and it has been extended in many
ways. In one of these extensions, the pheromone update rule occasionally uses the best
tour found since the most recent reinitialization of the pheromone trails instead of the
best-so-far tour (Stützle, 1999; Stützle & Hoos, 2000). Another variant (Stützle,
1999; Stützle & Hoos, 1999) exploits the same pseudorandom proportional action
choice rule as introduced by Ant Colony System (ACS), another ACO algorithm.

Behavior of MMAS

Among the ACO algorithms, MMAS has the longest explorative search phase. This is mainly
due to the fact that pheromone trails are initialized to the initial estimate of
$\tau_{max}$, and that the evaporation rate is set to a low value (a value that gives good
results for long runs of the algorithm was found to be $\rho = 0.02$). In fact, because of
the low evaporation rate, it takes time before significant differences among the
pheromone trails start to appear.

When this happens, MMAS behavior changes from explorative search to a phase of
exploitation of the experience accumulated in the form of pheromone trails. In this phase,
the pheromone on the arcs corresponding to the best-found tour rises up to the maximum
value $\tau_{max}$, while on all the other arcs it decreases down to the minimum value
$\tau_{min}$. This is reflected by an average λ-branching factor of 2.0. Nevertheless,
the exploration of tours is still possible, because the constraint on the minimum value of
pheromone trails has the effect of giving to each arc a minimum probability
$p_{min} > 0$ of being chosen. In practice, during this exploitation phase MMAS constructs
tours that are similar to either the best-so-far or the iteration-best tour, depending on
the algorithm implementation.

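ACS principle:
This principle follows Ant Colony System (ACS) and consists of local updating and global
best updating. Here $s(l)$ denotes the solution built by ant $l$, $w(t)$ the best solution
found up to iteration $t$, and $g$ a function of solution quality.
Local updating: if an ant visits edge $(i, j)$, that is $(i, j) \in s(l)$, the pheromone
trail on this edge changes according to the rule:

$$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} + \rho\,\tau_{1}$$

Global best updating: applied only to the edges that belong to $w(t)$:

$$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} + \rho\, g(w(t)), \quad \forall (i, j) \in w(t)$$

Three-level principle [1]:
This updating rule combines local updating and global best updating. The pheromone trail
on every edge $(i, j)$ is updated globally as
$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} + \Delta_{ij}$, where the increment
$\Delta_{ij}$ takes one of three levels: $\rho\, g(w(t))$ if $(i, j) \in w(t)$, and one of
the two constant levels $\rho\,\tau_{1}$ and $\rho\,\tau_{2}$ on the remaining edges,
depending on whether the edge was used by some ant, i.e.
$\exists l: (i, j) \in s(l)$.

In the three-level principle, $\tau_{1}$ and $\tau_{2}$ are constants. A further,
multi-level updating principle has also been proposed:

Multi-level principle [2]:
Online updating rule: when an ant located at vertex $i$ moves to vertex $j$, the pheromone
on edge $(i, j)$ is updated according to the rule:

$$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} + \rho\,\tau_{1}$$
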
ACO plus Local Search

The vast literature on metaheuristics tells us that a promising approach to obtaining
high-quality solutions is to couple a local search algorithm with a mechanism to generate
initial solutions. As an example, it is well known that, for the TSP, iterated local search
algorithms are currently among the best-performing algorithms. They iteratively apply
local search to initial solutions that are generated by introducing modifications to some
locally optimal solutions.

ACO’s definition includes the possibility of using local search (see Fig. 3.1); once ants
have completed their solution construction, the solutions can be taken to their local
optimum by the application of a local search routine. Then pheromones are updated on
the arcs of the locally optimized solutions. Such a coupling of solution construction with
local search is a promising approach. In fact, because ACO’s solution construction uses a
different neighborhood than local search, the probability that local search improves a
solution constructed by an ant is quite high. On the other hand, local search alone suffers
from the problem of finding good starting solutions; these solutions are provided by the
artificial ants.

4 Ant Colony Optimization for the Motif Finding Problem
4.1. Initialization
Choosing the appropriate graph representation for the problem to be solved is of central
importance to the graph-based ACO metaheuristic. In our case, we use a weighted directed
graph G(V, E) with V being the set of nodes and E being the set of edges. If the motif’s
length is $l$ then there are $4 \times l$ nodes arranged in a grid of four rows and $l$
columns. Each node in position (i, j), simply denoted by node(i, j), is associated with
nucleotide $i$ being in the j-th position of the motif. In addition, an edge $e_j(u, v)$
always exists between the two nodes node(u, j) and node(v, j+1), where
$u, v \in \{A, T, C, G\}$ and $1 \le j \le l - 1$.

In SMMAS, two types of pheromone trails are modeled. First, a pheromone trail
$\tau^{1}_{i}$, $i \in \{A, T, C, G\}$, is associated with each node (i, 1). The value
$\tau^{1}_{i}$ encodes the desirability of nucleotide $i$ being at the first motif
position. Second, a pheromone trail $\tau^{j}_{uv}$, $u, v \in \{A, T, C, G\}$, is
associated with each edge $e_j(u, v)$. The value $\tau^{j}_{uv}$ corresponds to the
desirability of nucleotides $u$ and $v$ being at motif positions $j$ and $j + 1$
respectively. Initially, the pheromone values are all set to a constant value.
Figure 4.1 Constructed graph
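One possible in-memory layout for these two kinds of trails is sketched below. The names `init_pheromones` and `tau0`, and the initial value 1.0, are assumptions for illustration; the text only states that all trails start at the same constant.

```python
NUCLEOTIDES = "ATCG"


def init_pheromones(l, tau0=1.0):
    """Return (node_tau, edge_tau) for a motif of length l.

    node_tau[i]        : trail on node (i, 1), the first motif position
    edge_tau[j][(u, v)]: trail on the edge between column j+1 and column j+2
                         (0-based j, so there are l - 1 edge layers)
    """
    node_tau = {i: tau0 for i in NUCLEOTIDES}
    edge_tau = [
        {(u, v): tau0 for u in NUCLEOTIDES for v in NUCLEOTIDES}
        for _ in range(l - 1)
    ]
    return node_tau, edge_tau


node_tau, edge_tau = init_pheromones(l=7)
print(len(node_tau), len(edge_tau), len(edge_tau[0]))  # 4 6 16
```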

4.2. Solution construction
Motif finding using ACO consists of a number of iterations of solution construction. Each
ant incrementally builds its solution by traversing the graph to complete a tour
representing one candidate motif pattern. The probability $P^{1}_{i}(t)$ that an ant
selects node (i, 1) as the first solution component at iteration $t$ can be expressed as:

$$P^{1}_{i}(t) = \frac{[\tau^{1}_{i}(t)]^{\alpha}\,[\eta^{0}_{i}]^{\beta}}{\sum_{k \in \{A,T,C,G\}} [\tau^{1}_{k}(t)]^{\alpha}\,[\eta^{0}_{k}]^{\beta}}$$

where $\eta^{0}_{i}$ is the heuristic information of node (i, 1) representing the
frequency of nucleotide $i$ in the dataset, that is, $\eta^{0}_{i} = B_0(i)$, and
$\alpha$ and $\beta$ weight the relative influence of pheromone and heuristic
information. Besides, an ant located at node (u, j) chooses to go to node (v, j + 1) with
a probability defined as:

$$P^{j}_{uv}(t) = \frac{[\tau^{j}_{uv}(t)]^{\alpha}\,[\eta^{n}_{v,j}]^{\beta}}{\sum_{k \in \{A,T,C,G\}} [\tau^{j}_{uk}(t)]^{\alpha}\,[\eta^{n}_{k,j}]^{\beta}}$$

where $\eta^{n}_{v,j}$ is the heuristic information of edge $e_j(u, v)$, which is
computed based on the higher-order background model. Using a background model of order
$n$ means that, in addition to pheromone information, the ant’s decision to add solution
component (nucleotide) $i$ in position $j$ of the motif pattern depends on the $n$
previously visited nucleotides during the solution construction. The n-th order
background model $B_n$ is therefore based on counting the frequency of all nucleotide
subsequences of length $(n + 1)$ in the dataset. Let $x_1 x_2 \ldots x_{j-1} x_j$ be the
partial motif pattern constructed by an ant located at node (u, j), i.e. $x_j = u$. If
$j \ge n$ then $\eta^{n}_{v,j} = P(x_{j+1} = v \mid x_{j-n+1} \ldots x_{j-1} x_j)$,
otherwise $\eta^{n}_{v,j} = B_0(v)$.
Figure 4.2 Each ant builds its solution by traversing the graph
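The construction step can be sketched as follows for the simplest non-trivial case, a first-order background model (n = 1). Everything specific here is an assumption made for the example: the function names, the add-one smoothing of the background counts (used only to avoid zero weights), and the default weights alpha = beta = 1.

```python
import random
from collections import Counter

NUCLEOTIDES = "ATCG"


def background_order0(sequences):
    """B_0 with add-one smoothing (the smoothing is an assumption)."""
    counts = Counter("".join(sequences))
    total = sum(counts[b] + 1 for b in NUCLEOTIDES)
    return {b: (counts[b] + 1) / total for b in NUCLEOTIDES}


def background_order1(sequences):
    """P(v | u) estimated from dinucleotide counts, with add-one smoothing."""
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))
    model = {}
    for u in NUCLEOTIDES:
        total = sum(pairs[(u, v)] + 1 for v in NUCLEOTIDES)
        model[u] = {v: (pairs[(u, v)] + 1) / total for v in NUCLEOTIDES}
    return model


def pick(weights, rng):
    """Sample one key with probability proportional to its weight."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys])[0]


def construct_motif(l, node_tau, edge_tau, b0, b1, alpha=1.0, beta=1.0, rng=None):
    """One ant builds a motif of length l by walking the 4 x l grid."""
    rng = rng or random.Random()
    motif = [pick({i: node_tau[i] ** alpha * b0[i] ** beta
                   for i in NUCLEOTIDES}, rng)]
    for j in range(l - 1):
        u = motif[-1]
        motif.append(pick({v: edge_tau[j][(u, v)] ** alpha * b1[u][v] ** beta
                           for v in NUCLEOTIDES}, rng))
    return "".join(motif)


seqs = ["ACGTACGT", "TTACGTAA", "GGACGTCC"]
node_tau = {i: 1.0 for i in NUCLEOTIDES}
edge_tau = [{(u, v): 1.0 for u in NUCLEOTIDES for v in NUCLEOTIDES}
            for _ in range(3)]
print(construct_motif(4, node_tau, edge_tau, background_order0(seqs),
                      background_order1(seqs), rng=random.Random(1)))
```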
4.3. Pheromone updating
The value of the pheromone trail $\tau^{(i)}$ at each vertex $i$ is updated according to:

$\tau^{(i)} \leftarrow (1 - \rho)\,\tau^{(i)} + \rho\,\tau_{max}$ if vertex $i$ belongs to the solution of the ant used for the update,
$\tau^{(i)} \leftarrow (1 - \rho)\,\tau^{(i)} + \rho\,\tau_{min}$ if it does not,

with $i = 0 \ldots 3$ corresponding to A, C, G, T. Here $\rho$ ($0 < \rho < 1$) is the
pheromone evaporation rate, $\tau_{max} = 1.0$ and $\tau_{min} = \tau_{max}/(4^{L})$.
The pheromone trail $\tau^{(i,j)}$ on each edge is updated according to the following
rule:

$\tau^{(i,j)} \leftarrow (1 - \rho)\,\tau^{(i,j)} + \rho\,\tau_{max}$ if the edge belongs to the solution of the ant used for the update,
$\tau^{(i,j)} \leftarrow (1 - \rho)\,\tau^{(i,j)} + \rho\,\tau_{min}$ if it does not,

with $i = 0 \ldots 15$ and $j = 0 \ldots (L-1)$.
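A compact sketch of this smoothed update is given below. The dictionary layout (one trail per first-position nucleotide, one 16-entry layer per pair of adjacent positions) and the function name are assumptions for illustration; the update formulas and the values of τ_max and τ_min follow the rule above.

```python
NUCLEOTIDES = "ATCG"


def smmas_update(node_tau, edge_tau, best_motif, rho, tau_max=1.0):
    """Pull each trail toward tau_max if it lies on best_motif, else toward tau_min."""
    tau_min = tau_max / 4 ** len(best_motif)
    for i in node_tau:
        target = tau_max if i == best_motif[0] else tau_min
        node_tau[i] = (1.0 - rho) * node_tau[i] + rho * target
    for j, layer in enumerate(edge_tau):
        used = (best_motif[j], best_motif[j + 1])
        for edge in layer:
            target = tau_max if edge == used else tau_min
            layer[edge] = (1.0 - rho) * layer[edge] + rho * target
    return node_tau, edge_tau


# Toy example with a motif of length 2, so tau_min = 1 / 16.
node_tau = {i: 1.0 for i in NUCLEOTIDES}
edge_tau = [{(u, v): 1.0 for u in NUCLEOTIDES for v in NUCLEOTIDES}]
smmas_update(node_tau, edge_tau, "AC", rho=0.5)
print(node_tau["A"], node_tau["T"])  # 1.0 0.53125
```

Because every new value is a convex combination of the old value and either τ_max or τ_min, trails that start inside the interval [τ_min, τ_max] stay inside it without any explicit clamping.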

4.4. Local Search and Memetics
4.4.1 Local Search
Before updating the pheromone trails, we apply a local search to the current best
solution to improve it. In particular, we change each nucleotide of the current best motif
into each of the three other nucleotides and recalculate the score. If a new solution has
a better score, we use that motif to update the pheromone trails.

Figure 4.3 Example of local search
Local search is very effective when combined with ACO: because it yields a good solution,
the subsequent ants tend to find better motifs. Although local search increases the run
time per iteration considerably, the solutions found have higher scores. Therefore, we can
reduce the number of iterations and keep the overall run time stable.
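The single-substitution neighbourhood can be sketched as follows. The scoring function used here (for each sequence, the number of matching positions of the best-matching window, summed over sequences) is an assumption standing in for the consensus score of Section 2.2, and the function names are illustrative.

```python
NUCLEOTIDES = "ATCG"


def pattern_score(pattern, sequences):
    """Sum over sequences of the best number of matching positions (assumed scoring)."""
    l = len(pattern)
    total = 0
    for seq in sequences:
        total += max(
            sum(a == b for a, b in zip(pattern, seq[s:s + l]))
            for s in range(len(seq) - l + 1)
        )
    return total


def local_search(pattern, sequences):
    """Try the 3 * l single-nucleotide substitutions and keep the best one."""
    best, best_score = pattern, pattern_score(pattern, sequences)
    for pos in range(len(pattern)):
        for base in NUCLEOTIDES:
            if base == pattern[pos]:
                continue
            candidate = pattern[:pos] + base + pattern[pos + 1:]
            score = pattern_score(candidate, sequences)
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score


seqs = ["ACGTACGT", "TTACGTAA", "GGACGTCC"]
print(local_search("ACGA", seqs))  # ('ACGT', 12)
```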
4.4.2 Memetic
The memetic technique applies local search to more than one ant in each iteration of the
algorithm.
Sometimes, applying local search only to the best ant of an iteration does not give as
good a result as expected, because a better result can be obtained by applying local
search to the 2, 3, 4 or more best ants. Our algorithm uses this technique. Applying local
search to more than one ant per iteration improves the score; however, the running time
also increases dramatically. Therefore, we have to test and decide how many ants should
use local search to give a good score within an acceptable running time.
Based on many experiments on the motif finding problem with our new ACO algorithm, we
decided that applying local search to the two best ants in each iteration gives the best
result. From the 3rd best ant onward, local search mainly increases the running time and
does not noticeably improve the score.
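The selection of the ants that receive local search can be sketched independently of the motif problem. The function name, its parameters and the integer toy example are assumptions for illustration; `k` plays the role of NoA in Section 5.

```python
def memetic_step(solutions, score, local_search, k=2):
    """Apply local_search to the k best solutions and return the best result overall."""
    ranked = sorted(solutions, key=score, reverse=True)
    improved = [local_search(s) for s in ranked[:k]]
    return max(improved + ranked[k:], key=score)


# Toy usage: solutions are integers, the score is the value itself and
# "local search" adds one. With k = 2 the two best ants (5 and 3) are improved.
print(memetic_step([1, 5, 3], score=lambda s: s, local_search=lambda s: s + 1, k=2))  # 6
```

Setting `k = 0` disables local search entirely, which corresponds to the NoA = 0 column of the memetic experiments.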


5 Experimental Results
We compared our SMMAS algorithm with MEME, one of the most popular algorithms for the
motif finding problem.

We also tested the memetic technique: local search is applied to the 0, 1, 2 and 3 best
ants to find out how many ants using local search give the best result.

The input is a set of 20 sequences, each of length 600. The motif lengths are 7,
10 and 13.

Each algorithm was run 20 times. Both the best result and the average result of the two
algorithms are compared. Time is shown in seconds. The better results are shown in bold.

Our tests were performed on a Core i5 2.56 GHz CPU with 4 GB of RAM, running
Windows 7.
The results of comparing our algorithm with MEME are shown in the tables
below:

Length of Motif = 7
                              MEME                   SMMAS
                              Score   Runtime (s)    Score   Runtime (s)
Search 1 motif                119     4.54           125     22.4
Search 10 motifs (best)       124     39.86          125     22.4
Search 10 motifs (average)    119.4                  124.1
Table 5.1: Motif’s length is 7

Length of Motif = 10
                              MEME                   SMMAS
                              Score   Runtime (s)    Score   Runtime (s)
Search 1 motif                158     4.61           160     83.15
Search 10 motifs (best)       158     39.42          160     83.15
Search 10 motifs (average)    153.5                  158.4
Table 5.2: Motif’s length is 10

Length of Motif = 13
                              MEME                   SMMAS
                              Score   Runtime (s)    Score   Runtime (s)
Search 1 motif                191     4.56           193     147.53
Search 10 motifs (best)       191     37.91          193     147.53
Search 10 motifs (average)    180.2                  190.3
Table 5.3: Motif’s length is 13

In all cases, our SMMAS algorithm gives a better score than MEME, especially when finding
10 different motifs in one input.
Regarding running time, MEME is faster when finding one motif. However, when finding more
than one motif, its running time increases dramatically (runtime ≈ runtime for finding one
motif × number of different motifs), whereas SMMAS takes the same running time whether it
finds one motif or many different motifs. Moreover, the average score of SMMAS stays close
to its best score, which indicates that its results are stable across runs.

The following tables show the results of testing the memetic technique (NoA is the number
of ants using local search):

Length of Motif = 7
                              NoA = 0          NoA = 1          NoA = 2          NoA = 3
                              Score   Time     Score   Time     Score   Time     Score   Time
Search 1 motif                118     15.8     121     18.3     125     22.4     125     28.9
Search 10 motifs (best)       119     15.8     123     18.3     125     22.4     125     28.9
Search 10 motifs (average)    118.7            122.1            124.1            124.2
Table 5.4: Motif’s length is 7

Length of Motif = 10
                              NoA = 0          NoA = 1          NoA = 2          NoA = 3
                              Score   Time     Score   Time     Score   Time     Score   Time
Search 1 motif                149     60.1     153     68.4     160     83.15    160     99.3
Search 10 motifs (best)       150     60.1     155     68.4     160     83.15    160     99.3
Search 10 motifs (average)    149.6            154.2            158.4            158.4
Table 5.5: Motif’s length is 10

Length of Motif = 13
                              NoA = 0          NoA = 1          NoA = 2          NoA = 3
                              Score   Time     Score   Time     Score   Time     Score   Time
Search 1 motif                183     104.13   187     128.42   193     147.53   193     172.62
Search 10 motifs (best)       183     104.13   188     128.42   193     147.53   193     172.62
Search 10 motifs (average)    182.8            187.9            190.3            190.5
Table 5.6: Motif’s length is 13

6 Conclusions
In this paper, we reviewed Ant Colony Optimization, proposed a new and promising method
for the motif finding problem, and presented and discussed some experimental results.

We compared our approach, SMMAS for the motif finding problem, with MEME, one of the most
popular algorithms. The results show that our algorithm found better solutions in all our
tests, especially when finding many different motifs in one input.

Our future work will focus on two things: improving the quality of the results and
decreasing the running time.

References
[1] Hoang Xuan Huan. Convergence Analysis of ACO Algorithms and New Perspectives,
manuscript, 2003.

[2] Hoang Xuan Huan, Do Duc Dong and Dinh Quang Huy. Multi-level Ant System and
Typical Combinatorial Optimization Problems. 2nd Optimization and Scientific
Computation Conference, Institute of Mathematics, Ha Noi, Viet Nam, 05.2004,
page 15.

[3] Das, M., Dai, H.: A survey of the DNA motif finding algorithms. BMC
Bioinformatics 8(suppl.7), S21 (2007)

[4] Dorigo, M., Maniezzo, V., Colorni, A.: The Ant System: Optimization by a colony
of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics - Part
B 26(1), 29–41 (1996)

[5] Dorigo, M., Stützle, T.: Ant Colony Optimization. MIT Press, Cambridge (2004)

[6] Che, D., Song, Y., Rasheed, K.: MDGA: Motif discovery using a genetic algorithm.
In: Proc. of the 2005 Conf. on Genetic and Evolutionary Computation (GECCO
2005), pp. 447–452. ACM Press, Washington (2005)

[7] Karpenko, O., Shi, J., Dai, Y.: Prediction of MHC class II binders using the ant
colony search strategy. Artificial Intelligence in Medicine 35(1), 147–156 (2005)

[8] Kaya, M.: MOGAMOD: Multi-objective genetic algorithm for motif discovery.
Expert Systems with Applications 36, 1039–1047 (2009)

[9] Keith, J.M., Adams, P., Bryant, D., Kroese, D.P., Mitchelson, K.R., Cochran, D.,
Lala, G.H.: A simulated annealing algorithm for finding consensus sequences.
Bioinformatics 18(11), 1494–1499 (2002)

[10] Lawrence, C., Altschul, S., Boguski, M., Liu, J., Neuwald, A., Wootton, J.: A
Gibbs sampling strategy for multiple alignments. Science 262(5131), 208–214
(1993)

[11] Liao, Y.J., Yang, C.B., Shiau, S.H.: Motif finding in biological sequences. In: Proc.
of 2003 Symposium on Digital Life and Internet Technologies, Tainan, Taiwan, pp.
89–98 (2003)

[12] Tompa, M., et al.: Assessing computational tools for the discovery of transcription
factor binding sites. Nature Biotechnology 23(1), 137–144 (2005)

[13] Stützle, T., Hoos, H.: MAX-MIN ant system. Future Generation Computer Systems
16, 889–914 (2000)

[14] Pevzner, P., Sze, S.: Combinatorial approaches to finding subtle signals in DNA
sequences. In: Proc. of the 8th Int. Conf. on Intelligent Systems for Molecular
Biology (ISMB 2000), pp. 269–278. AAAI Press, San Diego (2000)

[15] Seehuus, R., Tveit, A., Edsberg, O.: Discovering biological motifs with genetic
programming. In: Proc. of the 2005 Conf. on Genetic and Evolutionary
Computation (GECCO 2005), pp. 401–408. ACM Press, Washington (2005)
