An Efficient Ant Colony Algorithm for DNA Motif Finding

Xuan Huan Hoang¹, The Hung Nguyen¹, T. Thu Ha Doan², and T. Anh Tuyet Duong¹
¹ University of Engineering and Technology, VNU, Hanoi, Vietnam
{huanhx, hungnt_55, tuyetdta_55}@vnu.edu.vn
² Hanoi University of Agriculture,

Abstract. Finding motifs in gene sequences is one of the most important problems in
bioinformatics and is NP-hard. This paper proposes a new ant colony optimization algorithm
based on the consensus approach, in which a relax technique is applied to locate the
occurrences of the common motif. The efficiency of the algorithm is evaluated by comparing
it with state-of-the-art algorithms.

1. Introduction
Gene regulatory elements are called DNA motifs (hereafter "motifs" for short) and carry
important biological information [1,5,12,14,18]. Identifying DNA motifs is currently one of
the most important problems in bioinformatics and is NP-hard (see [2,10,12,16,17,19]). There
are two main approaches to searching for a motif: biological experiments and computational
methods, i.e. bioinformatics. Because biological experiments are costly and time-consuming,
computational methods are widely used to predict motifs.
Researchers have given various definitions of a motif and various statements of the motif
finding problem, and have developed a number of algorithms for it [1,3,5,15]. One widely
used approach is to apply an approximate algorithm that optimizes the consensus score or the
information content [2,7,10,11,15,16,19]. Recently, several authors have applied ant colony
optimization (ACO) effectively to this problem. For example, Bouamama et al. (2010)
proposed the MFACO algorithm [2], which uses the consensus score to find motifs and the
information content to locate their occurrences (binding sites) in each DNA sequence. Yang
et al. (2011) proposed an algorithm [19], referred to from now on as EMACO, that combines
ACO with Expectation Maximization (EM) to find the starting positions of the motif in the
sequences. Liu et al. (2013) proposed the ACRI algorithm [11], which uses the information
content as the objective function for the same purpose as EMACO. In this paper, we propose
a new ACO algorithm called ACOMotif, which uses the total Hamming distance from the motif
to the DNA sequences as the score function. ACOMotif uses the same structural graph as MFACO
but with different heuristic information, pheromone update rule, and local search
technique. For each motif found, the algorithm subsequently applies a relax technique to
locate the starting positions in the DNA sequences; this version is called R-ACOMotif. The
runtime of ACOMotif is also compared with that of MotifSuite (2012) on a very large dataset
called SCPD, obtained from [21]. The efficiency of ACOMotif is demonstrated by experiments
on the same datasets used in the three articles above and on SCPD.
The rest of this paper is organized as follows. Section 2 states the DNA motif finding
problem, followed by a brief introduction to the ACO method and how it was applied in the
MFACO, ACRI, and EMACO algorithms. Our new algorithm is introduced in Section 3.
Section 4 describes the experiments comparing ACOMotif/R-ACOMotif with MFACO,
EMACO, ACRI, and MotifSuite. Conclusions are presented in the last section.

2. DNA motif finding problem and related works
2.1. DNA motif finding problem
The DNA motif finding problem, from an optimization perspective, can be described as follows
[2,19]: consider a set of DNA sequences of the same length, $S = \{S_1, S_2, \ldots, S_N\}$, in which
the $j$th letter $s_{ij}$ of $S_i$ belongs to the alphabet Σ = {A, C, G, T} for all i, j. For a given
natural number l, there are two approaches to discovering a motif:
1) Consensus approach: Find a string Sc of length l and a set of substrings
$M = \{m_1, m_2, \ldots, m_N\}$, in which mi is a substring of length l of Si, such that they minimize
the objective function
$$HS_c(S_c, M) = \sum_{i=1}^{N} d_H(S_c, m_i), \qquad (1)$$
where $d_H$ denotes the Hamming distance between two strings of equal length.
Then Sc is called a motif of S, and each mi is called a motif instance (or instance for short)
from Si.
If we consider M as a matrix (called a consensus matrix) whose ith row is the string mi
and denote by C(u,j) the number of occurrences of nucleotide u in column j, the objective
function CSc(M) is formulated as
$$CS_c(M) = \sum_{j=1}^{l} \max_{u \in \Sigma} C(u, j). \qquad (2)$$
The motif Sc is then a string of length l whose letter at position j is the nucleotide that
occurs most often in the jth column of M. Note that each M can have many motifs, but we
consider only one of them.
2) Positional approach: Find a set of substrings $M = \{m_1, m_2, \ldots, m_N\}$ and a set of
starting positions $A = \{a_1, a_2, \ldots, a_N\}$, in which each instance mi is a substring of length l
of Si starting at position ai. In this approach, the objective function is the
information content:
$$IC(M) = \sum_{j=1}^{l} \sum_{u \in \Sigma} Q(u, j) \log \frac{Q(u, j)}{p_u}, \qquad (3)$$
in which Q(u,j) is the frequency of nucleotide u in column j of the matrix M, and pu
is the background frequency of u in the entire set S. In practice, the location of mi on Si is
called the binding site.
Remark: The optimal solutions of these objective functions are not guaranteed to be real
motifs. Therefore, the more solutions an algorithm finds, and the closer the locations of their
instances are to the binding sites of the real motif, the better the algorithm is.
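The two objective functions of this section can be illustrated on a toy consensus matrix. The following is a minimal sketch, not the authors' implementation: the function names, the toy matrix `M`, the uniform background, the tie-breaking order, and the base-2 logarithm are all assumptions made for the example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def column_counts(instances):
    """C(u, j): how often nucleotide u occurs in column j of the matrix M."""
    length = len(instances[0])
    return [Counter(row[j] for row in instances) for j in range(length)]

def consensus(instances):
    """Most frequent nucleotide per column (ties broken by ALPHABET order, an assumption)."""
    return "".join(max(ALPHABET, key=lambda u: col[u]) for col in column_counts(instances))

def consensus_score(instances):
    """Sum over columns of the largest nucleotide count."""
    return sum(max(col[u] for u in ALPHABET) for col in column_counts(instances))

def total_hamming(motif, instances):
    """Total Hamming distance from the motif to all instances."""
    return sum(sum(a != b for a, b in zip(motif, m)) for m in instances)

def information_content(instances, background):
    """Sum of Q(u, j) * log2(Q(u, j) / p_u); base 2 is an assumption."""
    n = len(instances)
    ic = 0.0
    for col in column_counts(instances):
        for u in ALPHABET:
            q = col[u] / n
            if q > 0:  # a zero frequency contributes nothing
                ic += q * math.log2(q / background[u])
    return ic

M = ["ACGT", "ACGA", "ACCT"]          # toy instances, one per sequence
uniform = {u: 0.25 for u in ALPHABET}  # hypothetical background frequencies
motif = consensus(M)
cs = consensus_score(M)
hs = total_hamming(motif, M)
ic = information_content(M, uniform)
```

On this toy matrix the consensus is ACGT, the consensus score is 10, and the total Hamming distance is 2, which equals N·l minus the consensus score (3·4 − 10).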

2.2. Ant Colony Optimization method
Ant Colony Optimization (ACO), proposed by Dorigo [6,9], is a stochastic metaheuristic
for solving hard combinatorial optimization problems. The algorithm has been improved in
many ways in the literature and is widely applied. The memetic scheme, which uses a
population-based search technique, was first proposed by Moscato [13] and applied to genetic
algorithms. Today, it is combined with other algorithms [3,8].

Memetic-ACO algorithms
In this article, we apply ACO with reinforcement search following a simple memetic
scheme, as described in Algorithm 1. In such algorithms, the original problem is converted
into the problem of finding solutions on a structural graph G = (V, E, Ω, η, T), where V is the
vertex set, E is the edge set, Ω is the condition that an acceptable solution must satisfy, and
η and T are the heuristic information and the pheromone trail, respectively, used for
reinforcement learning; η and T can be placed on vertices or on edges. An acceptable solution
is a path satisfying the condition Ω that starts from a vertex in a subset C0 of V and is then
extended randomly to the next vertex based on the heuristic information and the pheromone
trail. The ACO algorithm uses Nant artificial ants; in each iteration, each ant finds a solution
by a randomized procedure on the structural graph. All the ants' solutions are then assessed,
and the best ones are chosen for an enhancing strategy or a local search technique. The
solutions obtained are evaluated again, and the pheromone trail is updated as reinforcement
learning information for the next iteration. Although many algorithms use the same graph
G(V, E), they use different heuristic information, pheromone update rules, and local search
techniques. From now on we call G(V, E) the structural graph.

Procedure of Memetic-ACO algorithms;
Begin
    Initialize; // initialize pheromone trail matrix and u ants;
    Repeat
        Construct solutions; // each ant constructs its own solution;
        Choose a subset Ωil to evolve by enhanced (or local) search;
        For each individual in Ωil do
            Run enhanced (or local) search
        End for;
        Update trail;
    Until End condition;
    Choose the solutions
End;
Algorithm 1. Specification of a simple memetic-ACO algorithm
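The loop of Algorithm 1 can be sketched as a small driver function. This is an illustrative skeleton under stated assumptions, not the authors' code: the function name `memetic_aco`, its callback parameters, the `n_elite` parameter, and the toy problem used to exercise it are all hypothetical.

```python
import random

def memetic_aco(construct, local_search, update_trail, score,
                n_ants, n_iters, n_elite=1):
    """Generic memetic-ACO loop; lower scores are better."""
    best = None
    for _ in range(n_iters):
        # Each ant constructs its own solution.
        solutions = sorted((construct() for _ in range(n_ants)), key=score)
        # Evolve only the best few solutions by local search.
        improved = [local_search(s) for s in solutions[:n_elite]]
        candidate = min(improved, key=score)
        if best is None or score(candidate) < score(best):
            best = candidate
        # Feed the improved solutions back as reinforcement information.
        update_trail(improved)
    return best

# Toy usage: random strings scored by Hamming distance to a fixed target.
rng = random.Random(0)
TARGET = "ACGT"

def toy_score(s):
    return sum(a != b for a, b in zip(s, TARGET))

best = memetic_aco(
    construct=lambda: "".join(rng.choice("ACGT") for _ in TARGET),
    local_search=lambda s: s,          # identity: no local search in the toy
    update_trail=lambda elite: None,   # no pheromone in the toy
    score=toy_score, n_ants=8, n_iters=50)
```

Passing the construction, local search, and trail update as callbacks mirrors the remark that algorithms sharing one structural graph still differ in exactly these components.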
Recently, several ACO-based algorithms following this scheme have been applied
effectively to DNA motif finding:
MFACO (2010), proposed by Bouamama et al. [2], uses the consensus score to find motifs
and the information content to determine their starting positions. Experiments showed that
this algorithm obtains better results than GS, BP, and MEME.
EMACO (2011), proposed by Yang et al. [19], combines ACO and Expectation
Maximization (EM), in which EM is used to determine binding sites. Experiments showed
that this method is better than GAME and GALF.
ACRI (2013), proposed by Liu et al. [11], uses the information content as the objective
function to determine positions. Unlike the two methods above, this algorithm uses local
search at two adjacent positions instead of a random search method. It was also shown
experimentally to give better results than MEME, AlignACE, and Gibbs Sampler.
In ACO algorithms, four factors have an important effect on performance: 1) the
structural graph, 2) the heuristic information, 3) the pheromone update rule, and 4) the local
search technique. The proposed algorithm uses the same structural graph as MFACO.

3. The proposed algorithm
The proposed algorithm, named ACOMotif, uses the total Hamming distance from the motif to
the DNA sequences as the objective function. ACOMotif uses the structural graph of MFACO but
with different heuristic information, pheromone update rule, and local search technique. For
each motif found, the algorithm subsequently applies a relax technique to locate the binding
sites in the DNA sequences; in this case ACOMotif is called R-ACOMotif.

3.1. ACOMotif
ACOMotif follows the scheme described in Algorithm 1. The output is the set Q, which
includes the motifs of length l and the corresponding instances on the DNA sequences that
have the smallest Hamming distance to the motifs. ACOMotif is described in detail below.

Structural graph
The structural graph G(V, E) is the same as that of MFACO. To find a motif of length l, the
graph has 4l vertices arranged in four rows and l columns. Each vertex at position (u, j) is
labeled by the corresponding nucleotide u, as shown in Figure 1. The labels of the vertices in
each row are also used to refer to the rows. From left to right, edges connect the vertices of
two consecutive columns. We denote by $e^{j}_{uv}$ the edge connecting vertex (u, j) to
vertex (v, j+1). Heuristic information and pheromone trails are placed at the vertices of the
first column and on the edges.

Figure 1. Structural graph for finding motif of length l

Heuristic information
The heuristic information is placed at the vertices of the first column and on the edges.
At the vertices of the first column, the heuristic information is the frequency of the
corresponding nucleotide in the entire dataset S.
The heuristic information on the edge from vertex (u, j) to vertex (v, j+1) is the frequency of
the nucleotide pair uv in S. There are only 16 such quantities, one for each (u, v) ∈ Σ × Σ.
Remark. In MFACO, the heuristic information on edges is computed by a high-order
background model based on the frequency in S of the motif pattern from the first column to
the current column. Since these patterns appear only rarely in each DNA sequence Si,
this kind of statistical information is limited.
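The two kinds of heuristic information described above amount to nucleotide and nucleotide-pair frequency tables. The sketch below shows one way to compute them; the function names, the toy dataset `S`, and the decision to count overlapping pairs and normalize by the total number of pairs are assumptions, since the text does not specify the normalization.

```python
from collections import Counter

ALPHABET = "ACGT"

def nucleotide_frequencies(sequences):
    """Frequency of each nucleotide over the whole dataset (first-column vertices)."""
    counts = Counter(ch for seq in sequences for ch in seq)
    total = sum(counts[u] for u in ALPHABET)
    return {u: counts[u] / total for u in ALPHABET}

def pair_frequencies(sequences):
    """Frequency of each ordered pair uv over the whole dataset (edges).

    Overlapping pairs are counted; only 16 values exist, one per (u, v).
    """
    counts = Counter(seq[i:i + 2] for seq in sequences for i in range(len(seq) - 1))
    total = sum(counts[u + v] for u in ALPHABET for v in ALPHABET)
    return {(u, v): counts[u + v] / total for u in ALPHABET for v in ALPHABET}

S = ["ACGT", "AACC"]               # toy dataset
eta_vertex = nucleotide_frequencies(S)
eta_edge = pair_frequencies(S)
```

Because the pair table does not depend on the column index j, it can be computed once before the ants start and shared by all edges.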

Pheromone update rule
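Our algorithm uses the SMMAS (Smooth Max-Min Ant System) pheromone update rule [9].
The pheromone trails $\tau_u$ (on each vertex u of the first column) and $\tau^{j}_{uv}$ (on the
edge from vertex (u, j) to vertex (v, j+1)) are first initialized to a predetermined value. After
each loop, the pheromone trail at each vertex u is updated by Equation (4):
$$\tau_u \leftarrow (1 - \rho)\,\tau_u + \Delta_u, \qquad (4)$$
where
$$\Delta_u = \begin{cases} \rho\,\tau_{\max} & \text{if vertex } u \text{ belongs to a solution in } Q,\\ \rho\,\tau_{\min} & \text{otherwise,} \end{cases}$$
and $\tau_{\max}$, $\tau_{\min}$, and $\rho$ are predetermined parameters.
The pheromone trail on each edge is updated by Equation (5):
$$\tau^{j}_{uv} \leftarrow (1 - \rho)\,\tau^{j}_{uv} + \Delta^{j}_{uv}, \qquad (5)$$
where
$$\Delta^{j}_{uv} = \begin{cases} \rho\,\tau_{\max} & \text{if the edge belongs to a solution in } Q,\\ \rho\,\tau_{\min} & \text{otherwise.} \end{cases}$$
The computational analysis and experiments in [9] show that this rule is better than the
MMAS update rule used in MFACO.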
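A pheromone update in the SMMAS style can be sketched in a few lines. The sketch assumes the rule as it is usually stated for SMMAS: every trail is pulled toward an upper bound if its component is reinforced and toward a lower bound otherwise. The function name, the dictionary layout, and the numeric values of `rho`, `tau_min`, and `tau_max` are hypothetical.

```python
def smmas_update(tau, reinforced, rho, tau_min, tau_max):
    """Update every trail in place.

    tau        -- dict mapping a component (vertex or edge key) to its trail
    reinforced -- set of components that belong to the best solutions
    """
    for key in tau:
        target = tau_max if key in reinforced else tau_min
        # Convex combination: the trail can never leave [tau_min, tau_max].
        tau[key] = (1.0 - rho) * tau[key] + rho * target
    return tau

# Toy usage on the four first-column vertices (values are illustrative only).
tau = {"A": 1.0, "C": 1.0, "G": 1.0, "T": 1.0}
smmas_update(tau, reinforced={"A"}, rho=0.03, tau_min=0.1, tau_max=1.0)
```

Since each new value is a weighted average of the old value and one of the two bounds, no explicit clamping step is needed, which is the practical difference from a rule that adds a deposit and then clips.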

Randomized procedure to find solution
In each iteration, each ant randomly selects a starting vertex u in the first column with
probability
(6)
The ant then walks through the columns in sequence; the probability of choosing the edge
from vertex u of column j to vertex v of column j+1 is
(7)
The path of the ant from the starting vertex to a vertex of the last column identifies an
acceptable solution for the motif.
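The random walk of one ant can be sketched as follows. The exact selection probabilities are not reproduced in this text, so the sketch assumes the common ACO random-proportional rule in which each choice is weighted by the product of pheromone and heuristic information, without exponents. The function name `build_motif`, the key layout `(j, u, v)` for edge trails, and the toy values are hypothetical.

```python
import random

ALPHABET = "ACGT"

def build_motif(length, tau_first, eta_first, tau_edge, eta_edge, rng=random):
    """Walk one ant through the 4 x length graph and return the visited letters."""
    # Starting vertex: weight = pheromone * heuristic on the first column.
    weights = [tau_first[u] * eta_first[u] for u in ALPHABET]
    current = rng.choices(ALPHABET, weights=weights)[0]
    motif = [current]
    for j in range(length - 1):
        # Next vertex: weight = pheromone on edge (j, u, v) * pair heuristic (u, v).
        weights = [tau_edge[(j, current, v)] * eta_edge[(current, v)] for v in ALPHABET]
        current = rng.choices(ALPHABET, weights=weights)[0]
        motif.append(current)
    return "".join(motif)

# Toy setup: pheromone forces the start at A and every later step to G,
# so the walk is deterministic and easy to check.
l = 5
tau_first = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}
eta_first = {u: 0.25 for u in ALPHABET}
tau_edge = {(j, u, v): (1.0 if v == "G" else 0.0)
            for j in range(l - 1) for u in ALPHABET for v in ALPHABET}
eta_edge = {(u, v): 1 / 16 for u in ALPHABET for v in ALPHABET}
walk = build_motif(l, tau_first, eta_first, tau_edge, eta_edge, rng=random.Random(1))
```

With non-degenerate trails the same function returns a random string of length l, one letter per column.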

The objective function and identification of binding sites
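For each acceptable solution Sc, instead of treating the instances mi as free variables as in
Equation (1), ACOMotif takes the total Hamming distance Hd(Sc) from Sc to the DNA sequences
in S as the objective function:
$$H_d(S_c) = \sum_{i=1}^{N} d_H(S_c, m_i), \qquad (8)$$
where
$$m_i = \arg\min \{\, d_H(S_c, s) : s \text{ is a substring of length } l \text{ of } S_i \,\}. \qquad (9)$$
Here $d_H$ denotes the Hamming distance between two strings of equal length. The minimizing
string mi in (9) is an instance of Sc, and its position in Si is the binding site. Note
that each Sc can have multiple instances on Si with the same distance.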
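The objective function and the binding sites it induces can be computed with a sliding window. This is a minimal sketch under assumptions: the function names and toy sequences are hypothetical, positions are 0-based, and when several windows tie for the minimum the leftmost one is reported.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_instance(motif, sequence):
    """Return (start, distance) of the window of the sequence closest to the motif."""
    l = len(motif)
    start = min(range(len(sequence) - l + 1),
                key=lambda p: hamming(motif, sequence[p:p + l]))
    return start, hamming(motif, sequence[start:start + l])

def total_hamming_distance(motif, sequences):
    """Sum of the per-sequence minimum distances, plus the chosen binding sites."""
    total, sites = 0, []
    for seq in sequences:
        start, dist = best_instance(motif, seq)
        sites.append(start)
        total += dist
    return total, sites

sequences = ["TTACGTT", "ACCTTTT", "GGGACG"]   # toy data
score, sites = total_hamming_distance("ACG", sequences)
```

Here the motif ACG occurs exactly in the first and third sequences and with one mismatch (ACC) in the second, so the score is 1 with sites at 0-based positions 2, 0, and 3.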

Local search
ACOMotif applies a hill-climbing technique for local search, as described below. After all
ants have finished their paths through the graph, the solutions are formed together with their
total Hamming scores; local search is then applied to the solutions with the smallest scores.
For each potential motif Sm, the set Q(Sm) holds the search results, and the iterative
procedure is carried out as follows:
Step 1. Initialize Q(Sm) = {Sm};
Step 2. Repeat:
    For each i = 1, …, l do:
        2.1. Replace the letter at position i of Sm by each of the three remaining letters
        of Σ in turn to get Sp;
        2.2. Compute Hd(Sp);
        2.3. If Hd(Sp) ≤ Hd(Sm) then Sm ← Sp and Q(Sm) ← Q(Sm) ∪ {Sp};
Until the objective function cannot be improved any further.
After local search has been applied to the potential motifs in each iteration, the sets Q(Sm),
consisting of candidates with the smallest or nearly smallest scores, are merged into the set
Q, which contains all the best solutions found so far; among solutions that have the same
binding sites, only one motif is retained. Based on the set Q, the pheromone trail on the graph
is updated according to Equations (4) and (5). The algorithm stops after a predefined
number of loops. The binding sites associated with the motifs in Q allow us to identify the
instances of each motif.
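The hill-climbing step can be sketched as a single-letter substitution search. The sketch is an illustration under assumptions rather than the authors' code: the names `hd` and `hill_climb` and the toy data are hypothetical, and the stopping rule (stop after a full pass with no strict improvement) is one way to make "until we cannot improve" terminate even though moves of equal score are accepted.

```python
ALPHABET = "ACGT"

def hd(motif, sequences):
    """Total Hamming distance: per sequence, the distance to the closest window."""
    l = len(motif)
    return sum(
        min(sum(a != b for a, b in zip(motif, s[p:p + l]))
            for p in range(len(s) - l + 1))
        for s in sequences)

def hill_climb(motif, sequences):
    """Return (final motif, its score, set of all accepted candidates)."""
    current, current_score = motif, hd(motif, sequences)
    found = {current}
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            for letter in ALPHABET:
                if letter == current[i]:
                    continue
                candidate = current[:i] + letter + current[i + 1:]
                cand_score = hd(candidate, sequences)
                if cand_score <= current_score:      # accept equal or better
                    found.add(candidate)
                    if cand_score < current_score:   # only strict gains keep looping
                        improved = True
                    current, current_score = candidate, cand_score
    return current, current_score, found

sequences = ["TTACGTT", "GACGAAA", "CCCACGC"]   # toy data, all contain ACG
final, final_score, found = hill_climb("AAG", sequences)
```

Starting from AAG (score 3 on the toy data), one substitution at the second position reaches ACG with score 0, after which no single-letter change matches or improves the score.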

3.2. R-ACOMotif
Because the real positions of motifs in DNA sequences are not necessarily solutions of the
optimization problem, ACOMotif additionally uses a relax technique to locate the binding
sites of each motif. When ACOMotif employs this technique, it is called R-ACOMotif.
With each motif Sc found in the solution set Q of ACOMotif and given a number
, the relax technique finds a set of instances $M = \{m_1, \ldots, m_N\}$ and a set of starting
positions $A = \{a_1, \ldots, a_N\}$ as follows:
Step 1. // Expand the instance sets and binding sites
1.1. On each sequence Si, find the locations
so that for each substring
of length l, the Hamming distance from the substring to Sc is
or
. We get sets Mi and Ai including
and respectively;
1.2. Compute the number of elements ni of each set Mi and
;
Step 2.// Filter to reduce the size of sets Mi and Ai

Repeat
2.1. Sort the sets Mi in increasing order of ni // the steps below follow
this new order;
2.2. Determine the smallest number k so that with every i ≥ k then ni>1;
2.3. For each i = k to N do
2.3.1. For each
Mi, compute
;
2.3.2. Compute gi= min{
};
2.3.3. If
then remove
out of set Mi;
2.4.
// reduce ;
Until
is smaller than half of that value before the loop;
Step 3. // find the best solutions
3.1. Build all consensus matrices from the reduced set Mi and compute
consensus score as in Equation (2);
3.2. Sort the matrices in step 3.1 and the corresponding locations of the instances
in decreasing order with respect to their consensus scores;
Step 4. The solution is the first tuple in the list in step 3.2 // can take more depending on
priority computed.

4. Experimental results
The program was written in Perl and run on a desktop computer with an Intel Core i5 2.5 GHz
CPU and 4 GB of RAM, running Ubuntu 12.04. Our experiments compare the efficiency of the
new algorithm with that of MFACO [2], EMACO [19], and ACRI [11] on the same datasets,
using the same numbers of loops and ants as in the corresponding evaluations. The number of
ants is fixed to 8. Because the programs of these algorithms are not available to us, we cannot
compare runtimes on the same machine configuration; the results of the compared algorithms
are taken directly from the published articles. The runtime reported for
ACOMotif/R-ACOMotif is an average. The parameters were set as follows:

The coefficient ρ of the pheromone update rule is chosen depending on the algorithm's number
of loops:

Number of loops    10-100       100-300      300-600      > 600
Coefficient ρ      0.03-0.05    0.02-0.03    0.01-0.02    0.005

To evaluate computation time, ACOMotif was compared with MotifSuite (2012) [20] on the
SCPD dataset [21]. The efficiency of ACOMotif was assessed by experiments on the same
published datasets as the three algorithms above and on SCPD.

4.1. Comparison with MFACO using consensus approach
The experiments on the H. sapiens data used in [2] involve three small sets with 6, 9, and 12
strings, respectively. Each string is 3,001 nucleotides long. No biologically verified motif is
known for the H. sapiens dataset. Therefore, our experiments only compare the values
of the objective functions computed by Equation (1) and Equation (2). We write HSc
for the total Hamming distance score and CSc for the consensus score. Note that
$$HS_c = N \cdot l - CS_c. \qquad (10)$$
A smaller HSc is therefore equivalent to a greater CSc, so we only need to consider HSc.
The experiments were performed in the same way as in [2]: each set was run three times
with 50 loops, the computation time is an average, and the results of MFACO are taken from
[2].
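The identity in Equation (10) can be checked column by column. The derivation below is a short sketch that assumes CSc is the sum of the largest nucleotide counts C(u, j) per column of the consensus matrix and that HSc is measured against the consensus string of that matrix; the last line plugs in the first ACOMotif row of Table II.

```latex
\begin{align*}
HS_c &= \sum_{j=1}^{l} \Bigl( N - \max_{u \in \Sigma} C(u,j) \Bigr)
      && \text{(rows that differ from the consensus letter in column } j\text{)} \\
     &= N \cdot l - \sum_{j=1}^{l} \max_{u \in \Sigma} C(u,j) \\
     &= N \cdot l - CS_c . \\[4pt]
\text{Example: } & N = 6,\ l = 13,\ CS_c = 76 \;\Rightarrow\; HS_c = 78 - 76 = 2 .
\end{align*}
```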
The experimental results for H. sapiens dataset 1 are shown in Table I and Table II, with motif
length l = 7 and l = 13, respectively.

Table I. Comparison between ACOMotif and MFACO on H. sapiens 1: ρ = 0.03, l = 7, N = 6,
runtime 41s

ACOMotif   CSc   HSc     MFACO     CSc   HSc
CCTCCCC    42    0       AAAAAAA   42    0
AAAAAAA    42    0       AGGAGGA   42    0
GCAGCGG    42    0       AAAAAAG   42    0
GCCGGGG    42    0       TAAAAAT   42    0
GCCGCCG    42    0
AAAAAAG    42    0
GCCTGTG    42    0
TAAAAAT    42    0
CGGCGCC    42    0
GGGCCAG    42    0
GGCCAGG    42    0
GCGGGCG    42    0
CCCGGGC    42    0

Table II. Comparison between ACOMotif and MFACO on H. sapiens 1: ρ = 0.03, l = 13, N = 6,
runtime 96s

ACOMotif        CSc   HSc     MFACO           CSc   HSc
AAAAAAAAAAAGA   76    2       AAAAAAAAAAAGA   76    2
AAAAAAAAAAAAG   75    3       AAAAAAAAAAAAG   75    3
GCTGAGGCAGGAG   72    6       AAAAAAAAAAAGT   75    3
GCCGCCGCCGCCG   72    6       AAAAAAAAAAAAG   75    3
CGCCGCCGCCGCC   72    6
GAGGCTGAGGCAG   71    7

Remark: Table I shows that ACOMotif is considerably better: it found 13 motifs, compared
with only 4 for MFACO, all with the same HSc and CSc. Table II shows that both algorithms
discovered the motif with the best score; MFACO found three motifs with HSc = 3, whereas
ACOMotif found only one. However, ACOMotif discovered more motifs overall.

The experimental results for H. sapiens dataset 2 are shown in Table III and Table IV,
with motif length l = 7 and l = 13, respectively.

Table III. Comparison between ACOMotif and MFACO on H. sapiens 2: ρ = 0.03, l = 7, N = 9,
runtime 62s

ACOMotif   CSc   HSc     MFACO     CSc   HSc
CCCTCCT    63    0       CCCTCCT   63    0
CTCCCTT    62    1       CCCTCAG   62    1
GAGCAGG    62    1       GGGTTGG   62    1
GGGTTGG    62    1       GAGCAGG   62    1
GGGGCTG    62    1
TGGGAGG    62    1
GGCGGCC    62    1
GGGGCTG    62    1
CCCCTCC    62    1
TTCCTGG    62    1
CCCCTCC    62    1
GGGCTGG    62    1
CCTCCCT    62    1

Table IV. Comparison between ACOMotif and MFACO on H. sapiens 2: ρ = 0.03, l = 13, N = 9,
runtime 126s

ACOMotif        CSc   HSc     MFACO           CSc   HSc
GCCGGCGGGCGCC   102   15      GCCGGCGGGCGCC   102   15
GGCCCCCGGGCGG   101   16      GCGGGCGGGCGCC   101   16
GGGGGAGCAGGAG   101   16      GCCGGCGGGCGGC   100   17
GGCCGGCGGGCGG   100   17      GCCGGAGGGCGCC   100   17
GCAGGGGCTGGGG   100   17
GGCCAGGCTCGGC   100   17
CCCCGCCCCCGGC   100   17

Remark: Table III shows that ACOMotif is considerably better in terms of the number of
motifs found with low scores: ACOMotif found 12 motifs with HSc = 1, compared with
only 3 for MFACO. Table IV shows that ACOMotif remains superior to MFACO: it also found
the motif with the best score, together with two motifs with HSc = 16 and four motifs with
HSc = 17.
The experimental results for H. sapiens dataset 3 are shown in Table V and Table VI, with
motif length l = 7 and l = 13, respectively.

Table V. Comparison between ACOMotif and MFACO on H. sapiens 3: ρ = 0.03, l = 7, N = 9,
runtime 114s

ACOMotif   CSc   HSc     MFACO     CSc   HSc
GGCGGGG    123   3       GGGGCGG   123   3
CTGAGGC    123   3       CCCAGCT   123   3
CCAGCTG    123   3       CCAGCTG   123   3
GAGGCAG    123   3       CTGAGGC   123   3
GGGGCGG    123   3

Table VI. Comparison between ACOMotif and MFACO on H. sapiens 3: ρ = 0.03, l = 13, N = 9,
runtime 231s

ACOMotif        CSc   HSc     MFACO           CSc   HSc
GGGAGGCTGAGGC   205   29      GGGAGGCTGAGGC   205   29
CGGGAGGCGGAGG   204   30      CGGGAGGCGGAGG   204   30
GCTGAGGCAGGAG   202   32      GGAGGCTGAGGCA   202   32
GGAGGCTGAGGCA   202   32      CGGGAGGCGGGGG   201   33
GGCTGAGGCAGGA   119   35
GGGCGGGGCGGGG   119   35

Remark: Table V shows that ACOMotif discovered one more motif than MFACO with the
minimum score HSc = 3. In Table VI, ACOMotif again gave better results in terms of both the
number of motifs found and the score.

4.2. Comparison with MFACO and ACRI in terms of position approach
The experiment was carried out on the E. coli CRP binding site dataset, used by both MFACO
[2] and ACRI [11], to compare the binding sites discovered. The dataset contains eighteen
strings of 105 nucleotides each. The length of the motif examined is 22, the same as for
MFACO and ACRI. R-ACOMotif was run 20 times, each with 300 loops, 10 ants, and ρ = 0.02.
The experimental results are shown in Table VII.

Table VII. Comparison between R-ACOMotif and the MFACO and ACRI algorithms

No.   Position   ACRI   Error   MFACO   Error   R-ACOMotif   Error
1     17; 61     63     2       61      0       61           0
2     17; 55     57     2       55      0       55           0
3     76         78     2       76      0       76           0
4     63         65     2       63      0       63           0
5     50         52     2       50      0       50           0
6     7; 60      9      2       7       0       7            0
7     42         44     2       42      0       42           0
8     39         41     2       39      0       39           0
9     9; 80      11     2       9       0       9            0
10    14         16     2       14      0       14           0
11    61         63     2       61      0       61           0
12    41         43     2       41      0       41           0
13    48         50     2       48      0       48           0
14    71         73     2       71      0       71           0
15    17         19     2       17      0       17           0
16    53         55     2       53      0       53           0
17    1; 84      95     4       84      0       84           0
18    78         78     0       78      0       78           0

Remark: The results show that R-ACOMotif and MFACO both discovered all the correct
starting positions, whereas ACRI had a comparatively high error.

4.3. Comparison with EMACO in terms of position approach
Experiments were carried out on two datasets, ERE and E2F, also used for EMACO [19]. Each
of them includes 25 strings of 200 nucleotides each; the real motifs and their starting
positions on each string are known in advance. The algorithms were run 20 times and
their average values were compared, using 20 ants, 100 loops, and
. According to [19], a discovered position is considered correct if it is at most 3 positions
away from the real location.
To assess the results, the study [4] proposed three measures: precision, recall,
and F-score:
$$\text{Precision} = \frac{n_c}{n_p}, \quad \text{Recall} = \frac{n_c}{n_t}, \quad \text{F-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad (11)$$
where nc is the number of binding sites that were correctly predicted, np is the total
number of predicted binding sites, and nt is the total number of actual binding sites.
In particular, the F-score is said to be suitable for assessing the quality of an algorithm [11].
The experimental results in comparison with EMACO are presented in Table VIII.
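The three measures can be computed directly from predicted and known starting positions. The sketch below is an illustration under assumptions: it supposes one known site per sequence, aligned lists of predictions and known positions, `None` for a sequence with no prediction, and the tolerance of 3 positions mentioned above; the function name and toy data are hypothetical.

```python
def evaluate_sites(predicted, actual, tolerance=3):
    """Return (precision, recall, f_score) for aligned lists of start positions."""
    # n_c: predictions that fall within the tolerance of the known site.
    n_c = sum(1 for p, a in zip(predicted, actual)
              if p is not None and abs(p - a) <= tolerance)
    n_p = sum(1 for p in predicted if p is not None)  # predicted sites
    n_t = len(actual)                                  # known sites
    precision = n_c / n_p if n_p else 0.0
    recall = n_c / n_t if n_t else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# Toy data: two hits, one miss (off by 5), one sequence without a prediction.
precision, recall, f_score = evaluate_sites([10, 25, None, 40], [12, 30, 7, 40])
```

On the toy data nc = 2, np = 3, and nt = 4, giving a precision of 2/3, a recall of 1/2, and an F-score of 4/7.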
Table VIII. Comparison between R-ACOMotif and EMACO on the ERE and E2F datasets

        R-ACOMotif                                     ACO combined with EM
Data    Precision      Recall         F-score          Precision     Recall        F-score
ERE     0.89 ± 0.005   0.83 ± 0.004   0.86 ± 0.005     0.81 ± 0.01   0.76 ± 0.01   0.85 ± 0.01
E2F     1 ± 0          1 ± 0          1 ± 0            0.91 ± 0.02   0.92 ± 0.01   0.91 ± 0.02

Remark: The table above shows that, on both datasets, R-ACOMotif achieves better
precision, recall, and F-score than EMACO. In particular, the former's F-score is higher than
the latter's. We therefore conclude that R-ACOMotif performs better than the ACO algorithm
combined with EM in [19].

4.4. Comparison with MotifSuite
To compare computation time and precision, ACOMotif and MotifSuite [20] were run
on two datasets, GCR1 and GCN4, taken from the SCPD data [21]. GCR1 contains six strings
(DNA sequences) of length 9,050, and the real motif is CTTCC (l = 5). GCN4 contains nine
strings, and the real motif is TGACTC (l = 6).
Both programs were run 20 times with the same 20 loops to compute the average runtime and
to check whether they can find the real motif. ACOMotif used eight ants with
.
The experimental results show that ACOMotif found the real motif CTTCC on GCR1 with
score HSc = 0 and TGACTC on GCN4 with score HSc = 1, whereas MotifSuite did not.
The runtimes of the two algorithms, presented in Table IX, show that ACOMotif is roughly
three times faster.

Table IX. Comparison of runtime between ACOMotif and MotifSuite

        ACOMotif (s)   MotifSuite (s)
GCR1    148.8          501.8
GCN4    314.9          891.9
5. Conclusion
Motif finding is one of the challenges of molecular biology. Applying ant colony
optimization to the problem has proven powerful, but each algorithm has its own advantages
and disadvantages. The experiments show that ACOMotif outperforms the existing algorithms
it was compared with, and that the R-ACOMotif version finds the binding sites of real motifs
precisely. The algorithm can be extended to other types of motif problems, and its search
techniques can be improved to enhance solution quality. Parallel processing can be expected
to reduce the runtime.
Acknowledgements. This work was done during the stay of the first author at the Vietnam
Institute for Advanced Study in Mathematics (VIASM).

REFERENCES
1. S. Bandyopadhyay, S. Sahni, S. Rajasekaran: PMS6: A faster algorithm for motif
discovery. In: Proceedings of the Second IEEE Int. Conf. on Computational Advances
in Bio and Medical Sciences (ICCABS 2012), 1–6 (2012).
2. S. Bouamama, A. Boukerram, A.F. Al Badarneh: Motif finding using ant colony
optimization. ANTS'10: Proc. of the 7th Int. Conf. on Swarm Intelligence, LNCS
vol. 6234, 464–471 (2010).
3. X.S. Chen, Y.S. Ong, M.H. Lim: Research frontier: memetic computation - past,
present & future. IEEE Computational Intelligence Magazine 5(2), 24–36 (2011).
4. M. Claeys, V. Storms, H. Sun, T. Michoel, K. Marchal: MotifSuite: workflow for
probabilistic motif detection and assessment. Bioinformatics 28(14), 1931–1932 (2012).
doi: 10.1093/bioinformatics/bts293
5. H. Dinh, S. Rajasekaran, J. Davila: qPMS7: A fast algorithm for finding the (l, d)-motif
in DNA and protein sequences. PLoS ONE 7(7), e41425 (2012).
6. M. Dorigo, T. Stützle: Ant Colony Optimization. MIT Press, Cambridge (2004).
7. E. Eskin, P. Pevzner: Finding composite regulatory patterns in DNA sequences.
Bioinformatics S1, 354–363 (2002).
8. H. Hoang Xuan, D. Do Duc, N. Manh Ha: An efficient two-phase ant colony
optimization algorithm for the closest string problem. SEAL 2012, 188–197.
9. H. Hoang Xuan, T. Nguyen Linh, D. Do Duc, H. Huu Tue: Solving the traveling
salesman problem with ant colony optimization: a revisit and new efficient
algorithms. REV Journal on Electronics and Communications 2(3–4), July–December
2012, 121–129.
10. M.K. Keith et al.: A simulated annealing algorithm for finding consensus sequences.
J. Bioinformatics 18, 1494–1499 (2002).
11. W. Liu, H. Chen, L. Chen: An ant colony optimization based algorithm for identifying
gene regulatory elements. Comp. in Bio. and Med. 43(7), 922–932 (2013).
12. N.W. Lo, S.W. Changchien, Y.F. Chang, T.C. Lu: Human promoter prediction
based on sorted consensus sequence patterns by genetic algorithms. Proc. of the Int.
Congress on Biological and Medical Engineering, D3I-1540: 111–112 (2002).
13. P. Moscato: On evolution, search, optimization, genetic algorithms and martial arts:
towards memetic algorithms. Tech. Rep. Caltech Concurrent Computation Program,
Report 826, California Institute of Technology, Pasadena, California, USA (1989).
14. A. Neuwald, J. Liu, C. Lawrence: Gibbs motif sampling: detection of bacterial
outer membrane protein repeats. Protein Science 4, 1618–1632 (1995).
15. N. Pisanti, A. Carvalho, L. Marsan, M.F. Sagot: Risotto: Fast extraction of
motifs with mismatches. In: Proceedings of the 7th Latin American Theoretical
Informatics Symposium, 757–768 (2006).
16. G.D. Stormo, G.W. Hartzell 3rd: Identifying protein-binding sites from unaligned DNA
fragments. Proc. Natl. Acad. Sci. USA 86(4), 1183–1187 (1989).
17. G. Thijs, M. Lescot, K. Marchal, S. Rombauts, B.D. Moor, P. Rouzé, Y. Moreau: A
higher order background model improves the detection of regulatory elements by
Gibbs sampling. Bioinformatics 17(12), 1113–1122 (2001).
18. W. Thompson, C.R. Eric, E.L. Lawrence: Gibbs recursive sampler: finding
transcription factor binding sites. J. Nucleic Acids Research 31, 3580–3585 (2003).
19. C.H. Yang, Y.T. Liu, L.Y. Chuang: DNA motif discovery based on ant colony
optimization and expectation maximization. Proc. of IMECS 2011, 169–174.
20. />
21. />