Genome-wide analysis of clustering patterns and flanking
characteristics for plant microRNA genes
Meng Zhou1,*, Jie Sun1,*, Qiang-Hu Wang1,*, Li-Qun Song2, Guang Zhao1, Hong-Zhi Wang2,
Hai-Xiu Yang1 and Xia Li1
1 College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
2 Department of Internal Medicine, Affiliated Hospital of Heilongjiang University of Chinese Medicine, Harbin, China
Keywords
clustering patterns; flanking regions; motif; plant microRNA gene; sequence characteristics
Correspondence
Xia Li, College of Bioinformatics Science and Technology, Harbin Medical University,
Harbin 150081, China.
Fax: +86 045186615922
Tel: +86 045186669617
E-mail:
*These authors contributed equally to this work
(Received 11 October 2010, revised 7 December 2010, accepted 7 January 2011)
doi:10.1111/j.1742-4658.2011.08008.x
FEBS Journal 278 (2011) 929–940 © 2011 The Authors Journal compilation © 2011 FEBS
MicroRNAs (miRNAs) have been proven to play important roles at the post-transcriptional
level in animals and plants. To investigate clustering patterns and specific sequence
characteristics in the flanking regions of plant miRNA genes, we performed genome-wide
analyses of Arabidopsis thaliana, Populus trichocarpa, Oryza sativa and Sorghum bicolor.
Our results showed that miRNA pair distances were significantly higher than would have
been expected to occur at random and that the number of miRNA gene pairs separated by
very short distances of < 1 kb was higher than of protein-coding gene pairs. Analysis of
the promoter architecture of different miRNA genes in plants revealed significant
differences in the number and distribution of core promoters between intergenic miRNAs
and intragenic miRNAs, and between highly conserved miRNAs and low conserved or
nonconserved miRNAs. We applied two motif-finding algorithms to search for
over-represented, statistically significant sequence motifs, and discovered six
species-specific motifs across the four plant species studied. Moreover, we also
identified, for the first time, several significantly over-represented motifs that were
associated with conserved miRNAs, and these motifs may be useful for understanding the
mechanism of origin of new plant miRNAs. The results presented provide a new insight into
the transcriptional regulation and processing of plant miRNAs.
Abbreviations
miRNA, microRNA; Pol II, RNA polymerase II; pri-miRNAs, primary miRNA transcripts; TSSs, transcription start sites.
Introduction
MicroRNAs (miRNAs), 21–24 nucleotides in length, are a large class of endogenous,
noncoding small RNA molecules that regulate gene expression at the post-transcriptional
level in animals and plants [1–4]. The first microRNA – lin-4 – was discovered in 1993 in
Caenorhabditis elegans through forward genetic screens [5]. The first plant miRNA was
discovered in Arabidopsis thaliana in 2002 [6,7]. Plant miRNA genes are mostly
transcribed into primary miRNA transcripts (pri-miRNAs) by RNA polymerase II (Pol II).
The pri-miRNAs are processed by DICER-LIKE 1 (DCL1) into stem–loop pre-miRNAs in the
nucleus. Then, pre-miRNAs are processed by DCL1 in the nucleus and exported to the
cytoplasm, possibly through the action of the plant exportin 5 orthologue HASTY and other
unknown factors. Mature RNA duplexes excised from pre-miRNAs (miRNA/miRNA*, where miRNA
is the guide strand and miRNA* is the degraded strand) are methylated by HEN1. The guide
miRNA strand is then incorporated into AGO proteins to carry out the silencing reactions
[1,2].
In plants, Xie et al. [8] identified transcription start sites (TSSs) for 63 miRNA
primary transcripts in A. thaliana and found the TATA box motif in their core promoter
regions. Unlike animal miRNAs, the vast majority of plant miRNAs are intergenic rather
than intronic [2,9]. Several studies have characterized the upstream sequences of
intergenic miRNAs in model organisms and found that most intergenic miRNAs have the same
type of promoters as protein-coding genes [10–12]. Furthermore, Zhou et al. [11] also
discovered some interesting sequence motifs that are specific to intergenic miRNAs in
four different model species. For the remaining miRNAs, which are located within the
introns of protein-coding genes, little is known about their transcriptional regulatory
elements. These intragenic miRNAs are possibly transcribed with, or independently of, the
host genes. Recently, Heikkinen et al. [13] examined the upstream sequences of miRNAs in
C. elegans and Caenorhabditis briggsae, and discovered a sequence motif – GANNNNGA –
common to all miRNAs, including intragenic miRNAs. In rice (Oryza sativa), some
intragenic miRNAs were found to contain class II promoters in upstream sequences [10].
However, the complex transcriptional regulation mechanisms of plant miRNAs still remain
largely unknown.
Although many efforts have been directed towards examining clustering patterns and the
sequence characteristics of the upstream sequences of miRNA genes in animals in an
attempt to understand transcriptional regulation [11,13–16], similar analyses have been
performed only for a relatively small number of miRNAs in plants, and these were limited
to A. thaliana and O. sativa. Recently, increasing numbers of plant miRNAs have been
identified through forward genetics, direct cloning and computational prediction. This
growing set of plant miRNAs provides a good opportunity to uncover complex
transcriptional regulation mechanisms for plant miRNAs. In our study, we used
computational approaches, based on genome-wide analyses, to examine the clustering
patterns of plant miRNAs. In addition, we analyzed regions, up to 2 kb upstream and up to
1 kb downstream, of miRNA stem–loop sequences in four plant species, to identify
characteristic sequence motifs. We hope that the present results can improve the current
understanding of transcriptional regulation and processing of plant miRNAs and provide
useful knowledge for understanding the mechanism of the origin and computational
identification of new miRNAs in plants.
Results and Discussion
Analysis of clustering patterns of miRNA genes
in four plant genomes
Many previous studies have shown that miRNA genes tend to be present as clusters within a
region of several kilobases in animal genomes [17–20]. In contrast, plant miRNA genes are
rarely arranged in tandem [1]. To further explore the clustering patterns of miRNAs in
plant genomes, we computed the distances between same-strand consecutive miRNA genes in
four plant species, based on reported miRBase coordinates, and analyzed the resulting
distance distributions. The cumulative distance distribution of the miRNA gene pairs is
presented in Fig. 1 and shows that 17.71%, 26.94% and 29.07% of the miRNA gene pairs are
separated by regions of < 1, 10 and 100 kb, respectively, which are much smaller than the
regions separating animal miRNA gene pairs. Furthermore, we compared the distance
distribution of the miRNA gene pairs with the distance distribution of protein-coding
genes in four plant genomes (Fig. 1). We found that more miRNA gene pairs than
protein-coding gene pairs were separated by very short distances of < 1 kb. To evaluate
the statistical significance of the clustering patterns of miRNA genes in the four plant
species studied, we also compared the distances of the miRNA gene pairs with random
distances, as described in the Materials and methods, and found that the miRNA gene pair
distances were statistically significantly higher than expected at random (P < 0.001). To
identify more characteristics of miRNA clusters in plant genomes, we defined 10 kb as the
maximum inter-miRNA distance for two miRNA genes to be considered as clustered because
26.94% of the miRNA gene-pair distances were < 10 kb and extending the threshold to
100 kb added relatively few miRNA gene pairs. Furthermore, the relatively small distance
prevented overestimation of the number of clusters and made our analysis more stringent.
According to this definition, we examined the characteristics of potential clusters
within maximum inter-miRNA distances of 2, 5 and 10 kb (Table 1). Our study
revealed that the number of members in miRNA clusters at very short gene-pair distances
in O. sativa and Sorghum bicolor was significantly larger than in A. thaliana and Populus
trichocarpa (P < 0.01; two-sample t-test). This may suggest that miRNA clusters in
monocots are larger than those in eudicots. This specific clustering pattern of miRNAs
may be indicative of functional divergence of the miRNA cluster in miRNA-mediated gene
regulation between monocots and eudicots. Furthermore, miRNA clusters in plants are
frequently smaller than miRNA clusters in animals (P < 0.01; two-sample t-test). In
animals, a large proportion of known miRNAs are arranged in clusters. For example, 48% of
human miRNAs appear as clusters within a maximum inter-miRNA distance of 10 kb [21] and
50% of miRNAs appear as clusters within a maximum inter-miRNA distance of 3 kb in the
zebrafish genome [22]. In contrast to patterns of clustering found in animal miRNAs, only
a small proportion of plant miRNAs (25.35% in A. thaliana, 17.09% in P. trichocarpa,
22.29% in O. sativa and 21.62% in S. bicolor) were found to be clustered within a 10-kb
region in our study. It has been demonstrated that miRNA families are preferentially
expressed in eudicots relative to monocots [23]. Our analysis further indicated that most
plant miRNA clusters are composed of family members and are located in intergenic
regions, which is consistent with previous studies in plants [10,24,25]. Our results
imply that the size of the miRNA cluster may contribute to preferential expression in
eudicots relative to monocots. Li et al. [25] suggested that the co-transcription of
similar or identical miRNAs in clusters for plants may be involved in gene dosage effect.
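
The distance and clustering calculations described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the tuple layout, the function names and the toy coordinates are hypothetical, the gap between two genes is assumed to be measured from the end of one gene to the start of the next, and clusters are assumed to be built from same-strand neighbours.

```python
from collections import defaultdict


def _by_chrom_strand(genes):
    """Group (chrom, strand, start, end) records and sort each group by position."""
    groups = defaultdict(list)
    for chrom, strand, start, end in genes:
        groups[(chrom, strand)].append((start, end))
    for loci in groups.values():
        loci.sort()
    return groups


def neighbour_distances(genes):
    """Gaps (bp) between consecutive same-strand genes on the same chromosome."""
    distances = []
    for loci in _by_chrom_strand(genes).values():
        for (_, prev_end), (next_start, _) in zip(loci, loci[1:]):
            distances.append(max(0, next_start - prev_end))
    return distances


def fraction_below(distances, threshold_bp):
    """Fraction of gene pairs separated by less than threshold_bp."""
    if not distances:
        return 0.0
    return sum(d < threshold_bp for d in distances) / len(distances)


def find_clusters(genes, max_gap_bp=10_000):
    """Chain neighbours whose gap is below max_gap_bp; keep groups of two or more."""
    clusters = []
    for loci in _by_chrom_strand(genes).values():
        current = [loci[0]]
        for prev, nxt in zip(loci, loci[1:]):
            if nxt[0] - prev[1] < max_gap_bp:
                current.append(nxt)
            else:
                if len(current) > 1:
                    clusters.append(current)
                current = [nxt]
        if len(current) > 1:
            clusters.append(current)
    return clusters


# Hypothetical coordinates, for illustration only (not miRBase data).
toy_genes = [
    ("chr1", "+", 1000, 1100),
    ("chr1", "+", 1500, 1600),
    ("chr1", "+", 9000, 9100),
    ("chr1", "-", 2000, 2100),
    ("chr1", "+", 500000, 500100),
    ("chr2", "+", 100, 200),
]

gaps = neighbour_distances(toy_genes)
for threshold in (1_000, 10_000, 100_000):
    print(threshold, round(fraction_below(gaps, threshold), 3))
print([len(c) for c in find_clusters(toy_genes, max_gap_bp=10_000)])
```

Calling `find_clusters` with 2, 5 and 10 kb thresholds on real coordinates would give the cluster and member counts of the kind summarized in Table 1, under the assumptions stated above.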
Analysis of the core promoter of the class II
promoter in plant miRNA genes
miRNA genes were considered to be part of the same polycistronic transcript if the
pairwise distance between two miRNAs on the same chromosome was < 10 kb. For miRNAs in
polycistronic transcripts, only sequences upstream of the 5′ pre-miRNAs and downstream of
the 3′ pre-miRNAs were chosen to represent the polycistronic transcript. As described in
the Materials and methods, we used the TSSP-TCM program to initially search for the
putative core promoter of the class II promoter occurring in 2-kb upstream sequences of
miRNAs in the four plant species studied. We identified 130 (77.8%) miRNAs in
A. thaliana, 145 (89%) miRNAs in P. trichocarpa, 233 (71.5%) miRNAs in O. sativa and 102
(81.6%) miRNAs in S. bicolor to contain the core promoter of the class II promoter,
suggesting that a significant proportion of plant miRNA genes have resident Pol II
promoters in upstream regions.

Fig. 1. Cumulative distance distribution of miRNA genes and protein-coding genes in four
plant species. The neighbour distances between every two same-strand miRNA genes or
protein-coding genes in the same chromosome were calculated. The distance is drawn on a
logarithmic scale.

It is generally accepted that miRNA genes located in the intronic regions as part of the
host gene are expressed from the host gene promoters
[26,27]. However, a recent study on intergenic/intronic and conserved/nonconserved miRNA
genes in rice revealed that several intronic miRNA genes have a class II promoter, and
that rice miRNAs with more than one promoter appear to be conserved [10], thus implying
that different sequence characteristics may be present in upstream regions of different
miRNA genes in plants. To further explore the promoter architecture of different miRNAs
in plants and the relationship between the number of Pol II promoters and the degree of
conservation of miRNAs, we classified the miRNA genes of the four plant species into two
types (intergenic miRNAs and intragenic miRNAs) based on their genomic locations. Then,
the miRNAs from the four plant species studied were divided into three groups (based on
evolutionary conservation across all plant species, as described in the Materials and
methods): highly conserved miRNAs, low conserved miRNAs and nonconserved miRNAs. The
results are summarized in Fig. 2. As shown in Fig. 2A, we found a significant difference
between intergenic miRNAs and intragenic miRNAs in the numbers of class II promoters in
the upstream regions (P < 0.001; two-sample t-test). The miRNAs lying between
protein-coding genes usually contained more class II promoters in their upstream
sequences (on average 1.4 per miRNA) than those miRNAs lying within the introns (on
average 0.7 per miRNA) in the four plant species studied. These results strongly indicate
that most intergenic miRNAs are transcribed by RNA polymerase II in plants, and provide
additional evidence that a significant proportion of intragenic miRNAs have Pol II
promoters. This suggests that these intragenic miRNAs may be transcribed as independent
units from their own promoters. However, in plants, a small number of miRNAs with no
class II promoter may be transcribed through other transcriptional mechanisms, such as
the host gene promoter. Further studies carried out to explore whether there is a
relationship between the number of Pol II promoters and the degree of miRNA conservation
revealed that the number of Pol II promoters in the upstream sequences of highly
conserved miRNAs was significantly higher than in low conserved (P < 0.001; two-sample
t-test) and in nonconserved (P < 0.001; two-sample t-test) miRNAs. As shown in Fig. 2B,
only 13.67% of highly conserved miRNAs had no Pol II promoter, which is significantly
lower than in low conserved miRNAs (31.14%) (P < 0.01; Fisher’s exact test) and in
nonconserved miRNAs (26.76%)
(P < 0.05; Fisher’s exact test). On the contrary, 50.13% of highly conserved miRNAs have
at least two Pol II promoters, whereas only 27.38% of low conserved miRNAs (P < 0.001;
Fisher’s exact test) and 23.94% of nonconserved miRNAs (P < 0.0001; Fisher’s exact test)
have at least two Pol II promoters.

Table 1. Characterization of miRNA clusters in four plant species.

Maximum inter-miRNA distance: 2 kb
Species         miRNAs(a)  Cluster(b)  Members(c)  Average(d)  Distances(e)  Percentage(f)
A. thaliana     213        15          32          2.13        0.67          15.02%
P. trichocarpa  234        9           18          2           0.45          7.69%
O. sativa       462        26          74          2.85        0.48          16.02%
S. bicolor      148        6           18          3           0.36          12.16%

Maximum inter-miRNA distance: 5 kb
Species         miRNAs(a)  Cluster(b)  Members(c)  Average(d)  Distances(e)  Percentage(f)
A. thaliana     213        17          42          2.47        1.40          19.72%
P. trichocarpa  234        14          33          2.36        2.33          14.1%
O. sativa       462        28          90          3.21        1.08          19.48%
S. bicolor      148        10          27          2.7         1.14          18.24%

Maximum inter-miRNA distance: 10 kb
Species         miRNAs(a)  Cluster(b)  Members(c)  Average(d)  Distances(e)  Percentage(f)
A. thaliana     213        21          54          2.57        2.69          25.35%
P. trichocarpa  234        17          40          2.35        3.11          17.09%
O. sativa       462        33          103         3.12        1.82          22.29%
S. bicolor      148        12          32          2.67        1.86          21.62%

(a) The number of miRNA genes studied in four plant species.
(b) The number of predicted clusters.
(c) The number of miRNA genes located in clusters.
(d) The average number of miRNA genes in a cluster for the four plant species.
(e) The average distance between two miRNA genes in a cluster.
(f) The percentage of miRNA genes located in clusters.

However, there was no significant difference in the number of Pol II promoters in
upstream sequences between low conserved and nonconserved miRNAs in plants. Taken
together with the findings of the study performed by Cui et al. [10], our results provide
a more comprehensive understanding of the relationship between the number of Pol II
promoters and the degree of miRNA conservation in plant genomes. Highly conserved miRNAs
may be associated with more Pol II promoters (on average 1.72 per miRNA) than low
conserved and nonconserved miRNAs (on average 1.13 and 1.05 per miRNA, respectively) in
plants. It has been demonstrated that the highly conserved miRNAs are likely to be
central regulators and are highly expressed [28,29]. The results of one study suggested
that less conserved miRNAs rarely had obvious effects on plant morphology [30].
Therefore, we speculate that the increased number of Pol II promoters located in the
upstream regions of highly conserved miRNAs may have an important effect on the high
levels of expression of highly conserved miRNAs.
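
The proportion comparisons in this section rely on Fisher's exact test. As a rough illustration of how such a P-value is obtained from a 2×2 table of counts, the following standard-library Python sketch sums the hypergeometric probabilities of all tables that are no more likely than the observed one. The function name is hypothetical and the counts in the example are a made-up toy table, not data from this study; the text reports only percentages.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, two conservation groups, and columns
    "no Pol II promoter" versus "at least one Pol II promoter".
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that ties with the observed table are included.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Toy counts for illustration only (not taken from the paper).
print(fisher_exact_two_sided(8, 2, 1, 5))
```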
Fig. 2. Distribution of miRNA genes with the same number of putative core promoters.
(A) The percentage of miRNA genes occurring between protein-coding genes (intergenic) or
within the introns (intragenic) in four plant species (A. thaliana, P. trichocarpa,
O. sativa and S. bicolor), plotted against the number of core promoters. (B) The
percentage of miRNA genes with different degrees of conservation (highly conserved, low
conserved and nonconserved).

To further characterize the putative core promoter of the Pol II promoter in the upstream
sequences of plant miRNAs, we examined the distribution of the
putative core promoter in 2-kb upstream regions of miRNAs in the four plant species
studied. In these four plant species, the vast majority of the predicted core promoters
of the Pol II promoters were found to lie within a 900-bp region upstream of the miRNAs.
Distribution analysis of core promoter localization in 2-kb regions upstream of the
miRNAs from the four plant species studied showed that 50.4% of the putative core
promoters of the Pol II promoter were located within 0–1 kb, 26.8% within 1–1.5 kb and
22.8% within 1.5–2 kb of the miRNA. A recent study on rice (O. sativa) suggested that the
majority of TSSs and TATA-boxes are found within 0–400 bp upstream of the miRNA [10].
Here, we found a similar distribution of the putative core promoter in upstream regions
of miRNAs in four plant species. As shown in Fig. 3A, a significant number of putative
core promoters of the Pol II promoter were found to be located within the 400-bp upstream
regions in three plant species, although the putative promoters in O. sativa were
distributed mainly from 0 to 0.4 kb and from 1.6 to 2 kb. Together, these results
indicate that this distribution pattern of putative core promoters seems to be conserved
in the 2-kb region upstream of miRNAs in different plant species, and provide additional
evidence that the core promoter regions of most miRNAs are close to pre-miRNA hairpins in
plants. Fig. 3B shows the distribution of the core promoter in upstream sequences in view
of the evolutionary conservation of plant miRNAs. We found that the distribution pattern
of the core promoter in upstream regions was different between highly conserved miRNAs
and low conserved or nonconserved miRNAs. Highly conserved miRNAs tend to contain more
core promoters within the 400-bp region upstream of the miRNA. However, core promoters
are distributed mainly in the 0 to −0.4 kb, −0.8 to −1.2 kb and −1.6 to −2 kb regions
upstream of low conserved miRNAs, and, in contrast, core promoters are evenly distributed
in upstream regions of nonconserved miRNAs. These results suggest that there is a
relationship between the distribution pattern of core promoters and the degree of miRNA
conservation in plants. Based on these observations, we propose that core promoters of
Pol II promoters in the proximal promoter region of miRNAs may play a greater role in
efficient transcription initiation.
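
The positional distributions discussed above amount to binning promoter-to-hairpin distances over the 2-kb upstream window. A minimal sketch of that binning is given below; the function name, the 0.4-kb bin width (chosen to match the axis labels of Fig. 3) and the example distances are assumptions for illustration, and the values are not data from the study.

```python
def bin_positions(distances_bp, bin_size=400, span=2000):
    """Percentage of putative core promoters falling in each upstream bin.

    distances_bp: distances (bp) from each predicted core promoter to the
    miRNA stem-loop start, measured upstream as positive numbers.
    Returns one percentage per bin, ordered from the bin nearest the hairpin.
    """
    n_bins = span // bin_size
    counts = [0] * n_bins
    for d in distances_bp:
        if 0 <= d < span:
            counts[int(d // bin_size)] += 1
        elif d == span:
            # A promoter exactly at the window edge goes into the last bin.
            counts[-1] += 1
    total = sum(counts)
    return [100 * c / total if total else 0.0 for c in counts]


# Hypothetical distances for illustration only.
example = [50, 399, 400, 1999, 2000]
for i, pct in enumerate(bin_positions(example)):
    print(f"{i * 0.4:.1f}-{(i + 1) * 0.4:.1f} kb upstream: {pct:.1f}%")
```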
Analysis of specific sequence motifs in four plant
species
To further identify specific characteristic motifs in the flanking regions of miRNAs in
four plant species, we performed motif analysis to search for over-represented and
statistically significant motifs in the flanking regions up to 2 kb upstream and 1 kb
downstream from the miRNA stem–loop sequences. First, we used RepeatMasker with default
settings to mask repeats in all upstream and downstream sequences, and then used two
motif-finding tools – MEME and MotifSampler – to identify over-represented motifs.
Finally, we carried out whole-genome Monte Carlo simulation analysis to assess the
specificity and significance of motifs identified, as described in the Materials and
methods. Motifs whose Z-scores were > 2.0 were considered as over-represented and
statistically significant motifs. Several significantly over-represented species-specific
motifs were identified in the flanking regions of four plant species.

Fig. 3. Histograms of distances between putative core promoters and miRNA stem–loop
sequences. The horizontal axis shows the positions of putative core promoters with
respect to the corresponding miRNA stem–loop sequences, and the vertical axis shows the
percentage of putative core promoters at the specified positions. (A) Percentage of
putative core promoters at the specified positions in different plant species.
(B) Percentage of putative core promoters at the specified positions for miRNAs with a
different degree of conservation.

All the species-specific motifs found in the four plant species studied are
shown in Table 2. The motif M2, represented by the consensus sequence TTAGGGTTTC, has
also been found in A. thaliana by Zhou et al. [11]. Moreover, we also discovered a novel
motif – M1 – with a Z-score value of 10.62 that is specific to A. thaliana. In order to
gain a deeper insight into the function of these species-specific motifs, we compared our
species-specific motifs against known transcription factor binding sites in plants from
the PlantCARE database [31]. Only one motif (M5) matched a known regulatory element in
plant promoters. We found that M5, with the consensus sequence GCATGCATGC, is an RY
cis-acting regulatory element involved in seed-specific regulation in both monocot and
eudicot species of plants [32,33]. Although the functions of other species-specific
motifs are still unknown, we found that some motifs have repeat sequences in their
consensus. M5 has two copies of GCAT, and M3 can be considered as GCA repeats.
Palindromic patterns have been found in the binding sites of some transcription factors
in plants and animals [34,35]. In contrast to A. thaliana, P. trichocarpa and S. bicolor,
we could not detect any significant species-specific motifs in the flanking regions of
miRNAs in O. sativa, although a previous study has identified three specific motifs in
the promoters of miRNAs in O. sativa [11]. Our analysis suggests that these
species-specific motifs may be associated with different specific functions, and may play
an important role in species-specific transcriptional regulation networks of miRNA genes
or contribute to the formation of species-specific miRNAs in plants. However, their
functions need to be investigated in further studies. Furthermore, these species-specific
motifs may be useful in the computational identification of species-specific miRNAs in
plants.
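
The Z-score criterion (Z > 2.0) can be pictured with a simplified Monte Carlo sketch. The details of the authors' whole-genome simulation are given in their Materials and methods and are not reproduced here; the version below is an assumption-laden stand-in that counts how many flanking sequences contain an exact motif match and compares that count with counts from randomly placed genomic windows of the same lengths. All names and the toy sequences are hypothetical.

```python
import random
from statistics import mean, pstdev


def n_with_motif(seqs, motif):
    """Number of sequences containing at least one exact match of the motif."""
    return sum(motif in s for s in seqs)


def monte_carlo_z(flank_seqs, motif, genome, n_iter=1000, seed=0):
    """Z-score of the observed motif count against random genomic windows.

    Each iteration draws one random window per flanking sequence, with the
    same length, from `genome` (a single string in this simplified sketch).
    """
    rng = random.Random(seed)
    observed = n_with_motif(flank_seqs, motif)
    null_counts = []
    for _ in range(n_iter):
        sample = []
        for s in flank_seqs:
            start = rng.randrange(len(genome) - len(s) + 1)
            sample.append(genome[start:start + len(s)])
        null_counts.append(n_with_motif(sample, motif))
    mu, sd = mean(null_counts), pstdev(null_counts)
    if sd == 0:
        # Degenerate null distribution: every random draw gave the same count.
        return float("inf") if observed > mu else 0.0
    return (observed - mu) / sd


# Toy input: the motif is absent from the background but present in every flank.
toy_genome = "A" * 1000
toy_flanks = ["TTGCATGCATGCTT"] * 5
z = monte_carlo_z(toy_flanks, "GCATGCATGC", toy_genome, n_iter=100)
print("over-represented" if z > 2.0 else "not over-represented")
```

A real analysis would work with position weight matrices rather than exact string matches and would sample from all chromosomes; the sketch only shows the shape of the Z-score calculation.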
Table 2. Significantly over-represented species-specific sequence motifs identified in the flanking regions of the three plant species studied.

Species         Index  Consensus sequence(a)  Motif logo(b)  E-value(c)  Z-score(d)
A. thaliana     M1     GGCCTGAGCC             [logo]         1.4e-008    10.62
                M2     TTAGGGTTTC             [logo]         2.4e-009    4.31
P. trichocarpa  M3     GCAGCAGAAG             [logo]         7.2e-006    6.21
                M4     CGGGTCAAAC             [logo]         3.6e-016    4.45
S. bicolor      M5     GCATGCATGC             [logo]         2.7e-030    5.86
                M6     GAACTAAACA             [logo]         2.1e-019    3.53

(a) The consensus sequence represents a sequence of the most frequent base at each position.
(b) The motif logos show the information content present at each position in the sequence.
(c) The expected frequencies of motifs in a random database of the same size.
(d) The Z-score value was obtained by whole-genome Monte Carlo simulation analysis.

The mechanism by which new plant miRNAs originate is not fully understood. It is believed
that the origin of new plant miRNAs is dependent on duplication and inversion events
[36–38]. However, several lines of evidence have also suggested that new plant miRNA
genes can arise from foldback sequences, which are under the control of transcriptional
regulatory sequences [39,40]. In order to determine whether some significantly
over-represented sequence motifs are related to the degree of conservation of miRNA genes
in plants, we classified the miRNA genes of four plant species into highly conserved
miRNAs, low conserved miRNAs and nonconserved miRNAs, as described in the Materials and
methods. We then examined the upstream sequences and downstream sequences of these miRNA
genes to reveal characteristic sequence motifs. Several significantly over-represented
motifs associated with the degree of miRNA conservation were identified and are listed in
Table 3. Two motifs (CATGCATGCA and CTAGCTAGCT; M1 and M2, respectively), which have
repetitive and palindromic patterns in their consensus sequences, were found to be
significantly over-represented in highly conserved plant miRNAs; these motifs can be
considered as CATG repeats and CTAG repeats, respectively. However, we did not find any
significantly over-represented sequence motifs in the flanking sequences of nonconserved
miRNAs in the four plant species. In contrast to nonconserved miRNA genes, which have a
single copy, conserved miRNA genes are usually multicopy [25]. miRNAs that are highly
conserved across plant species must have originated a long time ago and experienced many
genome-duplication events. It has been shown that the duplication events for miRNA gene
evolution in plants involve not only the region that is transcribed but also the miRNA
promoter regions [41,42]. This might indicate that these significantly over-represented
sequence motifs in highly conserved and low conserved miRNAs are evolutionarily related
elements that play important functional roles in evolutionarily conserved regulatory
systems in plants or are associated with duplication events for miRNA gene evolution in
plants, although the functionality of these computationally identified conserved motifs
remains to be experimentally validated.
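
The repetitive and palindromic character of these consensus sequences can be checked directly. The short sketch below (helper names are hypothetical) finds the shortest repeat unit of a consensus and tests whether a sequence equals its own reverse complement, which is the usual sense of "palindromic" for double-stranded DNA.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_rc_palindrome(seq):
    """True if the sequence reads the same on both strands."""
    return seq == reverse_complement(seq)


def repeat_unit(seq):
    """Shortest prefix whose repetition reproduces the sequence (possibly truncated)."""
    for k in range(1, len(seq) + 1):
        unit = seq[:k]
        if (unit * (len(seq) // k + 1))[:len(seq)] == seq:
            return unit
    return seq


for motif in ("CATGCATGCA", "CTAGCTAGCT", "GCATGCATGC"):
    unit = repeat_unit(motif)
    print(motif, "unit:", unit,
          "unit palindromic:", is_rc_palindrome(unit),
          "whole motif palindromic:", is_rc_palindrome(motif))
```

Under this definition the CATG and CTAG units are each their own reverse complement, which is consistent with the description of M1 and M2 as repeats with palindromic patterns.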
Conclusions
In this study, we concentrated our efforts on clustering patterns and flanking
characteristics that might be involved in the transcriptional regulation and processing
of plant miRNAs, including the miRNAs located in the intergenic area and in the
protein-coding area whose possible sequence characteristics were not studied earlier.
Previous studies have revealed that miRNAs located in close genomic proximity to each
other are co-transcribed as polycistronic units [24,43,44]. Therefore, we performed
genome-wide analysis to examine the clustering patterns of the miRNAs in four plant
species. The pairwise distance analysis of same-strand consecutive miRNAs suggested that
the distances between miRNAs in the four plant species are statistically significantly
higher than expected at random
(P < 0.001). Comparison of the miRNA pair distances with the pair distances of
protein-coding genes revealed that plant miRNAs are more clustered than protein-coding
genes in the very short pairwise distances of < 1 kb.

Table 3. Significantly over-represented sequence motifs related to the conservation of miRNAs.

Conservation  Index  Consensus sequence(a)  Motif logo(b)  E-value(c)  Z-score(d)
Highly        M1     CATGCATGCA             [logo]         3.6e-019    7.82
              M2     CTAGCTAGCT             [logo]         1.6e-024    5.76
Low           M3     TGGCGGGAAA             [logo]         24e-014     4.32

(a) The consensus sequence represents a sequence of the most frequent base at each position.
(b) The motif logos show the information content present at each position in the sequence.
(c) The expected frequencies of motifs in a random database of the same size.
(d) The Z-score value obtained by whole-genome Monte Carlo simulation analysis.

Then, we characterized the putative
core promoter of Pol II promoters in plant miRNA upstream sequences. Our results suggest
that most plant miRNAs contain core promoters of Pol II promoters that are close to
pre-miRNA hairpins. Analysis of promoter architecture for different miRNA genes in plants
reveals significant differences in the number and distribution of core promoters between
intergenic miRNAs and intragenic miRNAs, and between highly conserved miRNAs and low or
nonconserved miRNAs. We applied two motif-finding tools to search for over-represented,
statistically significant sequence motifs in the flanking regions of miRNAs in different
plant species. Six motifs were found to be species-specific motifs in three plant species
and included some previously known species-specific motifs and some novel
species-specific motifs. We also identified three specific motifs associated with the
degree of miRNA conservation.
Compared with previous studies, our study systematically explored clustering patterns and
the characteristics of flanking regions up to 2 kb upstream and 1 kb downstream of miRNA
stem–loop sequences, and extended the results on a small number of miRNAs in A. thaliana
and in O. sativa to all known miRNAs in four plant species. It remains largely unknown
whether there are motifs related to the degree of conservation of miRNAs. To address this
question, we classified the miRNA genes of the four plant species studied into three
groups, according to their conservation, and examined characteristic sequence motifs in
the flanking sequences of these miRNA genes. Several significant motifs appeared to be
related to the degree of miRNA conservation. We hope that our results can contribute to
gaining a better understanding of transcriptional regulation and processing of miRNAs and
provide useful data for further computational identification of miRNAs in plants. Also,
we anticipate that these motifs related to the degree of miRNA conservation may be useful
for understanding the mechanism of the origin of new plant miRNAs.
Materials and methods
Data sets
We chose four plant species (A. thaliana,
P. trichocarpa, O. sativa and S. bicolor) to study clustering
patterns and sequence characteristics in the flanking regions
of plant miRNA genes, because the number of known miRNA
genes in these species is relatively large and their
genome sequences are relatively complete. All known
miRNAs and their genome coordinates in these four plant
species were downloaded from the miRBase Sequence
Database, release 16 [45]. The genome
sequences and the protein-coding genes of A. thaliana and
S. bicolor were downloaded from MapViewer in National
Center for Biotechnology Information (i.
nlm.nih.gov/). The genome sequences of P. trichocarpa and
O. sativa and the protein-coding genes were downloaded
from the Poplar site on Phytozome v6.0 (P. trichocarpa
v2.0) [23] and TIGR
Oryza Pseudomolecules (version_6.0) [46], respectively.
Then, we extracted sequences up to 2 kb upstream and up
to 1 kb downstream from all available miRNA precursors
in the four plant species. A detailed description of the data
set used in our study is shown in Table 4.
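The flank extraction described above can be illustrated with a minimal Python sketch. It is not the code used in this study. The function names, the 1-based inclusive coordinate convention and the truncation of flanks at chromosome ends are assumptions made for illustration; "upstream" is taken relative to the strand of the precursor.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def flanking_regions(chrom_seq: str, start: int, end: int, strand: str,
                     upstream: int = 2000, downstream: int = 1000) -> Tuple[str, str]:
    """Return (upstream, downstream) flanks of a precursor.

    start/end are 1-based inclusive forward-strand coordinates (assumed).
    Flanks are truncated at chromosome ends, so they can be shorter than
    the requested lengths ("up to" 2 kb and 1 kb).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the forward strand, a '-' gene has its upstream flank to the right.
    left_len, right_len = (upstream, downstream) if strand == "+" else (downstream, upstream)
    left = chrom_seq[max(0, start - 1 - left_len):start - 1]
    right = chrom_seq[end:end + right_len]
    if strand == "+":
        return left, right
    # Report minus-strand flanks in the orientation of the gene.
    return reverse_complement(right), reverse_complement(left)
```

With the default arguments, the function returns up to 2000 nt upstream and up to 1000 nt downstream of the precursor, both read in the direction of transcription.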
Conservation analysis of miRNA in the four plant
species studied
To determine the degree of conservation of miRNA in the
four plant species, we performed a sequence-based homol-
ogy search for known miRNAs from the four plants to
detect both closely related and distantly related homo-
logues. First, known miRNA hairpin sequences from the
four plants were aligned against all known miRNA hairpin
sequences in monocots and eudicots using standalone
BLAST (blastn, version 2.2.27). The hairpin sequences were
considered as homologues when they exhibited a minimum
sequence identity of 85% over an alignment length of at
least 90%. Second, ClustalW [47] was used to compare
mature miRNA sequences for a search of homologues. We
accepted mature miRNA sequences that matched over at least
18 nucleotides, allowing 0–3 nucleotides of sequence
variation [19]. Finally, we divided the miRNAs of the four
plant species into three groups: the miRNAs whose homologues
were found simultaneously in monocots and eudicots
were considered as highly conserved miRNAs; those found
only in monocots or eudicots were considered as low
conserved miRNAs; and those found only in one species were
considered as nonconserved miRNAs.

Table 4. Detailed description of the data set in our study.
Species         Version of genome annotation   No. of miRNAs   No. of polycistronic transcripts   No. of upstream sequences   No. of downstream sequences
A. thaliana     TAIR9                          213             21                                 167                         167
P. trichocarpa  JGI_Poptr2.0                   234             17                                 163                         163
O. sativa       MSU6.0                         462             33                                 326                         326
S. bicolor      JGI_sbi1                       148             12                                 125                         125
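The homology thresholds and the three-way grouping described in this section can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline. The function names are hypothetical, and the species sets list only the four study species under their miRBase-style prefixes, whereas the actual search covered all monocot and eudicot hairpins in miRBase. Reading "an alignment length of at least 90%" as coverage of the query hairpin is also an assumption.

```python
from typing import Iterable, Set

# Illustrative species sets (assumption): miRBase-style prefixes of the
# four study species only.
MONOCOTS: Set[str] = {"osa", "sbi"}
EUDICOTS: Set[str] = {"ath", "ptc"}


def is_hairpin_homologue(identity_pct: float, aligned_len: int, query_len: int) -> bool:
    """Apply the hairpin thresholds: identity >= 85% and the alignment
    covering >= 90% of the query hairpin (coverage reading is assumed)."""
    return identity_pct >= 85.0 and aligned_len * 10 >= query_len * 9


def conservation_group(own_species: str, homologue_species: Iterable[str],
                       monocots: Set[str] = MONOCOTS,
                       eudicots: Set[str] = EUDICOTS) -> str:
    """Assign a miRNA to one of the three conservation groups."""
    species = set(homologue_species) | {own_species}
    if species & monocots and species & eudicots:
        return "highly conserved"   # found in both monocots and eudicots
    if len(species) > 1:
        return "low conserved"      # found in several species of one clade
    return "nonconserved"           # found in a single species only
```

For example, an A. thaliana miRNA with a homologue in O. sativa would be placed in the highly conserved group, whereas one with a homologue only in P. trichocarpa would be placed in the low conserved group.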
Analysis of clustering patterns
To study the clustering patterns of miRNA genes in different
plant species, we computed the neighbour distances
between every two consecutive same-strand miRNA genes
on the same chromosome. The average distance of the
neighbouring miRNA pairs was calculated across all
chromosomes in the four plant species studied. To evaluate the
statistical significance of miRNA clustering patterns in the
four plant species, we used a random sampling approach.
First, we selected random positions
whose number was equal to the number of miRNA genes
on each chromosome. Then we computed the neighbour
distances between consecutive random points and their
average. After repeating the random sampling 1000 times,
we set the P value as
the fraction of times for which the random averages were
smaller (or larger) than the average distance of miRNA
pairs.
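The sampling procedure can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the code used in the study: genes are treated as single points, strand is ignored, random positions are drawn uniformly along each chromosome, and only the one-sided "smaller or equal" P value is computed. All function and parameter names are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def pooled_mean_gap(positions_by_chrom: Dict[str, Sequence[int]]) -> float:
    """Mean distance between consecutive positions, pooled over chromosomes.

    Requires at least one chromosome with two or more positions.
    """
    gaps: List[int] = []
    for positions in positions_by_chrom.values():
        ordered = sorted(positions)
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return mean(gaps)


def clustering_p_value(positions_by_chrom: Dict[str, Sequence[int]],
                       chrom_lengths: Dict[str, int],
                       n_iter: int = 1000, seed: int = 0) -> float:
    """Fraction of random placements whose mean gap is <= the observed one."""
    rng = random.Random(seed)
    observed = pooled_mean_gap(positions_by_chrom)
    hits = 0
    for _ in range(n_iter):
        # Same number of random points per chromosome as real miRNA genes.
        simulated = {
            chrom: [rng.randrange(chrom_lengths[chrom]) for _ in positions]
            for chrom, positions in positions_by_chrom.items()
        }
        if pooled_mean_gap(simulated) <= observed:
            hits += 1
    return hits / n_iter
```

A small P value indicates that the observed miRNA genes lie closer together than expected for randomly placed points; the opposite tail can be obtained by reversing the comparison.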
Prediction of the core promoter of the plant
miRNA gene
The core promoters of class II promoters, including the
TSS and the TATA-box, were detected in the upstream sequences
of plant miRNA genes using the TSSP-TCM program with
its default parameters; this program is well established
and is widely used for plant promoter prediction
[48].
Motif analysis
To identify characteristic motifs in the flanking regions for
microRNA genes in the four plant species, we first used
RepeatMasker (version 3.2.9)
with default settings to mask repeats in all upstream and
downstream sequences. Then we applied the MEME Suite
software (version 4.3.0), which is a
probabilistic local alignment tool [49]. The significance of
a detected motif was represented by the E-value, which
refers to the expected number of motifs of equal width
with the same or higher likelihood in a random sequence
set with the same size and nucleotide composition as the
considered set of sequences. Here, MEME was used to
identify 10 top-ranking motifs for each species with a
width of 10 bp. All other options were left as default.
Furthermore, we applied MotifSampler, which is
based on Gibbs sampling [50], to find over-represented
motifs. MotifSampler is a stochastic algorithm and the
results may vary between runs. Therefore, we carried
out 50 repeated runs of MotifSampler for each analysis.
The number of different motifs was set to 10 and the
width of the motifs was set to 10 bp. All other options were
varied over a range of reasonable settings. The results of
these two programs were integrated to identify motifs that
were frequently reported with a low E-value across these
settings and by both motif-finding tools in the flanking
regions of the microRNA genes from the four plant
species. Sequence logos for all motifs found by these two
programs were created using WebLogo Version 2.8.2
(http://weblogo.berkeley.edu) [51].
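Because a stochastic motif finder can report different motifs in different runs, one simple way to keep only recurrent motifs is to count how often each motif is reported. The sketch below is a hypothetical illustration, not the integration procedure used in this study: it compares motifs by exact consensus string (real motifs are position weight matrices and would need a similarity measure), and the 50% recurrence threshold is an arbitrary assumption.

```python
from collections import Counter
from typing import Dict, Iterable


def recurrent_motifs(runs: Iterable[Iterable[str]],
                     min_fraction: float = 0.5) -> Dict[str, float]:
    """Return motifs reported in at least `min_fraction` of the runs.

    Each run is a collection of consensus strings; a motif is counted
    at most once per run. Values are the fraction of runs reporting it.
    """
    run_sets = [set(run) for run in runs]
    if not run_sets:
        return {}
    counts = Counter(motif for run in run_sets for motif in run)
    n_runs = len(run_sets)
    return {motif: count / n_runs
            for motif, count in counts.items()
            if count / n_runs >= min_fraction}
```

In the setting described above, `runs` would hold the 10 motifs of width 10 reported by each of the 50 runs.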
In order to determine whether a motif is statistically sig-
nificant in the flanking regions of plant miRNA genes,
whole-genome Monte Carlo simulation, resulting in a
Z-score, was used to take into account the specificity and
significance of a motif, as previously described by
Zhou et al. [11]. For a given motif, we first obtained the
average number of occurrences per target sequence, denoted
as $N_t$, and then randomly generated the same number of
reference sets from protein-coding genes and an intergenic
sequence, far upstream of the miRNA, as an appropriate
background. Next, the MEME motifs were individually
aligned using the MAST program with default values [52] to
the reference sets to compute the average number of
occurrences of a motif, $N_r$, and its standard deviation,
$\sigma_r$, over the reference sets. The Z-score was computed as
$Z = (N_t - N_r)/\sigma_r$, which measures the normalized difference
between the average occurrence of the motif in the target set
and the sample mean in the reference sets [11].
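The Z-score, defined as the difference between the target-set average and the reference-set mean divided by the reference standard deviation, can be computed with a short Python sketch. This is an illustration only. Exact-string counting stands in for the MAST scan used in the study, the sample standard deviation is an assumed choice, and all names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def mean_occurrences(motif: str, sequences: Sequence[str]) -> float:
    """Average number of (possibly overlapping) exact matches per sequence.

    Exact matching is a simplified stand-in for a MAST scan (assumption).
    """
    k = len(motif)
    counts = [sum(1 for i in range(len(s) - k + 1) if s[i:i + k] == motif)
              for s in sequences]
    return mean(counts)


def motif_z_score(n_target: float, reference_means: Sequence[float]) -> float:
    """Z = (Nt - Nr) / sigma_r over the per-reference-set averages.

    Needs at least two reference sets with non-identical averages.
    """
    n_ref = mean(reference_means)
    sigma_ref = stdev(reference_means)  # sample standard deviation (assumed)
    return (n_target - n_ref) / sigma_ref
```

Here `n_target` is the average number of occurrences per target sequence, and each entry of `reference_means` is the same average computed on one randomly generated reference set.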
Acknowledgements
This work was supported in part by the National
Natural Science Foundation of China (grant nos
30871394, 30600367 and 30571034), the National
High Tech Development Project of China, the 863
Program (grant no. 2007AA02Z329), the National
Basic Research Program of China, the 973 Program
(grant no. 2008CB517302) and the National Science
Foundation of Heilongjiang Province (grant nos
ZJG0501, 1055HG009, GB03C602-4, JC2007H and
BMFH060044).
References
1 Voinnet O (2009) Origin, biogenesis, and activity of
plant microRNAs. Cell 136, 669–687.
2 Chen X (2008) MicroRNA metabolism in plants. Curr
Top Microbiol Immunol 320, 117–136.
3 Ambros V (2004) The functions of animal microRNAs.
Nature 431, 350–355.
4 Singh SK, Pal Bhadra M, Girschick HJ & Bhadra U
(2008) MicroRNAs–micro in size but macro in function.
FEBS J 275, 4929–4944.
5 Lee RC, Feinbaum RL & Ambros V (1993) The
C. elegans heterochronic gene lin-4 encodes small RNAs
with antisense complementarity to lin-14. Cell 75, 843–
854.
6 Park W, Li J, Song R, Messing J & Chen X (2002)
CARPEL FACTORY, a Dicer homolog, and HEN1,
a novel protein, act in microRNA metabolism in
Arabidopsis thaliana. Curr Biol 12, 1484–1495.
7 Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B &
Bartel DP (2002) MicroRNAs in plants. Genes Dev 16,
1616–1626.
8 Xie Z, Allen E, Fahlgren N, Calamar A, Givan SA &
Carrington JC (2005) Expression of Arabidopsis
MIRNA genes. Plant Physiol 138, 2145–2154.
9 Bartel DP (2004) MicroRNAs: genomics, biogenesis,
mechanism, and function. Cell 116, 281–297.
10 Cui X, Xu SM, Mu DS & Yang ZM (2009) Genomic
analysis of rice microRNA promoters and clusters. Gene
431, 61–66.
11 Zhou X, Ruan J, Wang G & Zhang W (2007)
Characterization and identification of microRNA core
promoters in four model species. PLoS Comput Biol 3,
e37.
12 Megraw M, Baev V, Rusinov V, Jensen ST, Kalantidis
K & Hatzigeorgiou AG (2006) MicroRNA promoter
element discovery in Arabidopsis. RNA 12, 1612–1619.
13 Heikkinen L, Asikainen S & Wong G (2008) Identification
of phylogenetically conserved sequence motifs
in microRNA 5′ flanking sites from C. elegans and
C. briggsae. BMC Mol Biol 9, 105.
14 Inouchi A, Shinohara S, Inoue H, Kita K & Itakura M
(2007) Identification of specific sequence motifs in the
upstream region of 242 human miRNA genes. Comput
Biol Chem 31, 207–214.
15 Ohler U, Yekta S, Lim LP, Bartel DP & Burge CB
(2004) Patterns of flanking sequence conservation and a
characteristic upstream motif for microRNA gene
identification. RNA 10, 1309–1322.
16 Fujita S & Iba H (2008) Putative promoter regions of
miRNA genes involved in evolutionarily conserved
regulatory systems among vertebrates. Bioinformatics
24, 303–308.
17 Zhou M, Wang Q, Sun J, Li X, Xu L, Yang H, Shi H,
Ning S, Chen L, Li Y et al. (2009) In silico detection
and characteristics of novel microRNA genes in the
Equus caballus genome using an integrated ab initio and
comparative genomic approach. Genomics 94, 125–131.
18 Yue J, Sheng Y & Orwig KE (2008) Identification of
novel homologous microRNA genes in the rhesus
macaque genome. BMC Genomics 9, 8.
19 Sunkar R & Jagadeeswaran G (2008) In silico identifi-
cation of conserved microRNAs in large number of
diverse plant species. BMC Plant Biol 8, 37.
20 Lagos-Quintana M, Rauhut R, Meyer J, Borkhardt A
& Tuschl T (2003) New microRNAs from mouse and
human. RNA 9, 175–179.
21 Altuvia Y, Landgraf P, Lithwick G, Elefant N,
Pfeffer S, Aravin A, Brownstein MJ, Tuschl T &
Margalit H (2005) Clustering and conservation patterns
of human microRNAs. Nucleic Acids Res 33, 2697–
2706.
22 Thatcher EJ, Bond J, Paydar I & Patton JG (2008)
Genomic organization of zebrafish microRNAs. BMC
Genomics 9, 253.
23 Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I,
Hellsten U, Putnam N, Ralph S, Rombauts S,
Salamov A et al. (2006) The genome of black cottonwood,
Populus trichocarpa (Torr. & Gray). Science 313,
1596–1604.
24 Zhang B, Pan X, Cannon CH, Cobb GP &
Anderson TA (2006) Conservation and divergence of
plant microRNA genes. Plant J 46, 243–259.
25 Li A & Mao L (2007) Evolution of plant microRNA
gene families. Cell Res 17, 212–218.
26 Baskerville S & Bartel DP (2005) Microarray profiling
of microRNAs reveals frequent coexpression with
neighboring miRNAs and host genes. RNA 11,
241–247.
27 Kim YK & Kim VN (2007) Processing of intronic
microRNAs. EMBO J 26, 775–783.
28 Lindow M & Krogh A (2005) Computational evidence
for hundreds of non-conserved plant microRNAs. BMC
Genomics 6, 119.
29 Hofmann NR (2010) MicroRNA evolution in the genus
Arabidopsis. Plant Cell 22, 994.
30 Todesco M, Rubio-Somoza I, Paz-Ares J & Weigel D
(2010) A collection of target mimics for comprehensive
analysis of microRNA function in Arabidopsis thaliana.
PLoS Genet 6, e1001031.
31 Lescot M, Dehais P, Thijs G, Marchal K, Moreau Y,
Van de Peer Y, Rouze P & Rombauts S (2002)
PlantCARE, a database of plant cis-acting regulatory
elements and a portal to tools for in silico analysis of
promoter sequences. Nucleic Acids Res 30, 325–327.
32 Baumlein H, Nagy I, Villarroel R, Inze D & Wobus U
(1992) Cis-analysis of a seed protein gene promoter: the
conservative RY repeat CATGCATG within the legumin
box is essential for tissue-specific expression of a
legumin gene. Plant J 2, 233–239.
33 Fujiwara T & Beachy RN (1994) Tissue-specific and
temporal regulation of a beta-conglycinin gene: roles of
the RY repeat and other cis-acting elements. Plant Mol
Biol 24, 261–272.
34 Olefsky JM (2001) Nuclear receptor minireview series.
J Biol Chem 276, 36863–36864.
35 Krawczyk S, Thurow C, Niggeweg R & Gatz C (2002)
Analysis of the spacing between the two palindromes of
activation sequence-1 with respect to binding to differ-
ent TGA factors and transcriptional activation poten-
tial. Nucleic Acids Res 30, 775–781.
36 Fahlgren N, Howell MD, Kasschau KD, Chapman EJ,
Sullivan CM, Cumbie JS, Givan SA, Law TF, Grant
SR, Dangl JL et al. (2007) High-throughput sequencing
of Arabidopsis microRNAs: evidence for frequent birth
and death of MIRNA genes. PLoS ONE 2, e219.
37 Rajagopalan R, Vaucheret H, Trejo J & Bartel DP
(2006) A diverse and evolutionarily fluid set of microRNAs
in Arabidopsis thaliana. Genes Dev 20, 3407–3425.
38 Allen E, Xie Z, Gustafson AM, Sung GH, Spatafora
JW & Carrington JC (2004) Evolution of microRNA
genes by inverted duplication of target gene sequences
in Arabidopsis thaliana. Nat Genet 36, 1282–1290.
39 Felippes FF, Schneeberger K, Dezulian T, Huson DH
& Weigel D (2008) Evolution of Arabidopsis thaliana
microRNAs from random sequences. RNA 14, 2455–
2459.
40 Axtell MJ (2008) Evolution of microRNAs and their
targets: are all microRNAs biologically relevant?
Biochim Biophys Acta 1779, 725–734.
41 Haberer G, Hindemitt T, Meyers BC & Mayer KF
(2004) Transcriptional similarities, dissimilarities, and
conservation of cis-elements in duplicated genes of
Arabidopsis. Plant Physiol 136, 3009–3022.
42 Wang Y, Hindemitt T & Mayer KF (2006) Significant
sequence similarities in promoters and precursors of
Arabidopsis thaliana non-conserved microRNAs.
Bioinformatics 22, 2585–2589.
43 Guddeti S, Zhang DC, Li AL, Leseberg CH, Kang H,
Li XG, Zhai WX, Johns MA & Mao L (2005) Molecu-
lar evolution of the rice miR395 gene family. Cell Res
15, 631–638.
44 Allen E, Xie Z, Gustafson AM & Carrington JC (2005)
microRNA-directed phasing during trans-acting siRNA
biogenesis in plants. Cell 121, 207–221.
45 Griffiths-Jones S, Saini HK, van Dongen S
& Enright AJ (2008) miRBase: tools for microRNA
genomics. Nucleic Acids Res 36, D154–158.
46 Ouyang S, Zhu W, Hamilton J, Lin H, Campbell M,
Childs K, Thibaud-Nissen F, Malek RL, Lee Y, Zheng
L et al. (2007) The TIGR Rice Genome Annotation
Resource: improvements and new features. Nucleic
Acids Res 35, D883–887.
47 Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ,
Higgins DG & Thompson JD (2003) Multiple sequence
alignment with the Clustal series of programs. Nucleic
Acids Res 31, 3497–3500.
48 Shahmuradov IA, Solovyev VV & Gammerman AJ
(2005) Plant promoter prediction with confidence
estimation. Nucleic Acids Res 33, 1069–1076.
49 Bailey TL & Elkan C (1994) Fitting a mixture model by
expectation maximization to discover motifs in biopoly-
mers. Proc Int Conf Intell Syst Mol Biol 2, 28–36.
50 Thijs G, Lescot M, Marchal K, Rombauts S, De Moor
B, Rouze P & Moreau Y (2001) A higher-order
background model improves the detection of promoter
regulatory elements by Gibbs sampling. Bioinformatics
17, 1113–1122.
51 Crooks GE, Hon G, Chandonia JM & Brenner SE
(2004) WebLogo: a sequence logo generator. Genome
Res 14, 1188–1190.
52 Bailey TL & Gribskov M (1998) Combining evidence
using p-values: application to sequence homology
searches. Bioinformatics 14, 48–54.