Genome Biology 2006, 7:R23
Open Access
2006 Meraldi et al. Volume 7, Issue 3, Article R23
Research
Phylogenetic and structural analysis of centromeric DNA and kinetochore proteins
Patrick Meraldi¤*†, Andrew D McAinsh¤*‡, Esther Rheinbay* and Peter K Sorger*
Addresses: *Department of Biology, Massachusetts Institute of Technology, Massachusetts Ave., Cambridge, MA 02139, USA. †Institute of Biochemistry, ETH Zurich, Schafmattstr. 18, CH-8093 Zurich, Switzerland. ‡Chromosome Segregation Laboratory, Marie Curie Research Institute, The Chart, Oxted, Surrey RH8 0TL, UK.
¤ These authors contributed equally to this work.
Correspondence: Peter K Sorger. Email:
© 2006 Meraldi et al.; licensee BioMed Central Ltd.
This is an open access article distributed under the terms of the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Kinetochore evolution: Analysis of centromeric DNA and kinetochore proteins suggests that critical structural features of kinetochores have been well conserved from yeast to man.
Abstract
Background: Kinetochores are large multi-protein structures that assemble on centromeric
DNA (CEN DNA) and mediate the binding of chromosomes to microtubules. Comprising 125
base-pairs of CEN DNA and 70 or more protein components, Saccharomyces cerevisiae
kinetochores are among the best understood. In contrast, most fungal, plant and animal cells
assemble kinetochores on CENs that are longer and more complex, raising the question of whether
kinetochore architecture has been conserved through evolution, despite considerable divergence
in CEN sequence.
Results: Using computational approaches, ranging from sequence similarity searches to hidden
Markov model-based modeling, we show that organisms with CENs resembling those in S. cerevisiae
(point CENs) are very closely related and that all contain a set of 11 kinetochore proteins not found
in organisms with complex CENs. Conversely, organisms with complex CENs (regional CENs)
contain proteins seemingly absent from point-CEN organisms. However, at least three quarters of
known kinetochore proteins are present in all fungi regardless of CEN organization. At least six of
these proteins have previously unidentified human orthologs. When fungi and metazoa are
compared, almost all have kinetochores constructed around Spc105 and three conserved multi-
protein linker complexes (MIND, COMA, and the NDC80 complex).
Conclusion: Our data suggest that critical structural features of kinetochores have been well
conserved from yeast to man. Surprisingly, phylogenetic analysis reveals that human kinetochore
proteins are as similar in sequence to their yeast counterparts as to presumptive Drosophila
melanogaster or Caenorhabditis elegans orthologs. This finding is consistent with evidence that
kinetochore proteins have evolved very rapidly relative to components of other complex cellular
structures.
Published: 22 March 2006
Genome Biology 2006, 7:R23 (doi:10.1186/gb-2006-7-3-r23)
Received: 19 October 2005
Revised: 19 December 2005
Accepted: 24 February 2006

The electronic version of this article is the complete one and can be
found online.
Background
Kinetochores are eukaryote-specific structures that assemble
on centromeric (CEN) DNA and perform three crucial func-
tions: they bind paired sister chromatids to spindle microtu-
bules (MTs) in a bipolar fashion compatible with chromatid
disjunction; they couple MT (+)-end polymer dynamics to
chromosome movement during metaphase and anaphase [1];
and they generate the spindle checkpoint signals linking ana-
phase onset to the completion of kinetochore-MT attachment
[2]. Despite the conservation of these functions, and of MT
structure and dynamics, CENs in closely related organisms
are highly diverged in sequence, as are CENs on different
chromosomes in a single organism [2,3]. The simplest known
CENs, those in the budding yeast Saccharomyces cerevisiae,
consist of 125 base-pairs (bp) of DNA and three protein-bind-
ing motifs (CDEI, CDEII and CDEIII) that are present on all
16 chromosomes [4]. These short CEN sequences, often called
'point' CENs, are structurally similar to enhancers and tran-
scriptional regulators in that their assembly is initiated by
highly sequence-selective DNA-protein interactions [5]. In
contrast, the CENs of fungi such as the budding yeast Candida
albicans and fission yeast Schizosaccharomyces pombe, plants
such as Arabidopsis thaliana, and metazoans such as Drosophila
melanogaster and Homo sapiens are longer and more complex and
exhibit poor sequence conservation [6-10]. These regional CENs
range in size from 1 kb in C. albicans [6] to several megabases
in H. sapiens [8] and typically contain long stretches of
repetitive AT-rich DNA. CEN organization is particularly
divergent in nematodes such as
Caenorhabditis elegans, which contain holocentric CENs
with MT-attachment sites distributed along the length of
chromosomes [11]. Sequence-selective DNA-protein interac-
tions have not been identified in regional CENs and it is
thought that kinetochore position is determined by a special-
ized chromatin domain whose formation at one site on each
chromosome is controlled by epigenetic mechanisms [2,12].
A combination of genetics and mass spectrometry in S. cere-
visiae has yielded a fairly detailed view of the composition
and architecture of its simple kinetochores. S. cerevisiae
kinetochores contain upwards of 70 protein subunits organ-
ized into 14 or more multi-protein complexes that together
have a molecular mass in excess of 5 to 10 MDa [5]. S. cerevi-
siae kinetochore proteins can be assigned to DNA-binding,
linker, MT-binding and regulatory functions. While 'linker
protein' is used rather loosely, all linkers exhibit a clear hier-
archical relationship with respect to DNA and MT-binding
proteins: linker proteins require DNA binding proteins, and
possibly also other linker proteins, for CEN DNA binding but
not MTs or MT-associated proteins (MAPs).
Kinetochore assembly in S. cerevisiae is initiated by associa-
tion of the essential four-protein CBF3 complex with the
CDEIII region of CEN DNA. CBF3-CDEIII association then
recruits several additional DNA binding proteins, including
scCse4, a specialized histone H3 found only at CENs
(CenH3). CenH3-containing nucleosomes are thought to be
core components of all kinetochores [13]. When CEN associ-
ated, the DNA binding subunits of S. cerevisiae kinetochores
recruit four essential multi-protein linker complexes, the
NDC80 complex (four proteins), COMA (four proteins),
MIND (four proteins) and the SPC105 complex (two pro-
teins). These complexes, in turn, recruit a multiplicity of
motor proteins and MAPs to form a fully functional MT-
attachment site (P De Wulf and PK Sorger, unpublished
observation) [14-16].
A key question in the study of kinetochores is whether archi-
tectural features currently being elucidated in S. cerevisiae
are conserved in higher cells. Some S. cerevisiae proteins
have been shown to have orthologs in one or more metazoa.
These metazoan orthologs include CenH3, CENP-C^Mif2,
Mis6^Ctf3/CENP-I, Spc105^KNL-1/Kia1570, members of the NDC80
and MIND complexes as well as MT-associated proteins such
as EB1^Bim1 and CLIP170^Bik1, Mad-Bub spindle checkpoint
proteins and some regulatory kinases [2,17-26]. To date,
however, only CenH3 and CENP-C have been carefully compared
at a sequence level in a wide range of organisms [27]. Here we
report a systematic analysis of sequence relationships among
a set of approximately 50 fungal, plant and metazoan kinetochore
proteins with the overall aim of exploring their structural and
evolutionary relationships. Our analysis supports
the conclusion that the four linkers at the core of S. cerevisiae
kinetochores, the NDC80 complex, MIND, COMA, and the
SPC105 complex, have been conserved through eukaryotic
evolution. A subset of kinetochore proteins, perhaps 20% of
the total in S. cerevisiae, seems to be specific to point CENs,
all of which are very closely related. A second set of kineto-
chore proteins is found only on regional CENs. It appears,
therefore, that all kinetochores have a single ancestor, probably
based on a regional CEN, from which contemporary kinetochores
diverged rapidly while conserving key structural features.

Figure 1
Point centromeres are derived from regional centromeres and appeared only once during evolution. (a) The 16 CENs from S. cerevisiae were used to train
a HMM. The blue bar indicates the number of predicted point CENs in the genome and the red bar represents the number of known chromosomes. (b)
HMM from (a) was used to search the genome of fungi with known point CENs, known regional CENs and predicted point CENs. Blue and red bars are as
described in (a) except gray bars, which indicate the predicted number of chromosomes, based on synteny within other Saccharomyces species. (c)
Sequence comparison of the CDEI, CDEII and CDEIII elements from budding yeast with point centromeres. (d) Frequency distribution of the CDEII length
(measured in bp) in each budding yeast with point centromeres. (e) Evolutionary conservation of CBF3 subunits in fungi with point and regional CENs. (f)
Phylogenetic analysis of 17 different fungi, including the 7 budding yeast with point centromeres and the 3 budding yeast with regional centromeres using 3
highly conserved reference proteins (α-tubulin, the signal recognition protein SRP54 and the DNA replication factor PCNA). Blue branches represent
fungi with point centromeres and black branches those with regional centromeres.
Panel axes and labels: (a, b) number of predicted point CENs and number of chromosomes per species; (c) sequence logos (bits) for CDEI and CDEIII, with CDEII length and AT content; (d) frequency (%) versus length of CDEII (bp); (e) presence (+) or absence (-) of Ndc10, Cep3, Ctf13 and Ctf3/Spc105; (f) scale bar, 0.1 changes/aa.
Results
Point centromeres have a common origin
As a first step in determining relationships among kineto-
chores in different organisms, we searched fungal genomes
for point CENs similar in structure to those in S. cerevisiae.
Three such examples are already known, C. glabrata, E. gos-
sypii and K. lactis [28], but a significant number of newly
sequenced genomes have not yet been analyzed. Finding new
CENs with a CDEI-CDEII-CDEIII structure is not trivial
because the number of identical bases in CDEI and CDEIII is
relatively small, even among chromosomes in S. cerevisiae.
Moreover, CDEII is not conserved in sequence but, rather, is
characterized by high AT content and alternating runs of
poly-A and poly-T. To capture this information we con-
structed a tri-partite computational model based on profiles
for CDEI and CDEIII, a hidden Markov model (HMM) for
CDEII (Figure 1a), and S. cerevisiae CENs as a training set.
When the model was tested on C. glabrata, E. gossypii and K.
lactis, organisms whose genomes are fully annotated, 6/13
centromeres in C. glabrata, 6/7 centromeres in E. gossypii
and 6/6 in K. lactis were identified correctly (Figure 1b).
Conversely, no point-CEN sequences were found in S. pombe,
C. albicans or A. nidulans, organisms known to have regional
CENs (Figure 1b). With a success rate of >70% and a false pos-
itive rate of <5%, we conclude that our computer model is
effective at finding point CENs.
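The logic of a tri-partite search can be illustrated with a deliberately simplified sketch. It is not the model used in this study: the CDEI and CDEIII profiles are replaced by two short regular expressions, and the CDEII hidden Markov model is replaced by a plain length and AT-content filter. The names `CDEI_PATTERN`, `CDEIII_CORE` and `find_point_cen_candidates`, the patterns themselves and all default thresholds are assumptions chosen for illustration.

```python
import re

# Hypothetical stand-ins for the CDEI and CDEIII profiles (assumptions,
# not the profiles trained on the 16 S. cerevisiae CENs).
CDEI_PATTERN = re.compile(r"[AG]TCAC[AG]TG")
CDEIII_CORE = re.compile(r"CCGAA")


def at_content(seq):
    """Return the fraction of A/T bases in seq (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def find_point_cen_candidates(genome, min_len=70, max_len=170, min_at=0.79):
    """Scan one strand for CDEI ... AT-rich spacer ... CDEIII-like core.

    Returns a list of (start, end, spacer_length) tuples. The spacer is
    everything between the CDEI match and the CDEIII core match, so it
    only approximates CDEII. A plain AT filter stands in for the HMM.
    """
    genome = genome.upper()
    hits = []
    for cdei in CDEI_PATTERN.finditer(genome):
        lo = cdei.end() + min_len   # earliest allowed start of the core
        hi = cdei.end() + max_len   # latest allowed start of the core
        # pos/endpos restrict where the 5-base core may start and end
        for core in CDEIII_CORE.finditer(genome, lo, hi + 5):
            spacer = genome[cdei.end():core.start()]
            if at_content(spacer) >= min_at:
                hits.append((cdei.start(), core.end(), len(spacer)))
    return hits
```

A fuller treatment would score both strands, use position-specific profiles rather than exact patterns, and model the alternating poly-A and poly-T runs of CDEII explicitly, which is the role of the HMM in the text.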
Table 1
Sequence similarities among selected fungal kinetochore proteins of point CEN

Location      Complex  Protein  Ubiquitous  Point CEN specific  Similarity*  Identity*
DNA-binding   Monomer  Mif2     +                               65%          23%
              ?        Sgt1     +                               74%          28%
              CBF3     Cep3                 +                   65%          14%
                       Ctf13                +                   53%          9%
                       Ndc10                +                   48%          10%
Linker layer  COMA     Mcm21    +                               45%          7%
                       Ctf19    +                               47%          7%
                       Ame1                 +                   45%          9%
                       Okp1                 +                   51%          7%
              MIND     Nnf1     +                               67%          14%
                       Nsl1     +                               69%          15%
              NDC80    Ndc80    +                               73%          20%
                       Spc24    +                               63%          6%
              SPC105   Spc105   +                               48%          5%
                       Ydr532C              +                   52%          6%
              ?        Chl4     +                               52%          11%
              ?        Ctf3     +                               51%          6%
              ?        Nkp1                 +                   55%          6%
              ?        Nkp2                 +                   63%          6%
              ?        Mcm16                +                   52%          7%
              ?        Mcm22                +                   53%          4%
              ?        Iml3     +                               24%          6%
              ?        Cnn1                 +                   40%          4%
MT-binding    DASH     Ask1     +                               43%          11%
                       Dam1     +                               54%          6%
Regulatory    ?        Bub3     +                               65%          18%
              ?        Mad2     +                               98%          54%

*As determined from the proteins in S. cerevisiae, C. glabrata, E. gossypii and K. lactis.
Instead of E. gossypii the sequences were derived from the very closely related S. kluyveri.
Similarity was determined from proteins of the point CEN containing S. cerevisiae, S. kudriavzevii, K. waltii and S. kluyveri.

When unannotated genomes were analyzed using the tri-partite
computational model, 15 CDEI-II-III sequences were found in
S. bayanus, 14 in S. mikatae and 15 in S. paradoxus (Figure 1b)
[29]. S. bayanus, S. mikatae and S. paradoxus contigs have not
yet been fully assembled, but sequence similarity and synteny
suggest that all 3 have 16 chromosomes, close to the number of
putative CEN sequences identified computationally in each
organism. When these newly identified point CENs were combined
with those in the literature, 85 CDEI-II-III sequences from 7
organisms became available. These yielded a clear consensus for
CDEI and CDEIII and revealed that, within a single organism,
CDEII can vary in sequence from one chromosome to the next but
that length distributions are very narrow (± 3%; Figure 1c, d).
Most fungi have 84 bp CDEII sequences but E. gossypii and
K. lactis have 164 bp CDEIIs, suggesting the presence of two
copies of an underlying approximately 84 bp CDEII module
(Figure 1d). To a first approximation, the extent of
conservation among CDEI and CDEIII sequences on different
chromosomes within a single organism was not much greater than
the extent of conservation among syntenic CENs in different
organisms (Figure 1c). Together, these data strongly imply
that all organisms with CDEI-II-III point CENs arose from a
relatively recent common ancestor.
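The two CDEII properties compared in this section, length and AT content, are straightforward to compute once CDEII sequences have been extracted. The following is a minimal sketch under stated assumptions: `summarize_cdeii` is a hypothetical helper, the maximum deviation from the mean length is only one possible reading of a "± 3%" spread, and the input is expected to be a list of CDEII sequences from a single organism.

```python
from statistics import mean


def at_fraction(seq):
    """Return the fraction of A/T bases in a non-empty DNA sequence."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def summarize_cdeii(seqs):
    """Summarize length spread and AT content for CDEII sequences.

    max_dev_pct is the largest deviation of any length from the mean
    length, expressed as a percentage of the mean (an assumed
    definition of the spread, not one given in the text).
    """
    lengths = [len(s) for s in seqs]
    avg = mean(lengths)
    return {
        "min_len": min(lengths),
        "max_len": max(lengths),
        "mean_len": avg,
        "max_dev_pct": max(abs(n - avg) for n in lengths) / avg * 100,
        "min_at": min(at_fraction(s) for s in seqs),
    }
```

Running the helper separately on each organism would show the pattern described above: a tight length distribution within an organism, with a larger CDEII in some species.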
Kinetochore proteins specific to organisms with point
centromeres
Does the existence of CENs with similar CDEI-II-III struc-
tures imply the existence of similar DNA-binding kinetochore
proteins? In addressing this question, the CDEI-binding Cbf1
protein is not very useful because it functions not only as a
kinetochore subunit but also as a transcription factor for a set
of highly conserved biosynthetic genes [30], implying conser-
vation of non-kinetochore function. We therefore concen-
trated on components of the CBF3 complex, three of whose
subunits are thought to function only in CDEIII-binding (the
fourth subunit, scSkp1, is also a component of the SCF ubiquitin
ligase complex [31] and, like Cbf1, has conserved non-kinetochore
functions). When PSI-BLAST was used to search predicted open
reading frames in 17 fungal genomes for orthologs of scCtf13,
scCep3 and scNdc10, all 3 CBF3 subunits were found in the
organisms with point CENs (7 in total), but not in organisms
with regional CENs (Figure 1e). As a positive control for the
PSI-BLAST search, orthologs of scMis6^Ctf3 and scSpc105 could be
found in all fungi examined (Figure 1e). Importantly, Mis6^Ctf3
and Spc105 have approximately the same degree of sequence
divergence in point-CEN containing fungi (51% and 48%
similarity, respectively) as Ndc10 (48% similarity; Table 1).
We provisionally conclude that CBF3 proteins are present only
in fungi with CDEI-II-III CEN DNA whereas other kinetochore
proteins (such as Spc105 and Ctf3) are ubiquitous. Moreover,
when organisms with point CENs and CBF3 subunits are mapped on
a phylogenetic tree (constructed using the highly conserved
reference proteins α-tubulin, the signal recognition particle
subunit SRP54 and PCNA), they cluster closely together (Figure 1f).
While recognizing the possibility for false-negative findings
in cross-species sequence searching, we conclude that CDEI-
II-III CENs and CBF3 CEN-binding proteins are probably
found only in a subset of closely related budding yeasts and,
thus, may have co-evolved. Intriguingly, the apparent com-
mon ancestor of point-CEN and regional-CEN organisms
appears to be a fungus containing regional CENs, implying
that simple point CENs arose from complex regional CENs
and not the other way round.
To delineate further which kinetochore proteins are specific
to point CENs, and which are more widely distributed, we
analyzed all known S. cerevisiae kinetochore proteins for
sequence conservation. As a starting point we examined
scMis12^Mtw1 and scNdc80^Hec1, kinetochore proteins first
identified in yeast and subsequently shown to have human
orthologs (hsMis12 and hsNdc80^Hec1) that localize to
kinetochores and play a role in chromosome segregation [20,25].
Experimental and sequence data establish that yeast and
higher cell Ndc80^Hec1 and Mis12^Mtw1 proteins represent true
orthologs [20,32-34]. Nonetheless, the overall degree of
similarity among Ndc80^Hec1 and Mis12^Mtw1 proteins across
eukaryotes was found to be relatively modest (approximately
15% to 30%) as compared to proteins involved in DNA
replication (PCNA, approximately 75%) or protein translocation
(SRP54, approximately 60%). Multiple protein sequence
alignments of fungal, plant, and metazoan Ndc80^Hec1 and
Mis12^Mtw1 showed that sequence similarity is confined to 30
to 100 residue blocks interspersed by stretches of
non-homology, many of which correspond to coiled coils
(Figure 2a, b). This pattern of block-by-block similarity was
also observed with five other kinetochore proteins for which
orthology has been established experimentally, and is
consistent with previous proposals that kinetochore proteins
have evolved rapidly [35] (Figure 2c). Importantly, for our
purposes, data obtained from known kinetochore orthologs
suggest that it is necessary to use conserved blocks, rather
than complete sequences, when searching kinetochore proteins
for patterns of sequence conservation.
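The idea of locating conserved blocks within an otherwise divergent alignment can be sketched as follows. This is an illustration only, not the procedure used in this study: it scores columns by identity rather than by similarity, and the function names, the 0.8 threshold and the minimum block length are assumptions. The input is a list of pre-aligned, equal-length sequences with `-` for gaps.

```python
from collections import Counter


def column_identity(alignment):
    """Score each alignment column by the fraction of sequences that
    share its most common residue. Gaps never count as agreement."""
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / n_seqs)
    return scores


def conserved_blocks(alignment, min_score=0.8, min_len=5):
    """Return half-open (start, end) column ranges in which every
    column scores at least min_score and the run spans min_len columns."""
    blocks = []
    start = None
    # A trailing -inf sentinel closes a block that runs to the last column.
    for i, score in enumerate(column_identity(alignment) + [float("-inf")]):
        if score >= min_score:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks
```

Blocks found this way could then serve as queries for a database search in place of full-length sequences, which is the strategy the paragraph above argues for.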
Figure 2
Sequence similarity between kinetochore proteins is restricted to short stretches between orthologs. Multiple sequence alignments of the (a) Mis12^Mtw1
and (b) Ndc80^Hec1 families. Schematic drawings above the alignments indicate the length of the S. cerevisiae proteins and the percentages denote the degree
of similarity of successive sequence blocks (black boxes) within fungi (red letters) or fungi, metazoa and plantae (green letters). The schematic drawing
above the Ndc80 multiple sequence alignment also indicates the relative position of the globular and coiled-coil domain of Ndc80, as determined by
electron-microscopy [32,33]. White letters on black denote identical residues, white letters on green, identical residues in ≥ 80% of the organisms and
black letters on green, similar residues in ≥ 80% of the organisms. (c) Schematic drawings indicating the percentage similarity of successive sequence
blocks (black boxes) within fungi (red letters) or fungi, metazoa and plantae (green letters) based on multiple sequence alignments of the Nuf2, Spc25,
Spc24, CENP-C^Mif2, Mis6^Ctf3/CENP-I, PCNA and SRP54 protein families.
When 55 S. cerevisiae kinetochore proteins (including the
CBF3 subunits discussed above) were used in PSI-BLAST
queries to search 14 fully annotated fungal genomes (Addi-
tional data file 1), 41 were found to have orthologs in organ-
isms with both point and regional CENs (Figure 3). These
proteins included kinetochore regulators such as the Mad1-3,
Bub1, BubR1/Mad3 and Mps1 checkpoint proteins and the
Ipl1-AuroraB kinase, as well as many structural components.
In addition to the 41 proteins mentioned above, conservation
was observed for proteins such as Skp1 [31], Cbf1 [30,36] and
some MAPs [37] that function at kinetochores as well as at
other locations in the cell. As noted above, these proteins are
likely to have been conserved for reasons other than their
presence at kinetochores, and they cannot be used to infer
overall similarity in kinetochore structure. In this respect,
kinesin motor proteins are also difficult to analyze. Eukaryotic
cells contain multiple kinesins, which are known to fall
into 14 highly conserved protein families based on sequence,
structure and function [38]. Typically, each kinesin has more
than one cellular function and kinetochores in different
organisms recruit different kinesin family members, making
it difficult to determine (in the absence of experimentation)
which kinesins should be considered kinetochore associated.
Leaving these complications aside, among 55 fungal kineto-
chore components analyzed, 11 were found in the 7 organisms
with point CENs and nowhere else, implying that they are
specific to a CDEI-II-III CEN architecture (Figure 3). These 11
proteins include the CBF3 subunits scCtf13, scCep3 and
scNdc10 described above, the non-essential CNN1 gene prod-
uct, 1 subunit of the SPC105 complex (Ydr532c), two subunits
of the COMA linker complex (scAme1 and scOkp1) and 4 pro-
teins that require COMA for CEN-association (scMcm22,
scMcm16, scNkp1 and scNkp2). Among organisms in which
they are found, the 11 point CEN-specific proteins are as well
or better conserved than ubiquitous kinetochore proteins,
implying that failure to identify orthologs in more distant
fungi is a consequence of their actual absence. We therefore
propose that approximately 20% of the overall kinetochore in
fungi containing CDEI-II-III CENs is specialized to their sim-
ple CENs. As expected, these specialized kinetochore subunits
include proteins in direct contact with CEN DNA (Figure 3).
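
The figure of approximately 20% follows directly from the counts given above. As a minimal check, the tally can be sketched in Python. The group labels and the function name `specialized_fraction` are introduced here for illustration only and are not part of the original analysis.

```python
# Point-CEN-specific proteins, grouped as they are listed in the text.
# The group labels are illustrative; the protein names come from the text.
point_cen_specific = {
    "CBF3 subunits": ["scCtf13", "scCep3", "scNdc10"],
    "CNN1 gene product": ["Cnn1"],
    "SPC105 complex subunit": ["Ydr532c"],
    "COMA subunits": ["scAme1", "scOkp1"],
    "COMA-dependent proteins": ["scMcm22", "scMcm16", "scNkp1", "scNkp2"],
}
TOTAL_ANALYZED = 55  # fungal kinetochore components analyzed


def specialized_fraction(groups, total):
    """Return (number of listed proteins, their share of the total)."""
    count = sum(len(members) for members in groups.values())
    return count, count / total


count, fraction = specialized_fraction(point_cen_specific, TOTAL_ANALYZED)
print(f"{count} of {TOTAL_ANALYZED} components ({fraction:.0%})")
```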
Identification of novel human kinetochore proteins
Based on success in identifying fungal orthologs of S. cerevisiae
kinetochore proteins, we expanded our set of target
organisms to higher eukaryotes (see Figure 4 for a schematic
of the approach). Alignments were created for 41 ubiquitous
fungal proteins and conserved blocks determined. The non-redundant
NCBI protein database was then searched for
these conserved blocks using PSI-BLAST or Prosite pattern
searching algorithms (see Materials and methods for details).
Potential orthologs differing greatly in size from the fungal
proteins and candidates with well-established non-kineto-
chore functions were eliminated from further consideration.
The remaining proteins were then aligned to confirm the
presence of conserved blocks. This search led to the identifi-
cation, in a wide variety of organisms, of previously unre-
ported orthologs of many S. cerevisiae kinetochore proteins
(Additional data file 1), among which were four new human
kinetochore proteins (Figure 4). Recent analysis of S. pombe
kinetochore complexes by mass spectrometry revealed the
presence of a set of proteins for which orthologs could not be
found in S. cerevisiae [39,40]. When conserved sequence
blocks from these S. pombe proteins were used to search the
genomes of higher eukaryotes, two additional human pro-
teins were flagged as likely kinetochore subunits (Figure 4).
Regardless of which fungi contributed to the sequence blocks,
the most highly conserved kinetochore subunits were invari-
ably regulatory proteins such as the Mad and Bub checkpoint
proteins and the Aurora B kinase. Structural proteins such as
Ndc80/Hec1, Nuf2, CENP-C/Mif2 and Mis12/Mtw1 were considerably
more diverged.
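
The exclusion criteria described in this section (size comparison, prior characterization, and confirmation of conserved blocks by realignment) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the procedure actually used: the `Candidate` fields, the function name, the `max_length_ratio` threshold and the example hits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A database hit returned by the conserved-block search (hypothetical record)."""
    name: str
    length: int                 # protein length in residues
    known_other_function: bool  # well-established non-kinetochore function
    blocks_confirmed: int       # conserved blocks still present after realignment


def passes_exclusion_criteria(candidate, fungal_length, blocks_expected,
                              max_length_ratio=2.0):
    """Apply the three filters in the order described in the text."""
    longer = max(candidate.length, fungal_length)
    shorter = min(candidate.length, fungal_length)
    # 1. Discard hits that differ greatly in size (the threshold is an assumption).
    if longer / shorter > max_length_ratio:
        return False
    # 2. Discard hits with a well-established non-kinetochore function.
    if candidate.known_other_function:
        return False
    # 3. Keep only hits whose conserved blocks are confirmed by realignment.
    return candidate.blocks_confirmed >= blocks_expected


# Hypothetical hits for a fungal protein of 200 residues with 2 conserved blocks.
hits = [
    Candidate("hit_A", 205, False, 2),   # passes all three filters
    Candidate("hit_B", 1200, False, 2),  # too different in size
    Candidate("hit_C", 190, True, 2),    # already characterized elsewhere
    Candidate("hit_D", 210, False, 1),   # one block lost on realignment
]
retained = [h.name for h in hits if passes_exclusion_criteria(h, 200, 2)]
print(retained)
```

Keeping each criterion as a separate early return mirrors the successive decision points of the search strategy and makes it easy to see which filter removed a given hit.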
The four human proteins representing hitherto unrecognized
orthologs of S. cerevisiae kinetochore subunits were provi-
sionally named hsNnf1-Related (hsNnf1R; also known as
PMF1 [41]; Figures 4 and 5), hsNsl1R (also known as DC8 or
DC31), hsMcm21R and hsChl4R. hsNnf1R shares with its
fungal counterpart 2 conserved blocks of 30 to 35 residues
with 47% and 67% similarity, hsNsl1R shares 1 conserved
block of 35 residues with 43% similarity, hsMcm21R shares 3
conserved blocks of 15 to 30 residues with 46%, 87% and 33%
similarity and hsChl4R shares 2 conserved blocks of 20 and
50 amino acids with 45% and 40% similarity (Figure 5). The
potential human orthologs of S. pombe Fta1 and Sim4 were
provisionally named hsFta1R and hsSim4R (also known as
Solt [42]). hsFta1R shares with its fungal counterpart three
conserved sequence blocks of 40, 25 and 30 residues with
48%, 49% and 58% similarity and hsSim4R one block of 27
residues with 65% similarity (Figure 6). Elsewhere we will
describe experimental data showing that hsChl4R, hsNsl1R,
Figure 3. Fungal kinetochores contain a set of point centromere-specific components. Schematic model of kinetochore subunit organization based on the
architecture of the S. cerevisiae kinetochore. Kinetochore proteins can be roughly divided into DNA-binding (pink), linker (blue), MT-binding (green) and
regulatory layers (yellow). Within each layer many proteins are organized into multi-protein complexes; for example, the linker layer is composed of at
least four complexes (gray boxes (a) to (d)): COMA, NDC80, MIND and SPC105. Protein names are given for S. cerevisiae first and S. pombe second;
essential genes are shown in italic letters and non-essential genes in normal letters. Protein names followed by an asterisk indicate that this specific ortholog is
known not to localize to kinetochores. The kinesins present at kinetochores in S. cerevisiae are Kip3 (Kinesin-8), Cin8 (Kinesin-5), Kip1 (Kinesin-5) and
Kar3 (Kinesin-14), while in S. pombe they are Klp5 (Kinesin-8), Klp6 (Kinesin-8) and Klp2 (Kinesin-14) (for nomenclature see [38]).
[Figure 3: schematic of kinetochore subunits, labeled with S. cerevisiae/S. pombe protein names and grouped into the CBF3, (a) COMA, (b) MIND, (c) NDC80, (d) SPC105, DASH and SLI15 complexes. Key: present in point CEN only; present in point and regional fungal CENs; multi-protein complexes; DNA-binding components; linker components; MT-binding components; regulatory components; proteins that function elsewhere in the cell.]
hsMcm21R, hsNnf1R, hsFta1R and hsSim4R localize to kinetochores
in human cells and are required for accurate chromosome
segregation (AD McAinsh et al., submitted).
Importantly, for the purposes of the current analysis, the
identification of new human kinetochore proteins means that
one or more subunits are present in metazoans for each of the
four multi-protein linker complexes forming the core of the S.
cerevisiae kinetochore. Thus, it appears that simple point
CENs in budding yeast and complex regional CENs in human
cells probably share fundamental architectural similarities.
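
Block-level comparisons such as the 47% and 67% similarity quoted earlier in this section for the hsNnf1R blocks can be illustrated with a short sketch. The residue groupings below are an assumption chosen for illustration and are not the scoring scheme used in this study; the function name `block_similarity` and the toy sequences are likewise hypothetical.

```python
# Illustrative residue groups; an assumption, not the authors' scoring scheme.
SIMILARITY_GROUPS = ["AVLIMC", "FWY", "KRH", "DE", "STNQ", "G", "P"]
GROUP_OF = {res: i for i, grp in enumerate(SIMILARITY_GROUPS) for res in grp}


def block_similarity(seq_a, seq_b):
    """Percent of aligned positions holding identical or same-group residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned blocks must have equal length")
    if not seq_a:
        raise ValueError("aligned blocks must not be empty")
    similar = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # a gap never counts as similar
        if a == b or (a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b)):
            similar += 1
    return 100.0 * similar / len(seq_a)


# Toy four-residue blocks (hypothetical, not taken from Figure 5).
print(block_similarity("LDEK", "IDQR"))
```

With these toy blocks three of four positions are identical or fall in the same group, so the function reports 75.0. A different grouping or a substitution matrix would give different values for the same alignment.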
S. cerevisiae DASH is a 10-protein MT-binding complex that
has attracted considerable recent interest because it forms
rings encircling MTs [43,44]. DASH subunits are conserved
among fungi but we have found few if any potential orthologs
in higher eukaryotes. The closest match to a DASH protein in
humans, NYD-SP28 [45], has an amino-terminal domain of
about 30 amino acids 40% similar to S. cerevisiae Spc34
(Additional data file 2). The Chlamydomonas reinhardtii
ortholog of NYD-SP28 localizes to the flagellum [46], imply-
ing that NYD-SP28 might be involved in interactions with
MTs. Our preliminary conclusion is that higher eukaryotes do
not contain a protein complex closely related to fungal DASH,
although further investigation of NYD-SP28 is warranted.
Correspondence between human kinetochore proteins
and their yeast counterparts
Several kinetochore proteins first identified in human cells
have previously been shown to have fungal orthologs, including
CENP-C (orthologous to scMif2p [47]) and CenH3/CENP-A
(orthologous to scCse4 [48]). We therefore wondered
whether additional orthologs might be found in fungi for
kinetochore proteins hitherto characterized only in higher
eukaryotes, such as CENP-E, CENP-H, Rod, Zwint and
Zwilch [49-53]. We found that, among fungal proteins,
hsCENP-H is most similar to S. pombe spFta3 (Figure 7a),
which was shown recently to be a fission yeast kinetochore
protein [39]. It has been suggested previously that S. cerevisiae
scNnf1 is the budding yeast CENP-H ortholog [54] (Figure 7b)
but we find that scNnf1 is actually much more similar
to hsNnf1R/Pmf1 and spNnf1 than to CENP-H (Figure 7c). We
therefore propose that CENP-H is orthologous to the fungal
Fta3 family of proteins. Searches using PSI-BLAST revealed
that the Fta3 protein, like the Sim4 and Fta1 proteins with
which it interacts in S. pombe [39], has apparent orthologs
only in organisms with regional CENs (Additional data file 1).
The presence of Sim4 and Fta1 in the budding yeast Yarrowia
lipolytica, which has regional CENs, but not in yeasts with
point CENs, is striking, since Y. lipolytica is significantly
closer in overall sequence to S. cerevisiae than to S. pombe.
We therefore conclude that Fta3, Sim4 and Fta1 are members
of a class of kinetochore proteins found specifically in fungi
and metazoa with regional CENs and not in fungi with point
CENs.
In contrast to CenH3/CENP-A, CENP-C and CENP-H, potential
orthologs of the human CENP-E, Rod, Zwint and Zwilch pro-
teins were not found in any of the fungi examined. The appar-
ent absence of a fungal Rod or Zwilch is particularly
interesting, since their binding partner at human kineto-
chores, Zw10, has a potential ortholog in S. cerevisiae, Dsl1
Figure 4. Schematic describing the sequence-search based approach used to identify
fungal, metazoan, and plant orthologs of the kinetochore proteins scNnf1,
scNsl1, scChl4, scMcm21, spSim4 and spFta1. Since such sequence-based
searches can yield a significant number of false positives, strict exclusion
criteria were applied to ensure the identification of orthologs.
[Figure 4 flowchart: fungal linker kinetochore proteins; PSI-BLAST search in 14 fungal proteomes; fungal linker kinetochore protein family; multiple sequence alignment of fungal proteins (Clustal-W and T-Coffee); identification of a conserved protein domain or amino acid motif; PSI-BLAST search in the NR database using the conserved domain, or Scanprosite search using the amino acid motif; similar mammalian protein; decision points "Is the protein already characterized?" and "Is the protein similar in size?"; potential mammalian ortholog; PSI-BLAST search based on the potential mammalian ortholog in plant and metazoan NR and EST databases; metazoan/plant orthologs; multiple sequence alignment of metazoan/plant proteins and combined multiple sequence alignment of fungal/metazoan/plant proteins (Clustal-W and T-Coffee); decision points "Are the homology blocks conserved?" and "Is the approximate position of the homology blocks conserved?"; failing a decision point leads to exclusion; outcome: identification of novel orthologs, e.g. new human kinetochore proteins Nnf1R (Pmf1), Nsl1R (DC31), Chl4R, Mcm21R, Fta1R and Sim4R (Solt).]
[Figure 5: multiple sequence alignments of conserved blocks, with sequences grouped as Fungi, Metazoa and, where present, Plantae, for (a) scNnf1 and hsNnf1R (PMF1), Blocks 1 and 2; (b) scNsl1 and hsNsl1R (DC31), Block 1; (c) scMcm21 and hsMcm21R, Blocks 1 to 3; (d) scChl4 and hsChl4R (BM039), Blocks 1 and 2. Scale bars, 50 aa.]
[55]. Both hsZw10 and scDsl1 play a role in membrane traf-
ficking during interphase [56], but scDsl1 is not known to
localize to kinetochores. Thus, whereas human hsZw10 func-
tions in vesicle-MT and chromosome-MT interaction, scDsl1
appears to have only the former function, presumably
because Rod and Zwilch are not present. The absence of Zw10
from fungal kinetochores is also sufficient to explain the
absence of Dynein: the Rod/Zw10/Zwilch (RZZ) complex is
needed for the association of Dynein with human and Dro-
sophila kinetochores [57]. Considering these data together,
we conclude that animal cell kinetochores contain proteins,
currently comprising perhaps 25% of the total (and likely to
increase), that are absent in fungi with either regional or
point CENs.
Evolutionary relationships among kinetochores
Thus far, we have distinguished only between point and
regional CENs but a more nuanced view can be obtained from
phylogenetic analysis of kinetochore structural proteins. As a
reference for these comparisons, a tree was constructed by
combining data on three well-conserved eukaryotic proteins:
α-tubulin, PCNA and SRP54 (Figure 8a; this reference tree
closely matches reference trees constructed by others
[58,59]). The reference tree exhibited prototypical clustering
of fungi in one branch and metazoa in another so that Dro-
sophila and C. elegans were much closer to humans than S.
pombe or S. cerevisiae. However, the phylogenetic trees for
Ndc80/Hec1 and Nuf2 were remarkably different: overall
sequence divergence was much greater and Drosophila and
C. elegans Ndc80/Hec1 (or Nuf2) proteins were not significantly
more similar to their human than their fungal counterparts
(Figure 8b, c). Drosophila Ndc80 and Nuf2 were
particularly striking in occupying a branch of the phyloge-
netic tree distant from all other animals. This great diver-
gence in Drosophila kinetochore protein sequence is also
illustrated by the fact that, apart from regulatory compo-
nents, such as the Mad-Bub proteins and a few MAPs, only a
limited number of structural kinetochore proteins have been
identified in flies (for example, CENP-C [60], CenH3/CID [61],
the RZZ complex [62], Ndc80/Hec1, Nuf2, and Mis12/Mtw1;
Figure 9).
Organization of the simplest kinetochore
Encephalitozoon cuniculi is a microsporidium and intracellu-
lar parasite that has been subjected to considerable evolu-
tionary pressure to reduce its genome to the smallest possible
size. As a consequence, E. cuniculi and related microsporidia
have the smallest known eukaryotic proteome (1,997
potential open reading frames) and many cellular structures
in E. cuniculi lack redundant and non-essential genes [63].
Using our HMM for CDEI-II-III, no sequences similar to
point CENs were found on any of the 11 E. cuniculi chromo-
somes, nor were CBF3 proteins found by PSI-BLAST (Figure
10a). We therefore speculate that E. cuniculi contains a
regional CEN of some sort. Orthologs of CenH3 and CENP-C/Mif2
are present in E. cuniculi, as are all four components of
the NDC80 linker complex, three components of MIND and
SPC105 (Figure 10b, Additional data file 3). No subunits of
COMA, the fourth S. cerevisiae linker, were found. Among
regulatory proteins, E. cuniculi Ipl1/Aurora B and Survivin/Bir1
orthologs were present as were Mps1 and Bub3, but not other
proteins required for the spindle assembly checkpoint in
yeast or human cells (Figure 10b). When Cdc20, an essential
activator of the anaphase promoting complex (APC/C) was
examined for sequence motifs, further evidence was obtained
that E. cuniculi lacks a spindle checkpoint. APC/C is an E3
ligase required for the ubiquitination of proteins whose
destruction is necessary at the metaphase-anaphase transition
[64]. In all eukaryotes examined to date, an activated
form of the Mad2 checkpoint protein binds to Cdc20 via a
short conserved peptide so as to block Cdc20 from activating
APC/C, thereby arresting cells at the metaphase-to-anaphase
transition [65,66] (Figure 10c). E. cuniculi Cdc20 contains
the WD-domain implicated in APC/C interaction but lacks
any sequence similar to a Mad2 binding domain (Figure 10c),
implying that it is not subject to checkpoint control. From
these data we conclude that E. cuniculi probably contains a
very simple kinetochore, based on a regional CEN, that contains
about one-half the proteins found in S. cerevisiae kinetochores. In
contrast, other large multi-protein structures in E. cuniculi
are only slightly less complex than their higher eukaryotic
counterparts. For example, E. cuniculi ribosomes are com-
posed of 77 subunits as compared to 84 subunits in S. cerevi-
siae. Symptomatic of the simplicity of the E. cuniculi
kinetochore is the absence of the vast majority of potential
MAPs. Nonetheless, it is significant that the E. cuniculi kine-
tochore contains three of the four linker complexes that
appear to form the core of budding yeast and human
kinetochores.
Discussion
Extensive genetic and biochemical experimentation has made
S. cerevisiae kinetochores the best characterized structures
involved in chromosome-MT attachment [5]. S. cerevisiae
kinetochores contain upwards of 70 protein subunits assem-
bled into 14 or more multi-protein complexes. In this study
we used similarity-based sequence searching to ascertain
Figure 5. Identification of potential orthologs of scNnf1, scNsl1, scMcm21 and scChl4 in humans. S. cerevisiae (a) Nnf1, (b) Nsl1, (c) Mcm21 and (d) Chl4 were
aligned with five fungal, four metazoan and one plant sequence. White letters on black denote identical residues, white letters on green, identical residues
in ≥ 80% of the organisms and black letters on green, similar residues in ≥ 80% of the organisms. Schematic drawings above the alignments indicate the
length of the S. cerevisiae proteins and the percentages denote the degree of similarity of successive sequence blocks (black boxes).
which S. cerevisiae kinetochore proteins have orthologs in 15
fungi, 11 metazoa and 2 plants (Additional data file 1) with the
overall aim of determining which structural features of S. cer-
evisiae kinetochores have been conserved throughout evolu-
tion. The analysis is not as straightforward as might be
assumed, because kinetochore proteins are among the most
rapidly evolving proteins in the genome [67]. In addition, the
structure and sequence of CEN DNA has diverged widely
from organism to organism. Whereas fungi closely related to
S. cerevisiae contain 125 to 225 bp CENs with a CDEI-CDEII-
CDEIII structure, most other organisms contain much longer
regional CENs with few if any conserved sequence elements.
Guided by experimental data on established orthologies in
yeast, humans and other organisms, we base most of the con-
clusions in this paper on the characterization of proteins that
share blocks of homologous sequence. In several cases, we
also draw inferences from a failure to identify homologous
proteins. We recognize that this failure represents a negative
result with many potential causes. However, in cases in which
a kinetochore protein is conserved among organisms A, B and
C whereas a second kinetochore protein is well-conserved
only in species A and B and undetectable in C (and multiple
related species), a tentative conclusion can be drawn that the
second protein is actually absent from C. For example, we find
that CBF3, an essential CEN-binding protein in S. cerevisiae,
has orthologs in seven budding yeasts containing CEN DNA
conforming to a CDEI-CDEII-CDEIII organization but not in
organisms with regional CENs. In contrast, other kinetochore
proteins similar in their degree of sequence conservation to
CBF3 subunits among point CEN-containing yeast (approxi-
mately 45% to 50% similarity) are found throughout fungi.
Thus, we provisionally conclude that CBF3 is present only in
fungi with CDEI-CDEII-CDEIII centromeres. Despite the
potential for occasional error, our use of both positive and
negative findings makes it possible to draw broad conclusions
about the organization and possible origins of simple and
Figure 6. Identification of potential orthologs of spFta1 and spSim4 in humans. S. pombe (a) Fta1 and (b) Sim4 were aligned with five fungal, and three to five
metazoan sequences. White letters on black denote identical residues, white letters on green, identical residues in ≥ 80% of the organisms and black
letters on green, similar residues in ≥ 80% of the organisms. Schematic drawings above the alignments indicate the length of the S. pombe proteins and
the percentages denote the degree of similarity of successive sequence blocks (black boxes).
[Figure 6: multiple sequence alignments of conserved blocks, with sequences grouped as Fungi and Metazoa, for (a) spFta1 and hsFta1R, Blocks 1 to 3, and (b) spSim4 and hsSim4R (Solt), Block 1. Scale bars, 50 aa.]
Figure 7. The human kinetochore protein CENP-H is more closely related to a novel family of fungal proteins than to the Nnf1 family. Multiple sequence alignments of
metazoan CENP-H proteins and either (a) fungal Fta3 family proteins or (b) fungal Nnf1 family proteins. Sequences were annotated as in Figure 5. (c)
Comparison of the sequence similarity of conserved domains of human CENP-H with C. albicans Nnf1 and C. albicans Fta3.
[Figure 7: (a) alignment of fungal Fta3 proteins with metazoan CENP-H proteins, Blocks 1 and 2; (b) alignment of fungal Nnf1 proteins with metazoan CENP-H proteins, Blocks 1 and 2; (c) schematics of caFta3 and caNnf1 annotated with block similarity to hsCENP-H. Sequences grouped as Fungi and Metazoa. Scale bars, 50 aa.]
Figure 8. Phylogenetic analysis of kinetochore protein conserved domains. Radial phylogenetic trees were assembled for (a) reference proteins (α-tubulin, the signal
recognition protein SRP54 and the DNA replication factor PCNA), (b) the Ndc80 family and (c) the Nuf2 family. For bootstrap analysis, sample size
equals 100. Nodes with support less than 50% were collapsed. The accession number for each protein is described in Additional data file 1.

[Figure 8: radial phylogenetic trees for (a) reference proteins, (b) the Ndc80 family and (c) the Nuf2 family, with two-letter species labels, bootstrap values at nodes and 0.1 scale bars. Taxon key: Nematoda, Chordata, Arthropoda, Embryophyta, Apicomplexa, Ascomycota, Microspora.]
Figure 9. Identification and annotation of (a) Nuf2, (b) Ndc80 and (c) Mis12 orthologs in D. melanogaster. Schematic drawings above the alignments indicate the
length of the S. cerevisiae proteins and the percentages denote the degree of similarity of successive sequence blocks (black boxes). White letters on black
denote identical residues, white letters on green, identical residues in ≥ 80% of the organisms and black letters on green, similar residues in ≥ 80% of the
organisms. Accession numbers are described in Additional data file 1.
FTFTDLTKPT HDRLVKIFSYLINFVRFRE
FSFNDLYKPT HDRLVRMLSYVINFVRFRE
[Figure graphic not reproduced: multiple sequence alignments of conserved blocks in Nuf2, Ndc80 and Mis12/Mtw1 orthologs from fungi, metazoa and plantae, with newly annotated and previously annotated proteins marked, and per-block similarity percentages shown on schematics of scNuf2, scNdc80 and scMtw1.]
complex kinetochores that would not be possible based on a
more conservative approach.
Origins of point centromeres
Based on the simple structure of their CENs, it is widely
assumed that S. cerevisiae kinetochores represent an ances-
tral structure from which complex regional kinetochores
evolved. Several findings in the current work suggest, how-
ever, that CDEI-II-III CENs arose in combination with a set
of 11 proteins as a specialization of a regional CEN. First, all
annotated organisms containing point CENs (S. cerevisiae, C.
glabrata, K. lactis, and E. gossypii) have a common origin in
one relatively shallow branch of the fungal phylogenetic tree.
Were CDEI-II-III sequences an ancestral CEN, the current
distribution of regional CENs would require loss of point
CENs from multiple independent evolutionary branches. Sec-
ond, we could obtain no evidence for CDEI-II-III CEN DNA
or CBF3 proteins in the microsporidium E. cuniculi, which is
thought to have arisen through an ancient divergence in the
fungal kingdom [68].
If the speculation that CDEI-II-III point CENs evolved from
regional CENs is correct, we must consider the possible existence
of other short CENs that are also based on sequence-specific
DNA-binding interactions but do not involve CBF3. By way of
precedent, the emergence of CDEI-II-III CENs is coincident
with large-scale chromosomal changes that gave rise to the
HMR, HML and MAT loci, thereby changing the sexual
potential of S. cerevisiae and related yeasts [29]. S. pombe
and its close relatives undergo mating-type switching analogous
to that in S. cerevisiae, but the molecular mechanisms of
switching are completely different [69]. Functional analysis
of fungi with short uncharacterized CENs will be needed to
test the speculation that, just as different forms of mating-type
switching have developed based on distinct biochemistry,
point CENs with structures other than CDEI-II-III might
exist.
Evolution of kinetochore proteins
Sequence comparison reveals that conservation among
orthologous kinetochore proteins is invariably restricted to
relatively short sequence blocks embedded in longer regions
of low sequence similarity. The restriction of sequence simi-
larity to small blocks explains the relative difficulty in finding
orthologs and the widespread assumption that yeast and
human kinetochores are very different. Henikoff and
colleagues [67] have studied the evolutionary divergence of
CenH3 and CENP-C/Mif2 in some detail and propose that
kinetochore proteins are under positive selection in plants and
animals as a consequence of meiotic drive by CEN DNA
during female meiosis. Rapid evolution in protein sequence is
most apparent in worms and flies, and in this study we have
added only dmNdc80, dmNuf2 and dmMis12 to the list of
likely structural Drosophila kinetochore proteins. Why the
rate of kinetochore protein evolution is so much greater in
flies and worms as compared to mammals, plants and fungi
remains a mystery but it is reminiscent of data on other key
regulators of chromosome segregation. Securin and its pro-
tease separase are also highly diverged in D. melanogaster:
Drosophila securin, unlike the human and yeast proteins,
consists of two separate gene products, called three rows and
pimples, that interact with an unusually short separase [70].

Moreover, unlike the majority of eukaryotes that utilize an
Figure 10
Identification of a minimal kinetochore in E. cuniculi. (a) The HMM described in
Figure 1a,b failed to find a CDEI-II-III structure in the genome of E. cuniculi.
The green bar indicates the point CENs identified and the black bars the number
of chromosomes. (b) Speculative model of E. cuniculi kinetochore subunit
organization. Proteins colored in pink, blue, green or yellow represent
components of the DNA-binding, linker, regulatory or microtubule-binding
layers, respectively, based on kinetochore organization in S.
cerevisiae. Potential multi-protein complexes are highlighted with a grey
box. (c) Sequence alignment of fungal, metazoan and plant Cdc20
showing the conserved Mad2 binding site. Note: E. cuniculi lacks both the
conserved Mad2 binding site and an ortholog of the Mad2 protein.
Schematic drawings indicate the length of the S. cerevisiae and E. cuniculi
proteins, the position of the WD-domain (black box) and the position
where Mad2 binds.
[Figure 10 graphic not reproduced. Panel (a): bar chart (axis 0 to 16) of the number of predicted point CENs and the number of chromosomes in Saccharomyces cerevisiae and Encephalitozoon cuniculi. Panel (b): kinetochore model with proteins grouped into the MIND, NDC80, SPC105 and SLI15 complexes and keyed as DNA-binding, linker, MT-binding or regulatory components at a regional CEN. Panel (c): alignment of the Mad2-binding region of Cdc20 from fungi, metazoa and plantae, with schematics of scCdc20 and ecCdc20 showing the WD-domain and the Mad2 binding site.]
RNA-templated reverse transcriptase to replicate telomeres,
D. melanogaster uses an alternative mechanism based on
transposition of the HeT-A and TART retrotransposable ele-
ments [71]. It seems very likely that several distinct classes of
kinetochore arose early in evolution. Perhaps surprisingly,
fungal kinetochores appear to be as good a model for their
human counterparts as kinetochores in organisms such as
worms and flies.
For the majority of kinetochore proteins we have little knowl-
edge of their biochemical functions or their structure. It is
tempting to speculate that conserved sequence blocks repre-
sent protein-protein interaction domains or interaction sur-
faces under tight evolutionary pressure. However, with very
few exceptions (for example, the kinase domains of check-
point and regulatory proteins and motor domains of kines-
ins), blocks of conserved sequence do not correspond to
recognizable functional domains. This stands in contrast to
the situation in nuclear pore complexes, in which highly con-
served and recognizable domains correspond to key func-
tional units [72]. The most abundant structural elements in
kinetochore proteins are coiled coils, which are known to
function in protein-protein association [73] and act as springs
and levers [74]. Coiled coils in the budding yeast spindle pole
body protein scSpc42 also create a crystalline core involved in
spindle pole body duplication [75]. Biochemical and electron
microscopy experiments have shown that the heptad repeat
domains in all four subunits of the S. cerevisiae NDC80
complex associate to form an extended stalk that is linked to
two globular heads [32,33]. Whether the stalk is simply a
spacer or some sort of mechanical element remains unex-
plored. Only detailed structural and biochemical experi-
ments, backed up by analysis in vivo, will reveal the logic of
sequence conservation among kinetochore subunits.
A conserved molecular core of the kinetochore
A key conclusion of this paper is that the four multi-protein
linkers that form the core of the S. cerevisiae kinetochore (MIND,
the SPC105 complex, the NDC80 complex and COMA) are also
likely to be present in a wide variety of species (Figure 11).
Along with CenH3 and CENP-C, SPC105, MIND and NDC80
complexes are ubiquitous. In budding yeast, linker complexes
are thought to form a bridge between proteins in direct con-
tact with DNA and those that bind MTs [5,14,15], and it will
be important to show that this is also true in other organisms.
Prior to the current work, biochemical experiments had led to
the identification of SPC105, MIND and NDC80 complexes in
S. pombe, C. elegans and human cells [18,19] but our
systematic sequence analysis extends these observations to a
greater variety of organisms, including E. cuniculi, a micro-
sporidium with a remarkably small proteome. The presence
of the structural kinetochore proteins listed above appears to
be more fundamental for chromosome segregation than a
Mad2-dependent spindle assembly checkpoint, which does
not seem to exist in E. cuniculi. Thus, ascertaining the precise
molecular functions of the MIND complex, NDC80 complex,
Figure 11
Evolutionary development of kinetochores from yeast to mammals. (a)
Model of the kinetochore using protein subunit positions derived from the
organization of the S. cerevisiae kinetochore. Proteins present in all fungal
and mammalian CENs are outlined in black while proteins present only in
fungi and mammals with regional CENs are outlined in red. Red dotted
outlines indicate proteins that are only present in fungi. Black dotted
outlines indicate either that the protein only exists in metazoans or that
only the metazoan ortholog is present at the kinetochore. Proteins colored in
pink, blue, green or yellow represent components of the DNA-binding,
linker, regulatory or microtubule-binding layers, respectively, based on
kinetochore organization in S. cerevisiae. Potential multi-protein complexes
are highlighted with a light gray box and the conserved kinetochore core,
or COMA/Sim4 adaptor, with a dark gray box. Protein names are given for
H. sapiens first and then S. cerevisiae when different. Italic lettering indicates
that the protein has additional functions in the cell. The kinesins present at
kinetochores in H. sapiens are CENP-E (Kinesin-7) and MCAK (Kinesin-13),
and in S. cerevisiae Kip3 (Kinesin-8), Cin8 (Kinesin-5), Kip1 (Kinesin-5)
and Kar3 (Kinesin-14) (for nomenclature see [38]). (b) Quantification of
the number of kinetochore proteins, and their respective evolutionary
class, in S. cerevisiae, S. pombe, E. cuniculi and H. sapiens.
[Figure 11 graphic not reproduced. Panel (a): kinetochore model with proteins grouped into the (i) COMA, (ii) MIND, (iii) NDC80 and (iv) SPC105 complexes, the SLI15 and DASH complexes, fungal and metazoan MAPs, kinesins and the COMA/Sim4 adapter(s), keyed as DNA-binding, linker, MT-binding or regulatory components and as present at all fungal and metazoan CENs, at regional CENs only, at fungal CENs only or at metazoan CENs only. Panel (b): bar chart of the number of proteins (axis 0 to 90) in H. sapiens, S. pombe, S. cerevisiae and E. cuniculi, divided into point CEN only (e.g. CBF3), fungi only (e.g. Dam1), common core (e.g. Ndc80/Hec1), common regulators (e.g. Mad2), regional CEN only (e.g. Fta3/CENP-H), metazoa only (e.g. Zwilch-Rod) and proteins shared with other cellular structures (e.g. Skp1, Dynein).]
Spc105, CenH3 and CENP-C and of the macromolecular
assemblies in which they participate is a key task in the study
of kinetochore biology.
Diverged kinetochore components
Budding yeast with CDEI-II-III point CENs contain a set of 11
proteins that are not present in fungi such as S. pombe or C.
albicans (Figure 11). Three of the eleven point-CEN-specific
proteins are involved in sequence-specific binding to CDEIII
while eight are part of the COMA complex or of a
COMA-dependent assembly pathway. Only three of the eleven
components of the COMA pathway in S. cerevisiae (Mcm21/Mal2,
Chl4/Mis15 and Ctf3/Mis6/CENP-I) are conserved among fungal and
mammalian kinetochores (Figure 11). In S. pombe, an alternative
set of eight proteins, including spSim4 and spFta1-7,
are bound to the COMA components spMcm21/Mal2,
spChl4/Mis15 and spMis6/Ctf3. At least three of these proteins
(CENP-H/Fta3, Fta1 and Sim4) are members of a class of
proteins found in fungal and metazoan organisms with regional
CENs whereas the other five proteins have no obvious
orthologs (Figure 11a). Overall, these data point to COMA and
COMA-associated proteins as kinetochore components with a
particularly high degree of sequence divergence through evo-
lution. It seems reasonable to speculate that COMA helps to
accommodate kinetochore subunits that are highly conserved
among regional and point CENs, such as the NDC80 complex,
to diverged components, such as CBF3. By analogy, it seems
likely that specialized proteins have evolved to meet the
special structural demands of holocentric CENs; ceKNL-3, a
kinetochore protein bound to the C. elegans MIND and
NDC80 complexes [18] but absent from other kinetochores,
may be an early example of a holocentric adaptor.
The logic of kinetochore assembly
The MT-binding components of kinetochores are unlike
kinetochore structural components in that almost all are involved
in multiple MT-based processes (Figure 11). In humans, for
example, EB1/Bim1 and APC/Kar9 are found not only at
kinetochores, but also at sites of MT association with the cell cortex;
CLIP-170/Bik1 and Dynein play important roles in vesicle
trafficking and ch-Tog1/Stu2 is required for spindle assembly. From
yeast to humans, only one or two of the six to ten kinetochore
MAPs and motors are specific to kinetochores. CENP-A
functions in most organisms to determine CEN location without
recognizing CEN-specific sequences; similarly, the
NDC80-MIND-SPC105-COMA complexes must determine the
specialized biochemistry of MT-kinetochore linkages without
resort to many kinetochore-specific MAPs.
Conclusion
We conclude that critical structural features of kinetochores
are conserved from yeast to man, despite highly divergent
CEN sequences. It appears that both short S. cerevisiae point
centromeres and complex metazoan regional centromeres
arose from a common ancestor that probably had regional
centromeres. Both simple and complex kinetochores contain
conserved SPC105, MIND and NDC80 complexes along with
more variable COMA complexes. This core assembly is sup-
plemented by adaptor proteins specific to organisms with
point, regional or holocentric CENs. The key to understand-
ing kinetochore biology is now to determine how specialized
adaptors and conserved core complexes interact with inner
centromere components such as CenH3 and CENP-C to
assemble structures capable of binding to and regulating
microtubules through the recruitment of MAPs and motors.
Materials and methods
Sequence-similarity searches
Database searches were performed on NCBI non-redundant
and EST databases using PSI-BLAST and BLAST (protein-
protein BLAST (blastp) and genomic BLAST (tblastn)) [76].
Pattern searches were performed using ScanProsite [77].
Multiple sequence alignments were built with ClustalW,
MUSCLE and T-Coffee and edited by hand [78-80]. Coiled
coil predictions were based on the COILS program using a
window size of 28 [81]. Human Nnf1R and Fta1R were identi-
fied in PSI-BLAST searches using the full-length S. pombe
Fta1 or S. cerevisiae Nnf1 protein sequences as the queries. To
identify human Mcm21R, the S. cerevisiae Mcm21 protein
sequence was first used as a query in PSI-BLAST searches
that yielded fungal Mcm21 related proteins. These proteins
were assembled in a multiple sequence alignment from which
the motif [HYF]-[KRHDENQ]-[VLI]-x-[HYF]-[ST]-[IVL]-[P]-x-x-[IL]-x-[ILV]
was derived, and then used in a pattern
search to identify metazoan orthologs. To identify orthologs
of S. cerevisiae Nsl1 and Chl4, S. pombe Sim4 or H. sapiens
CENP-H, conserved blocks were first identified. PSI-BLAST
searches were carried out using S. pombe Sim4, S. cerevisiae
Nsl1, S. cerevisiae Chl4 or H. sapiens CENP-H as query
sequences. This approach identified a set of fungal Sim4, Nsl1
or Chl4 related proteins and a set of metazoan CENP-H
related proteins. Each set of proteins was then assembled into
multiple sequence alignments and conserved blocks identi-
fied (amino acids 1 to 143 for U. maydis Nsl1, 1 to 114 for S.
cerevisiae Chl4, 341 to 373 for S. pombe Sim4 and 224 to 269
for H. sapiens CENP-H). The sequences present in these con-
served blocks were then used in PSI-BLAST searches to iden-
tify new fungal (for CENP-H) or metazoan (for Sim4, Chl4 or
Nsl1) proteins.
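The pattern-search step can be illustrated with a short Python sketch. It is not the ScanProsite implementation used in the study; it only assumes that the quoted motif follows simple PROSITE-style syntax (bracketed residue sets, the wildcard x, elements separated by hyphens) and translates it into a regular expression. The function names and the toy sequence are hypothetical.

```python
import re

# Motif as quoted in the text (PROSITE-style syntax).
MCM21_MOTIF = "[HYF]-[KRHDENQ]-[VLI]-x-[HYF]-[ST]-[IVL]-[P]-x-x-[IL]-x-[ILV]"


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Only the elements used in the motif above are supported:
    bracketed residue sets, single residues and the wildcard 'x'.
    """
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        if element == "x":
            parts.append(".")  # any residue
        elif element.startswith("[") and element.endswith("]"):
            parts.append(element)  # residue set maps directly to a character class
        elif len(element) == 1 and element.isalpha():
            parts.append(element)  # single fixed residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
    return "".join(parts)


def find_motif(sequence: str, pattern: str):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    # A lookahead lets overlapping occurrences be reported as well.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


# Hypothetical toy sequence, not a real protein.
toy_sequence = "MAAHKVAHSIPGGLQVAA"
hits = find_motif(toy_sequence, MCM21_MOTIF)
```

A hit list of this kind would still need manual inspection, since a 13-residue degenerate motif can match unrelated proteins by chance.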
Phylogenetic analysis
Phylogenetic alignments were generated with MUSCLE using
GBlocks to identify conserved blocks [82]. Conserved blocks
were selected only if single positions were conserved in at
least 50% of the sequences, with higher stringency at flanking
positions (80%). A maximum of eight contiguous non-con-
served positions were allowed. The minimum block length
was five amino acids. Positions with gaps were allowed only if
their number did not exceed 50%. Conserved blocks and the
number of positions used for each protein family are
described in Additional data file 5. To calculate the distances
between sequences we took a maximum likelihood approach
using TREE-PUZZLE [83] with the 'Pairwise distance
calculation only' option, the Jones-Taylor-Thornton substitution
matrix [84] and gamma-distributed rates (eight categories) to
account for rate heterogeneity (parameters were estimated
from the dataset). A neighbor-joining tree was constructed
from the distance matrix with the NEIGHBOR program from
the PHYLIP package [85]. Reliability of the dataset was
assessed by bootstrap. We generated 100 permutation data-
sets using the SEQBOOT program from the PHYLIP package.
From these 100 datasets we calculated distance matrices and
constructed neighbor-joining trees using the parameters
described above. TREE-PUZZLE was then used with the
'Consensus of user defined trees' option to generate a consen-
sus tree from all neighbor-joining trees (nodes with support
less than 50% were collapsed) [86]. Trees were visualized
using the SPLITS TREE tool [87]. Amino acid similarity
percentages used in multiple sequence alignments are given
in Additional data file 4.
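The resampling step behind the bootstrap can be sketched in a few lines of Python. This is an illustration of column resampling with replacement, which is the standard bootstrap for alignments, and not a reimplementation of SEQBOOT; the function names, the seed and the toy alignment are assumptions made for the example. Distance calculation and neighbor-joining on each replicate would still be done with the programs named above.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate n_replicates resampled alignments (100 in the text)."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment; the names and residues are placeholders.
toy_alignment = {
    "seqA": "MKTLE-AYLR",
    "seqB": "MRTLEQAYIR",
    "seqC": "MKSIE-SYLK",
}
replicates = bootstrap_replicates(toy_alignment, n_replicates=100, seed=1)
```

Each replicate keeps every column intact across sequences, so the support for a node reflects how consistently the sampled positions agree on it.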
Hidden Markov model-based modeling
The point CEN model was constructed from three different
sub-models based on the known structure of point CENs. The
first sub-model searched for CDEI-like regions in the query
sequence using the [T|G]CA[C|G|T][A|C|G]TG motif. The
second sub-model then searched for adjacent CDEII-like AT
rich regions. The CDEII region was modeled with a HMM
using CDEII from S. cerevisiae [88]. For the negative model,
S. cerevisiae genomic DNA was used (the effect of including
CENs in the genomic DNA was disregarded). For both data-
sets, base transition frequencies were determined and the
transition matrix for the HMM was calculated. The quality of
the HMM was evaluated by screening annotated budding
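yeast genomes and assessment with a bit score:
$$S(x) = \log \frac{P(x \mid \text{model}^{+})}{P(x \mid \text{model}^{-})}$$
where model+ is the CDEII model and model- is the negative (genomic) model.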
Given the identification of CDEI and CDEII sequence ele-
ments, a third sub-model searched for an adjacent CDEIII
motif using an expression based on the highly conserved
CCGGAA motif. Positive hits were evaluated with the bit score
calculated from the CDEII HMM, length distribution, AT
length, AT runs and synteny.
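The scoring idea can be sketched as follows. This is a simplified stand-in rather than the authors' HMM: it assumes a first-order Markov chain over bases for both the positive (CDEII-like) and negative (genomic) models, estimates base transition frequencies with a pseudocount, and reports a log-odds score in bits (base-2 logarithm is an assumption). The training strings are hypothetical toy data, and the CDEI expression is the motif from the text rewritten as ordinary character classes.

```python
import math
import re

BASES = "ACGT"

# CDEI motif from the text, [T|G]CA[C|G|T][A|C|G]TG, as regex character classes.
CDEI_REGEX = re.compile("[TG]CA[CGT][ACG]TG")


def transition_probs(sequences, pseudocount=1.0):
    """Estimate base-to-base transition probabilities from training sequences."""
    counts = {a: {b: pseudocount for b in BASES} for a in BASES}
    for seq in sequences:
        seq = seq.upper()
        for a, b in zip(seq, seq[1:]):
            if a in counts and b in counts[a]:  # skip ambiguous bases such as N
                counts[a][b] += 1
    probs = {}
    for a in BASES:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in BASES}
    return probs


def bit_score(x, positive, negative):
    """Log-odds in bits of x under the positive versus the negative model.

    Only transitions are scored; the initial-base term is omitted.
    """
    x = x.upper()
    score = 0.0
    for a, b in zip(x, x[1:]):
        if a in positive and b in positive[a]:
            score += math.log2(positive[a][b] / negative[a][b])
    return score


# Hypothetical toy training data: AT-rich "CDEII-like" versus mixed "genomic".
positive_model = transition_probs(["ATATTTAAATTTATAAATTT", "TTTAAATATATTTAAATTAT"])
negative_model = transition_probs(["ACGTGCATCGGATCCGTAGC", "GGCATCGATCGTTAGCCATG"])

at_rich_score = bit_score("ATTTAATA", positive_model, negative_model)
gc_rich_score = bit_score("GCGCCGGC", positive_model, negative_model)
```

With these toy models an AT-rich candidate scores above zero and a GC-rich one below zero, which is the behavior the bit score is meant to capture; real thresholds would have to be calibrated on annotated CENs.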
Additional data files
The following additional data are available with the online
version of this paper. Additional data file 1 contains accession
numbers of all proteins that are used in this study. Additional
data file 2 shows the multiple sequence alignment of S. cere-
visiae Spc34, a subunit of the multi-protein DASH complex,
with a set of fungal orthologs and a set of related metazoan
proteins (NYD-Sp28 family). Additional data file 3 contains
multiple sequence alignments of the E. cuniculi kinetochore
proteins Ndc80, Nuf2R, Mis12/Mtw1, Nnf1, Spc105 and
CENP-C amongst five fungi. Additional data files 4 and 5 list
amino acid similarities used in all multiple sequence
alignments and homology blocks used in phylogenetic analy-
sis, respectively.
Additional File 1
Accession numbers of all proteins that are used in this study.
Additional File 2
Multiple sequence alignment of S. cerevisiae Spc34 with a set of fungal orthologs and a set of related metazoan proteins. Identification of a potential ortholog of the DASH complex subunit Spc34 in humans. S. cerevisiae Spc34 was aligned with five fungal and four metazoan sequences. Percentages denote the degree of similarity of successive sequence blocks (black boxes). White letters on black denote identical residues, white letters on green, identical residues in ≥ 80% of the organisms and black letters on green, similar residues in ≥ 80% of the organisms. Accession numbers are described in additional data file 1.
Additional File 3
Multiple sequence alignments of the E. cuniculi kinetochore proteins Ndc80, Nuf2R, Mis12/Mtw1, Nnf1, Spc105 and CENP-C amongst five fungi. Identification of E. cuniculi kinetochore proteins. Multiple sequence alignments of the Ndc80, Nuf2, Nnf1, Mis12/Mtw1, CENP-C/Mif2 and Spc105 proteins amongst five fungi and E. cuniculi. Percentages denote the degree of similarity of successive sequence blocks (black boxes). White letters on black denote identical residues, white letters on green, identical residues in ≥ 80% of the organisms and black letters on green, similar residues in ≥ 80% of the organisms. Accession numbers are described in additional data file 1.
Additional File 4
Amino acid similarities used in all multiple sequence alignments.
Additional File 5
Homology blocks used in phylogenetic analysis.
Acknowledgements
We thank Daniel Huson (University of Tübingen) for help with the
computational centromere screen and members of the Sorger lab for helpful
discussions. ADM was supported by a fellowship from the Jane Coffin Childs
Fund for Medical Research and PM by an EMBO long-term fellowship. This
work was supported by NIH grants CA84179 and GM51464.
References
1. Koshland DE, Mitchison TJ, Kirschner MW: Polewards chromo-
some movement driven by microtubule depolymerization in
vitro. Nature 1988, 331:499-504.
2. Cleveland DW, Mao Y, Sullivan KF: Centromeres and kineto-
chores: from epigenetics to mitotic checkpoint signaling. Cell
2003, 112:407-421.
3. Choo KH: Domain organization at the centromere and
neocentromere. Dev Cell 2001, 1:165-177.
4. Fitzgerald-Hayes M, Clarke L, Carbon J: Nucleotide sequence
comparisons and functional analysis of yeast centromere
DNAs. Cell 1982, 29:235-244.
5. McAinsh AD, Tytell JD, Sorger PK: Structure, function, and reg-
ulation of budding yeast kinetochores. Annu Rev Cell Dev Biol
2003, 19:519-539.
6. Sanyal K, Baum M, Carbon J: Centromeric DNA sequences in
the pathogenic yeast Candida albicans are all different and
unique. Proc Natl Acad Sci USA 2004, 101:11374-11379.
7. Houben A, Schubert I: DNA and proteins of plant centromeres.
Curr Opin Plant Biol 2003, 6:554-560.
8. Schueler MG, Higgins AW, Rudd MK, Gustashaw K, Willard HF:
Genomic and genetic definition of a functional human
centromere. Science 2001, 294:109-115.
9. Sun X, Wahlstrom J, Karpen G: Molecular structure of a func-
tional Drosophila centromere. Cell 1997, 91:1007-1019.
10. Clarke L, Amstutz H, Fishel B, Carbon J: Analysis of centromeric
DNA in the fission yeast Schizosaccharomyces pombe. Proc
Natl Acad Sci USA 1986, 83:8253-8257.
11. Albertson DG, Thomson JN: The kinetochores of Caenorhabditis
elegans. Chromosoma 1982, 86:409-428.
12. Wiens GR, Sorger PK: Centromeric chromatin and epigenetic
effects in kinetochore assembly. Cell 1998, 93:313-316.
13. Mellone BG, Allshire RC: Stretching it: putting the CEN(P-A) in
centromere. Curr Opin Genet Dev 2003, 13:191-198.
14. De Wulf P, McAinsh AD, Sorger PK: Hierarchical assembly of the
budding yeast kinetochore from multiple subcomplexes.
Genes Dev 2003, 17:2902-2921.
15. Nekrasov VS, Smith MA, Peak-Chew S, Kilmartin JV: Interactions
between centromere complexes in Saccharomyces cerevisiae.
Mol Biol Cell 2003, 14:4931-4946.
16. Scharfenberger M, Ortiz J, Grau N, Janke C, Schiebel E, Lechner J:
Nsl1p is essential for the establishment of bipolarity and the
localization of the Dam-Duo complex. EMBO J 2003,
22:6584-6597.
17. Bharadwaj R, Qi W, Yu H: Identification of two novel compo-
nents of the human NDC80 kinetochore complex. J Biol Chem
2004, 279:13076-13085.
18. Cheeseman IM, Niessen S, Anderson S, Hyndman F, Yates JR 3rd,
Oegema K, Desai A: A conserved protein network controls
assembly of the outer kinetochore and its ability to sustain
tension. Genes Dev 2004, 18:2255-2268.
19. Obuse C, Iwasaki O, Kiyomitsu T, Goshima G, Toyoda Y, Yanagida M:
A conserved Mis12 centromere complex is linked to hetero-
chromatic HP1 and outer kinetochore protein Zwint-1. Nat
Cell Biol 2004, 6:1135-1141.
20. Goshima G, Kiyomitsu T, Yoda K, Yanagida M: Human
centromere chromatin protein hMis12, essential for equal
segregation, is independent of CENP-A loading pathway. J
Cell Biol 2003, 160:25-39.
21. Liu ST, Hittle JC, Jablonski SA, Campbell MS, Yoda K, Yen TJ: Human
CENP-I specifies localization of CENP-F, MAD1 and MAD2
to kinetochores and is essential for mitosis. Nat Cell Biol 2003,
5:341-345.
22. McCleland ML, Gardner RD, Kallio MJ, Daum JR, Gorbsky GJ, Burke
DJ, Stukenberg PT: The highly conserved Ndc80 complex is
required for kinetochore assembly, chromosome congres-
sion, and spindle checkpoint activity. Genes Dev 2003,
17:101-114.
23. Tirnauer JS, Canman JC, Salmon ED, Mitchison TJ: EB1 targets to
kinetochores with attached, polymerizing microtubules. Mol
Biol Cell 2002, 13:4308-4316.
24. Kitagawa K, Hieter P: Evolutionary conservation between bud-
ding yeast and human kinetochores. Nat Rev Mol Cell Biol 2001,
2:678-687.
25. Wigge PA, Kilmartin JV: The Ndc80p complex from
Saccharomyces cerevisiae contains conserved centromere
components and has a function in chromosome segregation. J Cell
Biol 2001, 152:349-360.
26. Dujardin D, Wacker UI, Moreau A, Schroer TA, Rickard JE, De Mey
JR: Evidence for a role of CLIP-170 in the establishment of
metaphase chromosome alignment. J Cell Biol 1998,
141:849-862.
27. Henikoff S, Dalal Y: Centromeric chromatin: what makes it
unique? Curr Opin Genet Dev 2005, 15:177-184.
28. Dujon B, Sherman D, Fischer G, Durrens P, Casaregola S, Lafontaine
I, De Montigny J, Marck C, Neuveglise C, Talla E, et al.: Genome evo-
lution in yeasts. Nature 2004, 430:35-44.
29. Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES: Sequencing
and comparison of yeast species to identify genes and regu-
latory elements. Nature 2003, 423:241-254.
30. Kuras L, Thomas D: Identification of the yeast methionine bio-
synthetic genes that require the centromere binding factor
1 for their transcriptional activation. FEBS Lett 1995, 367:15-18.
31. Cardozo T, Pagano M: The SCF ubiquitin ligase: insights into a
molecular machine. Nat Rev Mol Cell Biol 2004, 5:739-751.
32. Ciferri C, De Luca J, Monzani S, Ferrari KJ, Ristic D, Wyman C, Stark
H, Kilmartin J, Salmon ED, Musacchio A: Architecture of the
human ndc80-hec1 complex, a critical constituent of the
outer kinetochore. J Biol Chem 2005, 280:29088-29095.
33. Wei RR, Sorger PK, Harrison SC: Molecular organization of the
Ndc80 complex, an essential kinetochore component. Proc
Natl Acad Sci USA 2005, 102:5363-5367.
34. Kline-Smith SL, Sandall S, Desai A: Kinetochore-spindle microtu-
bule interactions during mitosis. Curr Opin Cell Biol 2005,
17:35-46.
35. Malik HS, Henikoff S: Conflict begets complexity: the evolution
of centromeres. Curr Opin Genet Dev 2002, 12:711-718.

36. Biswas K, Rieger KJ, Morschhauser J: Functional characterization
of CaCBF1, the Candida albicans homolog of centromere
binding factor 1. Gene 2003, 323:43-55.
37. Browning H, Hackney DD, Nurse P: Targeted movement of cell
end factors in fission yeast. Nat Cell Biol 2003, 5:812-818.
38. Lawrence CJ, Dawe RK, Christie KR, Cleveland DW, Dawson SC,
Endow SA, Goldstein LS, Goodson HV, Hirokawa N, Howard J, et al.:
A standardized kinesin nomenclature. J Cell Biol 2004,
167:19-22.
39. Liu X, McLeod I, Anderson S, Yates JR 3rd, He X: Molecular analy-
sis of kinetochore architecture in fission yeast. EMBO J 2005,
24:2919-2930.
40. Pidoux AL, Richardson W, Allshire RC: Sim4: a novel fission yeast
kinetochore protein required for centromeric silencing and
chromosome segregation. J Cell Biol 2003, 161:295-307.
41. Wang Y, Devereux W, Stewart TM, Casero RA Jr: Cloning and
characterization of human polyamine-modulated factor-1, a
transcriptional cofactor that regulates the transcription of
the spermidine/spermine N(1)-acetyltransferase gene. J Biol
Chem 1999, 274:22095-22101.
42. Yamashita A, Ito M, Takamatsu N, Shiba T: Characterization of
Solt, a novel SoxLZ/Sox6 binding protein expressed in adult
mouse testis. FEBS Lett 2000, 481:147-151.
43. Miranda JJ, De Wulf P, Sorger PK, Harrison SC: The yeast DASH
complex forms closed rings on microtubules. Nat Struct Mol
Biol 2005, 12:138-143.
44. Westermann S, Avila-Sakar A, Wang HW, Niederstrasser H, Wong J,
Drubin DG, Nogales E, Barnes G: Formation of a dynamic
kinetochore-microtubule interface through assembly of the
Dam1 ring complex. Mol Cell 2005, 17:277-290.

45. Jishage M, Fujino T, Yamazaki Y, Kuroda H, Nakamura T: Identifica-
tion of target genes for EWS/ATF-1 chimeric transcription
factor. Oncogene 2003, 22:41-49.
46. Pazour GJ, Agrin N, Leszyk J, Witman GB: Proteomic analysis of a
eukaryotic cilium. J Cell Biol 2005, 170:103-113.
47. Meluh PB, Koshland D: Evidence that the MIF2 gene of Saccha-
romyces cerevisiae encodes a centromere protein with
homology to the mammalian centromere protein CENP-C.
Mol Biol Cell 1995, 6:793-807.
48. Stoler S, Keith KC, Curnick KE, Fitzgerald-Hayes M: A mutation in
CSE4, an essential gene encoding a novel chromatin-associ-
ated protein in yeast, causes chromosome nondisjunction
and cell cycle arrest at mitosis. Genes Dev 1995, 9:573-586.
49. Williams BC, Li Z, Liu S, Williams EV, Leung G, Yen TJ, Goldberg ML:
Zwilch, a new component of the ZW10/ROD complex
required for kinetochore functions. Mol Biol Cell 2003,
14:1379-1391.
50. Sugata N, Munekata E, Todokoro K: Characterization of a novel
kinetochore protein, CENP-H. J Biol Chem 1999,
274:27343-27346.
51. Starr DA, Williams BC, Li Z, Etemad-Moghadam B, Dawe RK, Gold-
berg ML: Conservation of the centromere/kinetochore pro-
tein ZW10. J Cell Biol 1997, 138:1289-1301.
52. Rattner JB, Rao A, Fritzler MJ, Valencia DW, Yen TJ: CENP-F is a ca.
400 kDa kinetochore protein that exhibits a cell-cycle
dependent localization. Cell Motil Cytoskeleton 1993, 26:214-226.
53. Yen TJ, Compton DA, Wise D, Zinkowski RP, Brinkley BR, Earnshaw
WC, Cleveland DW: CENP-E, a novel human centromere-
associated protein required for progression from metaphase
to anaphase. EMBO J 1991, 10:1245-1254.
54. Westermann S, Cheeseman IM, Anderson S, Yates JR 3rd, Drubin
DG, Barnes G: Architecture of the budding yeast kinetochore
reveals a conserved molecular core. J Cell Biol 2003,
163:215-222.
55. Andag U, Schmitt HD: Dsl1p, an essential component of the
Golgi-endoplasmic reticulum retrieval system in yeast, uses
the same sequence motif to interact with different subunits
of the COPI vesicle coat. J Biol Chem 2003, 278:51722-51734.
56. Hirose H, Arasaki K, Dohmae N, Takio K, Hatsuzawa K, Nagahama
M, Tani K, Yamamoto A, Tohyama M, Tagaya M: Implication of
ZW10 in membrane trafficking between the endoplasmic
reticulum and Golgi. EMBO J 2004, 23:1267-1278.
57. Starr DA, Williams BC, Hays TS, Goldberg ML: ZW10 helps
recruit dynactin and dynein to the kinetochore. J Cell Biol 1998,
142:763-774.
58. Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF: A kingdom-
level phylogeny of eukaryotes based on combined protein
data. Science 2000, 290:972-977.
59. Gribaldo S, Philippe H: Ancient phylogenetic relationships.
Theor Popul Biol 2002, 61:391-408.
60. Heeger S, Leismann O, Schittenhelm R, Schraidt O, Heidmann S, Leh-
ner CF: Genetic interactions of separase regulatory subunits
reveal the diverged Drosophila Cenp-C homolog. Genes Dev
2005, 19:2041-2053.
61. Blower MD, Karpen GH: The role of Drosophila CID in kineto-
chore formation, cell-cycle progression and heterochroma-
tin interactions. Nat Cell Biol 2001, 3:730-739.
62. Karess R: Rod-Zw10-Zwilch: a key player in the spindle
checkpoint. Trends Cell Biol 2005, 15:386-392.
63. Vivares CP, Gouy M, Thomarat F, Metenier G: Functional and
evolutionary analysis of a eukaryotic parasitic genome. Curr Opin
Microbiol 2002, 5:499-505.
64. Peters JM: The anaphase-promoting complex: proteolysis in
mitosis and beyond. Mol Cell 2002, 9:931-943.
65. Luo X, Tang Z, Rizo J, Yu H: The Mad2 spindle checkpoint
protein undergoes similar major conformational changes
upon binding to either Mad1 or Cdc20. Mol Cell 2002, 9:59-71.
66. Sironi L, Mapelli M, Knapp S, De Antoni A, Jeang KT, Musacchio A:
Crystal structure of the tetrameric Mad1-Mad2 core com-
plex: implications of a 'safety belt' binding mechanism for the
spindle checkpoint. EMBO J 2002, 21:2496-2506.
67. Talbert PB, Bryson TD, Henikoff S: Adaptive evolution of
centromere proteins in plants and animals. J Biol 2004, 3:18.
68. Katinka MD, Duprat S, Cornillot E, Metenier G, Thomarat F, Prensier
G, Barbe V, Peyretaillade E, Brottier P, Wincker P, et al.: Genome
sequence and gene compaction of the eukaryote parasite
Encephalitozoon cuniculi. Nature 2001, 414:450-453.
69. Dalgaard JZ, Vengrova S: Selective gene expression in multigene
families from yeast to mammals. Sci STKE 2004, 2004:re17.
70. Jager H, Herzig A, Lehner CF, Heidmann S: Drosophila separase is
required for sister chromatid separation and binds to PIM
and THR. Genes Dev 2001, 15:2572-2584.
71. Louis EJ: Are Drosophila telomeres an exception or the rule?
Genome Biol 2002, 3:REVIEWS0007.
72. Bapteste E, Charlebois RL, MacLeod D, Brochier C: The two tem-
pos of nuclear pore complex evolution: highly adapting pro-
teins in an ancient frozen structure. Genome Biol 2005, 6:R85.
73. Newman JR, Wolf E, Kim PS: A computationally directed screen
identifying interacting coiled coils from Saccharomyces
cerevisiae. Proc Natl Acad Sci USA 2000, 97:13203-13208.
74. Rose A, Meier I: Scaffolds, levers, rods and springs: diverse cel-
lular functions of long coiled-coil proteins. Cell Mol Life Sci 2004,
61:1996-2009.
75. Bullitt E, Rout MP, Kilmartin JV, Akey CW: The yeast spindle pole
body is assembled around a central crystal of Spc42p. Cell
1997, 89:1077-1086.
76. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lip-
man DJ: Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs. Nucleic Acids Res 1997,
25:3389-3402.
77. Gattiker A, Gasteiger E, Bairoch A: ScanProsite: a reference
implementation of a PROSITE scanning tool. Appl
Bioinformatics 2002, 1:107-108.
78. Higgins DG: CLUSTAL V: multiple alignment of DNA and
protein sequences. Methods Mol Biol 1994, 25:307-318.
79. Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method
for fast and accurate multiple sequence alignment. J Mol Biol
2000, 302:205-217.
80. Edgar RC: MUSCLE: multiple sequence alignment with high
accuracy and high throughput. Nucleic Acids Res 2004,
32:1792-1797.
81. Lupas A, Van Dyke M, Stock J: Predicting coiled coils from pro-
tein sequences. Science 1991, 252:1162-1164.
82. Castresana J: Selection of conserved blocks from multiple
alignments for their use in phylogenetic analysis. Mol Biol Evol
2000, 17:540-552.
83. Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE:
maximum likelihood phylogenetic analysis using quartets
and parallel computing. Bioinformatics 2002, 18:502-504.
84. Jones DT, Taylor WR, Thornton JM: The rapid generation of
mutation data matrices from protein sequences. Comput Appl
Biosci 1992, 8:275-282.
85. Felsenstein J: PHYLIP - Phylogeny Inference Package (Version
3.2). Cladistics 1989, 5:164-166.
86. Felsenstein KM, Lewis-Higgins L: Processing of the beta-amyloid
precursor protein carrying the familial, Dutch-type, and a
novel recombinant C-terminal mutation. Neurosci Lett 1993,
152:185-189.
87. Huson DH, Bryant D: Application of phylogenetic networks in
evolutionary studies. Mol Biol Evol 2006, 23:254-267.
88. Durbin R, Eddy SR, Krogh A, Mitchison G: Probabilistic Models of
Proteins and Nucleic Acids. Cambridge: Cambridge University Press; 1998.
