The astacin protein family in Caenorhabditis elegans
Frank Möhrlen1, Harald Hutter2 and Robert Zwilling1
1Institute of Zoology, University of Heidelberg; 2Max-Planck-Institute for Medical Research, Heidelberg, Germany
In the nematode Caenorhabditis elegans, 40 genes code for
astacin-like proteins (nematode astacins, NAS). The astacins
are metalloproteases present in bacteria, invertebrates and
vertebrates and serve a variety of physiological functions like
digestion, hatching, peptide processing, morphogenesis and
pattern formation. With the exception of one distorted
pseudogene, all the other C. elegans astacins are expressed
and are evidently functional. For 13 genes we found splicing
patterns differing from the Genefinder predictions in
WormBase, sometimes markedly. The GFP expression
pattern for NAS-4 shows a specific localization in anterior
pharynx cells and in the whole digestive tract (as the secreted
form). In contrast, NAS-7 is found in the head of adult
hermaphrodites, but not in pharynx cells or in the lumen of
the digestive tract. In embryos, NAS-7 fluorescence becomes
detectable just before hatching. In C. elegans astacins, three
basic structural and functional moieties can be discerned:
a prepro portion, the central catalytic chain and long
C-terminal extensions with presumably regulatory functions.
Within the regulatory moiety, EGF-like, CUB, SXC,
and TSP-1 domains can be distinguished. Based on structural
differences of the regulatory unit we established six
NAS subgroups, which seem to represent different
functional and evolutionary clusters. This pattern, deduced
exclusively from the domain arrangement in the regulatory
moiety, is perfectly reflected in an evolutionary tree constructed
solely from amino acid sequence information of the
catalytic chain. Related catalytic chains tend to have related
regulatory extensions. Notably, the gene NAS-39 shows a
striking resemblance to human BMP-1 and the tolloids.
Keywords: astacin family; Astacus astacus; Caenorhabditis
elegans; protein evolution; metalloproteases.
The first evidence for the existence of the astacin protein
family can be traced back to the year 1967, when one of us
(R. Zwilling) observed a proteolytic activity in the digestive
fluid of the decapod crayfish Astacus astacus that was
different from all other proteases known at that time [1].
Investigations of the cleavage and inhibition specificity
confirmed this notion [2–4] and the elucidation of its unique
amino acid sequence demonstrated definitely that the
crayfish protease represented a new protein family [5]. In
subsequent studies, the X-ray crystal structure of the
Astacus protease, for which we proposed the denomination
'astacin', was solved to a resolution of 1.8 Å [6]. Astacin was
recognized to be a metalloprotease exhibiting a penta-coordinated
zinc ion in its active center [7]. In addition, the
site of biosynthesis [8], genome organization [9], and mode
of activation [10,11] have been elucidated, which made the
crayfish protease a prototype for the astacins.
A second member of the astacin protein family was
identified when Wang et al. and Wozney et al. (1988) studied
the human bone-inducing factor BMP-1, into which a
domain with high resemblance to crayfish astacin is inserted
[12,13]. After that many more astacin-like proteins or genes
were described in rapid succession in vertebrates, invertebrates
and even in prokaryotes [14], where they serve
physiological functions as different as food digestion, hatching,
peptide processing, morphogenesis and pattern formation
(for an overview see [15]). In the crayfish Astacus astacus, a
second astacin gene can be found in the embryo that is
activated only during a narrow time window just before
hatching [16].
In the model organism Caenorhabditis elegans metalloproteases
are present in a great variety, as we have seen in
data bank analysis (also [17]). On the other hand we have
shown recently that the bulk of total proteolytic activity
found in crude extracts of mixed stage populations consists
of acidic aspartyl proteases [18,19]. However, with regard to
the number of expressed astacin genes C. elegans surpasses
any other organism studied so far. This investigation
was therefore stimulated by the question of why this
959-cell organism would need more than 30 different and
active astacin genes.
Correspondence to R. Zwilling, Zoologisches Institut, Universität
Heidelberg, Im Neuenheimer Feld 230, D-69120 Heidelberg,
Germany. Fax: + 49 6221 544913, Tel.: + 49 6221 545887,
E-mail:
Abbreviations: cDNA, complementary DNA; dsRNA, double-stranded
RNA; EST, expressed sequence tag; GFP, green fluorescent
protein; L1-4, larval stage 1–4; OST, open reading frame sequence tag;
RNAi, RNA interference; RT-PCR, reverse transcription-polymerase
chain reaction; NAS, nematode astacin.
Note: Supplementary figures are available at -
heidelberg.de/moehrlen
Note: The sequences and the alignment reported in this paper have
been submitted to GenBank/EMBL/DDBJ databank with accession
numbers AJ561200, AJ561201, AJ561202, AJ561203, AJ561204,
AJ561205, AJ561206, AJ561207, AJ561208, AJ561209, AJ561210,
AJ561211, AJ561212, AJ561213, AJ561214, AJ561215, AJ561216,
AJ561217, AJ561218, AJ561219, AJ561220, AJ561221 and
ALIGN_000543.
(Received 3 September 2003, revised 15 October 2003,
accepted 22 October 2003)
Eur. J. Biochem. 270, 4909–4920 (2003) © FEBS 2003 doi:10.1046/j.1432-1033.2003.03891.x
Materials and methods
Preparation of C. elegans
The C. elegans wild-type strain N2 variant Bristol was
grown as a liquid culture in S-medium [20] supplemented
with Escherichia coli OP50 as food source. The cultures were
incubated at 18 °C for 6–8 days under vigorous shaking.
When the E. coli food source appeared to have been nearly
exhausted, the nematodes, representing a mixed population
of adults, all four larval stages and eggs, were harvested and
separated from bacteria as described elsewhere [20].

RNA purification
For the isolation of RNA, 100 µg fresh or frozen nematode
pellets from a liquid culture were ground by means of a
pestle in a mortar containing liquid nitrogen. Total RNA
was extracted from the resulting powder following the
protocol of Chomczynski and Sacchi [21]. Contamination by
genomic DNA was avoided by treating total RNA with
DNase I (RNase-free, Boehringer). Poly(A)-rich RNA was
isolated by the Oligotex mRNA procedure (Qiagen,
Germany).
DNA purification
Genomic DNA was isolated from 1 mL fresh nematodes
from a liquid culture using a standard protocol [22].
PCR amplification and cloning
Polyadenylated RNA (1 µg) was converted into single-stranded
cDNA using a d(T)17 primer or a random hexamer
primer as described [23]. For the amplification of the
predicted astacin-like cDNA fragments specific oligonucleotide
primers derived from the genome sequencing
data were used. Primer sequences are available at
http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig1.htm.
PCR amplification was performed on single-stranded
cDNA, or on genomic DNA as a control, with 2 U high fidelity
Taq DNA polymerase (Invitrogen, Germany) to diminish
the mutation rate inherent to the PCR reaction. The cycling
conditions were 94 °C for 3 min, followed by 35 cycles of
94 °C for 40 s, 55 °C for 40 s and 68 °C for 1 min per kb,
and a final step at 68 °C for 8 min. After PCR, samples were
analyzed in 2% agarose gels and discrepancies between
expected and observed size of any PCR product were readily
detected on visual inspection of the gels. The PCR products
were then excised from 1.5% agarose gels and purified with a
NucleoSpin gel-extraction kit (Macherey and Nagel, Germany). The
purified fragments were subjected to the SureClone
Ligation procedure and cloned into a pUC18 vector
according to the manufacturer's instructions (Pharmacia,
Sweden).
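The cycling conditions can be written down as a small data structure, which makes the dependence of the extension step on fragment length explicit. The following is a minimal sketch in Python; the function names and the 1.5 kb example fragment are illustrative assumptions and not part of the original protocol, and the grouping of the three middle steps into 35 cycles follows the usual reading of such a programme.

```python
def cycling_program(amplicon_kb, cycles=35):
    """Return the PCR steps as (label, temperature in C, seconds, repeats)."""
    extension_s = 60 * amplicon_kb  # 1 min per kb at 68 C
    return [
        ("initial denaturation", 94, 180, 1),
        ("denaturation", 94, 40, cycles),
        ("annealing", 55, 40, cycles),
        ("extension", 68, extension_s, cycles),
        ("final extension", 68, 480, 1),
    ]


def total_hold_minutes(program):
    """Sum of the programmed hold times in minutes (ramping is ignored)."""
    return sum(seconds * repeats for _, _, seconds, repeats in program) / 60


# Hypothetical 1.5 kb fragment, chosen only for illustration.
program = cycling_program(amplicon_kb=1.5)
print(f"{total_hold_minutes(program):.1f} min of programmed hold time")
```

For the long 8–10 kb fragments used in the GFP constructs, the extension step dominates the run time under this scheme.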
Plasmid DNA was prepared and subsequently nucleotide
sequences were determined by double-strand sequencing
according to the dideoxynucleotide chain-termination
method, using T7 DNA polymerase (Amersham, Sweden).
Universal M13 primers were used for sequencing. All
sequences have been deposited in EMBL/GenBank/DDBJ
under accession numbers AJ561200, AJ561201, AJ561202,
AJ561203, AJ561204, AJ561205, AJ561206, AJ561207,
AJ561208, AJ561209, AJ561210, AJ561211, AJ561212,
AJ561213, AJ561214, AJ561215, AJ561216, AJ561217,
AJ561218, AJ561219, AJ561220, AJ561221.
GFP fusion genes for expression studies
The genomic sequence data in WormBase [24] were used to
identify a genomic DNA fragment suitable for fusion to a
GFP reporter gene. In order to make sure that the gene
specific promoter and all proper cis-elements necessary for
guiding tissue specific expression are included in the
reporter, the whole upstream region between the gene of
interest and the neighboring upstream gene was used. For
PCR amplification of the genomic DNA fragment the
forward primers NAS-4:GFP/SacI/F1 (5′-CGA GCT CTT
GAG TGA AGA TGC CAA GA-3′), NAS7:GFP/BamHI/F1
(5′-CGG GAT CCT TCC GCC AAA GTC ATT TAG-3′),
NAS-15:GFP/PstI/F1 (5′-AAC TGC AGC TTT TCG
GAA GAC TTT TGC-3′), NAS33:GFP/KpnI/F1 (5′-GGG
GTA CCC CGG ACC ACA GTA AAG AAT-3′) and
the corresponding reverse primers NAS4:GFP/KpnI/R1
(5′-GGG GTA CCC TGA CAC GCT GAC CCA TAC-3′),
NAS7:GFP/KpnI/R1 (5′-GGG GTA CCC GATC CTC GCA
TTC TA-3′), NAS15:GFP/KpnI/R1 (5′-GGG GTA CCC
GCT GGG TAG TGG AGT TG-3′), NAS33:GFP/SacI/F1
(5′-CGA GCT CTG ACA AGA AAG GCA CAA AG-3′)
were used. An 8–10 kb PCR fragment containing
approximately 3–5 kb upstream sequences down to the last
30–50 codons of the astacin genes was fused in frame to the
reporter gene GFP. Thus, the intergenic region as well as the
protein coding regions of the astacin-like genes NAS-4,
NAS-7, NAS-15 and NAS-33 were amplified with 2 U
Elongase DNA polymerase (Invitrogen, Germany), gel
purified (NucleoSpin gel-extraction kit, Macherey and
Nagel, Germany) and cloned in frame into a pBD95.85
vector (having the S65C mutation and artificial introns
to increase the expression of GFP; A. Fire Vector Kit,
Baltimore, USA) according to standard protocols [23,25].
The molecular details of all fusion constructs are available
on request. The construct, together with the marker plasmid
pBx, was introduced into pha-1 hermaphrodites, and the
worms having the constructs as extrachromosomal arrays
were isolated at 25 °C and observed for GFP fluorescence
under a Zeiss Axiovert 200 microscope.
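Each primer name carries the restriction enzyme whose recognition site was added to the 5′ end for cloning. As a minimal sketch, the check below confirms that every primer listed above contains the site of the enzyme named in its label. The recognition sequences are the standard ones for these enzymes; the dictionary keys and the helper name `site_offset` are labels introduced here for illustration only.

```python
# Standard recognition sequences of the enzymes named in the primer labels.
SITES = {"SacI": "GAGCTC", "BamHI": "GGATCC", "PstI": "CTGCAG", "KpnI": "GGTACC"}

# Primer sequences copied from the text (5'->3'); keys are illustrative labels.
PRIMERS = {
    "NAS-4 forward": ("SacI", "CGA GCT CTT GAG TGA AGA TGC CAA GA"),
    "NAS-7 forward": ("BamHI", "CGG GAT CCT TCC GCC AAA GTC ATT TAG"),
    "NAS-15 forward": ("PstI", "AAC TGC AGC TTT TCG GAA GAC TTT TGC"),
    "NAS-33 forward": ("KpnI", "GGG GTA CCC CGG ACC ACA GTA AAG AAT"),
    "NAS-4 reverse": ("KpnI", "GGG GTA CCC TGA CAC GCT GAC CCA TAC"),
    "NAS-7 reverse": ("KpnI", "GGG GTA CCC GATC CTC GCA TTC TA"),
    "NAS-15 reverse": ("KpnI", "GGG GTA CCC GCT GGG TAG TGG AGT TG"),
    "NAS-33 reverse": ("SacI", "CGA GCT CTG ACA AGA AAG GCA CAA AG"),
}


def site_offset(enzyme, primer):
    """0-based position of the enzyme's site in the primer, or -1 if absent."""
    sequence = primer.replace(" ", "").upper()
    return sequence.find(SITES[enzyme])


offsets = {name: site_offset(enzyme, seq) for name, (enzyme, seq) in PRIMERS.items()}
for name, offset in offsets.items():
    print(f"{name}: site starts at position {offset}")
```

In all eight primers the site sits one or two bases from the 5′ end, so each site is preceded by a short clamp and followed by the gene-specific part of the primer.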
Sequence analysis and phylogenetic studies

To identify metalloprotease genes in the genome of
C. elegans, we used representative vertebrate and insect
proteins, or their conserved domains according to the
PFAM [26] and PRINTS database [27], as queries for
BLAST searches [28,29] of WormBase [24]. For astacin
genes the astacin domain, the zinc binding motif or the
Met-turn sequences, as listed by PRINTS, were used to
repeatedly screen the whole C. elegans genomic sequence,
available from WormBase.
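The motif-based part of this search can be illustrated with a simple pattern scan. The sketch below is not the BLAST procedure used here; it is a simpler stand-in that looks for the two signatures in a protein sequence. The consensus patterns (HExxHxxGFxHExxRxDRD for the zinc binding motif and SxMHY for the Met-turn) are assumed from the general astacin literature rather than quoted from this paper, and the test sequence is a made-up toy example.

```python
import re

# Assumed consensus patterns ("." stands for any residue).
ZINC_MOTIF = re.compile(r"HE..H..GF.HE..R.DRD")
MET_TURN = re.compile(r"S.MHY")


def scan(protein):
    """Return 0-based start positions of the zinc motif and of the first
    Met-turn downstream of it; None marks a motif that was not found."""
    zinc = ZINC_MOTIF.search(protein)
    turn = MET_TURN.search(protein, zinc.end()) if zinc else None
    return (zinc.start() if zinc else None, turn.start() if turn else None)


# Toy sequence: a motif-like stretch, a spacer of 30 alanines, a Met-turn.
toy = "MKT" + "HELMHAIGFYHEHTRMDRD" + "A" * 30 + "SIMHY" + "GG"
hits = scan(toy)
print(hits)
```

A scan of this kind only finds exact matches to the consensus, which is why a similarity search such as BLAST is the more sensitive choice for divergent family members.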
DNA sequences of all astacin genes were further analyzed
using the HUSAR package [30] and the predicted gene
structures were compared to the Genefinder predictions as
annotated in WormBase, and to the alternative GenieGene
open reading frame predictions of Kent and Zahler [31]. The
splicing patterns were subsequently refined using the EST/
OST sequences available in the latest WormBase release
(WormBase97, 7 March 2003) and the cDNA sequences
resulting from this work. Discrepancies between the
WormBase, GenieGene predictions and our own cDNA
sequences were communicated to those annotating the
sequences ( />docs/WebFig2.htm). The corrected cDNA sequences were
translated into amino acid sequences using the HUSAR
package and aligned using CLUSTAL [32]. For splicing
patterns that remained unconfirmed, we used for further
analysis those protein predictions that are in accordance with
the protein family alignment, showing no exceptional
insertions, deletions or frame shifts (EMBL:
ALIGN_000543).
For identification and annotation of protein domains and
the analysis of domain architectures the tools of the
SMART [33], PFAM [26], ProDom [34] and INTERPRO
[35] protein domain databases were used.
For phylogenetic studies the active protease domains,
covering the region from Ala-1 to Leu-200 in the
prototype crayfish astacin, from the C. elegans astacins
and selected other astacin family members were aligned
using CLUSTAL [32] and imported into GENEDOC [36] for
further manipulation. The alignment is available at EMBL
database with accession number ALIGN_000543. Phylogenetic
analyses were carried out using the neighbor-joining
method and the Bayesian phylogenetic method.
For neighbor-joining analysis the PHYLIP 3.5 software
package [37] was used. Distances between the pairs of
protein sequences were calculated and corrected for
multiple changes according to the PAM001 distance
matrix. The reliability of the tree was tested by bootstrap
analysis with 100 replications. Bayesian phylogenetic
analysis [38,39] was performed by the MRBAYES 3.0 BETA 4
program [40] with the WAG matrix [41] assuming a
gamma distribution of substitution rates. Prior probabilities
for all trees and amino acid replacement models were
equal; the starting trees were random. Metropolis-coupled
Markov chain Monte Carlo sampling was performed with
one cold and three heated chains that were run for 50 000
generations. Trees were sampled every 10th generation.
Posterior probabilities were estimated on 2000 trees
(burnin = 3000). The tree presented here was visualized
using TREEVIEW [42].
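The number of trees entering the posterior estimate follows directly from the sampling settings given above, and a posterior probability is simply the fraction of retained trees that contain a given clade. The short sketch below reproduces this arithmetic; the helper `clade_support` and the four toy trees are hypothetical illustrations and do not represent the program's internal implementation or the actual data.

```python
generations = 50_000
sample_every = 10
burnin = 3_000  # sampled trees discarded before summarizing

sampled_trees = generations // sample_every
retained_trees = sampled_trees - burnin
print(sampled_trees, retained_trees)


def clade_support(trees, clade):
    """Fraction of trees (each a set of frozenset clades) containing `clade`."""
    return sum(clade in tree for tree in trees) / len(trees)


# Toy example: each tree is reduced to the set of clades it contains.
pair = frozenset({"NAS-4", "crayfish astacin"})
other = frozenset({"NAS-4", "NAS-7"})
toy_trees = [{pair}, {pair}, {other}, {pair}]
print(clade_support(toy_trees, pair))
```

With these settings 5000 trees are sampled and 2000 remain after the burn-in, in agreement with the figures stated in the text.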
Results and discussion
Astacin homologue proteins in C. elegans
During a preliminary data base survey we observed in 1996
that the 959-cell organism C. elegans accommodates a
surprising number of gene sequences coding for astacin-like
proteins, while for other species with a much larger genome
not more than 2–3 astacin genes had been reported (G. Geier
and R. Zwilling, unpublished).
The complete sequencing of the 97 megabase genome of
C. elegans by the C. elegans Sequencing Consortium in
1998 [43] then made a thorough analysis possible. The

latest WormBase release (WormBase97, 7 March 2003)
contains now 21 437 coding sequences when counting
1891 alternate splice forms. Of these the MEROPS
protease database (latest release 6.11: 20 January 2003)
lists 382 protease genes (E.C.3.4), of which 158 genes
belong to the group of metalloproteases (E.C.3.4.24). The
metalloproteases of C. elegans can be arranged into 11
protein clans and subdivided into 27 protein families,
according to the nomenclature of Barrett et al. [44]. Our
own BLAST searches in WormBase, using protein family
consensus sequences according to the PFAM or PRINTS
databases as queries, revised the number of identified
genes temporarily listed by MEROPS (see Table 1).
BLAST searches based on the whole astacin domain, the
zinc binding motif or the Met-turn sequence revealed
some more astacin genes in C. elegans in addition to those
listed by MEROPS so far, which finally brought up the
total number of astacin genes in C. elegans to 40 (Tables 1
and 2).
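The totals quoted in this paragraph can be checked against the per-family gene counts of Table 1. The sketch below is a simple tally in Python; the counts are transcribed from Table 1, while the variable names are illustrative. Note that the figure of 11 clans is obtained only when MA(E) and MA(M) are counted separately, which is the assumption made here.

```python
# (clan, family, number of genes) transcribed from Table 1.
TABLE_1 = [
    ("MA(E)", "M1", 12), ("MA(E)", "M2", 1), ("MA(E)", "M3A", 2),
    ("MA(E)", "M13", 23), ("MA(E)", "M41", 3),
    ("MA(M)", "M8", 1), ("MA(M)", "M10A", 6), ("MA(M)", "M12A", 40),
    ("MA(M)", "M12B/C", 10),
    ("MC", "M14A", 9), ("MC", "M14B", 3),
    ("ME", "M16A", 5), ("ME", "M16B", 3), ("ME", "M16X", 2),
    ("MF", "M17", 2),
    ("MG", "M24A", 5), ("MG", "M24B", 3),
    ("MH", "M18", 1), ("MH", "M20A/B", 5), ("MH", "M28B", 2), ("MH", "M28X", 4),
    ("MJ", "M38", 1), ("MK", "M22", 2), ("MM", "M50", 1),
    ("MX", "M48A", 1), ("MX", "M49", 1), ("MX", "M67", 3),
]

total_genes = sum(count for _, _, count in TABLE_1)
clans = {clan for clan, _, _ in TABLE_1}
families = {family for _, family, _ in TABLE_1}
astacin_genes = next(count for _, family, count in TABLE_1 if family == "M12A")

print(total_genes, len(clans), len(families))
print(f"astacins: {astacin_genes / total_genes:.1%} of metalloprotease genes")
```

The tally gives 151 genes in 11 clans and 27 families, so the astacins (family M12A) account for roughly a quarter of all metalloprotease genes in this count.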
The nomenclature proposed for these 40 C. elegans
astacin genes is in accordance with suggestions of the
Table 1. One hundred and fifty-one genes coding for metalloproteases in C. elegans. Identification of genes was based on data available in MEROPS
(The protease database, release 6.11: 20 January 2003) and subsequently corrected by BLAST searches using the
genome sequencing data of C. elegans. Nomenclature is according to Barrett et al. [44].

Clan    Protease family                              Number of genes
MA(E)   M1 aminopeptidase                            12
        M2 peptidyl-dipeptidase                      1
        M3A oligopeptidase                           2
        M13 neprilysin                               23
        M41 E. coli endopeptidase                    3
MA(M)   M8 leishmanolysin                            1
        M10A MMP                                     6
        M12A Astacin                                 40
        M12B/C ADAM                                  10
MC      M14A carboxypeptidase A                      9
        M14B carboxypeptidase E                      3
ME      M16A pitrilysin                              5
        M16B mitochondrial processing peptidase      3
        M16X                                         2
MF      M17 leucyl aminopeptidase                    2
MG      M24A methionyl aminopeptidase I              5
        M24B aminopeptidase P                        3
MH      M18 aminopeptidase I                         1
        M20A/B glutamate carboxypeptidase            5
        M28B aminopeptidase Y                        2
        M28X                                         4
MJ      M38 beta-aspartyl dipeptidase                1
MK      M22 O-sialoglycoprotein endopeptidase        2
MM      M50 S2P protease                             1
MX      M48A Ste24 endopeptidase                     1
        M49 dipeptidylpeptidase                      1
        M67 proteasome regulatory subunit RPN11      3
Table 2. Denomination of astacin genes in C. elegans, data base entries, approximate genetic map position and matching EST or OST clones (see WormBase release 94, Jan – 24–2003 [24]). For RT-PCR
sequences (fmNAS-x) resulting from this work see For further explications see text.

Gene name       Wormpep name      EMBL/GenBank       Map position  EST/OST  RT-PCR sequencing  Comment
NAS-1           F45G2.1           Z93382             III:22.1      OST                         Aberrant splice, corrected full-length sequence ( />
NAS-2           F56A4.1           AC006645 AC006722  V:13.99                No PCR product     Expression confirmed by microarrays only, translation fits best with GenieGene prediction g-V-409
NAS-3           K06A4.1           Z70755             V:1.98                 fmNAS-3            cDNA fits best with GenieGene prediction g-V-1836
NAS-4           C05D11.6          U00048             III:1.33      OST      fmNAS-4            cDNAs fit best with GenieGene prediction g-II-1042
NAS-5           T23H4.3           Z83240             I:4.03                 No PCR product     Expression confirmed by microarrays only
NAS-6           4R79.1            AL031254           IV:30.16               fmNAS-6            Translation fits best with GenieGene prediction g-IV-3005
NAS-7           C07D10.4          U13072             II:0.41                fmNAS-7            cDNA fits best with GenieGene prediction g-II-1703
NAS-8           C34D4.9           U58755             IV:3.29       EST      fmNAS-8
NAS-9           C37H5.9b          U88315             V:6.52        EST                         Full-length sequence is confirmed by overlapping cDNAs
NAS-10          K09C8.3           Z68006             X:2.51        EST
NAS-11          K11G12.1          U23525             X:2.66        EST
NAS-12          C24F3.3           Z81055             IV:4.54                fmNAS-12
NAS-13          F39D8.4           Z69791             X:21.46                fmNAS-13           Translation fits best with GenieGene prediction g-X-2412
NAS-14          F09E8.6           Z73896             IV:8.02                fmNAS-14           Translation fits best with GenieGene prediction g-IV-2471
NAS-15          T04G9.2           U41274             X:19.12       EST      fmNAS-15           cDNAs and translation fit best with GenieGene prediction g-X-2732
NAS-16          K03B8.1           Z74039             V:3.16                 No PCR product     Expression confirmed by microarrays only
NAS-17          K03B8.2           Z74039             V:3.16                 No PCR product     Expression confirmed by microarrays only
NAS-18          K03B8.3           Z74039             V:3.16                 No PCR product     Expression confirmed by microarrays only
NAS-19          K03B8.5           Z74039             V:3.16                 fmNAS-19
NAS-20          T11F9.3           Z74042             V:3.2                  fmNAS-20           cDNA and translation fit best with GenieGene prediction g-V-2325
NAS-21          T11F9.5           Z74042             V:3.2                  fmNAS-21           Aberrant splice, corrected sequence ( />
NAS-22          T11F9.6           Z74042             V:3.2                  fmNAS-22           Aberrant splice, corrected sequence ( /> translation fits best with GenieGene prediction g-V-2327
NAS-23          D1022 unassigned  U23517             II:0.45                fmNAS-23           Not in WormBase, predicted using Genescan see ( />
NAS-24          F20G2.4           Z79753             V:5.42                 fmNAS-24           Translation fits best with GenieGene prediction g-V-2804
NAS-25          F46C5.3           Z54281             II:0.92       EST      fmNAS-25
NAS-26 (toh-1)  T24A11.3          Z49072             III:4.54      EST                         cDNA fits best with GenieGene prediction g-V-483
NAS-27          T23F4.4           AF025466           II:13.27               fmNAS-27
C. elegans Sequencing Consortium. In Table 2 we have
numbered these C. elegans astacins (nematode astacins,
NAS) from 1 to 40. The two proteins NAS-23 and NAS-40
(located on cosmids D1022 and F54B8, respectively) are not recorded in
the WormPep database (predicted proteins from WormBase)
but could be detected by a genomic TBLASTN search
and the use of the program GENSCAN. However, for NAS-40
GENSCAN did not predict a complete protein but rather an
88 amino acid fragment which is interrupted by two stop
codons.
Hishida et al. [45] reported that HCH-1 (= F40E10.1,
NAS-34) is required for normal hatching and neuroblast
migration in C. elegans. For all other astacin genes,
no details were known beyond the Genefinder protein
prediction in WormBase and the partial transcription analysis
by the EST or open reading frame sequence tag (OST) projects.
As a first step, it was therefore indispensable to
confirm the existence of expression products for each gene.
Transcriptome analysis
Comparing all genomic DNA sequences of astacin genes
identified by our BLAST search to the cDNA data of
WormBase, it became evident that for 12 of the total of 40
genes EST or OST clones [46,47] were already known
(WormBase release 57, 17 December 2001). This confirmed
that the 12 genes in question were expressed on the mRNA
level.
The remaining 28 genes were analyzed by RT-PCR
followed by sequencing of the DNA fragments in order to
demonstrate their transcription activity. For each gene
specific primer pairs were synthesized, the gene frag-
ments amplified by PCR and the products analyzed on
agarose gels ( />docs/WebFig1.htm). In each case the PCR reaction with
reverse-transcribed RNA was accompanied by a control
reaction with genomic DNA. Introns within the amplified
DNA regions gave rise to correspondingly larger DNA
fragments when compared to their cDNA fragments. For
unambiguous identification and for the correction of
erroneous splicing pattern predictions, the PCR products of all
DNA fragments were eluted from an agarose gel,
blunt-end cloned into the vector pUC18 and subsequently
sequenced ( />docs/WebFig2.htm).
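The size difference between the genomic control and the cDNA product is simply the summed length of the introns lying between the two primers. The following minimal sketch makes this explicit; the function name, the coordinate convention and the three-exon toy gene are assumptions made for illustration and do not correspond to any of the astacin genes.

```python
def amplicon_sizes(exons, fwd_pos, rev_pos):
    """Expected product sizes (genomic, cDNA) in base pairs.

    exons: sorted list of (start, end) genomic coordinates, 1-based inclusive.
    fwd_pos, rev_pos: genomic coordinates of the outer ends of the two
    primers, both assumed to lie within exons.
    """
    genomic = rev_pos - fwd_pos + 1
    cdna = sum(
        min(end, rev_pos) - max(start, fwd_pos) + 1
        for start, end in exons
        if end >= fwd_pos and start <= rev_pos
    )
    return genomic, cdna


# Hypothetical gene with three exons separated by two 100 bp introns.
toy_exons = [(1, 200), (301, 450), (551, 800)]
genomic_bp, cdna_bp = amplicon_sizes(toy_exons, fwd_pos=101, rev_pos=700)
print(genomic_bp, cdna_bp, genomic_bp - cdna_bp)
```

A cDNA product that is shorter or longer than predicted from the annotated exons therefore points to a splicing pattern different from the prediction, which sequencing then resolves.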
In combination with the recently available EST and OST
sequences (WormBase release 97, 7 March 2003) we found
for 13 genes (Table 2) splicing patterns differing from the
Genefinder predictions in WormBase, sometimes markedly.
In these cases, the experimental cDNA transcripts were in
good accordance with the alternative GenieGene open
reading frame predictions of Kent and Zahler [31] (Table 2
and />WebFig2.htm). For NAS-1, NAS-21, NAS-22 and NAS-28
we observed splice sites that deviate from both the
Genefinder and the GenieGene predictions. The manually
corrected cDNA sequences can be found at
http://www.zoo.uni-heidelberg.de/moehrlen/docs/WebFig2.htm. All
new sequence data including corrected gene structures have
been submitted to WormBase and EMBL/GenBank/DDBJ
databases (for accession number, see footnote). The genes
Table 2. (Continued).

Gene name       Wormpep name      EMBL/GenBank       Map position  EST/OST  RT-PCR sequencing  Comment
NAS-28          F42A10.8          U10414             III:1.38      OST      fmNAS-28           Aberrant splice, corrected full-length sequence is confirmed by overlapping cDNAs, ( />
NAS-29          F58A6.4           U53339             II:1.98                fmNAS-29           Translation fits best with GenieGene prediction g-II-1160
NAS-30          Y95B8 A1          AC024877           I:20.88                No PCR product     Expression confirmed by microarrays only
NAS-31          F58B4.1           Z74038             V:2.87        EST      fmNAS-31           cDNAs fit best with GenieGene prediction g-V-2200, possible alternative splice site
NAS-32          T02B11.7          AF022979           V:19.07       EST      fmNAS-32
NAS-33          K04E7.3           U39666             X:2.93                 fmNAS-33
NAS-34 (hch-1)  F40E10.1          D85744 Z69792      X:19.9925     EST                         Full-length cDNA confirmed by Hishida et al.
NAS-35 (toh-2)  R151.5            U00036             III:0.76      EST                         Full-length sequence is confirmed by overlapping cDNAs
NAS-36          C26C6.3           Z72503             I:2.05        EST
NAS-37          C17G1.6           Z78415             X:1.48        EST
NAS-38          F57C12.1          U41554             X:19.47       EST
NAS-39          F38E9.2           U46668             X:23.83       EST
NAS-40          F54B8 unassigned  Z93383             V:9.77                 No PCR product     Not in WormBase, pseudogene
NAS-2, NAS-5, NAS-16, NAS-17, NAS-18 and NAS-30
showed no apparent PCR product in our RT-PCR analysis
(Table 2, />WebFig2.htm). However, the microarray projects of Hill
et al. [48,49], Kim et al. [50], or Jiang et al. [51] (for an
overview see WormBase) support the expression of these
genes. We would like to point out that this technique
cannot verify with certainty either the identity or the
splicing pattern of a gene, because no sequence data are
produced.
Nevertheless, in summary it may be stated that with the
exception of pseudogene NAS-40 for all other 39 astacin
genes a transcription activity could be confirmed.
Functional analysis
We made an attempt to analyze the function of selected
astacin genes in C. elegans investigating the expression
pattern of four representative astacin genes of different
subgroups (see section on Structural and phylogenetic
analysis, Fig. 2.) using GFP-fusion constructs. All astacin-
GFP fusions were assayed for expression in animals from
embryonic stages onwards. At least three independent
transgenic lines were generated from at least two inde-
pendent clones of each of the astacin-GFP fusion
constructs to control for PCR-induced sequence errors.
The reporter gene fusion NAS-15::GFP and NAS-
33::GFP failed to give detectable expression in any life
stage. The fusion protein NAS-4::GFP showed extensive
GFP fluorescence throughout the digestive tract in larval
stages and in adult worms (Fig. 1A). At higher magnifi-
cation, we saw GFP staining within pharynx cells of the
procorpus, metacorpus, isthmus and terminal bulb, and

extracellular staining in the lumen of the terminal bulb
(Fig. 1B, arrows). Therefore, NAS-4 most likely is secreted
by the pharynx cells into the lumen and then is found in
secreted form all the way down in the lumen of the gut.
We conclude from this expression pattern that NAS-4 is
associated with digestive functions. Of special interest is
the notion that NAS-4 and the digestive enzyme astacin
from crayfish [8] have a similar domain arrangement, both
lacking a C-terminal extension (see section on Structural
and phylogenetic analysis). They also cluster in the
phylogenetic tree (Fig. 3), suggesting that they have
similar functions. These considerations might be extended
to the whole subgroup I (Fig. 2, NAS-1–5), which shares
these features.
By contrast, NAS-7::GFP staining was observed only in
the head of adult hermaphrodites, but not within pharynx
cells (Fig. 1C). The expressing cells are located outside of
the pharynx, around the metacorpus and the terminal
bulb, and could include neurons, cells of the excretory
system or gland cells of still unknown functions [20].
Reporter gene expression also became detectable in the
embryo before hatching (Fig. 1D). While at this moment
the function of the gene expressed in the adult remains
open, in the embryo it possibly could serve as a hatching
enzyme.
To further characterize the function of astacin genes in
C. elegans we evaluated the genome-wide RNAi analyses of
Gonczy et al. [52], Fraser et al. [53], Maeda et al. [54],
Kamath et al. [55,56], Ashrafi et al. [57], Lee et al. [58] and
Pothof et al. [59]. Although nearly all astacin genes have
been investigated for gene silencing by RNAi, most of them
lack an obvious phenotype and no function could be
deduced from the attempted inactivation. Whether this
phenomenon reflects incomplete dsRNA interference
or a redundancy in functions among the high number of
expressed astacin genes remains to be established.
RNAi phenotypes were observed for NAS-9, -11 and -37
only.
Inactivated NAS-9 showed 6% embryonic lethality [54],
Fig. 1. GFP expression pattern images for NAS-4 (A, B) and NAS-7
(C, D). (A) Extensive GFP fluorescence throughout the digestive tract
in an adult hermaphrodite and an L2 larva for a NAS-4::GFP fusion
gene; 100× magnification. (B) Higher magnification of the head of an
adult hermaphrodite showing GFP expression for the same construct
in pharynx cells and in the lumen of the terminal bulb; 400× magnification.
(C) GFP expression of a NAS-7::GFP fusion gene is found in
the head of adult hermaphrodites, but not in pharynx cells or in the
lumen of the digestive tract; 300× magnification. (D) In embryos
NAS-7::GFP reporter gene fluorescence became detectable just before
hatching; 400× magnification.
NAS-11 showed retarded growth [56] and NAS-37 showed
long body deviancy and a molt defect [54,56]. As a rule it
can be stated that all known astacin gene inactivations had
only little, if any, effect. One explanation for this could be
that C. elegans astacins have overlapping functions, which
is also suggested by structural homologies.
Structural and phylogenetic analysis

All known sequence data of astacin-like proteins are derived
from cDNA and genomic sequences, with the exception of
crayfish astacin, which in addition had been completely
sequenced by Edman degradation [5].
The present analysis is based on protein sequences
available from SwissProt, TrEMBL, EMBL, and GenBank
databases. If necessary, open reading frames of DNA
sequences were translated by the HUSAR Package into
amino acid sequences. For C. elegans we used the Gene-
finder or GenieGene predictions corrected by our cDNA
data ( />WebFig1.htm). Altogether, we found over a hundred
complete sequences of astacin-like proteins, which
Fig. 2. Schematic representation of homologues and domain structures in astacin genes in C. elegans. Pre-pro sequences, catalytic domain and
presumably regulatory appendices. Diagram scale is related to amino acid length. Presequences, purple shaded boxes; prosequences, grey oval;
astacin domain, red box; six cysteines, SXC; EGF-like, yellow oval; CUB domains, CUB; thrombospondin-1 like, TSP1; low complexity sequences,
striped boxes; not specified, open boxes.
are known at present ( />
moehrlen/docs/WebFig2.htm). Considering only the eukaryote
genomes sequenced completely, in human and mouse
six, and in Drosophila melanogaster 12 astacin genes are
found. However, the tiny 959-cell organism C. elegans
exhibits the striking number of 40 astacin genes, a number
not approached by any other organism studied up to now.
With the only exception of the pseudogene NAS-40 all these
genes are expressed and seem to have specific functions.
Therefore, these findings not only allow the study of an
extraordinary divergence of a protein family within one
single organism, but also shed light on a multiple functional
fine modulation evolving from a common structural source.
In the astacins typically three basic structural and
functional moieties can be discerned: a pre-pro portion,
the catalytic astacin chain, and long C-terminal extensions,
which presumably contain messages for proper function
(Fig. 2). Pro-sequences are found in all functional C. elegans
astacins, while presequences (signal peptides) are lacking in
nine genes (Fig. 2). The absence of signal peptides in these
genes may reflect specific intracellular functions of non-secreted
proteins. On the other hand the lack of these signal
peptides could also reflect problems with the still unconfirmed
5′-gene predictions of Genefinder or GenieGene, as the
sequencing data produced here have been limited to PCR-derived
fragments, and to the reanalysis of EST and OST
fragments. In some rare cases in other organisms prepro
structures may be lacking completely, often combined
with an N-terminally truncated catalytic domain [Coturnix
coturnix (quail) CAM-1, Swissprot P42326; Drosophila
melanogaster CG6974, TrEMBL Q9VFD6; Hydra vulgaris
FARM-1, TrEMBL Q9U4X9], but in C. elegans (with the
exception of the non-expressed pseudogene NAS-40) this
feature was never seen. In the central domain of all
C. elegans astacin genes, the amino acid residues that have
been identified in crayfish astacin as essential for catalytic
activity [6,7,60,61] are preserved without exception. From
this fact it may be concluded that all C. elegans astacins
potentially have catalytic activity, too.
C. elegans astacins typically are characterized by long,
complex C-terminal extensions adjacent to the catalytic
domain, which presumably define time and place of their
activity (Fig. 2). Based on homology criteria within these
appendices CUB-, EGF-, SXC-, and TSP-1 domains can
be discerned, while other sequences must be classified as
'non specific' or having 'low compositional complexity'
(LC). LC regions are often Ser/Thr-rich, are found in
many astacins and could serve as sites for O-glycosylation.
EGF domains are epidermal growth factor like modules
(PFAM accession number: PF00008). CUB domains
(SMART accession number: SM0042) are named after
their occurrence in complement components C1r/C1s,
embryonic sea urchin protein Uegf, and BMP-1 [62].
These domains may be involved in calcium-binding and
protein-protein or enzyme–substrate interactions [63]. The
SXC (six-cysteine) motif was observed in several hypothetical
C. elegans proteins [64,65] but was originally
described in metridin, a toxin from sea anemone, and is
also called the ShK toxin domain (SMART accession number:
SM0254). TSP-1-like domains are thrombospondin type 1
repeats (SMART accession number: SM0209), which are
present in several families of metalloproteases, namely in
the ADAM-TS proteases (ADAM-TS, a disintegrin-like
and metalloproteinase with thrombospondin type I motifs;
family M12B/C, see Table 1). TSP-1 domains are reported
here for the first time for astacins.
According to the structural differences in their
C-terminal extensions we arranged all 40 C. elegans genes
into the subgroups I–VI (Fig. 2). Subgroup I comprises
five genes with no C-terminal extension (NAS-1), or with
short, unspecific extensions, where probably no specific
signals can be accommodated. The 10 genes of subgroup II
contain exclusively SXC domains; other domain
types are completely lacking. The SXC domain appears in
a single, double or triple arrangement, and the domains
may be attached directly to the catalytic chain or
separated from it and from each other by short, unspecific
sequences. A tandem-like arrangement is seen only
with these SXC domains, while other domain types are
represented only once in a regulatory chain (for an
exception see subgroup VI). Subgroup III combines 15
genes that typically have an EGF-like domain directly
attached to the catalytic chain, followed by a CUB
domain. In gene NAS-18 the CUB domain and in gene
NAS-21 the EGF-like domain is missing. In subgroup IV
(two genes) an SXC domain and in subgroup V (six genes)
a TSP-1 domain is added to the EGF and CUB domains,
which show the same arrangement as in subgroup III.
Fig. 3. Phylogenetic relationship of the astacins, including all C. elegans
astacin proteins (shaded yellow) and selected examples from other
organisms. The tree was deduced by Bayesian and neighbor-joining
analysis based on the alignment of the amino acid sequences of the
catalytic chain. At branching points, Bayesian posterior probabilities
and bootstrap values greater than 50 of 100 replications (values in
parentheses) are given as an indication of the confidence of the
tree presented. The scale bar represents a distance of 0.1 accepted point
mutations per site (PAM). Evolutionary subgroups of the astacin
protein family are indicated on the right side. The schematic
representation of the protein domains (colored bars) corresponds to that in
Fig. 2. Meprin domains: MAM domain, MAM; MATH domain,
MATH; I-domain, I; intervening sequence, inter; transmembrane
domain, TM; cytoplasmic domain, c. For an overview, see [66].
Abbreviations and Swissprot/TREMBL/PIR accession numbers of the
astacins: AA Astacin, Astacus astacus (crayfish) astacin (P07584); AC
TBL-1, Aplysia californica TBL-1 (P91972); AJ EHE-4, Anguilla
japonica (fish) EHE-4 (Q90Y89); CC Nephrosin, Cyprinus carpio
(fish) Nephrosin (O42326); DM Tolloid and Tolkin, Drosophila
melanogaster Tolloid (P25723) and Tolkin (Q23995); FM Flavast,
Flavobacterium meningosepticum Flavastacin (Q47899); HS BMP-1,
Homo sapiens bone morphogenetic protein 1 (Q14874); HS Meprin A
and B, Homo sapiens Meprin α (Q16819) and β (Q16820); HS TLL and
TLL-2, Homo sapiens Tolloid like 1 (Q9NQS4) and 2 (Q9UQ00); HV
HMP-2, Hydra vulgaris (Cnidaria) Metalloprotease 2 (Q9XZG0);
MM BMP-1, Mus musculus BMP-1 (I49540); MM Meprin A and B,
Mus musculus Meprin α (P28825) and β (Q61847); OL LCE and HCE-1,
Oryzias latipes (fish) low choreolytic enzyme (P31581) and high
choreolytic enzyme 1 (EMBL:M96170); PC PMP-1, Podocoryne
carnea (Cnidaria) Metalloprotease 1 (O62558); PL BP-10,
Paracentrotus lividus (sea urchin) blastula protease 10 (P42674); SP BMP-H,
Strongylocentrotus purpuratus (sea urchin) BMP-1 homolog (P98069);
SP SPAN, Strongylocentrotus purpuratus (sea urchin) SPAN (P98069);
TR MP, Takifugu rubripes (fish) HCE-1 (AAL40376); XL BMP-1,
Xenopus laevis BMP-1 (P98070).
4916 F. Möhrlen et al. (Eur. J. Biochem. 270) © FEBS 2003
Subgroup VI is a special case: its only member, NAS-39,
shows a striking similarity to the human bone-inducing factor
BMP-1. A comparison of the two proteins reveals a
sequence identity of 74% between the catalytic chains, while for
other nematode astacins this value reaches only 40% on average.
Xolloid (Xenopus), tolloid and tolkin (Drosophila) and TBL-1
(Aplysia) also have corresponding structures. The number and
arrangement of CUB and EGF domains are identical in these genes.
NAS-39 is by far the longest of the C. elegans astacin genes. It
will be interesting to see what physiological role a factor
almost identical to human BMP-1 might perform in
C. elegans; this could also give us some insight into the
primordial functions from which human BMP-1 has
evolved. The distinctive and complex pattern that
appears in subgroups I–VI seems to provide a specific
function for each C. elegans astacin gene. Members of the
same subgroup might have similar or identical functions.
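The subgroup rules given above can be written as a small decision function. This is a minimal sketch, assuming that each C-terminal extension has already been reduced to a list of domain labels; the function name `classify_subgroup` and the label spelling `"TSP1"` are choices made for this example, and low-complexity or non-specific stretches are simply left out of the input list.

```python
from collections import Counter

KNOWN_DOMAINS = {"EGF", "CUB", "SXC", "TSP1"}


def classify_subgroup(domains):
    """Assign a NAS subgroup (I-VI) from the domain labels of the
    C-terminal extension. Domain order is ignored; only counts matter."""
    counts = Counter(domains)
    unknown = set(counts) - KNOWN_DOMAINS
    if unknown:
        raise ValueError(f"unexpected domain labels: {sorted(unknown)}")
    # Subgroup VI (NAS-39) is the only case with repeated CUB or EGF domains.
    if counts["CUB"] > 1 or counts["EGF"] > 1:
        return "VI"
    if counts["EGF"] or counts["CUB"]:
        if counts["SXC"] and counts["TSP1"]:
            raise ValueError("EGF/CUB with both SXC and TSP-1 is not described")
        if counts["TSP1"]:
            return "V"    # EGF and CUB plus a TSP-1 domain
        if counts["SXC"]:
            return "IV"   # EGF and CUB plus an SXC domain
        return "III"      # EGF and/or CUB only (covers NAS-18 and NAS-21)
    if counts["TSP1"]:
        raise ValueError("TSP-1 without EGF/CUB is not described")
    # One to three SXC domains give subgroup II; no domains give subgroup I.
    return "II" if counts["SXC"] else "I"


print(classify_subgroup(["EGF", "CUB", "TSP1"]))
```

Combinations that the text does not describe raise an error instead of being forced into a subgroup, which keeps the sketch from asserting more than the classification states.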
We constructed a phylogenetic tree comprising all 39
expressed C. elegans astacins and in addition selected
astacin proteins from a variety of other organisms
(Fig. 3). The tree is based on a multiple alignment of the
amino acid sequence of the active protease domain, covering
the region from Ala1 to Leu200 in the prototype, crayfish
astacin. Results were corrected with the help of the known
secondary structures and conserved regions of crayfish
astacin. The alignment has been submitted to the EMBL
databank with accession number ALIGN_000543.
Phylogenetic relationships were initially established on
the basis of the neighbor-joining method using the PHYLIP
program package. As outgroup we used the phylogenetically
most remote flavastacin from bacteria. However, an
isolated occurrence of an astacin sequence in a single
bacterial species could be due to a lateral gene transfer,
which would render this sequence unsuitable as an outgroup.
Because at least one more astacin-like
protein has recently been detected in bacteria (-
heidelberg.de/moehrlen), lateral gene transfer is most
unlikely. Moreover, we also tried the phylogenetically
remote Cnidaria astacins (HMP-2 and PMP-1) as an
outgroup, which gave exactly the same phylogenetic tree.
For statistical verification a consensus tree including 100
sequences was calculated and bootstrap values were estab-
lished for each point of divergence. However, the phylo-
genetic tree based on the neighbor-joining method showed
rather low bootstrap values (< 50) for the most ancestral
nodes (Fig. 3). Pro-sequences could not be used in addition
to strengthen these branching points because they
differ extremely in length, change rapidly or are
lacking completely. The same holds for
the C-terminal extensions. The robustness of the tree was
therefore additionally verified by the Bayesian phylogenetic
method. This analysis considerably increased confidence in
the tree and yielded high posterior probabilities.
The evolutionary tree presented in Fig. 3
summarizes all of the above-mentioned approaches and
therefore has the best reliability.
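The idea behind the bootstrap values can be illustrated with a toy calculation. The sketch below is not the PHYLIP neighbor-joining procedure used in the study: it only resamples alignment columns with replacement and asks how often two sequences remain each other's closest relatives under a simple p-distance. The function names, the seed and the four toy sequences are assumptions made for this example.

```python
import random


def p_distance(a, b, columns):
    """Fraction of the chosen alignment columns at which a and b differ."""
    return sum(a[i] != b[i] for i in columns) / len(columns)


def sister_support(alignment, first, second, replicates=100, seed=0):
    """Percentage of bootstrap replicates in which `first` and `second`
    are closer to each other than either is to any other sequence."""
    rng = random.Random(seed)
    length = len(alignment[first])
    others = [name for name in alignment if name not in (first, second)]
    hits = 0
    for _ in range(replicates):
        # One bootstrap replicate: draw columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        d_pair = p_distance(alignment[first], alignment[second], cols)
        if all(d_pair < p_distance(alignment[x], alignment[o], cols)
               for x in (first, second) for o in others):
            hits += 1
    return 100 * hits / replicates


# Hypothetical toy alignment: A and B are identical, C and D share no
# residue with them at any column.
toy = {
    "A": "ACDEFGHIKL",
    "B": "ACDEFGHIKL",
    "C": "LKIHGFEDCA",
    "D": "MMMMMMMMMM",
}
support = sister_support(toy, "A", "B")
print(support)
```

With real data the pairwise test would be replaced by building a tree for each replicate and counting how often a given node recurs, which is what a value such as "greater than 50 of 100 replications" reports.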
From this analysis it becomes evident that similar
sequences of the catalytic chain tend to have similar
C-terminal extensions (Fig. 3). All 39 complete NAS
proteins can be subdivided into two different types: one
having CUB domains in their regulatory domains, and
another one where these are lacking completely (see also
Fig. 2). This pattern is clearly reflected in the amino acid
sequence-based phylogenetic tree, where all NAS proteins
with a CUB domain group closely together in one
cluster (Fig. 3). The CUB domain is almost always preceded
by an EGF domain (exception: NAS-21). Either no
further segments are attached to these (subgroup III), or an SXC
domain (subgroup IV) or a TSP-1 domain (subgroup V)
follows. The second cluster comprises the NAS-1 to
NAS-15 proteins, characterized by having no distinct
extensions (subgroup I) or showing one, two or three
SXC domains (subgroup II). NAS-39 (subgroup VI) is
strikingly different from all other C. elegans astacins, but
fits perfectly into the BMP-1/Tolloid group, on the basis
of both the sequence homologies and the
complex but identical arrangement of the five CUB and the two
EGF segments (Figs 2 and 3).
One might wonder about the expression of such a large
number of related but different astacin genes in a 959-cell
organism. Potentially all these genes could have different
functions, as each shows clear, and in some cases
marked, structural differences. However, much of this divergence
seems to be due to relatively recent gene duplications.
In the closely related species Caenorhabditis briggsae the
genes NAS-16, -18, -19, -22, -24 and the pseudo-gene
NAS-40 are missing. C. elegans and C. briggsae share,
however, the neighboring genes NAS-17, -20, and -21. In
addition, these genes show a tandem-like arrangement in
clusters and are all located on chromosome V, where NAS-16,
-17, -18 and -19 form one cluster and, separated from it by
other genes, NAS-20, -21 and -22 form a second cluster.
These conclusions are also supported by the
position of these genes in the evolutionary tree (see Table 2,
and Figs 2 and 3). It therefore seems reasonable to assume
that these genes, which make up about half of subgroup III, resulted
from recent gene duplications, which implies that they might
have more or less similar functions. If one extends this kind
of reasoning with some caution to the whole of the analyzed
C. elegans astacins one could conclude that only the six
established subgroups actually represent major functional
differences, as these are based on marked differences in their
regulatory units. This would reduce the number of func-
tionally different gene types to six, a number that comes
close to that found for astacins also in other organisms.
Nevertheless, the fact remains that each NAS gene is
expressed and structurally distinct from the others. This
constitutes a favorable starting point for the rapid acquisition
of new functions, a capacity that might be a
prerequisite for the ubiquitous occurrence of C. elegans in
nearly all soil types. However, most NAS genes are dispersed
over all six chromosomes of C. elegans, which indicates a
long evolutionary history of the astacin protein family in the
nematodes. The identical and complex arrangement of the
seven regulatory domains in NAS-39 and BMP-1 suggests
furthermore that this distinct structure has been retained
unchanged for long periods and was already present in the
common ancestor of nematodes and vertebrates.
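The gene-duplication argument made earlier in this paragraph rests on simple set comparisons, sketched below. The gene lists and the subgroup III size of 15 are taken from the text; the variable names are illustrative, and treating all seven clustered genes as members of subgroup III follows the statement that they make up about half of that subgroup.

```python
# Tandem clusters on chromosome V, as listed in the text.
cluster_1 = {"NAS-16", "NAS-17", "NAS-18", "NAS-19"}
cluster_2 = {"NAS-20", "NAS-21", "NAS-22"}

# Genes reported as missing in C. briggsae (NAS-40 is the pseudogene).
missing_in_briggsae = {"NAS-16", "NAS-18", "NAS-19",
                       "NAS-22", "NAS-24", "NAS-40"}

clustered = cluster_1 | cluster_2

# Clustered genes without a C. briggsae counterpart: candidates for
# recent duplications in the C. elegans lineage.
elegans_only_clustered = sorted(clustered & missing_in_briggsae)

# Clustered genes present in both species.
shared_clustered = sorted(clustered - missing_in_briggsae)

# Share of subgroup III (15 genes) accounted for by the two clusters.
subgroup_iii_size = 15
fraction_of_subgroup_iii = len(clustered) / subgroup_iii_size

print(elegans_only_clustered)
print(shared_clustered)
print(round(fraction_of_subgroup_iii, 2))
```

The shared set reproduces the three neighboring genes named in the text (NAS-17, -20 and -21), and seven of 15 genes corresponds to the "one half of subgroup III" mentioned there.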
Acknowledgements
This study was supported by a grant from the Deutsche
Forschungsgemeinschaft, Bonn, to RZ (Zw 17/14–2). We also wish to thank
Thorsten Burmester, University of Mainz, Germany for supporting the
Bayesian phylogenetic analysis.
References
1. Pfleiderer, G., Zwilling, R. & Sonneborn, H.H. (1967) On the
evolution of endopeptidases, 3: a protease of molecular weight
11,000 and a trypsin-like fraction from Astacus fluviatilis fabr.
Hoppe Seylers. Z. Physiol. Chem. 348, 1319–1331.
2. Sonneborn, H.H., Zwilling, R. & Pfleiderer, G. (1969) Evolution
of endopeptidases. X. Cleavage specificity of low molecular weight
protease from Astacus leptodactylus Esch. Hoppe Seylers. Z.
Physiol. Chem. 350, 1097–1102.
3. Krauhs, E., Dörsam, H., Little, M., Zwilling, R. & Ponstingl, H.
(1982) A protease from Astacus fluviatilis as an aid in protein
sequencing. Anal. Biochem. 119, 153–157.
4. Zwilling, R., Dörsam, H., Torff, H.J. & Rödl, J. (1981) Low
molecular mass protease: evidence for a new family of proteolytic
enzymes. FEBS Lett. 127, 75–78.
5. Titani, K., Torff, H.J., Hormel, S., Kumar, S., Walsh, K.A., Rodl,
J., Neurath, H. & Zwilling, R. (1987) Amino acid sequence of a
unique protease from the crayfish Astacus fluviatilis. Biochemistry
26, 222–226.
6. Bode, W., Gomis-Rüth, F., Huber, R., Zwilling, R. & Stöcker, W.
(1992) Structure of astacin and implications for activation of as-
tacins and zinc-ligation of collagenases. Nature 358, 164–167.
7. Gomis-Rüth, F., Stöcker, W., Huber, R., Zwilling, R. & Bode, W.
(1993) Refined 1.8 Å X-ray crystal structure of astacin, a zinc-
endopeptidase from the crayfish Astacus astacus L. Structure
determination, refinement, molecular structure and comparison
with thermolysin. J. Mol. Biol. 229, 945–968.
8. Vogt, G., Stöcker, W., Storch, V. & Zwilling, R. (1989) Biosynthesis
of Astacus protease, a digestive enzyme from crayfish.
Histochemistry 91, 373–381.
9. Geier, G., Jacob, E., Stöcker, W. & Zwilling, R. (1997) Genomic
organization of the zinc-endopeptidase astacin. Arch. Biochem.
Biophys. 337, 300–307.
10. Möhrlen, F., Baus, S., Gruber, A., Rackwitz, H.R., Schnölzer, M.,
Vogt, G. & Zwilling, R. (2001) Activation of pro-astacin:
immunological and model peptide studies on the processing of
immature astacin, a zinc-endopeptidase from the crayfish Astacus
astacus. Eur. J. Biochem. 268, 2540–2546.
11. Yiallouros, I., Kappelhoff, R., Schilling, O., Wegmann, F.,
Helms, M.W., Auge, A., Brachtendorf, G., Berkhoff, E.G., Beermann,
B., Hinz, H.J., Konig, S., Peter-Katalinic, J. & Stocker,
W. (2002) Activation mechanism of pro-astacin: role of the pro-
peptide, tryptic and autoproteolytic cleavage and importance of
precise amino-terminal processing. J. Mol. Biol. 324, 237–246.
12. Wozney, J.M., Rosen, V., Celeste, A.J., Mitsock, L.M., Whitters,
M.J., Kriz, R.W., Hewick, R.M. & Wang, E.A. (1988) Novel
regulators of bone formation: molecular clones and activities.
Science 242, 1528–1534.
13. Wang, E.A., Rosen, V., Cordes, P., Hewick, R.M., Kriz, M.J.,
Luxenberg, D.P., Sibley, B.S. & Wozney, J.M. (1988) Purification
and characterization of other distinct bone-inducing factors. Proc.
Natl Acad. Sci. USA 85, 9484–9488.
14. Tarentino, A.L., Quinones, G., Grimwood, B.G., Hauer, C.R. &
Plummer, T.H. J. (1995) Molecular cloning and sequence analysis
of flavastacin: an O-glycosylated prokaryotic zinc metalloendo-
peptidase. Arch. Biochem. Biophys. 319, 281–285.
15. Zwilling, R. & Stöcker, W. (1997) The Astacins: Structure and
Function of a New Protein Family. Dr Kovac Verlag, Hamburg,
Germany.
16. Geier, G. & Zwilling, R. (1998) Cloning and characterization of a
cDNA coding for Astacus embryonic astacin, a member of the
astacin family of metalloproteases from the crayfish Astacus
astacus. Eur. J. Biochem. 253, 796–803.

17. Hutter, H., Vogel, B.E., Plenefisch, J.D., Norris, C.R., Proenca,
R.B., Spieth, J., Guo, C., Mastwal, S., Zhu, X., Scheel, J. &
Hedgecock, E.M. (2000) Conservation and novelty in the evolu-
tion of cell adhesion and extracellular matrix genes. Science 287,
989–994.
18. Sarkis, G.J., Kurpiewski, M.R., Ashcom, J.D., Jen-Jacobson, L.
& Jacobson, L.A. (1988) Proteases of the nematode
Caenorhabditis elegans. Arch. Biochem. Biophys. 261, 80–90.
19. Geier, G., Banaj, H.J., Heid, H., Bini, L., Pallini, V. & Zwilling, R.
(1999) Aspartyl proteases in Caenorhabditis elegans. Isolation,
identification and characterization by a combined use of
affinity chromatography, two-dimensional gel electrophoresis,
microsequencing and databank analysis. Eur. J. Biochem. 264,
872–879.
20. Wood, W.B., (ed.) (1988) The Nematode Caenorhabditis elegans.
Cold Spring Harbor Laboratory, New York.
21. Chomczynski, P. & Sacchi, N. (1987) Single-step method of RNA
isolation by acid guanidinium thiocyanate-phenol-chloroform
extraction. Anal. Biochem. 162, 156–159.
22. Moss, E. (1995) Making genomic DNA from worms: C. elegans
Comprehensive Protocol Collection of the Ambros Laboratory at
Dartmouth. />
23. Sambrook, J. & Russell, D.W. (2001) Molecular Cloning: A
Laboratory Manual. Cold Spring Harbor Laboratory Press, New
York.
24. Anonymous (2003) WormBase, release WS 97, 7 March 2003;

25. Hope, I.A. (1999) C. Elegans. Oxford University Press, New York.
26. PFAM database, version 8. />Pfam.
27. The PRINTS Fingerprint Database, release 35.0; http://www.
bioinf.man.ac.uk/dbbrowser/index/biblio.html.

28. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J.
(1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–
410.
29. Altschul, S.F. & Gish, W. (1996) Local alignment statistics.
Methods Enzymol. 266, 460–480.
30. HUSAR package, version, 4.1; .
31. Kent, W.J. & Zahler, A.M. (2000) The intronerator: exploring
introns and alternative splicing in Caenorhabditis elegans. Nucleic
Acids Res. 28, 91–93.
32. Higgins, D.G., Thompson, J.D. & Gibson, T.J. (1996) Using
CLUSTAL for multiple sequence alignments. Methods Enzymol.
266, 383–402.
33. SMART, Version 3.5, 21 March 2001. l-heidel
berg.de.
34. ProDom, release.1/CG67; />html.
35. InterPro database, release 6.0, 11 March 2003. .
ac.uk/interpro.
36. GeneDoc, version 2.6; />
37. PHYLIP package 3.5; />phylip.html.
38. Huelsenbeck, J.P., Larget, B., Miller, R.E. & Ronquist, F. (2002)
Potential applications and pitfalls of Bayesian inference of phy-
logeny. Syst. Biol. 51, 673–688.
39. Huelsenbeck, J.P., Ronquist, F., Nielsen, R. & Bollback, J.P.
(2001) Bayesian inference of phylogeny and its impact on evolu-
tionary biology. Science 294, 2310–2314.
40. Huelsenbeck, J.P. & Ronquist, F. (2001) MRBAYES: Bayesian
inference of phylogenetic trees. Bioinformatics 17, 754–755.
41. Whelan, S. & Goldman, N. (2001) A general empirical model of
protein evolution derived from multiple protein families using a

maximum-likelihood approach. Mol. Biol. Evol. 18, 691–699.
42. TreeView 1.6.6; />
43. C. elegans Sequencing Consortium (1998) Genome sequence of the nematode
C. elegans: a platform for investigating biology. Science 282, 2012–
2018.
44. Barrett, A.J., Rawlings, N.D. & Woessner, J.F. (1998) Handbook
of Proteolytic Enzymes. Academic Press, London.
45. Hishida, R., Ishihara, T., Kondo, K. & Katsura, I. (1996) hch-1, a
gene required for normal hatching and normal migration of a
neuroblast in C. elegans, encodes a protein related to TOLLOID
and BMP-1. EMBO J. 15, 4111–4122.
46. Reboul, J., Vaglio, P., Tzellas, N., Thierry-Mieg, N., Moore, T.,
Jackson, C., Shin-i, T., Kohara, Y., Thierry-Mieg, D., Thierry-
Mieg, J., Lee, H., Hitti, J., Doucette-Stamm, L., Hartley, J.L.,
Temple, G.F., Brasch, M.A., Vandenhaute, J., Lamesch, P.E.,
Hill, D.E. & Vidal, M. (2001) Open-reading-frame sequence tags
(OSTs) support the existence of at least 17,300 genes in C. elegans.
Nat. Genet. 27, 332–336.
47. Vaglio, P., Lamesch, P., Reboul, J., Rual, J.F., Martinez, M., Hill,
D. & Vidal, M. (2003) WorfDB: the Caenorhabditis elegans ORFeome
Database. Nucleic Acids Res. 31, 237–240.
48. Hill, A.A., Hunter, C.P., Tsung, B.T., Tucker-Kellogg, G. &
Brown, E.L. (2000) Genomic analysis of gene expression in
C. elegans. Science 290, 809–812.
49. Baugh, L.R., Hill, A.A., Slonim, D.K., Brown, E.L. & Hunter, C.P.
(2003) Composition and dynamics of the Caenorhabditis elegans
early embryonic transcriptome. Development 130, 889–900.
50. Kim, S.K., Lund, J., Kiraly, M., Duke, K., Jiang, M., Stuart, J.M.,
Eizinger, A., Wylie, B.N. & Davidson, G.S. (2001) A gene expression
map for Caenorhabditis elegans. Science 293, 2087–2092.
51. Jiang, M., Ryu, J., Kiraly, M., Duke, K., Reinke, V. & Kim, S.K.
(2001) Genome-wide analysis of developmental and sex-regulated
gene expression profiles in Caenorhabditis elegans. Proc. Natl
Acad. Sci. USA 98, 218–223.
52. Gonczy, P., Echeverri, C., Oegema, K., Coulson, A., Jones, S.J.,
Copley, R.R., Duperon, J., Oegema, J., Brehm, M., Cassin, E.,
Hannak, E., Kirkham, M., Pichler, S., Flohrs, K., Goessen, A.,
Leidel, S., Alleaume, A.M., Martin, C., Ozlu, N., Bork, P. &
Hyman, A.A. (2000) Functional genomic analysis of cell division
in C. elegans using RNAi of genes on chromosome III. Nature 408,
331–336.
53. Fraser, A.G., Kamath, R.S., Zipperlen, P., Martinez-Campos, M.,
Sohrmann, M. & Ahringer, J. (2000) Functional genomic analysis
of C. elegans chromosome I by systematic RNA interference.
Nature 408, 325–330.
54. Maeda, I., Kohara, Y., Yamamoto, M. & Sugimoto, A. (2001)
Large-scale analysis of gene function in Caenorhabditis elegans by
high-throughput RNAi. Curr. Biol. 11, 171–176.
55. Kamath, R.S., Martinez-Campos, M., Zipperlen, P., Fraser, A.G.
& Ahringer, J. (2001) Effectiveness of specific RNA-mediated
interference through ingested double-stranded RNA in
Caenorhabditis elegans. Genome Biol. 2, RESEARCH0002.
56. Kamath, R.S., Fraser, A.G., Dong, Y., Poulin, G., Durbin, R.,
Gotta, M., Kanapin, A., Le Bot, N., Moreno, S., Sohrmann, M.,
Welchman, D.P., Zipperlen, P. & Ahringer, J. (2003) Systematic
functional analysis of the Caenorhabditis elegans genome using
RNAi. Nature 421, 231–237.
57. Ashrafi, K., Chang, F.Y., Watts, J.L., Fraser, A.G., Kamath,
R.S., Ahringer, J. & Ruvkun, G. (2003) Genome-wide RNAi

analysis of Caenorhabditis elegans fat regulatory genes. Nature
421, 268–272.
58. Lee, S.S., Lee, R.Y., Fraser, A.G., Kamath, R.S., Ahringer, J. &
Ruvkun, G. (2003) A systematic RNAi screen identifies a critical
role for mitochondria in C. elegans longevity. Nat. Genet. 33, 40–48.
59. Pothof, J., Van Haaften, G., Thijssen, K., Kamath, R.S., Fraser,
A.G., Ahringer, J., Plasterk, R.H. & Tijsterman, M. (2003)
Identification of genes that protect the C. elegans genome against
mutations by genome-wide RNAi. Genes Dev. 17, 443–448.
60. Stöcker, W., Gomis-Rüth, F., Bode, W. & Zwilling, R. (1993)
Implications of the three-dimensional structure of astacin for the
structure and function of the astacin family of zinc-
endopeptidases. Eur. J. Biochem. 214, 215–231.
61. Yiallouros, I., Grosse-Berkhoff, E. & Stöcker, W. (2000) The roles
of Glu93 and Tyr149 in astacin-like zinc peptidases. FEBS Lett.
484, 224–228.
62. Bork, P. & Beckmann, G. (1993) The CUB domain: a widespread
module in developmentally regulated proteins. J. Mol. Biol. 231,
539–545.
63. Sieron, A.L., Tretiakova, A., Jameson, B.A., Segall, M.L., Lund,
K.S., Khan, M.T., Li, S. & Stöcker, W. (2000) Structure and
function of procollagen C-proteinase (mTolloid) domains
determined by protease digestion, circular dichroism, binding to
procollagen type I, and computer modeling. Biochemistry 39,
3231–3239.
64. Gems, D., Ferguson, C.J., Robertson, B.D., Nieves, R., Page,
A.P., Blaxter, M.L. & Maizels, R.M. (1995) An abundant,
trans-spliced mRNA from Toxocara canis infective larvae encodes
a 26-kDa protein with homology to phosphatidylethanolamine-
binding proteins. J. Biol. Chem. 270, 18517–18522.
65. Blaxter, M. (1998) Caenorhabditis elegans is a nematode. Science
282, 2041–2046.
66. Wolz, R.L. & Bond, J.S. (1995) Meprins A and B. Methods
Enzymol. 248, 325–345.