In silico analysis of the adenylation domains of the
freestanding enzymes belonging to the eucaryotic
nonribosomal peptide synthetase-like family
Leonardo Di Vincenzo1, Ingeborg Grgurina1 and Stefano Pascarella1,2
1 Dipartimento di Scienze Biochimiche ‘A. Rossi Fanelli’, Università di Roma ‘La Sapienza’, Roma, Italy
2 Centro Interdipartimentale di Ricerca per l’Analisi dei Modelli e dell’Informazione nei Sistemi Biomedici (CISB), Università di Roma ‘La Sapienza’, Roma, Italy

Keywords
nonribosomal peptide synthetase; homology modelling; docking; specificity-conferring code; freestanding NRPSs
Correspondence
S. Pascarella, Dipartimento di Scienze Biochimiche ‘A. Rossi Fanelli’, Università di Roma ‘La Sapienza’, 00185 Roma, Italy
Fax: +39 06 49917566
Tel: +39 06 49917574
E-mail:
Website: />homein.html
(Received 8 August 2004, revised 30
November 2004, accepted 9 December
2004)

This work presents a computational analysis of the molecular characteristics shared by the adenylation domains from traditional nonribosomal peptide synthetases (NRPSs) and the group of the freestanding homologous enzymes: α-aminoadipate semialdehyde dehydrogenase, α-aminoadipate reductase and the protein Ebony. The results of systematic sequence comparisons allow us to conclude that a specificity-conferring code, similar to that described for the NRPSs, can be recognized in such enzymes. The structural and functional roles of the residues involved in substrate selection and binding are proposed through the analysis of the predicted interactions between the model active sites and their respective substrates. The indications deriving from this study can be useful for planning experiments aimed at a better characterization and at the engineering of this emerging group of single NRPS modules, which are responsible for amino acid selection, activation and modification in the absence of other NRPS assembly line components.

doi:10.1111/j.1742-4658.2004.04522.x

Abbreviations
A domain, adenylation domain; AASDH, α-aminoadipate semialdehyde dehydrogenase; ACV synthetase, [L-δ-(α-aminoadipoyl)-L-cysteine-D-valine] synthetase; AS, putative amine-selecting domain; GrsA, gramicidin S synthetase A; HMM, hidden Markov model; NRPS, nonribosomal peptide synthetase; PQQ, pyrroloquinoline quinone; T domain, thiolation domain; RMSD, root mean square deviation.

FEBS Journal 272 (2005) 929–941 © 2005 FEBS

Nonribosomal peptide synthetases (NRPSs) are multidomain, multifunctional enzymes involved in the biosynthesis of many bioactive microbial peptides [1,2]. This class of natural products includes a variety of compounds with interesting biological activities (phytotoxins, siderophores, biosurfactants, and antiviral agents), as well as several clinically valuable drugs [3,4]. NRPSs are organized in iterative modules, one for each amino acid to be built into the peptide product. The minimal module required for a single monomer addition consists of a condensation domain (C), an adenylation domain (A) and a peptidyl carrier protein (PCP), also denoted as thiolation (T) domain. The A domain is involved in the selection and activation of the amino acid substrate, which is then covalently attached to the enzyme via a thioester bond with the phosphopantetheine residue of the T domain. C domains are localized between every consecutive pair of A domains and PCPs and catalyze the formation of the peptide bond between the upstream aminoacyl or peptidyl moiety tethered to the phosphopantetheinyl group and the free amino group of the downstream aminoacyl moiety, thus facilitating the translocation of the growing chain onto the next module. The structural diversity of NRPS products is enriched through the occasional presence of epimerization (E), cyclization (Cy), N-methylation (N-Met) and oxidation (Ox) domains [1].
Intense research work carried out in the last decade led to the characterization of a number of new gene clusters and to the discovery of nonclassical NRPS systems [2,5]. The crystallographic structures of three members of the adenylate-forming enzyme family, firefly luciferase of Photinus pyralis [6], the A domain of gramicidin S synthetase A (GrsA) from Bacillus brevis [7] and, recently, DhbE (2,3-dihydroxybenzoate-activating module) [8], have been solved. Likewise, the structure of VibH, representative of C domains, is now available [9]. The wealth of sequence and structure information pertaining to the A domains has been exploited to understand the molecular bases of their substrate specificity [10,11]. Systematic comparative analyses identified 10 sequence positions lining the active site pocket that are responsible for substrate recognition and selection. The nature of the residues at such positions was correlated with the known substrates, and a specificity-conferring code with predictive potential was proposed [10].
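The idea of reading a specificity-conferring code off an alignment can be illustrated with a minimal Python sketch. It assumes a pairwise or multiple alignment in which one row is the GrsA reference, and it maps GrsA residue numbers to alignment columns by counting non-gap characters. The function names and the toy alignment fragment are hypothetical and serve only as an illustration; the position list combines the nine pocket positions of Table 2 with Lys517 mentioned in the text.

```python
# GrsA positions of the specificity-conferring code as listed in this article
# (the nine pocket positions of Table 2 plus Lys517).
GRSA_CODE_POSITIONS = (235, 236, 239, 278, 299, 301, 322, 330, 331, 517)

GAP_CHARS = "-."


def reference_columns(ref_aligned, ref_start=1):
    """Map reference residue numbers to alignment column indices."""
    columns = {}
    residue_number = ref_start
    for column, char in enumerate(ref_aligned):
        if char not in GAP_CHARS:
            columns[residue_number] = column
            residue_number += 1
    return columns


def extract_code(ref_aligned, query_aligned, positions, ref_start=1):
    """Return the query residues aligned to the given reference positions."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    columns = reference_columns(ref_aligned, ref_start)
    return "".join(query_aligned[columns[p]] for p in positions)


# Toy, made-up alignment fragment (not real GrsA or Ebony sequence):
# the reference row is assumed to start at residue 235.
toy_ref = "D-AVLW"
toy_query = "VGDQ-S"
toy_code = extract_code(toy_ref, toy_query, (235, 236, 239), ref_start=235)
print(toy_code)
```

With a full-length alignment, `GRSA_CODE_POSITIONS` would be passed as `positions` and `ref_start` set to the GrsA residue number of the first aligned reference residue.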
Recently, it was pointed out that modules composed
of an adenylation and a thiolation domain, followed
by a domain having a redox function and not inserted
in the context of a typical NRPS cluster, can be found
in eucaryotes [12]. Indeed, α-aminoadipate semialdehyde dehydrogenase (AASDH) and α-aminoadipate reductase (Lys2), enzymes involved in lysine metabolism in eucaryotes, display a three-domain architecture where the two N-terminal domains are homologous to the A and T domains from NRPS systems and the C-terminal part contains a redox cofactor binding site for either pyrroloquinoline quinone (PQQ) or NADPH. In particular, AASDH, containing a PQQ-binding domain, is supposed to be involved in lysine degradation and to convert α-aminoadipate semialdehyde to α-aminoadipate [12]. Lys2, possessing an NADPH-binding domain, is involved in lysine biosynthesis; it converts α-aminoadipate to α-aminoadipate semialdehyde [13,14]. Furthermore, the protein Ebony, an enzyme from Drosophila melanogaster involved in conjugation of β-alanine to histamine and sharing homology to NRPS domains A and T, was
recently characterized [15].
The occurrence in evolutionarily higher organisms of gene arrangements typically encountered in the microbial world is intriguing. It appears worthwhile to investigate more deeply the extent of similarity between the A domains of the freestanding aminoacyl adenylate-forming enzymes and those of the traditional NRPS systems. In particular, how many sequences of freestanding A domains are known, and what are their evolutionary relationships to the NRPSs? Can the nonribosomal code of the traditional NRPS systems be applied to the freestanding A domains and, if so, what is the potential role of the residues involved? To address these issues, systematic sequence comparisons, homology modelling and docking simulations were employed to predict the structure of the active site of such enzymes and to propose functional roles for the conserved residues.

Results and Discussion
Databank searches and sequence comparison
The available sequences of the freestanding NRPS modules from eucaryotic organisms were collected by means of exhaustive databank searches. The PSI-BLAST [16] suite was applied over the NR and UniProt databanks. Query sequences were Ebony from Drosophila melanogaster, AASDH from Mus musculus and Lys2 from yeast. Each sequence is representative of a domain pattern: A-T-AS (AS stands for putative amine-selecting domain [15]), A-T-PQQ and A-T-NADPH, respectively. Only the A and T domains were included in the query sequence. Table 1 reports the homologous sequences collected by these databank searches, including 10 sequences from genes coding for putative A domains not yet annotated in the protein databanks, which were predicted through genome scans. Overall, 39 sequences were identified from different eucaryotic species and the domain assignments were confirmed by CDD [17] and Pfam [18] queries.
The sequence subset formed by the A-T domains was aligned utilizing the HMMER package [19]. A set of 62 sequences, corresponding to the A domains of microbial NRPSs, was extracted from the seed alignment of the Pfam AMP-binding family (Pfam code: PF00501). The adjacent T domain was subsequently added to each sequence. The extended sequences were aligned with CLUSTAL W [20] and the final alignment was manually refined to match functionally important residues such as Asp235, which binds the α-amino group of the substrate [11], and Ser573, the site of phosphopantetheine attachment. The resulting alignment was then used to train an HMM, which in turn was used to align a subset of the A-T domains listed in Table 1. This alignment was manually refined and used to train the final HMM, now specific for the eucaryotic A-T domains, which was used to align all 39 sequences (Fig. 1).
On the basis of the structural equivalencies contained in this multiple alignment, the occurrence of a
specificity-conferring code similar to that described for

Table 1. List of freestanding and NRPS-like enzymes retrieved from databanks. All accession numbers refer to UniProt database except where noted. Boldface names denote in silico predicted proteins not included in databanks. A stands for adenylation domain, T for thiolation, C for condensation, AS for amine-selecting, PQQ for PQQ-binding domain, NADPH for NADPH-binding domain, X for other domains not commonly present in NRPSs. Numeric subscripts to parentheses indicate the repetition of those modules. MNRPS stands for monomodular NRPS. Question marks denote unassigned function. Every PSI-BLAST search was performed with three iterations using as a probe the sequence of GrsA (P14687).

Accession number         Source                     Domains           Protein length  Putative function  BLAST E-value
P14687                   Bacillus brevis            A-T-C             1098            NRPS               0
P07702                   Saccharomyces cerevisiae   A-T-NADP          1392            Lys2               e-145
EAA62703 (a)             Aspergillus nidulans       A-T-NADP          1421            Lys2               e-135
Q75BB3                   Eremothecium gossypii      A-T-NADP          1385            Lys2               e-126
Q6SV64                   Cryptococcus neoformans    A-T-NADP          1359            Lys2               e-135
EAA73900 (a)             Gibberella zeae            A-T-NADP          1042            Lys2               e-105
S_MIKATAE (c,f)          Saccharomyces mikatae      A-T-NADP          1384            Lys2               0.0
S_CASTELLII (c,g)        Saccharomyces castellii    A-T-NADP          1386            Lys2               0.0
Q8NJ21                   Kluyveromyces lactis       A-T-NADP          1384            Lys2               e-141
Q9P3Y3                   Pichia farinosa            A-T-NADP          1398            Lys2               e-143
P40976                   Schizosaccharomyces pombe  A-T-NADP          1419            Lys2               e-138
Q12572                   Candida albicans           A-T-NADP          1391            Lys2               e-134
O74298                   Penicillium chrysogenum    A-T-NADP          1409            Lys2               e-133
Q9P3Q7                   Neurospora crassa          A-T-NADP          1174            Lys2               e-133
Q9HDP9                   Acremonium chrysogenum     A-T-NADP          1196            Lys2               e-129
Q873Z1                   Leptosphaeria maculans     A-T-NADP-NADP     1282            MNRPS              e-118
Q8J0L6                   Claviceps purpurea         A-T-C             1308            MNRPS              0.0
Q8NJX1                   Hypocrea virens            X-(C-A-T)18-NADP  20925           NRPS               0.0
O_SATIVA1 (d,h)          Oryza sativa               A-T-C             1225            NRPS               5e-41
O_SATIVA2 (d,i)          Oryza sativa               A-T-C             995             NRPS               3e-33
O_SATIVA3 (d,j)          Oryza sativa               A-T-PQQ           1285            ?                  2e-51
Q8L5Z8                   Arabidopsis thaliana       A-T-PQQ           1040            ?                  3e-84
Q95Q02                   Caenorhabditis elegans     C-A-T-C-T-C-A-T   2870            NRPS               3e-83
Q9XUJ4                   Caenorhabditis elegans     A-T-PQQ           707             AASDH              4e-59
Q17301                   Caenorhabditis briggsae    X-X-X-.-C-A-T     4767            NRPS               3e-66
ENSCBRP00000001007 (b)   Caenorhabditis briggsae    A-T-PQQ           866             AASDH              4.8e-28
O76858                   Drosophila melanogaster    A-T-AS            879             Ebony              e-114
Q9VLL0                   Drosophila melanogaster    A-T-PQQ           1012            AASDH              1e-57
Q7QKF0                   Anopheles gambiae          A-T-AS            881             Ebony              3e-94
Q7Q0Y5                   Anopheles gambiae          A-T-PQQ           824             AASDH              8e-55
Q80WC9                   Mus musculus               A-T-PQQ           1100            AASDH              1e-85
ENSRNOP00000002907 (b)   Rattus norvegicus          A-T-PQQ           1152            AASDH              2e-78
Q7Z5Y3                   Homo sapiens               A-T-PQQ           998             AASDH              9e-73
D_RERIO (e,k)            Danio rerio                A-T-PQQ           1003            AASDH              4.6e-131
F_RUBRIPES (e,l)         Takifugu rubripes          A-T-PQQ           1088            AASDH              3.1e-128
C_INTESTINALIS (e,m)     Ciona intestinalis         A-T-PQQ           1074            AASDH              1e-75
C_SAVIGNY (e,n)          Ciona savigny              A-T-PQQ           1167            AASDH              1e-74
P_TROGLODITES (e,o)      Pan troglodites            A-T-PQQ           1343            AASDH              7.4e-221
ENSGALP00000022317 (b)   Gallus gallus              A-T-PQQ           1125            AASDH              1e-76

(a) EMBLCDS entry name. (b) EnsEMBL peptide databank. (c), (d), (e) denote that the TBLASTN searches against genomes used as input query sequences P07702, Q8L5Z8 and Q80WC9, respectively. The genes were predicted from the nucleotide sequences: (f) EMBL accession number AABZ01000259 (positions coding for the protein: 3300–7700), (g) EMBL AACF01000123 (15700–20800), (h) EMBL AAAA01021459 (854–2040), (i) EMBL AAAA01023971 (610–2070), (j) EMBL AAAA01000789 (18780–31200), (k) EnsEMBL ctg11952 (800001–1000000), (l) EnsEMBL Chr_scaffold_632 (38782–48782), (m) EMBL AABS01000029, (n) EMBL AACT01000010, (o) EnsEMBL scaffold_37623 (4535897–4735897).

the NRPS systems [10] was tested. Substrate specificities were assigned either on the basis of literature data or by use of the NRPS prediction server [11]. The sequence positions equivalent, in the multiple sequence alignment, to those involved in the described nonribosomal specificity code [10] are reported in

Fig. 1. Multiple sequence alignment of adenylation domains. Only conserved portions from the multiple sequence alignment obtained as described in Results are shown. Dashes represent insertions and deletions. Numbers above the sequences refer to the sequence numbering of the gramicidin synthetase;  is used as block separator. The sequence positions equivalent to those involved in the nonribosomal specificity-conferring code described for the A domain of the gramicidin synthetase are marked with blue triangles. The positions of the core motifs are marked underneath with grey bars labelled according to [1]. Secondary structure assignments are shown for GrsA: α-helices and β-strands are rendered as squiggles and arrows, respectively; T stands for turn; blank for coil and irregular conformations; dots represent gaps introduced in the alignment. Identically conserved residues are displayed as white characters on red background. Conserved regions are denoted by boxed red characters.


Table 2. Nonribosomal specificity-conferring code in the freestanding enzymes. All accession numbers refer to UniProt database except those noted. Boldface codes denote in silico predicted proteins not included in databanks. MNRPS is monomodular NRPS; ACV stands for (L-δ-(α-aminoadipoyl)-cysteine-D-valine) tripeptide synthetase. Question marks denote unassigned function or substrate. L-α-Aa stands for L-α-aminoadipate, L-α-Aas stands for L-α-aminoadipate semialdehyde, β-Ala for β-alanine, Hty for hydroxytyrosine; the other three-letter codes are standard amino acid abbreviations.

Protein ID               Source                     Function  Activated substrate
P14687                   Bacillus brevis            NRPS      Phe
Q873Z1                   Leptosphaeria maculans     MNRPS     Thr (d)
Q8J0L6                   Claviceps purpurea         MNRPS     Leu (d)
Q8NJX1                   Hypocrea virens            NRPS      Phe (d)
O_SATIVA1                Oryza sativa               NRPS      Thr (d)
O_SATIVA2                Oryza sativa               NRPS      Leu/Ile/Val (d)
O_SATIVA3                Oryza sativa               ?         ?
Q8L5Z8                   Arabidopsis thaliana       ?         ?
Q95Q02                   Caenorhabditis elegans     NRPS      Leu (d)
Q95Q02.2                 Caenorhabditis elegans     NRPS      ?
Q17301                   Caenorhabditis briggsae    NRPS      Hty (d)
O76858                   Drosophila melanogaster    Ebony     β-Ala
Q7QKF0                   Anopheles gambiae          Ebony     β-Ala
P07702                   Saccharomyces cerevisiae   Lys2      L-α-Aa
EAA62703 (a)             Aspergillus nidulans       Lys2      L-α-Aa
Q75BB3                   Eremothecium gossypii      Lys2      L-α-Aa
Q6SV64                   Cryptococcus neoformans    Lys2      L-α-Aa
EAA73900 (a)             Gibberella zeae            Lys2      L-α-Aa
S_MIKATAE                Saccharomyces mikatae      Lys2      L-α-Aa
S_CASTELLII              Saccharomyces castellii    Lys2      L-α-Aa
Q8NJ21                   Kluyveromyces lactis       Lys2      L-α-Aa
Q9P3Y3                   Pichia farinosa            Lys2      L-α-Aa
P40976                   Schizosaccharomyces pombe  Lys2      L-α-Aa
Q12572                   Candida albicans           Lys2      L-α-Aa
O74298                   Penicillium chrysogenum    Lys2      L-α-Aa
Q9P3Q7                   Neurospora crassa          Lys2      L-α-Aa
Q9HDP9                   Acremonium chrysogenum     Lys2      L-α-Aa
Q9XUJ4                   Caenorhabditis elegans     AASDH     L-α-Aa
ENSCBRP00000001007 (b)   Caenorhabditis briggsae    AASDH     L-α-Aa
Q9VLL0                   Drosophila melanogaster    AASDH     L-α-Aa
Q7Q0Y5                   Anopheles gambiae          AASDH     L-α-Aa
Q80WC9                   Mus musculus               AASDH     L-α-Aa
ENSRNOP00000002907 (b)   Rattus norvegicus          AASDH     L-α-Aa
Q7Z5Y3                   Homo sapiens               AASDH     L-α-Aa
F_RUBRIPES               Takifugu rubripes          AASDH     L-α-Aa
D_RERIO                  Danio rerio                AASDH     L-α-Aa
C_INTESTINALIS           Ciona intestinalis         AASDH     L-α-Aa
C_SAVIGNY                Ciona savigny              AASDH     L-α-Aa
ENSGALP00000022317 (b)   Gallus gallus              AASDH     L-α-Aa
P_TROGLODITES            Pan troglodites            AASDH     L-α-Aa
P26046 M1 (c)            Penicillium chrysogenum    ACV       L-α-Aa

Residue position according to GrsA A domain numbering

235

236

239

278

299

301

322

330

331

D

A
V
G

V
L
H
H
I
N
V
D
D
P
P
P
P
P
P
P
P
P
P
P
P
P
P
P
P
P
P
P
P
P

P
P
P
P
P
P
P

T
L
S
M
N
L
N
L
Y
L
F
V
V
H
H
H
H
H
H
H
H
H

H
H
H
H
H
Q
Q
Q
Q
Q
Q
Q
Q
M
Q
F
Q
Q
N

I
W
V
V
I
M
I
V
Q
V

T
V
V
F
F
F
F
F
F
F
F
F
F
F
F
F
F
L
L
L
V
A
A
A
A
A
A
V
A
L

I

A
N
G
G
G
G
S
S
G
G
G
S
S
V
V
V
V
V
V
V
V
V
V

A
V
L
G

L
A
L
L
E
Y
I
F
F
M
M
W
M
M
M
M
M
M
M
M
M
M
L
V
V
I
I
I
I
I

V
L
I
I
L
V
E

I
I
V
N
I
I
G
G
V
A
V
G
G
R
R
H
R
R
R
R
R
R

R
R
R
R
R
C
C
C
C
S
S
C
S
C
S
S
S
C
F

C
A

D
D
D
D
D
D
D

D
D
V
V
D
D
D
D
D
D
D
D
D
D
D
D
D
D
D
D
D
D
D
D
D
D
D
D
D
D

D
E

W
W
F
G
W
F
E
E
A
L
S
A
S
R
R
R
R
R
R
R
R
R
R
R
R
R
R

V
V
V
L
V
V
V
V
V
V
V
V
V
R

V
V
V
G
G
G
G
G
G
G
G
G
V
G
G

G
V

Y
H
F
D
D
Y
F
W
D
D
A
A
L
A
A
A
A
A
S
S
S
A
A
A
W
W
W

W
W
W
W
W
W
W
W
W
V

(a) EMBLCDS entry name; (b) EnsEMBL peptide databank; (c) module 1 of ACV synthetase is included for comparison with the code of Lys2 and AASDH; (d) predicted using the NRPS prediction BLAST server [11].

Table 2. The residues equivalent to those which in GrsA were observed to interact with the α-amino and α-carboxyl groups of the amino acid substrate, Asp235 and Lys517, respectively [7], are conserved, the only exceptions being the freestanding NRPS from Leptosphaeria maculans (UniProt accession no. Q873Z1), which lacks Asp235, Lys2 from Acremonium chrysogenum (UniProt accession no. Q9HDP9), which lacks Lys517, and the sequences of Ebony from Drosophila melanogaster and from Anopheles gambiae (UniProt accession no. Q7QKF0), where Asp235 is missing. In this latter case, the absence of Asp235 can be explained in the light of the model of the interaction of the substrate with the active site (vide infra). It should be noted that the specificity code for the A domains recognizing the substrate β-alanine (Table 2) is similar to that already predicted for the A module of exochelin synthetase from Mycobacterium smegmatis (UniProt accession no. O87313) [11], the only differences being, in Ebony, at positions 239 (Ser vs. Thr), 278 (Val vs. Leu), 299 (Val vs. Ile) and 322 (Phe vs. Ser). The specificity codes of Lys2 and AASDH share the residues Asp235 and Pro236. The residue Pro236 seems to be specific for the aminoadipate substrates. Indeed, the only other system in which it is present in the same position is module 1 of the chloroeremomycin synthetase (UniProt accession no. O52821) from Amycolatopsis orientalis, specific for 3,5-dihydroxy-L-phenylglycine [11]. The specificity code of module 1 of ACV [L-δ-(α-aminoadipoyl)-L-cysteine-D-valine] synthetase from Penicillium chrysogenum, which activates L-α-aminoadipate, displays strong similarities to the Lys2 code (Table 2), with remarkable differences at position 235, where a Glu residue replaces the conserved Asp, and at position 330, where a Phe residue replaces the conserved Arg/His. The marginal resemblance of the AASDH code to that of Lys2 and ACV module 1 provides a structural basis for the current view that the physiological substrate of the dehydrogenase is L-α-aminoadipate semialdehyde rather than L-α-aminoadipate.
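The position-by-position comparison of two specificity codes described above can be sketched in a few lines of Python. This is only an illustration: the helper name is hypothetical, and the two dictionaries are partial codes holding just the residues stated in the text for Ebony and for the exochelin synthetase A module (position 236 is assumed identical in both, since the text lists it among neither the differences nor the missing residues).

```python
def code_differences(code_a, code_b):
    """List (position, residue_a, residue_b) where two codes disagree.

    Only positions present in both codes are compared.
    """
    shared = sorted(set(code_a) & set(code_b))
    return [(pos, code_a[pos], code_b[pos])
            for pos in shared
            if code_a[pos] != code_b[pos]]


# Partial codes keyed by GrsA numbering; residues taken from the text only.
ebony_partial = {236: "D", 239: "S", 278: "V", 299: "V", 322: "F"}
# Position 236 of the exochelin module is an assumption (see lead-in).
exochelin_partial = {236: "D", 239: "T", 278: "L", 299: "I", 322: "S"}

differences = code_differences(ebony_partial, exochelin_partial)
for position, ebony_res, exochelin_res in differences:
    print(f"position {position}: Ebony {ebony_res} vs. exochelin {exochelin_res}")
```

The same helper could be applied to complete ten-residue codes once they are read from a full alignment.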
Traditional and freestanding A and T domains also share some conserved core motifs. In particular, the core motifs A3 to A10 [1] are conserved in the eucaryotic NRPS-like domains, while the motifs A1 and A2 are positioned in a nonconserved section of the alignment (not shown in the figure). However, A1 and A2 are away from the active site and are probably conserved in the NRPSs only for structural reasons [1].
Phylogenetic analysis
To visualize evolutionary relationships among the freestanding NRPS A domains and the corresponding domains of the traditional NRPSs in a phylogenetic tree, the A domains of 25 bacterial NRPSs and the A domain of the ACV synthetase from Penicillium chrysogenum were added to the multiple sequence alignment shown in Fig. 1. The 25 bacterial NRPS sequences were selected by taking one representative from each of the different substrate specificity groups defined by Challis et al. [11], to give a view of the substrate range utilized by these enzymes. The phylogenetic tree shown in Fig. 2A was built from the portion of the multiple sequence alignment shown in Fig. 1 between positions 190 and 331, which contains the specificity code residues and the core motifs A3 to A5, using the neighbor-joining method as implemented in the module NEIGHBOR of the PHYLIP package [21]. The tree accuracy was tested with 1000 bootstrap replicates. On the basis of the assumption that the nine amino acids lining the binding pocket determine substrate specificity [11], we used the maximum parsimony method implemented in the program PROTPARS of the PHYLIP package [21] to establish a relationship between these important residues and substrate specificities in the 65 A domains considered, i.e. 39 freestanding plus 26 NRPS A domains. Therefore, the tree in Fig. 2B was derived considering only nine sequence positions, corresponding to the eight involved in the nonribosomal specificity code [11] and Asp235, which was included because it is not always conserved. In contrast, Lys517 was not included because it was conserved in all cases considered. The resulting tree obviously has no phylogenetic meaning. The phylogenetic tree based on positions 190–331 of the complete alignment revealed two clusters containing the α-aminoadipate reductases from fungi and the α-aminoadipate semialdehyde dehydrogenases from metazoa, which segregate independently from the other bacterial sequences. This pattern parallels that observed in the specificity code tree reported in Fig. 2B and confirms that Lys2 and AASDH recognize different substrates. Another independent cluster in both trees is formed by the two Ebony proteins (UniProt accession nos Q7QKF0 and O76858), the A domain of module 2 of exochelin synthetase from Mycobacterium smegmatis (UniProt accession no. O87313) and the two plant hypothetical NRPS-like proteins (UniProt accession no. Q8L5Z8 and the in silico predicted protein O_SATIVA3 in Table 1). This segregation could suggest that β-alanine or a very similar compound might be the substrate of the two plant proteins. ACV synthetase module 1 from Penicillium chrysogenum (UniProt accession no. P26046) displays a substrate specificity identical to fungal Lys2, although its sequence is more similar to that of the metazoan AASDH. Finally, it is interesting to observe the unexpected position in the trees of the protein sequences from Caenorhabditis elegans and Caenorhabditis briggsae (UniProt accession nos Q95Q02 and Q17301). These proteins, containing 2870 and 4767 residues, respectively, display a typical NRPS modular structure and, in the phylogenetic tree, are grouped with bacterial NRPSs. Two sequences in the same species homologous to AASDH are observed to cluster, as expected, in the AASDH group (UniProt accession no. Q9XUJ4 and EnsEMBL accession no. ENSCBRP00000001007).

Fig. 2. Phylogenetic trees based on the multiple alignment of A domain sequences. Metazoa, plants, fungi and bacteria are represented with red, green, brown and black colours, respectively. All names and numbers used in the phylogenetic trees are defined in Table 1 except for the following UniProt accession numbers: P35854, D-alanine activating enzyme, Lactobacillus casei; Q50857, saframycin Mx1 synthetase B, Myxococcus xanthus; O87313, FxbB, Mycobacterium smegmatis; O30409, tyrocidine synthetase 3, Brevibacillus brevis; Q9Z4X5, CDA peptide synthetase II, Streptomyces coelicolor; P19828, AngR protein, Listonella anguillarum; Q45295, LchAA protein, Bacillus licheniformis; P39845, putative fengycin synthetase, Bacillus subtilis; P45745, dhbF, Bacillus subtilis; O68008, bacitracin synthetase 3, Bacillus licheniformis; O68006, bacitracin synthetase 1, Bacillus licheniformis; O52819, PCZA363.3, Amycolatopsis orientalis; O68007, bacitracin synthetase 2, Bacillus licheniformis; O87606, peptide synthetase, Bacillus subtilis; Q9ZGA6, FK506 peptide synthetase, Streptomyces sp.; O07944, pristinamycin I synthetase 3 and 4, Streptomyces pristinaespiralis; P11454, enterobactin, Escherichia coli; O52820, PCZA363.4, Amycolatopsis orientalis; O52821, PCZA363.5, Amycolatopsis orientalis; P71717, phenyloxazoline synthetase MBTB, Mycobacterium tuberculosis; Q9Z4X6, CDA peptide synthetase I, Streptomyces coelicolor; Q50858, saframycin Mx1 synthetase A, Myxococcus xanthus; O69246, LchAB protein, Bacillus licheniformis; P26046, N-(5-amino-5-carboxypentanoyl)-L-cysteinyl-D-valine synthetase, Penicillium chrysogenum. The ‘M’ followed by a number in bacterial NRPS names refers to the module whose A domain was used for building the trees. Enzyme substrates are indicated at the end of the databank code with the standard one-letter code for amino acids or with the following abbreviations. Aa: L-α-aminoadipate; Orn: L-ornithine; DHPG: 3,5-dihydroxy-L-phenylglycine; PGly: L-phenylglycine; β-A: β-alanine; Aas: L-α-aminoadipate semialdehyde; 3hTyr: 3-hydroxy-L-tyrosine; HPG: 4-hydroxy-L-phenylglycine; 3h4mF: 3-hydroxy-4-methyl-phenylalanine. (A) Neighbor-joining phylogenetic tree based on the comparison of alignment positions 190–331. The numbers on the branches indicate the number of times the partition of the species into the two sets separated by that branch occurred among the 1000 bootstrap trees; (B) maximum parsimony tree calculated with the nine amino acids lining the substrate binding pocket of adenylation domains.
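The bootstrap test mentioned for the neighbor-joining tree rests on resampling alignment columns with replacement and recomputing pairwise distances for each replicate. The sketch below illustrates only that resampling step in plain Python; it is not the PHYLIP implementation, it uses a simple p-distance as a stand-in for the distance measure (an assumption, since the article does not state which one was used), and the toy alignment and function names are made up.

```python
import random


def p_distance(seq_a, seq_b):
    """Fraction of differing residues over columns where neither row is a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    picked = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in picked)
            for name, seq in alignment.items()}


def distance_matrix(alignment):
    """Pairwise p-distances keyed by sorted (name_a, name_b) tuples."""
    names = sorted(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


# Toy, made-up alignment; a real run would use alignment columns 190-331.
toy_alignment = {"s1": "ACDEF", "s2": "ACDEY", "s3": "GCDKY"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_columns(toy_alignment, rng) for _ in range(1000)]
matrices = [distance_matrix(rep) for rep in replicates]
print(len(matrices), distance_matrix(toy_alignment)[("s1", "s2")])
```

Each replicate matrix would then be passed to a tree-building routine, and the support of a branch is the number of replicate trees containing the same partition.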
Evolutionary trace analysis [22] (results not shown) was also applied to confirm the presence of functionally important residues conserved at different levels of partition of the freestanding NRPS family. This method exploits the information inherent in a family of homologous proteins by dividing it into groups so as to maximize functional similarity within each group and functional variation between groups. The analysis was conducted using the TraceSuite II server (st.bioc.cam.ac.uk/jiye/evoltrace/evoltrace.html) with the same multiple sequence alignment used for building the tree shown in Fig. 2A. The results showed that the core motifs A3 to A5 are conserved in almost all partitions and are characteristic of the NRPS A domains. Furthermore, the variability of the residues of the specificity code confirms that they are group-specific, except for the residues Asp235 and Pro236, which are shared by the two groups, AASDH and Lys2, that bind similar substrates (Table 2).
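The distinction between residues conserved across all partitions and group-specific residues can be illustrated with a simplified Python sketch. It is not the TraceSuite II algorithm: it assumes the groups are already given, and it merely labels each alignment column as invariant, group-specific or variable. The function name and the three-column toy sequences are hypothetical.

```python
def classify_columns(groups):
    """Label each alignment column given pre-defined sequence groups.

    'invariant'      : one residue type across all sequences
    'group-specific' : invariant within every group, but not across groups
    'variable'       : at least one group is not internally conserved
    """
    all_sequences = [seq for members in groups.values() for seq in members]
    length = len(all_sequences[0])
    labels = []
    for col in range(length):
        per_group = [{seq[col] for seq in members} for members in groups.values()]
        overall = set().union(*per_group)
        if len(overall) == 1:
            labels.append("invariant")
        elif all(len(residues) == 1 for residues in per_group):
            labels.append("group-specific")
        else:
            labels.append("variable")
    return labels


# Toy three-column sequences (made up for illustration only).
toy_groups = {
    "group_1": ["DPR", "DPR"],
    "group_2": ["DPV", "DPV"],
}
labels = classify_columns(toy_groups)
print(labels)
```

In this toy case the first two columns are shared by both groups while the third discriminates between them, which mirrors the pattern described for Asp235 and Pro236 versus the remaining code positions.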

Modelling of active sites and docking studies
Molecular modelling and manual and automated docking were used to map the conserved residues onto a hypothetical active site structure, to understand the role of these residues and to predict their interactions with the substrates. Figures 3, 4 and 5 report the model active sites of Ebony from Drosophila melanogaster, Lys2 from Saccharomyces cerevisiae and AASDH from Homo sapiens, respectively.

Fig. 3. Model structure of Ebony from Drosophila melanogaster. The Ebony model is represented in teal blue cartoons. The AMP molecule is rendered as a stick model. The specificity code residues are shown as stick models with superimposed slate blue CPK models. Carbon, oxygen, nitrogen and phosphorus atoms are displayed with green, red, blue and purple colors, respectively. β-Alanine is represented as a stick model with grey carbon atoms. Dashes indicate hydrogen bonds. This figure was rendered using PYMOL [31].

Fig. 4. Model structure of the active site of Lys2 from Saccharomyces cerevisiae. The AMP molecule is shown as a stick model. Carbon, oxygen, nitrogen and phosphorus atoms are displayed with green, red, blue and purple colors, respectively. The two possible arrangements of the substrate L-α-aminoadipate (L-α-Aa) are superimposed and represented as sticks. Carbon atoms are colored in two different ways: cyan for L-α-Aa in which the δ-carboxyl group forms a hydrogen bond with Lys517; green for L-α-Aa in which the α-carboxyl group forms a hydrogen bond with Lys517. The other atoms are colored as in AMP. All the residues in the active site are rendered as CPK and colored in slate blue. This figure was rendered using PYMOL [31].

Fig. 5. Model structure of AASDH from Homo sapiens. The AASDH main chain is represented in teal blue cartoons; AMP is shown as a stick model. Carbon, oxygen, nitrogen and phosphorus atoms are displayed with violet, red, blue and purple colours, respectively. L-α-Aminoadipate semialdehyde (L-α-Aas) is represented as sticks with green carbon atoms. The specificity code residues are shown as sticks and CPK. Sticks are colored as in AMP except for carbon atoms, which are in grey, and CPK models, which are colored in marine blue. This figure was rendered using PYMOL [31].

The reliability of docking experiments using homology models built at a sequence identity to the template of 25–30%, as in the reported case, can be questionable. Indeed, the superposition of the three structures related to the freestanding A domains, namely GrsA, firefly luciferase and DhbE, which share 16% sequence identity on average, shows that the average RMSD over the Cα atoms of the entire structures is 2.6 Å. In contrast, the average RMSD calculated over the Cα atoms enclosed in a sphere of radius 9 Å centered at the GrsA residue Asp235 in the active site is 0.95 Å. Indeed, the active sites of enzymes tend to be structurally more conserved during evolution [23]. Therefore, the error affecting the active site is expected to be lower than that affecting the rest of the protein. Consequently, the docking studies can still provide useful and testable indications.
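The comparison between a global and an active-site RMSD can be sketched as follows. The example assumes two structures that are already superposed and whose Cα atoms are paired one-to-one (equivalent residues in the same order); it does not perform the superposition itself. Function names and the toy coordinates are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired 3D coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def local_rmsd(coords_a, coords_b, center, radius=9.0):
    """RMSD restricted to atom pairs whose first member lies within
    `radius` (in the coordinate units, here angstroms) of `center`."""
    selected = [i for i, p in enumerate(coords_a) if math.dist(p, center) <= radius]
    return rmsd([coords_a[i] for i in selected], [coords_b[i] for i in selected])


# Toy paired Calpha coordinates (made up); the center stands in for the
# Calpha of the residue equivalent to GrsA Asp235.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (20.0, 0.0, 5.0)]
global_value = rmsd(structure_a, structure_b)
local_value = local_rmsd(structure_a, structure_b, center=(0.0, 0.0, 0.0))
print(global_value, local_value)
```

In the toy data the distant atom dominates the global value while the two atoms inside the sphere deviate much less, which is the same qualitative pattern reported for the 2.6 Å and 0.95 Å averages.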
In the active site of Ebony (Fig. 3), two residues of the traditional nonribosomal code, Asp235 and Pro236, are replaced by Val and Asp, respectively. The aspartate in position 236 can form a hydrogen bond to the β-amino group of the β-alanine substrate, which also interacts via hydrogen bonds with Ser301 and Asp331. The other residues line the active site pocket. A bulky aromatic residue (Phe322) serves as the floor of the active site pocket. Apparently, the rearrangement of the side chains at the active site enabled the enzyme to recognize a substrate with a β-amino instead of an α-amino group. Interestingly, substitution of Asp235 is indicative of the substrate structure. For example, in the case of DhbE, position 235 is occupied by Asn and the corresponding substrate lacks an α-amino group [8].
It has been proposed, for Lys2 from S. cerevisiae,
that the L-α-aminoadipate substrate could be adenylated at the δ-carboxylate rather than the α-carboxylate and that the α-amino and α-carboxyl groups of
the substrate bind at the bottom of the pocket, interacting with Arg239 and Glu322 [14]. An analogous
arrangement was also proposed for the binding of
L-α-aminoadipate to the adenylation domain of the
ACV synthetase from Penicillium chrysogenum [7].
The results of the docking experiments indicated
(Fig. 4) that the possible binding modes cluster into
two solutions. According to the first possibility, the
substrate α-aminoadipate is bound to the active site
with a salt bridge between the α-amino group and
the side-chain carboxyl group of Asp235 and a hydrogen
bond to the carbonyl group of Arg330. In yeast
Lys2, the δ-carboxylate group of the substrate forms
a salt bridge with Arg239. Finally, the substrate
α-carboxylate interacts via hydrogen bonds with the
ε-amino group of Lys517. The other residues of the
putative specificity code line the walls of the active
site. In particular, the conserved Pro236 shapes the
pocket to host the substrate. An alternative
way of binding of the substrate to the active
site involves the formation of a salt bridge between
the δ-carboxylate group and the ε-amino group of
Lys517 and between the α-carboxylate and Arg239.
The α-amino group interacts via hydrogen bonds with
the carbonyl oxygens of Met322, Gly324 and Arg330.
The first binding mode of the substrate (the α-carboxylate interacting with Asp235) is supported by
the invariance of Asp235, which usually stabilizes the
α-amino group of the amino acid substrate. The
importance of Asp235 in Lys2 is also evidenced by
mutational analysis, which showed a complete loss of
catalytic activity for the mutant Asp235→Asn, while
the mutant Asp235→Glu retained only 4% of catalytic activity [24]. Also, this binding mode is in line
with the absence of a negatively charged side chain in
position 322 of the putative α-aminoadipate specificity code (Table 2), whose role is to stabilize the
α-amino group of the substrate. Such a residue
(Glu322) is present in ACV synthetase. However,
most importantly, the same binding mode does not
account for the experimental evidence of the existence
of the α-aminoadipoyl-C6-AMP [13], which can be
explained by the binding mode with the δ-carboxylate
in proximity of Asp235.
The results of the docking studies of α-aminoadipate
semialdehyde, assumed to be the substrate of AASDH
[12] (Fig. 5), show that the substrate can interact with
the active site in only one orientation. It involves the
formation of a salt bridge between the α-amino group
of the substrate and the carboxylic group of Asp235
and a hydrogen bond to the carbonyl oxygen of Ser330.
The δ-aldehyde group of the substrate interacts with
Gln278. Finally, the substrate α-carboxylate, as expected, interacts via hydrogen bonds with the ε-amino
group of Lys517 in both enzymes. Once again, this
binding mode can explain the invariance of Asp235,
and this model can account for the lack of a negatively
charged side chain at position 322 of the putative specificity code (Table 2) able to stabilize the α-amino
group of the substrate, which is instead present in ACV
synthetase (Glu322).
The results reported in this work demonstrate that a
specificity-conferring code can also be recognized in
the freestanding eucaryotic NRPS-like enzymes. A role
for some of the specificity residues could be predicted
on the basis of in silico studies. These indications can
be useful for planning experiments aimed at a better characterization and at the engineering of this
emerging group of single NRPS modules responsible
for amino acid selection, activation and modification
in the absence of other NRPS assembly line components.

Experimental procedures
Databank searches, gene prediction, sequence
comparisons and evolutionary analysis
UniProt and NR databanks, available, respectively, at the EBI
() and NCBI (i.nlm.nih.gov/entrez) web sites, were searched with the psiblast [16] program. Genomes were accessed through the
EnsEMBL (), TIGR (http://www.tigr.org) and NCBI portals. The CDD [17] and Pfam [18]
databanks were used as references for the identification of protein families
and domains.
NRPS-like A domains not yet included in the protein
databanks were searched for in the genomes of Homo
sapiens, Mus musculus, Rattus norvegicus, Danio rerio, Fugu
rubripes, Ciona intestinalis, Drosophila melanogaster,
Caenorhabditis elegans, Caenorhabditis briggsae, Saccharomyces cerevisiae, Saccharomyces castellii, Saccharomyces
mikatae, Schizosaccharomyces pombe, Candida albicans,
Ciona savignyi, Gallus gallus, Pan troglodytes, Oryza sativa
and Zea mays, for which draft genomic sequences were
available. The tblastn module of the blast [16] package at NCBI,
EnsEMBL or TIGR was used to search for putative NRPS
domain-related genes in these genomes. Gene predictions
were subsequently validated and checked through the
sequential application of several programs. The genomic sequences identified by tblastn were further analyzed
with the program genomescan [25]: the target DNA
sequence plus 10 kilobases upstream and downstream was
extracted to ensure that the entire gene structure, including
promoter, coding region, terminator, etc., was captured. The
putative gene structures were further confirmed by EST
(expressed sequence tag) clustering analysis, performed
with the estcluster software available in the GCG package
(Wisconsin package, version 10.2, Genetics Computer
Group, Madison, WI, USA) on the EST database of the
respective organism.
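The extraction of a tblastn hit together with 10 kilobases of flanking sequence on each side can be sketched as follows. This is only an illustration under stated assumptions: coordinates are taken to be 1-based and inclusive, the window is clamped at the contig ends, and the function names are hypothetical rather than part of any of the programs cited above.

```python
def flanked_window(hit_start, hit_end, contig_length, flank=10_000):
    """Return the 1-based inclusive (start, end) of a hit extended by `flank`
    bases on each side, clamped to the boundaries of the contig."""
    if not (1 <= hit_start <= hit_end <= contig_length):
        raise ValueError("hit coordinates must lie inside the contig")
    start = max(1, hit_start - flank)
    end = min(contig_length, hit_end + flank)
    return start, end

def extract_with_flanks(sequence, hit_start, hit_end, flank=10_000):
    """Slice the flanked region out of a contig given as a plain string."""
    start, end = flanked_window(hit_start, hit_end, len(sequence), flank)
    return sequence[start - 1:end]

# A hit in the middle of a contig gains the full 10 kb on both sides.
print(flanked_window(25_000, 30_000, 100_000))
# A hit near the contig ends is clamped instead of running off the sequence.
print(flanked_window(2_000, 5_000, 12_000))
```

Clamping matters for draft genomes, where a hit may sit close to the end of a short contig and the full flank is not available.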

Multiple sequence alignments were built with the hmmalign
program of the hmmer 2.0 software package [19] and the clustalw software [20]. The resulting alignments were visually
inspected and, when appropriate, manually adjusted. The
sequence alignments were displayed with the program
espript [26].
Phylogenetic trees relied on the modules protdist,
neighbor, protpars and drawgram of the phylip package
[21].

Molecular modelling and docking studies
The crystal structure of the adenylation domain of the gramicidin synthetase (GrsA, Protein Data Bank code 1AMU)
was used as a template for the construction of homology-derived models of human α-aminoadipate semialdehyde
dehydrogenase (AASDH, UniProt accession no. Q7Z5Y3),
Ebony β-alanyl biogenic amine synthetase from D. melanogaster (UniProt accession no. O76858) and yeast α-aminoadipate reductase (Lys2, UniProt accession no. P07702).
GrsA shares 26% sequence identity with human AASDH,
27% with Ebony and 30% with Lys2. Two other potential
templates, firefly luciferase (PDB code 1LCI) and DhbE
(PDB code 1MD9), have lower percentages of sequence
identity with the target sequences. Indeed, the sequence identities between DhbE and AASDH, Ebony and Lys2 are 19,
23 and 21%, respectively, and they are even lower for firefly luciferase (11, 12 and 11%). Moreover, DhbE binds a
substrate without an amino group and has a deletion of one
of the residues of the specificity code, corresponding to
Ile330 of GrsA. For all these reasons, GrsA was judged
the best template and selected for modelling. Homology
modelling was based on the multiple sequence alignment
obtained as described in the previous section. Models were
calculated with the modeller-4 package [27]. Twenty different models were derived for each target protein using the
highest built-in refinement procedure, and the one displaying
the lowest objective function, which measures the extent of
violation of constraints from the templates, was taken as the
representative model. An AMP molecule, a buried water
molecule and a magnesium ion from the crystal structure of
GrsA were maintained in the homology models. The stereochemical quality of the resulting models was validated by
procheck [28], and their packing and solvent exposure by
prosaii [29].
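The two selection steps described above, choosing the template with the highest sequence identity and keeping the model with the lowest objective function, can be sketched as follows. The identity convention used here (identical residues divided by aligned, non-gap column pairs) is an assumption, since the text does not state which denominator was used; the model scores are hypothetical placeholders, and only the three identities with AASDH are taken from the text.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.
    Other conventions (e.g. dividing by the shorter sequence length) exist."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs")
    return 100.0 * identical / compared

# Toy alignment (hypothetical): 4 identical residues over 5 aligned pairs.
print(percent_identity("MKV-LA", "MRVTLA"))

# Sequence identities with human AASDH as reported in the text (percent).
reported_identity_to_aasdh = {"GrsA": 26, "DhbE": 19, "firefly luciferase": 11}
best_template = max(reported_identity_to_aasdh, key=reported_identity_to_aasdh.get)

# Hypothetical objective-function values for three of the twenty models.
model_scores = {"model_01": 1523.4, "model_02": 1498.7, "model_03": 1610.2}
representative = min(model_scores, key=model_scores.get)

print(best_template, representative)
```

Sequence identity was only one of the criteria: the text also weighs the substrate type and the deletion in the DhbE specificity code, which a purely numerical ranking would not capture.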
The models of the Lys2 substrate, L-α-aminoadipate, the
Ebony substrate, β-alanine, and the AASDH putative substrate, L-α-aminoadipate semialdehyde, were built using the
builder package in the program insightii (Version 2000,
Accelrys, San Diego, CA, USA) and were positioned into
the predicted enzyme active sites following the binding
mode of the phenylalanine substrate in GrsA. During the
manual docking, steric clashes were removed and the position of the substrate at the active site was adjusted to establish stabilizing interactions between the atoms of the ligand
and those constituting the active site.
To optimize the conformation of the active site side
chains and the position of the substrate, the enzyme-substrate complex was further processed by energy minimization as implemented in discover 2.9 of the insightii
package. The cff91 forcefield, a distance-dependent dielectric constant and a cut-off distance of 28 Å were used during each simulation. An initial minimization was performed
to relax the hydrogens added to the model. Positions of the
heavy atoms of the binary complex were fixed, and 100
steepest descent steps were performed, until the maximum
energy derivative was less than 41.8 kJ·mol⁻¹·Å⁻¹. Subsequently, while main chain atoms were kept fixed,
side chains of every residue contained in a sphere of 9 Å
centered on the phosphate atom of the AMP were subjected to a gradually decreasing tethering force (from 4180
to 209 kJ·Å⁻²) using steepest descents, until the maximum
derivative was less than 4.18 kJ·mol⁻¹·Å⁻¹. Finally, a side
chain minimization including charges was performed for
100 steepest descent steps, until the maximum energy derivative was less than 0.42 kJ·mol⁻¹·Å⁻¹.
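The staged protocol, in which each steepest descent stage stops once the largest energy derivative falls below a tighter threshold, can be illustrated with a toy sketch. It is not the discover implementation: the harmonic "energy", the fixed step size and the starting point are assumptions chosen for clarity, and the fixed atoms and tethering forces of the real protocol are omitted. Only the three thresholds (41.8, 4.18 and 0.42, roughly 10, 1 and 0.1 kcal·mol⁻¹·Å⁻¹) and the limit of 100 steps are taken from the text.

```python
def steepest_descent(gradient, x0, step, max_steps, max_derivative):
    """Move against the gradient until the largest absolute derivative
    drops below `max_derivative` or `max_steps` updates have been made.
    Returns the final coordinates and the number of updates performed."""
    x = list(x0)
    for steps_taken in range(max_steps):
        g = gradient(x)
        if max(abs(gi) for gi in g) < max_derivative:
            return x, steps_taken
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, max_steps

def toy_gradient(x, k=2.0):
    """Gradient of the toy energy E = 0.5 * k * sum(x_i^2) (hypothetical)."""
    return [k * xi for xi in x]

thresholds = [41.8, 4.18, 0.42]  # convergence criteria of the three stages
x = [100.0, -50.0]               # arbitrary starting coordinates
steps_per_stage = []
for tolerance in thresholds:
    x, n = steepest_descent(toy_gradient, x, step=0.25,
                            max_steps=100, max_derivative=tolerance)
    steps_per_stage.append(n)

print(steps_per_stage, x)
```

Each stage starts from the coordinates left by the previous one, so the tighter criteria refine a geometry that has already been relaxed.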
To verify and confirm the predicted binding mode of the
substrate, the position of the substrate was recalculated
with an automatic method that does not rely on the information contained in the template structure. The autodock
3.0 suite [30], which exploits the Lamarckian genetic algorithm, was used. The docking grid was prepared with
the autogrid utility of autodock and set to 90 × 90 × 90
points with a grid spacing of 0.200 Å. The grid center was
placed at the center of the active site pocket. The grid boxes included the entire binding site of the enzyme and provided
enough space for the ligand translational and rotational
walk. For each of the three enzymes, AASDH, Lys2 and
Ebony, 50 runs were performed and, for each run, a maximum
number of 27 000 genetic algorithm operations were generated on a single population of 50 individuals. The maximum number of energy evaluations was set to 250 000.
Other parameters for the docking were: a random starting
position and conformation, a maximal mutation of 2 Å in
translation and 50 degrees in rotations, an elitism of 1, a
mutation rate of 0.02, a crossover rate of 0.8 and a local
search rate of 0.06. Simulations were ranked according to
the docked energy between the protein and the ligand, a
summation of internal ligand energy and intermolecular
energy terms.
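The ranking criterion, a docked energy obtained by summing the intermolecular and the internal ligand energy of each run, can be sketched as follows. The class, the function name and all energy values are hypothetical and serve only to show the ordering; they are not output of the docking runs described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DockingRun:
    run_id: int
    intermolecular: float  # protein-ligand energy term (hypothetical value)
    internal: float        # internal ligand energy term (hypothetical value)

    @property
    def docked_energy(self) -> float:
        """Sum of the intermolecular and internal ligand energy terms."""
        return self.intermolecular + self.internal

def rank_runs(runs):
    """Order runs from the lowest (most favourable) docked energy upwards."""
    return sorted(runs, key=lambda run: run.docked_energy)

runs = [
    DockingRun(1, intermolecular=-6.0, internal=0.5),
    DockingRun(2, intermolecular=-7.5, internal=1.0),
    DockingRun(3, intermolecular=-5.0, internal=-0.25),
]
for run in rank_runs(runs):
    print(run.run_id, run.docked_energy)
```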

Acknowledgements
This research was supported in part by a grant from
the Italian Ministero dell’Istruzione, Università e
Ricerca (MIUR) and by a PRIN-2002 grant to IG. Part
of this work will be submitted by LDV in partial fulfillment of the requirements of the degree of Dottorato
di Ricerca at Università di Roma ‘La Sapienza’.
The authors are grateful to Daniele Tronelli for his skilful
help and to Professor Francesco Bossa and Professor Donatella Barra for their encouraging support and
advice.

References
1 Marahiel MA, Stachelhaus T & Mootz HD (1997)
Modular peptide synthetases involved in nonribosomal
peptide synthesis. Chem Rev 97, 2651–2673.
2 Mootz HD, Schwarzer D & Marahiel MA (2002) Ways
of assembling complex natural products on modular nonribosomal peptide synthetases. Chembiochem 3, 490–504.
3 Du L, Sanchez C, Chen M, Edwards DJ & Shen B
(2000) The biosynthetic gene cluster for the antitumor
drug bleomycin from Streptomyces verticillus
ATCC15003 supporting functional interactions between
nonribosomal peptide synthetases and a polyketide
synthase. Chem Biol 7, 623–642.
4 Walsh CT (2004) Polyketide and nonribosomal peptide
antibiotics: modularity and versatility. Science 303,
1805–1810.
5 Guenzi E, Galli G, Grgurina I, Gross DC & Grandi G
(1998) Characterization of the syringomycin synthetase
gene cluster: a link between procaryotic and eucaryotic
peptide synthetases. J Biol Chem 273, 32857–32863.
6 Conti E, Franks NP & Brick P (1996) Crystal structure
of firefly luciferase throws light on a superfamily of adenylate-forming enzymes. Structure 4, 287–298.
7 Conti E, Stachelhaus T, Marahiel MA & Brick P (1997)
Structural basis for the activation of phenylalanine in
the non-ribosomal biosynthesis of gramicidin S. EMBO
J 14, 4174–4183.
8 May JJ, Kessler N, Marahiel MA & Stubbs MT (2002)
Crystal structure of DhbE, an archetype for aryl acid
activating domains of modular nonribosomal peptide
synthetases. Proc Natl Acad Sci USA 99, 12120–12125.
9 Keating TA, Marshall CG, Walsh CT & Keating AE
(2002) The structure of VibH represents nonribosomal
peptide synthetase condensation, cyclization and epimerization domains. Nat Struct Biol 9, 522–526.
10 Stachelhaus T, Mootz HD & Marahiel MA (1999) The
specificity-conferring code of adenylation domains in
nonribosomal peptide synthetases. Chem Biol 6, 493–505.
11 Challis GL, Ravel J & Townsend CA (2000) Predictive,
structure-based model of amino acid recognition by
nonribosomal peptide synthetase adenylation domains.
Chem Biol 7, 211–224.
12 Kasahara T & Kato T (2003) A new redox-cofactor
vitamin for mammals. Nature 422, 832.
13 Sinha AK & Bhattacharjee JK (1971) Lysine biosynthesis in Saccharomyces: conversion of α-aminoadipate
into α-aminoadipic δ-semialdehyde. Biochem J 125,
743–749.
14 Ehmann DE, Gehring AM & Walsh CT (1999) Lysine
biosynthesis in Saccharomyces cerevisiae: mechanism of
α-aminoadipate reductase (Lys2) involves posttranslational phosphopantetheinylation by Lys5. Biochemistry
38, 6171–6177.
15 Richard A, Kemme T, Wagner S, Schwarzer D,
Marahiel MA & Hovemann BT (2003) Ebony, a novel
nonribosomal peptide synthetase for β-alanine conjugation with biogenic amines in Drosophila. J Biol Chem
278, 41160–41166.
16 Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang
Z, Miller W & Lipman DJ (1997) Gapped BLAST and
PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Res 25, 3389–3402.
17 Marchler-Bauer A, Anderson JB, DeWeese-Scott C,
Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson
JD, Jacobs AR, Lanczycki CJ, Liebert CA, Liu C,
Madej T, Marchler GH, Mazumder R, Nikolskaya AN,
Panchenko AR, Rao BS, Shoemaker BA, Simonyan V,
Song JS, Thiessen PA, Vasudevan S, Wang Y, Yamashita RA, Yin JJ & Bryant SH (2003) CDD: a curated
Entrez database of conserved domain alignments.
Nucleic Acids Res 31, 383–387.
18 Bateman A, Coin L, Durbin R, Finn RD, Hollich V,
Griffiths-Jones S, Khanna A, Marshall M, Moxon S,
Sonnhammer EL, Studholme DJ, Yeats C & Eddy SR
(2004) The Pfam Protein Families Database. Nucleic
Acids Res 32, D138–D141.
19 Eddy SR (1996) Hidden Markov Models. Curr Opin
Struct Biol 6, 361–365.
20 Thompson JD, Higgins DG & Gibson TJ (1994) clustalw: improving the sensitivity of progressive multiple
sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
Nucleic Acids Res 22, 4673–4680.
21 Felsenstein J (1996) Inferring phylogenies from protein
sequences by parsimony, distance, and likelihood methods. Methods Enzymol 266, 418–427.
22 Lichtarge O, Bourne HR & Cohen FE (1996) An evolutionary trace method defines binding surfaces common
to protein families. J Mol Biol 257, 342–358.
23 Irving JA, Whisstock JC & Lesk A (2001) Protein structural alignments and functional genomics. Proteins
Struct Func Genet 42, 378–382.


24 Guo S & Bhattacharjee JK (2003) Site-directed
mutational analysis of the novel catalytic domains of
a-aminoadipate reductase (Lys2p) from Candida
albicans. Mol Genet Genomics 269, 271–279.
25 Yeh RF, Lim LP & Burge C (2001) Computational
inference of homologous gene structures in the human
genome. Genome Res 11, 803–816.
26 Gouet P, Courcelle E, Stuart DI & Metoz F (1999)
ESPript: analysis of multiple sequence alignments in
PostScript. Bioinformatics 15, 305–308.
27 Šali A & Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol
234, 779–815.
28 Laskowski RA, MacArthur MW, Moss DS & Thornton
JM (1993) procheck: a program to check the stereochemical quality of protein structures. J Appl Cryst 26,
283–291.
29 Sippl MJ (1993) Recognition of errors in three-dimensional structures of proteins. Proteins Struct Func Genet
17, 355–362.
30 Morris GM, Goodsell DS, Halliday RS, Huey R, Hart
WE, Belew RK & Olson AJ (1998) Automated docking
using a Lamarckian genetic algorithm and empirical
binding free energy function. J Comp Chem 19, 1639–
1662.
31 DeLano WL (2002) The PyMOL Molecular Graphics
System. DeLano Scientific, San Carlos, CA, USA.
