Abundance of intrinsic disorder in SV-IV, a multifunctional
androgen-dependent protein secreted from rat seminal
vesicle
Silvia Vilasi and Raffaele Ragone
Dipartimento di Biochimica e Biofisica, Naples, Italy
The view that a protein must fold into the correct
shape, as encoded in the amino acid sequence, before
it can function has been deeply rooted in protein sci-
ence, even before the three-dimensional structure of a
protein was first solved. However, for some proteins,
especially those involved in signalling and regulation
[1], the unstructured state has been suggested to be
essential for basic cellular functions and recognized as
a separate functional and structural category [2,3].
These are proteins or domains that, in their native
state, are either completely disordered or contain large
disordered regions, and therefore do not fit the stan-
dard sequence–structure–function paradigm, because
intrinsic disorder, whether local or extended to the
entire protein length, is crucially important for their
function. Dunker and Obradovic [4] categorized func-
tional intrinsically disordered regions in molten glob-
ule-like and random coil-like structural forms, and
Uversky [5] suggested the existence of an additional
pre-molten globule form, whose peculiarity is the pres-
ence of unstable secondary structure. Betraying still
imperfect categorization, these systems are currently
classified as ‘intrinsically disordered proteins’ (IDPs),
but the use of other synonymous expressions, such as
‘intrinsically unstructured proteins’, is widespread in
the literature [6]. More than 100 such proteins are
known, including Tau, Prions, Bcl-2, p53, 4E-BP1 and
eIF1A [5,7].
Keywords
bioinformatics; disorder prediction;
intrinsically disordered proteins; seminal
vesicle protein no. 4; structure–function
relationship
Correspondence
R. Ragone, Dipartimento di Biochimica
e Biofisica, Seconda Università di Napoli,
via S. Maria di Costantinopoli 16,
80138 Naples, Italy
Fax: +39 081 294136
Tel: +39 081 294042
E-mail: ;

(Received 30 October 2007, revised 5
December 2007, accepted 13 December
2007)
doi:10.1111/j.1742-4658.2007.06242.x
The potent immunomodulatory, anti-inflammatory and procoagulant
properties of protein no. 4 secreted from the rat seminal vesicle epithelium
(SV-IV) have previously been found to be modulated by a supramolecular
monomer–trimer equilibrium. More structural details that integrate experi-
mental data into a predictive framework have recently been reported.
Unfortunately, homology modelling and fold-recognition strategies were
not successful in creating a theoretical model of the structural organization
of SV-IV. It was inferred that the global structure of SV-IV is not similar
to that of any protein of known three-dimensional structure. Reversing the
classical approach to the sequence–structure–function paradigm, in this
paper we report novel information obtained by comparing the physico-
chemical parameters of SV-IV with two datasets composed of intrinsically
unfolded and ideally globular proteins. In addition, we analyse the SV-IV
sequence by several publicly available disorder-oriented predictors. Overall,
disorder predictions and a re-examination of existing experimental data
strongly suggest that SV-IV needs large plasticity to efficiently interact with
the different targets that characterize its multifaceted biological function,
and should therefore be better classified as an intrinsically disordered
protein.
Abbreviations
HCA, hydrophobic cluster analysis; IDPs, intrinsically disordered proteins; PDB, protein data bank; SV-IV, rat seminal vesicle protein no. 4;
SVM, support vector machine.
FEBS Journal 275 (2008) 763–774 © 2008 The Authors Journal compilation © 2008 FEBS
Of the proteins studied in our laboratory, SV-IV
(seminal vesicle protein no. 4, so identified according to
its electrophoretic mobility in SDS-PAGE; precursor
SWISS-PROT ID, SVP2_RAT) is a basic (pI = 8.9),
thermostable protein of 90 residues (Mr = 9758)
secreted from the rat seminal vesicle epithelium under
strict androgen transcriptional control, which has been
found to possess potent non-species-specific immuno-
modulatory, anti-inflammatory and procoagulant prop-
erties [8]. It has been purified to homogeneity and
characterized extensively [8–10]. It is encoded by a gene
that has been isolated, sequenced and expressed in
Escherichia coli [11–14]. On the basis of its biological
and biochemical characteristics, SV-IV appears to be a
molecule of obvious pharmacological interest. SV-IV-
immunorelated proteins have been discovered in several
rat tissues, as well as in human seminal fluid and semi-
nal vesicle secretion [13,14]. The segment 3–41 of SV-IV
has been found to have a high amino acid sequence
similarity with the C-terminal segment 34–66 of utero-
globin, a secreted protein from rabbit displaying
phospholipase A2 inhibitory activity in vitro and anti-
inflammatory effects in vivo [15,16]. Others have also
been able to prepare potent anti-inflammatory peptides
from the region of highest similarity between uteroglo-
bin and lipocortin I, a protein that has been suggested
to mediate the anti-inflammatory effects of glucocortic-
oids [17]. It is therefore highly desirable to obtain as
complete structural information as possible.
From a structural standpoint, early circular dichroism
and fluorescence polarization data indicated scarce
structural organization [18]. This agreed with a predictor
of local flexibility [19], although other predictive
algorithms gave contrasting results, suggesting either the
presence [18] or lack [20] of an appreciable amount of
secondary structure. Recently, it has been found that, in
the range of physiological concentrations (2–48 µM
[20,21]), the peculiar biological properties of SV-IV are
probably modulated by a supramolecular equilibrium
in which a trimeric form competes with monomeric
protein for binding to a large variety of SV-IV targets
[20]. Eventually, Caporale et al. [22] found agreement
between the amounts of predicted and experimental
helical structure present in the monomeric form
(20 and 24%, respectively), and attempted to create a
theoretical model of the structural organization of SV-
IV. However, on noting that homology modelling and
fold-recognition strategies were not able to provide
detailed structural information, they concluded that
‘SV-IV assumes a global structure that is not similar
to any protein of known three-dimensional structure’
[22]. Indeed, such an occurrence suggests that SV-IV
could violate the standard sequence–structure–function
paradigm, but the authors did not investigate this pos-
sibility.
We have verified that, in terms of disorder- and
order-promoting amino acid subsets [23,24], the com-
position of SV-IV does not strictly conform to trends
previously found to occur in IDPs, except for a very
high content of serine (24%). Furthermore, a search of
the DisProt database [25] did not return any hits for
SV-IV, indicating that no DisProt sequence resembles
this protein. However, novel information obtained by
publicly available disorder-oriented predictors empha-
sizes that the functional state of SV-IV lacks significant
structural organization. This evidence is sufficient to
confidently state that SV-IV can be classified amongst
IDPs. Incidentally, the present work also confirms that
homology modelling and fold-recognition strategies are
best suited to obtain information on the architecture
of ordered proteins, but the study of IDPs as if they
were ordered can prove to be highly frustrating. Thus,
when dealing with proteins of uncertain three-dimensional
structure, it would be more appropriate and less
time-consuming to look for disorder before attempting
modelling procedures.
Results
Survey of existing structural information
In addition to fluorescence polarization and both far-
and near-UV circular dichroism data from our labora-
tory [18,20,22], experimental evidence that regular
structure is scarce in SV-IV comes from SDS-PAGE,
which is routinely used to assess the Mr values of proteins.
Because of their unusual amino acid composition, IDPs bind
less SDS than usual and their apparent Mr value is often
1.2–1.8 times higher than the real value calculated from
sequence data or measured by mass spectrometry [7]. Indeed,
the mobility of SV-IV in SDS-PAGE is compatible with an Mr
value of about 15 000–18 000 [9], which can be compared
with an Mr value of 9758 calculated from the sequence.
Size-exclusion chromatography also indicates that the
hydrodynamic radius of SV-IV resembles that of an IDP [7],
because purified SV-IV elutes well behind chymotrypsinogen
(Mr = 25 600) and slightly ahead of RNase A (Mr = 13 600) [9].
Finally, digestion of SV-IV with trypsin suggests that all but Lys80
of the potential proteolytic sites represented by nine
lysine and seven arginine residues are able to efficiently
interact with the catalytic site of the enzyme [22], as
expected for an IDP-like polypeptide [7]. This piece of
information has prompted us to perform predictive
analyses aimed at clarifying whether or not the SV-IV
sequence is compatible with the classical sequence–
structure–function paradigm.
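The SDS-PAGE argument above can be checked with simple arithmetic. The sketch below uses only the numbers quoted in the text (Mr of 9758 from the sequence, an apparent Mr of about 15 000–18 000, and the 1.2–1.8 range reported for IDPs); the function and variable names are illustrative and not part of the original work.

```python
# Apparent-to-calculated Mr ratio for SV-IV, using only values quoted in the text.
MR_SEQUENCE = 9758               # Mr calculated from the sequence
MR_SDS_PAGE = (15_000, 18_000)   # apparent Mr range from SDS-PAGE mobility [9]
IDP_RATIO_RANGE = (1.2, 1.8)     # typical apparent/real ratio reported for IDPs [7]


def apparent_ratio(apparent_mr: float, real_mr: float) -> float:
    """Return how many times larger the apparent Mr is than the real Mr."""
    return apparent_mr / real_mr


ratios = tuple(apparent_ratio(m, MR_SEQUENCE) for m in MR_SDS_PAGE)
print(f"apparent/real Mr: {ratios[0]:.2f} to {ratios[1]:.2f}")
```

The lower end of the resulting interval lies inside the 1.2–1.8 range and the upper end sits just above it, which is in line with the anomalous mobility described for IDPs.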
Analysis of physicochemical parameters
It has recently emerged that protein disorder tends to
be related to general chemical properties, rather than
to the abundance or scarcity of specific amino acids
[26]. Indeed, like early analyses of protein disorder that
were based on the reasoning that protein folding is
governed by a balance between hydrophobic forces
(attractive) and electrostatic forces between similarly
charged residues (repulsive) [23], disorder-oriented
predictors largely use physicochemical parameters,
such as hydrophobicity [24,27–33], the absolute value
of the net charge [24,27–29,33], Cα B-factors [24,27–29,32,34]
and number of contacts [35–38]. Accordingly,
we obtained preliminary information on the structural
preference of SV-IV by comparing values per residue
of these parameters with those of two protein data-
bases composed of ideally globular [35] and natively
unfolded [39] proteins, respectively. Visual inspection
of two-dimensional plots obtained by considering all
possible combinations of two parameters suggests that
SV-IV has a strong preference to conform to the gen-
eral structural features expected for IDPs, because in
no case do SV-IV data points fall in regions populated
by ordered proteins (Fig. 1).
General prediction analysis
Owing to increased interest in the structure–function
relationships of IDPs, disorder-related literature is
increasing, as witnessed by several recent reviews
[40–43]. To obtain reliable predictions, two general
options are presently available: (a) the combined use
of ab initio algorithms, such as a recent scheme based
on well-known predictors [23]; or (b) recent programs
with improved performance on some benchmarks, such
as those based on expected packing density [36–38] or
support vector machine (SVM) methods [44–46] (see
Materials and methods for further details). However,
as the SV-IV sequence comprises amino acid subsets
different from those previously found to occur in IDPs
[23,24] and does not resemble any known sequence
included in the DisProt database [25], it may be valu-
able to proceed with caution and investigate both
options.
The first procedure comprises a preliminary search
for low-complexity regions through the seg algorithm
[47], followed by a thorough analysis benefiting from
the combined use of several ab initio methods, such as
pondr (VSL1 and VL-XT) [24,27–29], hydrophobic
cluster analysis (hca) [30], prelink [31], globplot
[32], disembl [34], ronn [48], iupred [49], disopred2
[50] and norsp [51]. When applied to SV-IV, seg
identified a long non-globular region spanning the
entire sequence except for a few amino acids at the N- and
C-termini (amino acids 1–4 and 84–90, respectively).
Other structural peculiarities, such as disulfide-forming
cysteine residues, zinc fingers and leucine zippers [52],
are absent from the SV-IV sequence. On the functional
side, SV-IV is predicted to be a metal binding protein
[53], but the expected probability of correct classifica-
tion is about 60%, which is lower than the actual clas-
sification accuracy based on the analysis of 9932
positive and 45 999 negative samples of proteins [54].
The vast majority of the other methods also converged
to indicate an abundance of intrinsic disorder in
SV-IV, except for a few amino acids in the C-terminal region.
In particular, hydrophobic clusters, which are typical
of secondary structure elements, were almost totally
absent from the hca plot, and prelink predicted the
whole sequence as disordered. By contrast, some regu-
lar structure was predicted by X-ray-based algorithms,
such as various disembl routines and disopred2
(segments 31–39, 49–59 and 77–90), and discrepancies also
affected globplot analyses, depending on the particular
order–disorder propensity set chosen to obtain
predictions, but in no case were potential globular
domains predicted. When subjected to norsp, the SV-
IV protein did not appear to conform to criteria fixed
for identifying non-regular secondary structure
(NORS) regions, although about 70% of residues were
predicted to be in loopy regions. We suspect that no
NORS region can be predicted in SV-IV because the
recommended length of the sequence window used to
calculate the structural content (70 amino acids) is
close to the protein length (90 amino acids). Finally, a
vanishingly small probability of coiled-coil regions was
also predicted by multicoil [55] and coils [56] algo-
rithms (not shown). The above results are summarized
in Fig. 2.
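The remark about the norsp window length can be illustrated with a small sliding-window sketch. It assumes a per-residue secondary structure string encoded as 'H' (helix), 'E' (strand) and 'L' (loop), and applies only the structural-content criterion given in Materials and methods (regular structure below 12% over 70 consecutive residues); the exposure criterion is ignored, and the function name and the toy assignment are hypothetical, not norsp output.

```python
def nors_windows(ss: str, window: int = 70, max_structured: float = 0.12):
    """Return 0-based start indices of windows whose helix/strand content
    is below `max_structured`.

    `ss` uses 'H' (helix), 'E' (strand) and 'L' (loop); this encoding is an
    assumption of the sketch, not the norsp input format.
    """
    hits = []
    for start in range(len(ss) - window + 1):
        segment = ss[start:start + window]
        structured = sum(1 for c in segment if c in "HE") / window
        if structured < max_structured:
            hits.append(start)
    return hits


# Hypothetical 90-residue assignment with 70% loop and 30% helix
toy = ("L" * 7 + "HHH") * 9
n_windows = len(toy) - 70 + 1
print(len(toy), n_windows, nors_windows(toy))
```

For a 90-residue chain only 21 windows of 70 residues exist, and they overlap almost completely, so a protein with roughly 30% predicted regular structure spread along the chain cannot satisfy the criterion in any of them.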
Another set of predictions was performed using
algorithms that have been reported to predict protein
disorder more accurately than other methods, namely
the foldunfold predictor [36–38] and the SVM-based
poodle suite [44–46]. According to foldunfold,
SV-IV is probably fully disordered, because the aver-
age value of the disorder parameter over its sequence
is less than the disorder threshold. Moreover, the aver-
age value of the disorder parameter over regions 1–34,
36–57 and 59–80 is less than the disorder threshold
and the regions are greater than the reliable frame
(11 residues), which means that these regions are
predicted as fully disordered (Fig. 3A). Similarly,
poodle predictions suggest that: (a) the entire SV-IV
sequence corresponds to a long disorder region
(poodle-l); (b) a few residues (amino acids 39–40
and 85–90) do not belong to short disorder regions
(poodle-s); and (c) disorder characterizes the whole
protein because of the high disorder propensity of all
residues (poodle-w) (Fig. 3B).
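The foldunfold criterion described above (values below a disorder threshold over stretches longer than a reliable frame of 11 residues) can be sketched as a run-finding routine. This is a simplified illustration under stated assumptions: it works on a per-residue profile supplied by the caller, the threshold of 20.4 and the toy profile are hypothetical values chosen for the example, and it does not reproduce the actual foldunfold server.

```python
from typing import List, Tuple


def disordered_regions(profile: List[float], threshold: float,
                       min_length: int = 11) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where the profile stays
    below `threshold` for at least `min_length` consecutive residues."""
    regions = []
    start = None
    # A sentinel equal to the threshold closes a run that reaches the C-terminus.
    for i, value in enumerate(profile + [threshold]):
        if value < threshold and start is None:
            start = i
        elif value >= threshold and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    return regions


# Hypothetical profile of expected contacts per residue (not SV-IV data)
toy_profile = [19.0] * 15 + [21.0] * 3 + [19.0] * 5 + [21.0] * 2 + [19.0] * 12
print(disordered_regions(toy_profile, threshold=20.4))
```

In the toy profile the five-residue stretch below the threshold is discarded because it is shorter than the reliable frame, while the two longer stretches are reported.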
Other predictions
To complete our analysis, we verified whether or not
SV-IV possesses biased amino acid composition and
can be maximally separated from globular proteins.
Both features have been found to occur in IDPs. On
the first point, Weathers et al. [26,57] have recently
examined the contribution of various vectors to recog-
nizing proteins that contain disordered regions through
an SVM trained on naturally occurring disordered and
ordered proteins. They found that high recognition
accuracy can be obtained by an SVM that incorporates
only amino acid composition, and very good recogni-
tion accuracy was retained using reduced sets of amino
acids based on chemical similarity. Overall, this sug-
gests that composition alone and general physicochem-
ical properties, rather than specific amino acids, are
sufficient to accurately recognize disorder.

Fig. 1. Two-dimensional plots. The SV-IV datum (red symbol) is compared with the two sets of 90 natively unfolded and 80 ideally globular
proteins (black and grey symbols, respectively) using the mean values of physicochemical parameters computed from the sequence.
(A) Number of contacts versus hydrophobicity. (B) Number of contacts versus net charge. (C) Number of contacts versus Cα B-factors.
(D) Net charge versus hydrophobicity. (E) Net charge versus Cα B-factors. (F) Hydrophobicity versus Cα B-factors.

Fig. 2. Analysis of the SV-IV sequence using well-known predictors. The original graphic output of each method and the corresponding
interpretation are shown. In HCA, the protein sequence is shown on a duplicated α-helical net with hydrophobic clusters identified by solid
contours and amino acid numbers indicated on the top. Special symbols mark proline, glycine, threonine and serine residues.

We applied the SVM method to compare the SV-IV sequence with
the primary structures of 80 ideally folded and
90 natively unfolded proteins. Fig. 4A shows the mean
values of the disorder score for all of these proteins.
Although the regions covered by the two protein datasets
overlap to some extent, the SV-IV datum clearly
belongs to the region populated by natively unfolded
proteins. With regard to the second point, other
authors [35] have devised an optimal set of artificial
parameters for 20 amino acid residues by Monte Carlo
algorithm, by which they have obtained maximal sepa-
ration between sets of natively unfolded and ideally
globular proteins. Following the same rationale as
above, we compared the mean value of the artificial
parameter for SV-IV and the two sets of proteins.
Even in this case, the SV-IV datum unequivocally falls
amongst natively unfolded proteins, whose data points
are well separated from those of globular proteins
(Fig. 4B). Finally, Fig. 4C summarizes the results
obtained by other algorithms, such as dispro [58],
some additional methods not included in the pondr
package developed by Dunker et al. [59,60], and
drippred [61]. All of these algorithms agreed in
predicting that 100% of the amino acids in the SV-IV
sequence are disordered, except drippred, which
scored 32% of residues as regular structure.

Fig. 3. Analysis of the SV-IV sequence using improved performance programs. Graphic output of FOLDUNFOLD [36–38] (A) and POODLE [44–46]
(B) predictors. (A) Expected number of contacts versus residue position. (B) Disorder probability versus residue position.
Panel annotations: POODLE-S, aa 39–40 and 85–90 have borderline disorder (probability very close to 0.5) and the remaining
regions are predicted as disordered; POODLE-L, POODLE-W and FOLDUNFOLD, the whole protein is predicted as disordered.
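The composition-based recognition discussed above relies on representing each sequence as a vector of amino acid fractions, optionally collapsed into groups of chemically similar residues. The sketch below shows how such feature vectors could be computed; the grouping in `GROUPS` is a hypothetical example rather than the reduced alphabet of Weathers et al. [26,57], the toy peptide is not the SV-IV sequence, and the SVM training step itself is not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical grouping by chemical similarity, for illustration only
GROUPS = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}


def composition(seq: str) -> dict:
    """Fraction of each of the 20 standard amino acids in `seq`."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def reduced_composition(seq: str) -> dict:
    """Composition summed over the residue groups defined in GROUPS."""
    comp = composition(seq)
    return {name: sum(comp[a] for a in members)
            for name, members in GROUPS.items()}


toy_peptide = "SSKSEGRS"  # arbitrary toy peptide, not SV-IV
print(reduced_composition(toy_peptide))
```

A 20-dimensional (or, with the grouping, six-dimensional) vector of this kind is what a composition-only classifier would receive for each protein in the ordered and disordered training sets.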
Discussion
The structural information re-examined here indicates
that intrinsic disorder is abundant in SV-IV. Thus, it
was to be expected that homology modelling and
fold-recognition strategies would be unable to create
a theoretical model of the structural organization of
SV-IV [22]. Indeed, we have used several disorder
predictors to obtain novel evidence that the odd
behaviour of SV-IV is not compatible with the classi-
cal sequence–structure–function paradigm. Our predic-
tions suggest that: (a) the entire SV-IV sequence does
not encode any region with globular organization;
(b) a few isolated segments (mostly the C-terminal
region) may possess some regular structure; (c) the
prediction of regular structure almost exclusively
comes from methods based on Protein Data Bank
(PDB) missing coordinates (disembl routines, disopred2
and drippred) and secondary structure-derived
propensities (globplot with Deleage–Roux
and Russell–Linding parameters); and (d) the mean
physicochemical properties of SV-IV are typical of
IDPs, as suggested by methods based on visual
inspection. This could provide a clue for the clarifica-
tion of the still obscure aspects of the SV-IV struc-
ture–function relationships.
Lack of consensus affecting disorder prediction in
some regions of SV-IV may result from the different
sensitivity displayed by disorder predictors towards the
various functional properties that are encoded in sepa-
rate segments of the protein sequence. Indeed, integrity
of the primary structure was found to be necessary for
immunomodulation, whereas all of the procoagulant
and anti-inflammatory properties were located in the
fragment 1–70, which is devoid of any immunomodu-
latory activity, but possesses the same procoagulant
and anti-inflammatory activity as the native protein.
Moreover, the fragment 8–16 was the shortest N-terminal-derived
peptide that possessed equivalent
or slightly higher anti-inflammatory activity than
the native protein, but did not possess any
immunomodulatory or procoagulant activity. Finally, CNBr
cleavage of SV-IV at the single Met70 residue generated
the biologically inactive 71–90 peptide [16],
suggesting that the immunomodulatory properties of
SV-IV are strictly governed by the cooperation
between this and the 1–70 region.

Fig. 4. Additional predictions of disorder. Comparison of the SV-IV sequence with the primary structures of 90 natively unfolded and 80 ideally
globular proteins (same symbols as in Fig. 1) using the SVM method [26,57] (A) and an optimal set of artificial parameters [35] (B).
(A) Disorder score versus number of residues in protein. (B) Artificial parameters versus number of residues in protein.
(C) Results obtained by other algorithms:

Predictor            Disordered region
DISpro               1–90
VL2                  1–90
VL3, VL3H, VL3E      1–90
DRIPPRED             1–11, 18–47, 58–80

Concerning the organization of SV-IV, the results
reported here are in substantial agreement with pre-
vious secondary structure predictions, at least with
regard to the 1–70 region. In fact, the self-association
process that underlies the overall functional behav-
iour of the protein induces conformational changes
mainly in this region, which has been suggested to
be without secondary structure in the monomer, but
to contain some α-helix in the trimer [22]. However,
minor discrepancies amongst disorder predictions, as
well as between disorder and secondary structure
predictions, suggest that several peptide segments
within the protein sequence might display chameleon
structural behaviour. In this regard, previous experi-
ments in buffer solution [18] have shown that a
structural rearrangement of SV-IV takes place after
treatment with 0.2–6.0 mM SDS. As this interval
includes the critical micellar concentration of the
surfactant (2.6 mM) [62,63], it may be inferred that
SV-IV interacts with the membrane-like environment
of SDS micelles, either through direct formation of a
protein–surfactant complex or by an indirect process
in which the micelle is formed first and the protein
is then inserted into it. This process is totally differ-
ent from the non-specific massive cooperative binding
of SDS to proteins at submicellar concentrations,
and mimics the situation that SV-IV experiences in
most cell-based biological assays, where its multi-
faceted biological function involves efficient binding
to the plasma membrane of its target cells (macro-
phages, T lymphocytes and polymorphonuclear cells)
at specific sites (Kd ≈ 10⁻⁷–10⁻⁸) [16], and can be
obtained only through large plasticity of the
structure.

Materials and methods
Protein databases
The database of disordered proteins was created using a list
of natively unfolded proteins [39] and the SWISS-PROT
protein sequence data bank [64]. The ideal database of
globular proteins is available at the address
http://phys.protres.ru/resources/folded_80.html [35,37], as selected by
inspecting the four general classes in the SCOP database
(1.63 release) [65].
Physicochemical parameters
The mean protein hydrophobicity was calculated using the
Kyte–Doolittle Scale [66], rescaled to a range of 0–1 [33].
The expected average number of contacts per residue in the
globular state was calculated according to [35]. The mean
net charge was defined as the absolute value of the differ-
ence between the numbers of positively and negatively
charged residues at pH 7.0, divided by the total residue
number, according to [39]. The average structural B-factor
(isotropic temperature factor) scale (2.0 SD) was obtained
from [32], where only the B-factors for the Cα atoms were
considered to minimize influence by crystal packing and
other structural artefacts.
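Two of the parameters defined above are simple enough to sketch directly. The example below computes the mean hydrophobicity from the Kyte–Doolittle scale and the mean net charge as defined in the text. It assumes a linear rescaling of the scale from its extremes (−4.5 and 4.5) to 0 and 1, and counts Lys and Arg as positive and Asp and Glu as negative at pH 7.0, with His treated as neutral; both choices are assumptions of this sketch, and the toy sequence is not SV-IV.

```python
# Kyte-Doolittle hydropathy values [66]
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq: str) -> float:
    """Mean hydropathy after linear rescaling (-4.5 -> 0, 4.5 -> 1).

    The linear rescaling is an assumption of this sketch.
    """
    scaled = [(KYTE_DOOLITTLE[a] + 4.5) / 9.0 for a in seq]
    return sum(scaled) / len(scaled)


def mean_net_charge(seq: str) -> float:
    """|positive - negative| / length, with K, R positive and D, E negative.

    Treating His as neutral at pH 7.0 is an assumption of this sketch.
    """
    positive = sum(seq.count(a) for a in "KR")
    negative = sum(seq.count(a) for a in "DE")
    return abs(positive - negative) / len(seq)


toy_sequence = "KKRDEAIL"  # arbitrary toy sequence, not SV-IV
print(mean_hydrophobicity(toy_sequence), mean_net_charge(toy_sequence))
```

Each protein then contributes one point to a two-dimensional plot such as net charge versus hydrophobicity.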
Predictors of disorder
Below, we list all predictors used in this study, pointing out
their salient features. A detailed description of each predic-
tor is outside the scope of this paper, and the reader inter-
ested in more details is invited to refer to the relevant
article(s). The seg algorithm ( />METHODS/seg.server.html), based on the rationale that
compact globular structures exhibit quasi-random statistical
properties, is designed to detect regions of biased amino

acid composition using mathematically defined properties
[47]. The stringency of the search for low-complexity
segments is determined by three user-defined parameters
[trigger window, W; trigger complexity, K(1); extension
complexity, K(2)], using the seg sequences 45, 3.4, 3.75 and
25, 3.0, 3.3 for long and short non-globular domains,
respectively. Predictors of natural disordered regions
(PONDRs) included in the pondr collection
(http://www.pondr.com) are typically feed-forward neural
networks trained on non-redundant sets of ordered and
disordered sequences that help to ensure modest predictor biases
and to enable the predictors to generalize to new sequences
[27–29]. PONDRs come in several versions depending on
the sequence attributes taken over windows of 9–21 amino
acids. These attributes, such as the fractional composition
of particular amino acids, hydropathy or sequence com-
plexity, are averaged over these windows, and the values
are used to train the neural network during predictor con-
struction. The same values are used as inputs to make pre-
dictions. The regional order neural network (ronn)
software, originally developed to identify protease cleavage
sites, is a method based on sequence alignment available at
[48]. The iupred server
at estimates favourable pairwise con-
tacts in protein sequences and assigns order ⁄ disorder status
based on the assumption that intrinsically unstructured ⁄
disordered proteins and domains (IUPs) have special
sequences that do not fold because of their inability to
form sufficient stabilizing inter-residue interactions [49].
The disembl software available at is
based on artificial neural networks trained to assign disor-
der by using three different definitions of disorder: residues
within loops ⁄ coils, residues within loops with a high degree
of mobility as determined from X-ray temperature factors
(B-factors), and residues with PDB missing coordinates as
defined by REMARK 465 entries in PDB [34]. The disopred2
disorder prediction server at />disopred restrains the definition of disorder to those resi-
dues that appear in the sequence records but with coordi-
nates missing from the electron density map, and an SVM
was trained to specifically recognize these [50]. globplot
() is a web service based on the ten-
dency of residues to be in an ordered or disordered state,
and uses different propensity sets based on amino acid
hydrophobicities (Kyte–Doolittle and Hopp–Woods), B-fac-
tors, PDB missing coordinates and secondary structure-
derived propensities (Deleage–Roux and Russell–Linding)
[32]. norsp is an on-line predictor of NORS regions that is
not trained on any dataset and predicts segments in which
the content in regular secondary structure is below
12% over at least 70 consecutive residues, and at least
10 consecutive residues are predicted to be exposed. It can
be accessed at />NORSp [51]. The identification of hydrophobic clusters was
performed by hca available at ,
which allows the easy identification of globular regions
from non-globular ones and, in globular regions, the
identification of secondary structures [30]. prelink
(http://genomics.eu.org/spip/PreLink) is an hca-derived method
that calculates the amino acid distributions in structured
and unstructured regions, the probability that a given
sequence fragment is part of either a structured or an
unstructured region, and the distance of each amino acid to
the nearest hydrophobic cluster. Using these three values
along a protein sequence, unstructured regions can be pre-
dicted with very simple rules [31]. The multicoil program
( />predicts the location of coiled-coil regions in amino acid
sequences and classifies the predictions as dimeric or tri-
meric [55]. coils ( />form.html) is a program that compares a sequence with a
database of known parallel two-stranded coiled-coils and
derives a similarity score. By comparing this score with the
distribution of scores in globular and coiled-coil proteins,
the program then calculates the probability that the
sequence will adopt a coiled-coil conformation [56].
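The idea behind the seg search for low-complexity regions can be illustrated with a window-based entropy scan. This is a deliberately simplified sketch: it flags windows whose Shannon entropy falls at or below a trigger value and omits the extension step governed by K(2), so it is not the seg algorithm itself. The function names, the toy sequence and the small window and trigger values are hypothetical; the study used 45, 3.4, 3.75 and 25, 3.0, 3.3.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition."""
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(segment).values())


def low_complexity_windows(seq: str, window: int, trigger: float):
    """0-based start indices of windows whose entropy is <= trigger."""
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) <= trigger]


# Toy sequence: a serine run followed by twelve different residues
toy_seq = "S" * 12 + "ACDEFGHIKLMN"
print(low_complexity_windows(toy_seq, window=12, trigger=2.2))
```

Windows dominated by the serine run have low entropy and are flagged, whereas windows covering mostly distinct residues approach the maximum of log2(12) bits and are not.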
Predictions with improved performance were carried out
by the foldunfold web server available at
http://skuld.protres.ru/~mlobanov/ogu/ogu.cgi, based on the
observation that disorder is connected to a weak expected packing
density, as evaluated by the observed number of contacts
within 8 Å for each amino acid residue in the globular state
[35–38], and the SVM-based poodle (prediction of order
and disorder by machine learning, />poodle) system. The poodle suite predicts protein disorder
from amino acid sequences and provides three types of pre-
dictions: poodle-l and poodle-s predict long disorder
regions (mainly longer than 40 consecutive amino acids) and
short disorder regions, respectively; poodle-w is for binary
prediction of whole protein disorder [44–46].

Another SVM method for recognizing IDPs was applied
according to the procedure described in [26,57], using the
mySVM implementation of SVM theory by Rüping [67].
The set of artificial parameters for 20 amino acid residues
calculated by the Monte Carlo algorithm to maximally sep-
arate natively unfolded and ideally globular proteins was
obtained from [35]. Additional predictions were performed
by: dispro software ( />html), which relies on machine learning methods and lever-
ages evolutionary information as well as predicted second-
ary structure and relative solvent accessibility [58]; the VL2
and VL3 predictors available at />disprot/predictor.php, which rely on partitioning protein
disorder into flavours based on competition amongst
increasing numbers of predictors [59] and on an ensemble
of feed-forward neural networks based on the same attri-
butes as VL2 [60], respectively; and the drippred server
( developed for
sequence profile visualization and contact map prediction,
which predicts structural disorder by looking for sequence
patterns that are not typically found in the PDB [61].
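
For the SVM-based recognition of IDPs mentioned at the start of this paragraph, one possible input representation is the amino acid composition of the sequence. The sketch below assumes that representation together with a linear decision function. The function names, weights and bias are hypothetical and stand in for a model trained with mySVM or a comparable package. They are not taken from [26,57].

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 standard residues in seq."""
    seq = seq.upper()
    counted = [aa for aa in seq if aa in AMINO_ACIDS]
    n = len(counted)
    if n == 0:
        raise ValueError("sequence has no standard residues")
    return [counted.count(aa) / n for aa in AMINO_ACIDS]

def linear_decision(features, weights, bias=0.0):
    """Signed score of a linear classifier; positive is read as 'disordered'."""
    return sum(f * w for f, w in zip(features, weights)) + bias
```

Non-standard characters are ignored when the fractions are computed, so the 20 values always sum to 1.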
Acknowledgements
This paper is dedicated to the memory of the unforget-
table Harold C. Helgeson (a.k.a. Hal), founder of the
Laboratory of Theoretical Geochemistry and Biogeo-
chemistry at U. C. Berkeley (a.k.a. Prediction Central),
who is probably sailing off the coast near Margarita-
ville. The authors are grateful to V. N. Uversky for his
help in creating the list of natively unfolded proteins.
References

1 Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM &
Obradovic Z (2002) Intrinsic disorder and protein func-
tion. Biochemistry 41, 6573–6582.
2 Wright PE & Dyson HJ (1999) Intrinsically unstruc-
tured proteins: re-assessing the protein structure–func-
tion paradigm. J Mol Biol 293, 321–331.
3 Dyson HJ & Wright PE (2005) Intrinsically unstruc-
tured proteins and their functions. Nat Rev Mol Cell
Biol 6, 197–208.
4 Dunker AK & Obradovic Z (2001) The protein trinity –
linking function and disorder. Nat Biotechnol 19, 805–
806.
5 Uversky VN (2002) Natively unfolded proteins: a point
where biology waits for physics. Protein Sci 11, 739–
756.
S. Vilasi and R. Ragone Intrinsic disorder in SV-IV
FEBS Journal 275 (2008) 763–774 © 2008 The Authors Journal compilation © 2008 FEBS 771
6 Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic
Z, Uversky VN & Dunker AK (2007) Intrinsic disorder
and functional proteomics. Biophys J 92, 1439–1456.
7 Tompa P (2002) Intrinsically unstructured proteins.
Trends Biochem Sci 27, 527–533.
8 Metafora S, Esposito C, Caputo I, Lepretti M, Cassese
D, Dicitore A, Ferranti P & Stiuso P (2007) Seminal
vesicle protein IV and its derived active peptides: a pos-
sible physiological role in seminal clotting. Semin
Thromb Hemost 33, 53–59.
9 Ostrowski MC, Kistler MK & Kistler WS (1979) Purifi-
cation and cell-free synthesis of a major protein from
rat seminal vesicle secretion. A potential marker for
androgen action. J Biol Chem 254, 383–390.
10 Pan Y-CE & Li SSL (1982) Structure of secretory pro-
tein IV from rat seminal vesicles. Int J Pept Protein Res
20, 177–187.
11 Harris SE, Mansson P-E, Tully DB & Burkhart B
(1983) Seminal vesicle secretion IV gene: allelic differ-
ence due to a series of 20-base-pair direct tandem
repeats within an intron. Proc Natl Acad Sci USA 80,
6460–6464.
12 Kandala C, Kistler MK, Lawther RP & Kistler WS
(1983) Characterization of a genomic clone for rat semi-
nal vesicle secretory protein IV. Nucleic Acids Res 11,
3169–3186.
13 McDonald C, Williams L, McTurck P, Fuller F,
McIntosh E & Higgins S (1983) Isolation and charac-
terisation of genes for androgen-responsive secretory
proteins of rat seminal vesicles. Nucleic Acids Res 11,
917–930.
14 D’Ambrosio E, Del Grosso N, Ravagnan G, Peluso G
& Metafora S (1993) Cloning and expression of the rat
genomic DNA sequence coding for the secreted form of
the protein SV-IV. Bull Mol Biol Med 18, 215–223.
15 Metafora S, Facchiano F, Facchiano A, Esposito C,
Peluso G & Porta R (1987) Homology between rabbit
uteroglobin and the rat seminal vesicle sperm binding
protein: prediction of structural features of glutamine
substrates for transglutaminase. J Protein Chem 6,
353–359.
16 Ialenti A, Santagada V, Caliendo G, Severino B,
Fiorino F, Maffia P, Ianaro A, Morelli F, Di Micco B,
Cartenì M et al. (2001) Synthesis of novel anti-inflammatory
peptides derived from the amino-acid sequence
of the bioactive protein SV-IV. Eur J Biochem 268,
3399–3406.
17 Miele L, Cordella-Miele E, Facchiano A & Mukherjee
AB (1988) Novel anti-inflammatory peptides from the
region of highest similarity between uteroglobin and
lipocortin I. Nature 335, 726–730.
18 Stiuso P, Ragone R, De Santis A, Metafora S, Peluso
G, Ravagnan G & Colonna G (1989) Structural
properties of rat seminal vesicle protein IV: effect of
sodium dodecylsulfate. In Biochemical Aspects on the
Immunopathology of Reproduction (Spera G, Mukherjee
AB, Ravagnan G & Metafora S, eds), pp. 105–111.
Acta Medica, Rome.
19 Ragone R, Facchiano F, Facchiano A, Facchiano AM
& Colonna G (1989) Flexibility plot of proteins. Protein
Eng 2, 497–504.
20 Stiuso P, Metafora S, Facchiano AM, Colonna G &
Ragone R (1999) The self association of protein SV-IV
and its possible functional implications. Eur J Biochem
266, 1029–1035.
21 Tufano MA, Porta R, Farzati B, Di Pierro P,
Rossano F, Catalanotti P, Baroni A & Metafora S
(1996) Rat seminal vesicle protein SV-IV and its
transglutaminase-synthesized polyaminated derivative
Spd2-SV-IV induce cytokine release from human resting
lymphocytes and monocytes in vitro. Cell Immunol
168, 148–157.
22 Caporale C, Caruso C, Colonna G, Facchiano A, Ferr-
anti P, Mamone G, Picariello G, Colonna F, Metafora
S & Stiuso P (2004) Structural properties of the protein
SV-IV. Eur J Biochem 271, 263–271.
23 Ferron F, Longhi S, Canard B & Karlin D (2006) A
practical overview of protein disorder prediction meth-
ods. Proteins 65, 1–14.
24 Romero P, Obradovic Z, Li X, Garner EC, Brown CJ
& Dunker AK (2001) Sequence complexity of dis-
ordered protein. Proteins 42, 38–48.
25 Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese
MS, Tantos A, Szabo B, Tompa P, Chen J, Uversky
VN et al. (2007) DisProt: the database of disordered
proteins. Nucleic Acids Res 35, D786–793.
26 Weathers EA, Paulaitis ME, Woolf TB & Hoh JH
(2004) Reduced amino acid alphabet is sufficient to
accurately recognize intrinsically disordered protein.
FEBS Lett 576, 348–352.
27 Romero P, Obradovic Z & Dunker AK (1997) Sequence
data analysis for long disordered regions prediction in
the calcineurin family. Genome Inform 8, 110–124.
28 Li X, Romero P, Rani M, Dunker AK & Obradovic Z
(1999) Predicting protein disorder for N-, C-, and inter-
nal regions. Genome Inform 10, 30–40.
29 Obradovic Z, Peng K, Vucetic S, Radivojac P & Dunker AK
(2005) Exploiting heterogeneous sequence properties
improves prediction of protein disorder. Proteins
61 (Suppl. 7), 176–182.
30 Gaboriaud C, Bissery V, Benchetrit T & Mornon JP
(1987) Hydrophobic cluster analysis: an efficient new
way to compare and analyse amino acid sequences.
FEBS Lett 224, 149–155.
31 Coeytaux K & Poupon A (2005) Prediction of unfolded
segments in a protein sequence based on amino acid
composition. Bioinformatics 21, 1891–1900.
32 Linding R, Russell RB, Neduva V & Gibson TJ (2003)
GlobPlot: exploring protein sequences for globularity
and disorder. Nucleic Acids Res 31, 3701–3708.
33 Prilusky J, Felder CE, Zeev-Ben-Mordehai T, Rydberg
EH, Man O, Beckmann JS, Silman I & Sussman JL
(2005) FoldIndex: a simple tool to predict whether a
given protein sequence is intrinsically unfolded. Bioin-
formatics 21, 3435–3438.
34 Linding R, Jensen LJ, Diella F, Bork P, Gibson TJ &
Russell RB (2003) Protein disorder prediction: implica-
tions for structural proteomics. Structure 11, 1453–1459.
35 Garbuzynskiy SO, Lobanov MY & Galzitskaya OV
(2004) To be folded or to be unfolded? Protein Sci 13,
2871–2877.
36 Galzitskaya OV, Garbuzynskiy SO & Lobanov MY
(2006) FoldUnfold: web server for the prediction of
disordered regions in protein chain. Bioinformatics 22,
2948–2949.
37 Galzitskaya OV, Garbuzynskiy SO & Lobanov MY
(2006) Prediction of natively unfolded regions in protein
chain. Mol Biol (Moscow) 40, 341–348.
38 Galzitskaya OV, Garbuzynskiy SO & Lobanov MY
(2006) Prediction of amyloidogenic and disordered
regions in protein chains. PLoS Comput Biol 2, 1639–
1648.
39 Uversky VN, Gillespie JR & Fink AL (2000) Why are
‘natively unfolded’ proteins unstructured under physio-
logic conditions? Proteins 41, 415–427.
40 Bourhis JM, Canard B & Longhi S (2007) Predicting
protein disorder and induced folding: from theoretical
principles to practical applications. Curr Protein Pept
Sci 8, 135–149.
41 Quevillon-Cheruel S, Leulliot N, Gentils L, van Tilbe-
urgh H & Poupon A (2007) Production and crystalliza-
tion of protein domains: how useful are disorder
predictions? Curr Protein Pept Sci 8, 151–160.
42 Dosztányi Z, Sándor M, Tompa P & Simon I (2007)
Prediction of protein disorder at the domain level. Curr
Protein Pept Sci 8, 161–171.
43 Csizmók V, Dosztányi Z, Simon I & Tompa P (2007)
Towards proteomic approaches for the identification of
structural disorder. Curr Protein Pept Sci 8, 173–179.
44 Hirose S, Shimizu K, Kanai S, Kuroda Y & Noguchi
T (2007) POODLE-L: a two-level SVM prediction
system for reliably predicting long disordered regions.
Bioinformatics 23, 2046–2053.
45 Shimizu K, Hirose S & Noguchi T (2007) POODLE-S:
web application for predicting protein disorder by using
physicochemical features and reduced amino acid set of
a position specific scoring matrix. Bioinformatics 23,
2337–2338.
46 Shimizu K, Muraoka Y, Hirose S, Tomii K & Noguchi
T (2007) Predicting mostly disordered proteins by using
structure-unknown protein data. BMC Bioinformatics 8,
78.
47 Wootton JC (1994) Non-globular domains in protein
sequences: automated segmentation using complexity
measures. Comput Chem 18, 269–285.
48 Yang ZR, Thomson R, McNeil P & Esnouf RM (2005)
ronn: the bio-basis function neural network technique
applied to the detection of natively disordered regions
in proteins. Bioinformatics 21, 3369–3376.
49 Dosztányi Z, Csizmók V, Tompa P & Simon I (2005)
IUPred: web server for the prediction of intrinsically
unstructured regions of proteins based on estimated
energy content. Bioinformatics 21, 3433–3434.
50 Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF & Jones
DT (2004) Prediction and functional analysis of native
disorder in proteins from the three kingdoms of life.
J Mol Biol 337, 635–645.
51 Liu J & Rost B (2003) NORSp: predictions of long
regions without regular secondary structure. Nucleic
Acids Res 31, 3833–3835.
52 Bornberg-Bauer E, Rivals E & Vingron M (1998) Com-
putational approaches to identify leucine zippers.
Nucleic Acids Res 26, 2740–2746.
53 Lin HH, Han LY, Zhang HL, Zheng CJ, Xie B,
Cao ZW & Chen YZ (2006) Prediction of the functional
class of metal-binding proteins from sequence derived
physicochemical properties by support vector machine
approach. BMC Bioinformatics 7 (Suppl. 5), S13.
54 Cai CZ, Han LY, Ji ZL, Chen X & Chen YZ (2003)
SVM-Prot: web-based support vector machine software
for functional classification of a protein from its pri-
mary sequence. Nucleic Acids Res 31, 3692–3697.
55 Wolf E, Kim PS & Berger B (1997) multicoil: a pro-
gram for predicting two- and three-stranded coiled coils.
Protein Sci 6, 1179–1189.
56 Lupas A, Van Dyke M & Stock J (1991) Predicting
coiled coils from protein sequences. Science 252, 1162–
1164.
57 Weathers EA, Paulaitis ME, Woolf TB & Hoh JH
(2007) Insights into protein structure and function from
disorder-complexity space. Proteins 66, 16–28.
58 Cheng J, Sweredoski M & Baldi P (2005) Accurate pre-
diction of protein disordered regions by mining protein
structure data. Data Min Knowl Disc 11, 213–222.
59 Vucetic S, Brown CJ, Dunker AK & Obradovic Z
(2003) Flavors of protein disorder. Proteins 52, 573–584.
60 Obradovic Z, Peng K, Vucetic S, Radivojac P, Brown CJ
& Dunker AK (2003) Predicting intrinsic
disorder from amino acid sequence. Proteins 53 (Suppl.
6), 566–572.
61 MacCallum RM (2004) Striped sheets and protein con-
tact prediction. Bioinformatics 20 (Suppl. 1), I224–I231.
62 Esposito C, Colicchio P, Facchiano A & Ragone R
(1998) Effect of a weak electrolyte on the critical micel-
lar concentration of sodium dodecyl sulfate. J Colloid
Interface Sci 200, 310–312.
63 Ambrosone L & Ragone R (1998) The interaction of
micelles with added species and its similarity to the
denaturant binding model of proteins. J Colloid Inter-
face Sci 205, 454–458.
64 Bairoch A & Apweiler R (2000) The SWISS-PROT
protein sequence database and its supplement
TrEMBL in 2000. Nucleic Acids Res 28, 45–48.
65 Murzin AG, Brenner SE, Hubbard T & Chothia C
(1995) SCOP: a structural classification of protein
database for the investigation of sequences and struc-
tures. J Mol Biol 247, 536–540.
66 Kyte J & Doolittle RF (1982) A simple method for dis-
playing the hydropathic character of a protein. J Mol
Biol 157, 105–132.
67 Rüping S (2000) MySVM-Manual. University of
Dortmund, Germany. Lehrstuhl Informatik 8,
http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM
Accessed on 29 October 2007.