Site-directed enzymatic PEGylation of the human
granulocyte colony-stimulating factor
Carlo Maullu1,*, Domenico Raimondo2,3,*, Francesca Caboi1, Alejandro Giorgetti2,4, Mauro Sergi1,
Maria Valentini2, Giancarlo Tonon1 and Anna Tramontano2,3,5
1 Bio-Ker S.r.l., c/o Sardegna Ricerche Scientific Park, Pula, Cagliari, Italy
2 CRS4-Bioinformatics Laboratory, c ⁄ o Sardegna Ricerche Scientific Park, Pula, Cagliari, Italy
3 Department of Biochemical Sciences ‘A. Rossi Fanelli’, University of Rome ‘La Sapienza’, Italy
4 Department of Biotechnology, University of Verona, Italy
5 Pasteur Institute–Cenci Bolognetti Foundation, University of Rome ‘La Sapienza’, Italy
Introduction
The conjugation of poly(ethylene glycol) (PEG) chains,
termed PEGylation, is a useful methodology for drug
development that is widely used for the modification
of proteins, peptides, and oligonucleotides [1,2].
PEG is a noncharged, highly hydrophilic polymer
that has been demonstrated to be nontoxic when its
molecular mass is higher than 1000 Da, and its use for
conjugation has been approved by the US Food and
Drug Administration [3]. The PEGylation of pharma-
ceuticals, such as liposomes and therapeutic proteins,
has been shown to be an effective strategy for
improvement of the biopharmaceutical properties of
drugs. PEG–drug conjugates have several advantages:
increased stability and water solubility, increased resis-
Keywords
molecular dynamics; PEGylation;
protein–protein docking; site-directed
mutagenesis; transglutamination
Correspondence
A. Tramontano, Department of Biochemical
Sciences ‘A. Rossi Fanelli’, University of
Rome ‘La Sapienza’, P.le Aldo Moro, 5,
00185 Rome, Italy
Fax: +39 06 4440062
Tel: +39 06 49910556
*These authors contributed equally to this work
(Received 12 July 2009, revised 14
September 2009, accepted 16 September
2009)
doi:10.1111/j.1742-4658.2009.07387.x
Poly(ethylene glycol) (PEG) is a widely used polymer employed to increase
the circulating half-life of proteins in blood and to decrease their immuno-
genicity and antigenicity. PEG attaches to free amines, typically at lysine
residues or at the N-terminal amino acid. This lack of selectivity can pres-
ent problems when a PEGylated protein therapeutic is being developed,
because predictability of activity and manufacturing reproducibility are
needed for regulatory approval. Enzymatic modification of proteins is one
route to overcome this limitation. Bacterial transglutaminases are enzyme
candidates for site-specific modification, but they also have rather broad
specificity. The need arises to be able to predict a priori potential PEGyla-
tion sites on the protein of interest and, especially, to be able to design
mutants where unique PEGylation sites can be introduced when needed.
We investigated the feasibility of a computational approach to the prob-
lem, using human granulocyte colony-stimulating factor as a test case. The
selected protein is therapeutically relevant and represents a challenging
problem, as it contains 17 potential PEGylation sites. Our results show that
a combination of computational methods allows the identification of the
specific glutamines that are substrates for enzymatic PEGylation by a
microbial transglutaminase, and that it is possible to rationally modify the
protein and introduce PEG moieties at desired sites, thus allowing the
selection of regions that are unlikely to interfere with the biological activity
of a therapeutic protein.
Abbreviations
G-CSF, granulocyte colony-stimulating factor; MD, molecular dynamics; mPEG, monomethoxy-poly(ethylene glycol); MTGase, microbial
transglutaminase; PEG, poly(ethylene glycol); RMSF, root mean squared fluctuation.
FEBS Journal 276 (2009) 6741–6750 © 2009 The Authors Journal compilation © 2009 FEBS 6741
tance to proteolytic inactivation, low toxicity,
improved pharmacokinetic profiles, and reduced renal
clearance and immunogenicity [4,5].
Thanks to these favorable properties, PEGylation
plays an important role in drug delivery, enhancing the
potential of peptides and proteins as therapeutic
agents.
PEGylation was first described in the 1970s by
Davis and Abuchowski, and was reported in two
key papers on albumin and catalase modification
[6,7]. Since then, the procedure of PEGylation has
been expanded, and a wide range of chemical
and enzymatic methods for conjugation have been
developed.
The most widely used modification method for pro-
tein PEGylation involves the covalent conjugation of
activated monomethoxy-PEG (mPEG) at the level of
the ε-amino group of lysine residues by using acylating
mPEG derivatives. This strategy has limitations,
because of the potential multiple sites of conjugation
and the consequent heterogeneity of the PEGylated
proteins. The purification of these mixtures is usually
difficult, and this reduces the predictability of their
activity and manufacturing reproducibility needed for
regulatory approval.
The requirements for the approval of new conju-
gates are very stringent, and obtaining a single
isomer, whenever possible, or at least a well-characterized
mixture of mono-PEGylated isomers is compulsory.
Examples are the two α-interferon conjugates,
Pegasys [8] and PEG-Intron [9], for which almost all
the binding sites in the primary sequence were charac-
terized.
In order to obtain site-specific PEGylation, other
chemical approaches were developed, such as the
selective PEGylation at the level of the thiol group of
cysteines or at the N-terminal amino group of a poly-
peptide chain [10,11]. More recently, a very promising
enzymatic method has been proposed that makes use
of the transglutaminase enzyme for the covalent linkage
of PEG moieties at the γ-carboxamide groups of
glutamines of proteins [12,13]. For this purpose, an
mPEG derivative bearing a primary amino group is
used (mPEG-NH2); this becomes covalently linked to
the protein at glutamines through a transglutamination
reaction catalyzed by the enzyme according to the
following scheme:

$$\text{protein-CONH}_2 + \text{H}_2\text{N-R} \longrightarrow \text{protein-CONH-R} + \text{NH}_3$$

where CONH2 is a carboxamide group of glutamine
side chains, and R is an mPEG molecule.
In this work, we investigated the molecular basis of
enzymatic conjugation of PEG molecules to glutamines
by a microbial transglutaminase enzyme (MTGase) deri-
ved from a variant of Streptoverticillium mobaraense.
The granulocyte colony-stimulating factor (G-CSF) was
used as substrate. It is a challenging case, because it con-
tains 17 potential PEGylation sites and, at the same
time, an important target, as it acts in hematopoiesis by
controlling the production, differentiation and function
of granulocytes. It is pharmaceutically available under
the names Neupogen or Granulokine (produced by
Escherichia coli cells; Amgen, Thousand Oaks, CA,
USA ⁄ Roche, Nutley, NJ, USA) and Granocyte (pro-
duced in mammalian cells; Rhone-Poulenc, Rorer,
Cologne, France), and is used to treat neutropenia, a
disorder characterized by an extremely low number of
neutrophils in blood. Although widely used, G-CSF is
rapidly removed from the body by a combination of
renal and active neutrophil clearance processes. As a
result, for most practical purposes, repeated injections
or continuous infusion of G-CSF are necessary to gener-
ate sufficiently elevated neutrophil and mobilized pro-
genitor ⁄ stem cell levels in the peripheral blood [14]. For
this reason, the PEGylation of G-CSF, and ⁄ or design of
new variants with longer circulation times, together with
a thorough characterization of the mechanism underly-
ing the process of PEGylation, are essential steps for the
design of new and more effective therapeutic proteins.
We report here a computational approach aimed at
identifying the glutamines modified by the enzyme. We
used three-dimensional structural analysis, molecular
dynamics (MD) simulations, and protein–protein dock-
ing calculations. All of these approaches allowed us to
identify a single potential PEGylation site in the G-CSF
molecule, a prediction that was subsequently validated
by site-directed mutagenesis experiments, PEGylation
experiments, and analytical characterization of PEGylated
G-CSF by peptide mapping and N-terminal sequence
analysis. All of the data obtained from these experi-
ments confirmed our computational results on the iden-
tification of a single G-CSF residue that is the target of
PEGylation modification by MTGase. Moreover, the
characterization of the dynamic properties of the
G-CSF region involved in the transglutamination pro-
cess was also demonstrated to be useful for the design of
mutants with different PEGylation properties.
Results
G-CSF sequence and structure analysis
The G-CSF primary structure (UniProtKB ⁄ Swiss-Prot
accession code: P09919) includes 17 glutamines that, in
PEGylation of the G-CSF molecule by MTGase enzyme C. Maullu et al.
principle, are candidates for transglutamination by
MTGase (Fig. 1).
Our aim was to identify the G-CSF reactive gluta-
mine(s) involved in the transglutamination process,
under the assumptions that they are exposed to the
solvent, highly flexible, and in a region that can
undergo favorable interactions with the enzyme active
site. As a first step, we evaluated the accessible surface
area of each of these 17 glutamines. Glutamines were
considered to be buried when < 25% of their total
area was exposed to solvent: there are eight glutamines
satisfying this condition (Table 1).
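The exposure filter described above can be reproduced directly from the values in Table 1. The following is a minimal sketch in Python; the names EXPOSURE, BURIED_CUTOFF and split_by_exposure are illustrative and not part of the original analysis, which relied on MOLMOL and the Scit web server to compute the areas.

```python
# Percentage of solvent exposure of the 17 glutamines, copied from Table 1.
EXPOSURE = {
    11: 44.18, 20: 15.66, 25: 22.51, 32: 22.34, 67: 25.53, 70: 80.74,
    77: 19.32, 86: 12.81, 90: 48.65, 107: 14.34, 119: 59.5, 120: 14.62,
    131: 69.65, 134: 32.57, 145: 28.71, 158: 19.5, 173: 72.04,
}
BURIED_CUTOFF = 25.0  # percent; threshold stated in the text


def split_by_exposure(exposure, cutoff=BURIED_CUTOFF):
    """Return (buried, exposed) lists of residue numbers for a given cutoff."""
    buried = sorted(r for r, pct in exposure.items() if pct < cutoff)
    exposed = sorted(r for r, pct in exposure.items() if pct >= cutoff)
    return buried, exposed


buried, exposed = split_by_exposure(EXPOSURE)
print("buried: ", buried)
print("exposed:", exposed)
```

With the 25% threshold, eight glutamines fall in the buried class and nine remain as exposed candidates for the subsequent secondary structure filter.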
The substrate specificity of MTGase is rather broad
[15,16]. In general, broad specificity requires flexibility
of the substrate, which is expected to be able to adapt
to the enzyme conformation. It follows that the site of
PEGylation should not be part of regular secondary
structure elements [13,17], and this latter requirement
reduced our candidate list to five glutamines: Gln11,
Gln67, Gln70, Gln131, and Gln134. Incidentally,
Gln131 and Gln134 are very close to Thr133, which is
the glycosylation site of natural G-CSF, confirming
that they are accessible and potentially more reactive.
Note that the nonglycosylated recombinant protein
expressed in E. coli is active, and therefore, even if
transglutamination impaired glycosylation at the
neighboring site, the protein function should not be
affected.
Both the glycosylated Thr133 and the five candidate
glutamines, Gln11, Gln67, Gln70, Gln131, and Gln134,
are very well conserved among different species
(Fig. 1B). In humans, there are four splicing variants of
G-CSF annotated in Ensembl (G-CSF Ensembl accession
code: ENSG00000108342), although only two of
them are annotated in the UniProtKB database. They
differ in the N-terminal region of the protein, which is
far away both in sequence and structure from the
glycosylation and putative transglutamination sites,
which are conserved in all of them.
MD simulations
Carefully performed MD simulations can highlight
flexible regions of proteins. We performed two differ-
ent 10 ns MD simulations on the wild-type G-CSF
monomeric subunit and on a G-CSF structure in
which we made two single amino acid substitutions
(P132Q and Q134N). We selected P132Q and Q134N
mutations in order to build a molecule with different
transglutamination properties. Removal of Pro132
could lead to increased local flexibility of Gln131,
making it an appropriate substrate for transglutamina-
tion, whereas the Q134N mutation would remove the
putative transglutamination site of the wild-type mole-
cule. The MD simulation for the double mutant
P132Q ⁄ Q134N (defined as Mut4) was run under the
same conditions used for the wild-type protein.
Fig. 1. (A) G-CSF protein sequence as reported in the Protein Data
Bank entry 2D9Q SEQRES records. Secondary structure elements
are marked above the sequence. The positions of the 17 gluta-
mines present in the wild-type G-CSF are in blue boxes. Glutamines
showing high structural flexibility in the MD experiments are indi-
cated by stars. (B) A sequence logo representation [35] of the mul-
tiple sequence alignment of human G-CSF and its orthologous
proteins. It consists of stacks of symbols, one for each position in
the protein sequence; the overall height of the stack indicates the
sequence conservation at that position, and the height of symbols
within the stack indicates the relative frequency of each amino acid
at that position. The blue arrows indicate the potential glycosylation
and transglutamination sites. V8 protease preferential cleavage
sites are marked with red arrows.
The analysis of the trajectories of the equilibrated
MD simulation showed that the root mean squared
fluctuations (RMSFs) of the protein have their highest
peaks around the positions corresponding to Gln134
and Gln131, belonging to a highly mobile region of
the protein, in good agreement with the B-factor values
reported in the Protein Data Bank entry (Fig. 2). We
also analyzed the ψ/φ angle variation during the MD
simulation. The Ramachandran plots reported in
Fig. 3 show that Gln134 is able to explore a very
broad combination of dihedral angles (i.e. all of the
allowed conformations of the classic Ramachandran
plot), which is not the case for Gln131.
The differences in local flexibility observed for
Gln131 and Gln134 could be explained by the proxim-
ity of Pro132 to Gln131. The rigidity of the proline
might reduce the potential flexibility of the neighboring
side chain. In conclusion, our analysis suggested that
Gln134 is the most likely substrate for PEGylation.
In the mutant, both Gln131 and Gln132 were able
to explore a broader range of the Ramachandran
regions, almost as broad as that of Gln134 in the wild-
type protein (Fig. 3).
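The φ/ψ analysis rests on computing backbone dihedral angles frame by frame (φ from the atoms C(i–1), N, Cα, C; ψ from N, Cα, C, N(i+1)). A minimal sketch of that calculation follows, assuming NumPy and coordinates supplied as arrays. The function names, and the count of occupied 60° × 60° bins used here as a crude measure of how much of the Ramachandran plane a residue visits, are illustrative assumptions and not the procedure used in this study.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def occupied_bins(phi, psi, width=60.0):
    """Number of distinct (phi, psi) bins visited along a trajectory."""
    phi_bin = np.floor(((np.asarray(phi, dtype=float) + 180.0) % 360.0) / width).astype(int)
    psi_bin = np.floor(((np.asarray(psi, dtype=float) + 180.0) % 360.0) / width).astype(int)
    return len(set(zip(phi_bin.tolist(), psi_bin.tolist())))


# Toy geometry: a cis, a trans and a +90 degree arrangement.
cis = dihedral([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, 1, 0])
trans = dihedral([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, -1, 0])
quarter = dihedral([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, 0, 1])
```

A residue that explores a broad range of conformations, as reported for Gln134, would give a larger bin count than one confined to a single region.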
Overall, sequence, structure and dynamic analysis
of G-CSF molecule indicate that Gln134 is the
most likely transglutamination site, and that the
P132Q ⁄ Q134N double mutant should behave differ-
Table 1. Solvent-accessible area and secondary structure of the 17
glutamines present in the wild-type G-CSF. The first column reports
the position of the glutamines in the wild-type protein sequence.
The second column indicates the percentage of residue exposure
(we consider a glutamine residue to be exposed when the reported
value is greater than 25%). The third column reports the secondary
structure context of each of the glutamines. The five candidate
glutamines that are exposed and outside regular secondary structure
elements are marked with an asterisk.
Gln | Solvent-accessible area, G-CSF (%) | Secondary structure
11* | 44.18 | At the N-terminus of α1
20 | 15.66 | In α1
25 | 22.51 | In α1
32 | 22.34 | In α1
67* | 25.53 | In the loop closed by the Cys64–Cys74 disulfide bridge
70* | 80.74 | In the loop closed by the Cys64–Cys74 disulfide bridge
77 | 19.32 | In α3
86 | 12.81 | In α3
90 | 48.65 | At the C-terminus of α3
107 | 14.34 | In α4
119 | 59.5 | In α4
120 | 14.62 | In α4
131* | 69.65 | In the loop between helices α4 and α5
134* | 32.57 | In the loop between helices α4 and α5
145 | 28.71 | In α5
158 | 19.5 | In α5
173 | 72.04 | C-terminus
Fig. 2. RMSFs of the G-CSF Cα atoms during the entire MD simulation
(x-axis: residue number; y-axis: RMSF in nm). The points corresponding
to Gln131 and Gln134 RMSFs are
indicated by a red and a green circle, respectively.
Fig. 3. Ramachandran plots (x-axis: Phi; y-axis: Psi; both spanning
–180° to 180°) showing the φ/ψ angle variation along
the MD simulations of selected glutamines. (A–D) Plots corresponding
to Gln131 and Gln134 in the MD simulation of the wild
type and of Gln131 and Gln132 in the simulation of Mut4, respectively.
ently and be transglutaminated on Gln131 and ⁄ or
Gln132.
Molecular docking analysis of MTGase/G-CSF
One puzzling observation derived from the structural
analysis of MTGase (Protein Data Bank accession
code: 1IU4) is that the active site of the enzyme is
located in a deep crevice surrounded by two loops,
and this is difficult to reconcile with the broad specificity
of the enzyme. This is confirmed by protein–protein
docking calculations.
A local version of the RosettaDock program [18]
was used to predict the protein–protein interaction
between G-CSF and MTGase (G-CSF–closed-MTGase
and G-CSF–open-MTGase). None of the docking
solutions that we obtained involved interactions of
G-CSF with the enzyme active site, not even when we
included distance restraints of 7 Å between the amino
acids hypothesized to be involved in the interaction,
Gln131 and Gln134 from G-CSF, and Cys64 from
MTGase.
The active site is surrounded by two loops that are
likely to be flexible, and therefore we hypothesized that
they can also assume a conformation different from
that observed in the X-ray structure. The MTGase
structure was modified by exciting the low-energy modes of
the system. In particular, by deforming the structure
along the lowest-energy mode, it was possible to generate
an ‘open’ conformation of the enzyme (Fig. 4).
Next, two different systems were tested by the
RosettaDock protein–protein docking program, and
100 000 decoys were produced for each of them. The
analyzed systems were Gln134-restrained docking of
both closed-MTGase–G-CSF and open-MTGase–
G-CSF; Gln134-restrained docking means that we performed
the docking protocol with the inclusion of
distance restraints of 7 Å between the G-CSF Gln134
and the active site residue Cys64 of MTGase.
Using the open conformation of the enzyme, we
were able to retrieve eight configurations fulfilling the
distance constraint. Seven of the poses differ by
< 1.6 Å rmsd from each other (Fig. 5).
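The selection of poses can be pictured as two simple geometric checks: whether a decoy satisfies the 7 Å restraint, and how far apart the retained poses are from each other. The sketch below, assuming NumPy, illustrates both on hypothetical coordinates. It is not the RosettaDock scoring or clustering code, and it assumes that all poses are expressed in the same receptor frame, so that no superposition is needed before computing the rmsd.

```python
import numpy as np

RESTRAINT_A = 7.0  # Å, restraint distance stated in the text


def satisfies_restraint(atom_a, atom_b, cutoff=RESTRAINT_A):
    """True if two atoms lie within the restraint distance."""
    diff = np.asarray(atom_a, dtype=float) - np.asarray(atom_b, dtype=float)
    return float(np.linalg.norm(diff)) <= cutoff


def rmsd(x, y):
    """RMSD between two (N, 3) coordinate sets already in the same frame."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def pairwise_rmsd(poses):
    """Symmetric matrix of RMSD values between all pairs of poses."""
    n = len(poses)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = rmsd(poses[i], poses[j])
    return out


# Hypothetical example: a pose and a copy shifted by 3 Å along x.
pose_a = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0]])
pose_b = pose_a + np.array([3.0, 0.0, 0.0])
matrix = pairwise_rmsd([pose_a, pose_b])
```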
Experimental validation
To validate our computational predictions about
PEGylation site, we analyzed the properties of the
wild-type G-CSF and of the following mutants:
Q131N, Q134N, Q173N and P132Q ⁄ Q134N (Mut1–
Fig. 4. Optimal three-dimensional superposition of the ‘open’ and
‘closed’ MTGase configurations, represented in pale green and
blue, respectively. The rmsd values between these two conformations
are 1.45 Å and 1.42 Å for all atoms and Cα atoms, respectively.
The G-CSF interaction site is expected to be near the active
site residue Cys64, indicated in ball-and-stick representation.
Fig. 5. Model of the interaction between G-CSF (orange) and the
‘open’ conformation of MTGase (blue). The MTGase Asp3, Cys64
(active site residue) and G-CSF Gln134 and Thr133 are shown in
ball-and-stick representation. The hydrogen bond between the
Thr133 side chain and the Asp3 main chain is shown as a green
line.
Mut4 in Table 2). Gln173 was chosen because it is
located in the very flexible C-terminal region of the
protein, very close to an α-helix.
The PEGylation reaction results obtained for wild-
type G-CSF and for the four mutants are summarized
in Table 2. They showed that the Q134N mutant was
essentially not PEGylated (5–6% yield), whereas PEGylation was only slightly
reduced (85%) in the Q131N and Q173N mutants,
confirming that Gln134 is the only glutamine, among
the 17 present in the molecule, available for the trans-
glutamination reaction. These data convincingly
validate our computational predictions.
Incidentally, it is very relevant that enzymatic
PEGylation of G-CSF gives rise to a site-specific
monoconjugate derivative, which is an interesting molecule
for therapeutic approaches.
The double mutant Mut4 retains the ability to be
PEGylated to a similar extent as the wild type
(Table 2). As this mutant lacks the Gln134 PEGylation
site, it is likely that the P132Q mutation changes the
properties of Gln131 and ⁄ or Gln132, increasing its
flexibility and making it a better substrate for the
enzyme. However, Mut4 contains other glutamines,
and the possibility cannot be excluded that one of the
others becomes the PEGylation site. To verify which
of the glutamines of Mut4 are transglutaminated, the
PEGylation sites of native and mutated G-CSF were
analyzed by enzymatic digestion with Staphylococ-
cus aureus V8 protease, which is specific for cleavage
at the C-terminus of glutamic acid and aspartic acid
(Fig. 1B).
The RP-HPLC profiles of the two enzymatic diges-
tion mixtures differed mainly by a few peaks that,
in the chromatogram of the PEGylated digestion
mixture, were eluted with retention times correspond-
ing to more hydrophilic molecules, indicating that
these peptides are bound to the PEG chain (data not
shown).
The peptides obtained by enzymatic digestion were
separated by SDS ⁄ PAGE. Figure 6 shows the two
SDS/PAGE gels stained with barium iodide (lane A),
which highlights the PEG moiety, and with Coomassie
Blue (lane B), which reveals protein and peptides. The
spots corresponding to PEG-bearing peptides were
then electroblotted onto a poly(vinylidene difluoride)
membrane, and the fragments were subjected to N-ter-
minal sequencing.
All fragments started with the sequence LGMAPALQPTQGAMPA
and lacked the signal corresponding
to Gln134, which is diagnostic of its derivatization.
This result confirmed that Gln134 is the single
PEGylation site of G-CSF, in agreement with the
results obtained by the computational calculations.
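The residue numbering of the sequenced fragment follows from the positions given in the text (Gln131, Pro132, Thr133, Gln134). A minimal sketch of that bookkeeping in Python is shown below; the function name and the use of the QPTQ motif as anchor are illustrative assumptions.

```python
FRAGMENT = "LGMAPALQPTQGAMPA"  # N-terminal sequence reported for the PEGylated fragments


def number_fragment(fragment, anchor_motif, anchor_residue_number):
    """Assign residue numbers to a fragment, given the number of the
    first residue of a motif that occurs in it."""
    start = fragment.index(anchor_motif)
    first = anchor_residue_number - start
    return {first + i: aa for i, aa in enumerate(fragment)}


# Anchor: Gln131-Pro132-Thr133-Gln134, as named in the text.
numbering = number_fragment(FRAGMENT, "QPTQ", 131)
glutamines = sorted(n for n, aa in numbering.items() if aa == "Q")
# Sequencing cycle (1-based) at which Gln134 would be expected.
cycle_gln134 = 134 - min(numbering) + 1
```

Under this numbering the fragment starts at residue 124 and contains only two glutamines, Gln131 and Gln134, so a missing signal at cycle 11 points to Gln134.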
PEGylated Mut4, subjected to the same analytical
characterization, did not lack any residue in the N-ter-
minal sequencing of its mono-PEGylated fragments.
This result can be explained by the presence of two
different mono-PEGylated isomers, corresponding to
Gln131 and Gln132, in agreement with the calculations
performed on the mutant. We are led to conclude that
PEGylation of one of the two glutamines impairs the
PEGylation of the neighboring one. The computa-
tional prediction of the relative abundance of the two
PEGylated species would require knowledge of the
structure of the mutant, as it is well known that even
the most advanced docking technologies cannot cope
with cases where the backbone of one of the molecules
changes upon binding [19].
Table 2. PEGylation reaction results for G-CSF and its mutants.
Name | Mutant | PEGylation yield (%)
G-CSF | Wild type | 100
Mut1 | Q173N | 85
Mut2 | Q131N | 85
Mut3 | Q134N | 5–6
Mut4 | P132Q/Q134N | 80
Fig. 6. SDS/PAGE analyses of PEGylated
G-CSF and its V8 protease-digested mixture,
stained with barium iodide (A) and Coomassie
Blue (B).
Discussion
The computational and experimental analysis of the
PEGylation properties of the G-CSF residues allows
us to confidently conclude the following with regard to
PEGylation by MTGase: (a) the substrate reactive site
should be exposed to solvent and present in a ‘locally’
flexible region; (b) neighboring residues are unlikely to
be PEGylated on the same molecule, possibly because
of steric hindrance; and (c) the presence of a proline
close to the putative site of PEGylation is a limiting
factor that hampers the reaction.
In our view, it is relevant that computational predic-
tions, based on publicly available methods, are nowa-
days sufficiently reliable to allow the identification of
targets of enzymatic modifications and the redesign of
proteins with the desired properties, as substantiated by
the results of our mutant design experiments, where we
could redirect the enzyme specificity to different sites.
Our study was performed on one protein, selected
because it represents a challenging case, with 17 puta-
tive transglutamination sites, and because of its high
therapeutic interest. We believe that our results are
likely to be general, because they are based on reason-
able assumptions (flexibility, exposure to solvent, and
ability to interact with the enzyme). Further experi-
ments on different systems are in progress to substanti-
ate this hypothesis.
Finally, the mono-PEGylated G-CSF molecule
described here is of therapeutic interest, as it is fully
characterized, homogeneously modified, easy to pro-
duce, and expected to have a longer circulating half-
life than the wild-type protein. Pharmacokinetic and
pharmacodynamic studies of the recombinant G-CSF–
Q134-PEG following subcutaneous administration in
normal and neutropenic rats are in progress. Preliminary
results show that our molecule has the same pharmacological
effect as the non-PEGylated G-CSF and
better pharmacokinetic parameters.
Experimental procedures
Materials
MTGase from S. mobaraense was purchased from Ajinomoto
(Activa WM, Europe Sales GmbH, Hamburg,
Germany). Recombinant G-CSF and its mutants were produced
by Bio-Ker (c/o Sardegna Ricerche, Pula, Italy) by a
fusion protein technology [20] (US 7,410,775 B2, 12 August
2008, Method for making recombinant peptides or
proteins using soluble endoproteases).
Endoproteinase Glu-C from S. aureus (V8 protease) was
purchased from Sigma Aldrich (St Louis, MO, USA).
Methoxy-PEG-NH2 (Mr 20 000) was purchased from
SunBio (San Francisco, CA, USA). Restriction and DNA-
modifying enzymes were purchased from New England
Biolabs (Beverly, MA, USA) and used according to the
manufacturer’s instructions. PfuTurbo Hot Start polymerase
was purchased from Stratagene (La Jolla, CA, USA).
Sequence conservation analysis
The alignment shown in Fig. 1 includes all the species
where a protein orthologous to G-CSF was found,
using the Ensembl search for orthology [21]: Bos taurus,
Canis familiaris, Cavia porcellus, Dasypus novemcinctus,
Dipodomys ordii, Echinops telfairi, Equus caballus, Felis
catus, Gorilla gorilla, Loxodonta africana, Macaca mulatta,
Macropus eugenii, Microcebus murinus, Monodelphis
domestica, Mus musculus, Myotis lucifugus, Ochotona
princeps, Ornithorhynchus anatinus, Oryctolagus cuniculus,
Otolemur garnettii, Pan troglodytes, Pipistrellus pygmaeus,
Procavia capensis, Pteropus vampyrus, Rattus norvegicus,
Spermophilus tridecemlineatus, Taeniopygia guttata, Tupaia
belangeri, Tursiops truncatus, and Xenopus tropicalis.
Neither a PSI-BLAST nor a PSI-Search run against the NR
and UniProtKB databases could identify ortholog-containing
species other than the ones listed above (data not
shown).
Solvent accessibilities and MD simulations
The G-CSF coordinates were retrieved from the Protein
Data Bank (accession code: 2D9Q, chain A). Modeling of
the double mutant of G-CSF was performed with the program
SCWRL [22].
Amino acid solvent-accessible surface area was calculated
using the MOLMOL program [23] and the Scit web server
[24].
MD simulations were performed using the GROMACS
package of programs (version 3.2) [25] and the GROMOS96
force field. All of the structures were placed in a cubic periodic
box (92 × 92 × 92 Å) of 24 876 SPC/E water molecules
[26]. Four sodium ions were added to ensure
electroneutrality of the systems. All of the systems studied
were energy relaxed with 1000 steps of steepest descent
energy minimization to remove possible unfavorable
contacts from the initial structures.
The protein–solvent systems were then subjected to
0.5 ns of position-restrained dynamics to allow water mole-
cules to soak the protein, followed by 1 ns of equilibration
at constant temperature (300 K) and pressure (1 atm), using
the Nosé–Hoover thermostat and barostat (coupling constants
were 0.5 ps) [27]. The LINCS algorithm [28] was used
to constrain all hydrogen bonds. A cut-off of 1.4 nm for
Lennard–Jones interactions was used, and the particle mesh
Ewald method [29] was employed to calculate longer-range
electrostatic contributions on a grid with 0.12 nm spacing
and a cut-off of 0.9 nm. The time step used was 2 fs. Root
mean square fluctuations were calculated with
the program g_rmsf included in the GROMACS analysis
tools, using the equilibrated trajectories.
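For readers unfamiliar with the quantity, the RMSF of an atom is the square root of the time-averaged squared deviation of its position from its mean position. The sketch below, assuming NumPy and a trajectory already superposed on a reference structure, illustrates the definition on synthetic data. It is an illustration only and not a substitute for g_rmsf; the array layout and function name are assumptions.

```python
import numpy as np


def rmsf(traj):
    """Per-atom RMSF from a trajectory array of shape (n_frames, n_atoms, 3).

    Frames are assumed to be superposed on a common reference beforehand.
    The result is in the same length unit as the input coordinates.
    """
    traj = np.asarray(traj, dtype=float)
    mean_pos = traj.mean(axis=0)                    # (n_atoms, 3)
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)   # (n_frames, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))             # (n_atoms,)


# Synthetic two-atom example: atom 0 is fixed, atom 1 oscillates by ±0.1 nm.
toy_traj = np.array([
    [[0.0, 0.0, 0.0], [1.0 + 0.1, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0 - 0.1, 0.0, 0.0]],
])
toy_rmsf = rmsf(toy_traj)
```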
Normal mode analysis – generation of an ‘open’
MTGase conformation
The β-Gaussian network model [30], a coarse-grained
model, provides a reliable description of concerted large-scale
rearrangements in proteins at a much lower computational
cost than full-atom MD simulations.
In this approach, the concerted motions are calculated
within the quasiharmonic approximation of the free energy
$F$ around a protein native state (assumed to coincide with
the crystallographic structure or with a minimized model
structure). Thus, a displacement from the native state
$\delta R = \{\delta r_1, \delta r_2, \ldots, \delta r_n\}$ ($\delta r_i$ being the displacement of Cα
atom $i$) is associated with a free energy change

$$\Delta F = \tfrac{1}{2}\, \delta R^{\mathsf{T}}\, \mathbf{F}\, \delta R,$$

where $\mathbf{F}$ is an interaction matrix
derived from the knowledge of contacting Cα and Cβ atoms
in the native state, and the superscript $\mathsf{T}$ indicates the transpose.
The large-scale motions of the system correspond
to the eigenvectors of $\mathbf{F}$ with the smallest nonzero
eigenvalues.
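The link between the quadratic form and the slow modes can be illustrated on a toy system. The sketch below, assuming NumPy, builds the contact (Kirchhoff) matrix of a plain Gaussian network model for a five-bead chain and extracts the mode with the smallest nonzero eigenvalue. This is a simplified stand-in: it uses one scalar coordinate per bead and Cα-like contacts only, whereas the β-Gaussian network model of ref. [30] also accounts for Cβ atoms. The cutoff, bead spacing and function names are illustrative assumptions.

```python
import numpy as np


def kirchhoff(coords, cutoff):
    """Contact (Kirchhoff) matrix of a plain Gaussian network model."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    gamma = -contact.astype(float)
    np.fill_diagonal(gamma, contact.sum(axis=1))
    return gamma


def slow_modes(matrix, n_modes=1, tol=1e-8):
    """Eigenpairs with the smallest nonzero eigenvalues."""
    vals, vecs = np.linalg.eigh(matrix)  # eigenvalues in ascending order
    keep = vals > tol                    # discard the zero (rigid-body) mode
    return vals[keep][:n_modes], vecs[:, keep][:, :n_modes]


# Toy chain of five beads spaced 3.8 Å apart; only neighbors are in contact.
coords = [[3.8 * i, 0.0, 0.0] for i in range(5)]
gamma = kirchhoff(coords, cutoff=4.0)
vals, vecs = slow_modes(gamma)

# Free energy change for a unit displacement along the slowest mode:
mode = vecs[:, 0]
delta_f = 0.5 * mode @ gamma @ mode  # equals half the eigenvalue
```

Displacing the structure along such a mode costs the least free energy per unit amplitude, which is why the lowest-energy mode is the natural candidate for generating an alternative conformation.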
The MaxSprout algorithm [31] and SCWRL software [22]
were used to reconstruct the backbone coordinates from
the Ca atom positions and the side chains, respectively,
after normal mode analysis.
G-CSF–MTGase interaction
A local version of the RosettaDock program [18] running
on a 48-node Opteron cluster was used to perform the
protein–protein docking experiments. The RosettaDock
program, which has also proven useful for protein models, uses
real-space Monte Carlo minimization on both rigid-body
and side chain degrees of freedom to identify the
lowest-free-energy docked arrangement of two interacting proteins.
The ranking of the solutions is based on a free energy function
dominated by a Lennard-Jones potential, an
orientation-dependent hydrogen bond potential [32], and an
implicit solvation model [33].
Site-directed mutagenesis
Four mutants of G-CSF were constructed with the QuikChange
site-directed mutagenesis kit (Stratagene). Mut1–
Mut4 correspond to mutants Q173N, Q131N, Q134N and
P132Q/Q134N, respectively.
Briefly, PCR amplification was performed by PfuTurbo
Hot Start polymerase (Stratagene) under standard condi-
tions, using approximately 10 ng of a plasmid containing
the wild-type G-CSF as a template and, in the case of the
Q134N mutant, a pair of complementary primers (forward,
5′-GCCGGCATGGCACCGTTGGTGGGCTGCAGGG-3′;
and reverse, 5′-CCCTGCAGCCCACCAACGGTGCCATGCCGGC-3′).
The PCR product was then digested with
10 U of DpnI, and this was followed by transformation into
electrocompetent JM109 E. coli cells. The presence of the
desired Q134N mutation was confirmed by direct DNA
sequence analysis. The Q173N, Q131N and P132Q ⁄ Q134N
mutants were obtained with the same strategy, using suit-
able primers.
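The Q134N primer pair can be checked computationally. The sketch below, in plain Python, verifies that the two quoted oligonucleotides are reverse complements of each other and translates one of them with the standard genetic code. The reading frame (offset 2) is an assumption inferred from the LQPTQGAMP stretch of G-CSF quoted in the Results, and the helper names are illustrative.

```python
FORWARD = "GCCGGCATGGCACCGTTGGTGGGCTGCAGGG"
REVERSE = "CCCTGCAGCCCACCAACGGTGCCATGCCGGC"

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate complete codons of seq starting at the given offset."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


are_complementary = reverse_complement(FORWARD) == REVERSE
peptide = translate(REVERSE, frame=2)  # assumed reading frame
print(are_complementary, peptide)
```

In this frame the oligonucleotide reads LQPTNGAMP, that is, the wild-type LQPTQGAMP stretch with an AAC (Asn) codon at the position corresponding to Gln134.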
All DNA manipulations, including restriction digestion,
ligation, and agarose gel electrophoresis, were performed as
described by Sambrook et al. [34]. The PCR amplifications
were performed using a PCR thermal cycler (Gene Amp
PCR System 2700; Applied Biosystems, Foster City, CA,
USA), a high-fidelity PCR system [600320-51, PfuTurbo
Hot Start (Stratagene) and 600400-51 Easy A Hi Fi (Strata-
gene)], and oligonucleotides synthesized by M-Medical
(Milan, Italy). Plasmid extractions, gel extractions and
PCR purifications were performed using Qiagen kits.
E. coli competent cells {JM109 strain (F′[traD36,
proA+B+, lacIq, Δ(lacZ)M15], Δ(lac, proAB), glnV44,
e14–, gyrA96, recA1, relA1, endA1, thi, hsdR17)} from
New England Biolabs were transformed using the Bio-Rad
E. coli pulser transformation apparatus. The recombinant
JM109 cells were cultured using a fed-batch fermentation
process with a 10 L bioreactor (Biostat C, B. Braun), and
the G-CSF mutant fusion proteins, expressed in the form
of insoluble inclusion bodies, were recovered from the cells
by high-pressure homogenization, solubilized using a
chaotropic agent, and renatured by dilution in urea buffer.
Biologically active forms of G-CSF mutants, more than
98% pure, were obtained by enzymatic cleavage of the
fusion protein followed by a two-step column chromatogra-
phy purification process and a final gel filtration step.
PEGylation of G-CSF and its mutants via MTGase
Nonglycosylated G-CSF or one of its mutants was dissolved
in 10 mM potassium dihydrogen phosphate buffer (pH 7.4)
at a concentration of 1 mg protein·mL⁻¹,
corresponding to a concentration of about 53 µM.
Monomethoxy-PEG-NH₂ (20 kDa) (Sunbio) was then added to
the protein solution to achieve a 10 : 1 PEG/G-CSF molar
ratio.
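The concentrations above can be checked with a short calculation. The sketch below assumes a molecular mass of about 18.8 kDa for nonglycosylated G-CSF (an assumption, chosen to be consistent with 1 mg·mL⁻¹ corresponding to about 53 µM) and uses the 20 kDa PEG mass given in the text. The function names are illustrative.

```python
GCSF_MW_DA = 18_800.0  # assumed mass of nonglycosylated G-CSF (not stated in the text)
PEG_MW_DA = 20_000.0   # monomethoxy-PEG-NH2 mass, from the text
MOLAR_RATIO = 10       # PEG : G-CSF molar ratio, from the text


def mg_per_ml_to_micromolar(mg_per_ml, mw_da):
    """Convert mg/mL to micromolar (mg/mL equals g/L; g/L over g/mol is mol/L)."""
    return mg_per_ml / mw_da * 1e6


def micromolar_to_mg_per_ml(micromolar, mw_da):
    """Convert micromolar to mg/mL."""
    return micromolar * 1e-6 * mw_da


protein_um = mg_per_ml_to_micromolar(1.0, GCSF_MW_DA)        # about 53 µM
peg_um = MOLAR_RATIO * protein_um                            # about 530 µM
peg_mg_per_ml = micromolar_to_mg_per_ml(peg_um, PEG_MW_DA)   # about 10.6 mg/mL
```

Under these assumptions, a 10 : 1 molar ratio corresponds to roughly 10.6 mg of 20 kDa PEG per mL of a 1 mg·mL⁻¹ protein solution.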
MTGase was then added to the reaction mixture to
0.024 U·mL⁻¹ of final solution. The reaction took place
overnight under mild stirring at room temperature. At the
end of the reaction, aliquots of the reaction mixture were
analyzed on an RP-HPLC column to determine the yield of
the reaction.
PEGylated G-CSF and mutant analysis
The characterization of the PEGylation sites of the wild-
type and mutant G-CSF was performed by combining
different analytical methods. The PEGylated proteins were
first subjected to enzymatic digestion by V8 protease, and
the PEGylated fragments, generated by specific and nonspe-
cific enzymatic cuts, were separated from the peptide
mixture by SDS/PAGE. The spots corresponding to
PEG-bearing peptides were blotted onto a poly(vinylidene
difluoride) membrane, and their N-terminal sequences were
determined.
Acknowledgements
This publication was based on work partially sup-
ported by the MIUR grant ITALBIONET and by
FIRB project PROTEOMICA RBRN07BMCT. We
thank F. Ferrè for insightful discussions.
References
1 Jain A & Jain SK (2008) PEGylation: an approach for
drug delivery. A review. Crit Rev Ther Drug Carrier
Syst 25, 403–447.
2 Veronese FM & Mero A (2008) The impact of PEGyla-
tion on biological therapies. BioDrugs 22, 315–329.
3 Harris JM & Chess RB (2003) Effect of pegylation on
pharmaceuticals. Nat Rev Drug Discov 2, 214–221.
4 Roberts MJ, Bentley MD & Harris JM (2002) Chemis-
try for peptide and protein PEGylation. Adv Drug Deliv
Rev 54, 459–476.
5 Wattendorf U & Merkle HP (2008) PEGylation as a
tool for the biomedical engineering of surface modified
microparticles. J Pharm Sci 97, 4655–4669.
6 Abuchowski A, van Es T, Palczuk NC & Davis FF
(1977) Alteration of immunological properties of bovine
serum albumin by covalent attachment of polyethylene
glycol. J Biol Chem 252, 3578–3581.
7 Abuchowski A, McCoy JR, Palczuk NC, van Es T &
Davis FF (1977) Effect of covalent attachment of
polyethylene glycol on immunogenicity and circulating
life of bovine liver catalase. J Biol Chem 252, 3582–
3586.
8 Bailon P, Palleroni A, Schaffer CA, Spence CL, Fung
WJ, Porter JE, Ehrlich GK, Pan W, Xu ZX, Modi MW
et al. (2001) Rational design of a potent, long-lasting
form of interferon: a 40 kDa branched polyethylene gly-
col-conjugated interferon alpha-2a for the treatment of
hepatitis C. Bioconjug Chem 12, 195–202.
9 Wang YS, Youngster S, Grace M, Bausch J, Bordens R
& Wyss DF (2002) Structural and biological character-
ization of pegylated recombinant interferon alpha-2b
and its therapeutic implications. Adv Drug Deliv Rev 54,
547–570.
10 Zalipsky S (1995) Functionalized poly(ethylene glycol)
for preparation of biologically relevant conjugates.
Bioconjug Chem 6, 150–165.
11 Kinstler O, Molineux G, Treuheit M, Ladd D &
Gegg C (2002) Mono-N-terminal poly(ethylene
glycol)–protein conjugates. Adv Drug Deliv Rev 54,
477–485.
12 Sato H (2002) Enzymatic procedure for site-specific
pegylation of proteins. Adv Drug Deliv Rev 54, 487–504.
13 Fontana A, Spolaore B, Mero A & Veronese FM
(2008) Site-specific modification and PEGylation of
pharmaceutical proteins mediated by transglutaminase.
Adv Drug Deliv Rev 60, 13–28.
14 Lord BI, Woolford LB & Molineux G (2001) Kinetics
of neutrophil production in normal and neutropenic
animals during the response to filgrastim (r-metHu
G-CSF) or filgrastim SD ⁄ 01 (PEG-r-metHu G-CSF).
Clin Cancer Res 7, 2085–2090.
15 Taguchi S, Nishihama KI, Igi K, Ito K, Taira H,
Motoki M & Momose H (2000) Substrate specificity
analysis of microbial transglutaminase using proteina-
ceous protease inhibitors as natural model substrates.
J Biochem 128, 415–425.
16 Ohtsuka T, Sawa A, Kawabata R, Nio N & Motoki M
(2000) Substrate specificities of microbial transglutamin-
ase for primary amines. J Agric Food Chem 48, 6230–
6233.
17 Coussons PJ, Price NC, Kelly SM, Smith B & Sawyer
L (1992) Factors that govern the specificity of transglu-
taminase-catalysed modification of proteins and pep-
tides. Biochem J 282 (Pt 3), 929–930.
18 Gray JJ, Moughon S, Wang C, Schueler-Furman O,
Kuhlman B, Rohl CA & Baker D (2003) Protein–
protein docking with simultaneous optimization of
rigid-body displacement and side-chain conformations.
J Mol Biol 331, 281–299.
19 Wang C, Schueler-Furman O, Andre I, London N,
Fleishman SJ, Bradley P, Qian B & Baker D (2007)
RosettaDock in CAPRI rounds 6–12. Proteins 69,
758–763.
20 Pozzuolo S, Breme U, Salis B, Taylor G, Tonon G &
Orsini G (2008) Efficient bacterial expression of fusion
proteins and their selective processing by a recombinant
Kex-1 protease. Protein Expr Purif 59, 334–341.
21 Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R
& Birney E (2009) EnsemblCompara GeneTrees: com-
plete, duplication-aware phylogenetic trees in verte-
brates. Genome Res 19, 327–335.
22 Canutescu AA, Shelenkov AA & Dunbrack RL Jr
(2003) A graph-theory algorithm for rapid protein side-
chain prediction. Protein Sci 12, 2001–2014.
23 Koradi R, Billeter M & Wüthrich K (1996) MOLMOL:
a program for display and analysis of macromolecular
structures. J Mol Graph 14, 51–55.
24 Gautier R, Camproux AC & Tuffery P (2004) SCit:
web tools for protein side chain conformation analysis.
Nucleic Acids Res 32, W508–W511.
25 Van Der Spoel D, Lindahl E, Hess B, Groenhof G,
Mark AE & Berendsen HJ (2005) GROMACS: fast,
flexible, and free. J Comput Chem 26, 1701–1718.
26 Berendsen HJC, Grigera JR & Straatsma TP (1987)
The missing term in effective pair potentials. J Phys
Chem 91, 6269–6271.
27 Evans DJ & Holian BL (1985) The Nose–Hoover ther-
mostat. J Chem Phys 83, 4069–4074.
28 Hess B, Bekker H, Berendsen HJC & Fraaije JGEM
(1997) LINCS: a linear constraint solver for molecular
simulations. J Comput Chem 18, 1463–1472.
29 Essmann U, Perera L, Berkowitz M, Darden T, Lee H
& Pedersen L (1995) A smooth particle mesh Ewald
method. J Chem Phys 103, 8577–8593.
30 Micheletti C, Carloni P & Maritan A (2004) Accurate
and efficient description of protein vibrational dynam-
ics: comparing molecular dynamics and Gaussian
models. Proteins 55, 635–645.
31 Holm L & Sander C (1991) Database algorithm for
generating protein backbone and side-chain co-ordi-
nates from a C alpha trace application to model build-
ing and detection of co-ordinate errors. J Mol Biol 218,
183–194.
32 Kortemme T, Morozov AV & Baker D (2003) An
orientation-dependent hydrogen bonding potential
improves prediction of specificity and structure for
proteins and protein–protein complexes. J Mol Biol 326,
1239–1259.
33 Lazaridis T & Karplus M (1999) Effective energy func-
tion for proteins in solution. Proteins 35, 133–152.
34 Sambrook J, Fritsch EF & Maniatis T (2000) Molecular
Cloning: a Laboratory Manual, 3rd edn. Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, NY.
35 Crooks GE, Hon G, Chandonia JM & Brenner SE
(2004) WebLogo: a sequence logo generator. Genome
Res 14, 1188–1190.