
Structural basis of protein stability at poly-extreme: crystal structure of AmyA at 1.6 Å resolution



CHAPTER 1

INTRODUCTION TO MACROMOLECULAR X-RAY
CRYSTALLOGRAPHY

The 1901 Nobel Prize in Physics was awarded to Roentgen for his discovery of X-rays. X-rays are electromagnetic waves with wavelengths in the range of roughly 0.1-100 Å. They are produced when rapidly moving electrons strike a solid metal target and their kinetic energy is converted into radiation. The wavelength of the emitted radiation depends on the energy of the electrons. In 1912, von Laue's group discovered X-ray diffraction by crystals. This discovery opened a very productive scientific period and created a new discipline, X-ray crystallography. One year later, W. L. Bragg determined the first crystal structure. Since then, crystal structures have been determined for a wide range of inorganic and organic molecules.
X-ray crystallography is now a commonly used technique for determination of
the three-dimensional structure of biomolecules. The methodology is fairly robust in
that the experimental and computational methods for these studies are now well
developed. The use of advanced protein expression and purification procedures,
crystallization robots and powerful synchrotron radiation sources has enabled high-
throughput structure determination. This chapter briefly discusses the concepts and
methodologies used in macromolecular X-ray crystallography.





1.1 MACROMOLECULAR CRYSTALLIZATION


To perform X-ray crystallography, it is necessary to grow crystals with edge lengths of around 0.1-0.3 mm. Crystals form as the conditions in a supersaturated solution slowly change. For small molecules, growing large crystals is relatively simple. Proteins are difficult to crystallize because of their complexity, molecular weight and flexibility. In addition, purifying a protein to homogeneity is often a tedious process. The strategy for crystallizing a protein is to guide the protein/solvent system very slowly toward a state of reduced solubility. This is done by modifying the properties of the solvent or the character of the macromolecule. It is most frequently accomplished by increasing the concentration of precipitating agents, or by altering physical properties (e.g., pH, temperature) to achieve supersaturation. The crystallization conditions then have to be refined and optimized so that they:
- promote specific contacts between molecules;
- favor the growth of larger single crystals;
- stabilize the crystals once they have formed.
The ‘salting in’ and ‘salting out’ properties of proteins have been used to push them into supersaturation. The ‘salting in’ effect can be used for crystallization, but most proteins are not stable in a low-salt environment. The ‘salting out’ property is therefore exploited more commonly. A number of methods have been developed to bring a protein gradually from an unsaturated into a supersaturated state. The most commonly used is the vapor diffusion method. Here, a drop of protein solution is suspended over a reservoir containing buffer and precipitant. Water vapor diffuses from the drop to the reservoir and slowly concentrates the drop until, ideally, conditions suitable for crystal growth are reached. Other methods include batch crystallization, micro-batch crystallization and dialysis.
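The following sketch shows why vapor diffusion concentrates a drop, using idealized bookkeeping in Python. It is a minimal sketch under stated assumptions, not a model of real equilibration kinetics. It assumes that the drop is made by mixing protein solution with reservoir solution, which is common practice but is not stated in the text above. It also assumes that only water leaves the drop, and that equilibrium is reached when the precipitant in the drop is back at the reservoir concentration. The function name `equilibrated_drop` is hypothetical.

```python
def equilibrated_drop(protein_mg_ml, protein_ul, reservoir_ul):
    """Idealized vapor-diffusion end point (hypothetical helper).

    Assumes the drop = protein_ul of protein stock + reservoir_ul of reservoir
    solution, that only water moves, and that equilibrium is reached when the
    precipitant concentration in the drop equals that of the reservoir.
    Returns (final drop volume in uL, final protein concentration in mg/mL).
    """
    drop_ul = protein_ul + reservoir_ul
    # Mixing dilutes the precipitant to this fraction of the reservoir value.
    start_precipitant_ratio = reservoir_ul / drop_ul
    # Precipitant is conserved, so the drop must shrink by the same factor
    # to bring the precipitant back to the reservoir concentration.
    final_ul = drop_ul * start_precipitant_ratio
    # Protein is conserved too, so its concentration rises by the inverse factor.
    start_protein = protein_mg_ml * protein_ul / drop_ul
    final_protein = start_protein / start_precipitant_ratio
    return final_ul, final_protein
```

Take a 1 µL + 1 µL drop of a 10 mg/mL protein stock. Under these assumptions the drop shrinks from 2 µL to 1 µL, and the protein concentration rises from 5 mg/mL to 10 mg/mL. This doubling is the slow approach to supersaturation described above.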


1.2 BASIC CONCEPTS OF CRYSTALLOGRAPHY
1.2.1 Crystal, unit-cell and asymmetric unit
Protein crystals are usually about 40-60% solvent by weight and are therefore fragile and sensitive to drying out. In a crystal, molecules are arranged in a regularly repeating, symmetric pattern. A unit-cell is the repeating volume which, when translated along its three edges, builds up the entire crystal. Its dimensions are described by three edge lengths (a, b, c) and three interaxial angles (α, β, γ). The positions of atoms within a unit-cell can be given either as fractional coordinates along the cell edges or as Cartesian coordinates.
The smallest volume within the unit-cell that can be rotated and translated to generate the whole unit-cell is called the asymmetric unit. Only the symmetry operators of the crystal's space group are used to build the unit-cell from the asymmetric unit. The asymmetric unit often contains only one molecule, or one subunit of a multimeric protein, but it can also contain several.
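Fractional and Cartesian coordinates are related by a matrix built from the six cell parameters. The following is a minimal Python sketch of that conversion. It assumes one common orthogonalization convention: a along x, b in the x-y plane, and c completing a right-handed set. Other programs may use a different convention. The function names are hypothetical.

```python
import math

def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """3x3 matrix M with cartesian = M . fractional (angles in degrees).

    Convention assumed here: a along x, b in the x-y plane,
    c completing a right-handed set.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Dimensionless volume factor: cell volume V = a * b * c * v
    v = math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]

def frac_to_cart(frac, cell):
    """Convert fractional (x, y, z) to Cartesian coordinates (same units as the cell edges)."""
    m = orthogonalization_matrix(*cell)
    return tuple(sum(m[i][j] * frac[j] for j in range(3)) for i in range(3))
```

In an orthorhombic cell of 50 × 60 × 70 Å, the fractional point (0.5, 0.5, 0.5) maps to about (25, 30, 35) Å, the center of the cell.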

1.2.2 Lattice, point group and space group
A lattice is classically defined as an array of points arranged in space in such a way that each point has the same environment. Combining the seven crystal systems with the possible types of lattice centering gives 14 distinct lattice types, the Bravais lattices. Each of them fills the whole of space when repeated. A lattice can be of four types:
- primitive: lattice points only at the corners of the unit-cell;
- face centered: an additional lattice point at the center of each of the six faces;
- body centered: an additional point at the center of the cell;
- end centered: additional points at the centers of one pair of opposite faces.

Each crystal system allows the following lattices:
- cubic (which has a cubic unit-cell): primitive, body centered or face centered;
- tetragonal: primitive or body centered;
- orthorhombic: primitive, face centered, body centered or end centered;
- hexagonal: primitive only;
- trigonal: rhombohedral;
- monoclinic: primitive or end centered;
- triclinic: primitive only.
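The lattice types listed above can be collected into a small lookup table, and their count checked against the 14 Bravais lattices. The following Python sketch encodes only what the paragraph states. It uses the standard centering letters P, I, F, C and R, and the names `BRAVAIS_LATTICES` and `count_bravais_lattices` are hypothetical.

```python
# Centering letters: P primitive, I body centered, F face centered,
# C end (base) centered, R rhombohedral.
BRAVAIS_LATTICES = {
    "triclinic": ["P"],
    "monoclinic": ["P", "C"],
    "orthorhombic": ["P", "C", "I", "F"],
    "tetragonal": ["P", "I"],
    "trigonal (rhombohedral)": ["R"],
    "hexagonal": ["P"],
    "cubic": ["P", "I", "F"],
}

def count_bravais_lattices(table=BRAVAIS_LATTICES):
    """Total number of distinct lattice types in the table."""
    return sum(len(centerings) for centerings in table.values())
```

Summing the entries gives 1 + 2 + 4 + 2 + 1 + 1 + 3 = 14.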
Molecules follow certain symmetry operations when they are packed into a crystal. Besides unit translations along the three unit-cell axes (three-dimensional translational symmetry), the other symmetry elements are rotation, reflection and inversion. A combination of these non-translational symmetry operations that leaves at least one point fixed is called a point group. The point groups compatible with a crystal lattice are the crystallographic point groups. The simplest point groups consist of proper rotations about a single symmetry axis: the point groups 1, 2, 3, 4 and 6. In total, 11 crystallographic point groups contain only proper rotations. The remaining point groups also contain improper rotations. They conform to one of six general types, among them $\bar{n}$ and the axial combinations PII, IPI, IIP and P/I P/I P/I, where P and I denote proper and improper operations. There are 21 such point groups, so there are 32 crystallographic point groups in total (Buerger, 1956).
Rotation combined with translation generates screw axes, and reflection combined with translation generates glide planes. Combining the lattices with the point groups (including their allowed screw axes and glide planes) gives 230 distinct ways of arranging the allowed symmetry operations in a crystal, known as space groups. Proteins contain only L-amino acids, and applying a mirror plane or an inversion center to an L-amino acid would generate a D-amino acid. Space groups containing such improper symmetry operations are therefore excluded, and only 65 of the 230 space groups are possible for protein crystals (McRee, 1999).
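The restriction to the 65 space groups can be illustrated with a rough check on a short Hermann-Mauguin symbol. The following is a heuristic sketch, not a space-group parser. It assumes the symbol starts with its lattice letter and writes rotoinversion bars as `-`, and it treats any mirror (m), glide letter (a, b, c, n, d, e) or bar as an improper operation. It does not check that the symbol names a real space group. The function name `is_chiral_space_group` is hypothetical.

```python
# Symbols that signal improper operations after the lattice letter:
# mirror m, glide planes a b c n d e, and '-' for a rotoinversion bar.
IMPROPER_MARKERS = set("mabcnde-")

def is_chiral_space_group(symbol):
    """Heuristic: True if a short Hermann-Mauguin symbol (e.g. 'P212121',
    'P6_522', 'P21/c', 'Fd-3m') contains only rotations and screw axes."""
    compact = symbol.replace(" ", "").replace("_", "")
    body = compact[1:]  # drop the lattice letter (P, A, B, C, F, I, R, ...)
    return not any(ch in IMPROPER_MARKERS for ch in body)
```

For example, P2₁2₁2₁ and C2 pass the check. P2₁/c (glide plane) and P-1 (inversion center) fail it and so cannot describe a crystal of a pure L-protein.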


1.2.3 hkl plane
A convenient way to study the crystalline lattice is through the use of hkl planes. The index h gives the number of planes in the set per unit-cell in the X direction. Equivalently, it is the number of parts into which the set of planes cuts the X edge of each cell. Similarly, the indices k and l specify how many such planes exist per unit-cell in the Y and Z directions. The family of planes having indices hkl is the (hkl) family of planes. This concept is very useful in explaining the diffraction of X-rays by crystals.
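Because the (hkl) planes cut the a edge into h parts, the b edge into k parts and the c edge into l parts, their spacing follows directly from the cell dimensions. The following minimal Python sketch covers only the case α = β = γ = 90°, where 1/d² = h²/a² + k²/b² + l²/c². The general triclinic formula is longer and is not shown. The function name is hypothetical.

```python
import math

def d_spacing_orthogonal(h, k, l, a, b, c):
    """Interplanar spacing d_hkl (same units as a, b, c) for a cell with
    alpha = beta = gamma = 90 degrees: 1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2."""
    inv_d_sq = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    if inv_d_sq == 0:
        raise ValueError("(000) does not define a set of lattice planes")
    return 1.0 / math.sqrt(inv_d_sq)
```

For instance, the (200) planes of a cell with a = 50 Å are 25 Å apart, half the spacing of the (100) planes.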

1.3 PRINCIPLES OF X-RAY DIFFRACTION
Diffraction occurs when waves interact with a regular structure whose repeat distance is about the same as their wavelength. X-rays have wavelengths on the order of Angstroms, comparable to typical interatomic distances in crystalline solids. They can therefore be diffracted by crystals, which by definition have regularly repeating atomic structures. When certain geometric requirements are satisfied, X-rays scattered from a crystalline solid interfere constructively and produce a diffracted beam. These geometric requirements were first explained by Bragg.

1.3.1 Bragg's law
Diffraction depends on the spacing between scattering bodies and on the wavelength of the incident radiation. In Bragg's model, diffraction is treated as reflection from sets of parallel planes (Fig. 1.1), and each such set of planes can be the source of one diffracted X-ray beam. Bragg showed the following for a set of parallel planes with indices hkl and interplanar spacing $d_{hkl}$. When X-rays of wavelength λ strike the planes at an angle θ and are reflected at the same angle, they produce a diffracted beam only if θ satisfies

$$2\,d_{hkl}\sin\theta = n\lambda \qquad (1.1)$$

where n is an integer.







Figure 1.1 Bragg's law: the condition that produces diffracted rays. $\sin\theta = BC/AB$, so $BC = AB\sin\theta = d_{hkl}\sin\theta$. If the additional distance (2BC) travelled by the more deeply penetrating ray $R_2$ is an integral multiple of λ, then rays $R_1$ and $R_2$ interfere constructively.

Notice that the angle of diffraction θ is inversely related to the interplanar spacing $d_{hkl}$ ($\sin\theta$ is proportional to $1/d_{hkl}$). This implies that large unit-cells, with large spacings, give small angles of diffraction. They therefore produce many reflections within a convenient angle from the incident beam. Conversely, small unit-cells give large angles of diffraction and produce fewer measurable reflections. The number of measurable reflections therefore depends on the unit-cell dimensions and on the experimental conditions, such as the wavelength and the range of diffraction angles that can be recorded.
Each set of parallel planes in a crystal produces one reflection. The intensity of
a reflection depends on the summation of the electron distribution in the unit-cell
along the direction of the planes that produce that reflection.
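The inverse relation between θ and $d_{hkl}$ can be checked numerically by solving Equation 1.1 for θ. The following is a minimal Python sketch. The helper name is hypothetical, and the wavelength in the usage note (1.5418 Å, Cu Kα) is only an example value.

```python
import math

def bragg_angle_deg(d, wavelength, n=1):
    """Angle theta (degrees) satisfying 2 d sin(theta) = n * wavelength,
    or None when n * wavelength > 2 d and no reflection is possible."""
    s = n * wavelength / (2.0 * d)
    if s > 1.0:
        return None
    return math.degrees(math.asin(s))
```

With λ = 1.5418 Å, a 100 Å spacing reflects at about 0.44°, while a 2 Å spacing needs about 22.7°. Spacings smaller than λ/2 cannot diffract at all, which is one reason small cells give fewer measurable reflections.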

1.3.2 Reciprocal lattice
Bragg's law gives a simple and convenient way of calculating the separation of crystallographic planes. Further analysis is needed, however, to calculate the intensity of scattering from the spatial distribution of electrons within each unit-cell. A useful tool for this is the reciprocal lattice, in which each point corresponds to one set of real-lattice planes and hence to one reflection. Each reciprocal lattice vector is perpendicular to the real lattice planes from which it is derived. The dimensions of the reciprocal lattice are inversely related to those of the real lattice. Thus large unit-cells give a very closely spaced reciprocal lattice, and small unit-cells give a reciprocal lattice with large intervals.
Fig. 1.2 explains how a reciprocal lattice is generated:
1. Take O as the origin.
2. Through a neighboring crystal lattice point N, draw one plane each of the sets (110), (120) and so forth. Their interplanar distances are $d_{110}$, $d_{120}$ and so on.
3. From the origin, draw a line normal to the (110) plane. The point at a distance $1/d_{110}$ along this line defines the reciprocal lattice point 110.
4. Do the same for (120) and so on.

The points defined by this operation form a lattice with the chosen origin, and this new lattice is the reciprocal lattice. Suppose the real unit-cell angles α, β and γ are all 90°. Then the reciprocal axis a* lies along the real unit-cell edge a with length 1/a, and b* and c* are defined in the same way. If the axial lengths are expressed in Angstroms, the reciprocal lattice spacing is in units of 1/Å or Å$^{-1}$ (reciprocal Angstroms).
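For cells that are not rectangular, a* is no longer simply 1/a along a. The general construction uses cross products: a* = (b × c)/V, b* = (c × a)/V and c* = (a × b)/V, where V is the cell volume. The following is a minimal Python sketch, with hypothetical helper names and cell edges given as Cartesian vectors. For each reflection it checks that |h a* + k b* + l c*| equals $1/d_{hkl}$.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def reciprocal_basis(a_vec, b_vec, c_vec):
    """a* = (b x c)/V, b* = (c x a)/V, c* = (a x b)/V, where V = a . (b x c)."""
    volume = dot(a_vec, cross(b_vec, c_vec))

    def scaled(w):
        return tuple(x / volume for x in w)

    return scaled(cross(b_vec, c_vec)), scaled(cross(c_vec, a_vec)), scaled(cross(a_vec, b_vec))

def d_star_length(h, k, l, recip):
    """|d*_hkl| = |h a* + k b* + l c*|, which equals 1/d_hkl."""
    a_s, b_s, c_s = recip
    vec = tuple(h * x + k * y + l * z for x, y, z in zip(a_s, b_s, c_s))
    return math.sqrt(dot(vec, vec))
```

By construction a* is perpendicular to b and c, matching the statement that reciprocal lattice vectors are normal to the real lattice planes.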

1.3.3 Ewald sphere
Reciprocal lattice points give the crystallographer a convenient way to compute the direction of diffracted beams from all sets of parallel planes in the crystalline lattice (real space). The following geometrical interpretation of diffraction was formulated by Ewald.








Figure 1.2 The reciprocal lattice

Assume that an X-ray beam (arrow XO in Fig. 1.3) impinges on the crystal, and consider the geometry in a single plane. Point O is arbitrarily chosen as the origin of the reciprocal lattice; O is also the origin of the real lattice in the crystal. Draw a circle of radius 1/λ that passes through O and has its center C on XO. This circle represents the wavelength of the X-rays in reciprocal space. Rotating the crystal about O also rotates the reciprocal lattice about O. This successively brings reciprocal lattice points such as P and P' into contact with the circle. The triangle PBO is inscribed in a semicircle, so it is a right-angled triangle, and

$$\sin\theta = \frac{OP}{BO} = \frac{OP}{2/\lambda}$$

Because P is a reciprocal lattice point, the length of OP is $1/d_{hkl}$, where h, k and l are the indices of the set of planes represented by P. Hence

$$2\,d_{hkl}\sin\theta = \lambda$$

which is Bragg's law with n = 1.

Figure 1.3 The Ewald sphere
The line defining a reciprocal lattice point is normal to the set of planes with the same indices as the point. BP is perpendicular to OP, so it is parallel to the planes that produce the reflection P in Fig. 1.3. A line parallel to BP and passing through C, the center of the circle, represents a plane in this set that reflects the X-ray beam under these conditions. The beam strikes this plane at an angle θ and is reflected at the same angle. It therefore leaves C at an angle 2θ to the incident beam, which takes it precisely through the point P. CP therefore gives the direction of the reflected ray R.

In conclusion, a reflection occurs in the direction CP when the reciprocal lattice point P comes into contact with the circle. As the crystal is rotated in the X-ray beam, every reciprocal lattice point within a distance 2/λ of O eventually comes into contact with this sphere. Each such point produces a beam along the line from the center of the sphere of reflection through that point.
This model of diffraction also implies that the directions of reflections, as well
as the number of reflections, depend only on the unit-cell dimensions, and not on the
contents of the unit-cell.
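The Ewald construction translates directly into vector arithmetic. The sphere center C lies at -(1/λ)·ŝ from O, where ŝ is the unit vector of the incident beam. A reciprocal lattice vector p diffracts when |p + ŝ/λ| = 1/λ, and the diffracted ray points along CP. The following Python sketch assumes an incident beam along +x and a hypothetical helper name.

```python
import math

def ewald_check(p, wavelength, beam=(1.0, 0.0, 0.0), tol=1e-6):
    """Return (on_sphere, two_theta_deg) for reciprocal-lattice vector p
    (measured from the origin O, in 1/Angstrom); beam is the unit vector
    of the incident X-rays."""
    r = 1.0 / wavelength
    # O is on the sphere and the center C sits at -r * beam relative to O,
    # so the vector from C to P is CP = p + r * beam.
    cp = tuple(pi + r * bi for pi, bi in zip(p, beam))
    cp_len = math.sqrt(sum(x * x for x in cp))
    on_sphere = abs(cp_len - r) < tol * r
    # The diffracted ray runs along CP; its angle to the incident beam is 2 theta.
    cos_2theta = sum(x * b for x, b in zip(cp, beam)) / cp_len
    two_theta = math.degrees(math.acos(max(-1.0, min(1.0, cos_2theta))))
    return on_sphere, two_theta
```

Take λ = 1 Å and a point with |p| = 1 Å⁻¹ (d = 1 Å) placed on the sphere. The sketch reports 2θ = 60°, as Bragg's law requires (sin θ = λ/2d = 0.5).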





1.4 FOURIER TRANSFORM
1.4.1 The Fourier series
A Fourier series, named after Joseph Fourier, is an expansion of a periodic function f(x) as an infinite sum of sines and cosines, and it makes use of the orthogonality of the sine and cosine functions. The computation and study of Fourier series is known as harmonic analysis. It is extremely useful for breaking an arbitrary periodic function into a set of simple terms. These terms can be handled individually and then recombined to obtain a solution to the original problem, or an approximation to it of whatever accuracy is required.
Each reflection is the result of diffraction from the atoms in the unit-cell. Because waves are periodic, they can be analyzed with Fourier methods, which approximate periodic functions by sums of sines and cosines. The basic idea of Fourier analysis is that any function f(x) of period 1 can be approximated by sums of the type

$$f(x) = \sum_{h=0}^{n} |F_h|\,[\cos(2\pi hx) + i\sin(2\pi hx)] \qquad (1.2)$$

Here f(x) is the resulting wave, the sum of the Fourier terms. Each term is a simple wave with its own amplitude $|F_h|$, its own frequency h and, implicitly, its own phase $\alpha_h$. Since

$$\cos\theta + i\sin\theta = e^{i\theta} \qquad (1.3)$$

the Fourier series above can be written as

$$f(x) = \sum_{h} |F_h|\,e^{2\pi ihx} \qquad (1.4)$$
Written as a three-dimensional Fourier series, the equation becomes

$$f(x,y,z) = \sum_{hkl} |F_{hkl}|\,e^{2\pi i(hx+ky+lz)} \qquad (1.5)$$

Here each term in the series is a simple three-dimensional wave whose frequency is h in the X direction, k in the Y direction and l in the Z direction. For each possible set of values of h, k and l, the associated wave has an amplitude $|F_{hkl}|$.
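The following Python sketch evaluates a one-dimensional series in the form of Equation 1.4. It makes the "implicit" phase explicit by writing each term as $|F_h|e^{i\alpha_h}e^{2\pi ihx}$, which is an assumption of this sketch rather than the text's notation. The function name is hypothetical.

```python
import cmath
import math

def fourier_synthesis(terms, x):
    """Evaluate f(x) = sum_h |F_h| exp(i alpha_h) exp(2 pi i h x).

    terms maps the integer frequency h to (amplitude |F_h|, phase alpha_h in radians).
    """
    return sum(amp * cmath.exp(1j * phase) * cmath.exp(2j * math.pi * h * x)
               for h, (amp, phase) in terms.items())
```

With terms {0: (1.0, 0.0), 1: (0.5, 0.0)}, the sum is 1.5 at x = 0 and 0.5 at x = 0.5. It is periodic with period 1. Changing only the phase of the h = 1 term to π changes the value at x = 0 to 0.5: the same amplitudes with different phases give a different function.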

1.4.2 The Fourier transform
The Fourier transform defines a relationship between a signal in the time domain and its representation in the frequency domain. Because it is a transform, no information is created or lost in the process. The original signal can be recovered from its Fourier transform, and vice versa. Fourier demonstrated that for any function f(x) there exists another function F(h) such that

$$F(h) = \int_{-\infty}^{+\infty} f(x)\,e^{2\pi ihx}\,dx \qquad (1.6)$$

where F(h) is called the Fourier transform (FT) of f(x), and the unit of the variable h is the reciprocal of the unit of x.
The Fourier transform operation is reversible. That is, the same mathematical operation that gives F(h) from f(x) can be carried out in the opposite direction to give f(x) from F(h), provided x and h are reciprocal to each other:

$$f(x) = \int_{-\infty}^{+\infty} F(h)\,e^{-2\pi ihx}\,dh \qquad (1.7)$$
The functions f(x) and F(h) above are one-dimensional. In three dimensions, the Fourier transform is

$$F(h,k,l) = \iiint_{xyz} f(x,y,z)\,e^{2\pi i(hx+ky+lz)}\,dx\,dy\,dz \qquad (1.8)$$

and in turn the reverse Fourier transform is

$$f(x,y,z) = \iiint_{hkl} F(h,k,l)\,e^{-2\pi i(hx+ky+lz)}\,dh\,dk\,dl \qquad (1.9)$$
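The reversibility of Equations 1.6 and 1.7 can be illustrated with their discrete analogue on N sampled points. The following is a minimal pure-Python sketch with hypothetical names. It uses the same sign convention as the text: + in the forward direction, − in the reverse direction. The reverse direction is divided by N so the round trip is exact. Library FFT routines, such as NumPy's, use the opposite sign in the forward direction, so their phases are the complex conjugates of these.

```python
import cmath
import math

def dft(samples, sign=+1):
    """Discrete analogue of Eq. 1.6: F(h) = sum_x f(x) exp(sign * 2 pi i h x / N)."""
    n = len(samples)
    return [sum(samples[x] * cmath.exp(sign * 2j * math.pi * h * x / n) for x in range(n))
            for h in range(n)]

def inverse_dft(coeffs):
    """Discrete analogue of Eq. 1.7: opposite sign, divided by N so the round trip is exact."""
    n = len(coeffs)
    return [v / n for v in dft(coeffs, sign=-1)]
```

Transforming a list of samples and then transforming back returns the original values to within rounding error. A pure cosine of frequency 2 sampled on 8 points has non-zero coefficients only at h = 2 and h = 6 (that is, h = −2).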

1.4.3 Electron density and structure factor
The Fourier series is directly applicable to the study of crystals because the electron density in a crystal is a periodic function. A protein structure is reported as the Cartesian coordinates of each atom. What the crystallographer actually sees, however, is the electron density: the cloud of electrons surrounding the atomic nuclei, with which X-rays interact.
The unit-cell can be represented as an assembly of small, defined volume elements, each with its own electron density. The electron density of a volume element centered at (x, y, z) is roughly the average value of ρ(x, y, z) in that region. The smaller the volume elements, the more closely these averages approach the correct value of ρ(x, y, z) at all points. The electron density is written as

$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} F_{hkl}\,e^{-2\pi i(hx+ky+lz)} \qquad (1.10)$$
where $F_{hkl}$ is called the structure factor. The electron density is the Fourier transform of the structure factors, and vice versa. In turn, the structure factor is written as

$$F_{hkl} = \iiint_{xyz} \rho(x,y,z)\,e^{2\pi i(hx+ky+lz)}\,dx\,dy\,dz \qquad (1.11)$$

Equivalently, the structure factor is the resultant of the N waves scattered in the direction of the reflection hkl by the N atoms in the unit-cell. Each of these waves has an amplitude proportional to $f_j$, the scattering factor of atom j. It also has a phase angle $\alpha_j$ with respect to the origin of the unit-cell.
Crystallographers represent each structure factor as a vector in the complex plane. The length of this vector represents the amplitude of the structure factor $F_{hkl}$. This amplitude is proportional to the square root of the intensity of reflection hkl, $(I_{hkl})^{1/2}$. The phase is represented by the angle α that the vector makes with the real axis when the origin of the vector is placed at the origin of the complex plane. The structure factor F can thus be represented as a vector A + iB on this plane (Fig. 1.4). The projection of F on the real axis is its real part A, a vector of length |A|. The projection of F on the imaginary axis is its imaginary part iB, a vector of length |B|.










Figure 1.4 Real and imaginary components of the structure factor

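From the figure above,

$$\sin\alpha = \frac{|B|}{|F|} \quad\text{and}\quad \cos\alpha = \frac{|A|}{|F|} \qquad (1.12)$$

so that

$$|A| = |F|\cos\alpha \quad\text{and}\quad |B| = |F|\sin\alpha \qquad (1.13)$$

$$F = |A| + i|B| = |F|(\cos\alpha + i\sin\alpha) \qquad (1.14)$$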
Expressing the complex term in parentheses as an exponential gives

$$F = |F|\,e^{i\alpha} \qquad (1.15)$$
Substituting this expression for $F_{hkl}$ in Equation 1.10 gives

$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} |F_{hkl}|\,e^{i\alpha_{hkl}}\,e^{-2\pi i(hx+ky+lz)} \qquad (1.16)$$
The structure factor for reflection hkl can also be written as a sum over the n atoms in the unit-cell:

$$F_{hkl} = \sum_{j=1}^{n} f_j\,e^{2\pi i(hx_j+ky_j+lz_j)} \qquad (1.17)$$

where $f_j$ is the scattering factor and $(x_j, y_j, z_j)$ are the fractional coordinates of atom j in the unit-cell.
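Equation 1.17 and the amplitude/phase decomposition of Equations 1.12 to 1.15 can be evaluated directly. The following is a minimal Python sketch. It treats each $f_j$ as a constant, ignoring the angle dependence of real scattering factors and thermal motion, which is a simplifying assumption. The function names are hypothetical.

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """Eq. 1.17: F_hkl = sum_j f_j exp(2 pi i (h x_j + k y_j + l z_j)).

    atoms is a list of (f_j, (x_j, y_j, z_j)) with fractional coordinates;
    f_j is treated as a constant (simplifying assumption).
    """
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for f, (x, y, z) in atoms)

def amplitude_and_phase(F):
    """|F| and the phase alpha in degrees, as in Eqs. 1.12 to 1.15."""
    return abs(F), math.degrees(cmath.phase(F))
```

Two identical atoms at x = 0 and x = 0.5 scatter exactly out of phase for the 100 reflection, so $F_{100}$ = 0, while they add for 200. A single atom at x = 0.25 gives a 100 reflection with phase 90°.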
In X-ray crystallography, the structure factor F(hkl) of any X-ray reflection (diffracted beam) hkl expresses both the amplitude and the phase of that reflection. It plays a central role in the determination and refinement of crystal structures for three reasons. It is related to the intensity of the reflection. It depends only on the structure that gives rise to that reflection. It is independent of the method and conditions of observation. The set of structure factors for all the reflections is the primary quantity needed to derive, by Fourier methods, the three-dimensional distribution of electron density, which is the image of the crystal structure. This image is the crystallographic analogue of the image formed in a microscope by the recombination of the rays scattered by the object. In a microscope this recombination is performed physically by lenses. In crystallography, the corresponding recombination of the diffracted beams must be achieved by mathematical calculation.




1.5 THE PHASE PROBLEM
In a diffraction experiment, we measure the intensities of waves scattered from planes (denoted by hkl) in a crystal. The amplitude of the wave, $|F_{hkl}|$, is proportional to the square root of the intensity of the reflection measured by the detector. To calculate the electron density at a position (x, y, z) in the unit-cell, we need to evaluate the summation in Equation 1.16 over all the hkl planes. In words, the electron density at (x, y, z) is the sum of the contributions to that point of the waves scattered from all possible planes. The amplitude of each wave depends on the distribution of electrons in the unit-cell, and the contributions are added with the correct relative phases. In Equation 1.16, V is the volume of the unit-cell and $\alpha_{hkl}$ is the phase associated with the structure-factor amplitude $|F_{hkl}|$. The amplitudes can be measured, but the phases cannot be measured in a diffraction experiment. This is the phase problem of X-ray crystallography.
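The consequence of losing the phases can be shown with a toy one-dimensional "crystal" of two point atoms. The following Python sketch is illustrative only. The atom list is hypothetical, V is set to 1, and "amplitudes only" means setting every phase to zero. With the true phases, the density synthesized from the 1-D analogue of Equation 1.16 peaks at the heavier atom. With the same amplitudes and zero phases, the map peaks at the origin instead. Shifting the structure changes the phases but leaves every amplitude unchanged, so amplitudes alone cannot fix where the atoms are.

```python
import cmath
import math

def sf_1d(h, atoms):
    """1-D structure factor for point atoms (f_j, x_j): sum_j f_j exp(2 pi i h x_j)."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

def density_1d(x, coeffs):
    """1-D analogue of Eq. 1.16 with V = 1: real part of sum_h F_h exp(-2 pi i h x)."""
    return sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in coeffs.items()).real

def peak_position(coeffs, n_grid=200):
    """Grid point (fractional x) where the synthesized density is highest."""
    grid = [i / n_grid for i in range(n_grid)]
    return max(grid, key=lambda x: density_1d(x, coeffs))

atoms = [(8.0, 0.15), (6.0, 0.40)]  # hypothetical 1-D structure
hs = range(-10, 11)
true_F = {h: sf_1d(h, atoms) for h in hs}
# What the detector effectively gives us: amplitudes, with the phases lost.
amplitudes_only = {h: complex(abs(F), 0.0) for h, F in true_F.items()}
```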

If we can assume, or arrive at, some prior knowledge of the electron density or the structure, we can estimate the phase angles. This is the basis of all phasing methods. Determining a crystal structure therefore consists of applying a technique suited to the particular crystal to obtain approximate phases for at least some of the X-ray reflections. During structure refinement, this knowledge of the initial phases is extended as accurately as possible to all reflections.


1.5.1 Solving the phase problem
Four methods are used to solve the phase problem in macromolecular structure determination:
- direct methods;
- the heavy-atom method (or isomorphous replacement method);
- the anomalous scattering method (also called anomalous dispersion);
- the molecular replacement method.

All these methods yield phase estimates for only a limited set of reflections. In some cases these estimates must be improved before an interpretable electron density map can be obtained. Phases are subsequently assigned to as many reflections as possible.

1.5.2 Direct methods
Assume that a crystal is made up of similarly shaped atoms that all have positive electron density. Then statistical relationships exist between sets of structure factors, and these relationships can be used to deduce likely values for the phases. Direct methods exploit such relationships and can solve small-molecule structures relatively easily. They estimate initial phases for a selected set of reflections using triple relations and then extend the phases to more reflections. Three reflections h, k and h − k form a triple relation when the phase of one can be predicted from the other two, approximately as $\varphi_{h} \approx \varphi_{k} + \varphi_{h-k}$. The relation holds best when all three reflections are strong. A number of sets of initial phases are tested, and the best are selected.

Unfortunately, these statistical relationships become weaker as the number of atoms increases. Direct methods are therefore generally limited to structures with at most a few hundred atoms in the unit-cell. The prime requirement for direct methods to succeed in protein crystallography is very high-resolution data (better than about 1.2 Å). This has limited the usefulness of ab initio phase determination in protein crystallography, although direct methods have been used to phase proteins of up to about 1000 atoms.
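The triple relation is usually applied by summing over all available pairs (k, h − k). One standard way to do this is the tangent formula, in which each pair votes for $\varphi_k + \varphi_{h-k}$ with a weight given by the product of the two normalized amplitudes |E|. The following Python sketch is a minimal illustration. It uses one-dimensional integer indices for simplicity and omits the further weighting schemes used in real direct-methods programs. The function name is hypothetical.

```python
import math

def tangent_formula(h, known):
    """Estimate phi(h) from the triple relation phi(h) ~ phi(k) + phi(h - k).

    known maps integer (1-D) indices to (|E|, phase in radians). Every pair
    (k, h - k) present in known contributes with weight |E_k| |E_(h-k)|.
    Returns the estimated phase in radians, or None if no pair is available.
    """
    s = c = 0.0
    for k, (e_k, phi_k) in known.items():
        if h - k in known:
            e_hk, phi_hk = known[h - k]
            w = e_k * e_hk
            s += w * math.sin(phi_k + phi_hk)
            c += w * math.cos(phi_k + phi_hk)
    if s == 0.0 and c == 0.0:
        return None
    return math.atan2(s, c)
```

Suppose reflections 1 and 2 have phases 0.3 and 0.5 rad. The sketch then predicts 0.8 rad for reflection 3. It returns None when no pair of known reflections sums to the target index.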

1.5.3 Molecular replacement (MR)
Molecular replacement can be successful when a structural model that is highly homologous to the protein under study (the search model) is available. The principles of this method were first described by Michael Rossmann and David Blow (Rossmann, 1962). Usually, the search model is first oriented correctly in the new unit-cell using rotation functions, which compare the Patterson function of the model with that of the new crystal. The correctly oriented model is then translated in the new unit-cell to find the best fit. A good solution is supported by a convincing correlation coefficient and residual factor (the residual factor is discussed in Section 1.8).

1.5.4 Multiple isomorphous replacement (MIR)
Heavy-atom substitution was formulated very early by small-molecule crystallographers as a way to solve the phase problem. Max Perutz and John Kendrew first applied the method to proteins (Perutz, 1956; Kendrew et al, 1958). They soaked protein crystals in heavy-atom solutions to create isomorphous heavy-atom derivatives (same unit-cell, same orientation of the protein in the unit-cell). The bound heavy atoms gave rise to measurable intensity changes that could be used to deduce their positions.
The method proceeds as follows:
1. Crystals of the native protein whose structure is to be determined are grown in the usual manner.
2. Once grown, they are soaked in solutions of heavy-atom compounds. The goal is to obtain derivative crystals in which heavy atoms bind specifically and consistently to each protein molecule in the unit-cell.
3. Diffraction data are collected from the derivative crystals, and the positions of the heavy atoms are determined from difference Patterson maps. For this step to succeed, only a few heavy atoms should bind in each asymmetric unit.
4. Once the initial heavy-atom sites have been found, the coordinates, occupancies and temperature factors of each heavy atom are refined.

At least two isomorphous derivatives are needed for structure determination by MIR. When anomalous scattering data are also used (multiple isomorphous replacement with anomalous scattering, MIRAS), one isomorphous derivative together with its anomalous scattering data can be sufficient. In practice, data from several derivatives are combined to refine the heavy-atom parameters and to calculate MIR or MIRAS phases.
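Why one derivative is not enough can be seen from the vector relation $F_{PH} = F_P + F_H$ between the native, derivative and heavy-atom structure factors. This relation underlies the Harker construction. Once the heavy-atom model gives the complex $F_H$, and $|F_P|$ and $|F_{PH}|$ are measured, the law of cosines leaves two possible protein phases. The following Python sketch computes both. The function name is hypothetical, and measurement errors and figure-of-merit weighting are ignored.

```python
import cmath
import math

def sir_phase_candidates(fp_amp, fph_amp, fh):
    """Two possible protein phases (radians) from one isomorphous derivative.

    Uses F_PH = F_P + F_H, so
    |F_PH|^2 = |F_P|^2 + |F_H|^2 + 2 |F_P| |F_H| cos(alpha_P - alpha_H).
    fh is the complex heavy-atom structure factor from the heavy-atom model.
    """
    fh_amp, alpha_h = abs(fh), cmath.phase(fh)
    cos_term = (fph_amp**2 - fp_amp**2 - fh_amp**2) / (2 * fp_amp * fh_amp)
    delta = math.acos(max(-1.0, min(1.0, cos_term)))  # clamp against noisy data
    return alpha_h + delta, alpha_h - delta
```

Only one of the two candidates is the true phase. A second derivative, or anomalous data as in MIRAS and SIRAS, is what resolves this ambiguity.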

1.5.5 Anomalous scattering
The atomic scattering factor of an atom has three components, so that $f = f_0 + f' + if''$:
- $f_0$, the normal scattering term, which depends on the Bragg angle;
- $f'$ and $f''$, which depend not on the scattering angle but on the wavelength.

The latter two terms represent the anomalous scattering that occurs near an absorption edge, where the X-ray photon energy is sufficient to promote an electron from an inner shell. The dispersive term $f'$ reduces $f_0$, whereas the absorption term $f''$ is 90° advanced in phase with respect to $f_0 + f'$. This leads to a breakdown of Friedel's law ($|F_{hkl}| = |F_{\bar{h}\bar{k}\bar{l}}|$). It gives rise to anomalous differences that can be used to locate any anomalous scatterers in the crystal.
The anomalous or Bijvoet difference can be used in the same way as the isomorphous difference, in Patterson or direct methods, to locate the anomalous scatterers. Phases for the native structure factors can then be derived in a way similar to single or multiple isomorphous replacement (SIR or MIR). Anomalous scattering can also be used to break the phase ambiguity of a single isomorphous replacement experiment, leading to single isomorphous replacement with anomalous scattering (SIRAS).
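The breakdown of Friedel's law follows from Equation 1.17 once scattering factors are allowed to be complex. The following Python sketch compares $|F_{hkl}|$ and $|F_{\bar{h}\bar{k}\bar{l}}|$ for a toy two-atom cell. Scattering factors are treated as angle-independent constants. The numerical values of $f_0$, $f'$ and $f''$ in the usage note are illustrative only, not measured values, and the function names are hypothetical.

```python
import cmath
import math

def sf_with_anomalous(hkl, atoms):
    """F_hkl with complex scattering factors f = f0 + f' + i f''.

    atoms: list of (f0, f_prime, f_double_prime, (x, y, z)) with fractional
    coordinates; all terms are treated as angle-independent constants.
    """
    h, k, l = hkl
    return sum((f0 + fp + 1j * fpp) * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for f0, fp, fpp, (x, y, z) in atoms)

def bijvoet_difference(hkl, atoms):
    """|F(hkl)| - |F(-h -k -l)|; zero whenever Friedel's law holds."""
    minus = tuple(-i for i in hkl)
    return abs(sf_with_anomalous(hkl, atoms)) - abs(sf_with_anomalous(minus, atoms))
```

With $f'' = 0$ for every atom, the Bijvoet difference vanishes. Give one atom an illustrative $f'' = 4$ (with $f_0 = 34$, $f' = -8$) at a different position from a light atom, and $|F_{100}|$ and $|F_{\bar{1}00}|$ differ by almost 2 electrons in this toy cell.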

1.5.5.1 MAD
Isomorphous replacement has several problems:
- non-isomorphism between crystals (unit-cell changes, reorientation of the protein, conformational changes, changes in salt and solvent ions);
- difficulty in locating all the heavy atoms;
- difficulty in refining heavy-atom positions, occupancies and thermal parameters;
- errors in intensity measurements.

The multiwavelength anomalous diffraction (MAD) method overcomes the non-isomorphism problems, because all the data can come from a single crystal. Data are collected at several, typically three, wavelengths chosen to maximize the absorption and dispersive effects.
The changes in structure-factor amplitudes arising from anomalous scattering are generally small and require accurate intensity measurements. The actual profile of the absorption curve must be determined experimentally by a fluorescence scan of the crystal at the synchrotron, because the environment of the anomalous scatterers can affect the details of the absorption. Excellent optics are needed for accurate wavelength setting with minimal wavelength dispersion. Generally, all data are collected from a single frozen crystal with high redundancy, to increase the statistical significance of the measurements, and with as high a completeness as possible.

1.5.5.2 SAD
Single-wavelength anomalous dispersion (SAD) can be regarded as a subset of MAD in which data are collected at only one wavelength. It is becoming increasingly practical to collect data at the absorption peak alone. Density-modification protocols are then used to break the resulting phase ambiguity and obtain interpretable maps.

1.6 PHASE IMPROVEMENT
Experimentally determined phases are generally not accurate enough to give a completely interpretable electron-density map. They are therefore often the starting point for phase improvement by a variety of density-modification methods, which are also based on prior knowledge of the structure. Solvent flattening, histogram matching and non-crystallographic averaging are the main techniques used to modify the electron density and improve the phases. Solvent flattening is a powerful technique that removes negative electron density and sets the electron density in the solvent regions to a typical value of 0.33 e Å$^{-3}$. By contrast, a typical protein electron density is 0.43 e Å$^{-3}$.
Density modification is often a cyclic procedure:
1. The modified electron-density map is back-transformed to give modified phases.
2. These phases are recombined with the experimental phases, so as not to discard the experimental information.
3. A new map is calculated and modified again.

The cycle is repeated until convergence. Such methods can also provide phases beyond the resolution for which experimental phase information is available, provided that higher-resolution native data have been collected. In such cases, the modified map is back-transformed to a slightly higher resolution on each cycle to provide new phases for the higher-resolution reflections.
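One cycle of the procedure above can be sketched in one dimension: synthesize a map, flatten the solvent, back-transform, and recombine phases. The following Python sketch is illustrative only and not a production method. The solvent mask is supplied by hand rather than determined from the map, and negative density is not truncated. The recombination step is a naive weighted average of unit phase vectors controlled by a hypothetical `w_exp` parameter, not the probability-based weighting used in practice. All names are hypothetical.

```python
import cmath
import math

def synthesize(coeffs, n):
    """Density on n grid points from {h: F_h}; 1-D analogue of Eq. 1.16 with V = 1."""
    return [sum(F * cmath.exp(-2j * math.pi * h * i / n) for h, F in coeffs.items()).real
            for i in range(n)]

def back_transform(rho, hs):
    """F_h = (1/n) sum_i rho_i exp(2 pi i h i / n): discrete analogue of Eq. 1.11."""
    n = len(rho)
    return {h: sum(r * cmath.exp(2j * math.pi * h * i / n) for i, r in enumerate(rho)) / n
            for h in hs}

def flatten_cycle(f_obs, phases, solvent_mask, solvent_level=0.0, w_exp=0.5):
    """One illustrative solvent-flattening cycle; returns new phases {h: radians}.

    f_obs: {h: |F_obs|}, phases: {h: current phase}, solvent_mask: one boolean
    per grid point, True where the map is taken to be solvent.
    """
    n = len(solvent_mask)
    coeffs = {h: f_obs[h] * cmath.exp(1j * phases[h]) for h in f_obs}
    rho = synthesize(coeffs, n)
    # Flatten: every solvent grid point is set to a constant level.
    rho = [solvent_level if solvent else r for r, solvent in zip(rho, solvent_mask)]
    modified = back_transform(rho, f_obs.keys())
    # Naive recombination of experimental and modified phases; the observed
    # amplitudes are kept, only the phases are updated for the next cycle.
    return {h: cmath.phase(w_exp * cmath.exp(1j * phases[h])
                           + (1 - w_exp) * cmath.exp(1j * cmath.phase(modified[h])))
            for h in f_obs}
```

With an empty mask the cycle returns the input phases unchanged, which checks that the synthesis and back-transform are consistent. A non-empty mask alters the map and hence the phases fed into the next cycle.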

1.7 MODEL BUILDING
A model of the protein under study is produced by fitting the components of the structure into the experimentally derived electron density map, followed by refinement. In protein crystallography, building an atomic model of the molecule(s) is a crucial step in the structure-determination process. Once an atomic model is available, the large body of knowledge about the geometry of protein structures can be applied in structure refinement to generate better phases and a better atomic model. In practice, an atomic model can only be built once enough phase information has been obtained to produce an interpretable electron-density map. That information comes either from experiment or from known homologous structures.
Model building may be far from straightforward, because the phase information may be poor and the resolution of the diffraction data may be limited. An initial model built into an experimental map, or into a poorly phased molecular replacement map, will usually contain many errors. To produce an accurate model, crystallographic refinement must be combined with rebuilding at the graphics display, in a cyclic process of gradual improvement. Depending on the size of the structure, either the automatic part (refinement) or the manual part (rebuilding) may be rate-limiting. Refinement is the process of adjusting the parameters of a model to find the values most nearly compatible with the observations. In its least-squares form it minimizes the weighted sum of squared differences between the observed and calculated structure-factor amplitudes,

$$\sum_{hkl} w_{hkl}\,(|F_o| - |F_c|)^2$$

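To make the refinement target concrete, here is a minimal sketch of the weighted least-squares residual together with the simplest possible "refinement": finding a single overall scale factor k between |F_c| and |F_o|. The weights w = 1/σ², the data and all names are assumptions for illustration; real refinement adjusts thousands of atomic parameters, and modern programs usually use a maximum-likelihood rather than a pure least-squares target.

```python
import numpy as np

def lsq_target(f_obs, f_calc, weights):
    """Weighted least-squares target: sum over reflections of w * (|Fo| - |Fc|)^2."""
    fo, fc = np.abs(np.asarray(f_obs)), np.abs(np.asarray(f_calc))
    return float(np.sum(np.asarray(weights) * (fo - fc) ** 2))

def refine_scale(f_obs, f_calc, weights):
    """Return the scale k minimising sum w * (|Fo| - k|Fc|)^2.

    Setting the derivative with respect to k to zero gives
    k = sum(w |Fo| |Fc|) / sum(w |Fc|^2).
    """
    fo, fc = np.abs(np.asarray(f_obs)), np.abs(np.asarray(f_calc))
    w = np.asarray(weights)
    return float(np.sum(w * fo * fc) / np.sum(w * fc ** 2))

# Hypothetical amplitudes and sigmas; weights w = 1/sigma^2 are an assumption
f_obs = np.array([10.0, 20.0, 30.0])
f_calc = np.array([5.0, 10.0, 15.0])
sigma = np.array([1.0, 2.0, 3.0])
w = 1.0 / sigma ** 2

k = refine_scale(f_obs, f_calc, w)
print(k, lsq_target(f_obs, k * f_calc, w))
```

Here |F_c| is exactly half of |F_o|, so the refined scale is 2 and the residual after scaling vanishes.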

For maps at high resolution (d ≤ 2.2 Å) and with good starting phases,
automation of the model-building process has been highly successful in recent years
(Perrakis et al, 1999). Automation has greatly reduced the time spent on manual model
building with computer graphics programs. Various approaches are currently being
developed to improve the recognition of protein structural features in electron-density
maps (Terwilliger, 2003; Holton et al, 2000; Levitt, 2001) so that automated model
building can handle lower-resolution data and poorer phase information. Nonetheless,
as the resolution decreases and the phase information worsens, map interpretation
becomes increasingly unreliable.

1.8 REFINEMENT
Once all the atoms in a structure have been located, the final part of the
process is to refine them. An atomic model can never be perfect, but it can be
improved a great deal by a process called refinement, in which the atomic model is
adjusted to improve the agreement with the measured diffraction data. Refinement is
the optimization of a function of a set of observations by changing the parameters of a
model.
During the refinement of a protein structure no data cut-off (such as a sigma cut-off)
should be applied, and all observed reflections, including those at low resolution,
should generally be used. Low-resolution data must be collected for proper evaluation
of the structure because they are needed for the bulk-solvent correction; omitting them
can result in underestimated or even negative B-factors. The highest possible resolution
limit should be used because this maximizes the accuracy and precision of the structure.
That limit is determined by the signal-to-noise ratio [I/σ(I)], completeness and
redundancy of the data within the highest-resolution shell.
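A minimal sketch of how the highest-resolution shell might be assessed: observations are binned into resolution shells, and the mean I/σ(I) and multiplicity (redundancy) are reported per shell. Completeness is omitted because it requires the number of theoretically possible reflections in each shell, which depends on the unit cell and space group. The inputs, the equal-volume binning and all names are assumptions, not the output format of any particular data-processing program.

```python
import numpy as np

def shell_statistics(d, i_obs, sig_i, hkl, n_shells=10):
    """Mean I/sigma(I) and multiplicity in resolution shells.

    d      : d-spacing (Å) of each measured observation
    i_obs  : measured intensities; sig_i: their standard uncertainties
    hkl    : (N, 3) integer array of symmetry-reduced Miller indices
    Shells have equal width in 1/d^3, i.e. equal reciprocal-space volume,
    so each holds roughly the same number of reflections.
    """
    d = np.asarray(d, dtype=float)
    i_obs = np.asarray(i_obs, dtype=float)
    sig_i = np.asarray(sig_i, dtype=float)
    hkl = np.asarray(hkl, dtype=int)
    s3 = 1.0 / d ** 3
    edges = np.linspace(s3.min(), s3.max(), n_shells + 1)
    # Shell 0 is the lowest resolution, shell n_shells - 1 the highest
    shell = np.clip(np.digitize(s3, edges) - 1, 0, n_shells - 1)
    stats = []
    for k in range(n_shells):
        sel = shell == k
        if not sel.any():
            continue
        n_unique = len({tuple(h) for h in hkl[sel]})
        stats.append({
            "shell": k,
            "d_min": float(d[sel].min()),
            "i_over_sigma": float(np.mean(i_obs[sel] / sig_i[sel])),
            "multiplicity": float(sel.sum() / n_unique),
        })
    return stats
```

The last entry describes the highest-resolution shell; deciding where to place the resolution limit from these numbers remains the crystallographer's judgement, which the sketch does not attempt.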
It is important not to over-refine a structure by building the model into density
of inadequate quality. Such over-interpretation can generally be revealed by visual
inspection of the electron density maps and by the presence of high B-factors (i.e. atoms
with high thermal mobility). The majority of structures are refined using isotropic
B-factors, with the assumption that an atom moves equally in all directions. Anisotropic
refinement allows the movement of an atom in each direction to be refined individually,
but it needs six displacement parameters per atom instead of one (nine parameters per
atom in total instead of four for x, y, z and B), so many more unique reflections are
required. Consequently, anisotropic B-factor refinement is generally feasible only for
atomic-resolution structures.
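The parameter count behind this argument is simple arithmetic. The sketch below compares the number of unique reflections per refined parameter for isotropic and anisotropic models; the atom and reflection counts are hypothetical, and occupancies and stereochemical restraints are ignored.

```python
def parameters_per_atom(anisotropic):
    # x, y, z plus either one isotropic B or six anisotropic U_ij terms
    return 3 + (6 if anisotropic else 1)

def data_to_parameter_ratio(n_unique_reflections, n_atoms, anisotropic=False):
    """Unique reflections per refined parameter (restraints and occupancies ignored)."""
    return n_unique_reflections / (n_atoms * parameters_per_atom(anisotropic))

# Hypothetical 2,000-atom protein with 40,000 unique reflections
iso = data_to_parameter_ratio(40_000, 2_000)
aniso = data_to_parameter_ratio(40_000, 2_000, anisotropic=True)
print(f"isotropic: {iso:.2f} reflections/parameter")
print(f"anisotropic: {aniso:.2f} reflections/parameter")
```

Stereochemical restraints act as additional observations, which is why isotropic refinement of proteins remains feasible at moderate resolution even though the ratio from diffraction data alone is modest.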

Although crystallographic refinement procedures are self-monitoring, the data they use
contain experimental errors, and hence interpretation of the electron density map can
be difficult (Kleywegt, 2000). This is further complicated by the presence of model
bias: that is, the structural model influences the appearance of the electron density
map calculated with model phases. Simulated annealing, which helps refinement escape
false (local) energy minima, can reduce model bias and should be applied to models of
proteins and their bound ligands. The procedure simulates gradual heating and slow
cooling of the molecule and has the additional advantage of correcting many small errors
in the model. However, visual inspection of the model during refinement remains the most
important check because it can reveal unexpected errors.
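In crystallographic refinement, simulated annealing is carried out by molecular dynamics on the full atomic model. The underlying idea, that uphill moves accepted with a temperature-dependent probability let the search climb out of false minima, can be illustrated with a generic Metropolis annealing of a one-dimensional toy energy. This is a swapped-in illustration of the general technique, not the crystallographic implementation; the energy function, cooling schedule and step size are all assumptions.

```python
import math
import random

def simulated_annealing(energy, x0, t_start=5.0, t_end=0.01,
                        n_steps=5000, step=0.5, seed=0):
    """Minimise a 1-D energy with Metropolis moves under a falling temperature."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(n_steps):
        # Geometric cooling from t_start down to t_end ("slow cooling")
        t = t_start * (t_end / t_start) ** (k / max(n_steps - 1, 1))
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        # Downhill moves are always accepted; uphill moves with Boltzmann
        # probability, which is large while the temperature is still high
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

def toy_energy(x):
    # Hypothetical rugged energy surface with many local minima
    return 0.1 * x * x + math.sin(3.0 * x)

best_x, best_e = simulated_annealing(toy_energy, x0=8.0)
print(f"lowest energy found: {best_e:.3f} at x = {best_x:.3f}")
```

A plain downhill minimizer started at x = 8 would stop in the nearest local minimum; the early high-temperature phase is what allows the search to move across barriers before it settles.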
Modern crystallographic refinement programs use automatic routines to add water
molecules to structures. The number of ordered water molecules that can be modelled
increases with increasing resolution. Most proteins are crystallized in the presence of
salts, and it is often difficult to distinguish the ions from water molecules,
particularly ions that are similar in size to water and have the same number of
electrons (e.g. NH₄⁺ and Na⁺), and to distinguish weak water peaks from noise in the
electron density maps. Hydrogen-bonding patterns and the coordination geometry of ions
can be used to differentiate water molecules and ions in these circumstances.
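The use of coordination geometry can be illustrated with a deliberately simple heuristic: Na⁺ to oxygen coordination distances are typically about 2.4 Å, whereas water hydrogen bonds to O/N atoms are typically about 2.7 to 3.0 Å. The contact cut-off, the distance threshold, the minimum coordination number of four and the function name are illustrative assumptions, not the defaults of any refinement program, and this test cannot separate NH₄⁺ from water because their hydrogen-bonding distances are similar.

```python
import math

def classify_site(site, neighbours, contact_cutoff=3.2):
    """Heuristically label a density peak as water, a possible Na+ ion, or noise.

    site       : (x, y, z) of the peak in Å
    neighbours : list of (x, y, z) for nearby O/N atoms
    All thresholds are illustrative assumptions.
    """
    contacts = [dist for dist in (math.dist(site, n) for n in neighbours)
                if dist <= contact_cutoff]
    if not contacts:
        return "noise?"  # no plausible partners: the peak may be noise
    mean_d = sum(contacts) / len(contacts)
    # Short contacts with a high coordination number suggest a metal ion
    if mean_d < 2.6 and len(contacts) >= 4:
        return "possible Na+"
    return "water"

# Hypothetical octahedral site with six oxygens at 2.4 Å
octahedral = [(2.4, 0, 0), (-2.4, 0, 0), (0, 2.4, 0),
              (0, -2.4, 0), (0, 0, 2.4), (0, 0, -2.4)]
print(classify_site((0, 0, 0), octahedral))
```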
After several rounds of refinement and map fitting, the model gradually converges to
the final model. The agreement between the structural model and the X-ray diffraction
data is measured by a ‘residual’ or ‘reliability’ factor (R-factor). The progress of
iterative real- and reciprocal-space refinement is monitored by computing the difference
between the measured structure-factor amplitudes $|F_{obs}|$ and the amplitudes
$|F_{calc}|$ calculated from the current model:

$$R = \frac{\sum_{hkl} \big|\,|F_{obs}| - |F_{calc}|\,\big|}{\sum_{hkl} |F_{obs}|} \tag{1.20}$$
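Eq. (1.20) translates directly into code. A minimal sketch, assuming |F_calc| has already been placed on the same scale as |F_obs|; the function name and the example amplitudes are hypothetical.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """Crystallographic R-factor, Eq. (1.20): sum ||Fobs| - |Fcalc|| / sum |Fobs|.

    Complex structure factors are accepted; only their moduli are used.
    """
    fo = np.abs(np.asarray(f_obs))
    fc = np.abs(np.asarray(f_calc))
    return float(np.sum(np.abs(fo - fc)) / np.sum(fo))

# Hypothetical observed and calculated amplitudes for three reflections
print(f"R = {r_factor([100.0, 50.0, 20.0], [90.0, 55.0, 20.0]):.3f}")
```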

When the model converges to the correct structure, the differences between the
measured and calculated structure-factor amplitudes decrease accordingly. A refined
protein model should have an R factor below 0.3, and occasionally a small, well-ordered
protein structure may refine to about R = 0.1. Modern structure-refinement programs split
the diffraction data into a ‘test set’, used to calculate $R_{free}$, and a ‘working set’
(model set), used to calculate $R_{cryst}$; both serve as global measures of refinement.
Because the test-set reflections are never used to refine the model, $R_{free}$ is
particularly important for guarding against ‘over-fitting’, that is, changes that lower
$R_{cryst}$ without genuinely improving the model; it is also sensitive to the presence of
errors (Brunger et al, 1992 and Kleywegt et al, 1996).
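The test-set bookkeeping can be sketched as follows. The 5% default fraction, the random (unstratified) selection and all names are assumptions for illustration, and |F_calc| is again assumed to be on the same scale as |F_obs|.

```python
import numpy as np

def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Flag a random subset of reflections as the test set used only for R_free."""
    rng = np.random.default_rng(seed)
    n_free = max(1, int(round(free_fraction * n_reflections)))
    flags = np.zeros(n_reflections, dtype=bool)
    flags[rng.choice(n_reflections, size=n_free, replace=False)] = True
    return flags

def r_cryst_and_r_free(f_obs, f_calc, free_flags):
    """R over the working set (R_cryst) and over the test set (R_free)."""
    fo = np.abs(np.asarray(f_obs))
    fc = np.abs(np.asarray(f_calc))

    def r(sel):
        return float(np.sum(np.abs(fo[sel] - fc[sel])) / np.sum(fo[sel]))

    return r(~free_flags), r(free_flags)

# Hypothetical over-fitted model: perfect on the working set, worse on the test set
flags = split_free_set(100, free_fraction=0.05, seed=1)
f_obs = np.full(100, 10.0)
f_calc = f_obs.copy()
f_calc[flags] = 8.0
r_cryst, r_free = r_cryst_and_r_free(f_obs, f_calc, flags)
print(f"R_cryst = {r_cryst:.3f}, R_free = {r_free:.3f}")
```

In a real refinement the flags are assigned once, before refinement starts, and kept fixed; only the working set enters the refinement target, so a large gap between R_cryst and R_free is a warning sign.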
R-factor values alone can be misleading about the quality of the structure
because they are influenced by several factors, such as the omission of weak data
through a sigma cut-off. Inappropriate refinement procedures can produce artificially low R
values (i.e. the fit appears to be better than it really is). The inclusion of weak
diffraction data is essential to obtain a complete model of the structure, including
conformational variations of the amino acid side-chains and bound solvent. Thus, R-
factors must be examined critically when judging the reliability of the model, and
sometimes even properly refined structures with ‘acceptable’ R-factors can have
significant errors associated with them.
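The effect of a sigma cut-off on R can be shown with a few hypothetical reflections: weak reflections carry large relative errors, so discarding them lowers R even though the model itself is unchanged. The data, the 3σ threshold and the function names are assumptions for illustration.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    fo, fc = np.abs(np.asarray(f_obs)), np.abs(np.asarray(f_calc))
    return float(np.sum(np.abs(fo - fc)) / np.sum(fo))

def apply_sigma_cutoff(f_obs, sig_f, f_calc, n_sigma):
    """Keep only reflections with |Fo| >= n_sigma * sigma(|Fo|)."""
    f_obs, sig_f, f_calc = map(np.asarray, (f_obs, sig_f, f_calc))
    keep = f_obs >= n_sigma * sig_f
    return f_obs[keep], f_calc[keep]

# Hypothetical data: two strong reflections and two weak, noisy ones
f_obs = np.array([100.0, 80.0, 3.0, 2.0])
sig_f = np.array([2.0, 2.0, 2.0, 2.0])
f_calc = np.array([95.0, 84.0, 1.0, 4.0])

r_all = r_factor(f_obs, f_calc)
r_cut = r_factor(*apply_sigma_cutoff(f_obs, sig_f, f_calc, n_sigma=3.0))
print(f"R (all data) = {r_all:.3f}, R (F >= 3 sigma) = {r_cut:.3f}")
```

Dropping the two weak reflections lowers R from about 0.070 to 0.050, which is exactly the kind of apparent improvement the text warns about.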

1.9 VALIDATION AND DEPOSITION

Since the process of building and refining a model of a biomacromolecule
based on crystallographic data is subjective, quality-control techniques are required to
assess the validity of such models. Errors in the process of model building are almost
unavoidable, but it is the crystallographer's task to remove as many of these errors as
possible prior to analysis, publication and deposition of the structure. There are many
methods to reduce or avoid these errors. These include (i) the use of information
derived from databases of well-refined structures during model building; (ii) various
local quality checks; and (iii) global quality indicators.
Many statistics, methods and programs have been developed to help identify
errors in protein models. These methods generally fall into two classes: one in which
only coordinates and B factors are taken into account and the second in which both
the model and the crystallographic data are analyzed.
In a well-refined model, the root-mean-square deviation (RMSD) from ideal values
should not exceed 0.02 Å for bond lengths and should be less than 4° for bond angles.
There should also be no D-amino acid residues present. Peptide groups must be nearly
planar, and the backbone conformational angles φ and ψ should fall in the allowed
regions of the Ramachandran plot. Side-chain torsion angles should lie within a few
degrees of stable, staggered conformations. Finally, the well-refined model is deposited
in the Protein Data Bank ( />).
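Two of these checks reduce to simple geometry: the RMSD of bond lengths from ideal values, and the backbone torsion angles φ (C(i-1), N, Cα, C) and ψ (N, Cα, C, N(i+1)) used for the Ramachandran plot. The sketch below assumes plain Cartesian coordinates in Å; the bond lengths and the ideal value of 1.53 Å in the example are illustrative, not taken from a specific restraint library, and the function names are hypothetical.

```python
import math
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) defined by four atom positions."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def rmsd_from_ideal(observed, ideal):
    """Root-mean-square deviation of model geometry from ideal target values."""
    obs = np.asarray(observed, dtype=float)
    ide = np.asarray(ideal, dtype=float)
    return float(np.sqrt(np.mean((obs - ide) ** 2)))

# Hypothetical bond lengths (Å) compared with an illustrative ideal of 1.53 Å
bonds = [1.53, 1.51, 1.55]
rmsd = rmsd_from_ideal(bonds, [1.53] * len(bonds))
print(f"bond RMSD = {rmsd:.3f} Å, within 0.02 Å: {rmsd <= 0.02}")
print(f"torsion = {dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)):.1f} degrees")
```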
