Drug discovery strategies and methods 2004 makriyannis biegel

DRUG DISCOVERY
STRATEGIES AND METHODS

EDITED BY

ALEXANDROS MAKRIYANNIS
DIANE BIEGEL

Center for Drug Discovery
University of Connecticut
Storrs, Connecticut, U.S.A.

Copyright 2004 by Marcel Dekker, Inc. All Rights Reserved.


Although great care has been taken to provide accurate and current information, neither the
author(s) nor the publisher, nor anyone else associated with this publication, shall be liable for
any loss, damage, or liability directly or indirectly caused or alleged to be caused by this book.
The material contained herein is not intended to provide specific advice or recommendations
for any specific situation.
Trademark notice: Product or corporate names may be trademarks or registered trademarks
and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.
ISBN: 0-8247-0691-9
This book is printed on acid-free paper.
Headquarters


Marcel Dekker, Inc.
270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540
Eastern Hemisphere Distribution
Marcel Dekker AG
Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-260-6300; fax: 41-61-260-6333
World Wide Web

The publisher offers discounts on this book when ordered in bulk quantities. For more
information, write to Special Sales/Professional Marketing at the headquarters address
above.
Copyright © 2004 by Marcel Dekker, Inc. All Rights Reserved.
Neither this book nor any part may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, microfilming, and recording, or by
any information storage and retrieval system, without permission in writing from the
publisher.
Current printing (last digit):
10 9 8 7 6 5 4 3 2 1
PRINTED IN THE UNITED STATES OF AMERICA


Preface

Drug research encompasses diverse branches of science united by a common goal, namely, developing novel therapeutic agents and understanding
their molecular mechanisms of action. This process is a lengthy, exacting,
and expensive undertaking that involves integration of data from different
fields and culminates in the final product: a new drug in the marketplace.
In the past decade, progress in drug research has flourished because of
major contributions from a variety of disciplines.
The material presented in this volume focuses on a number of
research topics that have provided critical information in the field of drug
discovery. Several chapters present techniques that extend our understanding of the three-dimensional structure of macromolecules, principally
proteins, but also nucleic acid polymers and organized lipid and carbohydrate assemblies. As more structural data on these molecules become
available, information can be obtained on their interactions with small
endogenous ligands and drug molecules as well as on the interactions between
two or more of these biopolymers. Such knowledge enhances our overall
understanding of the biochemical systems of interest and their relevance
for therapeutic discovery. In addition to the basic knowledge gained by
such research, the data provide a solid basis for the development of novel
drugs with greater potencies, higher specificities of action, and reduced side
effects.
Another area of research covered in this book is the in vivo
anatomical localization of potential therapeutics using PET and SPECT
analysis (Chapter 5). These techniques allow researchers to pinpoint the
localization of high-affinity ligands in the living organism with high
accuracy, thus giving researchers a window on the functions of the brain
and other organs and on the sites of action of potential therapeutic agents.
Such studies will provide a blueprint for the design of pharmacological
agents that will target specific regions of affected organs and deliver therapeutic actions rapidly and with high specificity.
High throughput methods have increased our capacity for selecting appropriate candidate compounds and also for developing libraries of
novel compounds from which such candidates can be selected. Chapter 7
discusses the use of solid-phase synthesis for the high throughput production of peptides and other small molecules. In addition, as discussed in
Chapter 6 on peptidomimetics, the swift production of novel leads holds
considerable promise for future discovery of novel therapeutic agents.
The investigation of therapeutic targets for cannabinoid sites of action has already generated considerable interest within the field of drug
discovery, and Chapter 4, which details the results of such studies, highlights the importance of target-based studies. The enhanced appreciation
of the role of stereochemistry in drug action has focused efforts on understanding the conformation of drugs as they bind to their target receptor.
Studies of the diverse effects of cannabinoids and the development of
compounds that employ the information gleaned from the ligand/receptor
data should provide substantial insight into their molecular mechanisms of
action. Future research will promote the development of drugs that are
capable of higher specificity, longer half-lives, and lessened toxicity. In
studies of potential antiviral therapies, the understanding of viral target
molecules is essential for the production of effective medications that interact specifically with the viral life cycle and gene products, which will result
in lowering drug toxicity to the host and enhancing the antiviral activity of
the pharmacotherapy. As the nature of viral infectivity, cell growth, death,
and receptor biology are elucidated, the methods and paradigms for development of highly specific medications will provide superior treatments
for a number of diseases that pose a terrible burden worldwide (Chapters 10
and 11).
From the fields of proteomics and genomics, which hold significant
promise for unique medications, several areas of biology have also found
applications in the drug discovery arena. The study of regulatory molecules
and oncogenes has opened new avenues in drug therapy, as discussed in
Chapter 8 on G-protein-coupled receptors and Chapter 2 on SRC homology domains. Research on protein misfolding (Chapter 9), which has been
implicated in neurodegenerative diseases, has highlighted the need to
enhance our understanding of structural alterations in normal protein
products. Chapter 1 details the development of such research, and asserts
that only as we understand the basic physical mechanisms of such alterations can new therapeutic regimens be proposed and tested.


The topics included in this volume are not intended to be all-inclusive. Our approach has been eclectic, in an effort to bring the reader
the most exciting aspects of drug discovery, along with the methods that
show the most promise in enhancing the discovery process.
The chapters presented in this book have been contributed by specialists in their areas of research and will provide a contemporary picture of
the overall field of drug discovery to scientists from diverse disciplines.
Alexandros Makriyannis
Diane Biegel


Contents

Preface
Contributors

1. Protein Crystallography in Structure-Based Drug Design
Xiayang Qiu and Sherin S. Abdel-Meguid
2. Src Homology-2 Domains and Structure-Based, Small-Molecule Library Approaches to Drug Discovery
Chester A. Metcalf III and Tomi Sawyer
3. Three-Dimensional Structure of the Inhibited
Catalytic Domain of Human Stromelysin-1 by
Heteronuclear NMR Spectroscopy
Paul R. Gooley
4. Cannabinergics: Old and New Possibilities
Andreas Goutopoulos and Alexandros Makriyannis
5. Development of PET and SPECT Radioligands for
Cannabinoid Receptors
S. John Gatley, Andrew N. Gifford, Yu-Shin Ding,
Ruoxi Lan, Qian Liu, Nora D. Volkow, and
Alexandros Makriyannis


6. Structural and Pharmacological Aspects of
Peptidomimetics
Peter W. Schiller, Grazyna Weltrowska, Ralf Schmidt,
Thi M.-D. Nguyen, Irena Berezowska, Carole Lemieux,
Ngoc Nga Chung, Katharine A. Carpenter,
and Brian C. Wilkes
7. Linkers and Resins for Solid-Phase Synthesis: 1997-1999
Pan Li, Elaine K. Kolaczkowski, and Steven A. Kates
8. Allosteric Modulation of G-Protein-Coupled
Receptors: Implications for Drug Action
Angeliki P. Kourounakis, Pieter van der Klein,
and Ad P. I. IJzerman
9. Protein Misfolding and Neurodegenerative Disease:
Therapeutic Opportunities
Harry LeVine III
10. Uncoating and Adsorption Inhibitors of Rhinovirus
Replication
Guy D. Diana and Adi Treasurywala
11. Profiles of Prototype Antiviral Agents Interfering
with the Initial Stages of HIV Infection
E. De Clercq



Contributors

Sherin S. Abdel-Meguid  Suntory Pharmaceutical Research Laboratories, Cambridge, Massachusetts, U.S.A.
Irena Berezowska  Clinical Research Institute of Montreal, Montreal, Quebec, Canada
Katharine A. Carpenter  Clinical Research Institute of Montreal, Montreal, Quebec, Canada
Ngoc Nga Chung  Clinical Research Institute of Montreal, Montreal, Quebec, Canada
Eric De Clercq  Rega Institute for Medical Research, Katholieke Universiteit Leuven, Leuven, Belgium
Guy D. Diana  ViroPharma, Inc., Exton, Pennsylvania, U.S.A.
Yu-Shin Ding  Brookhaven National Laboratory, Upton, New York, U.S.A.
S. John Gatley  Brookhaven National Laboratory, Upton, New York, U.S.A.
Andrew N. Gifford  Brookhaven National Laboratory, Upton, New York, U.S.A.
Paul R. Gooley  University of Melbourne, Parkville, Victoria, Australia
Andreas Goutopoulos  Serono Reproductive Biology Institute, Rockland, Massachusetts, U.S.A.
Ad P. IJzerman  Leiden University, Leiden, The Netherlands
Steven A. Kates  Surface Logix, Inc., Brighton, Massachusetts, U.S.A.
Elaine K. Kolaczkowski  Vertex Pharmaceuticals, Cambridge, Massachusetts, U.S.A.
Angeliki P. Kourounakis  University of Thessaloniki, Thessaloniki, Greece
Ruoxi Lan  University of Connecticut, Storrs, Connecticut, U.S.A.
Carole Lemieux  Clinical Research Institute of Montreal, Montreal, Quebec, Canada
Harry LeVine III  University of Kentucky, Lexington, Kentucky, U.S.A.
Pan Li  Vertex Pharmaceuticals, Cambridge, Massachusetts, U.S.A.
Qian Liu  University of Connecticut, Storrs, Connecticut, U.S.A.
Alexandros Makriyannis  University of Connecticut, Storrs, Connecticut, U.S.A.
Chester A. Metcalf III  ARIAD Pharmaceuticals, Inc., Cambridge, Massachusetts, U.S.A.
Thi M.-D. Nguyen  Clinical Research Institute of Montreal, Montreal, Quebec, Canada
Xiayang Qiu  SmithKline Beecham Pharmaceuticals, King of Prussia, Pennsylvania, U.S.A.
Tomi Sawyer  ARIAD Pharmaceuticals, Inc., Cambridge, Massachusetts, U.S.A.
Peter W. Schiller  Clinical Research Institute of Montreal, Montreal, Quebec, Canada
Ralf Schmidt  Clinical Research Institute of Montreal, Montreal, Quebec, Canada
Adi Treasurywala  Pfizer Central Research, Groton, Connecticut, U.S.A.
Pieter van der Klein  Leiden University, Leiden, The Netherlands
Nora D. Volkow  NIDA, Bethesda, Maryland, U.S.A.
Grazyna Weltrowska  Clinical Research Institute of Montreal, Montreal, Quebec, Canada
Brian C. Wilkes  Clinical Research Institute of Montreal, Montreal, Quebec, Canada


1
Protein Crystallography in
Structure-Based Drug Design
Xiayang Qiu
SmithKline Beecham Pharmaceuticals, King of Prussia,
Pennsylvania, U.S.A.

Sherin S. Abdel-Meguid
Suntory Pharmaceutical Research Laboratories, Cambridge,
Massachusetts, U.S.A.

I. INTRODUCTION

Proteins are responsible for a wide variety of important biological functions in living organisms and are commonly used as targets of therapeutic
agents. A unique primary and tertiary structure is a hallmark property of a
protein. Although several related and even unrelated proteins may share
the same overall tertiary structure or fold, each will differ from the others in
the details. Knowledge of the detailed atomic three-dimensional structure
of the protein and/or its ligand complexes should facilitate the design of
novel, high affinity ligands that interact with that protein. The process of
elucidating the atomic structure of proteins and their complexes, and the
design of novel, therapeutically relevant ligands based on these structure
elucidations, is known as structure-based drug design.
Proteins are complex molecules, typically containing several thousand atoms. Although Pauling and Corey proposed the α helix and the β
sheet as the main secondary structural elements of proteins in 1951, and the
crystal structure of myoglobin was reported by John Kendrew in 1958,
crystal structure determination in the early days was hampered by
numerous technical limitations and usually required many years of hard
work. By the mid-1980s, substantial improvements in data acquisition
software and hardware had considerably accelerated the speed with which
a crystal structure could be determined. Trying to capitalize on the
potential of structure-based drug design, several pharmaceutical companies built their own protein crystallography laboratories, and a number of
structure-based drug design efforts emerged in industrial and academic
laboratories [1].
In the past 10 years, we have experienced a sudden burst in the
number of protein three-dimensional structures determined. By the end of
the twentieth century, merely 40 years since the first protein structure was
solved, there were over 11,000 structures deposited in the Protein Data
Bank (PDB). Although not every entry is a unique protein, the number of
novel structures deposited in the PDB has increased sharply during the last
decade. These proteins include not only soluble proteins, but also a number
of membrane proteins. Furthermore, structures of protein–protein and
protein–nucleic acid complexes, viruses, and the ribosome are also available. This marvelous scientific achievement is credited mostly to the
method of single-crystal x-ray diffraction (protein crystallography),
although a notable number of structures were determined by means of
NMR spectroscopy. Many factors in addition to the incredible advances in
computer hardware and software contributed to the improved efficiency
and precision in protein crystallography: the advent of molecular biology,
which allows for cloning, mutation, and overexpression of many targets
that are difficult to isolate from natural sources; advances in protein
purification that facilitate the production of large amounts of highly
purified proteins; improvement in protein characterization and crystallization strategies; enhancement of data acquisition techniques and equipment; access to powerful synchrotron radiation sources; and introduction
of the selenomethionine multiple-wavelength anomalous diffraction
(MAD) procedure for phase determination. Currently, almost all large
pharmaceutical and numerous biotechnology companies have established
in-house macromolecular crystallography units, and the crystallographic
community is solving thousands of new structures every year. With
structural information becoming more readily available, structure-based
drug design has become an integral part of the modern drug discovery
process and has begun to contribute to a significant portion of the current
drug discovery portfolio.


Identifying and bringing a successful small-molecule drug to the
market requires considerable effort, which typically costs millions of
dollars and may span as much as 10 years. With this time scale in mind,
one must realize that structure-based drug design is still in its infancy,
having started in earnest in the mid-1980s. While the concept of such a
rational approach has been around for some time, for much of the work
in the field it is still too early to demonstrate market successes. Moreover, although structural knowledge may be used for lead generation
and lead optimization, or even for addressing some developability issues,
it does little to address other important issues in drug development
ranging from the appropriateness of targets or disease models to government regulatory issues or changing market forces. In fact, drug
discovery is a risky business in that only a very small number of compounds are able to find their way to the market. Therefore, the successful structure-based design and the launch of inhibitors of HIV protease
[2] and influenza virus neuraminidase [3] as drugs are particularly
encouraging events for the field [4–9]. In this chapter, we will introduce
the technique of protein crystallography and its use in structure-based
drug design, point out the technical challenges ahead of us, and report
many practical lessons learned during the past decade of structure-based
drug design.

II. THE DRUG DISCOVERY PROCESS
The many steps of the complex and multidisciplinary drug discovery
process can be grouped into four major phases: target identification and
validation, lead identification, lead optimization, and biological testing
(Fig. 1). Choosing an appropriate target is usually the first step in the drug
discovery process. Target selection requires an understanding of human
diseases and the biological processes that lead to a particular disease.
Although historically drugs (e.g., β-lactams) were discovered without
knowledge of their molecular target, knowing the target greatly enhances
one’s ability to discover novel drugs in a timely fashion. Recent advances in
sequencing the human genome, as well as the genomes of many human
pathogens, have provided a large pool of potential novel molecular targets.
Most future drug discovery efforts will start with a relatively unknown gene
selected from a sequence database based on one or more attractive features
that could provide a hint of its function, such as tissue distribution, genome
localization, and/or sequence homology or structural analogy to a known
protein. Cloning, expression, purification, and characterization of the
protein target and other tool reagents such as antibodies or receptors will
usually follow, to be used in target validation with a set of appropriate
genetic and biological assays.

Figure 1 Simplified drug discovery process.

The second step is to identify a suitable lead molecule to interact
with the molecular target. This is usually achieved through high throughput screening of available chemical compound libraries and natural
products, typically containing hundreds of thousands of compounds.
Although the size of the library per se is not critical, a library that contains
a large number of molecules is essential to assure molecular diversity.
Novel lead molecules can also be designed by analysis of the three-dimensional structure of the target molecule in a process known as de
novo design. A desirable lead should usually have at least low micromolar
binding potency against the target and should be amenable to further
synthetic manipulations.
The third step is to optimize the lead molecule through iterative
chemical synthesis and biological testing, aiming to obtain molecules with
the required potency (typically nanomolar), selectivity, bioavailability,
and DMPK (drug metabolism and pharmacokinetics) properties. This step
usually requires considerable time and resources; the synthesis of
hundreds of compounds is typically needed to deduce a robust SAR (structure–activity
relationship). Such resources can be considerably reduced and the
time significantly shortened if optimization employs knowledge from the
three-dimensional structure of complexes of leads with the target.
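
To give a sense of the gap between a low-micromolar lead and a nanomolar candidate, the sketch below converts a dissociation constant into a standard binding free energy using ΔG = RT ln Kd. It assumes that "potency" can be read as a dissociation constant, a temperature of 298.15 K, and a 1 M standard state; the function name is illustrative and does not come from the chapter.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) for a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    # Assumption: potency is treated as Kd relative to a 1 M standard state.
    return R_KCAL * temperature_k * math.log(kd_molar)

lead = binding_free_energy(1e-6)       # low-micromolar lead
optimized = binding_free_energy(1e-9)  # nanomolar optimized compound
print(f"1 uM lead:      {lead:6.2f} kcal/mol")
print(f"1 nM candidate: {optimized:6.2f} kcal/mol")
print(f"gain needed:    {optimized - lead:6.2f} kcal/mol")
```

Under these assumptions, each 1000-fold improvement in affinity corresponds to roughly 4 kcal/mol of additional binding free energy, which is the scale of improvement that lead optimization has to find.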
The last step of the drug discovery process involves the testing of lead
compounds to address issues such as efficacy, bioavailability, and safety.
Testing may include in vitro assays but ultimately would require a suitable
disease model and studies in animals. Many compounds may need to be
designed and synthesized to identify the one compound with all the desired
properties. Such a compound can be advanced to preclinical studies and
eventually to the clinic.

III. THE STRUCTURE-BASED DRUG DESIGN CYCLE
Timely optimization of lead compounds requires knowledge of the three-dimensional structure of target–ligand complexes. Protein crystallography
has been the predominant technique used to elucidate the three-dimensional structure of proteins in structure-based drug design. Crystallographic studies usually consume tens of milligrams of pure protein and
take several months to yield the first crystal structure. Therefore, one
should start crystallographic efforts as soon as suitable material is available, preferably even before initiation of high throughput screening. Once
a lead has been identified through high throughput screening or de novo
design, structure determinations of target–ligand complexes should be
pursued. The use of information derived from the structure determination
of the target bound to the initial lead molecule should allow for the design
and synthesis of new ligands with improved properties, as well as the
initiation of further rounds of structure-based design. Through iterations
of structure determination, design, synthesis, and biological testing (Fig. 2)
a drug candidate should emerge.
In addition to lead optimization and lead identification, three-dimensional structures of the target–ligand complexes can contribute to the
traditional drug discovery process in other ways. For example, structural
information combined with genomic sequences may aid in target identification by helping to classify genes with unknown functions. Structures
can be used as templates for de novo design or in silico lead identification
through screening of virtual libraries. Structural information can provide a
basis for the design of directed combinatorial libraries [10,11]. Moreover,
structural studies of leads with serum albumin and various cytochrome
P450s should allow for a better understanding of some of the developability issues that may arise during drug development.


Figure 2 Structure-based design cycle.

IV. PROTEIN CRYSTALLOGRAPHY
For most noncrystallographers, protein crystallography tends to be a black
box full of jargon. Here, we give a brief overview of the technology in an
attempt to demystify some of the terms used.

A. Crystallization
Obtaining large single crystals that diffract to high resolution remains the
primary bottleneck of protein crystallography. The most widely used
crystallization method is the hanging-drop method of vapor diffusion
(Fig. 3), in which a drop (1 or 2 μL) of protein is mixed with an equal
volume of a precipitant on a glass coverslip and is sealed over a well
containing the same precipitant added to the protein. Many factors are
known to be important in protein crystallization: protein purity (preferably >95% pure) and concentration (typically 10 mg/mL), the nature of
precipitant [e.g., poly(ethylene glycol) or various salts] and its concentration, the nature, concentration, and pH of the buffer, the presence or
absence of additives (e.g., metal ions, reducing agents, protease inhibitors,
metal chelators, detergents) and effectors (e.g., ligands, cofactors, substrates, inhibitors), the rate of equilibration between the protein and the
precipitant, the crystallization temperature, and so on. Since there are no
general rules to correlate all these factors to the eventual success in
obtaining crystals, protein crystallization remains a trial-and-error process
and a significant bottleneck in protein crystallography: failure rate is
typically 50% even with thousands of crystallization trials. Many methods
and techniques have been employed to enhance one’s ability to obtain
protein crystals. Molecular biology and biochemical methods have been
utilized to generate domains of large proteins that may be less flexible and
thus more amenable to crystallization. Biophysical tools such as dynamic
light scattering [12] and ultracentrifugation [13] have been used to study
protein aggregation in solution. Molecular biology has been employed to
generate mutants that do not aggregate or are more soluble. Crystallization trials using incomplete factorial designs [14] allow the screening of a
much wider range of conditions with a modest number of experiments, and
thus less protein. Miniaturization and automation made possible by the
use of advanced crystallization robots may also have a great impact on the
future success of protein crystallization.

Figure 3 Protein crystallization: diagrammatic representation of the hanging-drop method of vapor diffusion.
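
The idea behind an incomplete factorial screen can be sketched in a few lines: rather than testing every combination of factors, one draws a small randomized subset in which each level of each factor appears about equally often. The factor levels and the function name below are hypothetical, and the balancing shown here is deliberately simplified. Published designs such as the one cited in [14] also balance pairs of factors, which this sketch does not attempt.

```python
import itertools
import random

# Hypothetical factor levels, chosen only for illustration.
FACTORS = {
    "precipitant": ["PEG 4000", "PEG 8000", "ammonium sulfate", "sodium chloride"],
    "pH": [5.5, 6.5, 7.5, 8.5],
    "temperature_C": [4, 20],
    "additive": ["none", "metal ion", "reducing agent", "detergent"],
}

def incomplete_factorial(factors, n_trials, seed=0):
    """Pick n_trials conditions so each level of a factor is used about equally often."""
    rng = random.Random(seed)
    columns = {}
    for name, levels in factors.items():
        # Repeat the levels to cover n_trials, then shuffle to randomize the pairings.
        column = list(itertools.islice(itertools.cycle(levels), n_trials))
        rng.shuffle(column)
        columns[name] = column
    return [{name: columns[name][i] for name in factors} for i in range(n_trials)]

# Size of the full factorial design, for comparison.
full_size = 1
for levels in FACTORS.values():
    full_size *= len(levels)

screen = incomplete_factorial(FACTORS, n_trials=16)
print(f"full factorial: {full_size} conditions; screen: {len(screen)} conditions")
for condition in screen[:3]:
    print(condition)
```

With these made-up levels the full design has 128 conditions, while the 16-condition screen still uses every precipitant four times and each temperature eight times. Because the columns are shuffled independently, a repeated condition is possible; a practical implementation would check for and replace duplicates.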

B. X-Ray Diffraction Data Acquisition
The next step is to measure x-ray diffraction data from a single crystal
(Fig. 4). Data are usually measured by means of an area detector
such as a phosphor image plate or a charge-coupled device (CCD).
Through several steps of computational analysis, the position and
amplitude or intensity of each diffraction spot can be obtained.
Because diffraction intensities are proportional to the volume of the
crystal and generally decrease at higher resolution, protein crystals must
be reasonably large to give strong enough diffraction signals at high
resolution. While a cube of 0.1 to 0.5 mm in each dimension is still
preferred by most crystallographers, the availability of powerful synchrotron radiation sources has made the analysis of much smaller
crystals feasible. Crystals also must be stable enough in the x-ray beam
to allow the measurement of a complete diffraction data set from a
single crystal. In this regard, flash-freezing of protein crystals under
proper conditions at cryogenic temperatures [15] has virtually eliminated
radiation decay problems.

Figure 4 Diagrammatic representation of single-crystal x-ray diffraction and
data collection.


C. Phasing
The ultimate goal of an x-ray diffraction experiment is to produce an
electron density map that is then used to build an atomic model of the
molecule being studied (Fig. 5). The use of single-crystal x-ray diffraction
techniques to determine the three-dimensional structure of molecules
requires the measurement of amplitudes and the calculation of phases
for each diffraction spot. Although amplitudes can be directly measured
from diffracting crystals, as noted earlier, phases are indirectly determined. The inability to directly measure phases is known as the “phase
problem” [16]. In practice, there are several ways to get around the phase
problem. If the protein of interest is small (~100 amino acids) and high-resolution data (1.2 Å or better) are available, phases can be obtained
computationally by using the so-called direct method. This is basically the
same technique used to determine crystal structures of small organic
molecules. If the protein being studied is known to have a fold similar
to that of a protein with a known three-dimensional structure, one uses the
molecular replacement (MR) method, in which the known structure serves
as a model to generate approximate phases that are then refined against
the experimental data obtained from crystals of the protein under study.
Until recently, multiple isomorphous replacement (MIR) was the most
widely used method for ab initio phase determination. This technique
requires the introduction into the protein under study of atoms of high
atomic number (heavy atoms) such as mercury, platinum, and uranium,
without disrupting the protein’s three-dimensional structure or the
packing in the crystal. This is achieved by soaking crystals in a solution
containing the desired heavy atom. The binding of one or more heavy
atoms to the protein alters the diffraction of the crystals from that of
the underivatized (native) crystals. If the introduction of heavy atoms is
truly isomorphous, the differences between the diffraction of the
derivative and of the native will represent only contributions from the
heavy atom(s). Thus, the problem of structure determination is reduced
to locating the position of one or a few heavy atoms. Once their
positions have been accurately determined, the heavy atoms are used to
calculate phases for all diffraction intensities. In theory, one needs only
two isomorphous derivatives, but in practice more are needed owing to
errors that are introduced in data measurement as well as the lack of
isomorphism. Multiple-wavelength anomalous dispersion (MAD) phasing, cited earlier, has gained popularity in the last 10 years, and this
more recent technique for ab initio phase determination is now the
predominant method in de novo structure determination. In the MAD
technique, cells that overexpress the protein can be grown in media
containing selenomethionine (Se-Met) instead of methionine, producing
proteins that have Se-Met at all the methionine positions. Because of
the unique absorption quality of Se, diffraction data can be measured
by using a Se-Met-substituted crystal at three or four different wavelengths around the Se absorption edge. These data can be analyzed by
using computational methods to generate phase information, allowing
an electron density map to be calculated [17]. Such an experiment calls
for modern synchrotron facilities.

Figure 5 Steps in the use of protein crystallography for structure determination.
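
The phase problem can be illustrated with a toy one-dimensional "crystal". The sketch below is not a phasing method. It only shows that a density map computed from amplitudes together with phases reproduces the original density, whereas a map computed from the same amplitudes with all phases set to zero does not. The eight-point density and the helper names are invented for this illustration, and a plain discrete Fourier transform stands in for the crystallographic Fourier synthesis.

```python
import cmath

def dft(values):
    """Forward discrete Fourier transform: density samples -> complex structure factors."""
    n = len(values)
    return [sum(values[x] * cmath.exp(-2j * cmath.pi * h * x / n) for x in range(n))
            for h in range(n)]

def inverse_dft(factors):
    """Inverse transform: complex structure factors -> real density samples."""
    n = len(factors)
    return [(sum(factors[h] * cmath.exp(2j * cmath.pi * h * x / n) for h in range(n)) / n).real
            for x in range(n)]

# Toy one-dimensional "unit cell" holding two atoms of different weight.
density = [0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 3.0, 0.0]

structure_factors = dft(density)
amplitudes = [abs(f) for f in structure_factors]      # what the experiment measures
phases = [cmath.phase(f) for f in structure_factors]  # what the experiment cannot measure

# Amplitudes combined with the correct phases recover the density.
with_phases = inverse_dft([cmath.rect(a, p) for a, p in zip(amplitudes, phases)])
# The same amplitudes with every phase set to zero give a different map.
without_phases = inverse_dft([complex(a, 0.0) for a in amplitudes])

print([round(v, 2) for v in with_phases])
print([round(v, 2) for v in without_phases])
```

In this toy case the amplitude-only map still contains two peaks, but they are no longer at the positions of the original atoms, which is why MIR, MAD, molecular replacement, or direct methods are needed to supply the missing phases.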

D. Model Building and Refinement
Once an electron density map has become available, atoms may be fitted
into the map by means of computer graphics to give an initial structural
model of the protein. The quality of the electron density map and
structural model may be improved through iterative structural refinement but will ultimately be limited by the resolution of the diffraction
data. At low resolution, electron density maps have very few detailed
features (Fig. 6), and tracing the protein chain can be rather difficult
without some knowledge of the protein structure. At better than 3.0 Å
resolution, amino acid side chains can be recognized with the help of
protein sequence information, while at better than 2.5 Å resolution
solvent molecules can be observed and added to the structural model
with some confidence. As the resolution improves to better than 2.0 Å,
fitting of individual atoms may be possible, and most of the
amino acid side chains can be readily assigned even in the absence of
sequence information.

Figure 6 Electron density of an α-helix at different resolutions.
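
The resolution thresholds quoted above can be collected into a small lookup, which may be convenient when judging how much detail to expect from a given coordinate set. The function name and the wording of the returned labels are illustrative; the cutoffs are the approximate values given in the text and should be read as rules of thumb rather than hard limits.

```python
def interpretable_features(resolution_angstrom):
    """List what can typically be modeled at a given resolution (smaller is better)."""
    features = ["overall chain trace (difficult without prior structural knowledge)"]
    if resolution_angstrom < 3.0:
        features.append("amino acid side chains, with the help of sequence information")
    if resolution_angstrom < 2.5:
        features.append("ordered solvent molecules")
    if resolution_angstrom < 2.0:
        features.append("individual atoms; most side chains assignable without sequence")
    return features

for resolution in (3.5, 2.7, 1.8):
    print(f"{resolution} A:")
    for feature in interpretable_features(resolution):
        print(f"  - {feature}")
```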

E. Understanding Structural Coordinates

Once a crystal structure has been determined, the information is
communicated in the form of an atomic coordinates file. In addition
to a list of the atomic positions, the coordinates file contains other information that deserves an explanation and requires attention by the
user. Some of the terms included in an atomic coordinates file are
explained briefly. It is hoped that the information will provide the reader
with insights to evaluate the quality of the structure, distinguish between
its well-defined and flexible regions, and make sensible decisions in
structural analysis.
The unit cell is the basic microscopic building block of the crystal. A
crystal can be viewed as a three-dimensional stack of identical unit cells,
each defined by three cell edges (a, b, c, in angstroms), and three angles (α,
β, γ, in degrees) between each pair of edges. Each unit cell may contain one
or more protein molecules related by crystal symmetry. The unique portion
of the unit cell (i.e., the portion that is not related to other portions by
crystal symmetry) is called the asymmetric unit. There are only 230 different
combinations of symmetry elements in crystals; each of these is called a
space group. However, since biological molecules are chiral,
which means that a protein crystal cannot contain mirror planes or inversion centers, the
number of space groups of relevance to protein crystallography is reduced
to 65. It is possible to have more than one copy of the same protein in an
asymmetric unit; such copies are related by “noncrystallographic”
symmetry. All atoms of an asymmetric unit, along with the unit
cell dimensions and the space group, must therefore be given in the coordinates file
for subsequent analysis and for regenerating the structure in any portion of
the unit cell or the crystal, which may be important for studying intermolecular “crystal packing” interactions.
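As an illustration of how the unit cell parameters are used, the following minimal Python sketch computes the cell volume from the three edges and three angles with the standard general (triclinic) formula. The function name and the example cell dimensions are hypothetical and are chosen only for illustration.

```python
import math


def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Volume in cubic angstroms from cell edges (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # General formula; it reduces to a*b*c when all three angles are 90 degrees.
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


# Hypothetical orthorhombic cell: 50 x 60 x 70 angstroms, all angles 90 degrees.
volume = unit_cell_volume(50.0, 60.0, 70.0, 90.0, 90.0, 90.0)
```

Dividing such a volume by the number of symmetry-related asymmetric units gives the volume available to the unique portion of the cell.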
The R factor is probably the single most important number that
provides a sense of the overall quality of the structure. It is defined as
$R = \sum \bigl|\,|F_{obs}| - k\,|F_{calc}|\,\bigr| \,/\, \sum |F_{obs}|$, with the sums taken over all measured reflections, where $F_{obs}$ is the observed structure factor
(whose amplitude is the square root of the measured diffraction intensity), $F_{calc}$
is the structure factor calculated from the model, and k is a scaling
factor. The R factor is a measure of the agreement between the
structural model and the observed diffraction data; the lower the
number, the better. For a refined crystal structure, the R factor in percent is often
approximately 10 times the resolution in angstroms (e.g., 20% for a 2.0 Å resolution
structure). Along with the traditional R factor, most of the recent
structures also report an Rfree value, which is obtained from the part
of the diffraction data (5–10%) set aside and not used during structural
refinement. Generally Rfree is 5–10% higher than R; larger discrepancies
between the two may indicate that there is a problem in the structure
model or diffraction data, or that the structure is overrefined against the
data. Reducing R to below 20% used to be the goal of structural
refinement, but obtaining a sensible Rfree is now considered to be more
important. Therefore, before analyzing a crystal structure on computer
graphics, one should check the R factor and Rfree values to get a sense of
the overall quality of the structure. It is important to note that these
values can be reported as percentages (20%) or as fractions (0.20).
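The definition of R and Rfree given above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure of any particular refinement program: the function names, the fixed scale factor k = 1, and the amplitudes are all made up for the example, and real programs determine k (and usually more elaborate scaling) from the data.

```python
def r_factor(f_obs, f_calc, k=1.0):
    """Return R = sum(||Fobs| - k|Fcalc||) / sum(|Fobs|) as a fraction."""
    numerator = sum(abs(abs(o) - k * abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


def r_work_and_free(f_obs, f_calc, is_free, k=1.0):
    """Split reflections by the free flag and return (R, Rfree)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if not free]
    held_out = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor(*zip(*work), k=k)
    r_free = r_factor(*zip(*held_out), k=k)
    return r_work, r_free


def as_fraction(value):
    """Normalize an R value given as a percentage (20) or a fraction (0.20).

    Assumption for this sketch: any value above 1 is a percentage.
    """
    return value / 100.0 if value > 1.0 else value


# Made-up amplitudes for five reflections; the last one is held out as "free".
f_obs = [100.0, 80.0, 60.0, 40.0, 50.0]
f_calc = [90.0, 85.0, 50.0, 45.0, 60.0]
is_free = [False, False, False, False, True]
r_work, r_free = r_work_and_free(f_obs, f_calc, is_free)
```

Normalizing reported values with a helper such as `as_fraction` before comparing structures avoids mixing the percentage and fraction conventions.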
The atomic temperature factor, or B factor, measures the dynamic
disorder caused by the temperature-dependent vibration of the atom, as
well as the static disorder resulting from subtle structural differences in
different unit cells throughout the crystal. For a B factor of 15 Å², displacement of an atom from its equilibrium position is approximately 0.44
Å, and it is as much as 0.87 Å for a B factor of 60 Å². It is very important
to inspect the B factors during any structural analysis: a B factor of less
than 30 Å² for a particular atom usually indicates confidence in its
atomic position, but a B factor of higher than 60 Å² likely indicates that
the atom is disordered.
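The displacements quoted above follow from the standard relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement of the atom. The short Python sketch below applies that relation and the 30 and 60 Å² rules of thumb from the text; the function names and the "intermediate" label for values between the two thresholds are assumptions made for illustration.

```python
import math


def rms_displacement(b_factor):
    """Root-mean-square displacement (angstroms) from B (square angstroms),
    using B = 8 * pi**2 * <u**2>."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


def b_factor_label(b_factor):
    """Apply the rules of thumb from the text (thresholds at 30 and 60)."""
    if b_factor < 30.0:
        return "well defined"
    if b_factor > 60.0:
        return "likely disordered"
    return "intermediate"  # assumed label; the text gives no name for this range


low = round(rms_displacement(15.0), 2)   # 0.44
high = round(rms_displacement(60.0), 2)  # 0.87
```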


For a particular crystal, the number of diffraction data increases as
the resolution increases, which means that more experimental data will
be available for structural refinement. There are four parameters to be
refined for each atom: x, y, z (atomic position), and B (temperature
factor). If the crystal has normal solvent content (i.e., about 50%), the
number of experimental data and refinement parameters will be about the
same at 2.8 Å resolution. This suggests that B factors for individual
atoms should be refined only when data have a resolution better than 2.8
Å. Refinement of atomic B factors at lower resolution will have no
physical meaning, although a lower but meaningless R factor will result.
Identification and refinement of solvent molecules (e.g., waters) become
reliable only when the structure has a resolution of 2.5 Å or better. Even then,
before a water molecule is used in mechanistic or computational analysis,
it is always wise to check its B factor and the existence of at least one
hydrogen bond to hold the water to the protein. At times, spurious water
molecules are added (such additions will result in a lower but meaningless R
factor). Unless the structure has been determined at a reasonably high
resolution, electron density and refinement often do not discriminate
between the oxygen and nitrogen atoms of asparagines and glutamines,
or the alternative conformations of histidine side chains. In a detailed
structural analysis, it may be necessary to check alternative conformations of Asn, Gln, or His side chains and decide which one makes more
sense chemically.
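The balance between experimental data and refined parameters discussed above can be checked with a simple count. The Python sketch below assumes four parameters per atom (x, y, z, B) when individual B factors are refined and three otherwise; it ignores geometric restraints and any overall parameters, and the function name and the numbers in the example are hypothetical.

```python
def observations_per_parameter(n_reflections, n_atoms, individual_b=True):
    """Ratio of unique reflections to refined parameters.

    Four parameters per atom (x, y, z, B) with individual B factors,
    three (x, y, z) without. Restraints are not counted.
    """
    params_per_atom = 4 if individual_b else 3
    return n_reflections / (params_per_atom * n_atoms)


# Hypothetical case: 2000 atoms and 8000 unique reflections.
ratio_with_b = observations_per_parameter(8000, 2000)
ratio_without_b = observations_per_parameter(8000, 2000, individual_b=False)
```

A ratio near or below 1 with individual B factors corresponds to the situation the text warns about, where refining B for each atom lowers R without adding physical meaning.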

V. IN SILICO LEAD GENERATION
Armed with the crystal structure of the protein–ligand complex and up-to-date computer modeling software, one can design additional ligands.
Numerous molecular modeling software programs are available for that
purpose. However, it is important to note that current computational
algorithms have their limitations and rely on many approximations. Therefore, while computer modeling software has proven useful [4,18],
further testing and structural validation are required to identify the best
possible compound.

A. In Silico Screening of Virtual Compound Libraries
Starting with the crystal structure of the target, it is possible to screen for
leads in three-dimensional compound databases such as the Cambridge
Structural Database [19] or the Chemical Abstracts Service Registry [20],
or to convert private databases to 3-D structures with programs such as
CONCORD [21]. Several programs are available for such screening. For
example, DOCK [22] works by using a set of overlapping spheres to create
a complementary image of the ligand binding site and essentially matching
the shape of a putative ligand with that of the image to generate a
“goodness of fit” score that is then used to rank the hits identified. Instead
of comparing shapes, the program LUDI [23] uses parameters that
describe hydrogen-bonding potential and hydrophobic complementarity
to match the ligand and its binding site. These programs can rapidly search
through three-dimensional databases of small molecules and rank each
candidate. Typically, the 100 to 200 top-scoring compounds are examined
graphically to identify the best 10 to 50 candidates for experimental testing.
In the case of DOCK, 2 to 20% of these in silico hits may show micromolar
binding affinity [4]. Subsequently, crystallography can be used to optimize
any leads identified.
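The ranking step described above, in which only the top-scoring compounds are carried forward for graphical inspection, can be sketched as follows. This is not the interface of DOCK or LUDI: the function name, the compound identifiers, the scores, and the convention that a higher score is better are all assumptions made for illustration.

```python
def shortlist(scores, n_top=200, higher_is_better=True):
    """Return the names of the n_top best-scoring compounds, best first.

    scores maps a compound name to its docking score. Whether higher or
    lower is better depends on the program, so it is left as a parameter.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=higher_is_better)
    return [name for name, _ in ranked[:n_top]]


# Made-up scores for four hypothetical compounds.
scores = {"cmpd_A": 41.2, "cmpd_B": 57.9, "cmpd_C": 12.5, "cmpd_D": 49.0}
top_two = shortlist(scores, n_top=2)
```

In practice the shortlist would hold the 100 to 200 compounds mentioned in the text, from which 10 to 50 are chosen by visual inspection.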

B. Building Leads from Molecular Fragments
Again starting with the crystal structure of the target, another strategy
is to dock small chemical fragments into the ligand binding site, then
grow the fragment to better complement the binding site. Programs
such as GRID [24], AUTODOCK [25], and MCSS [26] can be used
for the docking step. GRID uses small functional groups to probe the
binding site and evaluate interaction energies by using an empirical
Lennard-Jones energy function, as well as electrostatic and hydrogen-bonding terms. AUTODOCK uses simulated annealing for the ligand
conformational search to dock small, conformationally flexible ligands
onto a rigid binding site, and a standard force field for rapid grid-type
energy evaluation. MCSS (multicopy simultaneous search) places thousands of copies of functional groups in the binding site and optimizes
them simultaneously to generate energetically favorable positions and
orientations in a flexible binding site. Once selected, suitable binding
fragments can be built into a single compound by manual modeling or
by using linking programs such as CAVEAT [27], which attempts to
identify a suitable cyclic linker from a database. Alternatively, programs like GroupBuild [28] can search compound libraries for potential leads that have the functional fragments identified by the programs
just described.
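To make the idea of probing a binding site with an energy function more concrete, the Python sketch below evaluates a generic 12-6 Lennard-Jones term plus a Coulomb term for a single probe. It is not the energy function of GRID or any other program named above: the hydrogen-bonding term is omitted, and the parameter values (epsilon, sigma, dielectric constant), atom positions, and charges are assumptions chosen only for illustration.

```python
import math

COULOMB_CONSTANT = 332.0636  # kcal*angstrom/(mol*e^2)


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy: 4*epsilon*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, dielectric=4.0):
    """Coulomb energy (kcal/mol) for charges in units of e, r in angstroms."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def probe_energy(probe_xyz, probe_charge, atoms, epsilon=0.15, sigma=3.4):
    """Sum the two terms over all site atoms, given as (x, y, z, charge)."""
    total = 0.0
    for x, y, z, charge in atoms:
        r = math.dist(probe_xyz, (x, y, z))
        total += lennard_jones(r, epsilon, sigma) + coulomb(r, probe_charge, charge)
    return total


# Two hypothetical site atoms and one probe position.
site = [(0.0, 0.0, 0.0, -0.5), (4.0, 0.0, 0.0, 0.0)]
energy = probe_energy((0.0, 0.0, 3.8), 0.3, site)
```

Evaluating such a function at every point of a regular grid over the binding site gives the kind of precomputed energy map that makes repeated scoring fast.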

VI. STRUCTURE-BASED LEAD OPTIMIZATION
Once a chemical lead has been identified, the structure-based lead
optimization process goes through several iterations of structure determination, design, chemical synthesis, and biological testing. The goal is
to optimize the lead in terms of electrostatic interactions, van der Waals
contacts, and the fit in the ligand binding pocket. The design process
may be simple and intuitive if one starts with a relatively high affinity
lead. In this case only minor modifications to the existing lead are
introduced at each of the iterations of the drug design cycle. Many of
these modifications may be either proposed from personal knowledge or
derived by computer modeling. However, it is important to note that
computational methods are still not reliable in predicting the binding
modes and affinities of ligands, mainly because of inaccuracies in force
fields, limitations in dealing with ligand and target flexibility, the
lack of reliable scoring functions, and the difficulties in treating
solvent molecules. Therefore, even for seemingly minor modifications of
the leads, it is still necessary to confirm the binding mode experimentally; there are countless examples in which the mode of binding
significantly changes upon introduction of minor modifications to the
original lead.

VII. EXPERIENCE WITH STRUCTURE-BASED DRUG DESIGN

Any summary of experience gained during the last 15 years in the area of
structure-based drug design is in some way a work in progress, and clearly
there is much that we still need to learn.

A. Design Should Be Based on Liganded Structures
Many proteins undergo considerable conformational change upon binding
to their ligands. Initiating ligand design based on an unliganded structure
may be misleading if that structure is of a protein that will change its
conformation upon ligand binding. To be on the safe side, one should
always start ligand design based on a liganded structure of the target
protein. An example of a protein that undergoes large conformational
change upon ligation is EPSP (5-enolpyruvylshikimate-3-phosphate) synthase. The
