Analysis of Genes
and Genomes

Richard J. Reece
University of Manchester, UK

John Wiley & Sons, Ltd





Copyright © 2004

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester,
West Sussex PO19 8SQ, England
Telephone (+44) 1243 779777

Email (for orders and customer service enquiries):


Visit our Home Page on www.wileyeurope.com or www.wiley.com
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or
transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning
or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the
terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London
W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should
be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate,
Chichester, West Sussex PO19 8SQ, England, or emailed to , or faxed to
(+44) 1243 770620.
This publication is designed to provide accurate and authoritative information in regard to the
subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering
professional services. If professional advice or other expert assistance is required, the services of a
competent professional should be sought.
Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1
Wiley also publishes its books in a variety of electronic formats. Some content that appears
in print may not be available in electronic books.
Library of Congress Cataloging-in-Publication Data
Reece, Richard J.
Analysis of genes & genomes / Richard J. Reece.
p. ; cm.
Includes bibliographical references and index.
ISBN 0-470-84379-9 (cloth : alk. paper) – ISBN 0-470-84380-2 (paper : alk. paper)
1. Molecular genetics – Research – Methodology. 2. Genetic engineering – Research – Methodology.
[DNLM: 1. Genetic Techniques. 2. DNA–analysis. 3. Genome. QZ 52 R322a 2003]

I. Title: Analysis of genes and genomes. II. Title.
QH442.R445 2003
572.8′6 – dc21
2003012937
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0-470-84379-9 (HB)
0-470-84380-2 (PB)
Typeset in 11/14pt Sabon by Laserwords Private Limited, Chennai, India
Printed and bound in Italy by Conti Tipocolor SpA, Florence
This book is printed on acid-free paper responsibly manufactured from sustainable forestry
in which at least two trees are planted for each one used for paper production.


For Judith



Contents
Preface
Acknowledgements
Abbreviations and acronyms


1 DNA: Structure and function
1.1 Nucleic acid is the material of heredity
1.2 Structure of nucleic acids

1.3 The double helix
1.3.1 The antiparallel helix
1.3.2 Base pairs and stacking
1.3.3 Gaining access to information with the double
helix without breaking it apart
1.3.4 Hydrogen bonding
1.4 Reversible denaturing of DNA
1.5 Structure of DNA in the cell
1.6 The eukaryotic nucleosome
1.7 The replication of DNA
1.8 DNA polymerases
1.9 The replication process
1.10 Recombination
1.11 Genes and genomes
1.12 Genes within a genome
1.13 Transcription
1.13.1 Transcription in prokaryotes
1.13.2 Transcription in eukaryotes
1.14 RNA processing
1.14.1 RNA splicing
1.14.2 Alternative splicing
1.15 Translation



2 Basic techniques in gene analysis
2.1 Restriction enzymes
2.1.1 Types of restriction–modification system
2.1.2 Other modification systems


2.1.3 How do type II restriction enzymes work?
2.2 Joining DNA molecules
2.3 The basics of cloning
2.4 Bacterial transformation
2.4.1 Chemical transformation
2.4.2 Electroporation
2.4.3 Gene gun
2.5 Gel electrophoresis
2.5.1 Polyacrylamide gels
2.5.2 Agarose gels
2.5.3 Pulsed-field gel electrophoresis
2.6 Nucleic acid blotting
2.6.1 Southern blotting
2.6.2 The compass points of blotting
2.7 DNA purification


3 Vectors
3.1 Plasmids
3.1.1 pBR322
3.1.2 pUC plasmids
3.2 Selectable markers
3.3 λ vectors
3.4 Cosmid vectors
3.5 M13 vectors
3.6 Phagemids
3.7 Artificial chromosomes
3.7.1 YACs
3.7.2 PACs
3.7.3 BACs
3.7.4 HACs


4 Polymerase chain reaction
4.1 PCR reaction conditions
4.2 Thermostable DNA polymerases
4.3 Template DNA
4.4 Oligonucleotide primers
4.4.1 Synthesis of oligonucleotide primers
4.5 Primer mismatches
4.6 PCR in the diagnosis of genetic disease
4.7 Cloning PCR products


4.8 RT–PCR

4.9 Real-time PCR
4.10 Applications of PCR


5 Cloning a gene
5.1 Genomic libraries
5.2 cDNA libraries
5.3 Directional cDNA cloning
5.4 PCR based libraries
5.5 Subtraction libraries
5.6 Library construction in the post-genome era


6 Gene identification
6.1 Screening by nucleic acid hybridization
6.2 Immunoscreening
6.3 Screening by function
6.4 Screening by interaction
6.5 Phage display

6.6 Two-hybrid screening
6.6.1 Problems, and some solutions, with two-hybrid
screening
6.7 Other interaction screens – variations on a theme
6.7.1 One hybrid
6.7.2 Three hybrid
6.7.3 Reverse two hybrid


7 Creating mutations
7.1 Creating specific DNA changes using primer extension
mutagenesis
7.2 Strand selection methods
7.2.1 Phosphorothioate strand selection
7.2.2 dut− ung− (or Kunkel) strand selection
7.3 Cassette mutagenesis
7.4 PCR based mutagenesis
7.5 QuikChange mutagenesis

7.6 Creating random mutations in specific genes
7.7 Protein engineering


8 Protein production and purification
8.1 Expression in E. coli
8.1.1 The lac promoter


8.1.2 The tac promoter
8.1.3 The λPL promoter
8.1.4 The T7 expression system

8.2 Expression in yeast
8.2.1 Saccharomyces cerevisiae
8.2.1.1 The GAL system
8.2.1.2 The CUP1 system
8.2.2 Pichia pastoris
8.2.3 Schizosaccharomyces pombe
8.3 Expression in insect cells
8.4 Expression in higher-Eukaryotic cells
8.4.1 Tet-on/Tet-off system
8.5 Protein purification
8.5.1 The His-tag
8.5.2 The GST-tag
8.5.3 The MBP-tag
8.5.4 IMPACT
8.5.5 TAP-tagging


9 Genome sequencing projects
9.1 Genomic mapping
9.2 Genetic mapping
9.3 Physical mapping
9.4 Nucleotide sequencing
9.4.1 Manual DNA sequencing
9.4.2 Automated DNA sequencing
9.5 Genome sequencing
9.6 The human genome project
9.7 Finding genes
9.8 Gene assignment
9.9 Bioinformatics




10 Post-genome analysis
10.1 Global changes in gene expression
10.1.1 Differential display
10.1.2 Microarrays
10.1.3 ChIPs with everything
10.2 Protein function on a genome-wide scale
10.3 Knock-out analysis
10.4 Antisense and RNA interference (RNAi)


10.5 Genome-wide two-hybrid screens
10.6 Protein detection arrays

10.7 Structural genomics


11 Engineering plants
11.1 Cloning in plants
11.1.1 Agrobacterium tumefaciens
11.1.2 Direct nuclear transformation
11.1.3 Viral vectors
11.1.4 Chloroplast transformation
11.2 Commercial exploitation of plant transgenics
11.2.1 Delayed ripening
11.2.2 Insecticidal resistance
11.2.3 Herbicidal resistance
11.2.4 Viral resistance
11.2.5 Fungal resistance
11.2.6 Terminator technology
11.3 Ethics of genetically engineered crops


12 Engineering animal cells
12.1 Cell culture
12.2 Transfection of animal cells
12.2.1 Chemical transfection
12.2.2 Electroporation
12.2.3 Liposome-mediated transfection
12.2.4 Peptides
12.2.5 Direct DNA transfer
12.3 Viruses as vectors
12.3.1 SV40
12.3.2 Adenovirus
12.3.3 Adeno-associated virus (AAV)
12.3.4 Retrovirus
12.4 Selectable markers and gene amplification in animal cells
12.5 Expressing genes in animal cells


13 Engineering animals
13.1 Pronuclear injection
13.2 Embryonic stem cells
13.3 Nuclear transfer
13.4 Gene therapy
13.5 Examples and potential of gene therapy


Glossary

Proteins
A1.1
A1.2
A1.3

Nobel prize winners

References

Index


Preface
There are few phrases that can elicit such an emotive response as ‘genetic
engineering’ and ‘cloning’. Newspapers and television invariably use these
phrases to describe something that is not quite right – even perhaps against
nature. Genetic engineering and the modification of genes invariably conjure
up images of Frankenstein foods and abnormal animals. During the course of
reading this book, however, I hope that readers will appreciate that genetic
engineering, and the techniques of molecular biology that underpin it, are
essential components to understanding how organisms work. Man has been
playing, often unwittingly, with genes for thousands of years through selective
breeding to promote certain traits that were seen as desirable. We are currently
at a watershed in the way in which we look at genes. Behind us is 50 years of
knowledge of the structure of the genetic material, and ahead is the ability to
see how every gene that we contain responds to other genes and environmental
conditions. Determining the biochemical basis of why certain people respond
differently to drug treatments, for example, may not be possible yet, but the
techniques to address the appropriate questions are in place. The excitement
of entering the post-genome age will go hand-in-hand with concerns over what
we have the ability to do – whether we actually do it or not.
The analysis of genes and genomes could easily fall into a list of techniques
that can be applied to a particular problem. I have tried to avoid this and,
wherever possible, I have used specific examples to illustrate the problem and
potential solutions. I have relied heavily on published works and have endeavoured to reference all primary material so that interested readers can explore the
topic further. This has also allowed me to place many of the ideas and experiments into a historical context. It seems a common misconception that Watson
and Crick were solely responsible for our understanding of how genes work.
Their contribution should never be underestimated, but the work of many others
should not be discounted. The full sequence of the human genome and, equally
or even more importantly, the genomes of experimentally amenable organisms
provide exceptional opportunities for advances in biological sciences over the
coming years. More and more experiments can now be performed on a genome-wide scale and we are just beginning to understand the consequences of this.
One of the main problems that I have encountered during the writing of this
text is attaining a balance between depth and coverage. I have purposefully
concentrated on more amenable experimental systems – E. coli for prokaryotes
and yeast for eukaryotes. In addition, I have treated higher eukaryotes as
being almost exclusively mammals, and especially humans. This is intended
to give readers a flavour of the ideas and experiments that are currently
being undertaken, but also to give a historical framework onto which today’s
experiments may be hung. We ignore the past at our peril. This approach
has, however, led to the exclusion of some other systems, e.g. Drosophila
and prokaryotes other than E. coli, but is by no means meant as a slight to
these neglected fields. Rather than either covering all fields in scant detail or
explaining the intricate details and nuances of only a few, I have attempted to
provide a broad overview that is punctuated with specific examples. Whether I
have succeeded in getting the balance right I will leave to individual readers. I
can say for certain, however, that there has never been a more exciting time to
study biology, and I hope that this is reflected in this text.

Richard J. Reece
The University of Manchester
October 2003


Acknowledgements
I have had a great deal of help in writing this book. Of course, omissions
and inaccuracies are entirely my responsibility, but I thank those who have
(hopefully) kept these to a minimum – David Timson, Noel Curtis, Cristina
Merlotti, Chris Sellick, Carolyn Byrne, Ray Boot-Handford and Ged Brady.
I am also very grateful to Robert Slater (University of Hertfordshire) and to
Mick Tuite (University of Kent) for their immensely helpful comments and
suggestions. I thank the many friends and colleagues, mentioned in the text,
who have so generously provided both figures for the book and permission
to cite their work. I am also deeply indebted to Jordi Bella for showing me
that molecular graphics programmes are usable by idiots. Nicky McGirr at
John Wiley persuaded me that this project was a good idea. Her boundless
enthusiasm and encouragement saw me through the times when I was not so
sure and, of course, she was right. The ‘guinea pigs’ for many of the ideas
presented here have been successive years of Genetic Engineering students at
The University of Manchester. I thank the many of them who read parts of the
manuscript, and all of them for challenging me, and many of my preconceived
ideas. Judith, Daniel and Kathryn have been incredibly patient throughout
the inception and writing of this book. Readers who find it useful should be
thanking them, not me. Finally, I want to thank my teachers – Tony Maxwell
and Mark Ptashne – who, each in his own way, have true passion for science
and an insistence that the right experiments are done.



Abbreviations and acronyms
AAT       α1-antitrypsin
AAV       adeno-associated virus
AD        activation domain
BAC       bacterial artificial chromosome
CaMV      cauliflower mosaic virus
CAP       catabolite activator protein
CBD       chitin binding domain
CDK       cyclin-dependent kinase
cDNA      complementary DNA
CFI       cleavage factor I
CFII      cleavage factor II
CHEF      contour-clamped homogeneous electric field
ChIP      chromatin immunoprecipitation
CMV       cytomegalovirus
CPSF      cleavage and polyadenylation specificity factor
CStF      cleavage stimulation factor
CTD       carboxy-terminal repeat domain
DBD       DNA binding domain
DEAE      diethylaminoethanol
DHFR      dihydrofolate reductase
DNA       deoxyribonucleic acid
DTT       dithiothreitol
ECM       extra-cellular matrix
EMS       ethyl methane sulphonate
ER        endoplasmic reticulum
ES        embryonic stem
EST       expressed sequence tag
FIGE      field inversion gel electrophoresis
FISH      fluorescent in situ hybridization
FRET      fluorescence resonance energy transfer
GST       glutathione S-transferase
HAC       human artificial chromosome
HAT       histone acetyltransferase
H-DAC     histone deacetylase
HSV       herpes simplex virus
IMAC      immobilized metal ion affinity chromatography
IMPACT    intein mediated purification with an affinity chitin binding tag
ITR       inverted terminal repeat
LTR       long terminal repeat
MBP       maltose binding protein
mRNA      messenger RNA
MCS       multiple cloning site
MLP       major late promoter
MSV       maize streak virus
NLS       nuclear localization signal
OD        optical density
ORF       open reading frame
PABII     polyA binding protein II
PAC       P1 artificial chromosome
PAP       polyA polymerase
PCR       polymerase chain reaction
PFGE      pulsed-field gel electrophoresis
RdRp      RNA-dependent RNA polymerase
RF        release factor
          replicative form
RFLP      restriction fragment length polymorphism
RIP       ribosome inactivating protein
RISC      RNA induced silencing complex
RNAi      RNA interference
rRNA      ribosomal RNA
RT        reverse transcription
          reverse transcriptase
RT-PCR    reverse transcription-polymerase chain reaction
SAM       S-adenosylmethionine
SDS       sodium dodecyl sulphate
siRNAs    small inhibiting RNAs
SNP       single-nucleotide polymorphism
snRNP     small nuclear ribonucleoprotein
SRB       suppressor of RNA polymerase B
STS       sequence tagged site
SV40      simian virus 40
TAF       TATA-box binding associated factor
TBP       TATA-box binding protein
TdT       terminal deoxynucleotidyl transferase
TGMV      tomato golden mosaic virus
TK        thymidine kinase
tRNA      transfer RNA
VA RNAs   viral associated RNAs
VNTR      variable number tandem repeat
YAC       yeast artificial chromosome


1 DNA: Structure and function
Key concepts
The genetic information is contained within nucleic acids
DNA is a double-stranded antiparallel helix
Base pairing (A to T and G to C) holds the two strands of the helix together
DNA replication occurs through the unwinding of the DNA strands and copying each strand
The central dogma of molecular biology: DNA makes RNA makes protein
Transcription is the production of an RNA copy of one of the DNA strands
Translation is decoding of an RNA molecule to produce protein
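
The key concepts listed above can be illustrated with a few lines of Python. This is only a minimal sketch: the function names, the demonstration sequence and the four-entry codon table are assumptions made for illustration, and a real codon table has 64 entries.

```python
# Illustrative sketch only: names, sequence and the tiny codon table are assumptions.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# A handful of the 64 codons, just enough for the demonstration sequence.
CODON_TABLE = {"AUG": "Met", "GCU": "Ala", "UGG": "Trp", "UAA": "Stop"}


def complementary_strand(strand: str) -> str:
    """Return the antiparallel partner strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def transcribe(coding_strand: str) -> str:
    """RNA copy: same sequence as the coding strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Decode codons in order until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


coding = "ATGGCTTGGTAA"
template = complementary_strand(coding)   # base pairing, antiparallel
mrna = transcribe(coding)                 # DNA makes RNA
protein = translate(mrna)                 # RNA makes protein
print(template, mrna, protein)
```

Taking the complement of the template strand returns the original coding strand, which is the property that lets each strand act as a template for copying the other during replication.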
Every organism possesses the information required to construct and maintain
a living copy of itself. The basic concepts of heredity and, as a consequence,
genes can be traced back to 1865 and the studies of Gregor Mendel – discussed
by Orel (1995). From the results of his breeding experiments with peas, Mendel
concluded that each pea plant possessed two alleles for each gene, but only
displayed a single phenotype. Perhaps the most remarkable achievement of
Mendel was his ability to correctly identify a complex phenomenon with
no knowledge of the molecular processes involved in the formation of that
phenomenon. Hereditary transmission through sperm and egg became known
at about the same time, and Ernst Haeckel, noting that sperm consists largely of
nuclear material, postulated that the nucleus was responsible for heredity.


1.1 Nucleic Acid is the Material of Heredity
The idea that genetic material is physically transmitted from parent to offspring
has been accepted for as long as the concept of inheritance has existed. Both
proteins and nucleic acid were considered as likely candidates for the role of the
genetic material. Until the 1940s, however, many scientists favoured proteins.
There were two main reasons for this. Firstly, proteins are abundant in cells;
although the amount of an individual protein varies considerably from one cell
type to another, the overall protein content of most cells accounts for over 50%
of the dry weight. Secondly, nucleic acids appeared to be too simple to carry
the complex information presumed to be required to convey the characteristics
of heredity. DNA (deoxyribonucleic acid) was first isolated in 1869 by the Swiss
chemist Johann Friedrich Miescher. He separated nuclei from the cytoplasm
of cells, and then isolated an acidic substance from these nuclei that he called
nuclein. Miescher showed that nuclein contained large amounts of phosphorus
and no sulphur, characteristics that differentiated it from proteins. In what
proved to be a remarkable insight, he suggested that ‘if one wants to assume
that a single substance. . . is the specific cause of fertilization then one should
undoubtedly first of all think of nuclein’.
In 1926, based on the idea that DNA contained approximately equal amounts
of four different groups, called nucleotides, and by determining the type of
linkage that joined the nucleotides together, Levene and Simms proposed a
tetranucleotide structure (Figure 1.1) to explain the chemical arrangement of
nucleotides within nucleic acids (Levene and Simms, 1926). They proposed
a very simple four-nucleotide unit that was repeated many times to form
long nucleic acid molecules. Because the tetranucleotide structure was relatively simple, it was widely believed that nucleic acids could not provide
the chemical variation expected of the genetic material. Proteins, on the
other hand, containing 20 different amino acids, could provide the basis for
substantial variation.

[Figure 1.1 image: a chain of four sugar units linked through phosphate groups, carrying adenine, uracil, guanine and cytidine]
Figure 1.1. The tetranucleotide model for nucleic acid structure proposed by Levene
and Simms in 1926. At the time that this model was proposed, it was thought that plant
and animal nucleic acid might be different, and the differences between DNA and RNA
were not fully understood.
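
The force of this argument can be seen with some simple counting. The sketch below is illustrative only: the chain length of 12 is an arbitrary assumption, and the DNA bases A, C, G and T stand in for the four nucleotides of the model. It compares the number of polymers allowed by a strictly repeating four-nucleotide unit with the number allowed when any monomer can occupy any position.

```python
# Illustrative arithmetic only; CHAIN_LENGTH is an arbitrary assumption.
from itertools import permutations

BASES = "ACGT"
CHAIN_LENGTH = 12

# Tetranucleotide model: one unit containing each base once, repeated end to end.
repeat_polymers = {
    "".join(unit) * (CHAIN_LENGTH // 4) for unit in permutations(BASES)
}

# Unconstrained order: any of four bases at every position.
free_nucleic_acid = 4 ** CHAIN_LENGTH

# A protein of the same length built from 20 amino acids.
free_protein = 20 ** CHAIN_LENGTH

print(len(repeat_polymers), free_nucleic_acid, free_protein)
```

Under the repeating model the count does not grow with chain length at all, whereas for a freely ordered chain it grows exponentially. A simple repeating polymer therefore looked incapable of carrying hereditary information, while proteins did not.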
In 1928, Frederick Griffith performed experiments using several different
strains of the bacterium Streptococcus pneumoniae (Griffith, 1928). Some of
the strains used were termed virulent, meaning that they caused pneumonia
in both humans and mice. Other strains were avirulent, and did not cause
illness. Virulent and avirulent strains are morphologically distinct in that the
virulent strains have a polysaccharide capsule surrounding the bacterium and
form smooth, shiny-surfaced colonies when grown on agar plates. Avirulent
bacteria lack the capsule and produce rough colonies on the same plates. The
smooth bacteria are virulent because the polysaccharide capsule means that
they are not easily engulfed by the immune system of an infected animal, and
thus are able to multiply and cause pneumonia. The rough bacteria that lack
the polysaccharide capsule do not have this protection and are consequently
readily engulfed and destroyed by the host immune system.
Griffith knew that only living virulent bacteria would produce pneumonia
when injected into mice. If heat-killed virulent bacteria were injected into mice,
no pneumonia would result, just as living avirulent bacteria failed to produce
the disease when similarly injected. Griffith’s critical experiment (Figure 1.2)
involved the injection into mice of living rough bacteria (avirulent) combined
with heat-treated smooth bacteria. Neither cell type caused death in mice
when they were injected alone, but all mice receiving the combined injections
died. The analysis of blood of the dead mice revealed a large number of
living smooth bacteria when grown on agar plates. Griffith concluded that the
heat-killed smooth bacteria were somehow responsible for converting the live
avirulent rough bacteria into virulent smooth ones. He called the phenomenon
transformation, and suggested that the transforming principle might be some
part of the polysaccharide capsule or some compound required for capsule
synthesis, although he noted that the capsule alone did not cause pneumonia.
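
The logic of Griffith's four injections can be summarised as a small truth table. The sketch below is a toy model, not a description of his method: the class, the function name and the boolean encoding are assumptions made for illustration.

```python
# Toy model of the four injections; all names here are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Bacteria:
    strain: str   # "smooth" (capsule, virulent) or "rough" (no capsule, avirulent)
    alive: bool


def mouse_dies(injection: list[Bacteria]) -> bool:
    """Apply the observed outcomes to one injection."""
    live_smooth = any(b.alive and b.strain == "smooth" for b in injection)
    live_rough = any(b.alive and b.strain == "rough" for b in injection)
    dead_smooth = any(not b.alive and b.strain == "smooth" for b in injection)
    # Transformation: live rough cells become smooth in the presence of
    # heat-killed smooth cells.
    transformed = live_rough and dead_smooth
    return live_smooth or transformed


LIVE_SMOOTH = Bacteria("smooth", True)
DEAD_SMOOTH = Bacteria("smooth", False)
LIVE_ROUGH = Bacteria("rough", True)

outcomes = {
    "live smooth": mouse_dies([LIVE_SMOOTH]),
    "heat-killed smooth": mouse_dies([DEAD_SMOOTH]),
    "live rough": mouse_dies([LIVE_ROUGH]),
    "live rough + heat-killed smooth": mouse_dies([LIVE_ROUGH, DEAD_SMOOTH]),
}
for injection, dies in outcomes.items():
    print(f"{injection}: {'mouse dies' if dies else 'mouse survives'}")
```

Only the last combination is unexpected: neither component is lethal alone, so something from the heat-killed smooth cells must have changed the live rough cells.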

In 1944, Oswald Avery, Colin MacLeod and Maclyn McCarty published
their work to show that the molecule responsible for the transforming principle
was DNA (Avery, MacLeod and McCarty, 1944). They began by culturing
large quantities of smooth Streptococcus pneumoniae cells. The cells were
harvested from cultures and then heat-killed. Following homogenization and
several extractions with detergent, they obtained an extract that, when tested by
co-injection with live rough bacteria, still contained the transforming principle.
Protein was removed from the extract by several chloroform extractions and
polysaccharides were enzymatically digested and removed. Finally, precipitation
of the resultant fraction with ethanol yielded a fibrous mass that still retained
the ability to induce transformation of the rough avirulent cells. From the

