
reece - analysis of genes and genomes (wiley, 2004)

Analysis of Genes
and Genomes
Richard J. Reece
University of Manchester, UK
John Wiley & Sons, Ltd
Copyright © 2004 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester,
West Sussex PO19 8SQ, England
Telephone (+44) 1243 779777


Email (for orders and customer service enquiries):
Visit our Home Page on www.wileyeurope.com or www.wiley.com
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or
transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning
or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the
terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London
W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should
be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate,
Chichester, West Sussex PO19 8SQ, England, or emailed to , or faxed to
(+44) 1243 770620.
This publication is designed to provide accurate and authoritative information in regard to the
subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering
professional services. If professional advice or other expert assistance is required, the services of a
competent professional should be sought.
Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1
Wiley also publishes its books in a variety of electronic formats. Some content that appears
in print may not be available in electronic books.
Library of Congress Cataloging-in-Publication Data
Reece, Richard J.
Analysis of genes & genomes / Richard J. Reece.
p.;cm.
Includes bibliographical references and index.
ISBN 0-470-84379-9 (cloth : alk. paper) – ISBN 0-470-84380-2 (paper : alk. paper)
1. Molecular genetics – Research – Methodology. 2. Genetic engineering – Research – Methodology.

[DNLM: 1. Genetic Techniques. 2. DNA–analysis. 3. Genome. QZ 52 R322a 2003]
I. Title: Analysis of genes and genomes. II. Title.
QH442.R445 2003
572.8′6 – dc21
2003012937
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0-470-84379-9 (HB)
0-470-84380-2 (PB)
Typeset in 11/14pt Sabon by Laserwords Private Limited, Chennai, India
Printed and bound in Italy by Conti Tipocolor SpA, Florence
This book is printed on acid-free paper responsibly manufactured from sustainable forestry
in which at least two trees are planted for each one used for paper production.
For Judith

Contents
Preface xiii
Acknowledgements xv
Abbreviations and acronyms xvii
1 DNA: Structure and function 1
1.1 Nucleic acid is the material of heredity 2
1.2 Structure of nucleic acids 7
1.3 The double helix 11
1.3.1 The antiparallel helix 12
1.3.2 Base pairs and stacking 14
1.3.3 Gaining access to information with the double helix without breaking it apart 16
1.3.4 Hydrogen bonding 17

1.4 Reversible denaturing of DNA 18
1.5 Structure of DNA in the cell 21
1.6 The eukaryotic nucleosome 24
1.7 The replication of DNA 28
1.8 DNA polymerases 31
1.9 The replication process 33
1.10 Recombination 37
1.11 Genes and genomes 39
1.12 Genes within a genome 40
1.13 Transcription 43
1.13.1 Transcription in prokaryotes 43
1.13.2 Transcription in eukaryotes 46
1.14 RNA processing 54
1.14.1 RNA splicing 55
1.14.2 Alternative splicing 58
1.15 Translation 59
2 Basic techniques in gene analysis 65
2.1 Restriction enzymes 66
2.1.1 Types of restriction–modification system 70
2.1.2 Other modification systems 72
2.1.3 How do type II restriction enzymes work? 74
2.2 Joining DNA molecules 76
2.3 The basics of cloning 78
2.4 Bacterial transformation 84
2.4.1 Chemical transformation 86
2.4.2 Electroporation 87
2.4.3 Gene gun 88
2.5 Gel electrophoresis 88
2.5.1 Polyacrylamide gels 89

2.5.2 Agarose gels 89
2.5.3 Pulsed-field gel electrophoresis 95
2.6 Nucleic acid blotting 98
2.6.1 Southern blotting 100
2.6.2 The compass points of blotting 102
2.7 DNA purification 103
3 Vectors 109
3.1 Plasmids 112
3.1.1 pBR322 116
3.1.2 pUC plasmids 119
3.2 Selectable markers 122
3.3 λ vectors 126
3.4 Cosmid vectors 135
3.5 M13 vectors 137
3.6 Phagemids 140
3.7 Artificial chromosomes 142
3.7.1 YACs 143
3.7.2 PACs 146
3.7.3 BACs 148
3.7.4 HACs 149
4 Polymerase chain reaction 153
4.1 PCR reaction conditions 159
4.2 Thermostable DNA polymerases 162
4.3 Template DNA 164
4.4 Oligonucleotide primers 165
4.4.1 Synthesis of oligonucleotide primers 167
4.5 Primer mismatches 169
4.6 PCR in the diagnosis of genetic disease 173
4.7 Cloning PCR products 175

4.8 RT–PCR 177
4.9 Real-time PCR 179
4.10 Applications of PCR 181
5 Cloning a gene 183
5.1 Genomic libraries 185
5.2 cDNA libraries 191
5.3 Directional cDNA cloning 196
5.4 PCR based libraries 199
5.5 Subtraction libraries 200
5.6 Library construction in the post-genome era 204
6 Gene identification 205
6.1 Screening by nucleic acid hybridization 206
6.2 Immunoscreening 211
6.3 Screening by function 216
6.4 Screening by interaction 217
6.5 Phage display 218
6.6 Two-hybrid screening 218
6.6.1 Problems, and some solutions, with two-hybrid screening 225
6.7 Other interaction screens – variations on a theme 228
6.7.1 One hybrid 229
6.7.2 Three hybrid 229
6.7.3 Reverse two hybrid 229
7 Creating mutations 231
7.1 Creating specific DNA changes using primer extension mutagenesis 233
7.2 Strand selection methods 237
7.2.1 Phosphorothioate strand selection 237
7.2.2 dut⁻ ung⁻ (or Kunkel) strand selection 238
7.3 Cassette mutagenesis 240
7.4 PCR based mutagenesis 241
7.5 QuikChange mutagenesis 248
7.6 Creating random mutations in specific genes 250
7.7 Protein engineering 254
8 Protein production and purification 257
8.1 Expression in E. coli 258
8.1.1 The lac promoter 259
8.1.2 The tac promoter 259
8.1.3 The λPL promoter 260
8.1.4 The T7 expression system 261
8.2 Expression in yeast 265
8.2.1 Saccharomyces cerevisiae 265
8.2.1.1 The GAL system 266
8.2.1.2 The CUP1 system 268
8.2.2 Pichia pastoris 268
8.2.3 Schizosaccharomyces pombe 269
8.3 Expression in insect cells 269
8.4 Expression in higher-Eukaryotic cells 272
8.4.1 Tet-on/Tet-off system 272
8.5 Protein purification 275
8.5.1 The His-tag 276

8.5.2 The GST-tag 279
8.5.3 The MBP-tag 282
8.5.4 IMPACT 282
8.5.5 TAP-tagging 286
9 Genome sequencing projects 287
9.1 Genomic mapping 289
9.2 Genetic mapping 290
9.3 Physical mapping 293
9.4 Nucleotide sequencing 295
9.4.1 Manual DNA sequencing 296
9.4.2 Automated DNA sequencing 300
9.5 Genome sequencing 303
9.6 The human genome project 305
9.7 Finding genes 307
9.8 Gene assignment 309
9.9 Bioinformatics 311
10 Post-genome analysis 313
10.1 Global changes in gene expression 314
10.1.1 Differential display 315
10.1.2 Microarrays 317
10.1.3 ChIPs with everything 324
10.2 Protein function on a genome-wide scale 327
10.3 Knock-out analysis 327
10.4 Antisense and RNA interference (RNAi) 329
10.5 Genome-wide two-hybrid screens 333
10.6 Protein detection arrays 335
10.7 Structural genomics 335
11 Engineering plants 341
11.1 Cloning in plants 341

11.1.1 Agrobacterium tumefaciens 342
11.1.2 Direct nuclear transformation 347
11.1.3 Viral vectors 348
11.1.4 Chloroplast transformation 350
11.2 Commercial exploitation of plant transgenics 354
11.2.1 Delayed ripening 354
11.2.2 Insecticidal resistance 355
11.2.3 Herbicidal resistance 356
11.2.4 Viral resistance 357
11.2.5 Fungal resistance 358
11.2.6 Terminator technology 358
11.3 Ethics of genetically engineered crops 360
12 Engineering animal cells 361
12.1 Cell culture 361
12.2 Transfection of animal cells 362
12.2.1 Chemical transfection 363
12.2.2 Electroporation 364
12.2.3 Liposome-mediated transfection 364
12.2.4 Peptides 366
12.2.5 Direct DNA transfer 366
12.3 Viruses as vectors 367
12.3.1 SV40 367
12.3.2 Adenovirus 369
12.3.3 Adeno-associated virus (AAV) 371
12.3.4 Retrovirus 372
12.4 Selectable markers and gene amplification in animal cells 375
12.5 Expressing genes in animal cells 378
13 Engineering animals 379
13.1 Pronuclear injection 381
13.2 Embryonic stem cells 384

13.3 Nuclear transfer 390
13.4 Gene therapy 396
13.5 Examples and potential of gene therapy 398
Glossary 401
Proteins 409
A1.1 409
A1.2 410
A1.3 411
Nobel prize winners 413
References 417
Index 459
Preface
There are few phrases that can elicit such an emotive response as ‘genetic
engineering’ and ‘cloning’. Newspapers and television invariably use these
phrases to describe something that is not quite right – even perhaps against
nature. Genetic engineering and the modification of genes invariably conjure
up images of Frankenstein foods and abnormal animals. During the course of
reading this book, however, I hope that readers will appreciate that genetic
engineering, and the techniques of molecular biology that underpin it, are
essential components to understanding how organisms work. Man has been
playing, often unwittingly, with genes for thousands of years through selective
breeding to promote certain traits that were seen as desirable. We are currently
at a watershed in the way in which we look at genes. Behind us is 50 years of
knowledge of the structure of the genetic material, and ahead is the ability to
see how every gene that we contain responds to other genes and environmental
conditions. Determining the biochemical basis of why certain people respond
differently to drug treatments, for example, may not be possible yet, but the
techniques to address the appropriate questions are in place. The excitement
of entering the post-genome age will go hand-in-hand with concerns over what
we have the ability to do – whether we actually do it or not.
The analysis of genes and genomes could easily fall into a list of techniques
that can be applied to a particular problem. I have tried to avoid this and,
wherever possible, I have used specific examples to illustrate the problem and
potential solutions. I have relied heavily on published works and have endeavoured
to reference all primary material so that interested readers can explore the
topic further. This has also allowed me to place many of the ideas and experiments
into a historical context. It seems a common misconception that Watson
and Crick were solely responsible for our understanding of how genes work.
Their contribution should never be underestimated, but the work of many others
should not be discounted. The full sequence of the human genome and, equally
or even more importantly, the genomes of experimentally amenable organisms
provide exceptional opportunities for advances in biological sciences over the
coming years. More and more experiments can now be performed on a genome-
wide scale and we are just beginning to understand the consequences of this.
One of the main problems that I have encountered during the writing of this
text is attaining a balance between depth and coverage. I have purposefully
concentrated on more amenable experimental systems – E. coli for prokaryotes
and yeast for eukaryotes. In addition, I have treated higher eukaryotes as
being almost exclusively mammals, and especially humans. This is intended
to give readers a flavour of the ideas and experiments that are currently
being undertaken, but also to give a historical framework onto which today’s
experiments may be hung. We ignore the past at our peril. This approach
has, however, led to the exclusion of some other systems, e.g. Drosophila
and prokaryotes other than E. coli, but is by no means meant as a slight to
these neglected fields. Rather than either covering all fields in scant detail or
explaining the intricate details and nuances of only a few, I have attempted to
provide a broad overview that is punctuated with specific examples. Whether I
have succeeded in getting the balance right I will leave to individual readers. I
can say for certain, however, that there has never been a more exciting time to
study biology, and I hope that this is reflected in this text.
Richard J. Reece
The University of Manchester
October 2003
Acknowledgements
I have had a great deal of help in writing this book. Of course, omissions
and inaccuracies are entirely my responsibility, but I thank those who have
(hopefully) kept these to a minimum – David Timson, Noel Curtis, Cristina
Merlotti, Chris Sellick, Carolyn Byrne, Ray Boot-Handford and Ged Brady.
I am also very grateful to Robert Slater (University of Hertfordshire) and to
Mick Tuite (University of Kent) for their immensely helpful comments and
suggestions. I thank the many friends and colleagues, mentioned in the text,
who have so generously provided both figures for the book and permission
to cite their work. I am also deeply indebted to Jordi Bella for showing me
that molecular graphics programmes are usable by idiots. Nicky McGirr at
John Wiley persuaded me that this project was a good idea. Her boundless
enthusiasm and encouragement saw me through the times when I was not so
sure and, of course, she was right. The ‘guinea pigs’ for many of the ideas
presented here have been successive years of Genetic Engineering students at
The University of Manchester. I thank the many of them who read parts of the
manuscript, and all of them for challenging me, and many of my preconceived
ideas. Judith, Daniel and Kathryn have been incredibly patient throughout
the inception and writing of this book. Readers who find it useful should be
thanking them, not me. Finally, I want to thank my teachers – Tony Maxwell
and Mark Ptashne – who, each in his own way, have true passion for science
and an insistence that the right experiments are done.

Abbreviations and acronyms
AAT α1-antitrypsin
AAV adeno-associated virus
AD activation domain
BAC bacterial artificial chromosome
CaMV cauliflower mosaic virus
CAP catabolite activator protein
CBD chitin binding domain
CDK cyclin-dependent kinase
cDNA complementary DNA
CFI cleavage factor I
CFII cleavage factor II
CHEF contour-clamped homogeneous electric field
ChIP chromatin immunoprecipitation
CMV cytomegalovirus
CPSF cleavage and polyadenylation specificity factor
CStF cleavage stimulation factor
CTD carboxy-terminal repeat domain
DBD DNA binding domain
DEAE diethylaminoethanol
DHFR dihydrofolate reductase
DNA deoxyribonucleic acid
DTT dithiothreitol
ECM extra-cellular matrix
EMS ethyl methane sulphonate
ER endoplasmic reticulum
ES embryonic stem
EST expressed sequence tag
FIGE field inversion gel electrophoresis
FISH fluorescent in situ hybridization

FRET fluorescence resonance energy transfer
GST glutathione S-transferase
HAC human artificial chromosome
HAT histone acetyltransferase
H-DAC histone deacetylase
HSV herpes simplex virus
IMAC immobilized metal ion affinity chromatography
IMPACT intein mediated purification with an affinity chitin binding tag
ITR inverted terminal repeat
LTR long terminal repeat
MBP maltose binding protein
mRNA messenger RNA
MCS multiple cloning site
MLP major late promoter
MSV maize streak virus
NLS nuclear localization signal
OD optical density
ORF open reading frame
PABII polyA binding protein II
PAC P1 artificial chromosome
PAP polyA polymerase
PCR polymerase chain reaction
PFGE pulsed-field gel electrophoresis
RdRp RNA-dependent RNA polymerase
RF release factor
replicative form
RFLP restriction fragment length polymorphism
RIP ribosome inactivating protein
RISC RNA induced silencing complex

RNAi RNA interference
rRNA ribosomal RNA
RT reverse transcription
reverse transcriptase
RT-PCR reverse transcription-polymerase chain reaction
SAM S-adenosylmethionine
SDS sodium dodecyl sulphate
siRNAs small interfering RNAs
SNP single-nucleotide polymorphism
snRNP small nuclear ribonucleoprotein
SRB suppressor of RNA polymerase B
STS sequence tagged site
SV40 simian virus 40
TAF TATA-box binding associated factor
TBP TATA-box binding protein
TdT terminal deoxynucleotidyl transferase
TGMV tomato golden mosaic virus
TK thymidine kinase
tRNA transfer RNA
VA RNAs viral associated RNAs
VNTR variable number tandem repeat
YAC yeast artificial chromosome

1 DNA: Structure and function
Key concepts
• The genetic information is contained within nucleic acids
• DNA is a double-stranded antiparallel helix
• Base pairing (A to T and G to C) holds the two strands of the helix together
• DNA replication occurs through the unwinding of the DNA strands and the copying of each strand
• The central dogma of molecular biology: DNA makes RNA makes protein
• Transcription is the production of an RNA copy of one of the DNA strands
• Translation is the decoding of an RNA molecule to produce protein
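The key concepts above can be made concrete with a short sketch in Python. It is an illustration only, not a method from the text: the function names, the example sequence and the four-entry codon table are all hypothetical, and the full genetic code has 64 codons. It shows base pairing across antiparallel strands, transcription of a template strand into RNA, and translation of that RNA three bases at a time.

```python
# Illustrative sketch only: names and the example sequence are hypothetical.

# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the partner strand written 5'->3'.

    The two strands are antiparallel, so the partner is read in the
    opposite direction: reverse the sequence, then pair each base.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def transcribe(template_strand):
    """Return the RNA copied from a DNA template strand (written 5'->3').

    The RNA is complementary and antiparallel to the template, and
    carries U wherever the DNA partner strand would carry T.
    """
    return reverse_complement(template_strand).replace("T", "U")


# Assumption: only the codons used in this example are listed.
CODON_TABLE = {"AUG": "Met", "GCU": "Ala", "UGG": "Trp", "UAA": "Stop"}


def translate(mrna):
    """Decode an mRNA three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


coding = "ATGGCTTGGTAA"                # one strand of the double helix
template = reverse_complement(coding)  # its antiparallel partner
mrna = transcribe(template)            # DNA makes RNA
print(template)         # TTACCAAGCCAT
print(mrna)             # AUGGCUUGGUAA
print(translate(mrna))  # ['Met', 'Ala', 'Trp'] (RNA makes protein)
```

Note that the RNA has the same sequence as the strand that was not used as the template, with U in place of T.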
Every organism possesses the information required to construct and maintain
a living copy of itself. The basic concepts of heredity and, as a consequence,
genes can be traced back to 1865 and the studies of Gregor Mendel – discussed
by Orel (1995). From the results of his breeding experiments with peas, Mendel
concluded that each pea plant possessed two alleles for each gene, but only
displayed a single phenotype. Perhaps the most remarkable achievement of
Mendel was his ability to correctly identify a complex phenomenon with
no knowledge of the molecular processes involved in the formation of that
phenomenon. Hereditary transmission through sperm and egg became known
at about the same time, and Ernst Haeckel, noting that sperm consists largely of
nuclear material, postulated that the nucleus was responsible for heredity.
1.1 Nucleic Acid is the Material of Heredity
The idea that genetic material is physically transmitted from parent to offspring
has been accepted for as long as the concept of inheritance has existed. Both
proteins and nucleic acid were considered as likely candidates for the role of the
genetic material. Until the 1940s, however, many scientists favoured proteins.
There were two main reasons for this. Firstly, proteins are abundant in cells;
although the amount of an individual protein varies considerably from one cell
type to another, the overall protein content of most cells accounts for over 50%
of the dry weight. Secondly, nucleic acids appeared to be too simple to carry
the complex information presumed to be required to convey the characteristics
of heredity. DNA (deoxyribonucleic acid) was first isolated in 1869 by the Swiss
chemist Johann Friedrich Miescher. He separated nuclei from the cytoplasm
of cells, and then isolated an acidic substance from these nuclei that he called
nuclein. Miescher showed that nuclein contained large amounts of phosphorus
and no sulphur, characteristics that differentiated it from proteins. In what
proved to be a remarkable insight, he suggested that ‘if one wants to assume
that a single substance is the specific cause of fertilization then one should
undoubtedly first of all think of nuclein’.
In 1926, based on the idea that DNA contained approximately equal amounts
of four different groups, called nucleotides, and by determining the type of
linkage that joined the nucleotides together, Levene and Simms proposed a
tetranucleotide structure (Figure 1.1) to explain the chemical arrangement of
nucleotides within nucleic acids (Levene and Simms, 1926). They proposed
a very simple four-nucleotide unit that was repeated many times to form
long nucleic acid molecules. Because the tetranucleotide structure was relatively
simple, it was widely believed that nucleic acids could not provide
the chemical variation expected of the genetic material. Proteins, on the
[Figure 1.1: four phosphate-sugar units, bearing adenine, uracil, guanine and cytidine respectively]
Figure 1.1. The tetranucleotide model for nucleic acid structure proposed by Levene
and Simms in 1926. At the time that this model was proposed, it was thought that plant
and animal nucleic acid might be different, and the differences between DNA and RNA
were not fully understood.
