Handbook of Genome Research. Genomics, Proteomics, Metabolomics, Bioinformatics, Ethical and Legal Issues.
Edited by Christoph W. Sensen
Copyright © 2005 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 3-527-31348-6
Handbook of Genome Research
Edited by
Christoph W. Sensen
Further Titles of Interest

T. Lengauer, R. Mannhold, H. Kubinyi, H. Timmermann (Eds.)
Bioinformatics
From Genomes to Drugs
2 Volumes
2001, ISBN 3-527-29988-2

A.D. Baxevanis, B.F.F. Ouellette (Eds.)
Bioinformatics
A Practical Guide to the Analysis of Genes and Proteins
Third Edition
2005, ISBN 0-471-47878-4

C.W. Sensen (Ed.)
Essentials of Genomics and Bioinformatics
2002, ISBN 3-527-30541-6

G. Kahl
The Dictionary of Gene Technology
Genomics, Transcriptomics, Proteomics
Third edition
2004, ISBN 3-527-30765-6

R.D. Schmid, R. Hammelehle
Pocket Guide to Biotechnology and Genetic Engineering
2003, ISBN 3-527-30895-4

M.J. Dunn, L.B. Jorde, P.F.R. Little, S. Subramaniam (Eds.)
Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics
8 Volume Set
2005, ISBN 0-470-84974-6

H.-J. Rehm, G. Reed, A. Pühler, P. Stadler, C.W. Sensen (Eds.)
Biotechnology
Vol. 5b Genomics and Bioinformatics
2001, ISBN 0-527-28328-5

C. Saccone, G. Pesole
Handbook of Comparative Genomics
Principles and Methodology
2003, ISBN 0-471-39128-X

J.W. Dale, M. von Schantz
From Genes to Genomes
Concepts and Applications of DNA Technology
2002, ISBN 0-471-49783-5

J. Licinio, M.-L. Wong (Eds.)
Pharmacogenomics
The Search for Individualized Therapies
2002, ISBN 3-527-30380-4

Handbook of Genome Research

Edited by
Christoph W. Sensen
Genomics, Proteomics, Metabolomics, Bioinformatics,
Ethical and Legal Issues
Edited by
Prof. Dr. Christoph W. Sensen
University of Calgary
Faculty of Medicine,
Biochemistry & Molecular Biology
3330 Hospital Drive N.W.
Calgary, Alberta T2N 4N1
Canada
Cover illustration:
Margot van Lindenberg: “Obsessed”, Fabric, 2002
Fascination with the immense human diversity and immersion in four distinctly different cultures inspired artist Margot van Lindenberg to explore identity embedded in the human genome. In her art she makes reference to various aspects of genetics from microscopic images to ethical issues of bio-engineering. She develops these ideas through thread and cloth constructions, shadow projections and performance work. Margot, who currently lives in Calgary, Alberta, Canada, holds a BFA from the Alberta College of Art & Design in Calgary.

Artist Statement
Obsessed is an image of the DNA molecule, with strips of colours representing genes. The work refers to the experience of finding particular genes and the obsession that occupies those involved. It can be read either positive or negative, used to establish identity or refer to the insertion of foreign genes as in bio-engineering. The text speaks of a message, a code: a hidden knowledge as it is intentionally illegible. One can become obsessed with attempts to decipher this information.

The process of construction is part of the conceptual development of the work. Dyed and found cotton and silk were given texts, then stitched underneath ramie, which was cut away to reveal the underlying coding. The threadwork refers to the delicate structure of DNA and the raw stages of research and discovery in the field of molecular genetics.
All books published by Wiley-VCH are carefully produced. Nevertheless, authors, editors and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.
Library of Congress Card No. applied for
British Library Cataloguing-in-Publication Data:
A catalogue record for this book is available from the
British Library.
Bibliographic information published by Die Deutsche Bibliothek
Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at <>.
© 2005 WILEY-VCH Verlag GmbH & Co. KGaA,
Weinheim
All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.
Printed in the Federal Republic of Germany
Printed on acid-free paper
Typesetting Detzner Fotosatz, Speyer
Printing betz-druck GmbH, Darmstadt
Binding Litges & Dopf Buchbinderei GmbH,
Heppenheim
ISBN-13: 978-3-527-31348-8
ISBN-10: 3-527-31348-6
Preface

Life-sciences research, especially in biology and medicine, has undergone dramatic changes in the last fifteen years. Completion of the sequencing of the first microbe genome in 1995 was followed by a flurry of activity. Today we have several hundred complete genomes to hand, including that of humans, and many more to follow. Although genome sequencing has become almost a commodity, the very optimistic initial expectations of this work, including the belief that much could be learned simply by looking at the “blueprint” of life, have largely faded into the background.

It has become evident that knowledge about the genomic organization of life forms must be complemented by understanding of gene-expression patterns and very detailed information about the protein complement of the organisms, and that it will take many years before major inroads can be made into a complete understanding of life. This has led to the development of a variety of “omics” efforts, including genomics, proteomics, metabolomics, and metabonomics. It is a typical sign of the times that about four years ago even a journal called “Omics” emerged.

An introduction to the ever-expanding technology of the subject is a major part of this book, which includes detailed description of the technology used to characterize genomic organization, gene expression patterns, protein complements, and the post-translational modification of proteins. The major model organisms and the work done to gain new insights into their biology are another central focus of the book. Several chapters are also devoted to introducing the bioinformatics tools and analytical strategies which are an integral part of any large-scale experiment.

As public awareness of relatively recent advances in life-science research increases, intense discussion has arisen on how to deal with this new research field. This discussion, which involves many groups in society, is also reflected in this book, with several chapters dedicated to the social consequences of research and development which utilizes the new approaches or the data derived from large-scale experiments. It should be clear that nobody can just ignore this topic, because it has already had direct and indirect effects on everyone’s day-to-day life.

The new wave of large-scale research might be of huge benefit to humanity in the future, although in most cases we are still years away from this becoming reality. The promises and dangers of this field must be carefully weighed at each step, and this book tries to make a contribution by introducing the relevant topics that are being discussed not only by scientific experts but by Society’s leaders also.

We would like to thank Dr Andrea Pillmann and the staff of Wiley-VCH in Weinheim, Germany, for the patience they have shown during the preparation of this book. Without their many helpful suggestions it would have been impossible to publish this book.

Christoph W. Sensen
Calgary, May 2005
Contents
Volume 1
Part I Key Organisms 1
1 Genome Projects on Model Organisms 3
Alfred Pühler, Doris Jording, Jörn Kalinowski, Detlev Buttgereit,
Renate Renkawitz-Pohl, Lothar Altschmied, Antoine Danchin, Agnieszka Sekowska,
Horst Feldmann, Hans-Peter Klenk, and Manfred Kröger
1.1 Introduction 3
1.2 Genome Projects of Selected Prokaryotic Model Organisms 4
1.2.1 The Gram⁻ Enterobacterium Escherichia coli 4
1.2.1.1 The Organism 4
1.2.1.2 Characterization of the Genome and Early Sequencing Efforts 7
1.2.1.3 Structure of the Genome Project 7
1.2.1.4 Results from the Genome Project 8
1.2.1.5 Follow-up Research in the Postgenomic Era 9
1.2.2 The Gram⁺ Spore-forming Bacillus subtilis 10
1.2.2.1 The Organism 10
1.2.2.2 A Lesson from Genome Analysis: The Bacillus subtilis Biotope 11
1.2.2.3 To Lead or to Lag: First Laws of Genomics 12
1.2.2.4 Translation: Codon Usage and the Organization of the Cell’s Cytoplasm 13
1.2.2.5 Post-sequencing Functional Genomics: Essential Genes and Expression-profiling Studies 13
1.2.2.6 Industrial Processes 15
1.2.2.7 Open Questions 15
1.2.3 The Archaeon Archaeoglobus fulgidus 16
1.2.3.1 The Organism 16
1.2.3.2 Structure of the Genome Project 17
1.2.3.3 Results from the Genome Project 18
1.2.3.4 Follow-up Research 20
1.3 Genome Projects of Selected Eukaryotic Model Organisms 20
1.3.1 The Budding Yeast Saccharomyces cerevisiae 20
1.3.1.1 Yeast as a Model Organism 20
1.3.1.2 The Yeast Genome Sequencing Project 21
1.3.1.3 Life with Some 6000 Genes 23
1.3.1.4 The Yeast Postgenome Era 25
1.3.2 The Plant Arabidopsis thaliana 25
1.3.2.1 The Organism 25
1.3.2.2 Structure of the Genome Project 27
1.3.2.3 Results from the Genome Project 28
1.3.2.4 Follow-up Research in the Postgenome Era 29
1.3.3 The Roundworm Caenorhabditis elegans 30
1.3.3.1 The Organism 30
1.3.3.2 The Structure of the Genome Project 31
1.3.3.3 Results from the Genome Project 32
1.3.3.4 Follow-up Research in the Postgenome Era 33
1.3.4 The Fruitfly Drosophila melanogaster 34
1.3.4.1 The Organism 34
1.3.4.2 Structure of the Genome Project 35
1.3.4.3 Results of the Genome Project 36
1.3.4.4 Follow-up Research in the Postgenome Era 37
1.4 Conclusions 37
References 39
2 Environmental Genomics: A Novel Tool for Study of Uncultivated Microorganisms 45
Alexander H. Treusch and Christa Schleper
2.1 Introduction: Why Novel Approaches to Study Microbial Genomes? 45
2.2 Environmental Genomics: The Methodology 46
2.3 Where it First Started: Marine Environmental Genomics 48
2.4 Environmental Genomics of Defined Communities: Biofilms and Microbial Mats 50
2.5 Environmental Genomics for Studies of Soil Microorganisms 50
2.6 Biotechnological Aspects 53
2.7 Conclusions and Perspectives 54
References 55
3 Applications of Genomics in Plant Biology 59
Richard Bourgault, Katherine G. Zulak, and Peter J. Facchini
3.1 Introduction 59
3.2 Plant Genomes 60
3.2.1 Structure, Size, and Diversity 60
3.2.2 Chromosome Mapping: Genetic and Physical 61
3.2.3 Large-scale Sequencing Projects 62
3.3 Expressed Sequence Tags 64
3.4 Gene Expression Profiling Using DNA Microarrays 66
3.5 Proteomics 68
3.6 Metabolomics 70
3.7 Functional Genomics 72
3.7.1 Forward Genetics 72
3.7.2 Reverse Genetics 73
3.8 Concluding Remarks 76
References 77
4 Human Genetic Diseases 81
Roger C. Green
4.1 Introduction 81
4.1.1 The Human Genome Project: Where Are We Now and Where Are We Going? 81
4.1.1.1 What Have We Learned? 81
4.2 Genetic Influences on Human Health 83
4.3 Genomics and Single-gene Defects 84
4.3.1 The Availability of the Genome Sequence Has Changed the Way in which Disease Genes Are Identified 84
4.3.1.1 Positional Candidate Gene Approach 85
4.3.1.2 Direct Analysis of Candidate Genes 85
4.3.2 Applications in Human Health 86
4.3.2.1 Genetic Testing 86
4.3.3 Gene Therapy 87
4.4 Genomics and Polygenic Diseases 87
4.4.1 Candidate Genes and their Variants 88
4.4.2 Linkage Disequilibrium Mapping 89
4.4.2.1 The Hapmap Project 89
4.4.3 Whole-genome Resequencing 90
4.5 The Genetic Basis of Cancer 90
4.5.1 Breast Cancer 91
4.5.1.1 Cancer Risk in Carriers of BRCA Mutations 92
4.5.2 Colon Cancer 93
4.5.2.1 Familial Adenomatous Polyposis 93
4.5.2.2 Hereditary Non-polyposis Colon Cancer 93
4.5.2.3 Modifier Genes in Colorectal Cancer 94
4.6 Genetics of Cardiovascular Disease 94
4.6.1 Monogenic Disorders 95
4.6.1.1 Hypercholesterolemia 95
4.6.1.2 Hypertension 95
4.6.1.3 Clotting Factors 95
4.6.1.4 Hypertrophic Cardiomyopathy 95
4.6.1.5 Familial Dilated Cardiomyopathy 96
4.6.1.6 Familial Arrhythmias 96
4.6.2 Multifactorial Cardiovascular Disease 96
4.7 Conclusions 97
References 98
Part II Genomic and Proteomic Technologies 103
5 Genomic Mapping and Positional Cloning, with Emphasis on Plant Science 105
Apichart Vanavichit, Somvong Tragoonrung, and Theerayut Toojinda
5.1 Introduction 105
5.2 Genome Mapping 105
5.2.1 Mapping Populations 105
5.2.2 Molecular Markers: The Key Mapping Reagents 106
5.2.2.1 RFLP 107
5.2.2.2 RAPD 107
5.2.2.3 AFLP 107
5.2.2.4 SSR 108
5.2.2.5 SSCP 108
5.2.3 Construction of a Linkage Map 108
5.3 Positional Cloning 110
5.3.1 Successful Positional Cloning 110
5.3.2 Defining the Critical Region 111
5.3.3 Refining the Critical Region: Genetic Approaches 112
5.3.4 Refining the Critical Region: Physical Approaches 113
5.3.5 Cloning Large Genomic Inserts 114
5.3.6 Radiation Hybrid Map 114
5.3.7 Identification of Genes Within the Refined Critical Region 115
5.3.7.1 Gene Detection by CpG Island 115
5.3.7.2 Exon Trapping 115
5.3.7.3 Direct cDNA Selection 115
5.4 Comparative Mapping and Positional Cloning 115
5.4.1 Synteny, Colinearity, and Positional Cloning 116
5.4.2 Bridging Model Organisms 117
5.4.3 Predicting Candidate Genes in the Critical Region 118
5.4.4 EST: Key to Gene Identification in the Critical Region 118
5.4.5 Linkage Disequilibrium Mapping 120
5.5 Genetic Mapping in the Post-genomics Era 120
5.5.1 eQTL 121
References 123
6 DNA Sequencing Technology 129
Lyle R. Middendorf, Patrick G. Humphrey, Narasimhachari Narayanan,
and Stephen C. Roemer
6.1 Introduction 129
6.2 Overview of Sanger Dideoxy Sequencing 130
6.3 Fluorescence Dye Chemistry 131
6.3.1 Fluorophore Characteristics 132
6.3.2 Commercial Dye Fluorophores 132
6.3.3 Energy Transfer 136
6.3.4 Fluorescence Lifetime 137
6.4 Biochemistry of DNA Sequencing 138
6.4.1 Sequencing Applications and Strategies 138
6.4.1.1 New Sequence Determination 139
6.4.1.2 Confirmatory Sequencing 140
6.4.2 DNA Template Preparation 140
6.4.2.1 Single-stranded DNA Template 140
6.4.2.2 Double-stranded DNA Template 140
6.4.2.3 Vectors for Large-insert DNA 141
6.4.2.4 PCR Products 141
6.4.3 Enzymatic Reactions 141
6.4.3.1 DNA Polymerases 141
6.4.3.2 Labeling Strategy 142
6.4.3.3 The Template–Primer–Polymerase Complex 143
6.4.3.4 Simultaneous Bi-directional Sequencing 144
6.5 Fluorescence DNA Sequencing Instrumentation 144
6.5.1 Introduction 144
6.5.1.1 Excitation Energy Sources 144
6.5.1.2 Fluorescence Samples 145
6.5.1.3 Fluorescence Detection 145
6.5.1.4 Overview of Fluorescence Instrumentation Related to DNA Sequencing 145
6.5.2 Information Throughput 147
6.5.2.1 Sample Channels (n) 147
6.5.2.2 Information per Channel (d) 147
6.5.2.3 Information Independence (I) 148
6.5.2.4 Time per Sample (t) 148
6.5.3 Instrument Design Issues 148
6.5.4 Forms of Commercial Electrophoresis used for Fluorescence DNA Sequencing 149
6.5.4.1 Slab Gels 149
6.5.4.2 Capillary Gels 151
6.5.4.3 Micro-Grooved Channel Gel Electrophoresis 151
6.5.5 Non-electrophoresis Methods for Fluorescence DNA Sequencing 152
6.5.6 Non-fluorescence Methods for DNA Sequencing 152
6.6 DNA Sequence Analysis 153
6.6.1 Introduction 153
6.6.2 Lane Detection and Tracking 153
6.6.3 Trace Generation and Base Calling 155
6.6.4 Quality/Confidence Values 157
6.7 DNA Sequencing Approaches to Achieving the $1000 Genome 159
6.7.1 Introduction 159
6.7.2 DNA Degradation Strategy 161
6.7.3 DNA Synthesis Strategy 162
6.7.4 DNA Hybridization Strategy 163
6.7.5 Nanopore Filtering Strategy 164
References 165
7 Proteomics and Mass Spectrometry for the Biological Researcher 181
Sheena Lambert and David C. Schriemer
7.1 Introduction 181
7.2 Defining the Sample for Proteomics 184
7.2.1 Minimize Cellular Heterogeneity, Avoid Mixed Cell Populations 184
7.2.2 Use Isolated Cell Types and/or Cell Cultures 185
7.2.3 Minimize Intracellular Heterogeneity 186
7.2.4 Minimize Dynamic Range 186
7.2.5 Maximize Concentration/Minimize Handling 187
7.3 New Developments – Clinical Proteomics 187
7.4 Mass Spectrometry – The Essential Proteomic Technology 188
7.4.1 Sample Processing 190
7.4.2 Instrumentation 191
7.4.3 MS Bioinformatics/Sequence Databases 193
7.5 Sample-driven Proteomics Processes 195
7.5.1 Direct MS Analysis of a Protein Digest 196
7.5.2 Direct MS–MS Analysis of a Digest 198
7.5.3 LC–MS–MS of a Protein Digest 199
7.5.4 Multidimensional LC–MS–MS of a Digest (Top-down vs. Bottom-up Proteomics) 201
7.6 Conclusions 204
References 205
8 Proteome Analysis by Capillary Electrophoresis 211
Md Abul Fazal, David Michels, James Kraly, and Norman J. Dovichi
8.1 Introduction 211
8.2 Capillary Electrophoresis 212
8.2.1 Instrumentation 212
8.2.2 Injection 212
8.2.3 Electroosmosis 212
8.2.4 Separation 213
8.2.5 Detection 214
8.3 Capillary Electrophoresis for Protein Analysis 215
8.3.1 Capillary Isoelectric Focusing 215
8.3.2 SDS/Capillary Sieving Electrophoresis 215
8.3.3 Free Solution Electrophoresis 217
8.4 Single-cell Analysis 218
8.5 Two-dimensional Separations 219
8.6 Conclusions 221
References 222
9 A DNA Microarray Fabrication Strategy for Research Laboratories 223
Daniel C. Tessier, Mélanie Arbour, François Benoit, Hervé Hogues,
and Tracey Rigby
9.1 Introduction 223
9.2 The Database 228
9.3 High-throughput DNA Synthesis 230
9.3.1 Scale and Cost of Synthesis 230
9.3.2 Operational Constraints 231
9.3.3 Quality-control Issues 232
9.4 Amplicon Generation 232
9.5 Microarraying 234
9.6 Probing and Scanning Microarrays 234
9.7 Conclusion 235
References 237
10 Principles of Application of DNA Microarrays 239
Mayi Arcellana-Panlilio
10.1 Introduction 239
10.2 Definitions 240
10.3 Types of Array 240
10.4 Production of Arrays 241
10.4.1 Sources of Arrays 241
10.4.2 Array Content 242
10.4.3 Slide Substrates 242
10.4.4 Arrayers and Spotting Pins 243
10.5 Interrogation of Arrays 243
10.5.1 Experimental Design 244
10.5.2 Sample Preparation 246
10.5.3 Labeling 247
10.5.4 Hybridization and Post-hybridization Washes 249
10.5.5 Data Acquisition and Quantification 250
10.6 Data Analysis 251
10.7 Documentation of Microarrays 254
10.8 Applications of Microarrays in Cancer Research 255
10.9 Conclusion 256
References 257
11 Yeast Two-hybrid Technologies 261
Gregor Jansen, David Y. Thomas, and Stephanie Pollock
11.1 Introduction 261
11.2 The Classical Yeast Two-hybrid System 262
11.3 Variations of the Two-hybrid System 263
11.3.1 The Reverse Two-hybrid System 263
11.3.2 The One-hybrid System 264
11.3.3 The Repressed Transactivator System 264
11.3.4 Three-hybrid Systems 264
11.4 Membrane Yeast Two-hybrid Systems 265
11.4.1 SOS Recruitment System 266
11.4.2 Split-ubiquitin System 266
11.4.3 G-Protein Fusion System 266
11.4.4 The Ire1 Signaling System 268
11.4.5 Non-yeast Hybrid Systems 269
11.5 Interpretation of Two-hybrid Results 269
11.6 Conclusion 270
References 271
12 Structural Genomics 273
Aalim M. Weljie, Hans J. Vogel, and Ernst M. Bergmann
12.1 Introduction 273
12.2 Protein Crystallography and Structural Genomics 274
12.2.1 High-throughput Protein Crystallography 274
12.2.2 Protein Production 276
12.2.3 Protein Crystallization 278
12.2.4 Data Collection 279
12.2.5 Structure Solution and Refinement 281
12.2.6 Analysis 282
12.3 NMR and Structural Genomics 282
12.3.1 High-throughput Structure Determination by NMR 282
12.3.1.1 Target Selection 282
12.3.1.2 High-throughput Data Acquisition 284
12.3.1.3 High-throughput Data Analysis 286
12.3.2 Other Non-structural Applications of NMR 287
12.3.2.1 Suitability Screening for Structure Determination 288
12.3.2.2 Determination of Protein Fold 289
12.3.2.3 Rational Drug Target Discovery and Functional Genomics 290
12.4 Epilogue 290
References 292
Volume 2
Part III Bioinformatics 297
13 Bioinformatics Tools for DNA Technology 299
Peter Rice
13.1 Introduction 299
13.2 Alignment Methods 299
13.2.1 Pairwise Alignment 300
13.2.2 Local Alignment 302
13.2.3 Variations on Pairwise Alignment 303
13.2.4 Beyond Simple Alignment 304
13.2.5 Other Alignment Methods 305
13.3 Sequence Comparison Methods 305
13.3.1 Multiple Pairwise Comparisons 307
13.4 Consensus Methods 309
13.5 Simple Sequence Masking 309
13.6 Unusual Sequence Composition 309
13.7 Repeat Identification 310
13.8 Detection of Patterns in Sequences 311
13.8.1 Physical Characteristics 312
13.8.2 Detecting CpG Islands 313
13.8.3 Known Sequence Patterns 314
13.8.4 Data Mining with Sequence Patterns 315
13.9 Restriction Sites and Promoter Consensus Sequences 315
13.9.1 Restriction Mapping 315
13.9.2 Codon Usage Analysis 315
13.9.3 Plotting Open Reading Frames 317
13.9.4 Codon Preference Statistics 318
13.9.5 Reading Frame Statistics 320
13.10 The Future for EMBOSS 321
References 322
14 Software Tools for Proteomics Technologies 323
David S. Wishart
14.1 Introduction 323
14.2 Protein Identification 324
14.2.1 Protein Identification from 2D Gels 324
14.2.2 Protein Identification from Mass Spectrometry 328
14.2.3 Protein Identification from Sequence Data 332
14.3 Protein Property Prediction 334
14.3.1 Predicting Bulk Properties (pI, UV absorptivity, MW) 334
14.3.2 Predicting Active Sites and Protein Functions 334
14.3.3 Predicting Modification Sites 338
14.3.4 Finding Protein Interaction Partners and Pathways 338
14.3.5 Predicting Sub-cellular Location or Localization 339
14.3.6 Predicting Stability, Globularity, and Shape 340
14.3.7 Predicting Protein Domains 341
14.3.8 Predicting Secondary Structure 342
14.3.9 Predicting 3D Folds (Threading) 343
14.3.10 Comprehensive Commercial Packages 344
References 347
15 Applied Bioinformatics for Drug Discovery and Development 353
Jian Chen, ShuJian Wu, and Daniel B. Davison
15.1 Introduction 353
15.2 Databases 353
15.2.1 Sequence Databases 354
15.2.1.1 Genomic Sequence Databases 354
15.2.1.2 EST Sequence Databases 355
15.2.1.3 Sequence Variations and Polymorphism Databases 356
15.2.2 Expression Databases 357
15.2.2.1 Microarray and Gene Chip 357
15.2.2.2 Others (SAGE, Differential Display) 358
15.2.2.3 Quantitative PCR 358
15.2.3 Pathway Databases 358
15.2.4 Cheminformatics 359
15.2.5 Metabonomics and Proteomics 360
15.2.6 Database Integration and Systems Biology 360
15.3 Bioinformatics in Drug-target Discovery 362
15.3.1 Target-class Approach to Drug-target Discovery 362
15.3.2 Disease-oriented Target Identification 364
15.3.3 Genetic Screening and Comparative Genomics in Model Organisms for Target Discovery 365
15.4 Support of Compound Screening and Toxicogenomics 366
15.4.1 Improving Compound Selectivity 367
15.4.1.1 Phylogeny Analysis 367
15.4.1.2 Tissue Expression and Biological Function Implication 368
15.4.2 Prediction of Compound Toxicity 369
15.4.2.1 Toxicogenomics and Toxicity Signature 369
15.4.2.2 Long QT Syndrome Assessment 370
15.4.2.3 Drug Metabolism and Transport 371
15.5 Bioinformatics in Drug Development 372
15.5.1 Biomarker Discovery 372
15.5.2 Genetic Variation and Drug Efficacy 373
15.5.3 Genetic Variation and Clinical Adverse Reactions 374
15.5.4 Bioinformatics in Drug Life-cycle Management (Personalized Drug and Drug Competitiveness) 376
15.6 Conclusions 376
References 377
16 Genome Data Representation Through Images: The MAGPIE/Bluejay System 383
Andrei Turinsky, Paul M. K. Gordon, Emily Xu, Julie Stromer,
and Christoph W. Sensen
16.1 Introduction 383
16.2 The MAGPIE Graphical System 384
16.3 The Hierarchical MAGPIE Display System 386
16.4 Overview Images 387
16.4.1 Whole Project View 387
16.5 Coding Region Displays 391
16.5.1 Contiguous Sequence with ORF Evidence 391
16.5.2 Contiguous Sequence with Evidence 394
16.5.3 Expressed Sequence Tags 394
16.5.4 ORF Close-up 395
16.6 Coding Sequence Function Evidence 396
16.6.1 Analysis Tools Summary 396
16.6.2 Expanded Tool Summary 397
16.7 Secondary Genome Context Images 399
16.7.1 Base Composition 399
16.7.2 Sequence Repeats 400
16.7.3 Sequence Ambiguities 401
16.7.4 Sequence Strand Assembly Coverage 402
16.7.5 Restriction Enzyme Fragmentation 402
16.7.6 Agarose Gel Simulation 403
16.8 The Bluejay Data Visualization System 404
16.9 Bluejay Architecture 405
16.10 Bluejay Display and Data Exploration 407
16.10.1 The Main Bluejay Interface 407
16.10.2 Semantic Zoom and Levels of Details 408
16.10.3 Operations on the Sequence 408
16.10.4 Interaction with Individual Elements 410
16.10.5 Eukaryotic Genomes 411
16.11 Bluejay Usability Features 411
16.12 Conclusions and Open Issues 413
References 414
17 Bioinformatics Tools for Gene-expression Studies 415
Greg Finak, Michael Hallett, Morag Park, and François Pepin
17.1 Introduction 415
17.1.1 Microarray Technologies 416
17.1.1.1 cDNA Microarrays 416
17.1.1.2 Oligonucleotide Microarrays 417
17.1.2 Objectives and Experimental Design 417
17.2 Background Knowledge and Tools 419
17.2.1 Standards 419
17.2.2 Microarray Data Management Systems 420
17.2.3 Statistical and General Analysis Software 420
17.3 Preprocessing 421
17.3.1 Image, Spot, and Array Quality 421
17.3.2 Gene Level Summaries 422
17.3.3 Normalization 422
17.4 Class Comparison – Differential Expression 423
17.5 Class Prediction 425
17.6 Class Discovery 426
17.6.1 Clustering Algorithms 426
17.6.2 Validation of Clusters 427
17.7 Searching for Meaning 428
References 430
18 Protein Interaction Databases 433
Gary D. Bader and Christopher W. V. Hogue
18.1 Introduction 433
18.2 Scientific Foundations of Biomolecular Interaction Information 434
18.3 The Graph Abstraction for Interaction Databases 434
18.4 Why Contemplate Integration of Interaction Data? 435
18.5 A Requirement for More Detailed Abstractions 435
18.6 An Interaction Database as a Framework for a Cellular CAD System 437
18.7 BIND – The Biomolecular Interaction Network Database 437
18.8 Other Molecular-interaction Databases 439
18.9 Database Standards 439
18.10 Answering Scientific Questions Using Interaction Databases 440
18.11 Examples of Interaction Databases 440
References 455
19 Bioinformatics Approaches for Metabolic Pathways 461
Ming Chen, Andreas Freier, and Ralf Hofestädt
19.1 Introduction 461
19.2 Formal Representation of Metabolic Pathways 463
19.3 Database Systems and Integration 463
19.3.1 Database Systems 463
19.3.2 Database Integration 465
19.3.3 Model-driven Reconstruction of Molecular Networks 466
19.3.3.1 Modeling Data Integration 467
19.3.3.2 Object-oriented Modeling 469
19.3.3.3 Systems Reconstruction 471
19.4 Different Models and Aspects 472
19.4.1 Petri Net Model 473
19.4.1.1 Basics 473
19.4.1.2 Hybrid Petri Nets 474
19.4.1.3 Applications 476
19.4.1.4 Petri Net Model Construction 478
19.5 Simulation Tools 479
19.5.1 Metabolic Data Integration 481
19.5.2 Metabolic Pathway Layout 481
19.5.3 Dynamics Representation 482
19.5.4 Hierarchical Concept 482
19.5.5 Prediction Capability 482
19.5.6 Parallel Treatment and Development 482
19.6 Examples and Discussion 483
References 487
20 Systems Biology 491
Nathan Goodman
20.1 Introduction 491
20.2 Data 492
20.2.1 Available Data Types 492
20.2.2 Data Quality and Data Fusion 493
20.3 Basic Concepts 494
20.3.1 Systems and Models 494
20.3.2 States 494
20.3.3 Informal and Formal Models 495
20.3.4 Modularity 495
20.4 Static Models 496
20.4.1 Graphs 496
20.4.2 Analysis of Static Models 498
20.5 Dynamic Models 499
20.5.1 Types of Model 499
20.5.2 Modeling Formalisms 500
20.6 Summary 500
20.7 Guide to the Literature 501
20.7.1 Highly Recommended Reviews 501
20.7.2 Recommended Detailed Reviews 502
20.7.3 Recommended High-level Reviews 502
References 504
Part IV Ethical, Legal and Social Issues 507
21 Ethical Aspects of Genome Research and Banking 509
Bartha Maria Knoppers and Clémentine Sallée
21.1 Introduction 509
21.2 Types of Genetic Research 509
21.3 Research Ethics 510
21.4 “Genethics” 513
21.5 DNA Banking 516
21.5.1 International 517
21.5.2 Regional 520
21.5.3 National 521
21.6 Ownership 526
21.7 Conclusion 530
References 532
22 Biobanks and the Challenges of Commercialization 537
Edna Einsiedel and Lorraine Sheremeta
22.1 Introduction 537
22.2 Background 538
22.3 Population Genetic Research and Public Opinion 540
22.4 The Commercialization of Biobank Resources 541
22.4.1 An Emerging Market for Biobank Resources 542
22.4.2 Public Opinion and the Commercialization of Genetic Resources 543
22.5 Genetic Resources and Intellectual Property: What Benefits? For Whom? 544
22.5.1 Patents as The Common Currency of the Biotech Industry 544
22.5.2 The Debate over Genetic Patents 545
22.5.3 Myriad Genetics 546
22.5.4 Proposed Patent Reforms 547
22.5.5 Patenting and Public Opinion 548
22.6 Human Genetic Resources and Benefit-Sharing 549
22.7 Commercialization and Responsible Governance of Biobanks 551
22.7.1 The Public Interest and the Exploitation of Biobank Resources 552
22.7.2 The Role of the Public and Biobank Governance 553
22.8 Conclusion 554
References 555
23 The (Im)perfect Human – His Own Creator? Bioethics and Genetics at the Beginning of Life 561
Gebhard Fürst
23.1 Life Sciences and the Untouchable Human Being 563
23.2 Consequences from the Untouchability of Humans and Human Dignity for the Bioethical Discussion 564
23.3 Conclusion 567
References 570
Part V Outlook 571
24 The Future of Large-Scale Life Science Research 573
Christoph W. Sensen
24.1 Introduction 573
24.2 Evolution of the Hardware 574
24.2.1 DNA Sequencing as an Example 574
24.2.2 General Trends 574
24.2.3 Existing Hardware Will be Enhanced for more Throughput 575
24.2.4 The PC-style Computers that Run most Current Hardware will be Replaced with Web-based Computing 575
24.2.5 Integration of Machinery will Become Tighter 576
24.2.6 More and more Biological and Medical Machinery will be “Genomized” 576
24.3 Genomic Data and Data Handling 577
24.4 Next-generation Genome Research Laboratories 579
24.4.1 The Toolset of the Future 579
24.4.2 Laboratory Organization 581
24.5 Genome Projects of the Future 582
24.6 Epilog 583
Subject Index 585
List of Contributors
Lothar Altschmied
Institute of Plant Genetics and Crop Plant Research (IPK)
Corrensstr. 3
06466 Gatersleben
Germany
Mélanie Arbour
MicroArray Laboratory
National Research Council of Canada
Biotechnology Research Institute
6100 Royalmount Avenue
Montreal
Quebec, H4P 2R2
Canada
Mayi Arcellana-Panlilio
Southern Alberta Microarray Facility
University of Calgary
HM 393b
3330 Hospital Drive, N.W.
Calgary
Alberta, T2N 4N1
Canada
Gary D. Bader
Computational Biology Center
Memorial Sloan-Kettering Cancer Center
Box 460
New York, 10021
USA
François Benoit
MicroArray Laboratory
National Research Council of Canada
Biotechnology Research Institute
6100 Royalmount Avenue
Montreal
Quebec, H4P 2R2
Canada
Ernst M. Bergmann
Alberta Synchrotron Institute
University of Alberta
Edmonton
Alberta, T6G 2E1
Canada
Richard Bourgault
Department of Biological Sciences
University of Calgary
2500 University Drive N.W.
Calgary
Alberta, T2N 1N4
Canada
Detlev Buttgereit
Fachbereich Biologie
Entwicklungsbiologie
Philipps-Universität Marburg
Karl-von-Frisch-Straße 8b
35043 Marburg
Germany
Jian Chen
Bristol Myers Squibb Pharmaceutical
Research Institute
311 Pennington-Rocky Hill Road
Pennington
New Jersey, 08534
USA
Ming Chen
Department of Bioinformatics /
Medical Informatics
Faculty of Technology
University of Bielefeld
33501 Bielefeld
Germany
Antoine Danchin
Institut Pasteur
Unité de Génétique des Génomes
Bactériens
Département Structure et
Dynamique des Génomes
28 rue du Docteur Roux
75724 PARIS Cedex 15
France
Daniel B. Davison
Bristol Myers Squibb
Pharmaceutical Research Institute
311 Pennington-Rocky Hill Road
Pennington
New Jersey, 08534
USA
Norman J. Dovichi
Department of Chemistry
University of Washington
Seattle
Washington, 98195-1700
USA
Edna Einsiedel
University of Calgary
2500 University Drive N.W., SS318
Calgary
Alberta, T2N 1N4
Canada
Peter J. Facchini
Department of Biological Sciences
University of Calgary
2500 University Drive N.W.
Calgary
Alberta, T2N 1N4
Canada
Abul Fazal
Department of Chemistry
University of Washington
Seattle
Washington, 98195-1700
USA
Horst Feldmann
Adolf-Butenandt-Institut für
Physiologische Chemie der
Ludwig-Maximilians-Universität
Schillerstraße 44
80336 München
Germany
Greg Finak
Department of Biochemistry
McGill University
3775 University St.
Montreal
Quebec, H3A 2B4
Canada
Andreas Freier
Department of Bioinformatics /
Medical Informatics
Faculty of Technology
University of Bielefeld
33501 Bielefeld
Germany
His Excellency Dr. Gebhard Fürst
Bischof von Rottenburg-Stuttgart
Postfach 9
72101 Rottenburg a. N.
Germany
Paul Gordon
University of Calgary
Department of Biochemistry and
Molecular Biology
3330 Hospital Drive N.W.
Calgary
Alberta, T2N 4N1
Canada
Roger C. Green
Faculty of Medicine
Memorial University of Newfoundland
St. John's
Newfoundland, A1B 3Y1
Canada
Michael Hallett
Department of Biochemistry
3775 University St
McGill University
Montreal, H3A 2B4
Canada
Ralf Hofestädt
Department of Bioinformatics /
Medical Informatics
Faculty of Technology
University of Bielefeld
33501 Bielefeld
Germany
Christopher W.V. Hogue
Dept. Biochemistry
University of Toronto and the
Samuel Lunenfeld Research Institute
Mt. Sinai Hospital
600 University Avenue
Toronto, ON M5G 1X5
Canada
Hervé Hogues
MicroArray Laboratory
National Research Council of Canada
Biotechnology Research Institute
6100 Royalmount Avenue
Montreal
Quebec, H4P 2R2
Canada
Patrick G. Humphrey
LI-COR Inc.
4308 Progressive Ave.
P.O. Box 4000
Lincoln
Nebraska, 68504
USA
Gregor Jansen
Department of Biochemistry
McGill University
3655 Promenade Sir William Osler
Montreal
Quebec, H3G 1Y6
Canada
Doris Jording
Fakultät für Biologie
Lehrstuhl für Genetik
Universität Bielefeld
33594 Bielefeld
Germany
Jörn Kalinowski
Fakultät für Biologie
Lehrstuhl für Genetik
Universität Bielefeld
33594 Bielefeld
Germany
Hans-Peter Klenk
e.gene Biotechnologie GmbH
Pöckinger Fußweg 7a
82340 Feldafing
Germany
Bartha Maria Knoppers
University of Montreal
3101, Chemin de la Tour
Montreal
Quebec, H3C 3J7
Canada
James Kraly
Department of Chemistry
University of Washington
Seattle
Washington, 98195-1700
USA
Manfred Kröger
Institut für Mikro- und Molekularbiologie
Justus-Liebig-Universität
Heinrich-Buff-Ring 26-32
35392 Giessen
Germany
Sheena Lambert
Department of Biochemistry and
Molecular Biology
University of Calgary
3330 Hospital Drive N.W.
Calgary
Alberta, T2N 4N1
Canada
David Michels
Department of Chemistry
University of Washington
Seattle
Washington, 98195-1700
USA
Lyle R. Middendorf
LI-COR Inc.
4308 Progressive Ave.
P.O. Box 4000
Lincoln
Nebraska, 68504
USA
Narasimhachari Narayanan
VisEn Medical, Inc.
12B Cabot Road
Woburn
Massachusetts, 01801
USA
Morag Park
Department of Biochemistry
McGill University
3775 University St.
Montreal
Quebec, H3A 2B4
Canada
François Pepin
Department of Biochemistry
McGill University
3775 University St.
Montreal
Quebec, H3A 2B4
Canada
Stephanie Pollock
Department of Biochemistry
McGill University
3655 Promenade Sir William Osler
Montreal
Quebec, H3G 1Y6
Canada
Alfred Pühler
Fakultät für Biologie
Lehrstuhl für Genetik
Universität Bielefeld
33594 Bielefeld
Germany
Renate Renkawitz-Pohl
Fachbereich Biologie,
Entwicklungsbiologie
Philipps-Universität Marburg
Karl-von-Frisch-Straße 8b
35043 Marburg
Germany
