Grid Database Design
Boca Raton London New York Singapore
April J. Wells

Published in 2005 by
Auerbach Publications
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2005 by Taylor & Francis Group, LLC
Auerbach is an imprint of Taylor & Francis Group
No claim to original U.S. Government works
Printed in the United States of America on acid-free paper

10 9 8 7 6 5 4 3 2 1
International Standard Book Number-10: 0-8493-2800-4 (Hardcover)
International Standard Book Number-13: 978-0-8493-2800-8 (Hardcover)
Library of Congress Card Number 2005040962
This book contains information obtained from authentic and highly regarded sources. Reprinted material is
quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts
have been made to publish reliable data and information, but the author and the publisher cannot assume
responsibility for the validity of all materials or for the consequences of their use.
No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic,
mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and
recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive,
Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration
for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate
system of payment has been arranged.

Trademark Notice:

Product or corporate names may be trademarks or registered trademarks, and are used only
for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Wells, April J.
Grid database design / April J. Wells.
p. cm.
Includes bibliographical references and index.
ISBN 0-8493-2800-4 (alk. paper)
1. Computational grids (Computer systems) 2. Database design. I. Title.

QA76.9C58W45 2005

004'.36 dc22 2005040962

Taylor & Francis Group
is the Academic Division of T&F Informa plc.

Preface

Computing has come a long way since our earliest beginnings. Many of
us have seen complete revisions of computing technology in our lifetimes.
I am not that old, and I have seen punch cards and Cray supercomputers,
numbered Basic on an Apple IIe, and highly structured C. Nearly all of
us can remember when the World Wide Web began its popularity and
when there were only a few pictures available in a nearly all textual
medium. Look at where we are now. Streaming video, MP3s, games, and
chat are a part of many thousands of lives, from the youngest children
just learning to mouse and type, to senior citizens staying in touch and
staying active and involved regardless of their locations. The Internet and
the World Wide Web have become a part of many households’ daily lives
in one way or another. They are often taken for granted, and highly
missed when they are unavailable. There are Internet cafés springing up
in towns all over the United States, and even major cruise lines have them
available for not only the passengers, but the crew as well.

We are now standing on the edge of yet another paradigm shift: Grid
computing. Grid computing, it is suggested, may even be bigger than the
Internet and World Wide Web, and for most of us, the adventure is just
beginning. For many of us, especially those of us who grew up with
mainframes and stand-alone systems getting bigger and bigger, the new
model is a big change. But it is also an exciting change — where will
we be in the next five years?

Goals of This Book

My main goal in writing this book is to provide you with information on
the Grid, its beginning, background, and components, and to give you
an idea of how databases will be designed to fit into this new computing
model. Many of the ideas and concepts are not new, but will have to be
addressed in the context of the new model, with many different
considerations to be included.
Many people in academia and research already know about the Grid
and the power that it can bring to computing, but many in business are
just beginning to hear the rumblings and need to be made aware of ways
in which the new concepts could potentially impact them and their ways
of computing in the foreseeable future.

Audience


The proposed audience is those who are looking at Grid computing as
an option, or those who want to learn more about the emerging
technology. When I started out, I wanted to let other database administrators in
on what might be coming in the future, and what they could expect that
future to look like. However, I believe that the audience is even bigger
and should encompass not only database administrators, but systems
administrators and programmers and executives — anyone hearing the
rumblings and wanting to know more.
The background in Section 1 is designed as just that, background. If
you have a grasp on how we got to where we are now, you may want
to read it for the entertainment value, the trip down memory lane, so to
speak, or you may just want to skip large portions of it as irrelevant to
where you are now.
Section 2 starts the meat of the book, introducing the Grid and its
components and important concepts and ideas, and Section 3 delves into
the part that databases will play in the new paradigm and how those
databases need to act to play nicely together.

Structure of the Book

This book is broken down into three sections and twelve chapters, as
follows:

Section 1

In Section 1 we lay the groundwork. We cover some background on
computing and how we got to where we are. We are, in many places
and situations, already taking baby steps toward integration of the new
paradigm into the existing framework.


Chapter 1

Chapter 1 will cover computing history, how we got here, the major
milestones for computing, and the groundwork for the Grid, where we
are launching the future today. It includes information on the beginnings
of networking and the Internet, as it is the model on which many people
are defining the interaction with the Grid.

Chapter 2

Chapter 2 will provide definitions of where much of the Grid is now, the
major players, and many of the components that make up the Grid system.

Chapter 3

Chapter 3 is sort of the proof of the pudding. It provides a partial list of
those commercial and academic ventures that have been the early adopters
of Grid and have started to realize its potential. We have a long way to
go before anyone can hope to realize anything as ubiquitous as commodity
computing, but we have come a long way from our beginnings, too.

Section 2


Section 2 goes into what is entailed in building a Grid. There are a variety
of ideas and components that are involved in the definition, concepts that
you need to have your arms around before stepping off of the precipices
and flying into the future.

Chapter 4

Chapter 4 looks at the security concerns and some of the means that can
be used to address these concerns. As the Grid continues to emerge, so
will the security concerns and the security measures developed to address
those concerns.

Chapter 5

Chapter 5 looks at the underlying hardware on which the Grid runs. With
the definition of the Grid being that it can run on nearly anything, from
PC to Supercomputer, the hardware is hard to define, but there are
emerging components being built today specifically with the goal of
enabling the new technology.

Chapter 6

Metadata is important in any large system, and the Grid is no exception.
Chapter 6 will look at the role that metadata
plays and will need to play in the Grid as it continues to evolve.

Chapter 7

What are the business and technology drivers that are pushing the Grid
today and will continue to push it into the future? Chapter 7 looks at not
only the technological reasons for implementing a Grid environment (and
let us face it, the best reason for many technologists is simply because it
is really cool), but also the business drivers that will help to allow the
new technology to make its inroads into the organization.

Section 3

Section 3 delves into the details of databases in a Grid environment.
Databases have evolved on their own over the last several decades, and
continue to redefine themselves depending on the organization in which
they find themselves. The Grid will add environmental impact to the
evolution and will help to steer the direction that that evolution will take.

Chapter 8

Chapter 8 will provide us with an introduction to databases, particularly
relational databases, which are where some of the greatest gains can be
made in the Grid environment. We will look at the terminology, the
mathematical background, and some of the differences among relational
models.

Chapter 9

Chapter 9 will look at parallelism in database design and how parallelized
databases can be applied in the Grid environment.

Chapter 10

Chapter 10 will take parallelism a step further and look at distributed
databases and the ramifications of distributing in a highly distributed Grid
environment.

Chapter 11

Finally, Chapter 11 will look at the interaction with the database from the
applications and end users. We will look at design issues and issues with
interacting with the different ideas of database design in the environment.

Chapter 12

Chapter 12 provides a summary of the previous chapters.

We are standing on the edge of a new era. Let the adventure begin.


Acknowledgments


My heartiest thanks go to everyone who contributed to my ability to bring
this book to completion. Thanks especially to John Wyzalek from Auerbach
Publications for his support and faith that I could do it. His support has
been invaluable.
As always, my deepest gratitude goes to Larry, Adam, and Amandya
for being there for me, standing beside me, and putting up with the long
hours shut away and the weekends that we did not get to do a lot of
fun things because I was writing. Thank you for being there, for
understanding, and for rescuing me when I needed rescuing.


Contents

SECTION I: IN THE BEGINNING

1 History
Computing
Early Mechanical Devices
Computing Machines
The 1960s
The 1970s
The 1980s
The 1990s
The 21st Century

2 Definition and Components
P2P
Napster
Gnutella
Types
Computational Grid
Distributed Servers and Computation Sites
Remote Instrumentation
Data Archives
Networks
Portal (User Interface)
Security
Broker
User Profile
Searching for Resources
Batch Job Submittal
Credential Repository
Scheduler
Data Management
Data Grid
Storage Mechanism Neutrality
Policy Neutrality
Compatibility with Other Grid Infrastructure
Storage Systems
Access or Collaboration Grid
Large-Format Displays
Presentation Environments
Interfaces to Grid Middleware
Others
Scavenging Grid
Grid Scope
Project Grid, Departmental Grid, or Cluster Grid
Enterprise Grid or Campus Grid
Global Grid

3 Early Adopters
Computational and Experimental Scientists
Bioinformatics
Corporations
Academia
University of Houston
University of Ulm Germany
The White Rose University Consortium
Science
Particle Physics
Industries
Gaming
Financial
Wachovia
RBC Insurance
Charles Schwab
Life Science
The American Diabetes Association
North Carolina Genomics and Bioinformatics Consortium
Spain’s Institute of Cancer Research
Petroleum
Royal Dutch Shell
Utilities
Kansai Electric Power Co., Inc.
Manufacturing
Ford Motor Company
Saab Automobile
Motorola
Government
NASA
U.S. Department of Defense
European Union
Flemish Government
Benefits
Virtualization


SECTION II: THE PARTS AND PIECES

4 Security
Security
Authentication
Reciprocity of Identification
Computational Efficiency
Communication Efficiency
Third-Party Real-Time Involvement
Nature of Security
Secret Storage
Passwords
Private Key
Block Ciphers
Stream Ciphers
Public Key
Digital Signature
Authorization
Delegation of Identity
Delegation of Authority
Accounting
Audit
Access Control
DAC
MAC
Allow and Deny
Satisfy
Role-Based Access
Usage Control
Cryptography
Block Cipher
Stream Ciphers
Linear Feedback Shift Register
One-Time Pad
Shift Register Cascades
Shrinking Generators
Accountability
Data Integrity
Attenuation
Impulse Noise
Cross Talk
Jitter
Delay Distortion
Capability Resource Management
Database Security
Inference
Server Security
Database Connections
Table Access Control
Restricting Database Access
DBMS Specific


5 The Hardware
Computers
Blade Servers
Storage
I/O Subsystems
Underlying Network
Operating Systems
Visualization Environments
People

6 Metadata
Grid Metadata
Data Metadata
Physical Metadata
Domain-Independent Metadata
Content-Dependent Metadata
Content-Independent Metadata
Domain-Specific Metadata
Ontology
User Metadata
Application Metadata
External Metadata
Logical Metadata
User
Data
Resources
Metadata Services
Context
Structure
Define the Data Granularity
Database
Access
Metadata Formatting
XML
What Is XML?
Application
MCAT
Conclusion

7 Drivers
Business
Accelerated Time to Results
Operational Flexibility
Leverage Existing Capital Investments
Better Resource Utilization
Enhanced Productivity
Better Collaboration
Scalability
ROI
Reallocation of Resources
TCO
Technology
Infrastructure Optimization
Increase Access to Data and Collaboration
Resilient, Highly Available Infrastructure
Make Most Efficient Use of Resources
Services Oriented
Batch Oriented
Object Oriented
Supply and Demand
Open Standards
Corporate IT Spending Budgets
Cost, Complexity, and Opportunity
Better, Stronger, Faster
Efficiency Initiatives

SECTION III: DATABASES IN THE GRID

8 Introducing Databases
Databases
Relational Database
Tuples
Attributes
Entities
Relationship
Relational Algebra
Union
Intersection
Difference
Cartesian Product
Select
Project
Join
Relational Calculus
Object Database
Architecture Differences between Relational and Object Databases
Object Relational Database
SQL
Select
Where
And/Or
In
Between
Like
Insert
Update
Delete
Database
Data Model
Schema
Relational Model
Anomalies
Insert Anomaly
Deletion Anomaly
Update Anomaly

9 Parallel Database
Data Independence
Parallel Databases
Start-Up
Interference
Skew
Attribute Data Skew
Tuple Placement Skew
Selectivity Skew
Redistribution Skew
Join Product Skew
Multiprocessor Architecture Alternatives
Shared Everything
Shared Disk
Shared Nothing (Message Passing)
Hybrid Architecture
Hierarchical Cluster
NUMA
Disadvantages of Parallelism
Database Parallelization Techniques
Data Placement
Parallel Data Processing
Parallel Query Optimization
Transaction Management
Parallelism Versus Fragmentation
Round-Robin
Hash Partitioning
Range Partitioning
Horizontal Data Partitioning
Replicated Data Partitioning
Chained Partitioning
Placement Directory
Index Partitioning
Partitioning Data
Data-Based Parallelism
Interoperation
Intraoperation
Pipeline Parallelism
Partitioned Parallelism
Parallel Data Flow Approach
Retrieval
Point Query
Range Query
Inverse Range Query
Parallelizing Relational Operators
Operator Replication
Merge Operators
Parallel Sorting
Parallel Aggregation
Parallel Joins
Data Skew
Load Balancing Algorithm
Dynamic Load Balancing

10 Distributing Databases
Advantages
Disadvantages
Rules for Distributed Databases
Fragmentation
Completeness
Reconstruction
Disjointedness
Transparency
Distribution Transparency
Fragmentation Transparency
Location Transparency
Replication Transparency
Local Mapping Transparency
Naming Transparency
Transaction Transparency
Performance Transparency
Vertical Fragmentation
Horizontal Fragmentation
Hybrid
Replication
Metadata
Distributed Database Failures
Failure of a Site
Loss of Messages
Failure of a Communication Link
Network Partition
Data Access

11 Data Synchronization
Concurrency Control
Distributed Deadlock
Database Deadlocks
Multiple-Copy Consistency
Pessimistic Concurrency Control
Two-Phase Commit Protocol
Time Stamp Ordering
Optimistic Concurrency Control
Heterogeneous Concurrency Control
Distributed Serializability
Query Processing
Query Transformations
Transaction Processing
Heterogeneity

12 Conclusion
Index

SECTION I: IN THE BEGINNING

The adventure begins. We will start our adventure with the history of
computing (not just computers). Computing in one fashion or another has
been around as long as man. This section looks at those beginnings and
takes a trip through time to the present. It follows computing as its servers
and processors grew bigger and bigger, through the introduction of the
Internet, and through the rise of the supercomputer.

We will then take those advances and look at the beginnings of
distributed computing, first looking at peer-to-peer processing, then at the
beginnings of the Grid as it is becoming defined. We look at the different
kinds of Grids and how the different definitions can be combined to play
together. Regardless of what you want to accomplish, there is a Grid that
is likely to fill the need. There are even Grids that include the most
overlooked resource that a company has, its intellectual capital.
Finally, we will look at others who have stood where many stand
today, on the edge of deciding if they really want to make the step out
of the known and into the future with the implementation of the Grid
and its new concepts in computing.
This background section will bring you up to speed to where we find
ourselves today. Many will skip or skim the material, others will enjoy
the walk down memory lane, and others will find it very educational
walking through these pages of the first section.
Enjoy your adventure.


Chapter 1

History

In pioneer days they used oxen for heavy pulling, and when
one ox couldn’t budge a log, they didn’t try to grow a larger
ox. We shouldn’t be trying for bigger computers, but for more
systems of computers.

—Rear Admiral Grace Murray Hopper


Computing

Computing has become synonymous with mechanical computing and with the
PC, mainframe, midrange system, supercomputer, server, and other modern
views of what computing is, but computers and computing have a rich
history.

Early Mechanical Devices

The very first counting device was (and still is) the very first one we use
when starting to deal with the concept of numbers and calculations, the
human hand with its remarkable fingers (and occasionally, for those bigger
numbers, the human foot and its toes). Even before the formal concept
of numbers was conceived, there was the need to determine amounts
and to keep track of time. Keeping track of numbers, before numbers
were numbers, was something that people wanted to do. When the volume
of things to be counted grew too large to be handled with one's own
fingers and toes (or with the additional fingers and toes of people
close by), whatever was readily at hand was used. Pebbles,
sticks, and other natural objects were among the first things to extend the
countability and calculability of things. The same idea can be observed
today in young children counting beads, beans, and cereal.

People existing in early civilizations needed ways not only to count
things, but also to allow merchants to calculate the amounts to be charged
for goods that were traded and sold. This was still before the formal
concept of numbers was a defined thing. Counting devices were used
then to determine these everyday calculations.
One of the very first mechanical computational aids that man used in
history was the counting board, or the early abacus. The abacus (Figure
1.1), a simple counting aid, was probably invented sometime in the fourth
century B.C. The counting board, the precursor to what we think of today
as the abacus, was simply a piece of wood or a simple piece of stone
with carved, etched, or painted lines on the surface between which beads
or pebbles would have been moved. The abacus was originally made of
wood with a frame that held rods with freely sliding beads mounted on
the rods. These would have simply been mechanical aids to counting,
not counting devices themselves, and the person operating these aids still
had to perform the calculations in his or her head. The device was simply
a tool to assist in keeping track of where in the process of calculation
the person was, by visually tracking carries and sums.
Arabic numerals (for example, the numbers we recognize today as 1,
2, 3, 4, 5 …) were first introduced to Europe around the eighth century
A.D., although Roman numerals (I, II, III, IV, V …) remained in heavy use
in some parts of Europe until as late as the late 17th century A.D. and are
often still used today in certain areas. Although math classes taught Roman
Figure 1.1 The abacus. (From
abacus.jpg.)
