

The Data Warehouse eBusiness DBA
Handbook














Donald K. Burleson

Joseph Hudicka
William H. Inmon

Craig Mullins

Fabian Pascal








The Data Warehouse eBusiness DBA
Handbook

By Donald K. Burleson, Joseph Hudicka, William H. Inmon,
Craig Mullins, Fabian Pascal

Copyright © 2003 by BMC Software and DBAzine. Used with permission.

Printed in the United States of America.


Series Editor: Donald K. Burleson


Production Manager: John Lavender

Production Editor: Teri Wade

Cover Design: Bryan Hoff

Printing History:

August, 2003 for First Edition


Oracle, Oracle7, Oracle8, Oracle8i and Oracle9i are trademarks of Oracle Corporation.


Many of the designations used by computer vendors to distinguish their products are
claimed as Trademarks. All names known to Rampant TechPress to be trademark names
appear in this text as initial caps.

The information provided by the authors of this work is believed to be accurate and
reliable, but because of the possibility of human error by our authors and staff, BMC
Software, DBAZine and Rampant TechPress cannot guarantee the accuracy or
completeness of any information included in this work and are not responsible for any
errors, omissions or inaccurate results obtained from the use of information or scripts in
this work.

Links to external sites are subject to change; DBAZine.com, BMC Software and
Rampant TechPress do not control or endorse the content of these external web sites,
and are not responsible for their content.

ISBN 0-9740716-2-5

The Data Warehousing eBusiness DBA Handbook

Table of Contents

Conventions Used in this Book
About the Authors
Foreword

Chapter 1 - Data Warehousing and eBusiness
Making the Most of E-business by W. H. Inmon

Chapter 2 - The Benefits of Data Warehousing
The Data Warehouse Foundation by W. H. Inmon
References

Chapter 3 - The Value of the Data Warehouse
The Foundations of E-Business by W. H. Inmon
Why the Internet?
Intelligent Messages
Integration, History and Versatility
The Value of Historical Data
Integrated Data
Looking Smarter

Chapter 4 - The Role of the eDBA
Logic, e-Business, and the Procedural eDBA by Craig S. Mullins
The Classic Role of the DBA
The Trend of Storing Process With Data
Database Code Objects and e-Business
Database Code Object Programming Languages
The Duality of the DBA
The Role of the Procedural DBA
Synopsis

Chapter 5 - Building a Solid Information Architecture
How to Select the Optimal Information Exchange Architecture by Joseph Hudicka
Introduction
The Main Variables to Ponder
Data Volume
Available System Resources
Transformation Requirements
Frequency
Optimal Architecture Components
Conclusion

Chapter 6 - Data 101
Getting Down to Data Basics by Craig S. Mullins
Data Modeling and Database Design
Physical Database Design
The DBA Management Discipline
The 17 Skills Required of a DBA
Meeting the Demand

Chapter 7 - Designing Efficient Databases
Design and the eDBA by Craig S. Mullins
Living at Web Speed
Database Design Steps
Database Design Traps
Taming the Hostile Database

Chapter 8 - The eBusiness Infrastructure
E-Business and Infrastructure by W. H. Inmon

Chapter 9 - Conforming to Your Corporate Structure
Integrating Data in the Web-Based E-Business Environment by W. H. Inmon

Chapter 10 - Building Your Data Warehouse
The Issues of the E-Business Infrastructure by W. H. Inmon
Large Volumes of Data
Performance
Integration
Addressing the Issues

Chapter 11 - The Importance of Data Quality Strategy
Develop a Data Quality Strategy Before Implementing a Data Warehouse by Joseph Hudicka
Data Quality Problems in the Real World
Why Data Quality Problems Go Unresolved
Fraudulent Data Quality Problems
The Seriousness of Data Quality Problems
Data Collection
Solutions for Data Quality Issues
Option 1: Integrated Data Warehouse
Option 2: Value Rules
Option 3: Deferred Validation
Periodic sampling averts future disasters
Conclusion

Chapter 12 - Data Modeling and eBusiness
Data Modeling for the Data Warehouse by W. H. Inmon
"Just the Facts, Ma'am"
Modeling Atomic Data
Through Data Attributes, Many Classes of Subject Areas Are Accumulated
Other Possibilities - Generic Data Models
Design Continuity from One Iteration of Development to the Next

Chapter 13 - Don't Forget the Customer
Interacting with the Internet Viewer by W. H. Inmon
IN SUMMARY

Chapter 14 - Getting Smart
Elasticity and Pricing: Getting Smart by W. H. Inmon
Historically Speaking
At the Price Breaking Point
How Good Are the Numbers
How Elastic Is the Price
Conclusion

Chapter 15 - Tools of the Trade: Java
The eDBA and Java by Craig S. Mullins
What is Java?
Why is Java Important to an eDBA?
How can Java improve availability?
How Will Java Impact the Job of the eDBA?
Resistance is Futile
Conclusion

Chapter 16 - Tools of the Trade: XML
New Technologies of the eDBA: XML by Craig S. Mullins
What is XML?
Some Skepticism
Integrating XML
Defining the Future Web

Chapter 17 - Multivalue Database Technology Pros and Cons
MultiValue Lacks Value by Fabian Pascal
References

Chapter 18 - Securing your Data
Data Security Internals by Don Burleson
Traditional Oracle Security
Concerns About Role-based Security
Closing the Back Doors
Oracle Virtual Private Databases
Procedure Execution Security
Conclusion

Chapter 19 - Maintaining Efficiency
eDBA: Online Database Reorganization by Craig S. Mullins
Reorganizing Tablespaces
Online Reorganization
Synopsis

Chapter 20 - The Highly Available Database
The eDBA and Data Availability by Craig S. Mullins
The First Important Issue is Availability
What is Implied by e-vailability?
The Impact of Downtime on an e-business
Conclusion

Chapter 21 - eDatabase Recovery Strategy
The eDBA and Recovery by Craig S. Mullins
eDatabase Recovery Strategies
Recovery-To-Current
Point-in-Time Recovery
Transaction Recovery
Choosing the Optimum Recovery Strategy
Database Design
Reducing the Risk

Chapter 22 - Automating eDBA Tasks
Intelligent Automation of DBA Tasks by Craig S. Mullins
Duties of the DBA
A Lot of Effort
Intelligent Automation
Synopsis

Chapter 23 - Where to Turn for Help
Online Resources of the eDBA by Craig S. Mullins
Usenet Newsgroups
Mailing Lists
Websites and Portals
No eDBA Is an Island


Conventions Used in this Book
It is critical for any technical publication to follow rigorous
standards and employ consistent punctuation conventions to
make the text easy to read.

However, this is not an easy task. Within Oracle there are
many types of notation that can confuse a reader. Some Oracle
utilities such as STATSPACK and TKPROF are always spelled
in CAPITAL letters, while Oracle parameters and procedures
have varying naming conventions in the Oracle documentation.
It is also important to remember that many Oracle commands
are case sensitive, and are always left in their original executable
form, and never altered with italics or capitalization.

Hence, all Rampant TechPress books follow these conventions:

Parameters - All Oracle parameters will be lowercase italics.
Exceptions to this rule are parameter arguments that are
commonly capitalized (KEEP pool, TKPROF); these will be
left in ALL CAPS.

Variables - All PL/SQL program variables and arguments will
also remain in lowercase italics (dbms_job, dbms_utility).

Tables & dictionary objects - All data dictionary objects are
referenced in lowercase italics (dba_indexes, v$sql). This
includes all v$ and x$ views (x$kcbcbh, v$parameter) and
dictionary views (dba_tables, user_indexes).

SQL - All SQL is formatted for easy use in the code depot,
and all SQL is displayed in lowercase. The main SQL terms
(select, from, where, group by, order by, having) will always
appear on a separate line.

Programs & Products - All products and programs that are
known to the author are capitalized according to the vendor
specifications (IBM, DBXray, etc). All names known by
Rampant TechPress to be trademark names appear in this
text as initial caps. References to UNIX are always made in
uppercase.

About the Authors
Bill Inmon is universally recognized as the "father of the data
warehouse." He has more than 26 years of database
technology management experience and data warehouse
design expertise, and has published 36 books and more than
350 articles in major computer journals. He is known
globally for his seminars on developing data warehouses and
has been a keynote speaker for many major computing
associations. Inmon has consulted with a large number of
Fortune 1000 clients, offering data warehouse design and
database management services. For more information, visit
www.BillInmon.com or call (303) 221-4000.

Joseph Hudicka is the founder of the Information Architecture
Team, an organization that specializes in data quality, data
migration, and ETL. Winner of the ODTUG Best Speaker
award for the Spring 1999 conference, Joseph is an
internationally recognized speaker at ODTUG, OOW,
IOUG-A, TDWI and many local user groups. Joseph
coauthored Oracle8 Design Using UML Object Modeling
for Osborne/McGraw-Hill & Oracle Press, and has also
written or contributed to several articles for publication in
DMReview, Intelligent Enterprise and The Data
Warehousing Institute (TDWI).

Craig S. Mullins is a director of technology planning for BMC
Software. He has over 15 years of experience dealing with
data and database technologies. He is the author of the book
DB2 Developer's Guide (now available in a fourth edition that
covers up to and includes the latest release of DB2, Version
6) and is working on a book about database administration
practices (to be published this year by Addison Wesley).
Craig can be reached via his Website at
www.craigsmullins.com or at

Fabian Pascal has a national and international reputation as an
independent technology analyst, consultant, author and
lecturer specializing in data management. He was affiliated
with Codd & Date and for 20 years held various analytical
and management positions in the private and public sectors,
has taught and lectured at the business and academic levels,
and advised vendor and user organizations on data
management technology, strategy and implementation.
Clients include IBM, Census Bureau, CIA, Apple, Borland,
Cognos, UCSF, and IRS. He is founder, editor and publisher
of DATABASE DEBUNKINGS, a Web site dedicated to
dispelling persistent fallacies, flaws, myths and
misconceptions prevalent in the IT industry (Chris Date is a
senior contributor). Author of three books, he has published
extensively in most trade publications, including DM Review,
Database Programming and Design, DBMS, Byte, Infoworld and
Computerworld. He is author of the contrarian columns Against
the Grain and Setting Matters Straight, and writes for The
Journal of Conceptual Modeling. His third book, Practical
Issues in Database Management, serves as text for his seminars.


Foreword
With the advent of cheap disk I/O subsystems, it is finally
possible for database professionals to have databases store
multiple billions and even multiple trillions of bytes of
information. As the size of these databases increases to
behemoth proportions, it is the challenge of the database
professionals to understand the correct techniques for loading,
maintaining, and extracting information from very large
database management systems. The advent of cheap disks has
also led to an explosion in business technology, where even the
most modest financial investment can bring forth an online
system with many billions of bytes. It is imperative that the
business manager understand how to manage and control large
volumes of information while at the same time providing the
consumer with high-volume throughput and sub-second
response time.

This book provides you with insight into how to build the
foundation of your eBusiness application. You’ll learn the
importance of the Data Warehouse in your daily operations.
You’ll gain lots of insight into how to properly design and build
your information architecture to handle the rapid growth that
eCommerce business sees today. Once your system is up and
running, it must be maintained. There is information in this
text that goes through how to maintain online data systems to
reduce downtime. Keeping your online data secure is another
big issue with online business. To wrap things up, you’ll get
links to some of the best online resources on Data
Warehousing.

The purpose of this book is to give you significant insights into
how you can manage and control large volumes of data. As the
technology has expanded to support terabyte data capacity, the
challenge to the database professionals is to understand
effective techniques for the loading and maintaining of these
very large database systems. This book brings together some of
the world's foremost authors on data warehousing in order to
provide you with the insights that you need to be successful in
your data warehousing endeavors.

Chapter 1 - Data Warehousing and eBusiness

Making the Most of E-business
Everywhere you look today, you see e-business. In the trade
journals. On TV. In the Wall Street Journal. Everywhere. And
the message is that if your business is not e-business enabled,
you will be behind the curve.

So what is all the fuss about? Behind the corporate push to get
into e-business is a Web site. Or multiple Web sites. The Web
site gives your corporation a reach into the marketplace that is
direct and far reaching. Businesses that would never have
entertained entry to foreign marketplaces and other
marketplaces that are hard to access suddenly have an easy
and cheap presence. In a word, e-business opens up
possibilities that previously were impractical or even
impossible.

So the secret to e-business is a Web site. Right? Well, almost.
Indeed, a Web site is a wonderful delivery mechanism. The
Web site allows you to go where you might not have ever been
able to go before. But after all is said and done, a Web site is
merely a delivery mechanism. To be effective, the delivery
mechanism must be allied with the application of strong
business propositions. There is a way of expressing this:
opportunity = delivery mechanism + business proposition.


Figure 1: The web site is at the heart of e-Business


To illustrate the limitations of a Web site, consider the personal
Web sites that many people have created. If there were any
inherent business advantage to having a Web site, then these
personal sites would be achieving business results for their
owners. But no one thinks that just putting up a Web site
produces results. It is what you do with the Web site that
counts.

To exploit the delivery mechanism that is the Web
environment, applications are necessary. There are many kinds
of applications that can be adapted to the Web environment.
But the most potent, most promising applications are a class
called Customer Relationship Management (CRM)
applications. CRM applications have the capability of
producing very important business results. Executed properly,
CRM applications:
• protect market share
• gain new market share
• increase revenues
• increase profits

And there's not a business around that doesn't want to do these
things.

So what kind of applications are we talking about here? There
are many different flavors. Typical CRM applications include:
• yield management
• customer retention
• customer segmentation
• cross selling
• up selling
• household selling
• affinity analysis
• market basket analysis
• fraud detection
• credit scoring, and so forth
In short, there are many different ways that applications can be
created to absolutely maximize the effectiveness of the Web.
Stated differently, without these applications, the Web
environment is just another Web site.


And there are other related non-CRM applications that can
improve the bottom line of business as well. These applications
include:
• quality control
• profitability analysis
• destination analysis (for airlines)
• purchasing consolidation, and the like

In short, once the Web is enabled by supporting applications,
then very real business advantage occurs.

But applications do not just happen by themselves.
Applications such as CRM and others are built on a foundation
of data called a data warehouse. The data warehouse is at the
center of an infrastructure called the "corporate information
factory." Figure 2 shows the corporate information factory and
the Web environment.


Figure 2: Sitting behind the web site is the infrastructure called the
"corporate information factory"


Figure 2 shows that the Web environment serves as a conduit
into the corporate information factory. The corporate
information factory provides a variety of important functions
for the Web environment:
• the corporate information factory enables the Web
environment to gather and manage an unlimited amount of
data
• the corporate information factory creates an environment
where sweeping business patterns can be detected and
analyzed
• the corporate information factory provides a place where
Web-based data can be integrated with other corporate data
• the corporate information factory makes edited and
integrated data quickly available to the Web environment,
and so forth
In a word, the corporate information factory provides the
background infrastructure that turns the Web from a delivery
mechanism into a truly powerful tool. The different
components of the corporate information factory are:
• the data warehouse
• the corporate ODS
• data marts
• the exploration warehouse
• alternative/near-line storage

The heart of the corporate information factory is the data
warehouse. The data warehouse is a structure that contains:
• detailed, granular data
• integrated data
• historical data
• corporate data


A convenient way to think of the data warehouse is as a
structure that contains very fine grains of sand. Different
applications take those grains of sand and reshape them into
the form and structure that is most familiar to the organization.
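
The "grains of sand" idea can be illustrated with a minimal Python sketch. It is only an illustration, not something taken from the book: the record layout, the sample values and the helper name summarize are all hypothetical. It shows the same granular sales records being reshaped into two different summaries, one per application.

```python
from collections import defaultdict

# Hypothetical granular records: one row per individual sale
# (the "grains of sand"). Field names and values are invented.
SALES = [
    {"customer": "C1", "product": "P1", "month": "2003-01", "amount": 20.0},
    {"customer": "C1", "product": "P2", "month": "2003-02", "amount": 35.0},
    {"customer": "C2", "product": "P1", "month": "2003-01", "amount": 15.0},
    {"customer": "C2", "product": "P1", "month": "2003-02", "amount": 10.0},
]


def summarize(rows, key):
    """Reshape granular rows into totals grouped by the given attribute."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


# Two applications reshape the same grains in different ways.
revenue_by_customer = summarize(SALES, "customer")
revenue_by_month = summarize(SALES, "month")
```

Because the detail is kept at the finest grain, a new summary (by product, for example) needs only another call to the same helper and no new extract.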

One of the issues that frequently arises with applications for
the Web is whether it is necessary to have a data warehouse in
support of the applications. Strictly speaking, it is not necessary
to have a data warehouse in support of the applications that
run on the Web. Figure 3 shows that different applications
have been built from the legacy foundation.

Figure 3: Building applications without a data warehouse


In Figure 3, multiple applications have been built from the
same supporting applications. Looking at Figure 3, it becomes
clear that the same processing steps (accessing data, gathering
data, editing data, cleansing data, merging data and integrating
data) are done for every application. Almost all of the
processing shown is redundant. There is no need for every
application to repeat what every other application has done.
Figure 4 shows that by building a data warehouse, the
repetitive activities are done just once.



Figure 4: Building a data warehouse for the different applications

In Figure 4, the infrastructure activities of accessing data,
gathering data, editing data, cleansing data, merging data and
integrating data are done once. The savings are obvious. But
there are some other powerful reasons why building a data
warehouse makes sense:
• when it comes time to build a new application, with a data
warehouse in place the application can be constructed
quickly; with no data warehouse in place, the infrastructure
has to be built again
• if there is a discrepancy in values, with a data warehouse
those values can be resolved easily and quickly
• the resources required for access of legacy data are minimal
when there is a data warehouse; when there is no data
warehouse, the resources required for the access of legacy
data grow with each new application, and so forth
In short, when an organization takes a long-term perspective,
the data warehouse at the center of the corporate information
factory is the only way to fly.

It is intuitively obvious that a foundation of integrated
historical granular data is useful for competitive advantage. But
one step beyond intuition, the question must be asked: exactly
how can integrated historical data be turned into competitive
advantage? It is the purpose of the articles to follow to explain
how integrated historical data can be turned into competitive
advantage and how that competitive advantage can be delivered
through the Web.
