The Data Warehouse eBusiness DBA
Handbook
Donald K. Burleson
Joseph Hudicka
William H. Inmon
Craig Mullins
Fabian Pascal
The Data Warehouse eBusiness DBA
Handbook
By Donald K. Burleson, Joseph Hudicka, William H. Inmon,
Craig Mullins, Fabian Pascal
Copyright © 2003 by BMC Software and DBAzine. Used with permission.
Printed in the United States of America.
Series Editor:
Donald K. Burleson
Production Manager: John Lavender
Production Editor: Teri Wade
Cover Design:
Bryan Hoff
Printing History:
August, 2003 for First Edition
Oracle, Oracle7, Oracle8, Oracle8i and Oracle9i are trademarks of Oracle Corporation.
Many of the designations used by computer vendors to distinguish their products are
claimed as Trademarks. All names known to Rampant TechPress to be trademark names
appear in this text as initial caps.
The information provided by the authors of this work is believed to be accurate and
reliable, but because of the possibility of human error by our authors and staff, BMC
Software, DBAZine and Rampant TechPress cannot guarantee the accuracy or
completeness of any information included in this work and is not responsible for any
errors, omissions or inaccurate results obtained from the use of information or scripts in
this work.
Links to external sites are subject to change; DBAZine.com, BMC Software and
Rampant TechPress do not control or endorse the content of these external web sites,
and are not responsible for their content.
ISBN 0-9740716-2-5
The Data Warehousing eBusiness DBA Handbook
Table of Contents
Conventions Used in this Book ix
About the Authors xi
Foreword xiii
Chapter 1 - Data Warehousing and eBusiness 1
Making the Most of E-business by W. H. Inmon 1
Chapter 2 - The Benefits of Data Warehousing 9
The Data Warehouse Foundation by W. H. Inmon 9
References 18
Chapter 3 - The Value of the Data Warehouse 19
The Foundations of E-Business by W. H. Inmon 19
Why the Internet? 19
Intelligent Messages 20
Integration, History and Versatility 21
The Value of Historical Data 22
Integrated Data 23
Looking Smarter 26
Chapter 4 - The Role of the eDBA 28
Logic, e-Business, and the Procedural eDBA by Craig S. Mullins 28
The Classic Role of the DBA 28
The Trend of Storing Process With Data 30
Database Code Objects and e-Business 32
Database Code Object Programming Languages 34
The Duality of the DBA 35
The Role of the Procedural DBA 37
Synopsis 38
Chapter 5 - Building a Solid Information Architecture 39
How to Select the Optimal Information Exchange Architecture by Joseph Hudicka 39
Introduction 39
The Main Variables to Ponder 40
Data Volume 40
Available System Resources 41
Transformation Requirements 41
Frequency 41
Optimal Architecture Components 42
Conclusion 42
Chapter 6 - Data 101 43
Getting Down to Data Basics by Craig S. Mullins 43
Data Modeling and Database Design 43
Physical Database Design 45
The DBA Management Discipline 46
The 17 Skills Required of a DBA 47
Meeting the Demand 51
Chapter 7 - Designing Efficient Databases 52
Design and the eDBA by Craig S. Mullins 52
Living at Web Speed 52
Database Design Steps 54
Database Design Traps 57
Taming the Hostile Database 59
Chapter 8 - The eBusiness Infrastructure 61
E-Business and Infrastructure by W. H. Inmon 61
Chapter 9 - Conforming to Your Corporate Structure 68
Integrating Data in the Web-Based E-Business Environment by W. H. Inmon 68
Chapter 10 - Building Your Data Warehouse 77
The Issues of the E-Business Infrastructure by W. H. Inmon 77
Large Volumes of Data 79
Performance 83
Integration 85
Addressing the Issues 87
Chapter 11 - The Importance of Data Quality Strategy 88
Develop a Data Quality Strategy Before Implementing a Data Warehouse by Joseph Hudicka 88
Data Quality Problems in the Real World 88
Why Data Quality Problems Go Unresolved 89
Fraudulent Data Quality Problems 90
The Seriousness of Data Quality Problems 91
Data Collection 92
Solutions for Data Quality Issues 92
Option 1: Integrated Data Warehouse 92
Option 2: Value Rules 94
Option 3: Deferred Validation 94
Periodic sampling averts future disasters 94
Conclusion 96
Chapter 12 - Data Modeling and eBusiness 97
Data Modeling for the Data Warehouse by W. H. Inmon 97
"Just the Facts, Ma'am" 97
Modeling Atomic Data 98
Through Data Attributes, Many Classes of Subject Areas Are Accumulated 100
Other Possibilities - Generic Data Models 103
Design Continuity from One Iteration of Development to the Next 104
Chapter 13 - Don't Forget the Customer 105
Interacting with the Internet Viewer by W. H. Inmon 105
IN SUMMARY 113
Chapter 14 - Getting Smart 114
Elasticity and Pricing: Getting Smart by W. H. Inmon 114
Historically Speaking 114
At the Price Breaking Point 116
How Good Are the Numbers 117
How Elastic Is the Price 118
Conclusion 120
Chapter 15 - Tools of the Trade: Java 121
The eDBA and Java by Craig S. Mullins 121
What is Java? 121
Why is Java Important to an eDBA? 122
How can Java improve availability? 123
How Will Java Impact the Job of the eDBA? 124
Resistance is Futile 127
Conclusion 128
Chapter 16 - Tools of the Trade: XML 129
New Technologies of the eDBA: XML by Craig S. Mullins 129
What is XML? 129
Some Skepticism 132
Integrating XML 133
Defining the Future Web 134
Chapter 17 - Multivalue Database Technology Pros and Cons 136
MultiValue Lacks Value by Fabian Pascal 136
References 144
Chapter 18 - Securing your Data 146
Data Security Internals by Don Burleson 146
Traditional Oracle Security 147
Concerns About Role-based Security 150
Closing the Back Doors 151
Oracle Virtual Private Databases 152
Procedure Execution Security 158
Conclusion 160
Chapter 19 - Maintaining Efficiency 162
eDBA: Online Database Reorganization by Craig S. Mullins 162
Reorganizing Tablespaces 166
Online Reorganization 167
Synopsis 168
Chapter 20 - The Highly Available Database 170
The eDBA and Data Availability by Craig S. Mullins 170
The First Important Issue is Availability 171
What is Implied by e-vailability? 171
The Impact of Downtime on an e-business 175
Conclusion 176
Chapter 21 - eDatabase Recovery Strategy 177
The eDBA and Recovery by Craig S. Mullins 177
eDatabase Recovery Strategies 179
Recovery-To-Current 181
Point-in-Time Recovery 183
Transaction Recovery 184
Choosing the Optimum Recovery Strategy 188
Database Design 189
Reducing the Risk 189
Chapter 22 - Automating eDBA Tasks 191
Intelligent Automation of DBA Tasks by Craig S. Mullins 191
Duties of the DBA 192
A Lot of Effort 194
Intelligent Automation 195
Synopsis 196
Chapter 23 - Where to Turn for Help 197
Online Resources of the eDBA by Craig S. Mullins 197
Usenet Newsgroups 197
Mailing Lists 200
Websites and Portals 201
No eDBA Is an Island 203
Conventions Used in this Book
It is critical for any technical publication to follow rigorous
standards and employ consistent punctuation conventions to
make the text easy to read.
However, this is not an easy task. Within Oracle there are
many types of notation that can confuse a reader. Some Oracle
utilities such as STATSPACK and TKPROF are always spelled
in CAPITAL letters, while Oracle parameters and procedures
have varying naming conventions in the Oracle documentation.
It is also important to remember that many Oracle commands
are case sensitive, and are always left in their original executable
form, and never altered with italics or capitalization.
Hence, all Rampant TechPress books follow these conventions:
Parameters - All Oracle parameters will be lowercase italics.
Exceptions to this rule are parameter arguments that are
commonly capitalized (KEEP pool, TKPROF); these will be
left in ALL CAPS.
Variables - All PL/SQL program variables and arguments will
also remain in lowercase italics (dbms_job, dbms_utility).
Tables & dictionary objects - All data dictionary objects are
referenced in lowercase italics (dba_indexes, v$sql). This
includes all v$ and x$ views (x$kcbcbh, v$parameter) and
dictionary views (dba_tables, user_indexes).
SQL - All SQL is formatted for easy use in the code depot,
and all SQL is displayed in lowercase. The main SQL terms
(select, from, where, group by, order by, having) will always
appear on a separate line.
Programs & Products - All products and programs that are
known to the author are capitalized according to the vendor
specifications (IBM, DBXray, etc). All names known by
Rampant TechPress to be trademark names appear in this
text as initial caps. References to UNIX are always made in
uppercase.
About the Authors
Bill Inmon is universally recognized as the "father of the data
warehouse." He has more than 26 years of database
technology management experience and data warehouse
design expertise, and has published 36 books and more than
350 articles in major computer journals. He is known
globally for his seminars on developing data warehouses and
has been a keynote speaker for many major computing
associations. Inmon has consulted with a large number of
Fortune 1000 clients, offering data warehouse design and
database management services. For more information, visit
www.BillInmon.com or call (303) 221-4000.
Joseph Hudicka is the founder of the Information Architecture
Team, an organization that specializes in data quality, data
migration, and ETL. Winner of the ODTUG Best Speaker
award for the Spring 1999 conference, Joseph is an
internationally recognized speaker at ODTUG, OOW,
IOUG-A, TDWI and many local user groups. Joseph
coauthored Oracle8 Design Using UML Object Modeling
for Osborne/McGraw-Hill & Oracle Press, and has also
written or contributed to several articles for publication in
DMReview, Intelligent Enterprise and The Data
Warehousing Institute (TDWI).
Craig S. Mullins is a director of technology planning for BMC
Software. He has over 15 years of experience dealing with
data and database technologies. He is the author of the book
DB2 Developer's Guide (now available in a fourth edition that
covers up to and includes the latest release of DB2, Version
6) and is working on a book about database administration
practices (to be published this year by Addison Wesley).
Craig can be reached via his Website at
www.craigsmullins.com or at
Fabian Pascal has a national and international reputation as an
independent technology analyst, consultant, author and
lecturer specializing in data management. He was affiliated
with Codd & Date and for 20 years held various analytical
and management positions in the private and public sectors,
has taught and lectured at the business and academic levels,
and advised vendor and user organizations on data
management technology, strategy and implementation.
Clients include IBM, Census Bureau, CIA, Apple, Borland,
Cognos, UCSF, and IRS. He is founder, editor and publisher
of DATABASE DEBUNKINGS, a Web site dedicated to
dispelling persistent fallacies, flaws, myths and
misconceptions prevalent in the IT industry (Chris Date is a
senior contributor). Author of three books, he has published
extensively in most trade publications, including DM Review,
Database Programming and Design, DBMS, Byte, Infoworld and
Computerworld. He is author of the contrarian columns Against
the Grain, Setting Matters Straight, and for The Journal of
Conceptual Modeling. His third book, Practical Issues in Database
Management, serves as text for his seminars.
Foreword
With the advent of cheap disk I/O subsystems, it is finally
possible for database professionals to have databases store
multiple billions and even multiple trillions of bytes of
information. As the size of these databases increases to
behemoth proportions, it is the challenge of the database
professionals to understand the correct techniques for loading,
maintaining, and extracting information from very large
database management systems. The advent of cheap disks has
also led to an explosion in business technology, where even the
most modest financial investment can bring forth an online
system with many billions of bytes. It is imperative that the
business manager understand how to manage and control large
volumes of information while at the same time provide the
consumer with high-volume throughput and sub-second
response time.
This book provides you with insight into how to build the
foundation of your eBusiness application. You’ll learn the
importance of the Data Warehouse in your daily operations.
You’ll gain lots of insight into how to properly design and build
your information architecture to handle the rapid growth that
eCommerce business sees today. Once your system is up and
running, it must be maintained. This text explains how to
maintain online data systems to
reduce downtime. Keeping your online data secure is another
big issue with online business. To wrap things up, you’ll get
links to some of the best online resources on Data
Warehousing.
The purpose of this book is to give you significant insights into
how you can manage and control large volumes of data. As the
technology has expanded to support terabyte data capacity, the
challenge to the database professionals is to understand
effective techniques for the loading and maintaining of these
very large database systems. This book brings together some of
the world's foremost authors on data warehousing in order to
provide you with the insights that you need to be successful in
your data warehousing endeavors.
Chapter 1 - Data Warehousing and eBusiness
Making the Most of E-business
Everywhere you look today, you see e-business. In the trade
journals. On TV. In the Wall Street Journal. Everywhere. And
the message is that if your business is not e-business enabled,
you will be behind the curve.
So what is all the fuss about? Behind the corporate push to get
into e-business is a Web site. Or multiple Web sites. The Web
site allows your corporation to have a reach into the
marketplace that is direct and far reaching. Businesses that
would never have entertained entry to foreign marketplaces and
other marketplaces that are hard to access suddenly have easy
and cheap presence. In a word, e-business opens up
possibilities that previously were impractical or even
impossible.
So the secret to e-business is a Web site. Right? Well almost.
Indeed, a Web site is a wonderful delivery mechanism. The
Web site allows you to go where you might not have ever been
able to go before. But after all is said and done, a Web site is
merely a delivery mechanism. To be effective, the delivery
mechanism must be allied with the application of strong business
propositions. One way of expressing this is: opportunity =
delivery mechanism + business proposition.
Figure 1: The web site is at the heart of e-Business
To illustrate the limitations of a Web site, consider the personal
Web sites that many people have created. If there were any
inherent business advantage to having a Web site, then these
personal sites would be achieving business results for their
owners. But no one thinks that just putting up a Web site
produces results. It is what you do with the Web site that
counts.
To exploit the delivery mechanism that is the Web
environment, applications are necessary. There are many kinds
of applications that can be adapted to the Web environment.
But the most potent, most promising applications belong to a class
called Customer Relationship Management (CRM)
applications. CRM applications have the capability of
producing very important business results. Executed properly,
CRM applications:
- protect market share
- gain new market share
- increase revenues
- increase profits
And there's not a business around that doesn't want to do these
things.
So what kind of applications are we talking about here? There
are many different flavors. Typical CRM applications include:
- yield management
- customer retention
- customer segmentation
- cross selling
- up selling
- household selling
- affinity analysis
- market basket analysis
- fraud detection
- credit scoring, and so forth
In short, there are many different ways that applications can be
created to absolutely maximize the effectiveness of the Web.
Stated differently, without these applications, the Web
environment is just another Web site.
And there are other related non-CRM applications that can
improve the bottom line of business as well. These applications
include:
- quality control
- profitability analysis
- destination analysis (for airlines)
- purchasing consolidation, and the like
In short, once the Web is enabled by supporting applications,
then very real business advantage occurs.
But applications do not just happen by themselves.
Applications such as CRM and others are built on a foundation
of data called a data warehouse. The data warehouse is at the
center of an infrastructure called the "corporate information
factory." Figure 2 shows the corporate information factory and
the Web environment.
Figure 2: Sitting behind the web site is the infrastructure called the
"corporate information factory"
Figure 2 shows that the Web environment serves as a conduit
into the corporate information factory. The corporate
information factory provides a variety of important functions
for the Web environment:
- the corporate information factory enables the Web
  environment to gather and manage an unlimited amount of
  data
- the corporate information factory creates an environment
  where sweeping business patterns can be detected and
  analyzed
- the corporate information factory provides a place where
  Web-based data can be integrated with other corporate data
- the corporate information factory makes edited and
  integrated data quickly available to the Web environment,
  and so forth
In a word, the corporate information factory provides the
background infrastructure that turns the Web from a delivery
mechanism into a truly powerful tool. The different
components of the corporate information factory are:
- the data warehouse
- the corporate ODS
- data marts
- the exploration warehouse
- alternative/near-line storage
The heart of the corporate information factory is the data
warehouse. The data warehouse is a structure that contains:
- detailed, granular data
- integrated data
- historical data
- corporate data
A convenient way to think of the data warehouse is as a
structure that contains very fine grains of sand. Different
applications take those grains of sand and reshape them into
the form and structure that is most familiar to the organization.
One of the issues that frequently arises with applications for
the Web is whether it is necessary to have a data warehouse in
support of the applications. Strictly speaking, it is not necessary
to have a data warehouse in support of the applications that
run on the Web. Figure 3 shows that different applications
have been built from the legacy foundation.
Figure 3: Building applications without a data warehouse
In Figure 3, multiple applications have been built from the
same supporting applications. Looking at Figure 3, it becomes
clear that the same processing (accessing data, gathering data,
editing data, cleansing data, merging data and integrating data)
is done for every application. Almost all of the processing
shown is redundant. There is no need for every application to
repeat what every other application has done. Figure 4 shows
that by building a data warehouse, the repetitive activities are
done just once.
Figure 4: Building a data warehouse for the different applications
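The contrast between repeating the infrastructure work for every application and doing it once in a data warehouse can be sketched as follows. This is a simplified illustration, not the author's implementation: the legacy extracts, the field names, and the functions build_integrated_data, revenue_total and customer_count are all hypothetical, and the read counter exists only to make the redundancy visible.

```python
# Hypothetical legacy extracts with inconsistent formatting (assumed data).
LEGACY_SYSTEMS = {
    "orders": [{"cust": " c1 ", "amount": "20"}, {"cust": "C2", "amount": "15"}],
    "billing": [{"cust": "c1", "amount": "5"}],
}


def build_integrated_data(systems, stats):
    """Access, gather, edit, cleanse, merge and integrate legacy data."""
    integrated = []
    for source, rows in systems.items():
        stats["legacy_reads"] += 1  # accessing and gathering one legacy system
        for row in rows:
            integrated.append({
                "customer": row["cust"].strip().upper(),  # editing and cleansing
                "amount": float(row["amount"]),
                "source": source,  # merged into one integrated structure
            })
    return integrated


def revenue_total(data):
    """First example application: total revenue."""
    return sum(row["amount"] for row in data)


def customer_count(data):
    """Second example application: number of distinct customers."""
    return len({row["customer"] for row in data})


# Without a warehouse: every application repeats the infrastructure work.
stats_without = {"legacy_reads": 0}
results_without = (
    revenue_total(build_integrated_data(LEGACY_SYSTEMS, stats_without)),
    customer_count(build_integrated_data(LEGACY_SYSTEMS, stats_without)),
)

# With a warehouse: the infrastructure work is done once and shared.
stats_with = {"legacy_reads": 0}
warehouse = build_integrated_data(LEGACY_SYSTEMS, stats_with)
results_with = (revenue_total(warehouse), customer_count(warehouse))
```

Both approaches produce the same answers, but in this sketch the legacy systems are read once per application without the warehouse and only once in total with it, so the gap widens with every application added.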
In Figure 4, the infrastructure activities of accessing data,
gathering data, editing data, cleansing data, merging data and
integrating data are done once. The savings are obvious. But
there are some other powerful reasons why building a data
warehouse makes sense:
- when it comes time to build a new application, with a data
  warehouse in place the application can be constructed
  quickly; with no data warehouse in place, the infrastructure
  has to be built again
- if there is a discrepancy in values, with a data warehouse
  those values can be resolved easily and quickly
- the resources required for access of legacy data are minimal
  when there is a data warehouse; when there is no data
  warehouse, the resources required for the access of legacy
  data grow with each new application, and so forth
In short, when an organization takes a long-term perspective,
the data warehouse at the center of the corporate information
factory is the only way to fly.
It is intuitively obvious that a foundation of integrated
historical granular data is useful for competitive advantage. But
one step beyond intuition, the question must be asked: exactly
how can integrated historical data be turned into competitive
advantage? It is the purpose of the articles to follow to explain
how integrated historical data can be turned into competitive
advantage and how that competitive advantage can be delivered
through the Web.