DATA WAREHOUSING
FUNDAMENTALS
Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals. Paulraj Ponniah
Copyright © 2001 John Wiley & Sons, Inc.
ISBNs: 0-471-41254-6 (Hardback); 0-471-22162-7 (Electronic)
DATA WAREHOUSING
FUNDAMENTALS
A Comprehensive Guide for
IT Professionals
PAULRAJ PONNIAH
A Wiley-Interscience Publication
JOHN WILEY & SONS, INC.
New York / Chichester / Weinheim / Brisbane / Singapore / Toronto
Designations used by companies to distinguish their products are often claimed as trademarks. In all instances
where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or ALL CAPITAL
LETTERS. Readers, however, should contact the appropriate companies for more complete information regarding
trademarks and registration.
Copyright © 2001 by John Wiley & Sons, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic
or mechanical, including uploading, downloading, printing, decompiling, recording or otherwise, except as permitted under
Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to
the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue,
New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail: PERMREQ@WILEY.COM.
This publication is designed to provide accurate and authoritative information in regard to the
subject matter covered. It is sold with the understanding that the publisher is not engaged in
rendering professional services. If professional advice or other expert assistance is required, the
services of a competent professional person should be sought.
ISBN 0-471-22162-7
This title is also available in print as ISBN 0-471-41254-6.
For more information about Wiley products, visit our web site at www.Wiley.com.
To
Vimala, my loving wife
and to
Joseph, David, and Shobi,
my dear children
CONTENTS
Foreword
Preface
Part 1 OVERVIEW AND CONCEPTS
1 The Compelling Need for Data Warehousing
  Chapter Objectives
  Escalating Need for Strategic Information
    The Information Crisis
    Technology Trends
    Opportunities and Risks
  Failures of Past Decision-Support Systems
    History of Decision-Support Systems
    Inability to Provide Information
  Operational Versus Decision-Support Systems
    Making the Wheels of Business Turn
    Watching the Wheels of Business Turn
    Different Scope, Different Purposes
  Data Warehousing—The Only Viable Solution
    A New Type of System Environment
    Processing Requirements in the New Environment
    Business Intelligence at the Data Warehouse
  Data Warehouse Defined
    A Simple Concept for Information Delivery
    An Environment, Not a Product
    A Blend of Many Technologies
  Chapter Summary
  Review Questions
  Exercises
2 Data Warehouse: The Building Blocks
  Chapter Objectives
  Defining Features
    Subject-Oriented Data
    Integrated Data
    Time-Variant Data
    Nonvolatile Data
    Data Granularity
  Data Warehouses and Data Marts
    How are They Different?
    Top-Down Versus Bottom-Up Approach
    A Practical Approach
  Overview of the Components
    Source Data Component
    Data Staging Component
    Data Storage Component
    Information Delivery Component
    Metadata Component
    Management and Control Component
  Metadata in the Data Warehouse
    Types of Metadata
    Special Significance
  Chapter Summary
  Review Questions
  Exercises
3 Trends in Data Warehousing
  Chapter Objectives
  Continued Growth in Data Warehousing
    Data Warehousing is Becoming Mainstream
    Data Warehouse Expansion
    Vendor Solutions and Products
  Significant Trends
    Multiple Data Types
    Data Visualization
    Parallel Processing
    Query Tools
    Browser Tools
    Data Fusion
    Multidimensional Analysis
    Agent Technology
    Syndicated Data
    Data Warehousing and ERP
    Data Warehousing and KM
    Data Warehousing and CRM
    Active Data Warehousing
  Emergence of Standards
    Metadata
    OLAP
  Web-Enabled Data Warehouse
    The Warehouse to the Web
    The Web to the Warehouse
    The Web-Enabled Configuration
  Chapter Summary
  Review Questions
  Exercises
Part 2 PLANNING AND REQUIREMENTS
4 Planning and Project Management
  Chapter Objectives
  Planning Your Data Warehouse
    Key Issues
    Business Requirements, Not Technology
    Top Management Support
    Justifying Your Data Warehouse
    The Overall Plan
  The Data Warehouse Project
    How is it Different?
    Assessment of Readiness
    The Life-Cycle Approach
    The Development Phases
  The Project Team
    Organizing the Project Team
    Roles and Responsibilities
    Skills and Experience Levels
    User Participation
  Project Management Considerations
    Guiding Principles
    Warning Signs
    Success Factors
    Anatomy of a Successful Project
    Adopt a Practical Approach
  Chapter Summary
  Review Questions
  Exercises
5 Defining the Business Requirements
  Chapter Objectives
  Dimensional Analysis
    Usage of Information Unpredictable
    Dimensional Nature of Business Data
    Examples of Business Dimensions
  Information Packages—A New Concept
    Requirements Not Fully Determinate
    Business Dimensions
    Dimension Hierarchies/Categories
    Key Business Metrics or Facts
  Requirements Gathering Methods
    Interview Techniques
    Adapting the JAD Methodology
    Review of Existing Documentation
  Requirements Definition: Scope and Content
    Data Sources
    Data Transformation
    Data Storage
    Information Delivery
    Information Package Diagrams
    Requirements Definition Document Outline
  Chapter Summary
  Review Questions
  Exercises
6 Requirements as the Driving Force for Data Warehousing
  Chapter Objectives
  Data Design
    Structure for Business Dimensions
    Structure for Key Measurements
    Levels of Detail
  The Architectural Plan
    Composition of the Components
    Special Considerations
    Tools and Products
  Data Storage Specifications
    DBMS Selection
    Storage Sizing
  Information Delivery Strategy
    Queries and Reports
    Types of Analysis
    Information Distribution
    Decision Support Applications
    Growth and Expansion
  Chapter Summary
  Review Questions
  Exercises
Part 3 ARCHITECTURE AND INFRASTRUCTURE
7 The Architectural Components
  Chapter Objectives
  Understanding Data Warehouse Architecture
    Architecture: Definitions
    Architecture in Three Major Areas
  Distinguishing Characteristics
    Different Objectives and Scope
    Data Content
    Complex Analysis and Quick Response
    Flexible and Dynamic
    Metadata-driven
  Architectural Framework
    Architecture Supporting Flow of Data
    The Management and Control Module
  Technical Architecture
    Data Acquisition
    Data Storage
    Information Delivery
  Chapter Summary
  Review Questions
  Exercises
8 Infrastructure as the Foundation for Data Warehousing
  Chapter Objectives
  Infrastructure Supporting Architecture
    Operational Infrastructure
    Physical Infrastructure
  Hardware and Operating Systems
    Platform Options
    Server Hardware
  Database Software
    Parallel Processing Options
    Selection of the DBMS
  Collection of Tools
    Architecture First, Then Tools
    Data Modeling
    Data Extraction
    Data Transformation
    Data Loading
    Data Quality
    Queries and Reports
    Online Analytical Processing (OLAP)
    Alert Systems
    Middleware and Connectivity
    Data Warehouse Management
  Chapter Summary
  Review Questions
  Exercises
9 The Significant Role of Metadata
  Chapter Objectives
  Why Metadata is Important
    A Critical Need in the Data Warehouse
    Why Metadata is Vital for End-Users
    Why Metadata is Essential for IT
    Automation of Warehousing Tasks
    Establishing the Context of Information
  Metadata Types by Functional Areas
    Data Acquisition
    Data Storage
    Information Delivery
  Business Metadata
    Content Overview
    Examples of Business Metadata
    Content Highlights
    Who Benefits?
  Technical Metadata
    Content Overview
    Examples of Technical Metadata
    Content Highlights
    Who Benefits?
  How to Provide Metadata
    Metadata Requirements
    Sources of Metadata
    Challenges for Metadata Management
    Metadata Repository
    Metadata Integration and Standards
    Implementation Options
  Chapter Summary
  Review Questions
  Exercises
Part 4 DATA DESIGN AND DATA PREPARATION
10 Principles of Dimensional Modeling
  Chapter Objectives
  From Requirements to Data Design
    Design Decisions
    Dimensional Modeling Basics
    E-R Modeling Versus Dimensional Modeling
    Use of CASE Tools
  The STAR Schema
    Review of a Simple STAR Schema
    Inside a Dimension Table
    Inside the Fact Table
    The Factless Fact Table
    Data Granularity
  STAR Schema Keys
    Primary Keys
    Surrogate Keys
    Foreign Keys
  Advantages of the STAR Schema
    Easy for Users to Understand
    Optimizes Navigation
    Most Suitable for Query Processing
    STARjoin and STARindex
  Chapter Summary
  Review Questions
  Exercises
11 Dimensional Modeling: Advanced Topics
  Chapter Objectives
  Updates to the Dimension Tables
    Slowly Changing Dimensions
    Type 1 Changes: Correction of Errors
    Type 2 Changes: Preservation of History
    Type 3 Changes: Tentative Soft Revisions
  Miscellaneous Dimensions
    Large Dimensions
    Rapidly Changing Dimensions
    Junk Dimensions
  The Snowflake Schema
    Options to Normalize
    Advantages and Disadvantages
    When to Snowflake
  Aggregate Fact Tables
    Fact Table Sizes
    Need for Aggregates
    Aggregating Fact Tables
    Aggregation Options
  Families of STARS
    Snapshot and Transaction Tables
    Core and Custom Tables
    Supporting Enterprise Value Chain or Value Circle
    Conforming Dimensions
    Standardizing Facts
    Summary of Family of STARS
  Chapter Summary
  Review Questions
  Exercises
12 Data Extraction, Transformation, and Loading
  Chapter Objectives
  ETL Overview
    Most Important and Most Challenging
    Time-consuming and Arduous
    ETL Requirements and Steps
    Key Factors
  Data Extraction
    Source Identification
    Data Extraction Techniques
    Evaluation of the Techniques
  Data Transformation
    Data Transformation: Basic Tasks
    Major Transformation Types
    Data Integration and Consolidation
    Transformation for Dimension Attributes
    How to Implement Transformation
  Data Loading
    Applying Data: Techniques and Processes
    Data Refresh Versus Update
    Procedure for Dimension Tables
    Fact Tables: History and Incremental Loads
  ETL Summary
    ETL Tool Options
    Reemphasizing ETL Metadata
    ETL Summary and Approach
  Chapter Summary
  Review Questions
  Exercises
13 Data Quality: A Key to Success
  Chapter Objectives
  Why is Data Quality Critical?
    What is Data Quality?
    Benefits of Improved Data Quality
    Types of Data Quality Problems
  Data Quality Challenges
    Sources of Data Pollution
    Validation of Names and Addresses
    Costs of Poor Data Quality
  Data Quality Tools
    Categories of Data Cleansing Tools
    Error Discovery Features
    Data Correction Features
    The DBMS for Quality Control
  Data Quality Initiative
    Data Cleansing Decisions
    Who Should be Responsible?
    The Purification Process
    Practical Tips on Data Quality
  Chapter Summary
  Review Questions
  Exercises
Part 5 INFORMATION ACCESS AND DELIVERY
14 Matching Information to the Classes of Users
  Chapter Objectives
  Information from the Data Warehouse
    Data Warehouse Versus Operational Systems
    Information Potential
    User-Information Interface
    Industry Applications
  Who Will Use the Information?
    Classes of Users
    What They Need
    How to Provide Information
  Information Delivery
    Queries
    Reports
    Analysis
    Applications
  Information Delivery Tools
    The Desktop Environment
    Methodology for Tool Selection
    Tool Selection Criteria
    Information Delivery Framework
  Chapter Summary
  Review Questions
  Exercises
15 OLAP in the Data Warehouse
  Chapter Objectives
  Demand for Online Analytical Processing
    Need for Multidimensional Analysis
    Fast Access and Powerful Calculations
    Limitations of Other Analysis Methods
    OLAP is the Answer
    OLAP Definitions and Rules
    OLAP Characteristics
  Major Features and Functions
    General Features
    Dimensional Analysis
    What are Hypercubes?
    Drill-Down and Roll-Up
    Slice-and-Dice or Rotation
    Uses and Benefits
  OLAP Models
    Overview of Variations
    The MOLAP Model
    The ROLAP Model
    ROLAP Versus MOLAP
  OLAP Implementation Considerations
    Data Design and Preparation
    Administration and Performance
    OLAP Platforms
    OLAP Tools and Products
    Implementation Steps
  Chapter Summary
  Review Questions
  Exercises
16 Data Warehousing and the Web
  Chapter Objectives
  Web-Enabled Data Warehouse
    Why the Web?
    Convergence of Technologies
    Adapting the Data Warehouse for the Web
    The Web as a Data Source
  Web-Based Information Delivery
    Expanded Usage
    New Information Strategies
    Browser Technology for the Data Warehouse
    Security Issues
  OLAP and the Web
    Enterprise OLAP
    Web-OLAP Approaches
    OLAP Engine Design
  Building a Web-Enabled Data Warehouse
    Nature of the Data Webhouse
    Implementation Considerations
    Putting the Pieces Together
    Web Processing Model
  Chapter Summary
  Review Questions
  Exercises
17 Data Mining Basics
  Chapter Objectives
  What is Data Mining?
    Data Mining Defined
    The Knowledge Discovery Process
    OLAP Versus Data Mining
    Data Mining and the Data Warehouse
  Major Data Mining Techniques
    Cluster Detection
    Decision Trees
    Memory-Based Reasoning
    Link Analysis
    Neural Networks
    Genetic Algorithms
    Moving into Data Mining
  Data Mining Applications
    Benefits of Data Mining
    Applications in Retail Industry
    Applications in Telecommunications Industry
    Applications in Banking and Finance
  Chapter Summary
  Review Questions
  Exercises
Part 6 IMPLEMENTATION AND MAINTENANCE
18 The Physical Design Process
  Chapter Objectives
  Physical Design Steps
    Develop Standards
    Create Aggregates Plan
    Determine the Data Partitioning Scheme
    Establish Clustering Options
    Prepare an Indexing Strategy
    Assign Storage Structures
    Complete Physical Model
  Physical Design Considerations
    Physical Design Objectives
    From Logical Model to Physical Model
    Physical Model Components
    Significance of Standards
  Physical Storage
    Storage Area Data Structures
    Optimizing Storage
    Using RAID Technology
    Estimating Storage Sizes
  Indexing the Data Warehouse
    Indexing Overview
    B-Tree Index
    Bitmapped Index
    Clustered Indexes
    Indexing the Fact Table
    Indexing the Dimension Tables
  Performance Enhancement Techniques
    Data Partitioning
    Data Clustering
    Parallel Processing
    Summary Levels
    Referential Integrity Checks
    Initialization Parameters
    Data Arrays
  Chapter Summary
  Review Questions
  Exercises
19 Data Warehouse Deployment
  Chapter Objectives
  Major Deployment Activities
    Complete User Acceptance
    Perform Initial Loads
    Get User Desktops Ready
    Complete Initial User Training
    Institute Initial User Support
    Deploy in Stages
  Considerations for a Pilot
    When Is a Pilot Data Mart Useful?
    Types of Pilot Projects
    Choosing the Pilot
    Expanding and Integrating the Pilot
  Security
    Security Policy
    Managing User Privileges
    Password Considerations
    Security Tools
  Backup and Recovery
    Why Back Up the Data Warehouse?
    Backup Strategy
    Setting Up a Practical Schedule
    Recovery
  Chapter Summary
  Review Questions
  Exercises
20 Growth and Maintenance
  Chapter Objectives
  Monitoring the Data Warehouse
    Collection of Statistics
    Using Statistics for Growth Planning
    Using Statistics for Fine-Tuning
    Publishing Trends for Users
  User Training and Support
    User Training Content
    Preparing the Training Program
    Delivering the Training Program
    User Support
  Managing the Data Warehouse
    Platform Upgrades
    Managing Data Growth
    Storage Management
    ETL Management
    Data Model Revisions
    Information Delivery Enhancements
    Ongoing Fine-Tuning
  Chapter Summary
  Review Questions
  Exercises
Appendix A. Project Life Cycle Steps and Checklists
Appendix B. Critical Factors for Success
Appendix C. Guidelines for Evaluating Vendor Solutions
References
Glossary
Index
FOREWORD
I am delighted to share my thoughts with information technology professionals about my
faculty colleague Paulraj Ponniah’s textbook Data Warehousing Fundamentals. In the
spring of 1998, Raritan Valley Community College decided to offer a course on data
warehousing. This was mainly through the initiative of Dr. Ponniah, who had been teaching
our database design and development course for several years. It was very difficult to
find a good textbook for a college course on data warehousing. We had to settle for a book
that was not quite suitable. In order to make the course effective, Paul had to supplement
the book with his own data warehousing seminar materials. Our students, primarily IT
professionals from local industries, received the course very well. Now this magnificent
textbook on data warehousing comes to you through the foresight and diligent work of Dr.
Ponniah, along with the insightful support of the publishers, John Wiley and Sons.
This book has numerous features that make it a winner:
• The order of topics is very logical.
• The choice of topics is quite appropriate for a comprehensive introductory book.
The coverage of topics is also very well balanced.
• The subject matter is logically structured, with chapters covering essential components
of the data warehousing field. The sequence of topics is well planned to provide
a seamless transition from design to implementation.
• Within each chapter, the continuity of topics is excellent.
• None of the topics included in the textbook is superfluous to the basic objectives.
• The material included is technically correct and up-to-date. The figures appropriately
enhance and amplify the topics.
• Ample review questions and exercises can be found at the end of each chapter. This
is something lacking in most books on data warehousing. These review questions
and exercises are pedagogically sound. They are designed to test the knowledge, not
the ignorance, of the reader.
Dr. Ponniah’s writing style is clear and concise. Because of the simplicity and completeness
of this book, I believe it will find a definite market niche, particularly among
college students, not-so-technically savvy IT people, and data warehousing mavens.
In spite of a plethora of books on data warehousing by luminaries such as Kimball, Inmon,
Barquin, and Singh, this book fulfills a special purpose, and information technology
professionals will definitely benefit from reading it. In addition, the book should be well
received by college professors for use by students in their data warehousing courses. To
put it succinctly, this book fills a void in the midst of plenty.
In summary, Dr. Ponniah has produced a winner for both students and experienced IT
professionals. As someone who has been in IT education for many years, I certainly recommend
this book to college professors and seminar leaders for their data warehousing
courses.
PRATAP P. REDDY, Ph.D.
Professor and Chair of CIS Department
Raritan Valley Community College
North Branch, New Jersey
PREFACE
THIS BOOK IS FOR YOU
Are you an information technology professional watching, with great interest, the massive
unfolding of the data warehouse movement? Are you contemplating a move into this new
area of opportunity? Are you a systems analyst, programmer, data analyst, database administrator,
project leader, or software engineer eager to grasp the fundamentals of data
warehousing? Do you wonder how many different books you may have to read to learn the
basics? Are you lost in the maze of the literature and products on the subject? Do you
wish for a single publication on data warehousing, clearly and specifically designed for IT
professionals? Do you need a textbook that helps you learn the fundamentals in sufficient
depth—not more, not less? If you answered “yes” to any of the above, this book is written
specially for you.
This is the one definitive book on data warehousing clearly intended for IT professionals.
The organization and presentation of the book are specially tuned for IT professionals.
This book does not presume to target anyone and everyone remotely interested in the subject
for some reason or another, but is written to address the specific needs of IT professionals
like you. It does not tend to emphasize certain aspects and neglect other critical
ones. The book takes you over the entire landscape of data warehousing.
How can this book be exactly suitable for IT professionals? As a veteran IT professional
with wide and intensive industry experience, as a successful database and data warehousing
consultant for many years, and as one who teaches data warehousing fundamentals
in the college classroom and in public seminars, I have come to appreciate the precise
needs of IT professionals, and in every chapter I have incorporated these requirements of
the IT community.
THE SCENARIO
Why are companies rushing into data warehousing? Why is there a tremendous surge in
interest? Data warehousing is no longer a purely novel idea just for research and experimentation.
It has become a mainstream phenomenon. True, the data warehouse is not in
every doctor’s office yet, but neither is it confined to only high-end businesses. More than
half of all U.S. companies and a large percentage of worldwide businesses have made a
commitment to data warehousing.
In every industry across the board, from retail chain stores to financial institutions,
from manufacturing enterprises to government departments, and from airline companies
to utility businesses, data warehousing is revolutionizing the way people perform business
analysis and make strategic decisions. Every company that has a data warehouse is realizing
the enormous benefits translated into positive results at the bottom line. These companies,
now incorporating Web-based technologies, are enhancing the potential for greater
and easier delivery of vital information.
Over the past five years, hundreds of vendors have flooded the market with numerous
data warehousing products. Vendor solutions and products run the gamut of data warehousing—data
modeling, data acquisition, data quality, data analysis, metadata, and so
on. The market is already large and continues to grow.
CHANGED ROLE OF IT
In this scenario, information technology departments of all progressive companies perceive
a radical change in their roles. IT is no longer required to create every report and
present every screen for providing information to the end-users. IT is now charged with
the building of information delivery systems and letting the end-users themselves retrieve
information in innovative ways for analysis and decision making. Data warehousing is
proving to be just that type of successful information delivery system.
IT professionals responsible for building data warehouses need to revise their mindsets
about building applications. They have to understand that a data warehouse is not a
one-size-fits-all proposition; they must get a clear understanding of the extraction of data from
source systems, data transformations, data staging, data warehouse architecture, infrastructure,
and the various methods of information delivery.
In short, IT professionals, like you, must get a strong grip on the fundamentals of data
warehousing.
WHAT THIS BOOK CAN DO FOR YOU
The book is comprehensive and detailed. You will be able to study every significant topic
in planning, requirements, architecture, infrastructure, design, data preparation, information
delivery, deployment, and maintenance. It is specially designed for IT professionals;
you will be able to follow the presentation easily because it is built upon the foundation of
your background as an IT professional, your knowledge, and the technical terminology familiar
to you. It is organized logically, beginning with an overview of concepts, moving
on to planning and requirements, then to architecture and infrastructure, on to data design,
then to information delivery, and concluding with deployment and maintenance. This progression
is typical of what you are most familiar with in your experience and day-to-day
work.
The book provides an interactive learning experience. It is not a one-way lecture. You
participate through the review questions and exercises at the end of each chapter. For each
chapter, the objectives set the theme and the summary provides a list of the topics covered.
You can relate each concept and technique to the data warehousing industry and
marketplace. You will notice a substantial number of industry examples. Although intended
as a first course on fundamentals, this book provides sufficient coverage of each topic
so that you can comfortably proceed to the next step of specialization for specific roles in
a data warehouse project.
Featuring all the significant topics in appropriate measure, this book is eminently suitable
as a textbook for serious self-study, a college course, or a seminar on the essentials. It
provides an opportunity for you to become a data warehouse expert.
I acknowledge my indebtedness to the authors listed in the reference section at the end
of the book. Their insights and observations have helped me cover the topics adequately. I
must also express my appreciation to my students and professional colleagues. Our interactions
have enabled me to shape this textbook according to the needs of IT professionals.
PAULRAJ PONNIAH, Ph.D.
Edison, New Jersey
June 2001
CHAPTER 1
THE COMPELLING NEED
FOR DATA WAREHOUSING
CHAPTER OBJECTIVES
• Understand the desperate need for strategic information
• Recognize the information crisis at every enterprise
• Distinguish between operational and informational systems
• Learn why all past attempts to provide strategic information failed
• Clearly see why data warehousing is the viable solution
As an information technology professional, you have worked on computer applications
as an analyst, programmer, designer, developer, database administrator, or project manager.
You have been involved in the design, implementation, and maintenance of systems
that support day-to-day business operations. Depending on the industries you have
worked in, you must have been involved in applications such as order processing, general
ledger, inventory, in-patient billing, checking accounts, insurance claims, and so on.
These applications are important systems that run businesses. They process orders,
maintain inventory, keep the accounting books, service the clients, receive payments, and
process claims. Without these computer systems, no modern business can survive. Companies
started building and using these systems in the 1960s and have become completely
dependent on them. As an enterprise grows larger, hundreds of computer applications are
needed to support the various business processes. These applications are effective in what
they are designed to do. They gather, store, and process all the data needed to successfully
perform the daily operations. They provide online information and produce a variety of
reports to monitor and run the business.
In the 1990s, as businesses grew more complex, corporations spread globally, and
competition became fiercer, business executives became desperate for information to stay
competitive and improve the bottom line. The operational computer systems did provide
information to run the day-to-day operations, but what the executives needed were different
kinds of information that could be readily used to make strategic decisions. They