Microsoft® Data Mining
Related Titles From Digital Press
Rhonda Delmater and Monte Hancock, Data Mining Explained:
A Manager’s Guide to Customer-Centric Business Intelligence,
ISBN 1-55558-231-1, 352pp, 2001
Thomas C. Redman, Data Quality: The Field Guide,
ISBN 1-55558-251-6, 240pp, 2001
Jesus Mena, Data Mining Your Website,
ISBN 1-55558-222-2, 384pp, 1999
Lilian Hobbs and Susan Hillson, Oracle8i Data Warehousing,
ISBN 1-55558-205-2, 400pp, 1999
Lilian Hobbs, Oracle8 on Windows NT, ISBN 1-55558-190-0, 384pp, 1998
Tony Redmond, Microsoft® Exchange Server for Windows 2000: Planning,
Design, and Implementation, ISBN 1-55558-224-9, 1072pp, 2000
Jerry Cochran, Mission-Critical Microsoft® Exchange 2000:
Building Highly Available Messaging and Knowledge Management Systems,
ISBN 1-55558-233-8, 352pp, 2000
For more information or to order these and other Digital Press
titles please visit our website at www.bhusa.com/digitalpress!
At www.bhusa.com/digitalpress you can:
• Join the Digital Press Email Service and have news about
our books delivered right to your desktop
• Read the latest news on titles
• Sample chapters on featured titles for free
• Question our expert authors and editors
• Download free software to accompany select texts
Microsoft® Data Mining
Integrated Business Intelligence for e-Commerce
and Knowledge Management
Barry de Ville
Boston • Oxford • Auckland • Johannesburg • Melbourne • New Delhi
Copyright © 2001 Butterworth–Heinemann
A member of the Reed Elsevier group
All rights reserved.
Digital Press™ is an imprint of Butterworth–Heinemann.
All trademarks found herein are property of their respective owners.
No part of this publication may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or
otherwise, without the prior written permission of the publisher.
Recognizing the importance of preserving what has been written, Butterworth–Heinemann
prints its books on acid-free paper whenever possible.
Library of Congress Cataloging-in-Publication Data
de Ville, Barry.
Microsoft® data mining : integrated business intelligence for e-commerce and knowledge
management / by Barry de Ville.
p. cm.
Includes index.
ISBN 1-55558-242-7 (pbk. : alk. paper)
1. Data mining. 2. OLE (Computer file) 3. SQL server. I. Title.

QA76.9.D343 D43 2000
006.3 dc21
00-047514
British Library Cataloging-in-Publication Data
A catalogue record for this book is available from the British Library.
The publisher offers special discounts on bulk orders of this book.
For information, please contact:
Manager of Special Sales
Butterworth–Heinemann
225 Wildwood Avenue
Woburn, MA 01801-2041
Tel: 781-904-2500
Fax: 781-904-2620
For information on all Butterworth–Heinemann publications available, contact our
World Wide Web home page at: .
10 9 8 7 6 5 4 3 2 1
Printed in the United States of America
To Naomi and Gaetan
Contents
Foreword
Preface
Acknowledgments
1 Introduction to Data Mining
1.1 Something old, something new
1.2 Microsoft’s approach to developing the right set of tools
1.3 Benefits of data mining
1.4 Microsoft’s entry into data mining
1.5 Concept of operations
2 The Data Mining Process
2.1 Best practices in knowledge discovery in databases
2.2 The scientific method and the paradigms that come with it
2.3 How to develop your paradigm
2.4 The data mining process methodology
2.5 Business understanding
2.6 Data understanding
2.7 Data preparation
2.8 Modeling
2.9 Evaluation
2.10 Deployment
2.11 Performance measurement
2.12 Collaborative data mining: the confluence of data mining and knowledge management
3 Data Mining Tools and Techniques
3.1 Microsoft’s entry into data mining
3.2 The Microsoft data mining perspective
3.3 Data mining and exploration (DMX) projects
3.4 OLE DB for data mining architecture
3.5 The Microsoft data warehousing framework and alliance
3.6 Data mining tasks supported by SQL Server 2000 Analysis Services
3.7 Other elements of the Microsoft data mining strategy
4 Managing the Data Mining Project
4.1 The mining mart
4.2 Unit of analysis
4.3 Defining the level of aggregation
4.4 Defining metadata
4.5 Calculations
4.6 Standardized values
4.7 Transformations for discrete values
4.8 Aggregates
4.9 Enrichments
4.10 Example process (target marketing)
4.11 The data mart
5 Modeling Data
5.1 The database
5.2 Problem scenario
5.3 Setting up analysis services
5.4 Defining the OLAP cube
5.5 Adding to the dimensional representation
5.6 Building the analysis view for data mining
5.7 Setting up the data mining analysis
5.8 Predictive modeling (classification) tasks
5.9 Creating the mining model
5.10 The tree navigator
5.11 Clustering (creating segments) with cluster analysis
5.12 Confirming the model through validation
5.13 Summary
6 Deploying the Results
6.1 Deployments for predictive tasks (classification)
6.2 Lift charts
6.3 Backing up and restoring databases
7 The Discovery and Delivery of Knowledge for Effective Enterprise Outcomes: Knowledge Management
7.1 The role of implicit and explicit knowledge
7.2 A primer on knowledge management
7.3 The Microsoft technology-enabling framework
7.4 Summary
Appendix A: Glossary
Appendix B: References
Appendix C: Web Sites
Appendix D: Data Mining and Knowledge Discovery Data Sets in the Public Domain
Appendix E: Microsoft Solution Providers
Appendix F: Summary of Knowledge Management Case Studies and Web Locations
Index
Foreword
The year 1989 seems so long ago! Back in those heady days of the software
industry, a chap named Barry de Ville approached me with a view to having
my organization license a rudimentary software tool for data mining. At the
time, I worked in a large multinational software firm and was responsible
for the business side of several mission-critical R&D projects aimed at
changing the paradigm of software tools for knowledge workers. Each
project involved data modeling and data mining. In the end, after spending
many millions of dollars, these projects were either dropped or significantly
altered. However, one piece that survived this purge was the software
licensed from de Ville. In fact, it went on to become part of the product
that changed the company and established a market for desktop analytics.
The business decision to terminate or severely curtail what were once
corporate priorities had its roots in the realization that the marketplace, and
in particular the high-end business customer in large corporations, was not
yet ready for large-scale data mining. Two reasons for this dominated, and
both related to the past and not the present. First, there were no generally
accepted standards to link nascent mining tools to various data models, and
there certainly were no widely used data mining frameworks. Second, there
was a general lack of know-how and a poor understanding of analytics in
the target user community.
Today, the advent of de facto standards such as OLAP databases and
tools such as OLE DB for DM, along with the emergence of data mining
frameworks, has firmly established data mining as a viable and important
use of computing in business. For example, this capability has been honed
into powerful applications such as customer relationship management. This
application domain is becoming all the more important with the advent of
large-scale databases underpinning e-commerce and e-business.
The second reason for the earlier failure had much more to do with the
receptor capacity of the marketplace than with the vendor community’s
ability to deliver appropriate tools. With the vast majority of organizations
seeing the database only in terms of a relational model, the concept of
applying multidimensional analytics to corporate data was little more than a
dream. Consequently, the second key to opening the data mining market
has been the spread of know-how. In the workplace this know-how is pri-
marily supplied through widely available information in the trade press and
commercial computer-related publications.
The decision by Microsoft Corporation, as early as 1998, to become a
major player in the data mining arena set the stage for things to come.
Today’s coupling of the latest data mining capabilities with SQL Server
2000 has created a clear and present need to capture and consolidate in one
place the principles of data mining and multidimensional analytics with a
practical description of the Microsoft data mining architecture and tool set.
This book does just that.
Recognizing the receptor problem and the power and ease of use of the
new Microsoft data mining solution has afforded Barry de Ville the
opportunity to help redress receptor capacity by writing this practical guidebook,
which contains illustrative and illuminating examples from business,
science, and society. Moreover, he has taken an approach that compartmentalizes
concepts and relationships so that the reader can more readily assimilate
the content in terms of his or her own general knowledge and work
experience, rather than dig through the more classical formalisms of an academic
treatise.
Peter K. MacKinnon
Managing Director
Synergy Technology Management
e-mail:
telephone: (613) 241-1264
Preface
Data mining exploits the knowledge that is held in the enterprise data store
by examining the data to reveal patterns that suggest better ways to produce
profit, savings, higher-quality products, and greater customer satisfaction.
Just as the lines on our faces reveal a history of laughter and frowns, the pat-
terns embedded in data reveal a history of, for example, profits and losses.
The retrieval of these patterns from data and the implementation of the les-
sons learned from the patterns are what data mining and knowledge discov-
ery are all about.
This book will appeal to people who have come to depend upon
Microsoft to provide a high-performance and economical point of entry for
an ever-increasing range of computer applications and who sense the poten-
tial value of pursuing data mining approaches to support business intelli-
gence initiatives in their enterprises. Traditional producers and consumers
of business intelligence products and processes, especially OLAP (On-Line
Analytical Processing), will also be attracted by this information. Most business
intelligence vendors, especially Microsoft, recognize that business
intelligence and data mining are different facets of the same process of turning
data into knowledge. SQL Server 7, released late in 1998, introduced SQL
Server 7 OLAP services, thus providing a built-in OLAP reporting facility
for the database. In the same manner, SQL Server 2000 provides built-in
data mining services as a fundamental part of the database. Now, both these
important forms of business reporting will be available as core components
of the database functionality; further, by providing both sets of facilities in a
common interface and platform, Microsoft has taken the first step in pro-
viding a seamless integration of the various methods and metaphors of busi-
ness reporting so that one simple, unified interface to the knowledge
contained in data is provided. Whether that knowledge was delivered on
the basis of an OLAP technique or data mining technique is irrelevant to
most users, and now it will be irrelevant in a unified SQL 2000 framework.
This book will emphasize the data mining aspects of business intelli-
gence in order to explain and illustrate data mining techniques and best
practices, particularly with respect to the data mining functionality that is
available in the new generation of Microsoft business intelligence tools: the
new OLE DB for DM (data mining) and SQL Server functions. Both
OLAP and data mining are complex technologies. OLAP, however, is intu-
itively easier to grasp, since the reporting dimensions are almost always
business terms and concepts and are organized as such. Data mining is more
flexible than OLAP, however, and the patterns that are sought out in data
through data mining are often counter-intuitive from a business standpoint.
So, initially, it can be more difficult to conceptualize data mining. A core
goal of this book is to help all users to move through this conceptualiza-
tional task in order to reap the benefits of an integrated OLAP and data
mining framework.
Discovering successful patterns that are contained in data, but that are
normally hidden, can be a formidable challenge. For example, take gross
margins in a retail sales data store. Here we see that the margins fluctuate
over the course of a year. A plot of the values held in the gross margin field
in the data store might reveal a 10 percent increase in gross margin between
summer and fall. We might be tempted to conclude that sales margins
increase as we move from summer to fall. In this case we would say that the
increase in gross margin depends upon the season.
But there are many other potential dependencies, which could influence
gross margin, that are locked in the data store. Along with the season field
are other fields of data: for example, quantity sold, discount rate, commission
paid, customer location, other purchases made, length of time as a customer,
and so on. What if the discount rate is greater in the summer than in
the fall? Then, possibly, the increase in gross margin that we see in the fall is
simply a result of a lower discount rate. In this case gross margin does not
vary by season at all; it varies according to the discount rate! The apparent
relationship, or dependency, that we observed between season and gross
margin is a spurious one. If we adjust our view of gross margins to
remove the effect of discount rate, then maybe we would find that, actually,
gross margins would be higher in the summer. So, in order to do a thorough
job of data mining and knowledge discovery it is essential to look at all
potential explanatory factors and associated data elements to ensure that the
very best pattern is retrieved from the data and that no spurious, and potentially
misleading, effects are introduced into the patterns that we select.
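The season and discount-rate example can be made concrete with a small sketch. The records below are invented for illustration (they are not data from this book), and the helper name mean_margin_by is hypothetical. The sketch compares average gross margin by season alone with the average once records are also grouped by discount rate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records: (season, discount_rate, gross_margin_percent).
# Summer sales are mostly heavily discounted; fall sales mostly are not.
RECORDS = [
    ("summer", 0.20, 22.0), ("summer", 0.20, 23.0), ("summer", 0.20, 21.0),
    ("summer", 0.05, 36.0),
    ("fall", 0.05, 33.0), ("fall", 0.05, 34.0), ("fall", 0.05, 32.0),
    ("fall", 0.20, 19.0),
]


def mean_margin_by(records, key):
    """Average gross margin for each group produced by key(record)."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record[2])
    return {group: mean(values) for group, values in groups.items()}


# View 1: season only. Fall appears to have the higher margin.
by_season = mean_margin_by(RECORDS, key=lambda r: r[0])

# View 2: season within each discount rate. The ordering reverses.
by_season_and_discount = mean_margin_by(RECORDS, key=lambda r: (r[0], r[1]))

for group, margin in sorted(by_season.items()):
    print(group, margin)
for group, margin in sorted(by_season_and_discount.items()):
    print(group, margin)
```

In this invented data set, fall has the higher average margin overall, yet summer has the higher margin at each discount rate taken separately. Grouping by the second field is one simple way of removing the effect of discount rate; a real analysis would need many more records and more fields.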
What if the data store could be manipulated so that all of the dependen-
cies that affect the questions we are looking at (e.g., gross margin) could be
considered together? What if we could search through all the combinations
of dependencies and find a unique combination, or pattern, that isolates a
particular combination of events that maximizes the gross margin? Then,
instead of simply showing the effect of one condition, say season, on gross
margin, we could show the combined effect of a pattern, say a particular
time, location, and discount rate, that produces the maximum gross margin.
Once we have isolated this optimal pattern, we have a particular gem of
wisdom, since, if we can reproduce that pattern more often in the future, we
can establish a strategy that will systematically increase our gross margin
and associated profitability over time.
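As a rough sketch of what searching through the combinations might look like, the code below groups a handful of invented records by every observed combination of season, location, and discount rate, and reports the combination with the highest average gross margin. The data, the function name best_pattern, and the min_support threshold are all assumptions made for illustration; this is not the algorithm used by any Microsoft tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (season, location, discount_rate, gross_margin_percent).
SALES = [
    ("summer", "east", 0.05, 36.0),
    ("summer", "east", 0.05, 38.0),
    ("summer", "west", 0.20, 22.0),
    ("summer", "west", 0.20, 21.0),
    ("fall", "east", 0.05, 33.0),
    ("fall", "east", 0.05, 34.0),
    ("fall", "west", 0.20, 19.0),
    ("fall", "west", 0.05, 31.0),
]


def best_pattern(records, min_support=2):
    """Return the combination of conditions with the highest mean margin.

    Combinations seen in fewer than min_support records are ignored, so a
    single unusual sale cannot be reported as the best pattern.
    Raises ValueError if no combination has enough records.
    """
    groups = defaultdict(list)
    for *conditions, margin in records:
        groups[tuple(conditions)].append(margin)
    candidates = {
        pattern: mean(margins)
        for pattern, margins in groups.items()
        if len(margins) >= min_support
    }
    pattern = max(candidates, key=candidates.get)
    return pattern, candidates[pattern]


pattern, margin = best_pattern(SALES)
print(pattern, margin)
```

With these invented records the search selects summer sales in the east at the low discount rate. The min_support check reflects the earlier caution about spurious patterns: a combination backed by one record is weak evidence.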
There is no lack of data in the modern enterprise. So the raw material
for data mining and knowledge discovery is abundantly available. The data
store contains records that have the potential to reveal patterns of depend-
encies that can enrich a wide variety of enterprise goals, missions, and
objectives. Retail sales can benefit from the examination of sales records to
reveal highly profitable retail sales patterns. Financial analysts can examine
the records of financial transactions to reveal patterns of successful transac-
tions. An engineering enterprise can search through its records surrounding
the engineering process—manufacturing time, lot size, assembly parame-
ters, and operator number—to determine the combination of data condi-
tions that relate to the quality measure of the device coming off the
assembly line. Marketing analysts can look at the marketing data store to
detect patterns that are associated with market growth or customer respon-
siveness.
The data are freely available and the pay-offs are enormous: the ability to
decrease inventory, increase customer buying propensity, drive product
defects detection closer to the assembly line, and so on by as little as 1 per-
cent represents a truly staggering, Midas-like fortune in the billion-dollar-a-
day industries of finance, manufacturing, retail services, and high technol-
ogy. The key to reaping the rewards of data mining is to have a cost-effective
set of tools and body of knowledge to undertake the knowledge discovery.
Until recently the tools that were available to accomplish this task were
relatively rare and relatively expensive. Business intelligence OLAP facilities
have become much more commonplace but, as demonstrated above, business
intelligence OLAP tools may not find all the patterns and dependencies
that lie in data. For this, a data mining tool is required.
Microsoft recognized this requirement after the release of SQL Server 7
and began a development program to migrate data mining and knowledge
discovery capabilities into the SQL Server 2000 release. This release, and
the associated data mining and knowledge discovery tools, techniques, con-
cepts, and best practices, are reviewed here. The primary task will be to
explain data mining and the Microsoft data mining framework. The chap-
ters are as follows:
1. Introduction to Data Mining: its relevance and utility to 2000-era
enterprises and the role of Microsoft architecture and technolo-
gies. This chapter provides a big-picture view of data mining:
what it is, why it is useful, and how it works. What are the barri-
ers to the adoption of data mining and what is Microsoft doing
about these barriers? This covers the Microsoft Socrates project
and the directions that Microsoft will pursue in data mining in
the future.
2. The Data Mining Process: This chapter discusses the process of
using data to model and reflect real-world events and activity: the
interoperation of measurement, data and business models, and
conceptual paradigms to reflect real-world phenomena. Testing
and refining the models—patterns, structure, relationships, expla-
nation, and prediction—are also discussed. Best practices in exe-
cuting the data mining mission, such as business goal, ROI
outcome identification, the conceptual model, operational mea-
sures, data elements, data transformation, data exploration,
model development, model exploration, model verification, and
performance measurement, are addressed in depth.
Chapter 2 also discusses the following topics:

• ROI and the choice of an appropriate business objective
• Creating a seamless business process for data mining
• Closed-loop processes
You can’t manage what you can’t measure—the role of perform-
ance measurement and campaign management for continuous
improvement in data mining is explained in this chapter.
3. Data Mining Tools and Techniques (and the associated Microsoft
data mining architecture): revealing structure in data—profiling
and segmentation approaches, and predictive modeling—applica-
tions and their lifetime value optimization through profitable cus-
tomer acquisition. Data mining query languages and the
integration with OLAP, OLE DB for DM, and scaling to large
databases are explained in detail. Leveraging the Microsoft archi-
tecture—how developers and users can leverage the Microsoft
technology and architecture and how Microsoft’s strategy plays to
the broadened focus of data mining, the Web, and the desktop—
is discussed in this chapter.
4. Managing the Data Mining Project: assembling the data mart;
best practices in preparing data for analysis (includes best prac-
tices in data assembly for data mining). Techniques for integrat-
ing data preparation with Microsoft database management tools:
Access and SQL.
5. Modeling Data: best practices for producing models (includes a
discussion of best practices for producing both descriptive and
predictive models); techniques for using OLE DB DM exten-
sions; best practices for testing models, including validation and
sanity checks.
6. Deploying the Data Mining Project Results: best practices for
deploying the model results; predictive models. This includes
managing the deployment results by implementing a closed-loop
campaign management and performance measurement system.
7. Managing Knowledge about the Project: knowledge management
in data mining. A framework for conceptualizing and capturing
an integrated view of profitability drivers in the business and asso-
ciated Microsoft technologies.
Acknowledgments
This book would never have been produced without input from several tal-
ented and supportive people at various stages along the way. Rolf Schliewen
and Jacques deVerteuil were the first to introduce me to decision trees and
sent me down a path toward data mining from which I have never returned.
Doug Laurie Lean, Ed Suen, David Biggs, Rob Rose, Tim Eastland, Mike
Faulkner, and Andy Burans provided invaluable assistance on this path.
I never would have written this book without the advice, assistance, sup-
port, and urging of my long time friend, Peter Neville. Another Neville,
Padraic, urged me on, and two dear business associates, Ken Ono and Eric
Apps, provided a safe, supportive, and stimulating environment in which to
hatch the plan.
Lorna Palmer is a seasoned hand in the area of knowledge management
and has helped me over the years come to an understanding and apprecia-
tion of the relationship between it and data mining. She contributed the
outline and much of the content in the chapter on knowledge management
and I simply could not have written it without her.
Jesus Mena, a fellow data mining traveler for over ten years, introduced
me to the publishing family that has continued to move this project forward:
Phil Sutherland, who started the project, and Theron Shreve, Pam
Chester, Katherine Antonsen, Lauralee Reinke, and Alan Rose who,
together, finished it.
Peter MacKinnon, who wrote the Foreword, has been a constant inspira-
tion, as has Stan Matwin. Greg James and Sergei Anayan have provided
many detailed notes and comments to help shape the text you see today.
I received constant support and encouragement as the product you see
here took its sometimes-torturous route to completion from Laurie O’Neil.
Writing tends to be a solitary activity and Laurie chose to support me rather
than regret my absence or chafe at not being with me; for that, I am both
thankful and mindful that—thanks to this support—better results are
reflected in the pages to follow.
1
Introduction to Data Mining
The thirst for knowledge is an innate human characteristic.
—Aristotle
People have been recording and extracting knowledge from data since the
beginning of time. The cave drawings of Arles, the cuneiform tablets docu-
menting shipboard loading manifests of ancient Babylon, and the Rosetta
Stone are examples of the defining human characteristic to make sense of
the world through data constructs recorded in symbolic—frequently
numeric—form. The cave drawings capture the experience of the day—the
life and death dramas of the hunt, the harvest, the feasting, and the fertility;
the cuneiform tablets record the minutiae of early trade—counting the
weight, cut, and number of precious stones or the number and volume of
amphorae filled with olive oil; and the Rosetta Stone provides a key to
Egyptian hieroglyphics.
Everywhere and always people reflect and record their reality in data laid
down in various recording media. The earliest data miners reconstructed
life styles from cave drawings so as to describe and predict human activity in
those circumstances. They could describe and predict trading patterns and
the effect of variables on the olive tree harvest in the ancient Mediterranean
Sea area. Indeed, even today archeologists and anthropologists can infer
effects on current-day trading patterns based on early trading models built
from examining the data contained in these and other tablets. These tablets,
of course, are “little tables”—the precursors of modern database systems.
So data mining has its roots in one of the oldest of human activities: the
desire to summarize experience in some numeric or symbolic form so as to
describe it better and preserve both meaning and experience. As soon as we
describe and preserve experience through data and symbolic traces, we inev-
itably begin the process of disentangling the meanings through some kind
of data mining process. Regardless of the source of the record, it seems inev-
itable that someone will come along to interpret it so as to make better pre-
dictions about the experience that has been recorded. Often the description
seems out of idle curiosity, but inevitably the motivation turns toward
extracting some kind of knowledge for profit or knowledge that can poten-
tially be translated into another kind of spiritual or material return.
Although data mining, or knowledge discovery as it is sometimes called,
seems to be a very recent and novel invention, the origins of data mining
and knowledge discovery are as old as the record of civilization. Nowadays,
as our ability to record data increases—astronomically it seems—so too
does our ingenuity turn to develop more powerful data mining (data disen-
tangling) methods to keep up with the interpretation of the constant, and
growing, accumulation of data. This leads us to a definition of data mining:
Data mining is a current-day term for the computer implementation of a
timeless human activity. It is the process of using automated methods to
uncover meaning—in the form of trends, patterns, and relationships—from
accumulated electronic traces of data.
It is normal to use data mining for a purpose, typically to gain insight
and improvements in business functions. The utility of data is unquestioned.
But how does the utility present itself? Utility presents itself in the
form of a model. If I can describe the operation of natural phenomena with
a few well-chosen data elements, then I can present a simple data summarization
(a model built from data) that is easy to grasp and conceptualize.
Knowing my minimal monthly average in my savings account provides me
with a simple and readily understandable indicator of many aspects of my
financial well-being. The average is a conceptual construct built from
data. It is a model of the world and, by manipulating the model in my
mind, I can make intelligent guesses, or inferences, about the real world
that is reflected in my model. For example, if I can plot an upward or downward
trend in my account earnings rate, then I can predict with greater certainty
the likely rate of return in the next period. If I double (or halve) my
average savings rate, then I can make some important inferences about the
state of my earning power that is behind this doubling or halving.
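The idea of a trend as a model can be sketched in a few lines. The earnings rates below are invented, the function name fit_trend is hypothetical, and an ordinary least-squares straight line is only one of many ways to summarize a trend; it is used here because it is the simplest.

```python
from statistics import mean

# Hypothetical account earnings rates (percent) for six consecutive periods.
RATES = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0]


def fit_trend(values):
    """Least-squares straight line through (0, v0), (1, v1), ...

    Returns (slope, intercept). Needs at least two values.
    """
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


slope, intercept = fit_trend(RATES)

# Manipulating the model: extend the line one period into the future.
next_rate = intercept + slope * len(RATES)
print(round(slope, 3), round(next_rate, 3))
```

The two fitted numbers, slope and intercept, are the model: a compact summary of six observations that can be extended to the next period. How far such an extension can be trusted depends on how steady the trend really is.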
We use symbolic models to reflect real-world events in an ever-broaden-
ing range of areas. By manipulating the models we find out more about the
real world. We can translate model manipulations, carried out conceptually
(and safely) into real-world manipulations, often at a profit. This is the
power of knowledge; specifically, this is the power of knowledge discovery
or data mining.
Data are exploding all around us: We leave daily electronic traces of our
activity in almost everything we do. As computer power, storage capacity,
and broadband networking continue to expand, so too do our data traces
broaden and deepen. It can be interesting to imagine what future anthropologists,
armed with data detection devices, will make of our current civilization.
It is easy to believe that the archeological digs of tomorrow will be
data mining workstations that can detect, extract, summarize, and apply
meaning to the data traces we leave in this vast, interconnected computer
and communications network we inhabit.
In fact, in this age of the Web, the data mining workstations of tomor-
row are being built today. What do these workstations look like? How do
they work? What can we do with them? Specifically, what is Microsoft up to
in this area? Will organizations such as Microsoft tap into something like a
basic human instinct to extract meaning from data? Does data mining have
the same intuitive appeal as the graphical user interface? Does it have the
same appeal as word processing, spreadsheets, e-mail, and database soft-
ware? Does data mining belong on the desktops of computer users every-
where? Data mining is certainly here. It is not going to go away. Let’s see
where current technology appears to be taking data mining and where it is
likely to go.
1.1 Something old, something new
As suggested previously, civilization has always tapped data to uncover
meaning and to make intellectual, economic, and technical progress. Until
the late 1990s, most of the data tapping and disentangling of meaning took
place in a specialized research and development–oriented environment and
took specialized skills to produce results. Data mining techniques were
developed in scientific settings and, originally, had scientific goals and
objectives in mind. The computer algorithms to perform data mining tasks
were developed by statisticians and artificial intelligence researchers.
However, by the turn of the century, computer technology—and associ-
ated computer networks—had become commodity items. Just as a spread-
sheet program provides sophisticated business planning functions on the
desktop, so too could tools be designed to provide sophisticated statistical
and artificial intelligence functions on the desktop. If statistical and numerical
algorithms could be harnessed to design buildings, bridges, and even
nuclear weapons, so too could these same algorithms be used to build new
products, better customer relationships, and, quite possibly, new forms of
businesses based on intelligent and automated data mining algorithms.
To mine data you need to have access to data. It is no coincidence that
data mining grew at the same time that data warehousing developed. As
computer power and database capability grew through the late 1900s, it
became increasingly clear that data were not simply passive receptacles, use-
ful in performing billing or order-entry functions, but data could also be
used in a more proactive role so as to provide predictive value that would be
useful in guiding a business forward. This concept led to the development
of computer decision support, or executive information systems (EIS). The
idea was to harness growing computing power and improved graphical
interfaces in order to slice and dice data in novel ways to blow away old,
static reporting concepts. Slicing and dicing data—drilling down into many
detailed reports or zooming up to a 10,000-foot “big-picture” view—
required special ways of organizing data for decision making. This gave rise
to the concept of the data warehouse.
Decision support systems and associated data warehousing created an
environment that integrated data from disparate business systems. This
extended traditional business reporting to support consolidated reporting
across multiple sources of data, usually in an interactive, graphically-
enhanced mode.
The term data warehousing was virtually unknown in 1990. Ten years
later data warehousing had become a $10+ billion business annually—a
business that was devoted to capturing and organizing data so as to provide
a proactive analytical (versus operational) environment in which to deploy
data in the service of defining and guiding business activity.
Business caught on to the same thing that science caught on to: Data
capture experience and, appropriately treated, can provide lots of ammunition
to win competitive battles. Data warehousing provides the raw material
for this: data organized for analysis and decision making.
The field of decision support and executive information systems contin-
ued to evolve in line with the growth of data warehousing. Decision support
and executive information systems gave way to the more general concept of
business intelligence (coined in 1996 by IT trend watcher, Howard Dresner
of the Gartner Group). Dresner’s insight suggested that as data moved from
supporting operational purposes to include analytical purposes, the analyti-
