BIG DATA
ANALYTICS
A Practical Guide
for Managers
Kim H. Pries
Robert Dunnigan
MATLAB® and Simulink® are trademarks of The MathWorks, Inc. and are used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book’s use or discussion
of MATLAB® and Simulink® software or related products does not constitute endorsement or sponsorship
by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® and Simulink®
software.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20141024
International Standard Book Number-13: 978-1-4822-3452-7 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
and the CRC Press Web site at
Contents
Preface
Acknowledgments
Authors
Chapter 1 Introduction
So What Is Big Data?
Growing Interest in Decision Making
What This Book Addresses
The Conversation about Big Data
Technological Change as a Driver of Big Data
The Central Question: So What?
Our Goals as Authors
References
Chapter 2 The Mother of Invention’s Triplets: Moore’s Law, the Proliferation of Data, and Data Storage Technology
Moore’s Law
Parallel Computing, between and within Machines
Quantum Computing
Recap of Growth in Computing Power
Storage, Storage Everywhere
Grist for the Mill: Data Used and Unused
Agriculture
Automotive
Marketing in the Physical World
Online Marketing
Asset Reliability and Efficiency
Process Tracking and Automation
Toward a Definition of Big Data
Putting Big Data in Context
Key Concepts of Big Data and Their Consequences
Summary
References
Chapter 3 Hadoop
Power through Distribution
Cost Effectiveness of Hadoop
Not Every Problem Is a Nail
Some Technical Aspects
Troubleshooting Hadoop
Running Hadoop
Hadoop File System
MapReduce
Pig and Hive
Installation
Current Hadoop Ecosystem
Hadoop Vendors
Cloudera
Amazon Web Services (AWS)
Hortonworks
IBM
Intel
MapR
Microsoft
Running Pig Latin Using Powershell
Pivotal
References
Chapter 4 HBase and Other Big Data Databases
Evolution from Flat File to the Three V’s
Flat File
Hierarchical Database
Network Database
Relational Database
Object-Oriented Databases
Relational-Object Databases
Transition to Big Data Databases
What Is Different about HBase?
What Is Bigtable?
What Is MapReduce?
What Are the Various Modalities for Big Data Databases?
Graph Databases
How Does a Graph Database Work?
What Is the Performance of a Graph Database?
Document Databases
Key-Value Databases
Column-Oriented Databases
HBase
Apache Accumulo
References
Chapter 5 Machine Learning
Machine Learning Basics
Classifying with Nearest Neighbors
Naive Bayes
Support Vector Machines
Improving Classification with Adaptive Boosting
Regression
Logistic Regression
Tree-Based Regression
K-Means Clustering
Apriori Algorithm
Frequent Pattern-Growth
Principal Component Analysis (PCA)
Singular Value Decomposition
Neural Networks
Big Data and MapReduce
Data Exploration
Spam Filtering
Ranking
Predictive Regression
Text Regression
Multidimensional Scaling
Social Graphing
References
Chapter 6 Statistics
Statistics, Statistics Everywhere
Digging into the Data
Standard Deviation: The Standard Measure of Dispersion
The Power of Shapes: Distributions
Distributions: Gaussian Curve
Distributions: Why Be Normal?
Distributions: The Long Arm of the Power Law
The Upshot? Statistics Are Not Bloodless
Fooling Ourselves: Seeing What We Want to See in the Data
We Can Learn Much from an Octopus
Hypothesis Testing: Seeking a Verdict
Two-Tailed Testing
Hypothesis Testing: A Broad Field
Moving On to Specific Hypothesis Tests
Regression and Correlation
p Value in Hypothesis Testing: A Successful Gatekeeper?
Specious Correlations and Overfitting the Data
A Sample of Common Statistical Software Packages
Minitab
SPSS
R
SAS
Big Data Analytics
Hadoop Integration
Angoss
Statistica
Capabilities
Summary
References
Chapter 7 Google
Big Data Giants
Google
Go
Android
Google Product Offerings
Google Analytics
Advertising and Campaign Performance
Analysis and Testing
Facebook
Ning
Non-United States Social Media
Tencent
Line
Sina Weibo
Odnoklassniki
Vkontakte
Nimbuzz
Ranking Network Sites
Negative Issues with Social Networks
Amazon
Some Final Words
References
Chapter 8 Geographic Information Systems (GIS)
GIS Implementations
A GIS Example
GIS Tools
GIS Databases
References
Chapter 9 Discovery
Faceted Search versus Strict Taxonomy
First Key Ability: Breaking Down Barriers
Second Key Ability: Flexible Search and Navigation
Underlying Technology
The Upshot
Summary
References
Chapter 10 Data Quality
Know Thy Data and Thyself
Structured, Unstructured, and Semistructured Data
Data Inconsistency: An Example from This Book
The Black Swan and Incomplete Data
How Data Can Fool Us
Ambiguous Data
Aging of Data or Variables
Missing Variables May Change the Meaning
Inconsistent Use of Units and Terminology
Biases
Sampling Bias
Publication Bias
Survivorship Bias
Data as a Video, Not a Snapshot: Different Viewpoints as a Noise Filter
What Is My Toolkit for Improving My Data?
Ishikawa Diagram
Interrelationship Digraph
Force Field Analysis
Data-Centric Methods
Troubleshooting Queries from Source Data
Troubleshooting Data Quality beyond the Source System
Using Our Hidden Resources
Summary
References
Chapter 11 Benefits
Data Serendipity
Converting Data Dreck to Usefulness
Sales
Returned Merchandise
Security
Medical
Travel
Lodging
Vehicle
Meals
Geographical Information Systems
New York City
Chicago CLEARMAP
Baltimore
San Francisco
Los Angeles
Tucson, Arizona, University of Arizona, and COPLINK
Social Networking
Education
General Educational Data
Legacy Data
Grades and Other Indicators
Testing Results
Addresses, Phone Numbers, and More
Concluding Comments
References
Chapter 12 Concerns
Logical Fallacies
Affirming the Consequent
Denying the Antecedent
Ludic Fallacy
Cognitive Biases
Confirmation Bias
Notational Bias
Selection/Sample Bias
Halo Effect
Consistency and Hindsight Biases
Congruence Bias
Von Restorff Effect
Data Serendipity
Converting Data Dreck to Usefulness
Sales
Merchandise Returns
Security
CompStat
Medical
Travel
Lodging
Vehicle
Meals
Social Networking
Education
Making Yourself Harder to Track
Misinformation
Disinformation
Reducing/Eliminating Profiles
Social Media
Self Redefinition
Identity Theft
Facebook
Concluding Comments
References
Chapter 13 Epilogue
Michael Porter’s Five Forces Model
Bargaining Power of Customers
Bargaining Power of Suppliers
Threat of New Entrants
Others
The OODA Loop
Implementing Big Data
Nonlinear, Qualitative Thinking
Closing
References
Preface
When we started this book, “big data” had not quite become a business
buzzword. As we did our research, we realized the books we perused
were either of the “Gee, whiz! Can you believe this?” class or incredibly
abstruse. We felt the market needed an explanation oriented toward managers who had to make potentially expensive decisions.
We would like managers and implementors to know where to start when
they decide to pursue the big data option. As we indicate, the marketplace
for big data is much like that for personal computing in the early 1980s—
full of consultants, products with bizarre names, and tons of hyperbole.
Luckily, in the 2010s, much of the software is open source and extremely
powerful. Big data consultancies exist to translate this “free” software into
useful tools for the enterprise. Hence, nothing is really free.
We also want our readers to understand both the benefits and the
costs of big data in the marketplace, especially the dark side of data. By
now, we think it is obvious that the US National Security Agency is an
archetype for big data problem solving. Large-city police departments
have their own statistical data tools and some of them ponder the usefulness of cell phone confiscation and investigation as well as the use of social
media, which are public.
As we researched, we found ourselves surprised at the size of well-known
marketers such as Google and Amazon. Both of these enterprises have
purchased companies and have grown themselves organically. Facebook
continues to purchase companies (e.g., Oculus, the supplier of a potentially game-changing virtual reality system) and has over 1 billion users.
Algorithmic analysis of colossal volumes of data yields information; information allows vendors to tickle our buying reflexes before we even know
our own patterns.
Previously, we thought Esri owned the geographical information systems market, but we found a variety of geographical information systems
solutions—although the Esri product line is relatively mature and they
serve large-city police departments across the United States. Database creators explore new ways of looking at and storing/retrieving data—methods
going beyond the relational paradigm. New and old algorithmic methods
called machine learning allow computers to sort and separate the useful
data from the useless.
We have grown to appreciate the open-source statistical language R over
the years. R has become the statistical lingua franca for big data. Some of
the major statistical vendors advertise their functional partnerships with
R. We use the tool ourselves to generate many of our figures. We suspect R
is now the most powerful generally available statistical tool on the planet.
Let’s move on and see what we can learn about big data!
MATLAB® is a registered trademark of The MathWorks, Inc. For product
information, please contact:
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098 USA
Tel: 508-647-7000
Fax: 508-647-7001
E-mail:
Web: www.mathworks.com
Acknowledgments
Kim H. Pries would like to acknowledge Janise Pries, the love of his life,
for her support and editing skills. In addition, Robert Dunnigan supplied
verbiage, chapters, Six Sigma expertise, and big data professionalism. As
always, John Wyzalek and the Taylor & Francis team are key players in the
production and publication of technical works such as this one.
Robert Dunnigan thanks his wife, Flabia Dunnigan, and his son Robert III
for their love and patience during the composition of this book. He would
also like to thank Kim H. Pries for his depth of expertise in a broad array
of technical subjects as well as his experience as an author. He skillfully
navigated the process of proposing, developing, and finalizing what is
a unique and practical offering in the field of big data literature. Robert
would also like to thank his employer, The Kratos Group, for their interest
and moral support during the writing of this book. Kratos is a remarkable
company of which Robert is proud to be a part. Finally, thanks are due to
Taylor & Francis for bringing this new perspective on big data to market.
Authors
Kim H. Pries has four college degrees: a bachelor of arts in history from
the University of Texas at El Paso (UTEP), a bachelor of science in metallurgical engineering from UTEP, a master of science in engineering from
UTEP, and a master of science in metallurgical engineering and materials
science from Carnegie Mellon University. In addition, he holds the following certifications:
• APICS
  • Certified Production and Inventory Manager (CPIM)
• American Society for Quality (ASQ)
  • Certified Reliability Engineer (CRE)
  • Certified Quality Engineer (CQE)
  • Certified Software Quality Engineer (CSQE)
  • Certified Six Sigma Black Belt (CSSBB)
  • Certified Manager of Quality/Operational Excellence (CMQ/OE)
  • Certified Quality Auditor (CQA)
Pries worked as a computer systems manager, a software engineer for an
electrical utility, and a scientific programmer under a defense contract; for
Stoneridge, Incorporated (SRI), he has worked as the following:
• Software manager
• Engineering services manager
• Reliability section manager
• Product integrity and reliability director
In addition to his other responsibilities, Pries has provided Six Sigma
training for both UTEP and SRI, and cost reduction initiatives for SRI.
Pries is also a founding faculty member of Practical Project Management.
Additionally, in concert with Jon Quigley, Pries was a cofounder and principal with Value Transformation, LLC, a training, testing, cost improvement, and product development consultancy. Pries also holds Texas
teacher certifications in:
• Mathematics (8–12)
• Mathematics (4–8)
• Technology education (6–12)
• Technology applications (EC–12)
• Physics (8–12)
• Generalist (4–8)
• English Language Arts and Reading (8–12)
• History (8–12)
• Computer Science (8–12)
• Science (8–12)
• Special education (EC–12)
He trained for Introduction to Engineering Design and Computer
Science and Software Engineering with Project Lead the Way. He currently teaches biotechnology, computer science and software engineering,
and introduction to engineering design at the beautiful Parkland High
School in the Ysleta Independent School District of El Paso, Texas.
Pries authored or coauthored the following books:
• Six Sigma for the Next Millennium: A CSSBB Guidebook (Quality
Press, 2005)
• Six Sigma for the New Millennium: A CSSBB Guidebook, Second
Edition (Quality Press, 2009)
• Project Management of Complex and Embedded Systems: Ensuring
Product Integrity and Program Quality (CRC Press, 2008), with Jon
M. Quigley
• Scrum Project Management (CRC Press, 2010), with Jon M. Quigley
• Testing Complex and Embedded Systems (CRC Press, 2010), with Jon
M. Quigley
• Total Quality Management for Project Management (CRC Press,
2012), with Jon M. Quigley
• Reducing Process Costs with Lean, Six Sigma, and Value Engineering
Techniques (CRC Press, 2012), with Jon M. Quigley
• A School Counselor’s Guide to Ethics (Counselor Connection Press,
2012), with Janise G. Pries
• A School Counselor’s Guide to Techniques (Counselor Connection
Press, 2012), with Janise G. Pries
• A School Counselor’s Guide to Group Counseling (Counselor
Connection Press, 2012), with Janise G. Pries
• A School Counselor’s Guide to Practicum (Counselor Connection
Press, 2013), with Janise G. Pries
• A School Counselor’s Guide to Counseling Theories (Counselor
Connection Press, 2013), with Janise G. Pries
• A School Counselor’s Guide to Assessment, Appraisal, Statistics, and
Research (Counselor Connection Press, 2013), with Janise G. Pries
Robert Dunnigan is a manager with The Kratos Group and is based in
Dallas, Texas. He holds a bachelor of science in psychology and in sociology with an anthropology emphasis from North Dakota State University.
He also holds a master of business administration from INSEAD, “the
business school for the world,” where he attended the Singapore campus.
As a Peace Corps volunteer, Robert served over 3 years in Honduras
developing agribusiness opportunities. As a consultant, he later worked
on the Afghanistan Small and Medium Enterprise Development project
in Afghanistan, where he traveled the country with his Afghan colleagues
and friends seeking opportunities to develop a manufacturing sector in
the country.
Robert is an American Society for Quality certified Six Sigma Black Belt
and a Scrum Alliance certified Scrum Master.
1
Introduction
SO WHAT IS BIG DATA?
As a manager, you are expected to operate as a factotum. You need to be
an industrial/organizational psychologist, a logician, a bean counter, and
a representative of your company to the outside world. In other words,
you are somewhat of a generalist who can dive into specifics. The specific
technologies you encounter are becoming more complex, yet the differences between them and their predecessors are becoming more nuanced.
You may have already guided your firm’s transition to other new technologies. Think of the Internet. In the decade and a half before this book was
written, Internet presence went from being optional to being mandatory
for most businesses. In the past decade, Internet presence went from being
unidirectional to conversational. Once, your firm could hang out its online
shingle with either information about its physical location, hours, and
offerings if it were a brick-and-mortar business or else its offerings and
an automated payment system if it were an online business. Firms ranging
from Barnes & Noble to your corner pizza chain bridged these worlds.
A new buzzword arrived: Web 2.0. Despite much hyperbolic rhetoric,
this designation described the real phenomenon of a reciprocal online
world. A disgruntled representative of your company responding through
the archetypical Web 2.0 technology called social media could cause real
damage to your firm. Two news stories involving Twitter broke as this
introduction was in its final stages of refinement.
First, Brendan Eich, the new CEO of the software organization Mozilla
(creator of the Firefox browser), stepped down after news surfaced indicating he had donated money in support of Proposition 8, an anti–gay
marriage initiative in California, some 6 years before (in 2008). An uproar
erupted—largely on Twitter—which led Mr. Eich to resign. Voices in
Mr. Eich’s defense from across the political spectrum (including Andrew
Sullivan, the respected conservative columnist who is himself gay and
a proponent of gay marriage rights, and Conor Friedersdorf of The
Atlantic, who was also an outspoken opponent of Proposition 8) did not
save Mr. Eich’s job. He was ousted.
The second Twitter story began with a tweeted complaint from a customer with the Twitter handle @ElleRafter. US Airways responded with
the typical reaction of a company facing such a complaint in the public
forum of Twitter. They invited @ElleRafter to provide more information,
along with a link. Unlike the typical Twitter response, however, the US
Airways tweet included a pornographic photo involving the use of a toy
US Airways aircraft. This does not appear to have been a premeditated
act by the US Airways representative involved—but it caused substantial
humiliating press coverage for the company.
As the Internet spread and matured, it became a necessary forum for
communication, as well as a dangerous tool whose potential for good or
bad can pull in others by surprise or cause self-inflicted harm. Just as World
War I generals were left to figure out how technology changed the field of
battle, shifting the advantage from the offense to the defense, Internet technology left managers trying to cope with a new landscape filled with both
promise and threats. Now, there is another new buzzword: big data.
So, what is big data? Is it a fad? Is it empty jargon? Is it just a new name for
the growing capacity of the same databases that have been a part of our lives for
decades? Or, is it something qualitatively different? What are the promises
of big data? From which direction should a manager anticipate threats?
The tendency of the media to hype new and barely understood phenomena makes it difficult to evaluate new technologies, along with the nature
and extent of their significance. This book argues that big data is new and
possesses strategic significance. The authors also argue that big data builds
on understandable developments in technology and is itself comprehensible.
Although it is comprehensible, it is not easy to use, and it can deliver
misleading or incorrect results. However, these erroneous results are rarely
random. They result from certain statistical and data-related phenomena. Knowing these phenomena are real and
understanding how they function enable you as a manager to become a
better user of your big data system.
Like cell phones and e-mail, big data is a recent phenomenon that has
emerged as a part of the panorama of our daily lives. When you shop
online, catch up with friends on Facebook, conduct web searches, read
articles referencing database searches, and receive unsolicited coupons,
you interact with big data. Many readers, as participants in a store’s loyalty
program, possess a key fob featuring a bar code on one side and the logo
of a favorite store on the other. One of the primary rationales of these programs, aside from decreasing your incentive to shop elsewhere, is to gather
data on the company’s most important customers. Every time you swipe
your key fob or enter your phone number into the keypad of the credit card
machine while you are checking out at the cash register, you are tying a
piece of identifying data (who you are) with which items you purchased,
how many items you purchased, what time of day you were shopping, and
other data. From these, analysts can determine whether you shop by brand
or buy whatever is on sale, whether you are purchasing different items from
before (suggesting a life change), and whether you have stopped making
your large purchases in the store and now only drop in for quick items such
as milk or sugar. The last pattern is a sign that you have switched to another
retailer for the bulk of your shopping, and coupons or some other intervention may be in order. Stores collected customer data long before
the age of big data, but they now possess the ability to pull in a greater variety of data and conduct more powerful analyses of it.
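To make the loyalty-card reasoning concrete, the following is a minimal sketch in Python of how an analyst might flag shoppers whose basket size has collapsed. Everything in it is an assumption made for illustration: the record layout, the sample figures, the function name flag_lapsing_customers, and the 50 percent threshold are hypothetical and do not describe any particular retailer's system.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical loyalty-card records: (customer_id, week_number, basket_total_usd).
# All values are invented for illustration.
TRANSACTIONS = [
    ("A", 1, 120.0), ("A", 2, 135.0), ("A", 3, 110.0),
    ("A", 4, 9.5), ("A", 5, 7.0), ("A", 6, 12.0),
    ("B", 1, 80.0), ("B", 2, 95.0), ("B", 3, 88.0),
    ("B", 4, 90.0), ("B", 5, 85.0), ("B", 6, 92.0),
]


def flag_lapsing_customers(transactions, split_week, drop_ratio=0.5):
    """Return sorted IDs of customers whose average basket after split_week
    fell below drop_ratio times their average basket up to split_week."""
    earlier = defaultdict(list)
    recent = defaultdict(list)
    for customer_id, week, total in transactions:
        bucket = earlier if week <= split_week else recent
        bucket[customer_id].append(total)

    flagged = []
    for customer_id, earlier_totals in earlier.items():
        recent_totals = recent.get(customer_id)
        if not recent_totals:
            continue  # no recent visits at all; a separate rule would cover this
        if mean(recent_totals) < drop_ratio * mean(earlier_totals):
            flagged.append(customer_id)
    return sorted(flagged)


lapsing = flag_lapsing_customers(TRANSACTIONS, split_week=3)
print(lapsing)
```

In this invented data, customer A moves from large weekly baskets to quick trips and is flagged, while customer B's spending stays steady. A real system would draw on far more variety (brands, time of day, promotions) than this single comparison of averages.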
Big data also influences us in less obvious ways: it informs the obscure underpinnings of our society, such as manufacturing, transportation, and energy.
Any industry generating enormous quantities of diverse data is ready
for big data. In fact, these industries probably use big data already. The
technological revolution occurring in data analytics enables more precise
allocation of resources in our evolving economy—much as the revolution
in navigational technology, from the superseded sextant to modern GPS
devices, enabled ships to navigate open seas.
Big data is much like the Internet—it has drawbacks, but its net value is
positive. The debate on big data, like political debate, tends toward misleading absolutes and false dichotomies. The truth, as in the case of political debates, almost never lies in those absolutes. As with a car, you do not start
up a big data solution and let it motor along unguided; you drive it, you
guide it, and you extract value from it.
Data itself is now an asset, one for companies to secure and hoard, much
as the Federal Reserve Bank of New York stockpiles gold (though, for the
sake of accuracy, the Federal Reserve only stores gold for countries other
than the United States). Companies invest in systems to organize and
extract value from their data, just as they would from a piece of land or a reserve
of raw materials. Data are bought and sold. Some companies, including