Paul Zikopoulos
Director - IM WW Technical Professionals, WW Competitive Database,
WW Big Data Tiger Team
What is Big Data?
© 2009 IBM Corporation
Agenda
What is Big Data?
What makes Big Data different?
What can you do with Big Data?
Big Data use cases
The IBM Big Data platform
Getting started
© 2012 IBM Corporation
New IM Technology Trends
Information Integration and Governance & Big Data
[Diagram labels: Trusted, Relevant, Governed; Analyze, Integrate, Manage, Govern; Transactional & Collaborative Applications; Business Analytics Applications; Content; Big Data; Master Data; Cubes; Streams; Data; Data Warehouses; External Information Sources; Streaming Information; Information Governance: Security & Privacy, Quality, Standards, Lifecycle]
In 2005 there were 1.3 billion RFID
tags in circulation…
…by the end of 2011, this was about
30 billion and growing even faster
An increasingly sensor-enabled and instrumented
business environment generates HUGE volumes of
data with MACHINE SPEED characteristics…
1 BILLION lines of code
EACH engine generating 10 TB every 30 minutes!
350B
Transactions/Year
Meter Reads
every 15 min.
120M – meter reads/month
3.65B – meter reads/day
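One way to read the figures above is as yearly totals for the same meter population at three read frequencies: monthly, daily, and every 15 minutes. A minimal sketch of that arithmetic follows. The meter count of 10 million is an assumption chosen because it reproduces the numbers; the slide does not state it.

```python
# Assumption (not stated on the slide): a utility with 10 million smart meters.
METERS = 10_000_000


def reads_per_year(meters: int, reads_per_meter_per_year: int) -> int:
    """Total meter reads collected in one year."""
    return meters * reads_per_meter_per_year


monthly = reads_per_year(METERS, 12)                    # one read per month
daily = reads_per_year(METERS, 365)                     # one read per day
quarter_hourly = reads_per_year(METERS, 4 * 24 * 365)   # one read every 15 minutes

print(f"monthly reads:   {monthly:,} per year")         # 120,000,000
print(f"daily reads:     {daily:,} per year")           # 3,650,000,000
print(f"15-minute reads: {quarter_hourly:,} per year")  # 350,400,000,000
```

Under that assumption, moving from monthly to 15-minute reads multiplies the yearly volume by 2,920.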
In August of 2010, Adam Savage, of “MythBusters,” took a photo of his vehicle using his smartphone. He then posted the photo to his Twitter account, including the phrase “Off to work.”
Since the photo was taken by his smartphone, the image contained metadata revealing the exact geographical location where the photo was taken.
By simply taking and posting a photo, Savage revealed the exact location of his home, the vehicle he drives, and the time he leaves for work.
The Social Layer in an Instrumented, Interconnected World
30 billion RFID tags today (1.3B in 2005)
12+ TBs of tweet data every day
? TBs of data every day
25+ TBs of log data every day
76 million smart meters in 2009… 200M by 2014
2+ billion people on the Web by end 2011
4.6 billion camera phones worldwide
100s of millions of GPS-enabled devices sold annually
Twitter Tweets per Second Record Breakers of 2011
Can a Social Media Persona be Monetized?
Extract Intent, Life Events, Micro Segmentation Attributes
Chloe
Name, Birthday, Family
Tom Sit
Not Relevant - Noise
Tina Mu
Monetizable Intent
Jo Jobs
Not Relevant - Noise
Location
Wishful Thinking
Relocation
SPAMbots
Monetizable Intent
1.8 ZB
1 ZB
1 ZB = 1T GB
4 trillion 8 GB iPods
What is “BIG DATA”?
All kinds of data
Large volumes
Valuable insight, but difficult to extract
Often extremely time sensitive
What makes big data technology different?
Jobs distributed across affordable hardware.
Manages and analyzes all kinds of data.
Analyzes data in native format.
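The first and third points can be illustrated with a small map-and-reduce sketch: raw log lines are left in their native text form, split into chunks, counted independently by several workers, and the partial results are merged. This is only an illustration of the idea, not the IBM platform's implementation. The sample log lines and the function names are hypothetical, and threads on one machine stand in for separate commodity servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw log lines, kept in their native text format.
RAW_LOGS = [
    "2012-01-01 ERROR disk full",
    "2012-01-01 INFO job started",
    "2012-01-02 ERROR timeout",
    "2012-01-02 INFO job finished",
]


def split_into_chunks(lines, n_chunks):
    """Deal the lines out round-robin into n_chunks lists."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_chunk(lines):
    """Map step: count log levels (second field) within one chunk."""
    return Counter(line.split()[1] for line in lines)


def count_levels(lines, n_workers=2):
    """Distribute the map step across workers, then reduce the partial counts."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Reduce step: merge partial counts.
    return total


level_counts = count_levels(RAW_LOGS)
print(dict(level_counts))
```

Because each chunk is counted independently, the result does not depend on how many workers share the job; only the elapsed time does.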
Big Data Includes Any of the Following Characteristics
Extracting insight from an immense volume, variety and velocity of data,
in context, beyond what was previously possible
Variety: Manage the complexity of data in many different structures, ranging from relational, to logs, to raw text
Velocity: Streaming data and large-volume data movement
Volume: Scale from Terabytes to Petabytes (1K TBs) to Zettabytes (1B TBs)
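The scale factors quoted for Volume can be checked with a short conversion helper. This is a minimal sketch using decimal (SI) units; the exabyte entry is included for completeness and does not appear on the slide, and the function name is illustrative.

```python
# Decimal (SI) storage units expressed in terabytes.
TB_PER_UNIT = {
    "TB": 1,
    "PB": 10**3,   # 1 PB = 1K TBs
    "EB": 10**6,   # not on the slide; included for completeness
    "ZB": 10**9,   # 1 ZB = 1B TBs
}


def convert(value, from_unit, to_unit):
    """Convert a storage size between the units listed in TB_PER_UNIT."""
    return value * TB_PER_UNIT[from_unit] / TB_PER_UNIT[to_unit]


print(convert(1, "PB", "TB"))   # terabytes in one petabyte
print(convert(1, "ZB", "TB"))   # terabytes in one zettabyte
print(convert(1, "ZB", "PB"))   # petabytes in one zettabyte
```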
What can you do with big data?
Analyze a Variety of Information
– Social media/sentiment analysis
– Geospatial analysis
– Brand strategy
– Scientific research
– Epidemic early warning system
– Market analysis
– Video analysis
– Audio analysis
Analyze Information in Motion
– Smart Grid management
– Multimodal surveillance
– Real-time promotions
– Cyber security
– ICU monitoring
– Options trading
– Click-stream analysis
– CDR processing
– IT log analysis
– RFID tracking & analysis
Analyze Extreme Volumes of Information
– Transaction analysis to create insight-based product/service offerings
– Fraud modeling & detection
– Risk modeling & management
– Social media/sentiment analysis
– Environmental analysis
Discovery & Experimentation
– Sentiment analysis
– Brand strategy
– Scientific research
– Ad-hoc analysis
– Model development
– Hypothesis testing
– Transaction analysis to create insight-based product/service offerings
Manage and Plan
– Operational analytics – BI reporting
– Planning and forecasting analysis
– Predictive analysis
…
Applications for Big Data Analytics
Smarter Healthcare
Multi-channel sales
Finance
Log Analysis
Homeland Security
Traffic Control
Telecom
Search Quality
Fraud and Risk
Retail: Churn, NBO
Manufacturing
Trading Analytics
What can you do with big data?
Financial Services
– Fraud detection
– Risk management
– 360° View of the Customer
Transportation
– Weather and traffic impact on logistics and fuel consumption
Health & Life Sciences
– Epidemic early warning system
– ICU monitoring
– Remote healthcare monitoring
Telecommunications
– CDR processing
– Churn prediction
– Geomapping / marketing
– Network monitoring
Utilities
– Weather impact analysis on power generation
– Transmission monitoring
– Smart grid management
IT
– Transaction log analysis for multiple transactional systems
– Cybersecurity
Retail
– 360° View of the Customer
– Click-stream analysis
– Real-time promotions
Law Enforcement
– Real-time multimodal surveillance
– Situational awareness
– Cyber security detection
The Big Data Conundrum
The economies of deletion have changed…
– Leading us into new opportunities and challenges
The percentage of available data an enterprise can analyze is
decreasing as the volume of data available to that enterprise grows
Quite simply, this means that, as enterprises, we are getting
“more naive” about our business over time
Data AVAILABLE to
an organization
Data an organization
can PROCESS
Public wind data is available on 284 km x 284 km grids (2.5° LAT/LONG)
More data means more accurate and richer models (adding hundreds of variables)
– Vestas wind library at 2.5 PB: to grow to over 6 PB in the near term
– Granularity 27 km x 27 km grids: driving to 9 x 9, 3 x 3 to 10 m x 10 m simulations
Reduced turbine placement identification from weeks to hours
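To give a sense of why finer granularity drives data volume, the following sketch computes how many fine grid cells tile one cell of the public 284 km grid at each resolution mentioned above. It is simple area arithmetic only; the function name is illustrative, and it says nothing about how many variables or time steps are stored per cell.

```python
def cells_per_coarse_cell(coarse_km: float, fine_km: float) -> float:
    """Number of fine square cells needed to tile one coarse square cell."""
    return (coarse_km / fine_km) ** 2


PUBLIC_GRID_KM = 284.0

# Resolutions named in the text; 0.01 km is the 10 m x 10 m simulation grid.
for fine_km in (27.0, 9.0, 3.0, 0.01):
    ratio = cells_per_coarse_cell(PUBLIC_GRID_KM, fine_km)
    print(f"{fine_km} km grid: about {ratio:,.0f} cells per 284 km cell")
```

Each threefold refinement (27 km to 9 km, 9 km to 3 km) multiplies the cell count by nine.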
Perspective: The Vestas wind library, as HD TV, would take 70 years to watch
Optimize building energy consumption with centralized monitoring and control of the building monitoring system
Automates preventive and corrective maintenance of building corrective systems
Uses Streams, InfoSphere BigInsights and Cognos
Log Analytics
Energy Bill Forecasting
Energy consumption optimization
Detection of anomalous usage
Presence-aware energy mgt.
Policy enforcement
Supply Chain Recommendation for Natural Disasters
– Capture weather sensor data, analyze predicted hurricane path
– Capture market data to calculate cost of stock outs (high volume)
– Estimate impact on inventories
– Compute shipping and logistics costs
– Make recommendations and notify
– DHTML result rendering
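The flow above can be sketched as a small pipeline. Everything in this example is hypothetical: the `Store` fields, the sample figures, and the decision rule (ship extra stock when shipping costs less than the expected stock-out loss) are assumptions made for illustration, and the hurricane path analysis is reduced to a precomputed flag.

```python
from dataclasses import dataclass


@dataclass
class Store:
    name: str
    in_projected_path: bool    # output of the hurricane path analysis (assumed given)
    units_on_hand: int         # current inventory
    expected_demand: int       # forecast demand over the storm window
    margin_per_unit: float     # margin lost per unit when stocked out
    ship_cost_per_unit: float  # shipping and logistics cost per unit


def shortfall(store: Store) -> int:
    """Estimated impact on inventory: units missing to cover expected demand."""
    return max(0, store.expected_demand - store.units_on_hand)


def stock_out_cost(store: Store) -> float:
    return shortfall(store) * store.margin_per_unit


def shipping_cost(store: Store) -> float:
    return shortfall(store) * store.ship_cost_per_unit


def recommend(stores):
    """Recommend shipments for at-risk stores where shipping beats stocking out."""
    recommendations = []
    for store in stores:
        if not store.in_projected_path:
            continue
        if shortfall(store) > 0 and shipping_cost(store) < stock_out_cost(store):
            recommendations.append((store.name, shortfall(store)))
    return recommendations


stores = [
    Store("Coastal", True, 100, 400, 5.0, 2.0),
    Store("Inland", False, 100, 400, 5.0, 2.0),
    Store("Harbor", True, 500, 300, 5.0, 2.0),
]
recommendations = recommend(stores)
for name, units in recommendations:
    print(f"Notify: ship {units} units to {name}")
```

In a real deployment the inputs would arrive as streams and the recommendations would be recomputed as the projected path changes; the sketch only fixes the order of the steps.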
Correlate combined risk and impending weather threats to optimize inventory and determine supply chain recommendations
Dynamically updated risk assessment for assets in projected path
Real-time projections of hurricane path
Bigger and Bigger Volumes of Data
Retailers collect click-stream data from Web site interactions and loyalty-card-driven transaction data
– This traditional POS information is used by retailers for shopping basket analysis, inventory replenishment, and more
– But data is being provided to suppliers for customer buying analysis
Healthcare has traditionally been dominated by paper-based systems, but this information is getting digitized
Science is increasingly dominated by big science initiatives
– Large-scale experiments generate over 15 PB of data a year, which can’t be stored within the data center and is then sent to laboratories
Financial services are seeing larger volumes through smaller trading sizes, increased market volatility, and technological improvements in automated and algorithmic trading
Improved instrument and sensory technology
– Large Synoptic Survey Telescope’s GPixel camera generates 6 PB+ of image data per year; or consider the oil and gas industry
Monetizing Relationships, Not Just Transactions
Amy Bearn
32, Married, mother of 3, Accountant
Telco Score: 91
CPG Score: 76
Fashion Score: 88
Telco company: How valuable is Amy to my mobile phone network? How likely is she to switch carriers? How many other customers will follow?
Retailer: How valuable is Amy to my retail sales? Who does she influence? What do they spend?
[Diagram labels: Calling Network, Social Network, Public Database, Merged Network]