
Big data too big to ignore


Big Data
Too Big To Ignore


Geert

!   Big Data Consultant and Manager
!   Currently finishing a 3rd Big Data project
!   IBM & Cloudera Certified
!   IBM & Microsoft Big Data Partner


Agenda
!   Defining Big Data
!   Introduction to Hadoop



Our Vision

Big Data
!   Volume
!   Velocity
!   Variety


Big Data Technical Drivers



Big Data Business Drivers
Do More ANALYTICS with Less COSTS



McKinsey
Forrester Research
Gartner




Transformation of Online Marketing

BLOGS.FORBES.COM/DAVEFEINLEIB


Transformation of Customer Service

BLOGS.FORBES.COM/DAVEFEINLEIB


Big Data Definition

Big Data technologies let you implement
use cases that legacy technologies can’t.



Implementing Big Data
Our Vision on Data


Current Situation




Our Vision #1

Focus on data, not on derived data



Our Vision #2

Data is immutable



Our Vision #3

Query = function (all data)
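
The three vision statements can be read together: raw facts are only appended, and every answer is recomputed from the complete set of facts. A minimal Python sketch of that idea, assuming a hypothetical `PageView` record and a hypothetical `views_per_url` query (neither name comes from the slides):

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)  # frozen=True makes each record immutable
class PageView:
    """Hypothetical raw fact: one user viewed one URL at one time."""
    user: str
    url: str
    timestamp: int


def views_per_url(all_data: Iterable[PageView]) -> Counter:
    """Query = function(all data): derive the counts from the raw facts."""
    return Counter(view.url for view in all_data)


# The master data set holds raw facts only; derived counts are not stored in it.
master_data = (
    PageView("alice", "/home", 1),
    PageView("bob", "/home", 2),
    PageView("alice", "/pricing", 3),
)
counts = views_per_url(master_data)

# A new fact is appended (nothing is updated in place) and the query is rerun.
master_data = master_data + (PageView("carol", "/pricing", 4),)
updated_counts = views_per_url(master_data)
```

Because the derived counts can always be rebuilt from the raw facts, a bug in the query can be fixed and the result recomputed without losing information.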



Concept



Introducing
The Hadoop Ecosystem



Context: Performance Gap Trend



Context: Exponential for Decades
!   Abundance of
-  computing & storage
-  generated data (estimated 8ZB in ’15)
-  things
!   More data provides greater value
!   Traditional data doesn’t scale well
!   It’s time for a new approach!



New Hardware Approach
Traditional
!  Exotic HW
-  big central servers
-  SAN
-  RAID
!  Hardware reliability
!  Limited scalability
!  Expensive

Big Data
!  Commodity HW
-  racks of pizza boxes
-  Ethernet
-  JBOD
!  Unreliable HW
!  Scales further
!  Cost effective



New Software Approach
Traditional
!  Monolithic
-  Centralized
-  RDBMS
!  Schema first
!  Proprietary

Big Data
!  Distributed
-  storage & compute nodes
!  Raw data
!  Open source
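
The "Schema first" versus "Raw data" contrast is commonly described as schema-on-write versus schema-on-read. A minimal Python sketch of the raw-data side, assuming hypothetical JSON log lines and a hypothetical `read_with_schema` helper (none of these appear in the slides):

```python
import json

# Raw events are stored exactly as they arrived; no schema is enforced on write.
raw_lines = [
    '{"user": "alice", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view"}',
    '{"user": "carol", "action": "click", "ms": 80, "referrer": "ad"}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: keep only `fields`, fill gaps with None."""
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# Two different questions project two different schemas onto the same raw data.
latency_rows = list(read_with_schema(raw_lines, ["user", "ms"]))
referrer_rows = list(read_with_schema(raw_lines, ["user", "referrer"]))
```

In a schema-first system, the `referrer` field would need a table change before it could be stored; here it is kept from the start and interpreted only when a query asks for it.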



Hadoop
!   De facto big data industry standard (batch)
!   Vendor adoption
-  IBM, Microsoft, Oracle, EMC, ...
!   A collection of projects at Apache
-  HDFS, MapReduce, Hive, Pig, HBase, Flume, Oozie, ...
!   Main components
-  HDFS
-  MapReduce
!   Cluster
-  Set of machines running HDFS and MapReduce
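
To give a feel for the MapReduce programming model, the following is a single-process Python sketch of a word count. It is not the Hadoop API: the function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative assumptions, and on a real cluster the map and reduce steps would run in parallel on the machines that hold the data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Each string stands in for one block of a file stored in HDFS (an assumption).
blocks = ["big data too big", "to ignore big data"]
result = word_count(blocks)
```

The map and reduce functions hold no shared state, which is what allows a framework to spread them across many unreliable machines and rerun a failed piece on another node.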


Distributions

!  Cloudera
-  www.cloudera.com
-  Cloudera Enterprise subscription
-  Currently CDH3
§  Linux package
§  Virtual machine
§  Cloud
-  Stack
§  hadoop, hbase, hive, pig, mahout, flume, ...
-  Cloudera SCM
-  Connectors for Teradata, Netezza, MicroStrategy and Quest

