
Big Data Presentation


Big Data
Instructor: Dr. Nguyễn Đức Thái

GROUP 1


Memory storage…

Computer Memory: 640K Ought to
be Enough for Anyone


How much data?
• 7 billion people
• Google processes 100 PB/day; 3 million servers
• Facebook has 300 PB + 500 TB/day; 35% of world’s photos
• YouTube 1000 PB video storage; 4 billion views/day
• Twitter processes 124 billion tweets/year
• SMS messages: 6.1T per year
• US cell calls: 2.2T minutes per year
• US credit cards: 1.4B cards; 20B transactions/year
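
To get a feel for these volumes, the daily and yearly figures above can be converted into per-second rates. This is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 10^15 bytes), a 365-day year, and a perfectly even load, none of which the slide states.

```python
# Convert the slide's daily/yearly totals into average per-second rates.
# Assumptions (not from the slide): decimal units, 365-day year, even load.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

GB = 10**9
TB = 10**12
PB = 10**15


def per_second(total, period_seconds):
    """Average rate for a total spread evenly over a period."""
    return total / period_seconds


google_bytes_per_s = per_second(100 * PB, SECONDS_PER_DAY)
facebook_bytes_per_s = per_second(500 * TB, SECONDS_PER_DAY)
tweets_per_s = per_second(124 * 10**9, SECONDS_PER_YEAR)

print(f"Google:   {google_bytes_per_s / TB:.2f} TB/s processed")
print(f"Facebook: {facebook_bytes_per_s / GB:.2f} GB/s of new data")
print(f"Twitter:  {tweets_per_s:.0f} tweets/s")
```

Real traffic is bursty, so peak rates would be well above these averages.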


Contents

1. Big Data Overview
2. Big Data Technology Today
3. SQL vs NoSQL
4. Big Data Security
5. Big Data Trends
6. Demo with MongoDB & Ref docs


1. Big Data Overview (cont.)

“Big data is not a single technology
but a combination of old and new
technologies that helps companies
gain actionable insight.”
(Reference: “Big Data For Dummies”, published by John Wiley & Sons, Inc.)


1. Big Data Overview (cont.)


Characteristics of Big Data


Sources of Big Data

Social Media
Website

ERP

Network Switches

RFID


Examining Big Data Types
• Structured Data


Structured Data(…)
• Computer- or machine-generated: Machine-generated data generally
refers to data that is created by a machine without human intervention
(sensor data, web log data, point-of-sale data, financial data…).
• Human-generated: This is data that humans, in interaction with
computers, supply (input data, click-stream data, gaming-related data…).


Examining Big Data Types
• Unstructured Data


Unstructured Data(…)
Unstructured data is everywhere.
• Machine-generated unstructured data: satellite images, scientific data,
photographs and video, radar or sonar data…
• Human-generated unstructured data: text internal to your company,
social media data, mobile data…


Managing different data types


Managing different data types
Integrating data types into a big data environment requires:
• Connectors: these enable you to pull data in from various big data sources.
• Metadata: the definitions, mappings, and other characteristics
used to describe how to find, access, and use a company’s data (and
software) components.


What will we do with Big Data?
Analysis & Processing

Analysis

• Querying
• Statistics
• Modeling
• Data mining
• Text analytics


Processing

• Data storage
• Data transfer
• Data monitoring


Quiz…?

How to store and
handle Big Data?


2. Big Data Technology Today
• Storage… NoSQL Database


2. Big Data Technology Today (cont.)
• Processing


2. Big Data Technology Today (cont.)
• The Apache Hadoop software library is a
framework that allows for the distributed
processing of large data sets across clusters of
computers using simple programming models.
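
One such "simple programming model" is map/reduce. The sketch below imitates its three stages (map, shuffle, reduce) in plain Python on a single machine, using the classic word-count example. It does not use the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as a framework would do
    between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different machines, with the framework handling the shuffle, scheduling, and failures. Here everything runs sequentially in one process.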



2. Big Data Technology Today (cont.)
• Instead of treating memory as a cache,
why not treat it as a primary data store?
• Facebook keeps 80% of its data in memory
(Stanford research)
• RAM is 100-1000x faster than disk (random seek)
• Disk: 5-10 ms
• RAM – x0.001msec

[Figure: events flowing into an in-memory data grid. Labels: Events, Facebook, Memory Grid, Data Grid]


2. Big Data Technology Today (cont.)
• Transfer data:



2. Big Data Technology Today (cont.)
• Open-source software framework from Apache: Hadoop
• Google MapReduce
• GFS (Google File System)

• HDFS
• Map/Reduce


3. SQL vs NoSQL

[Diagram: data storage options. Labels: File, SQL DBMS, NoSQL]


3. SQL vs NoSQL (…)
A relational database is a set of tables
containing data fitted into predefined
categories.


Each table contains one or more data
categories in columns.
Each row contains a unique instance of
data for the categories defined by the
columns.
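
The table, column, and row ideas above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the `customers` table, its columns, and the sample rows are invented for illustration and are not taken from the presentation.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The table's columns are the predefined categories.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)

# Each row is one unique instance of data for those columns.
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "An", "Hanoi"), (2, "Binh", "Hue")],
)

rows = conn.execute(
    "SELECT id, name, city FROM customers ORDER BY id"
).fetchall()
print(rows)

conn.close()
```

The schema has to be declared before any row is inserted, which is the "predefined categories" constraint that NoSQL stores relax.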


3. SQL vs NoSQL (…)
• Key-value stores. As the name implies, a
key-value store is a system that stores
values indexed for retrieval by keys.

Some of the market
leaders:
Riak
Amazon Dynamo
Voldemort
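
The core interface of a key-value store is small: put a value under a key, get it back by that key, delete it. The sketch below shows that interface with an in-memory Python dictionary. The `KeyValueStore` class is hypothetical and does not reflect the API of any product listed above. Real systems add persistence, replication, and partitioning across nodes.

```python
class KeyValueStore:
    """Toy in-memory key-value store (illustration only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Store a value under a key, overwriting any previous value."""
        self._data[key] = value

    def get(self, key, default=None):
        """Return the value for a key, or the default if the key is absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key; return True if it existed, False otherwise."""
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user:1", {"name": "An", "city": "Hanoi"})
store.put("user:2", {"name": "Binh", "city": "Hue"})

print(store.get("user:1"))
print(store.get("user:99", default="not found"))
```

The store treats each value as opaque: it can look records up only by key, not by the fields inside the value. That restriction is what makes this model easy to distribute.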

