Big Data
Instructor: Dr. Nguyễn Đức Thái
Group 1
Memory storage…
Computer Memory: 640K Ought to
be Enough for Anyone
How much data?
7 billion people
Google processes 100 PB/day; 3 million servers
Facebook has 300 PB stored, plus 500 TB/day of new data; 35% of the world’s photos
YouTube: 1000 PB of video storage; 4 billion views/day
Twitter processes 124 billion tweets/year
SMS messages: 6.1 trillion per year
US cell calls: 2.2 trillion minutes per year
US credit cards: 1.4 billion cards; 20 billion transactions/year
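To get a feel for these volumes, the daily and yearly totals above can be converted to per-second rates. This is a rough sketch: it assumes decimal units (1 PB = 10^15 bytes, 1 TB = 10^12 bytes) and a 365-day year, neither of which the slide specifies, and the helper name `per_second` is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: 365-day year

# Assumption: decimal units; the slide does not say which convention it uses.
GB = 10**9
TB = 10**12
PB = 10**15


def per_second(total, period_seconds):
    """Average rate per second for a total spread evenly over a period."""
    return total / period_seconds


# Figures taken from the slide.
google_bytes_per_s = per_second(100 * PB, SECONDS_PER_DAY)
facebook_bytes_per_s = per_second(500 * TB, SECONDS_PER_DAY)
tweets_per_s = per_second(124 * 10**9, SECONDS_PER_YEAR)

print(f"Google:   {google_bytes_per_s / TB:.2f} TB/s processed")
print(f"Facebook: {facebook_bytes_per_s / GB:.2f} GB/s of new data")
print(f"Twitter:  {tweets_per_s:.0f} tweets/s")
```

These are averages only; real traffic is bursty, so peak rates would be higher.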
Contents
1. Big Data Overview
2. Big Data Technology Today
3. SQL vs NoSQL
4. Big Data Security
5. Big Data Trends
6. Demo with MongoDB & Reference Documents
1. Big Data Overview (cont.)
“Big data is not a single technology
but a combination of old and new
technologies that helps companies
gain actionable insight.”
(Source: Big Data For Dummies, published by John Wiley & Sons, Inc.)
1. Big Data Overview (cont.)
Characteristics of Big Data
Sources of Big Data
Social Media
Website
ERP
Network Switches
RFID
Examining Big Data Types
Structured Data
Structured Data(…)
Computer- or machine-generated:
data that is created by a machine
without human intervention
(sensor data, web log data, point-of-sale data, financial data…)
Human-generated: data that humans
supply in interaction with computers
(input data, click-stream data, gaming-related data…)
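As an illustration of machine-generated structured data, a web server log line can be parsed into named fields. This is a minimal sketch that assumes the widely used Common Log Format; the sample line, the pattern, and the function name `parse_log_line` are hypothetical and not taken from the slides.

```python
import re

# Assumption: lines follow the Common Log Format, e.g.
# host ident user [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no body was sent; treat it as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample line for illustration.
sample = '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
record = parse_log_line(sample)
print(record)
```

Because every line has the same fields in the same order, the data maps directly onto columns, which is what makes it structured.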
Examining Big Data Types
Unstructured Data
Unstructured Data(…)
Unstructured data is everywhere
Machine-generated unstructured
data: Satellite images, Scientific data,
Photographs and video, Radar or sonar
data…
Human-generated unstructured
data: Text internal to your company,
Social media data, Mobile data…
Managing different data types
Integrating data types into a big data
environment requires:
Connectors: enable you to pull data
in from various big data sources
Metadata: the definitions,
mappings, and other characteristics
used to describe how to find, access,
and use a company’s data (and
software) components
What will we do with Big Data?
Analysis & Processing
Analysis
• Querying
• Statistics
• Modeling
• Data mining
• Text analytics
Processing
• Data storage
• Data transfer
• Data monitoring
Quiz: how do we store and
handle Big Data?
2. Big Data Technology Today
Storage…NoSQL Database
2. Big Data Technology Today (cont.)
Processing
2. Big Data Technology Today (cont.)
The Apache Hadoop software library is a
framework that allows for the distributed
processing of large data sets across clusters of
computers using simple programming models.
2. Big Data Technology Today (cont.)
Instead of treating
memory as a cache,
why not treat it as a
primary data store?
Facebook keeps 80% of its
data in memory (Stanford
research)
RAM is 100-1000x faster
than disk (random seek)
• Disk: 5-10 ms
• RAM: x0.001 msec
[Figure: diagram labeled Events, Facebook, Memory Grid, and Data Grid]
2. Big Data Technology Today (cont.)
Transfer data:
2. Big Data Technology Today (cont.)
Apache Hadoop: an open-source software
framework from Apache, modeled on Google's designs
• Google MapReduce → Hadoop Map/Reduce
• GFS (Google File System) → HDFS
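The Map/Reduce programming model can be sketched with a word count, the usual introductory example. This is a single-process illustration in plain Python, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are assumptions made for this sketch. In a real cluster, the map and reduce steps would run in parallel on many machines over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input records for illustration.
docs = ["big data is big", "data about data"]
counts = word_count(docs)
print(counts)
```

The programmer writes only the map and reduce functions; the framework handles distribution, grouping, and fault tolerance.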
3. SQL vs NoSQL
[Figure: data storage diagram with labels File, DBMS, SQL, and NoSQL]
3. SQL vs NoSQL (…)
A relational database is a set of tables
containing data fitted into predefined
categories.
Each table contains one or more data
categories in columns.
Each row contains a unique instance of
data for the categories defined by the
columns.
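The table, column, and row concepts above can be shown with a small example. This sketch uses SQLite through Python's standard `sqlite3` module as a stand-in for any relational database; the `customers` table and its contents are hypothetical.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The table predefines its categories as columns.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)

# Each row is one unique instance of data for those columns.
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "An", "Hanoi"), (2, "Binh", "Da Nang")],
)

cursor = conn.execute("SELECT id, name, city FROM customers ORDER BY id")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
conn.close()

print(columns)
for row in rows:
    print(row)
```

The schema is fixed up front: a row that does not fit the declared columns is rejected, which is the main contrast with the NoSQL stores discussed next.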
3. SQL vs NoSQL (…)
Key-value stores. As the name implies, a
key-value store is a system that stores
values indexed for retrieval by keys.
Some of the market leaders:
Riak
Amazon Dynamo
Voldemort
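The key-value idea can be sketched as a thin wrapper around a dictionary. This is a toy, in-memory illustration only: the class `KeyValueStore` and its methods are hypothetical and do not reflect the API of Riak, Amazon Dynamo, or Voldemort, which add distribution, replication, and persistence on top of the same basic interface.

```python
class KeyValueStore:
    """Toy in-memory key-value store: values are retrieved only by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Store a value under a key, replacing any previous value."""
        self._data[key] = value

    def get(self, key, default=None):
        """Return the value for a key, or the default if the key is absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key if present; do nothing otherwise."""
        self._data.pop(key, None)


store = KeyValueStore()
# The value is opaque to the store: it can be any object.
store.put("user:1", {"name": "An", "city": "Hanoi"})
store.put("user:2", {"name": "Binh", "city": "Da Nang"})

print(store.get("user:1"))
store.delete("user:2")
print(store.get("user:2"))
```

Unlike a relational table, the store imposes no schema on the values and offers no query by value; lookups go through the key.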