


RAPIDS
Pitch Deck


6 QUESTIONS FACING EVERY AI ENTERPRISE
Top Challenges for AI, Big Data, and Enterprise Transformation

DATA DELUGE
Is your data doubling each year?

PROLONGED TRAINING TIME
Is ML training prohibitively long, delaying time-to-predictions?

COMPLEX WORKLOADS
Are Spark workloads creating relentless infrastructure sprawl?

DELAYED INTELLIGENCE
Are you an intelligent enterprise needing real-time predictive analytics?

TEDIOUS DATA PREP
Do you have oceans of data that take lifetimes to wrangle?

SHRINKING BUDGET
Is your CAPEX budget shrinking amidst escalating infrastructure demand?


MACHINE LEARNING CHALLENGES
SLOW PROCESSES
Days: Data Transformation
Weeks: Feature Engineering
Months: Scoring Pipelines

MODEL COMPLEXITY
30+ Hours to Build GBDT (Gradient Boosted Tree Regression)

ESCALATING TCO
$3M+: More Servers and Infrastructure Yielding Diminishing Returns


GPU-ACCELERATED DATA SCIENCE
Use Cases in Every Industry
CONSUMER INTERNET
Ad Personalization
Click Through Rate Optimization
Churn Reduction

OIL & GAS
Sensor Data Tag Mapping
Anomaly Detection
Robust Fault Prediction

FINANCIAL SERVICES
Claim fraud
Customer service chatbots/routing
Risk evaluation

MANUFACTURING
Remaining Useful Life Estimation
Failure Prediction
Demand Forecasting

HEALTHCARE
Improve Clinical Care
Drive Operational Efficiency
Speed Up Drug Discovery

TELCO
Detect Network/Security Anomalies
Forecasting Network Performance
Network Resource Optimization (SON)

RETAIL
Supply Chain & Inventory Management
Price Management / Markdown Optimization
Promotion Prioritization And Ad Targeting

AUTOMOTIVE
Personalization & Intelligent Customer Interactions
Connected Vehicle Predictive Maintenance
Forecasting, Demand, & Capacity Planning


ML WORKFLOW STIFLES INNOVATION
[Diagram: workflow stages labeled Data Sources, ETL, Data Lake, Data Preparation (Wrangle Data), Train, Evaluate, Deploy, Predictions]

Time-consuming, inefficient workflow that wastes data science productivity


DAY IN THE LIFE OF A DATA SCIENTIST
[Figure: two clock-face diagrams comparing a CPU POWERED WORKFLOW with a GPU POWERED WORKFLOW. Labels on the diagrams: Dataset Downloads Overnight; Start Data Prep Workflow; Configure Data Prep Workflow; Find Unexpected Null Values Stored as String…; Restart Data Prep Workflow; @*#! Forgot to Add a Feature; Restart Data Prep Workflow Again; GET A COFFEE; ANOTHER…; Switch to Decaf; Train Model; Validate; Test Model; Experiment with Optimizations and Repeat; Stay Late; Go Home on Time. Legend: Dataset Collection, Analysis, Data Prep, Train, Inference]


DATA SCIENCE WORKFLOW WITH RAPIDS
Open Source, End-to-end GPU-accelerated Workflow Built On CUDA

DATA

PREDICTIONS

DATA PREPARATION
GPU-accelerated compute for in-memory data preparation
Simplified implementation using familiar data science tools
Python drop-in Pandas replacement built on CUDA C++. GPU-accelerated Spark (in development)


MODEL TRAINING

GPU-acceleration of today’s most popular ML algorithms
XGBoost, PCA, Kalman, K-means, k-NN, DBScan, tSVD …



VISUALIZATION
Effortless exploration of datasets, billions of records in milliseconds
Dynamic interaction with data = faster ML model development
Data visualization ecosystem (Graphistry & OmniSci), integrated with RAPIDS


TRADITIONAL DATA SCIENCE CLUSTER
Workload Profile: Fannie Mae Mortgage Data
192GB data set
16 years, 68 quarters
34.7 Million single family mortgage loans
1.85 Billion performance records
XGBoost training set: 50 features

300 Servers | $3M | 180 kW


GPU-ACCELERATED MACHINE LEARNING CLUSTER
DGX-2 and RAPIDS for Predictive Analytics
1 DGX-2 | 10 kW
1/8 the Cost | 1/15 the Space | 1/18 the Power

[Chart: End-to-End comparison of 20, 30, 50, and 100 CPU Nodes against DGX-2 and 5x DGX-1; axis from 0 to 10,000]
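The power ratio on this slide can be checked against the traditional cluster figures quoted earlier in the deck (300 Servers | $3M | 180 kW). The following is a minimal sketch in Python. The variable names are illustrative, and the implied DGX-2 cost is derived only from the "1/8 the Cost" claim; it is an assumption, not a price quoted in the deck.

```python
# Figures as stated on the slides.
cpu_cluster = {"servers": 300, "cost_usd": 3_000_000, "power_kw": 180}
dgx2 = {"nodes": 1, "power_kw": 10}

# Power: 180 kW versus 10 kW.
power_ratio = cpu_cluster["power_kw"] / dgx2["power_kw"]

# Cost: the deck claims "1/8 the Cost" without quoting a DGX-2 price,
# so this is only the figure that claim would imply (assumption).
implied_dgx2_cost_usd = cpu_cluster["cost_usd"] / 8

print(f"Power ratio: {power_ratio:.0f}x")
print(f"Implied DGX-2 cost: ${implied_dgx2_cost_usd:,.0f}")
```

The computed power ratio agrees with the "1/18 the Power" claim. The "1/15 the Space" claim cannot be checked the same way because the deck gives no floor-space figures.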


RAPIDS: DELIVERING DATA SCIENCE VALUE

Maximized Productivity
Oak Ridge National Labs
215x Speedup Using RAPIDS with XGBoost

Top Model Accuracy
Global Retail Giant
$1B Potential Saving with 4% Error Rate Reduction

Lowest TCO
Streaming Media Company
$1.5M Infrastructure Cost Saving


PILLARS OF RAPIDS PERFORMANCE
CUDA Architecture
Massively Parallel Processing

NVLink/NVSwitch
High-Speed Connections between GPUs for Distributed Algorithms

Integrated Software
Fully Integrated Software and Hardware for Instant Productivity

[Diagram: software stack showing PYTHON, DASK, DL FRAMEWORKS, RAPIDS (cuDF, cuML), cuDNN, CUDA, and APACHE ARROW on GPU Memory; interconnect labels: 6x NVLink, NVSwitch]



FASTER SPEEDS, REAL WORLD BENEFITS
Time in seconds (shorter is better)

Configuration     cuIO/cuDF (Load and Data Preparation)     cuML (XGBoost)
20 CPU Nodes      2,741                                     2,290
30 CPU Nodes      1,675                                     1,956
50 CPU Nodes      715                                       1,999
100 CPU Nodes     379                                       1,948
DGX-2             42                                        169
5x DGX-1          19                                        157

[Chart: End-to-End time for the same six configurations, broken down into cuIO / cuDF (Load and Data Preparation), Data Conversion, and XGBoost; axis from 0 to 10,000 seconds]

Benchmark: 200GB CSV dataset; data preparation includes joins, variable transformations.
CPU Cluster Configuration: CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark
DGX Cluster Configuration: 5x DGX-1 on InfiniBand network
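The timings on this slide can be turned into speedup factors with simple division. The sketch below is illustrative only: it assumes the chart labels map to the data preparation and XGBoost panels as inferred from the axis ranges, and the dictionary and function names are hypothetical.

```python
# Timings in seconds, as read from the chart labels on this slide.
# Assumption: the label-to-panel mapping is inferred from the axis ranges.
data_prep_s = {
    "20 CPU Nodes": 2741, "30 CPU Nodes": 1675, "50 CPU Nodes": 715,
    "100 CPU Nodes": 379, "DGX-2": 42, "5x DGX-1": 19,
}
xgboost_s = {
    "20 CPU Nodes": 2290, "30 CPU Nodes": 1956, "50 CPU Nodes": 1999,
    "100 CPU Nodes": 1948, "DGX-2": 169, "5x DGX-1": 157,
}

def speedup(times, baseline, target):
    """Return how many times faster `target` is than `baseline`."""
    return times[baseline] / times[target]

# DGX-2 against the smallest and largest CPU clusters.
prep_20_vs_dgx2 = speedup(data_prep_s, "20 CPU Nodes", "DGX-2")
prep_100_vs_dgx2 = speedup(data_prep_s, "100 CPU Nodes", "DGX-2")
xgb_20_vs_dgx2 = speedup(xgboost_s, "20 CPU Nodes", "DGX-2")
xgb_100_vs_dgx2 = speedup(xgboost_s, "100 CPU Nodes", "DGX-2")

# Gain from growing the CPU cluster fivefold (20 to 100 nodes).
prep_cpu_scaling = speedup(data_prep_s, "20 CPU Nodes", "100 CPU Nodes")
xgb_cpu_scaling = speedup(xgboost_s, "20 CPU Nodes", "100 CPU Nodes")

for label, value in [
    ("Data prep, 20 CPU nodes vs DGX-2", prep_20_vs_dgx2),
    ("Data prep, 100 CPU nodes vs DGX-2", prep_100_vs_dgx2),
    ("XGBoost, 20 CPU nodes vs DGX-2", xgb_20_vs_dgx2),
    ("XGBoost, 100 CPU nodes vs DGX-2", xgb_100_vs_dgx2),
    ("Data prep, 20 vs 100 CPU nodes", prep_cpu_scaling),
    ("XGBoost, 20 vs 100 CPU nodes", xgb_cpu_scaling),
]:
    print(f"{label}: {value:.1f}x")
```

Under that reading, adding CPU nodes helps data preparation far more than it helps XGBoost training, which is consistent with the deck's earlier point about more servers yielding diminishing returns.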


SELECTING THE RIGHT RAPIDS SOLUTION

Unparalleled Data Science Performance and Productivity
ML Enthusiast | Machine Learning Developer | Data Center Machine Learning
Data Science Workstations | Shared infrastructure for Data Science Teams

TITAN RTX
Benefit: PC solution, easy to acquire, deploy and get started experimenting
GPU Memory: 48GB
GPU Fabric: 2-way NVLINK

Quadro Workstation
Benefit: Enterprise workstation for experienced data scientists
GPU Memory: 64GB
GPU Fabric: 2-way NVLINK

DGX Station
Benefit: Enterprise ML workgroups, largest memory on a workstation
GPU Memory: 128GB
GPU Fabric: 4-way NVLINK

DGX-1 / HGX-1 / OEM
Benefit: Enterprise server, proven 8-way configuration, modular approach for scale, multi-node training
GPU Memory: 256GB
GPU Fabric: 8-way NVLINK

DGX-2 / HGX-2 / OEM
Benefit: Largest compute and memory capacity in single node, fastest training solution
GPU Memory: 512GB
GPU Fabric: 16-way NVSWITCH

End-to-end portfolio optimized for RAPIDS


WIDESPREAD SUPPORT FOR RAPIDS
Open Source
Community

Enterprise Data Science
Platforms

Deep Learning
Integration

Startups


RAPIDS

GPU Servers

Storage Partners

* Spark and Hadoop support coming soon



TRANSFORMING RETAIL WITH RAPIDS
Inventory Forecast
180x speedup using RAPIDS with cuDF
10 stores | 1 million rows
600 stores | 60 million rows

“My previous bottleneck was I/O. …15 seconds to pull in data for 10 stores (about 1 Million rows).
With RAPIDS, we can pull in data for about 600 stores (60 Million rows) in less than 5 seconds. … just
plain awesome.”
— A mid-market specialty retailer with 4800 stores
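The 180x figure on this slide is consistent with the load times given in the quote when they are compared as rows per second. A minimal sketch of that arithmetic follows; the variable names are illustrative, and because the quote says "less than 5 seconds", the result is a lower bound under that reading.

```python
# Figures from the retailer's quote.
before_rows, before_seconds = 1_000_000, 15    # 10 stores, 15 seconds
after_rows, after_seconds = 60_000_000, 5      # 600 stores, under 5 seconds

# Throughput in rows per second for each case.
before_rate = before_rows / before_seconds
after_rate = after_rows / after_seconds

# Ratio of throughputs, written as a single division of whole numbers
# to avoid floating-point rounding in the intermediate rates.
throughput_speedup = (after_rows * before_seconds) / (before_rows * after_seconds)

print(f"Before: {before_rate:,.0f} rows/s")
print(f"After:  {after_rate:,.0f} rows/s")
print(f"Speedup: at least {throughput_speedup:.0f}x")
```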



TRANSFORM STREAMING MEDIA
RECOMMENDATION SYSTEM WITH RAPIDS

$1.5M Infrastructure Cost Saving with 24x Speed-up on XGBoost

Hundreds of CPUs
1 GPU

Increase customer retention | Higher customer satisfaction | Increase revenue

“I got 24x speedup using RAPIDS XGBOOST and can now replace hundreds of CPU nodes running
my biggest ML workload on a single node with 8 GPUs. You made XGBOOST too fast!?”
— Streaming Media Company


PREDICT EPIDEMIC DISEASE
IN HEALTHCARE WITH RAPIDS

80x speedup on GPU-accelerated XGBoost

Days on CPUs
Hours on GPU

“Early precaution of epidemic disease is now possible with 80x faster training time on RAPIDS.”
— Dr. Jian Zong Wang, Vice Chief Engineer and Senior AI Director
(from the Largest Insurance and Internet Finance Company in China)


FOR MORE INFORMATION

www.nvidia.com/datascience

www.rapids.ai



