RAPIDS
Pitch Deck
6 QUESTIONS FACING EVERY AI ENTERPRISE
Top Challenges for AI, Big Data, and Enterprise Transformation
DATA DELUGE
Is your data doubling each year?
PROLONGED TRAINING TIME
Is ML training prohibitively long, delaying time-to-predictions?
COMPLEX WORKLOADS
Are Spark workloads creating relentless infrastructure sprawl?
DELAYED INTELLIGENCE
Are you an intelligent enterprise needing real-time predictive analytics?
TEDIOUS DATA PREP
Do you have oceans of data that take lifetimes to wrangle?
SHRINKING BUDGET
Is your CAPEX budget shrinking amidst escalating infrastructure demand?
2
MACHINE LEARNING CHALLENGES
SLOW PROCESSES
Days: Data Transformation
Weeks: Feature Engineering
Months: Scoring Pipelines
MODEL COMPLEXITY
30+ Hours to Build GBDT (Gradient Boosted Tree Regression)
ESCALATING TCO
$3M+: More Servers and Infrastructure Yielding Diminishing Returns
3
GPU-ACCELERATED DATA SCIENCE
Use Cases in Every Industry
CONSUMER INTERNET
Ad Personalization
Click Through Rate Optimization
Churn Reduction
OIL & GAS
Sensor Data Tag Mapping
Anomaly Detection
Robust Fault Prediction
FINANCIAL SERVICES
Claim fraud
Customer service chatbots/routing
Risk evaluation
MANUFACTURING
Remaining Useful Life Estimation
Failure Prediction
Demand Forecasting
HEALTHCARE
Improve Clinical Care
Drive Operational Efficiency
Speed Up Drug Discovery
TELCO
Detect Network/Security Anomalies
Forecasting Network Performance
Network Resource Optimization (SON)
RETAIL
Supply Chain & Inventory Management
Price Management / Markdown Optimization
Promotion Prioritization And Ad Targeting
AUTOMOTIVE
Personalization & Intelligent Customer Interactions
Connected Vehicle Predictive Maintenance
Forecasting, Demand, & Capacity Planning
4
ML WORKFLOW STIFLES INNOVATION
Wrangle Data
Data Sources
ETL
Data Lake
Data Preparation
Train
Train
Deploy
Evaluate
Predictions
Time-consuming, inefficient workflow that wastes data science productivity
5
DAY IN THE LIFE OF A DATA SCIENTIST
[Figure: two 12-hour clock diagrams comparing a CPU-powered workflow with a GPU-powered workflow. Each clock begins with "Dataset Downloads Overnight".]
Steps labeled on the clocks: Start; Configure Data Prep Workflow; Start Data Prep Workflow; Get a Coffee (three times); Find Unexpected Null Values Stored as String…; Restart Data Prep Workflow; @*#! Forgot to Add a Feature; Restart Data Prep Workflow Again; Another…; Switch to Decaf; Train Model; Validate; Test Model; Experiment with Optimizations and Repeat
Outcomes: Stay Late (CPU-powered workflow); Go Home on Time (GPU-powered workflow)
Legend: Dataset Collection, Analysis, Data Prep, Train, Inference
6
DATA SCIENCE WORKFLOW WITH RAPIDS
Open Source, End-to-end GPU-accelerated Workflow Built On CUDA
DATA
PREDICTIONS
DATA PREPARATION
GPU-accelerated compute for in-memory data preparation
Simplified implementation using familiar data science tools
Python drop-in replacement for pandas, built on CUDA C++. GPU-accelerated Spark is in development.
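A minimal sketch of what "drop-in" could look like in practice. It is an illustration, not code from this deck: it assumes cuDF is importable as `cudf` and falls back to pandas when it is not, so the same DataFrame calls run on either backend. The `loans` and `payments` tables and their columns are hypothetical.

```python
# Minimal sketch: the same DataFrame code on GPU (cuDF) or CPU (pandas).
try:
    import cudf as xdf  # RAPIDS GPU DataFrame library (assumed installed)
    BACKEND = "cudf"
except ImportError:
    import pandas as xdf  # CPU fallback so the sketch runs without a GPU
    BACKEND = "pandas"

# Hypothetical toy tables standing in for real loan data.
loans = xdf.DataFrame({"loan_id": [1, 2, 3], "rate": [3.5, 4.0, 4.5]})
payments = xdf.DataFrame(
    {
        "loan_id": [1, 1, 2, 3, 3, 3],
        "amount": [100.0, 150.0, 200.0, 50.0, 50.0, 25.0],
    }
)

# Join, derive a feature, and aggregate: typical data-preparation steps.
joined = payments.merge(loans, on="loan_id", how="inner")
joined["interest"] = joined["amount"] * joined["rate"] / 100.0
totals = (
    joined.groupby("loan_id")
    .agg({"amount": "sum", "interest": "sum"})
    .sort_index()
)

print(BACKEND)
print(totals)
```

Only the import line differs between the two backends here; real workloads may rely on pandas features that cuDF does not cover, so treat this as a starting point.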
7
MODEL TRAINING
GPU acceleration of today’s most popular ML algorithms
XGBoost, PCA, Kalman, k-means, k-NN, DBSCAN, tSVD …
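As a rough illustration of GPU-accelerated model training, the sketch below runs k-means through a scikit-learn-style interface. It assumes cuML is importable as `cuml` and falls back to scikit-learn otherwise. The synthetic two-blob dataset and the parameter choices are hypothetical and not taken from this deck.

```python
# Minimal sketch: k-means on GPU (cuML) or CPU (scikit-learn).
import numpy as np

try:
    from cuml.cluster import KMeans  # RAPIDS GPU implementation (assumed installed)
    BACKEND = "cuml"
except ImportError:
    from sklearn.cluster import KMeans  # CPU fallback with the same fit/predict style
    BACKEND = "sklearn"

# Hypothetical data: two well-separated blobs of 2-D points.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(100, 2))
points = np.vstack([blob_a, blob_b]).astype(np.float32)

# Fit the model and assign each point to one of two clusters.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = np.asarray(model.fit_predict(points))

print(BACKEND, np.bincount(labels))
```

The point of the sketch is the shared estimator interface: swapping the import is the only change between the CPU and GPU paths in this small case.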
8
VISUALIZATION
Effortless exploration of datasets, billions of records in milliseconds
Dynamic interaction with data = faster ML model development
Data visualization ecosystem (Graphistry & OmniSci), integrated with RAPIDS
9
TRADITIONAL DATA SCIENCE CLUSTER
Workload Profile:
Fannie Mae Mortgage Data:
• 192GB data set
• 16 years, 68 quarters
• 34.7 Million single family mortgage loans
• 1.85 Billion performance records
• XGBoost training set: 50 features
300 Servers | $3M | 180 kW
10
GPU-ACCELERATED MACHINE LEARNING CLUSTER
DGX-2 and RAPIDS for Predictive Analytics
1 DGX-2 | 10 kW
1/8 the Cost | 1/15 the Space
1/18 the Power
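The power ratio can be checked from the figures quoted for the two clusters (300 servers, $3M, 180 kW versus one DGX-2 at 10 kW). The sketch below does that arithmetic. The deck gives no DGX-2 price, so the cost figure it prints is only what the "1/8 the Cost" ratio would imply and should be read as an assumption.

```python
# Figures quoted on the slides for the traditional CPU cluster and the DGX-2.
cpu_cluster = {"servers": 300, "cost_usd": 3_000_000, "power_kw": 180}
dgx2_power_kw = 10

# Power ratio follows directly from the two quoted power figures.
power_ratio = cpu_cluster["power_kw"] / dgx2_power_kw

# Assumption: derived from the stated "1/8 the Cost" ratio, not a quoted price.
implied_dgx2_cost_usd = cpu_cluster["cost_usd"] / 8

print(f"power ratio: {power_ratio:.0f}x")
print(f"implied DGX-2 cost: ${implied_dgx2_cost_usd:,.0f}")
```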
[Chart: End-to-End comparison of 20, 30, 50, and 100 CPU nodes with DGX-2 and 5x DGX-1; axis from 0 to 10,000]
11
RAPIDS: DELIVERING DATA SCIENCE VALUE
Maximized Productivity | Top Model Accuracy | Lowest TCO
Oak Ridge National Labs | Global Retail Giant | Streaming Media Company
215x | $1B | $1.5M
Speedup Using RAPIDS with XGBoost | Potential Saving with 4% Error Rate Reduction | Infrastructure Cost Saving
12
PILLARS OF RAPIDS PERFORMANCE
CUDA Architecture: Massively Parallel Processing
NVLink/NVSwitch: High-Speed Connections between GPUs for Distributed Algorithms
Integrated Software: Fully Integrated Software and Hardware for Instant Productivity
[Diagram labels: 6x NVLink, NVSwitch, Python, Dask, DL Frameworks, RAPIDS, cuDF, cuML, cuDNN, CUDA, Apache Arrow on GPU Memory]
13
FASTER SPEEDS, REAL WORLD BENEFITS
[Charts: three bar-chart panels, each comparing 20, 30, 50, and 100 CPU nodes with DGX-2 and 5x DGX-1. Time in seconds; shorter is better.]
Panel: cuIO/cuDF (Load and Data Preparation), axis from 0 to 3,000
Panel: cuML (XGBoost), axis from 0 to 2,500
Panel: End-to-End, axis from 0 to 10,000
Legend: cuIO / cuDF (Load and Data Preparation), Data Conversion, XGBoost
Bar values shown: 2,741; 715; 2,290; 1,675; 1,956; 379; 1,999; 1,948; 42; 169; 19; 157
Benchmark: 200GB CSV dataset; data preparation includes joins, variable transformations.
CPU Cluster Configuration: CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark
DGX Cluster Configuration: 5x DGX-1 on InfiniBand network
14
SELECTING THE RIGHT RAPIDS SOLUTION
Unparalleled Data Science Performance and Productivity
ML Enthusiast
Machine Learning Developer
Data Center Machine Learning
Data Science Workstations
Shared infrastructure for Data Science Teams
Product | Benefit | GPU Memory | GPU Fabric
TITAN RTX | PC solution, easy to acquire, deploy and get started experimenting | 48GB | 2-way NVLINK
Quadro Workstation | Enterprise workstation for experienced data scientists | 64GB | 2-way NVLINK
DGX Station | Enterprise ML workgroups, largest memory on a workstation | 128GB | 4-way NVLINK
DGX-1 / HGX-1 / OEM | Enterprise server, proven 8-way configuration, modular approach for scale, multi-node training | 256GB | 8-way NVLINK
DGX-2 / HGX-2 / OEM | Largest compute and memory capacity in single node, fastest training solution | 512GB | 16-way NVSWITCH
End-to-end portfolio optimized for RAPIDS
15
WIDESPREAD SUPPORT FOR RAPIDS
Open Source Community
Enterprise Data Science Platforms
Deep Learning Integration
Startups
RAPIDS
GPU Servers
Storage Partners
* Spark and Hadoop support coming soon
16
TRANSFORMING RETAIL WITH RAPIDS
Inventory Forecast
180x speedup using RAPIDS with cuDF
10 stores, 1 million rows
600 stores, 60 million rows
“My previous bottleneck was I/O. …15 seconds to pull in data for 10 stores (about 1 Million rows). With RAPIDS, we can pull in data for about 600 stores (60 Million rows) in less than 5 seconds. … just plain awesome.”
— A mid-market specialty retailer with 4800 stores
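The 180x figure is consistent with the numbers in the quote when compared as throughput (rows per second). The sketch below works through that arithmetic. It treats "less than 5 seconds" as exactly 5 seconds, which is an assumption that makes the result a lower bound.

```python
# Figures from the customer quote.
before_rows, before_seconds = 1_000_000, 15  # 10 stores, about 1 million rows
after_rows, after_seconds = 60_000_000, 5    # 600 stores; assumes the full 5 s

# Compare rows loaded per second before and after.
before_throughput = before_rows / before_seconds
after_throughput = after_rows / after_seconds
speedup = after_throughput / before_throughput

print(f"before: {before_throughput:,.0f} rows/s")
print(f"after:  {after_throughput:,.0f} rows/s")
print(f"speedup: at least {speedup:.0f}x")
```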
17
TRANSFORM STREAMING MEDIA RECOMMENDATION SYSTEM WITH RAPIDS
$1.5M Infrastructure Cost Saving with 24x Speed-up on XGBoost
Hundreds of CPUs
1 GPU
Increase customer retention | Higher customer satisfaction | Increase revenue
“I got 24x speedup using RAPIDS XGBOOST and can now replace hundreds of CPU nodes running my biggest ML workload on a single node with 8 GPUs. You made XGBOOST too fast!?”
— Streaming Media Company
18
PREDICT EPIDEMIC DISEASE IN HEALTHCARE WITH RAPIDS
80x speedup on GPU-accelerated XGBoost
Days on CPUs
Hours on GPU
“Early precaution of epidemic disease is now possible with 80x faster training time on RAPIDS.”
— Dr. Jian Zong Wang, Vice Chief Engineer and Senior AI Director
(from the Largest Insurance and Internet Finance Company in China)
19
FOR MORE INFORMATION
www.nvidia.com/datascience
www.rapids.ai
20