Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Ryosuke Iwanaga, Solutions Architect, Amazon Web Services Japan
October 2016
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Agenda
• Recommendation and DSSTNE
• Data science productivity with AWS
Note: Details are not the actual Amazon case, but a general pattern
Recommendation and DSSTNE
Product Recommendations What are people who bought items A, B, C … Z most likely to purchase next?
Input and Output
Input: purchase history for each customer
Output: probability of buying each product, for each customer
Machine Learning for Recommendation
Lots of algorithms: Matrix Factorization, Logistic Regression, Naïve Bayes, etc. => Neural Network
Neural Networks for Product Recommendations
[Diagram: Input layer (10K-10M units) → Hidden layer (100-1K units) → Output layer (10K-10M units)]
This Is A Huge Sparse Data Problem
• Uncompressed sparse data either eats a lot of memory or it eats a lot of bandwidth uploading it to the GPU
• Naively running networks with uncompressed sparse data leads to lots of multiplications of zero by zero. This wastes memory, power, and time
• Product Recommendation Networks can have billions of parameters that cannot fit in a single GPU
So, summarizing...
Framework Requirements (2014)
• Efficient support for large input and output layers
• Efficient handling of sparse data (i.e. don't store zero)
• Automagic multi-GPU support for large networks and scaling
• Avoids multiplying zero and/or by zero
• 24 hour or less training and recommendations turnaround
• Human-readable descriptions of networks
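Picking up the sparse-data point above: a minimal Scala sketch, not from the talk, contrasting a dense purchase vector with an index-based sparse representation. The catalog size and purchased items are made-up numbers for illustration.

// Hypothetical: one customer's purchases over a 1M-item catalog
val catalogSize = 1000000

// Dense representation: one float per catalog item, almost all zeros.
// ~4 MB per customer, and a naive network multiplies through every zero.
val dense = Array.fill[Float](catalogSize)(0.0f)
dense(42) = 1.0f
dense(100003) = 1.0f

// Sparse representation: store only the non-zero indices.
// A handful of ints per customer, which is what DSSTNE-style engines exploit.
val sparseIndices = Array(42, 100003)

println(s"dense values stored: ${dense.length}, sparse indices stored: ${sparseIndices.length}")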
DSSTNE: Deep Sparse Scalable Tensor Network Engine*
• A Neural Network framework released into OSS by Amazon
• Optimized for large sparse data problems and for fully connected layers
• Extremely efficient model-parallel multi-GPU support
• 100% Deterministic Execution
• Full SM 3.x, 5.x, and 6.x support (Kepler or better GPUs)
• Distributed training support OOTB (~20 lines of MPI calls)
* Pronounced "Destiny"
Describes Neural Networks As JSON Objects

{
    "Version" : 0.7,
    "Name" : "AE",
    "Kind" : "FeedForward",
    "SparsenessPenalty" : { "p" : 0.5, "beta" : 2.0 },
    "ShuffleIndices" : false,
    "Denoising" : { "p" : 0.2 },
    "ScaledMarginalCrossEntropy" : {
        "oneTarget" : 1.0, "zeroTarget" : 0.0, "oneScale" : 1.0, "zeroScale" : 1.0
    },
    "Layers" : [
        { "Name" : "Input", "Kind" : "Input", "N" : "auto", "DataSet" : "input", "Sparse" : true },
        { "Name" : "Hidden", "Kind" : "Hidden", "Type" : "FullyConnected", "N" : 128, "Activation" : "Sigmoid", "Sparse" : true },
        { "Name" : "Output", "Kind" : "Output", "Type" : "FullyConnected", "DataSet" : "output", "N" : "auto", "Activation" : "Sigmoid", "Sparse" : true }
    ],
    "ErrorFunction" : "ScaledMarginalCrossEntropy"
}
Summary for DSSTNE
• Very efficient performance for sparse, fully-connected NNs
• Multi-GPU via model parallelism and data parallelism
• Networks declared in a human-readable JSON format
• 100% deterministic execution
Data science productivity with AWS
Productivity
Agile iteration is the most important thing for productivity: design => train => predict => evaluate => design => …
Training: GPU (DSSTNE and others)
Pre/post processing: CPU
How to unify these different workloads? Data scientists don't want to use too many tools
What are Containers?
OS virtualization, process isolation, images, automation
[Diagram: Server, Guest OS, Bins/Libs, App1, App2]
Deep Learning meets Docker (Containers)
A lot of Deep Learning frameworks: DSSTNE, Caffe, Theano, TensorFlow, etc.
To compare frameworks on the same input and output, containerize each framework
Just swap the container image and configuration
No more worrying about setting up machines!
Spark moves at interactive speed
[Diagram: an RDD lineage A-F split into Stages 1-3 by map, join, filter, and groupBy, with cached partitions marked]
• Massively parallel
• Uses DAGs instead of map-reduce for execution
• Minimizes I/O by storing data in DataFrames in memory
• Partitioning-aware to avoid network-intensive shuffle
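As a rough illustration of that pattern (not code from the talk; the bucket, paths, and columns are hypothetical), a Spark 2.x job can cache a cleaned DataFrame once and then iterate on joins and aggregations interactively:

import org.apache.spark.sql.SparkSession

object PurchaseStats {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("purchase-stats").getOrCreate()
    import spark.implicits._

    // Hypothetical purchase log on S3: (customerId, itemId, price)
    val purchases = spark.read.parquet("s3://example-bucket/purchases/")
      .filter($"price" > 0)
      .cache()  // keep the cleaned partitions in memory for repeated queries

    // Hypothetical item catalog: (itemId, category)
    val items = spark.read.parquet("s3://example-bucket/items/")

    // groupBy and join both reuse the cached partitions instead of re-reading S3
    purchases.groupBy($"itemId").count()
      .join(items, "itemId")
      .orderBy($"count".desc)
      .show(10)

    spark.stop()
  }
}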
Apache Zeppelin notebook to develop queries
Architecture
Control the CPU cluster and the GPU cluster
Both CPU and GPU jobs are submitted via the Spark driver
CPU jobs: normal Spark tasks running on Amazon EMR
GPU jobs: Spark submits jobs to Amazon ECS
Not only DSSTNE but also other DL frameworks, packaged with Docker
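A rough sketch of that control flow from the driver's point of view; everything here (paths, method names) is a hypothetical placeholder rather than the actual Amazon pipeline. CPU preprocessing runs as Spark tasks on EMR, the result is staged on S3, and the GPU step is handed to ECS:

import org.apache.spark.sql.SparkSession

object TrainingPipeline {
  // Placeholder: implemented with the ECS RunTask API (see the sketch later in this deck)
  def submitGpuTraining(inputPath: String, configPath: String): String = ???

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("recs-pipeline").getOrCreate()

    // CPU work: preprocess purchase history into the training input on EMR
    val purchases = spark.read.parquet("s3://example-bucket/purchases/")
    purchases.select("customerId", "itemId").distinct()
      .write.mode("overwrite").parquet("s3://example-bucket/dsstne-input/")

    // GPU work: submitted from the same driver, executed on the ECS GPU cluster
    submitGpuTraining("s3://example-bucket/dsstne-input/", "s3://example-bucket/config.json")

    spark.stop()
  }
}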
Amazon EMR
Why EMR?
• Automation
• Decouple
• Elastic
• Integration
• Low-cost
• Current
Why EMR? Automation
EC2 provisioning, cluster setup, Hadoop configuration, installing applications, job submission, monitoring and failure handling
Why EMR? Decoupled Architecture
• Separate compute and storage
• Resize and shut down with no data loss
• Point multiple clusters at the same data on Amazon S3
• Easily evolve infrastructure as technology evolves
• HDFS for iterative and disk I/O intensive workloads
• Save with Spot and Reserved Instances
Why EMR? Decouple Storage and Compute
[Diagram: Amazon S3 as the shared storage layer, fed by Amazon Kinesis (Streams, Firehose) and used by a persistent cluster for interactive queries (Spark-SQL | Presto | Impala), transient clusters for batch jobs (X hours nightly, add/remove nodes), ETL jobs, and workload-specific clusters (different sizes, different versions), with a Hive external metastore, e.g. Amazon RDS]
create external table t_name(..) ... location s3://bucketname/path-to-file/
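That decoupling is also what the Spark jobs in this pattern rely on: any cluster can read from and write back to the same S3 data. A minimal sketch, assuming a SparkSession named spark (as in spark-shell or a Zeppelin notebook) and made-up bucket paths:

// Any EMR cluster, persistent or transient, can point at the same S3 data
val events = spark.read.json("s3://example-bucket/raw-events/2016/10/")

events.createOrReplaceTempView("events")
spark.sql("SELECT customerId, COUNT(*) AS purchases FROM events GROUP BY customerId")
  .write.mode("overwrite")
  .parquet("s3://example-bucket/aggregates/purchases-per-customer/")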
EMR 5.0 - Applications
Amazon ECS
Amazon EC2 Container Service (ECS)
• Container Management at Any Scale
• Flexible Container Placement
• Integration with the AWS Platform
Components of Amazon ECS
• Task: the actual containers running on instances
• Task Definition: definition of the containers and environment for a task
• Cluster: fleet of EC2 instances on which tasks run
• Manager: manages cluster resources and the state of tasks
• Scheduler: places tasks considering cluster status
• Agent: coordinates between EC2 instances and the Manager
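To make the Task Definition piece concrete, here is a hedged sketch of registering one for a DSSTNE container using the AWS SDK for Java (v1) from Scala. The family name, image URI, resource sizes, and command are hypothetical, not from the talk:

import com.amazonaws.services.ecs.AmazonECSClientBuilder
import com.amazonaws.services.ecs.model.{ContainerDefinition, RegisterTaskDefinitionRequest}

val ecs = AmazonECSClientBuilder.standard().build()

// Hypothetical container definition for a DSSTNE training image stored in ECR
val container = new ContainerDefinition()
  .withName("dsstne-train")
  .withImage("123456789012.dkr.ecr.us-east-1.amazonaws.com/dsstne:latest")
  .withCpu(4096)
  .withMemory(30000)
  .withCommand("train", "-c", "/config/config.json")  // illustrative; the real entry point depends on the image

val request = new RegisterTaskDefinitionRequest()
  .withFamily("dsstne-train")
  .withContainerDefinitions(container)

val taskDefArn = ecs.registerTaskDefinition(request).getTaskDefinition.getTaskDefinitionArn
println(s"Registered task definition: $taskDefArn")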
How Amazon ECS runs a Task
[Diagram: Scheduler, Manager, Cluster, Task Definition, Task, and Agent interacting to place and run a task]
Integration with Spark and ECS
• Install the AWS SDK for Java on the EMR cluster
• Create a Task Definition for each Deep Learning framework
• Call the RunTask API
• The ECS Scheduler will try to find enough space to run it
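A hedged sketch of that RunTask call as it might be issued from the Spark driver, again using the AWS SDK for Java (v1) from Scala. The cluster name, task definition family, container name, and environment variables are hypothetical; this fills in the submitGpuTraining placeholder from the earlier pipeline sketch:

import com.amazonaws.services.ecs.AmazonECSClientBuilder
import com.amazonaws.services.ecs.model.{ContainerOverride, KeyValuePair, RunTaskRequest, TaskOverride}
import scala.collection.JavaConverters._

def submitGpuTraining(inputPath: String, configPath: String): String = {
  val ecs = AmazonECSClientBuilder.standard().build()

  // Override the container environment so one task definition can be reused
  // for different datasets and network configurations
  val overrides = new TaskOverride().withContainerOverrides(
    new ContainerOverride()
      .withName("dsstne-train")
      .withEnvironment(
        new KeyValuePair().withName("INPUT_PATH").withValue(inputPath),
        new KeyValuePair().withName("CONFIG_PATH").withValue(configPath)))

  val result = ecs.runTask(new RunTaskRequest()
    .withCluster("gpu-cluster")          // ECS cluster of GPU instances
    .withTaskDefinition("dsstne-train")  // registered earlier
    .withCount(1)
    .withOverrides(overrides))

  // The ECS scheduler places the task wherever it finds enough capacity
  result.getTasks.asScala.head.getTaskArn
}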
Training: Model parallel
Prediction: Data parallel
Why AWS?
• Scalability
• Fully-managed services
• GPU instances
Summary
Amazon Personalization runs on AWS
• Spark and Zeppelin as the single interface for data scientists
• DSSTNE helps run DL on huge, sparse NNs
• Amazon EMR for CPU and Amazon ECS for GPU
• You can do it!
Thank you!