NEURAL NETWORKS AND DEEP LEARNING BY ASIM JALIS GALVANIZE
WHO AM I?
ASIM JALIS Galvanize/Zipfian, Data Engineering Cloudera, Microsoft, Salesforce MS in Computer Science from University of Virginia https://www.linkedin.com/in/asimjalis
WHAT IS GALVANIZE’S DATA ENGINEERING PROGRAM?
DO YOU WANT TO . . . Play with terabytes of data Build data applications using Spark, Hadoop, Hive, Kafka, Storm, HBase Use Data Science algorithms at scale
WHAT IS INVOLVED? Learn concepts in interactive lectures Develop skills in hands-on labs Design and build your Capstone Project Show project to SF tech companies at Hiring Day
FOR MORE INFORMATION Check out http://galvanize.com Talk to me: asim.jalis@galvanize.com
INTRO
WHAT IS THIS TALK ABOUT? What are Neural Networks and how do they work? What is Deep Learning? What is the difference? How can we build neural networks in Apache Spark?
HOW MANY PEOPLE HERE ARE FAMILIAR WITH NEURAL NETWORKS?
HOW MANY PEOPLE HERE ARE FAMILIAR WITH CONVOLUTION NEURAL NETWORKS?
HOW MANY PEOPLE HERE ARE FAMILIAR WITH DEEP LEARNING?
HOW MANY PEOPLE HERE ARE FAMILIAR WITH APACHE SPARK AND MLLIB?
NEURAL NETWORKS
WHAT IS A NEURON?
Receives signals on synapses When triggered, sends signal on axon
Mathematical abstraction Inspired by biological neuron Either on or off based on sum of input
Neuron is a mathematical function Adds up (weighted) inputs Applies the sigmoid function This determines if it fires or not
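The mathematical neuron described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names, the bias term, and the example weights are assumptions.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias=0.0):
    """Add up the weighted inputs, then apply the sigmoid function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Hypothetical weights: the first input excites the neuron, the second inhibits it.
activation = neuron([1.0, 0.0], [4.0, -4.0])
fires = activation > 0.5  # treat an activation above 0.5 as "firing"
print(activation, fires)
```

The 0.5 cutoff is one common way to turn the smooth sigmoid output into an on/off decision.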
WHAT ARE NEURAL NETWORKS? Biologically inspired machine learning algorithm Mathematical neurons arranged in layers Accumulate signals from the previous layer Fire when signal reaches threshold
HOW MANY NEURONS SHOULD I HAVE IN MY NETWORK?
HOW MANY INPUT LAYER NEURONS SHOULD WE HAVE?
The number of inputs or features
HOW MANY OUTPUT LAYER NEURONS SHOULD WE HAVE?
The number of classes we are classifying the input into.
HOW MANY HIDDEN LAYER NEURONS SHOULD WE HAVE?
SIMPLEST OPTION IS TO USE 0.
SINGLE LAYER PERCEPTRON
WHAT ARE THE DOWNSIDES OF NO HIDDEN LAYERS? Only works if data is linearly separable. Identical to logistic regression.
MULTILAYER PERCEPTRON For most realistic classification tasks you will need a hidden layer. Rule of thumb: Number of hidden layers equals one. Number of neurons in the hidden layer is the mean of the sizes of the input and output layers.
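The rule of thumb above can be written down directly. A minimal Python sketch follows; the function name hidden_layer_size and the example of 784 inputs and 10 classes are assumptions for illustration only.

```python
def hidden_layer_size(n_inputs, n_outputs):
    """Rule of thumb: mean of the input and output layer sizes, rounded."""
    return round((n_inputs + n_outputs) / 2)

# Hypothetical example: 784 input features (28x28 pixels) and 10 classes.
layers = [784, hidden_layer_size(784, 10), 10]
print(layers)
```

Treat the result as a starting point for cross-validation rather than a final answer.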
HOW DO WE USE THIS THING?
NEURAL NETWORK WORKFLOW Split labeled data into train and test sets Train with labeled data Test and compare prediction with actual labels
HOW DO WE TRAIN IT?
FEED FORWARD Also called forward propagation or forward prop Initialize inputs Weigh inputs into hidden layer, sum, apply sigmoid Calculate activation of hidden layer Weigh inputs into output layer, sum, apply sigmoid Calculate activation of output layer
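The feed forward steps above can be sketched in plain Python for a network with one hidden layer. This is an illustration under assumptions: the function names, the bias terms, and the example weights are hypothetical and not taken from the talk.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """weights[j][i] connects input i to neuron j of this layer."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def feed_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Weigh, sum, and apply sigmoid for the hidden layer, then the output layer."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    output = layer_forward(hidden, output_w, output_b)
    return hidden, output

# Hypothetical 2-2-1 network with hand-picked weights.
hidden, output = feed_forward(
    [1.0, 0.5],
    hidden_w=[[0.1, -0.2], [0.3, 0.4]], hidden_b=[0.0, 0.0],
    output_w=[[0.5, -0.6]], output_b=[0.0],
)
print(hidden, output)
```

Each layer's activations become the inputs of the next layer, which is all that forward prop does.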
BACK PROPAGATION Use forward prop to calculate the error Error is function of all network weights Adjust weights using gradient descent Repeat with next record Keep going over training set until convergence
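One back propagation step can be sketched as follows for a small network with one hidden layer and a single output. This is a minimal sketch under assumptions: squared error as the error function, no bias terms, a fixed number of passes instead of a convergence check, and hypothetical function names.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward prop: returns hidden activations and the single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, y

def error(x, target, w_hidden, w_out):
    """Squared error for one record, as a function of all the weights."""
    _, y = forward(x, w_hidden, w_out)
    return 0.5 * (y - target) ** 2

def backprop_step(x, target, w_hidden, w_out, learning_rate=0.5):
    """One gradient descent update of every weight for one record."""
    hidden, y = forward(x, w_hidden, w_out)
    # Error signal at the output, using sigmoid'(z) = y * (1 - y).
    delta_out = (y - target) * y * (1.0 - y)
    # Push the error signal back through the output weights.
    delta_hidden = [
        delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
        for j in range(len(hidden))
    ]
    new_w_out = [
        w_out[j] - learning_rate * delta_out * hidden[j]
        for j in range(len(hidden))
    ]
    new_w_hidden = [
        [w_hidden[j][i] - learning_rate * delta_hidden[j] * x[i]
         for i in range(len(x))]
        for j in range(len(hidden))
    ]
    return new_w_hidden, new_w_out

def train(records, w_hidden, w_out, epochs=1000, learning_rate=0.5):
    """Repeat the update for each record, going over the training set many times."""
    for _ in range(epochs):
        for x, target in records:
            w_hidden, w_out = backprop_step(x, target, w_hidden, w_out, learning_rate)
    return w_hidden, w_out
```

A real training loop would usually stop when the error stops improving rather than after a fixed number of epochs.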
WHAT IS GRADIENT DESCENT?
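HOW DO YOU FIND THE MINIMUM IN AN N-DIMENSIONAL SPACE? Take a step in the steepest downhill direction. The steepest direction is given by the gradient: the vector of partial derivatives, one per dimension. Step against the gradient to go downhill.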
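Gradient descent can be illustrated on a simple bowl-shaped function. The sketch below is an assumption-laden example: the function f(x, y) = (x - 3)^2 + (y + 1)^2, the learning rate, and the step count are chosen for illustration and do not come from the talk.

```python
def gradient(point):
    """Partial derivatives of f(x, y) = (x - 3)^2 + (y + 1)^2."""
    x, y = point
    return (2.0 * (x - 3.0), 2.0 * (y + 1.0))

def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, which points uphill."""
    point = start
    for _ in range(steps):
        grad = gradient(point)
        point = tuple(p - learning_rate * g for p, g in zip(point, grad))
    return point

minimum = gradient_descent((0.0, 0.0))
print(minimum)  # close to (3, -1), the bottom of the bowl
```

In a neural network the same loop runs over the error as a function of the weights, with one dimension per weight.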
PUTTING ALL THIS TOGETHER
Use forward prop to activate Use back prop to train Then use forward prop to test
WHY NOT HAVE MULTIPLE LAYERS?
DOWNSIDE OF MULTIPLE LAYERS Number of weights between two adjacent layers is the product of their sizes The mathematics quickly becomes intractable Particularly when your input is an image with tens of thousands of pixels
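To get a feel for how quickly the weight count grows, here is a small Python sketch. The function name count_weights and the layer sizes are hypothetical, and bias terms are ignored.

```python
def count_weights(layer_sizes):
    """Sum over adjacent layer pairs of the product of their sizes."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

small = count_weights([4, 4, 3])  # 4*4 + 4*3
# Hypothetical 200x200 pixel image, two hidden layers of 20,000 neurons, 10 classes.
large = count_weights([40000, 20000, 20000, 10])
print(small, large)
```

With these assumed sizes the image network needs over a billion weights, each of which is a dimension for gradient descent.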
APACHE SPARK MLLIB
WHAT IS SPARK
Framework for processing data across a cluster By sending the code to the data And executing the code where the data lives
WHAT IS MLLIB? Library for Machine Learning. Builds on top of Spark RDDs. Provides RDDs for Machine Learning. Implements common Machine Learning algorithms.
DEMO USING APACHE TOREE
WHAT IS APACHE TOREE? Like IPython Notebook but for Spark/Scala. Jupyter kernel for Spark/Scala.
HOW CAN I INSTALL TOREE? Use pip to install IPython or Jupyter. Install Apache Spark by downloading tgz file and expanding. Then run:
SPARK_HOME=$HOME/spark-1.6.0
pip install toree
jupyter toree install --spark_home=$SPARK_HOME
HOW CAN I RUN A TOREE NOTEBOOK? Run jupyter notebook. Visit http://localhost:8888 Create new notebook. Set kernel to Toree. sc in notebook should print Spark Context.
NEURAL NETWORK CONSTRUCTION
HOW CAN I FIGURE OUT HOW MANY LAYERS? To figure out how many layers to use and what topology to use you have to rely on standard machine learning techniques. Use cross-validation. In general k-fold cross validation. 10-fold cross validation is popular.
WHAT IS 10-FOLD CROSS VALIDATION OR K-FOLD CROSS VALIDATION?
Split your data into 10 (or in general k) equal-sized subsets. Train model on 9 of them, set one aside for cross-validation. Validate model on 10th and remember your error rate. Repeat by setting aside each one of the 10. Average the 10 error rates. Then repeat for the next model. Choose the model with the lowest error rate.
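The k-fold procedure above can be sketched in plain Python. Everything named here is hypothetical: the helper functions, the toy data set, and the two candidate "models" (a threshold classifier and a classifier that always predicts 0) exist only to make the steps concrete.

```python
def k_fold_splits(records, k=10):
    """Yield (training, validation) pairs, holding out each fold once."""
    folds = [records[i::k] for i in range(k)]
    for held_out in range(k):
        training = [r for i, fold in enumerate(folds) if i != held_out for r in fold]
        yield training, folds[held_out]

def error_rate(model, records):
    """Fraction of records whose predicted label differs from the actual label."""
    return sum(1 for x, label in records if model(x) != label) / len(records)

def cross_validated_error(records, fit, k=10):
    """Average the k validation error rates for one kind of model."""
    errors = [error_rate(fit(training), validation)
              for training, validation in k_fold_splits(records, k)]
    return sum(errors) / len(errors)

def fit_always_zero(training):
    return lambda x: 0

def fit_threshold(training):
    """Put the threshold halfway between the two classes seen in training."""
    highest_zero = max(x for x, label in training if label == 0)
    lowest_one = min(x for x, label in training if label == 1)
    threshold = (highest_zero + lowest_one) / 2.0
    return lambda x: 1 if x >= threshold else 0

# Toy labeled data: label is 1 when x is at least 50.
records = [(x, 1 if x >= 50 else 0) for x in range(100)]
candidates = {"always_zero": fit_always_zero, "threshold": fit_threshold}
scores = {name: cross_validated_error(records, fit) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

For a neural network the candidates would be different topologies or hyper-parameters, and real data should be shuffled before it is split into folds.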
HOW DO I DEPLOY MY NEURAL NETWORK INTO PRODUCTION? There are two phases. The training phase can be run on the back-end servers. Cross-validate your model and its hyper-parameters on the back-end. Then deploy the model to the front-end servers, browsers, devices. The front-end only uses forward prop and is always fast.
DEEP LEARNING
WHAT IS DEEP LEARNING? Deep Learning is a learning method that can train the system with more than 2 or 3 non-linear hidden layers.
WHAT IS DEEP LEARNING? Machine learning techniques which enable unsupervised feature learning and pattern analysis/classification. The essence of deep learning is to compute representations of the data. Higher-level features are defined from lower-level ones.
HOW IS DEEP LEARNING DIFFERENT FROM REGULAR NEURAL NETWORKS? Training neural networks requires applying gradient descent on millions of dimensions. This is intractable for large networks. Deep learning places constraints on neural networks. This allows them to be solvable iteratively. The constraints are generic.
WHAT IS THE BIG DEAL ABOUT IT? AlexNet, submitted to the ImageNet ILSVRC challenge in 2012, is partly responsible for the renaissance. Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used Deep Learning techniques. They combined these with GPUs and some other techniques. The result was a neural network that could classify images into 1,000 categories, including cats and dogs. It had a top-5 error of about 16% compared to about 26% for the runner-up.
ILYA SUTSKEVER, ALEX KRIZHEVSKY, GEOFFREY HINTON
WHAT ARE THE DIFFERENT KINDS OF DEEP ARCHITECTURES? Generative Discriminative Hybrid
WHAT ARE GENERATIVE ARCHITECTURES? Extract features from data Find common features in unlabelled data Like Principal Component Analysis Unsupervised: no labels required
WHAT ARE DISCRIMINATIVE ARCHITECTURES? Classify inputs into classes Require labels Require supervised training
WHAT ARE HYBRID ARCHITECTURES? Combination of generative and discriminative STEP 1 Extract features using generative network Use unsupervised learning STEP 2 Train discriminative network on extracted features Use supervised learning
WHAT ARE AUTO-ENCODERS? An auto-encoder is a learning algorithm. It applies backpropagation and sets the target values to be equal to its inputs. In other words it trains itself to do the identity transformation.
WHY DOES IT DO THIS? By placing constraints on it, like restricting the number of hidden neurons, it can find a good representation of the data.
IS THE AUTO-ENCODER SUPERVISED OR UNSUPERVISED?
It is unsupervised. The data is unlabeled. Auto-encoders are similar to PCA (Principal Component Analysis). PCA is a technique for reducing the dimensions of data.
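A tiny auto-encoder makes the idea concrete. The sketch below is a simplification under stated assumptions: it is linear (no sigmoid), has 2 inputs, 1 hidden neuron, and 2 outputs, and trains on made-up points that lie on a line. The names and numbers are hypothetical.

```python
# Unlabeled toy data: 2-dimensional points that all lie on the line y = x.
data = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]

def reconstruct(x, enc, dec):
    """Encode 2 inputs into 1 hidden value, then decode back to 2 outputs."""
    h = enc[0] * x[0] + enc[1] * x[1]
    return h, (dec[0] * h, dec[1] * h)

def loss(enc, dec):
    """Mean squared reconstruction error; the target is the input itself."""
    total = 0.0
    for x in data:
        _, x_hat = reconstruct(x, enc, dec)
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)

def train(enc, dec, learning_rate=0.02, epochs=200):
    """Backpropagation with gradient descent over the whole data set."""
    enc, dec = list(enc), list(dec)
    n = len(data)
    for _ in range(epochs):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            h, x_hat = reconstruct(x, enc, dec)
            err = [x_hat[i] - x[i] for i in range(2)]
            for i in range(2):
                g_dec[i] += 2.0 * err[i] * h / n
            back = sum(2.0 * err[i] * dec[i] for i in range(2))  # error at hidden value
            for j in range(2):
                g_enc[j] += back * x[j] / n
        enc = [enc[j] - learning_rate * g_enc[j] for j in range(2)]
        dec = [dec[i] - learning_rate * g_dec[i] for i in range(2)]
    return enc, dec

initial_loss = loss([0.5, 0.5], [0.5, 0.5])
enc, dec = train([0.5, 0.5], [0.5, 0.5])
final_loss = loss(enc, dec)
print(initial_loss, final_loss)
```

Because the single hidden neuron is a bottleneck, the network has to compress each point into one number, which is the same kind of dimension reduction that PCA performs on this data.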
WHAT ARE CONVOLUTION NEURAL NETWORKS? Feedforward neural networks. Connection pattern inspired by visual cortex.
CONVOLUTION NEURAL NETWORKS The convolution layer’s parameters are a set of learnable filters. Every filter is small along width and height. During the forward pass, each filter slides across the width and height of the input, producing a 2-dimensional activation map. As we slide across the input we compute the dot product between the filter and the input.
CONVOLUTION NEURAL NETWORKS Intuitively, the network learns filters that activate when they see a specific type of feature anywhere. In this way it creates translation invariance.
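The sliding dot product can be sketched in plain Python. This is an illustration under assumptions: no padding, a stride of 1, a hand-written edge filter instead of a learned one, and hypothetical names. Like most neural network libraries, it does not flip the filter.

```python
def convolve2d(image, kernel):
    """Slide the filter over the image; each output is a dot product."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[r + a][c + b]
                for a in range(kh) for b in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical filter that activates on a dark-to-bright vertical edge.
edge_filter = [[-1, 1],
               [-1, 1]]

image = [[0, 0, 1, 1]] * 4    # edge between columns 1 and 2
shifted = [[0, 1, 1, 1]] * 4  # same edge, one column to the left

activation_map = convolve2d(image, edge_filter)
shifted_map = convolve2d(shifted, edge_filter)
print(activation_map)
print(shifted_map)
```

The filter responds with the same strength wherever the edge appears; only the position of the peak in the activation map changes.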
WHAT IS A POOLING LAYER? The pooling layer reduces the resolution of the image further. It tiles the output area with 2x2 mask and takes the maximum activation value of the area.
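Max pooling with a 2x2 mask can be sketched as follows. The function name and the example activation values are hypothetical, and the sketch assumes non-overlapping tiles, dropping any leftover row or column.

```python
def max_pool_2x2(activation_map):
    """Tile the map with 2x2 blocks and keep the maximum of each block."""
    rows, cols = len(activation_map), len(activation_map[0])
    return [
        [
            max(activation_map[r][c], activation_map[r][c + 1],
                activation_map[r + 1][c], activation_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]

activations = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]
pooled = max_pool_2x2(activations)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4x4 map shrinks to 2x2, so the next layer has a quarter as many inputs to process.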
DOES SPARK SUPPORT DEEP LEARNING? Not directly yet https://issues.apache.org/jira/browse/SPARK-2352
WHAT ARE SOME MAJOR DEEP LEARNING PLATFORMS?
Theano: Low-level GPU-enabled tensor library. Lasagne, Blocks: NN libraries that make Theano easier to use. Torch7: NN library. Uses Lua for binding. Used by Facebook and Google DeepMind. Caffe: NN library by the Berkeley Vision and Learning Center. Pylearn2: ML library based on Theano by the LISA lab at Université de Montréal. cuDNN: NN library by Nvidia based on CUDA. Can be used with Torch7, Caffe. Chainer: NN library that uses CUDA. TensorFlow: NN library from Google.
WHAT LANGUAGE ARE THESE IN? All of these frameworks support Python, except Torch7, which uses Lua for its binding language.
WHAT CAN I DO ON SPARK? SparkNet: Integrates running Caffe with Spark. Sparkling Water: Integrates H2O with Spark. DeepLearning4J: JVM library that integrates with Spark. TensorFlow on Spark (experimental)
QUESTIONS
GALVANIZE DATA ENGINEERING
