1. Apache Kafka + Machine Learning: Analytic Models Applied to Real Time Stream Processing. Kai Waehner, Technology Evangelist. kontakt@kai-waehner.de, LinkedIn, @KaiWaehner, www.kai-waehner.de
2. Apache Kafka and Machine Learning. Agenda: 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
3. Agenda: 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
4. Machine Learning ... allows computers to find hidden insights without being explicitly programmed where to look.
5. Real World Examples of Machine Learning: Spam Detection; Search Results + Product Recommendation; Picture Detection (Friends, Locations, Products); Your Company; The Next Disruption: Google Beats Go Champion
6. Leverage Machine Learning to Analyze and Act on Critical Business Moments. Windows of Opportunity (Seconds, Minutes, Hours): Price Optimization, Predictive Maintenance, Fraud Detection, Cross Selling, Transportation Rerouting, Customer Service, Inventory Management
7. How to realize these use cases?
8. Big Data Analytics: Volume (terabytes, petabytes), Variety (social networks, blog posts, logs, sensors, etc.), Velocity ("real time"), Value
9. Big Data Analytics for Actionable Insights: From Insight to Action (continuously closed loop)
10. Architecture diagram (Streaming Platform and Big Data Analytics) with four numbered areas: 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer. Components shown: Database, IoT Device, Streaming Producer, DWH, Data Integration, Data Lake, Model Building (Batch), Connect, Messaging, Stream Processing (Real Time), Model, REST Interface, Schema Registry / Governance, Mobile App, Streaming Consumer, BI Tool, Web Application
11. Agenda: 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
12. Architecture diagram (same as slide 10): 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer
13. Hidden Technical Debt in Machine Learning Systems (https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf): Writing source code is not the time-consuming task!
14. Analytical Pipeline: 1. Data Access 2. Data Preparation 3. Exploratory Data Analysis 4. Model Building 5. Model Execution 6. Model Validation 7. Deployment
15. Data Access: Find insights to create added business value by correlating various data sources!
16. Data Preparation (http://www.slideshare.net/odsc/feature-engineering)
17. Exploratory Data Analysis: Scripting, Visual Analytics, Machine Learning (© Copyright 2000-2017 TIBCO Software Inc.)
18. Model Building: A model is a simplification of the truth that helps you with decision making.
19. Model Execution (Coding): Apply Model to New Data
20. Model Execution (Tooling): Apply Model to New Data
21. Model Validation: Cross-Validation Procedure (https://genome.tugraz.at/proclassify/help/pages/XV.html)
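The cross-validation procedure named on slide 21 can be illustrated with a minimal k-fold split. This is only a sketch: the function name `k_fold_splits` is hypothetical, it uses plain Python rather than any of the tools from the deck, and it leaves out shuffling and the actual model training and scoring.

```python
from typing import List, Tuple


def k_fold_splits(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    splits = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, validation))
        start += size
    return splits


if __name__ == "__main__":
    # Each sample is used for validation exactly once across the folds.
    for train, validation in k_fold_splits(6, 3):
        print("train:", train, "validate:", validation)
```

In practice a model would be trained on each `train` part and scored on the matching `validation` part, and the k scores would then be averaged.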
22. Frameworks and Tooling?
23. Languages, Frameworks and Tools: Portable Format for Analytics (PFA), many more ...
24. Live Demos with Open Source Technologies: Development of Analytic Models with R, TensorFlow, Apache Spark, H2O.ai, RapidMiner
25. Live Demo. Use Case: Customer Churn Prediction. Machine Learning Algorithm: Generalized Linear Model (GLM) using Logistic Regression. Technology: Open Source R
26. Live Demo. Use Case: Airline Flight Delay Prediction. Machine Learning Algorithm: Gradient Boosted Machines (GBM) using Decision Trees. Technology: H2O.ai
27. Live Demo. Use Case: Predictive Maintenance (Anomaly Detection in Telco Networks). Deep Learning Algorithm: Artificial Neural Networks (ANN) using Autoencoders. Technology: TensorFlow + Python API
28. Live Demo. Use Case: Classification (Prediction of Titanic Survivors). Deep Learning Algorithm: Recurrent Neural Networks (RNN). Technology: RapidMiner
29. Agenda: 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
30. Analytical Pipeline: 1. Data Access 2. Data Preparation 3. Exploratory Data Analysis 4. Model Building 5. Model Execution 6. Model Validation 7. Deployment
31. Architecture diagram (same as slide 10): 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer
32. Definition of Stream Processing: Data at Rest, Data in Motion
33. Key Concepts
34. Key Concepts
35. Key Concepts
36. Stream Processing Use Cases: Real Time Applications; Stateful Streaming Analytics; Stateless “Real Time ETL”
37. Event Processing Windows: Various Options for Windowing (Fixed, Sliding, Session, ...)
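To make the three window types from slide 37 concrete, here is a small sketch that assigns integer event timestamps to fixed, sliding and session windows. The function names and the non-negative integer timestamps are assumptions for illustration; this is not the windowing API of Kafka Streams or any other framework.

```python
from typing import Dict, List


def tumbling_windows(timestamps: List[int], size: int) -> Dict[int, List[int]]:
    """Fixed windows: every event falls into exactly one window [start, start + size)."""
    windows: Dict[int, List[int]] = {}
    for ts in timestamps:
        start = (ts // size) * size
        windows.setdefault(start, []).append(ts)
    return windows


def sliding_windows(timestamps: List[int], size: int, step: int) -> Dict[int, List[int]]:
    """Sliding windows: a window of length size starts every step, so windows overlap."""
    windows: Dict[int, List[int]] = {}
    for ts in timestamps:
        # Earliest non-negative window start (a multiple of step) that still covers ts.
        first = max(0, ((ts - size) // step + 1) * step)
        for start in range(first, ts + 1, step):
            windows.setdefault(start, []).append(ts)
    return windows


def session_windows(timestamps: List[int], gap: int) -> List[List[int]]:
    """Session windows: a new session begins when the pause between events exceeds gap."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


if __name__ == "__main__":
    events = [1, 2, 5, 11, 12, 20]
    print(tumbling_windows(events, size=5))
    print(sliding_windows(events, size=4, step=2))
    print(session_windows(events, gap=3))
```

Fixed windows never overlap, sliding windows can contain the same event several times, and session windows have no fixed length because they follow the activity in the data.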
38. How to apply analytic models to real time processing without redevelopment?
39. Application of Analytic Models to Real Time without Redevelopment. Stream Processing: H2O.ai, R, Python, Spark ML, MATLAB, SAS, PMML
40. Streaming Analytics - Processing Pipeline. Stream Ingest: APIs, Adapters / Channels, Integration, Messaging. Stream Preprocessing: Transformation, Aggregation, Enrichment, Filtering. Stream Analytics: Contextual Rules, Windowing, Patterns, Analytics, Machine Learning, ... Stream Outcomes: Process Management, Analytics (Real Time), Applications & APIs, Analytics / DW, Reporting. Also shown: Index / Search, Normalization. Applying an Analytic Model is just a piece of the puzzle!
41. Frameworks and Tooling?
42. Frameworks and Products: matrix of Open Source / Closed Source and Product / Framework; includes Microsoft Azure Stream Analytics
43. When to use Kafka Streams for Stream Processing?
44. When to use Kafka Streams for Stream Processing? No need for a Big Data cluster; deploy in your existing infrastructure; Kafka manages scalability / fail-over; focus on development of business logic in your department
45. Kafka Streams: Input Stream (Kafka Topic, Kafka Cluster) to Stream Processing Microservice (Kafka Streams) to Output Stream (Kafka Topic, Kafka Cluster). Map, filter, aggregate, apply analytic model, "any business logic". Deployed anywhere: Docker, Kubernetes, Mesos, Java App, ...
46. A complete streaming microservice, ready for production at large scale. Word Count: App configuration; Define processing (here: WordCount); Start processing
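The three steps on the Word Count slide (configuration, processing definition, start) can be mimicked in a few lines of plain Python. This is an in-memory stand-in and not the Kafka Streams API: the topic names in `CONFIG`, the function `word_count` and the list that replaces the input topic are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# 1) App configuration (hypothetical values; a real app would also name the brokers).
CONFIG = {
    "application.id": "wordcount-sketch",
    "input.topic": "text-lines",
    "output.topic": "word-counts",
}


# 2) Define processing: split lines into words and keep a running count per word.
def word_count(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    counts: Counter = Counter()  # local state, comparable to a state store
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            # Emit the updated count for every occurrence, like a changelog stream.
            yield word, counts[word]


# 3) Start processing: an in-memory list stands in for the input topic.
if __name__ == "__main__":
    input_topic = ["hello kafka", "hello streams"]
    for word, count in word_count(input_topic):
        print(f"{CONFIG['output.topic']}: {word} -> {count}")
```

The generator never needs the whole input at once, which is the property that lets the same logic run on an unbounded stream.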
47. Confluent Platform: the Free, Open-Source Streaming Platform. Legend: Open Source, Commercial, External. Data sources: Database Changes, Log Events, IoT Data, Web Events, ... Apache Kafka: Kafka Core, Kafka Connect, Kafka Streams. Confluent Platform: Clients, Schema Registry, REST Proxy, Supported Connectors, Control Center, Auto-data Balancing, Multi-Data Center Replication, 24/7 Support. Surrounding systems: Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, ...; CRM, Data Warehouse, Database, Hadoop, Data Integration, ...
48. Architecture diagram (same as slide 10): 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer
49. Architecture diagram with concrete technologies (Streaming Platform and Big Data Analytics): 1) Data Producer 2) Analytics Platform 3) Streaming Platform 4) Data Consumer. Components shown: Oracle DB, CoAP IoT, Kafka Java Client, HP Vertica, Data Integration, Flume, H2O.ai / Spark / TensorFlow (Batch), Hadoop, Hive, Kafka Connect, Kafka, Confluent REST Proxy, Confluent Schema Registry, Kafka Streams (H2O.ai, Mesos), Kafka Streams (TensorFlow, Kubernetes), Avro, MQTT IoT, iPhone App, Kafka Go Client, Grafana, Java EE Web App
50. Live Demos with Open Source Technologies: Development of Analytic Models with Apache Kafka Messaging, Kafka Streams, Kafka Connect, Confluent Schema Registry
51. Live Demo. Use Case: Airline Flight Delay Prediction. Machine Learning Algorithm: Any! (in our example, H2O.ai GBM). Streaming Platform: Apache Kafka Core, Kafka Connect, Kafka Streams, Confluent Schema Registry
52. H2O.ai Model + Kafka Streams (Filter, Map): 1) Create H2O ML model 2) Configure Kafka Streams Application 3) Apply H2O ML model to Streaming Data 4) Start Kafka Streams App
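The filter and map steps on slide 52 can be sketched without any streaming library. Everything here is a stand-in: `Flight`, `DelayModel` and its toy scoring rule are hypothetical and do not reproduce the H2O.ai model or the Kafka Streams code from the demo. The point is only to show where a pre-built model plugs into the stream.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Flight:
    carrier: str
    distance_km: float
    departure_hour: int


class DelayModel:
    """Hypothetical stand-in for a model that was built and exported elsewhere."""

    def predict(self, flight: Flight) -> str:
        # Toy rule instead of a trained model: late, long flights count as delayed.
        score = 0.02 * flight.departure_hour + 0.0002 * flight.distance_km
        return "YES" if score > 0.5 else "NO"


def apply_model(events: Iterable[Flight], model: DelayModel) -> Iterator[str]:
    for flight in events:
        # Filter: drop malformed records before they reach the model.
        if flight.distance_km <= 0:
            continue
        # Map: turn each input event into a prediction for the output stream.
        yield f"{flight.carrier}: delayed={model.predict(flight)}"


if __name__ == "__main__":
    incoming = [Flight("AA", 2000.0, 20), Flight("XX", 0.0, 5), Flight("LH", 500.0, 8)]
    for prediction in apply_model(incoming, DelayModel()):
        print(prediction)
```

Because the model is passed in as an object, the stream logic stays the same when the model is replaced by one from a different framework.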
53. End-to-End Stream Monitoring and Alerting with Confluent Control Center: Data Stream Monitoring and Alerting; Multi-cluster monitoring and management; Kafka Connect Configuration. Message delivery? Delays? Where did it get stuck? Lost messages? Broker issues? Performance? (http://docs.confluent.io/3.2.0/control-center/docs/monitoring.html)
54. Agenda: 1) Machine Learning in the Real World 2) Building an Analytic Model 3) Applying an Analytic Model in Real Time 4) Online Training of Models
55. Let’s improve the analytic model continuously...
56. Analytical Pipeline: 1. Data Access 2. Data Preparation 3. Exploratory Data Analysis 4. Model Building 5. Model Execution 6. Model Validation 7. Deployment. Online Training: Continuously train and improve the model with every new event
57. Online Training of Analytic Models. How to improve models? 1. Manual Update 2. Automated Batch 3. Real Time
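The "Real Time" option, training with every new event, can be illustrated with an online logistic regression that takes one gradient step per event. This is a minimal sketch under assumptions: the class name, the learning rate and the toy data are made up, and the deck does not state which algorithm or framework was used for online training.

```python
import math
from typing import List, Sequence


class OnlineLogisticRegression:
    """Logistic regression that is updated one event at a time."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x: Sequence[float]) -> float:
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x: Sequence[float], y: int) -> None:
        # One stochastic gradient step on the log loss for a single event.
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


if __name__ == "__main__":
    model = OnlineLogisticRegression(n_features=1)
    # Toy event stream: the label is 1 when the feature is positive.
    stream = [([1.0], 1), ([-1.0], 0)] * 50
    for features, label in stream:
        model.learn_one(features, label)
    print(model.predict_proba([1.0]), model.predict_proba([-1.0]))
```

No event has to be stored after it has been used, which is what separates this style from the manual and automated batch options listed on the slide.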
58. Architecture diagram (Streaming Platform and Big Data Analytics): 1) Get new Input Event via Kafka Topic 2) Improve Model in Big Data Cluster 3) Update deployed Model via Kafka Topic 4) Leverage Improved Model for new Events. Components shown: Flume, H2O.ai / Spark / TensorFlow, Hive, Hadoop, Kafka, Confluent Schema Registry, Kafka Streams (H2O.ai, Mesos), Kafka Streams (TensorFlow, Kubernetes), Avro
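Steps 3 and 4 of this loop, updating the deployed model through a topic and using it for later events, can be sketched as a processor that reads two kinds of messages. The topic names `events` and `model-updates`, the `threshold_model` factory and the in-memory message list are hypothetical; a real deployment would consume from Kafka topics and deserialize the model.

```python
from typing import Callable, Iterable, Iterator, Tuple, Union

Model = Callable[[float], str]
Message = Tuple[str, Union[float, Model]]


def threshold_model(threshold: float) -> Model:
    """Hypothetical model: flag a sensor reading that exceeds the threshold."""
    return lambda value: "ALERT" if value > threshold else "OK"


def process(messages: Iterable[Message], initial_model: Model) -> Iterator[str]:
    model = initial_model
    for topic, payload in messages:
        if topic == "model-updates":
            # Step 3: swap in the improved model without stopping the processor.
            model = payload
        elif topic == "events":
            # Step 4: every later event is scored with the current model.
            yield model(payload)


if __name__ == "__main__":
    messages = [
        ("events", 7.0),
        ("model-updates", threshold_model(5.0)),
        ("events", 7.0),
    ]
    print(list(process(messages, threshold_model(10.0))))
```

The same reading is scored differently before and after the update, which is why the caveat about validating a model before it reaches production matters for this pattern.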
59. Caveats for Online Model Training: Processes and infrastructure not ready; Validation needed before production; Slows down the system; Only a few ML implementations supported; Many use cases do not need it
60. Key Take-Aways: Insights are hidden in Historical Data on Big Data Platforms; Machine Learning and Big Data Analytics find these Insights by building Analytic Models; a Streaming Platform uses these Models (without Redevelopment) to take Action in Real Time
61. Questions? Feedback? Please contact me! Kai Waehner, Technology Evangelist. kontakt@kai-waehner.de, @KaiWaehner, www.kai-waehner.de, LinkedIn

Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams
