Simplified Machine Learning Architecture with an Event Streaming Platform (Apache Kafka + TensorFlow I/O)
This deck outlines a simplified machine learning architecture built on an event streaming platform. Its use cases include predictive maintenance, fraud detection, and improved customer experience, illustrated by the connected car infrastructure of a global automotive company. It presents Apache Kafka as a scalable, technology-agnostic infrastructure for machine learning that covers real-time data ingestion, preprocessing, model training, and model deployment. The key takeaways are to leverage the Apache Kafka open source ecosystem, to use streaming machine learning with Kafka and TensorFlow I/O, and not to underestimate the hidden technical debt in machine learning systems.
1.
1 Simplified Machine Learning Architecture with an Event Streaming Platform
Kai Waehner | Technology Evangelist, Confluent
contact@kai-waehner.de | LinkedIn | @KaiWaehner | www.confluent.io | www.kai-waehner.de
2.
2 Machine Learning to Improve Traditional and to Build New Use Cases
Windows of Opportunity: Seconds, Minutes, Hours
Use cases: Real Time Tracking, Predictive Maintenance, Fraud Detection, Cross Selling, Transportation Rerouting, Customer Service, Inventory Management, Autonomous Driving, Face Recognition, Robotics, Speech Translation, Video Generation, Supply Chain Optimization, Strategic Planning
3.
3 Global Automotive Company Builds Connected Car Infrastructure
Digital Transformation • Improve customer experience • Increase revenue • Reduce risk
Time:
• 3 years ago: Project begins
• Today: Connected car infrastructure in production for first use cases
• 2 years in the future: Improved processes leveraging machine learning (predictive maintenance, cross-selling)
4.
4 Streaming Analytics for Predictive Maintenance at Scale
[Architecture diagram. Components: Car Sensors, IoT Integration Layer, Streaming Platform, Big Data Integration Layer, Batch Analytics Platform, BI Dashboard, Real Time Monitoring System, Human Intelligence. Data flows: Ingest Data, All Data, Critical Data. Legend: Streaming Platform, Other Components.]
5.
5 Machine Learning (ML)
...allows computers to find hidden insights without being explicitly programmed where to look.
Machine Learning • Decision Trees • Naïve Bayes • Clustering • Neural Networks • Etc.
Deep Learning • CNN • RNN • Transformer • Autoencoder • Etc.
6.
6 Streaming Analytics for Predictive Maintenance at Scale
[Architecture diagram. Components: Car Sensors, IoT Integration Layer, Streaming Platform, Big Data Integration Layer, Batch Analytics Platform, BI Dashboard, Real Time Monitoring System, Analytics Platform, Analytic Model. Data flows: Ingest Data, All Data, Critical Data, Potential Detect. Steps: Consume Data, Preprocess Data, Data Processing, Train Analytic Model, Deploy Analytic Model. Legend: Streaming Platform, Analytics Platform, Other Components.]
12 A Streaming Platform is the Underpinning of an Event-driven Architecture
[Diagram. Producers: Microservices, DBs, SaaS apps, Mobile. Event sources: Database change, Microservices events, SaaS data, Customer experiences. Platform: Streams of real time events, Stream processing apps, Connectors. Consumers: Customer 360, Real-time fraud detection, Data warehouse.]
13.
13 Apache Kafka at Scale at Tech Giants
> 4.5 trillion messages / day
> 6 Petabytes / day
“You name it”
* Kafka is not just used by tech giants
** Kafka is not just used for big data
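To make the daily figures easier to compare with per-second rates, the arithmetic can be sketched as follows. This assumes decimal petabytes (10^15 bytes) and treats both numbers as lower bounds, as on the slide. The slide does not say whether both figures describe the same deployment, so the sketch does not derive an average message size from them.

```python
# Convert the slide's daily lower bounds into per-second rates.
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = 4.5e12   # "> 4.5 trillion messages / day"
bytes_per_day = 6e15        # "> 6 Petabytes / day", assuming 1 PB = 10**15 bytes

messages_per_second = messages_per_day / SECONDS_PER_DAY
gigabytes_per_second = bytes_per_day / SECONDS_PER_DAY / 1e9

print(f"more than {messages_per_second:,.0f} messages per second")
print(f"more than {gigabytes_per_second:.1f} GB per second")
```

Under these assumptions the rates come to roughly 52 million messages per second and about 69 GB per second, averaged over a day.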
14.
14 Business Value per Use Case
Key Drivers ($↑ $↓ $↔):
• Increase Revenue (make money)
• Decrease Costs (save money)
• Mitigate Risk (protect money)
Strategic Objectives (sample):
• Improve Customer Experience (CX)
• Core Business Platform
• Increase Operational Efficiency
• Migrate to Cloud
• Regulatory
• Digital Transformation
Example Use Cases:
• Fraud Detection
• IoT sensor ingestion
• Digital replatforming / Mainframe Offload
• Customer 360
• Faster transactional processing / analysis incl. Machine Learning / AI
• Microservices Architecture
• Online Fraud Detection
• Online Security (syslog, log aggregation, Splunk replacement)
• Middleware replacement
• Website / Core Operations (Central Nervous System)
• Real-time app updates
Example Case Studies (of many):
• Connected Car: Navigation & improved in-car experience: Audi
• Simplifying Omni-channel Retail at Scale: Target
• Mainframe Offload: RBC
• Application Modernization: Multiple Examples
• The [Silicon Valley] Digital Natives: LinkedIn, Netflix, Uber, Yelp...
• Predictive Maintenance: Audi
• Streaming Platform in a regulated environment (e.g. Electronic Medical Records): Celmatix
• Real Time Streaming Platform for Communications and Beyond: Capital One
• Developer Velocity - Building Stateful Financial Applications with Kafka Streams: Funding Circle
• Detect Fraud & Prevent Fraud in Real Time: PayPal
• Kafka as a Service - A Tale of Security and Multi-Tenancy: Apple
19 Preprocessing with KSQL
    SELECT car_id, event_id, c.car_model_id, sensor_input
    FROM car_sensor c
    LEFT JOIN car_models m ON c.car_model_id = m.car_model_id
    WHERE m.car_model_type = 'Audi_A8';
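For readers less familiar with KSQL, the logic of the query above can be sketched in plain Python. This is an illustration only, not how KSQL executes the statement: the sample events, the `CAR_MODELS` lookup table, and the `preprocess` function are all hypothetical names and data introduced here.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical lookup table standing in for the car_models table.
CAR_MODELS: Dict[int, Dict[str, str]] = {
    1: {"car_model_type": "Audi_A8"},
    2: {"car_model_type": "Audi_A4"},
}


def preprocess(events: Iterable[dict], models: Dict[int, dict],
               model_type: str = "Audi_A8") -> Iterator[dict]:
    """Join each sensor event to its car model and keep one model type."""
    for event in events:
        model = models.get(event["car_model_id"])  # join lookup by car_model_id
        # The WHERE clause filters on the right-hand table, so events without
        # a matching model are dropped even though the join is a LEFT JOIN.
        if model is None or model["car_model_type"] != model_type:
            continue
        yield {
            "car_id": event["car_id"],
            "event_id": event["event_id"],
            "car_model_id": event["car_model_id"],
            "sensor_input": event["sensor_input"],
        }


# Hypothetical sensor events standing in for the car_sensor stream.
events = [
    {"car_id": "c1", "event_id": 100, "car_model_id": 1, "sensor_input": 0.7},
    {"car_id": "c2", "event_id": 101, "car_model_id": 2, "sensor_input": 0.2},
    {"car_id": "c3", "event_id": 102, "car_model_id": 9, "sensor_input": 0.5},
]
result = list(preprocess(events, CAR_MODELS))
print(result)
```

Only the event for car `c1` survives: `c2` belongs to a different model type and `c3` has no matching model.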
20.
20 Data Ingestion into a Data Store for Model Training (and Consumption by other Decoupled Applications)
[Diagram labels: Connect, Preprocessed Data, Batch, Near Real Time, Real Time]
23 Direct streaming ingestion for model training with TensorFlow I/O + Kafka Plugin (no additional data storage like S3 or HDFS required!)
[Diagram labels: Producer, Distributed Commit Log, Time, Model A, Model B]
Streaming Ingestion and Model Training with TensorFlow IO
https://github.com/tensorflow/io
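The idea that several models can train directly from the same commit log, each at its own position, can be sketched without any Kafka or TensorFlow dependency. The code below is a simulation under stated assumptions: `CommitLog` is a hypothetical in-memory stand-in for a single topic partition, and `OnlineMeanModel` is a toy stand-in for a real model. It does not use the TensorFlow I/O Kafka API.

```python
from typing import List


class CommitLog:
    """Append-only in-memory log; a stand-in for one topic partition."""

    def __init__(self) -> None:
        self._records: List[float] = []

    def append(self, value: float) -> int:
        self._records.append(value)
        return len(self._records) - 1  # offset of the record just written

    def read(self, offset: int, max_records: int) -> List[float]:
        # Reading never removes records, so any consumer can replay the log.
        return self._records[offset:offset + max_records]


class OnlineMeanModel:
    """Toy model: learns the running mean of the sensor values it has seen."""

    def __init__(self, name: str, start_offset: int = 0) -> None:
        self.name = name
        self.offset = start_offset  # each model tracks its own position
        self.count = 0
        self.mean = 0.0

    def train_from(self, log: CommitLog, batch_size: int = 2) -> None:
        """Consume batches from the current offset until the log is exhausted."""
        while True:
            batch = log.read(self.offset, batch_size)
            if not batch:
                break
            for value in batch:
                self.count += 1
                self.mean += (value - self.mean) / self.count
            self.offset += len(batch)


log = CommitLog()
for v in [1.0, 2.0, 3.0, 4.0]:      # the producer writes the first events
    log.append(v)

model_a = OnlineMeanModel("Model A", start_offset=0)
model_a.train_from(log)             # Model A trains on offsets 0..3

for v in [5.0, 6.0]:                # more events arrive later
    log.append(v)

model_b = OnlineMeanModel("Model B", start_offset=2)
model_b.train_from(log)             # Model B starts later in the log
model_a.train_from(log)             # Model A resumes where it stopped

print(model_a.name, model_a.offset, model_a.mean)
print(model_b.name, model_b.offset, model_b.mean)
```

Because the log retains its records, neither model needs a separate copy of the data in S3 or HDFS; each one only needs to remember its own offset.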
27 Model Deployment with Apache Kafka, KSQL and TensorFlow
User Defined Function (UDF):
    CREATE STREAM AnomalyDetection AS
      SELECT sensor_id, detectAnomaly(sensor_values)
      FROM car_engine;
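The slide does not show what `detectAnomaly` does internally. As a rough illustration of applying a scoring function to every event of a stream, the sketch below uses a deliberately simple stand-in: the mean absolute deviation from a fixed baseline, compared against a threshold. The baseline, the threshold, and the function names are assumptions made for this example; in the deck the UDF wraps a TensorFlow model instead.

```python
from typing import Iterable, Iterator, List, Tuple


def detect_anomaly(sensor_values: List[float], baseline: List[float],
                   threshold: float = 0.5) -> bool:
    """Stand-in scorer: flag events whose mean absolute deviation
    from the baseline exceeds the threshold (not a trained model)."""
    if len(sensor_values) != len(baseline):
        raise ValueError("sensor_values and baseline must have equal length")
    error = sum(abs(v - b) for v, b in zip(sensor_values, baseline)) / len(baseline)
    return error > threshold


def anomaly_stream(events: Iterable[Tuple[str, List[float]]],
                   baseline: List[float],
                   threshold: float = 0.5) -> Iterator[Tuple[str, bool]]:
    """Score each (sensor_id, sensor_values) event as it arrives."""
    for sensor_id, values in events:
        yield sensor_id, detect_anomaly(values, baseline, threshold)


# Hypothetical events standing in for the car_engine stream.
car_engine = [
    ("s1", [1.0, 1.1, 0.9]),   # close to the baseline
    ("s2", [3.0, 0.0, 1.0]),   # far from the baseline
]
scored = list(anomaly_stream(car_engine, baseline=[1.0, 1.0, 1.0]))
print(scored)
```

The generator mirrors the shape of the KSQL statement: one output record per input record, with the scoring function applied inline rather than in a separate serving system.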
28.
28 Streaming Analytics with Kafka and TensorFlow
[Architecture diagram. Components: Car Sensors, MQTT Proxy, Kafka Cluster, Kafka Connect, Kafka Streams Application, KSQL, TensorFlow, Elastic Search, Grafana, Analytic Model. Data flows: Ingest Data, All Data, Critical Data, Potential Detect. Steps: Consume Data, Preprocess Data, Train Analytic Model, Deploy Analytic Model. Legend: Kafka Ecosystem, TensorFlow, Other Components.]
31 Key Takeaways
• Don’t underestimate the Hidden Technical Debt in Machine Learning Systems
• Leverage the Apache Kafka Open Source Ecosystem as scalable and flexible Event Streaming Platform
• Use Streaming Machine Learning with Kafka and TensorFlow IO to simplify your Big Data Architecture
32.
32 Steigenberger Frankfurter Hof, 11 November 2019 (cnfl.io/cse19frankfurt)
NOVOTEL Zürich City West, 13 November 2019 (cnfl.io/cse19zurich)
• Ben Stopford, Office of the CTO, Confluent
• Axel Löhn, Senior Project Manager, Deutsche Bahn
• Kai Waehner, Technologist, Confluent
• Ralph Debusmann, IoT Solution Architect, Bosch Power Tools