1 Best Practices for Streaming IoT Data with MQTT and Apache Kafka® Kai Waehner, Technology Evangelist, Confluent Dominik Obermaier, CTO, HiveMQ
2 Speakers Kai Waehner Technology Evangelist, Confluent kai.waehner@confluent.io @KaiWaehner Dominik Obermaier CTO, HiveMQ dominik.obermaier@hivemq.com @dobermai
4 Agenda • Use Case • Architecture • Live Demo • Best Practices • Next steps
6 Global Automotive Company Builds Connected Car Infrastructure Use Cases: • Connected Car Infrastructure (Cars, Traffic Lights, Cloud Services, etc.) • Real Time Analytics (Predictive Maintenance, etc.) • Continuous Services / Sales • Partner Integration (Car workshop, gas station, food market, etc.) • …
7 Architecture diagram. Data flow: (1) Ingest Data • (2) Preprocess Data • (3) Read Data • (4) Train Model • (5) Deploy Model • (6a) Consume Car Data • (6b) All Data • (7) Potential Defect • (8a) Alert Car • (8b) Alert Driver (e.g. mobile app). Components: Car Sensor • HiveMQ MQTT Broker • MQTT Connector (Kafka Connect or Confluent Proxy or HiveMQ Plugin) • Kafka Cluster • Kafka Connect • KSQL • Real Time Kafka Streams Application (Java / Scala) • Real Time Kafka App • Real Time Edge Computing (C / librdkafka) • TensorFlow • TensorFlow I/O • TensorFlow Serving (gRPC) • TensorFlow Lite • Elasticsearch • Grafana • Other Components of the Kafka Ecosystem
9 Cloud Native Infrastructure Benefits • Scalable • Flexible • Agile • Elastic • Automated • Etc.
10 MQTT - Publish / subscribe messaging protocol • Built on top of TCP/IP for constrained devices and unreliable networks • IoT-specific features for bad network / connectivity • Widely used (mostly IoT, but also web and mobile apps via MQTT over WebSocket) • Many (open source) broker implementations • Many (open source) client libraries
11 MQTT Trade-Offs Pros • Lightweight • All programming languages supported • Built for poor connectivity / high latency scenarios (e.g. mobile networks!) • High scalability and availability * • ISO Standard • Most popular IoT protocol Cons • Only pub/sub, not stream processing • Asynchronous processing (clients can be offline for long time) • No reprocessing of events * Depending on Broker Implementation
12 A Streaming Platform is the Underpinning of an Event-driven Architecture • Ubiquitous connectivity: Globally scalable platform for all event producers and consumers • Immediate data access: Data accessible to all consumers in real time • Single system of record: Persistent storage to enable reprocessing of past events • Continuous queries: Stream processing capabilities for in-line data transformation. Diagram labels: Producers • Consumers • Microservices • DBs • SaaS apps • Mobile • Customer 360 • Real-time fraud detection • Data warehouse • Database change • Microservices events • SaaS data • Customer experiences • Streams of real time events • Stream processing apps
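The "single system of record" point is easier to see with a toy model. The sketch below is not Kafka code: `ToyLog`, its method names, and the group names are invented for illustration. It only shows the idea that records stay in an append-only log after they are read, so a consumer can rewind its offset and process past events again.

```python
from typing import Dict, List, Tuple


class ToyLog:
    """Append-only log; records remain available after they have been read."""

    def __init__(self) -> None:
        self._records: List[bytes] = []
        self._offsets: Dict[str, int] = {}  # next offset to read, per consumer group

    def append(self, value: bytes) -> int:
        self._records.append(value)
        return len(self._records) - 1  # offset of the record just written

    def poll(self, group: str, max_records: int = 100) -> List[Tuple[int, bytes]]:
        # Return (offset, value) pairs starting at the group's current position.
        start = self._offsets.get(group, 0)
        batch = list(enumerate(self._records[start:start + max_records], start))
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group: str, offset: int) -> None:
        # Move the group's position so that it re-reads from the given offset.
        self._offsets[group] = offset
```

Each consumer group tracks its own position, so a new group (for example one that trains a model) can read the full history while an existing group keeps reading only new records.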
13 Kafka Trade-Offs (from IoT perspective) Pros • Stream processing, not just pub/sub • High throughput • Large scale • High availability • Long term storage and buffering • Reprocessing of events • Good integration to rest of the enterprise Cons • Not built for tens of thousands of connections • Requires stable network and good infrastructure • No IoT-specific features like keep alive or last will and testament
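For readers unfamiliar with the two MQTT features named in the last bullet, here is a minimal simulation. It is a sketch, not broker code: `ToyBroker`, `Session`, and the topic names are hypothetical. It assumes the rule from the MQTT specification that a broker treats a client as lost after 1.5 times the keep-alive interval without traffic, and then publishes the will message the client registered at connect time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Message = Tuple[str, bytes]  # (topic, payload)


@dataclass
class Session:
    keep_alive_s: float             # interval announced by the client at connect time
    last_seen_s: float              # time of the last packet received from the client
    will: Optional[Message] = None  # published only if the client is lost unexpectedly


@dataclass
class ToyBroker:
    sessions: Dict[str, Session] = field(default_factory=dict)
    published: List[Message] = field(default_factory=list)

    def connect(self, client_id: str, keep_alive_s: float, now_s: float,
                will: Optional[Message] = None) -> None:
        self.sessions[client_id] = Session(keep_alive_s, now_s, will)

    def packet_received(self, client_id: str, now_s: float) -> None:
        # Any packet, including a ping, counts as a sign of life.
        self.sessions[client_id].last_seen_s = now_s

    def disconnect(self, client_id: str) -> None:
        # Clean disconnect: the will is discarded, not published.
        self.sessions.pop(client_id, None)

    def expire(self, now_s: float) -> None:
        # Drop silent clients and publish their will messages.
        for client_id, s in list(self.sessions.items()):
            if s.keep_alive_s > 0 and now_s - s.last_seen_s > 1.5 * s.keep_alive_s:
                if s.will is not None:
                    self.published.append(s.will)
                del self.sessions[client_id]
```

In a connected-car setting, a will message such as an "offline" status lets backend consumers learn that a car dropped off the network without polling it.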
14 (De facto) Standards for Processing IoT Data: MQTT + Apache Kafka = A Match Made In Heaven
16 Advanced Demo
17 MVP
18 Live Demo: End-to-End Integration and Data Processing for 100,000 Connected Cars
19 Demo: 100,000 Connected Cars (Kafka + MQTT + TensorFlow) https://github.com/kaiwaehner/hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference or http://bit.ly/kafka-mqtt-ml-demo => Try it out in 30 minutes!
21 Typical Journey (chart: Value over Maturity (Investment & time), moving from Projects to Platform; capabilities: Pub + Sub, Store, Process) 1. Developer Interest (Pre-Streaming) 2. Enterprise Streaming Pilot / Early Production 3. SLA Ready, Integrated Streaming 4. Global Streaming 5. Central Nervous System
22 Start Small, but Prepare for Scalability from the Beginning 1. Use cloud native and scalable components • Confluent Platform is cloud native and built for scale • HiveMQ is cloud native and built for scale 2. Don’t dive too deep in the beginning – but understand the options • HiveMQ Kafka Extension? • Confluent MQTT connectors? • Customer Integration? 3. Plan for Enterprise-readiness • Security • Monitoring • Operations tooling • Bi-directional communication
23 Choose the right tool stack and infrastructure Understand Trade-Offs and choose the right options for deployments • Edge • On Premise • Cloud Use the best tools for the job • Confluent Platform for Event Streaming • HiveMQ for MQTT messaging and connectivity
24 Separation of concerns 1. Devices 2. Gateway 3. Integration 4. Data Streaming 5. Consumer Apps Decouple tasks • Source integration • Data processing • Business logic • Sink integration • Analytics • …
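As one illustration of the "Integration" layer, the sketch below maps an incoming MQTT message to a Kafka record. Everything specific in it is an assumption made for the example: the MQTT topic layout `car/<car_id>/<sensor>`, the Kafka topic name `car-sensor-data`, the JSON payload, and the `KafkaRecord` type. It uses no MQTT or Kafka client library, so the mapping logic stays decoupled from device connectivity and from downstream processing.

```python
import json
from typing import NamedTuple


class KafkaRecord(NamedTuple):
    topic: str
    key: bytes
    value: bytes


def to_kafka_record(mqtt_topic: str, payload: bytes,
                    kafka_topic: str = "car-sensor-data") -> KafkaRecord:
    # Hypothetical MQTT topic layout: car/<car_id>/<sensor>
    parts = mqtt_topic.split("/")
    if len(parts) != 3 or parts[0] != "car":
        raise ValueError(f"unexpected MQTT topic: {mqtt_topic!r}")
    _, car_id, sensor = parts
    value = json.dumps(
        {"car_id": car_id, "sensor": sensor, "reading": json.loads(payload)}
    )
    # Using the car id as the record key lets a key-based partitioner
    # keep all readings of one car in the same partition, in order.
    return KafkaRecord(kafka_topic, car_id.encode(), value.encode())
```

Many per-device MQTT topics are collapsed into one Kafka topic here, with the device identity moved into the key and value. That is a common design choice rather than a requirement.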
25 Different data for different use cases • Database, Data Lake • Search • Real time, Near Real Time, Batch • Streaming, Request-Response • CQRS, Event Sourcing • Machine Learning There is no single MASTER DATA EVENT…
27 The HiveMQ Platform
28 The HiveMQ Platform – Open Source and Enterprise-grade
29 Confluent Platform: Complete Event Streaming Platform • Apache Kafka: Continuous Commit Log • Connect • Streams • Development & Stream Processing: Connectors • Clients • REST Proxy • MQTT Proxy • Schema Registry • KSQL • Operations and Security: Security plugins • Role-Based Access Control • Control Center • Replicator • Auto Data Balancer • Operator • Mission-critical Reliability: Support, Services, Training & Partners • Freedom of Choice: Datacenter • Public Cloud • Confluent Cloud • Self-Managed Software • Fully-Managed Service
30 Confluent Cloud Cloud-Native Confluent Platform Fully-Managed Service Available on the leading public clouds with mission-critical SLAs and consumption-based pricing. Serverless Kafka characteristics: Pay-as-you-go, elastic auto-scaling, abstracting infrastructure (topics not brokers) Spend your time on your applications!
31 Next steps… Try out the demo in 30 minutes: https://github.com/kaiwaehner/hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference http://bit.ly/kafka-mqtt-ml-demo Check out the documentation and blog posts • HiveMQ and Apache Kafka - Streaming IoT Data and MQTT Messages: https://www.hivemq.com/blog/streaming-iot-data-and-mqtt-messages-to-apache-kafka/ • Internet of Things (IoT) and Event Streaming at Scale with Apache Kafka and MQTT: https://www.confluent.io/blog/iot-with-kafka-connect-mqtt-and-rest-proxy Contact us for questions or any other feedback: • Website, Email, Slack, Phone, … • Dominik: dominik@hivemq.com , Kai: kai.waehner@confluent.io
32 Questions? Feedback? Kai Waehner Technology Evangelist kai.waehner@confluent.io LinkedIn @KaiWaehner www.confluent.io www.kai-waehner.de Please contact us! Dominik Obermaier CTO HiveMQ dominik.obermaier@hivemq.com www.linkedin.com/in/dobermai www.hivemq.com www.twitter.com/dobermai
