BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH Kafka Connect & Streams the Ecosystem around Kafka Guido Schmutz – 29.11.2017 @gschmutz guidoschmutz.wordpress.com
Guido Schmutz Working at Trivadis for more than 20 years Oracle ACE Director for Fusion Middleware and SOA Consultant, Trainer Software Architect for Java, Oracle, SOA and Big Data / Fast Data Head of Trivadis Architecture Board Technology Manager @ Trivadis More than 30 years of software development experience Contact: guido.schmutz@trivadis.com Blog: http://guidoschmutz.wordpress.com Slideshare: http://www.slideshare.net/gschmutz Twitter: gschmutz Kafka Connect & Streams - the Ecosystem around Kafka
Our company. Kafka Connect & Streams - the Ecosystem around Kafka Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services, focusing on selected technologies in Switzerland, Germany, Austria and Denmark. We offer our services in several strategic business fields. Trivadis Services takes over the ongoing operation of your IT systems.
COPENHAGEN MUNICH LAUSANNE BERN ZURICH BRUGG GENEVA HAMBURG DÜSSELDORF FRANKFURT STUTTGART FREIBURG BASEL VIENNA With over 600 specialists and IT experts in your region. Kafka Connect & Streams - the Ecosystem around Kafka 14 Trivadis branches and more than 600 employees 200 Service Level Agreements Over 4,000 training participants Research and development budget: CHF 5.0 million Financially self-supporting and sustainably profitable Experience from more than 1,900 projects per year at over 800 customers
Agenda 1. What is Apache Kafka? 2. Kafka Connect 3. Kafka Streams 4. KSQL 5. Kafka and "Big Data" / "Fast Data" Ecosystem 6. Kafka in Software Architecture Kafka Connect & Streams - the Ecosystem around Kafka
Demo Example
(diagram) Truck-1/2/3 publish positions to the MQTT topic truck/nn/position → mqtt-source connector → truck_position topic → detect_dangerous_driving → dangerous_driving topic; the Truck/Driver table is loaded via jdbc-source into the trucking_driver topic; join_dangerous_driving_driver joins both into dangerous_driving_driver, read by a console consumer.
Sample truck position record: 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631
Sample driver record: 27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00 → {"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}
Kafka Connect & Streams - the Ecosystem around Kafka
What is Apache Kafka? Kafka Connect & Streams - the Ecosystem around Kafka
Apache Kafka History (2012-2018)
• 0.7 (2012): cluster mirroring, data compression
• 0.8 (2013): intra-cluster replication
• 0.9 (2015): Data Integration (Connect API)
• 0.10 (2016): Data Processing (Streams API)
• 0.11 (2017): Exactly Once Semantics, performance improvements, KSQL Developer Preview
• 1.0 (2017): JBOD support, Java 9 support
Kafka Connect & Streams - the Ecosystem around Kafka
Apache Kafka - Unix Analogy
$ cat < in.txt | grep "kafka" | tr a-z A-Z > out.txt
Reading the input file (cat < in.txt) corresponds to the Kafka Connect API (source); the processing steps (grep, tr) correspond to the Kafka Streams API and KSQL; writing the output file (> out.txt) corresponds to the Kafka Connect API (sink); the pipes in between correspond to the Kafka Core (Cluster). Adapted from: Confluent.
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka High Level Architecture
The who is who • Producers write data to brokers. • Consumers read data from brokers. • All this is distributed.
The data • Data is stored in topics. • Topics are split into partitions, which are replicated.
(diagram: three producers and three consumers connected to a Kafka cluster of Brokers 1-3, coordinated by a Zookeeper ensemble)
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka – Distributed Log at the Core
At the heart of Apache Kafka sits a distributed log:
• a collection of messages, appended sequentially to a file
• a service ‘seeks’ to the position of the last message it read, then scans sequentially, reading messages in order
• the log-structured character makes Kafka well suited to performing the role of an Event Store in Event Sourcing
(diagram: Event Hub log with offsets 01-22; reads are a single seek & scan, writes are append only)
Kafka Connect & Streams - the Ecosystem around Kafka
Scale-Out Architecture
• a topic consists of many partitions
• producer load is balanced over all partitions
• a consumer can consume with as many threads as there are partitions (see the consumer sketch below)
(diagram: Producers 1-3 write to Brokers 1-3; Consumer Group 1 and Consumer Group 2 read from the Kafka cluster in parallel)
Kafka Connect & Streams - the Ecosystem around Kafka
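To make the consumer-group behavior concrete, here is a minimal Java sketch (not part of the original demo; broker address and group name are assumptions): all instances started with the same group.id divide the topic's partitions among themselves, so consumption scales up to the number of partitions.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TruckPositionConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9092");
        // all members of this group share the 8 partitions of truck_position
        props.put("group.id", "truck-position-consumers");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("truck_position"));
        while (true) {
            // poll() only returns records of the partitions assigned to this instance
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                        record.partition(), record.offset(), record.key(), record.value());
            }
        }
    }
}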
Strong Ordering Guarantees
• most business systems need strong ordering guarantees
• messages that require relative ordering need to be sent to the same partition
• supply the same key for all messages that require a relative order (sketch below)
• to maintain global ordering, use a single-partition topic
(diagram: keyed messages Key-1 … Key-6 routed to Brokers 1-3, equal keys always landing on the same broker)
Kafka Connect & Streams - the Ecosystem around Kafka
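A tiny illustration of the keying rule, assuming a producer configured as in the Demo (I) slide further below (event variables are hypothetical): records with the same key hash to the same partition, so their relative order is preserved.

// both records carry driver id "27" as key -> same partition -> ordered
producer.send(new ProducerRecord<String, String>("truck_position", "27", eventData1));
producer.send(new ProducerRecord<String, String>("truck_position", "27", eventData2));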
Durable and Highly Available Messaging
(diagram: partitions P0 and P1 replicated across Brokers 1-3; Producer 1 writes and Consumers 1 and 2 read)
Kafka Connect & Streams - the Ecosystem around Kafka
Durable and Highly Available Messaging (II)
(diagram: the same setup after a broker failure — the remaining replicas of P0 and P1 take over, and producers and consumers continue working)
Kafka Connect & Streams - the Ecosystem around Kafka
Replay-ability – Logs never forget
• by keeping events in a log, we have a version control system for our data
• if you were to deploy a faulty program, the system might become corrupted, but it would always be recoverable
• the sequence of events provides an audit point, so that you can examine exactly what happened
• rewind and replay events once the service is back and the bug is fixed
(diagram: a service rewinds over the Event Hub log and replays events to rebuild its state)
Kafka Connect & Streams - the Ecosystem around Kafka
Hold Data for Long-Term – Data Retention
1. Never
2. Time based (TTL): log.retention.{ms | minutes | hours}
3. Size based: log.retention.bytes
4. Log compaction based (entries with the same key are removed):
kafka-topics.sh --zookeeper zk:2181 --create --topic customers --replication-factor 1 --partitions 1 --config cleanup.policy=compact
Kafka Connect & Streams - the Ecosystem around Kafka
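Retention can also be changed on an existing topic; a hedged example using the kafka-configs tool (topic name and retention value are assumptions, not from the slides):

# keep truck_position data for 7 days (time-based retention)
kafka-configs --zookeeper zookeeper:2181 --alter --entity-type topics \
  --entity-name truck_position --add-config retention.ms=604800000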
Keep Topics in Compacted Form

Before compaction:
Offset | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9   | 10
Key    | K1 | K2 | K1 | K1 | K3 | K2 | K4 | K5 | K5 | K2  | K6
Value  | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11

After compaction (only the latest value per key survives):
Offset | 3  | 4  | 6  | 8  | 9   | 10
Key    | K1 | K3 | K4 | K5 | K2  | K6
Value  | V4 | V5 | V7 | V9 | V10 | V11

Kafka Connect & Streams - the Ecosystem around Kafka
How to get a Kafka environment
On Premises: • Bare Metal Installation • Docker • Mesos / Kubernetes • Hadoop Distributions
Cloud: • Oracle Event Hub Cloud Service • Azure HDInsight Kafka • Confluent Cloud • …
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (I)
(diagram) Truck-1/2/3 send their positions directly to the truck position topic, read by a console consumer. Test data generator by Hortonworks.
Sample record: 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (I) – Create Kafka Topic
$ kafka-topics --zookeeper zookeeper:2181 --create --topic truck_position --partitions 8 --replication-factor 1
$ kafka-topics --zookeeper zookeeper:2181 --list
__consumer_offsets
_confluent-metrics
_schemas
docker-connect-configs
docker-connect-offsets
docker-connect-status
truck_position
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (I) – Run Producer and Kafka-Console-Consumer Kafka Connect & Streams - the Ecosystem around Kafka
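The slide shows a screenshot; the console-consumer side of it would look roughly like this (broker address as used elsewhere in the demo):

kafka-console-consumer --bootstrap-server broker-1:9092 --topic truck_position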
Demo (I) – Java Producer to "truck_position"
Constructing a Kafka Producer:

// configure and create the producer
private Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker-1:9092");
kafkaProps.put("key.serializer", "...StringSerializer");
kafkaProps.put("value.serializer", "...StringSerializer");
producer = new KafkaProducer<String, String>(kafkaProps);

// key by driverId so all events of one driver stay in order (same partition)
ProducerRecord<String, String> record =
    new ProducerRecord<>("truck_position", driverId, eventData);
try {
    // synchronous send: get() blocks until the broker acknowledges
    metadata = producer.send(record).get();
} catch (Exception e) {
    // handle/log the send failure instead of swallowing it
}
Kafka Connect & Streams - the Ecosystem around Kafka
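The get() above makes the send synchronous. A hedged sketch of the asynchronous alternative, which is what you would normally use for throughput:

// async send: the callback fires once the broker has acked (or rejected) the record
producer.send(record, new Callback() {
    public void onCompletion(RecordMetadata metadata, Exception e) {
        if (e != null) {
            e.printStackTrace();   // log / retry instead of ignoring the error
        } else {
            System.out.println("sent to partition " + metadata.partition()
                    + " at offset " + metadata.offset());
        }
    }
});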
Demo (II) – devices send to MQTT instead of Kafka
(diagram) Truck-1/2/3 now publish their position records to the MQTT topic truck/nn/position.
Sample record: 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (II) – devices send to MQTT instead of Kafka Kafka Connect & Streams - the Ecosystem around Kafka
Demo (II) - devices send to MQTT instead of Kafka – how to get the data into Kafka?
(diagram) truck/nn/position (MQTT) → ? → truck position raw (Kafka topic)
Sample record: 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Connect Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Connect - Overview Source Connector Sink Connector Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Connect – Single Message Transforms (SMT) Simple Transformations for a single message Defined as part of Kafka Connect • some useful transforms provided out-of-the-box • Easily implement your own Optionally deploy 1+ transforms with each connector • Modify messages produced by source connector • Modify messages sent to sink connectors Makes it much easier to mix and match connectors Some of currently available transforms: • InsertField • ReplaceField • MaskField • ValueToKey • ExtractField • TimestampRouter • RegexRouter • SetSchemaMetaData • Flatten • TimestampConverter Kafka Connect & Streams - the Ecosystem around Kafka
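As an illustration, a connector configuration fragment chaining one of the listed transforms (the field names are hypothetical and not from the demo):

"transforms": "addSource",
"transforms.addSource.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addSource.static.field": "data_source",
"transforms.addSource.static.value": "mqtt"

The JDBC source in Demo (V) uses the same mechanism with ValueToKey and ExtractField to derive the record key from the id column.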
Kafka Connect – Many Connectors
60+ connectors since the first release (0.9+), 20+ from Confluent and Partners, grouped into Confluent-supported, Certified and Community connectors.
Source: http://www.confluent.io/product/connectors
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (III)
(diagram) truck/nn/position (MQTT) → mqtt to kafka connector → truck_position topic → console consumer.
Sample record: 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (III) – Create MQTT Connect through REST API
#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d $'{
  "name": "mqtt-source",
  "config": {
    "connector.class": "com.datamountaineer.streamreactor.connect.mqtt.source.MqttSourceConnector",
    "connect.mqtt.connection.timeout": "1000",
    "tasks.max": "1",
    "connect.mqtt.kcql": "INSERT INTO truck_position SELECT * FROM truck/+/position",
    "name": "mqtt-source",
    "connect.mqtt.service.quality": "0",
    "connect.mqtt.client.id": "tm-mqtt-connect-01",
    "connect.mqtt.converter.throw.on.error": "true",
    "connect.mqtt.hosts": "tcp://mosquitto:1883"
  }
}'
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (III) – Call REST API and Kafka Console Consumer Kafka Connect & Streams - the Ecosystem around Kafka
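Once the POST has been issued, the Kafka Connect REST API can be used to verify the deployment (same host and port as above):

curl http://192.168.69.138:8083/connectors                     # list deployed connectors
curl http://192.168.69.138:8083/connectors/mqtt-source/status  # connector and task state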
Demo (III)
(diagram) the same pipeline: truck/nn/position (MQTT) → mqtt to kafka → truck_position → console consumer — but what about some analytics?
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Streams Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Streams - Overview • Designed as a simple and lightweight library in Apache Kafka • no external dependencies on systems other than Apache Kafka • Part of open source Apache Kafka, introduced in 0.10+ • Leverages Kafka as its internal messaging layer • Supports fault-tolerant local state • Event-at-a-time processing (not microbatch) with millisecond latency • Windowing with out-of-order data using a Google DataFlow-like model Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Stream DSL and Processor Topology
KStream<Integer, String> stream1 = builder.stream("in-1");
KStream<Integer, String> stream2 = builder.stream("in-2");
KStream<Integer, String> joined = stream1.leftJoin(stream2, …);
KTable<Integer, Long> aggregated = joined.groupBy(…).count("store");
aggregated.to("out-1");
(diagram: the resulting processor topology — two sources, a left join, an aggregation backed by a local state store, and a sink)
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Streams Cluster
(diagram: the processor topology runs inside the Kafka Streams application; it reads input-1 and input-2 from the Kafka cluster, maintains its state in a store backed by a changelog topic, and writes to the output topic)
Kafka Connect & Streams - the Ecosystem around Kafka
(diagram: input-1 and input-2 each have partitions 0-3; the partitions are distributed across two instances, Kafka Streams 1 and Kafka Streams 2)
Kafka Connect & Streams - the Ecosystem around Kafka
(diagram: scaling out — with four instances, Kafka Streams 1-4, each instance processes one partition pair of input-1 and input-2)
Kafka Connect & Streams - the Ecosystem around Kafka
Stream vs. Table

Event Stream (KStream):
2017-10-02T20:18:46 11,Normal,41.87,-87.67
2017-10-02T20:18:55 11,Normal,40.38,-89.17
2017-10-02T20:18:59 21,Normal,42.23,-91.78
2017-10-02T20:19:01 21,Normal,41.71,-91.32
2017-10-02T20:19:02 11,Normal,38.65,-90.2
2017-10-02T20:19:23 21,Normal,41.71,-91.32

State Stream / Change Log Stream (KTable), keyed by truck id:
11 → 2017-10-02T20:18:46,11,Normal,41.87,-87.67
11 → 2017-10-02T20:18:55,11,Normal,40.38,-89.17
21 → 2017-10-02T20:18:59,21,Normal,42.23,-91.78
21 → 2017-10-02T20:19:01,21,Normal,41.71,-91.32
11 → 2017-10-02T20:19:02,11,Normal,38.65,-90.2
21 → 2017-10-02T20:19:23,21,Normal,41.71,-91.32

Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Streams: Key Features Kafka Connect & Streams - the Ecosystem around Kafka • Native, 100%-compatible Kafka integration • Secure stream processing using Kafka’s security features • Elastic and highly scalable • Fault-tolerant • Stateful and stateless computations • Interactive queries • Time model • Windowing • Supports late-arriving and out-of-order data • Millisecond processing latency, no micro-batching • At-least-once and exactly-once processing guarantees
Demo (IV)
(diagram) truck/nn/position (MQTT) → mqtt to kafka → truck_position → detect_dangerous_driving (Kafka Streams) → dangerous_driving → console consumer.
Sample record: 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (IV) - Create Stream
final KStreamBuilder builder = new KStreamBuilder();

// read the raw messages from the truck_position topic
KStream<String, String> source = builder.stream(stringSerde, stringSerde, "truck_position");

// parse each delimited message into a TruckPosition object
KStream<String, TruckPosition> positions =
    source.map((key, value) -> new KeyValue<>(key, TruckPosition.create(value)));

// keep only the non-NORMAL (i.e. dangerous) driving events
KStream<String, TruckPosition> filtered = positions.filter(TruckPosition::filterNonNORMAL);

// write the original records of the dangerous events to the dangerous_driving topic
filtered.map((key, value) -> new KeyValue<>(key, value._originalRecord)).to("dangerous_driving");
Kafka Connect & Streams - the Ecosystem around Kafka
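For completeness, a sketch of the boilerplate around the topology (0.11-era API matching the KStreamBuilder above; application id and broker address are assumptions):

// StreamsConfig and KafkaStreams come from org.apache.kafka.streams
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "detect-dangerous-driving");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");

KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();   // runs the topology until close() is called
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));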
KSQL Kafka Connect & Streams - the Ecosystem around Kafka
KSQL: a Streaming SQL Engine for Apache Kafka
• Enables stream processing with zero coding required
• The simplest way to process streams of data in real-time
• Powered by Kafka and Kafka Streams: scalable, distributed, mature
• All you need is Kafka – no complex deployments
• Available as a developer preview!
• STREAM and TABLE as first-class citizens
• STREAM = data in motion
• TABLE = collected state of a stream
• a STREAM can be joined with a TABLE
Kafka Connect & Streams - the Ecosystem around Kafka
KSQL Deployment Models Standalone Mode Cluster Mode Source:	Confluent Kafka Connect & Streams - the Ecosystem around Kafka
Demo (V)
(diagram) truck/nn/position (MQTT) → mqtt-source → truck_position → detect_dangerous_driving → dangerous_driving; the Truck/Driver table → jdbc-source → trucking_driver; join_dangerous_driving_driver joins both into dangerous_driving_driver, read by a console consumer.
Sample truck position record: 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631
Sample driver record: 27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00 → {"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (V) - Start Kafka KSQL
$ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092
(KSQL ASCII-art banner)
= Streaming SQL Engine for Kafka =
Copyright 2017 Confluent Inc.
CLI v0.1, Server v0.1 located at http://localhost:9098
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
ksql>
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (V) - Create Stream
ksql> CREATE STREAM dangerous_driving_s (ts VARCHAR, truckid VARCHAR, driverid BIGINT, routeid BIGINT, routename VARCHAR, eventtype VARCHAR, latitude DOUBLE, longitude DOUBLE, correlationid VARCHAR) WITH (kafka_topic='dangerous_driving', value_format='DELIMITED');

 Message
----------------
 Stream created
Kafka Connect & Streams - the Ecosystem around Kafka
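Ad-hoc filtering works on the new stream as well; a sketch using an event type that appears later in the demo output:

ksql> SELECT driverid, routename, eventtype FROM dangerous_driving_s WHERE eventtype = 'Lane Departure';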
Demo (V) - Describe Stream
ksql> describe dangerous_driving_s;

 Field         | Type
---------------------------------
 ROWTIME       | BIGINT
 ROWKEY        | VARCHAR(STRING)
 TS            | VARCHAR(STRING)
 TRUCKID       | VARCHAR(STRING)
 DRIVERID      | BIGINT
 ROUTEID       | BIGINT
 ROUTENAME     | VARCHAR(STRING)
 EVENTTYPE     | VARCHAR(STRING)
 LATITUDE      | DOUBLE
 LONGITUDE     | DOUBLE
 CORRELATIONID | VARCHAR(STRING)
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (V) - Query Stream
ksql> SELECT * FROM dangerous_driving_s;
1511166635385 | 11 | 2017-11-20T09:30:35 | 83 | 11 | 371182829 | Memphis to Little Rock | Unsafe following distance | 41.11 | -88.42 | 7015935660104262142
1511166652725 | 11 | 2017-11-20T09:30:52 | 83 | 11 | 371182829 | Memphis to Little Rock | Lane Departure | 38.65 | -90.2 | 7015935660104262142
1511166667645 | 10 | 2017-11-20T09:31:07 | 77 | 10 | 160779139 | Des Moines to Chicago Route 2 | Overspeed | 37.09 | -94.23 | 7015935660104262142
1511166670385 | 11 | 2017-11-20T09:31:10 | 83 | 11 | 371182829 | Memphis to Little Rock | Lane Departure | 41.48 | -88.07 | 7015935660104262142
1511166674175 | 25 | 2017-11-20T09:31:14 | 64 | 25 | 1090292248 | Peoria to Ceder Rapids Route 2 | Unsafe following distance | 36.84 | -89.54 | 7015935660104262142
1511166686315 | 15 | 2017-11-20T09:31:26 | 90 | 15 | 1927624662 | Springfield to KC Via Columbia | Lane Departure | 35.19 | -90.04 | 7015935660104262142
1511166686925 | 11 | 2017-11-20T09:31:26 | 83 | 11 | 371182829 | Memphis to Little Rock | Unsafe following distance | 40.38 | -89.17 | 7015935660104262142
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (V) – Create JDBC Connect through REST API #!/bin/bash curl -X "POST" "http://192.168.69.138:8083/connectors" -H "Content-Type: application/json" -d $'{ "name": "jdbc-driver-source", "config": { "connector.class": "JdbcSourceConnector", "connection.url":"jdbc:postgresql://db/sample?user=sample&password=sample", "mode": "timestamp", "timestamp.column.name":"last_update", "table.whitelist":"driver", "validate.non.null":"false", "topic.prefix":"trucking_", "key.converter":"org.apache.kafka.connect.json.JsonConverter", "key.converter.schemas.enable": "false", "value.converter":"org.apache.kafka.connect.json.JsonConverter", "value.converter.schemas.enable": "false", "name": "jdbc-driver-source", "transforms":"createKey,extractInt", "transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey", "transforms.createKey.fields":"id", "transforms.extractInt.type":"org.apache.kafka.connect.transforms.ExtractField$Key", "transforms.extractInt.field":"id" } }' Kafka Connect & Streams - the Ecosystem around Kafka
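Whether the driver rows actually arrive in Kafka can be checked with the console consumer (the topic name follows from the topic.prefix and table whitelist above):

kafka-console-consumer --bootstrap-server broker-1:9092 --topic trucking_driver --from-beginning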
Demo (V) – Create JDBC Connect through REST API Kafka Connect & Streams - the Ecosystem around Kafka
Demo (V) - Create Table with Driver State
ksql> CREATE TABLE driver_t (id BIGINT, first_name VARCHAR, last_name VARCHAR, available VARCHAR) WITH (kafka_topic='trucking_driver', value_format='JSON');

 Message
----------------
 Table created
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (V) - Join Stream with Driver Table
ksql> CREATE STREAM dangerous_driving_and_driver_s WITH (kafka_topic='dangerous_driving_and_driver_s', value_format='JSON') AS SELECT driverid, first_name, last_name, truckid, routeid, routename, eventtype FROM dangerous_driving_s LEFT JOIN driver_t ON dangerous_driving_s.driverid = driver_t.id;

 Message
----------------------------
 Stream created and running

ksql> select * from dangerous_driving_and_driver_s;
1511173352906 | 21 | 21 | Lila | Page | 58 | 1594289134 | Memphis to Little Rock Route 2 | Unsafe tail distance
1511173353669 | 12 | 12 | Laurence | Lindsey | 93 | 1384345811 | Joplin to Kansas City | Lane Departure
1511173435385 | 11 | 11 | Micky | Isaacson | 22 | 1198242881 | Saint Louis to Chicago Route2 | Unsafe tail distance
Kafka Connect & Streams - the Ecosystem around Kafka
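The joined stream also lends itself to continuous aggregation; a hedged sketch using the windowing syntax of the KSQL developer preview (window size chosen arbitrarily):

ksql> SELECT driverid, COUNT(*) FROM dangerous_driving_and_driver_s WINDOW TUMBLING (SIZE 60 SECONDS) GROUP BY driverid;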
Kafka and "Big Data" / "Fast Data" Ecosystem Kafka Connect & Streams - the Ecosystem around Kafka
Kafka and the Big Data / Fast Data ecosystem Kafka integrates with many popular products / frameworks • Apache Spark Streaming • Apache Flink • Apache Storm • Apache Apex • Apache NiFi • StreamSets • Oracle Stream Analytics • Oracle Service Bus • Oracle GoldenGate • Oracle Event Hub Cloud Service • Debezium CDC • … Additional	Info:	https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem Kafka Connect & Streams - the Ecosystem around Kafka
Kafka in Software Architecture Kafka Connect & Streams - the Ecosystem around Kafka
Traditional Big Data Architecture
(diagram) Operational systems (Billing & Ordering, CRM / Profile, Marketing Campaigns) feed a Big Data cluster via File Import / SQL Import; the cluster combines a Distributed Filesystem, Parallel Batch Processing (Machine Learning, Graph Algorithms, Natural Language Processing) and NoSQL stores; results are exposed via SQL to BI Tools and the Enterprise Data Warehouse, and via Search / Explore to Online & Mobile Apps.
Kafka Connect & Streams - the Ecosystem around Kafka
Event Hub – handle event stream data
(diagram) Event sources (Location, Social, Clickstream, Sensor Data, Call Center, Weather Data, Mobile Apps) and the operational systems now publish into Event Hubs; a Data Flow layer moves the events into the Big Data cluster (Distributed Filesystem, Parallel Batch Processing, NoSQL), from where they are served as before via SQL and Search / Explore.
Kafka Connect & Streams - the Ecosystem around Kafka
Event Hub – taking Velocity into account
(diagram) The Event Hubs additionally feed a Stream Analytics layer next to the Parallel Batch Processing path; Streaming Analytics and Batch Analytics results meet in NoSQL and Reference / Model stores and are surfaced through SQL, Search and Dashboards to BI Tools, the Enterprise Data Warehouse and Online & Mobile Apps.
Kafka Connect & Streams - the Ecosystem around Kafka
Event Hub – Asynchronous Microservice Architecture
(diagram) Microservices running in containers communicate asynchronously via the Event Hubs and expose an API ({ } API); each service owns its own store (RDBMS, NoSQL), while the Big Data cluster (Distributed Filesystem, Parallel Batch Processing) continues to consume the same event streams for analytics.
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Connect & Streams - the Ecosystem around Kafka Technology on its own won't help you. You need to know how to use it properly.