Confluent Platform Components: Introduction
Customer Success Engineering
Agenda
01 Confluent Platform: what makes up the Confluent Platform
02 Basic Concepts: events, distributed commit log, event streaming/processing
03 Components: Brokers, ZooKeeper, Clients, REST Proxy, Schema Registry, Connect, Kafka Streams, ksqlDB, Control Center, Health+
04 Features: Multi-Region Clusters, Tiered Storage, Cluster Linking, Self-Balancing Clusters
Motivation
Destination
What is the Confluent Platform?
An enterprise event streaming platform built around Apache Kafka.
Complete: Confluent completes Apache Kafka
For every role (architect, operator, developer, executive), Confluent adds to Apache Kafka:
• Unrestricted Developer Productivity: Multi-language Development (Non-Java Clients | REST Proxy | Admin REST APIs); Rich Pre-built Ecosystem (Connectors | Hub | Schema Registry); Streaming Database (ksqlDB)
• Efficient Operations at Scale: Dynamic Performance & Elasticity (Elastic Scaling | Infinite Storage | Self-Balancing Clusters | Tiered Storage); Flexible DevOps Automation (Confluent for K8s | Ansible Playbooks | Marketplace Availability); Management & Monitoring (Cloud Data Flow | Metrics API | Control Center | Health+)
• Production-stage Prerequisites: Global Resilience (Multi-AZ Clusters | 99.95% SLA | Replicator | Multi-Region Clusters | Cluster Linking); Data Compatibility (Schema Registry | Schema Validation); Enterprise-grade Security (RBAC | BYOK | Private Networking | Encryption | Audit Logs)
• Partnership for Business Success: TCO / ROI (Revenue / Cost / Risk Impact); Complete Engagement Model (Training | Partners | Enterprise Support | Professional Services)
• Availability Everywhere, with Committer-driven Expertise: Fully Managed Cloud Service | Self-Managed Software
Basic Concepts
Apache Kafka
Confluent Platform Components (reference architecture diagram): applications reach REST Proxy instances through a sticky load balancer; Kafka brokers (each with the rebalancer) coordinate through a ZooKeeper ensemble (leader plus followers); Schema Registry runs as leader and follower instances; Kafka Connect workers host connectors or Replicator for microservices and other systems; ksqlDB servers, Kafka Streams applications, client applications, and Confluent Control Center complete the picture.
https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/
Apache Kafka is a Distributed Commit Log
• Publish and subscribe to streams of events, similar to a message queue
• Store streams of events in a fault-tolerant way
• Process streams of events and produce new ones, in real time, as they occur
Anatomy of a Kafka Topic (diagram): a topic is split into partitions (Partition 0, 1, 2). Producers append writes to the tail of each partition, so records are ordered from old to new and identified by sequential offsets. Consumers read independently, each tracking its own position, e.g. Consumer A at offset 4 and Consumer B at offset 7.
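To make the offset model concrete, here is a minimal consumer sketch in Java; the broker address, topic name, and group id are illustrative assumptions, not values from the deck:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OffsetDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
    props.put("group.id", "consumer-a");              // each group tracks its own offsets
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(List.of("topic1")); // illustrative topic name
      for (int i = 0; i < 10; i++) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> r : records) {
          // Each record carries the partition and offset it was read from
          System.out.printf("partition=%d offset=%d value=%s%n",
              r.partition(), r.offset(), r.value());
        }
      }
    }
  }
}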
Components
Brokers & ZooKeeper
Apache Kafka: scale-out and failover (diagram): Topic1 has four partitions, each replicated three times across Brokers 1-4. Spreading partitions distributes load across the cluster, and replicas on other brokers take over if a broker fails.
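As a sketch, a replicated topic like the one in the diagram could be created programmatically with Kafka's AdminClient; the broker address is an assumption:

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicDemo {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
    try (AdminClient admin = AdminClient.create(props)) {
      // 4 partitions, replication factor 3, as in the diagram
      NewTopic topic1 = new NewTopic("topic1", 4, (short) 3);
      admin.createTopics(List.of(topic1)).all().get(); // block until the topic exists
    }
  }
}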
Apache ZooKeeper: cluster coordination (diagram): a three-node ZooKeeper ensemble (Zookeeper 3 is the leader, and writes go to the leader) coordinates Brokers 1-4, one of which (Broker 2) acts as the controller. ZooKeeper stores metadata: heartbeats, watches, controller elections, cluster/topic configs, and permissions.
Clients: smart clients, dumb pipes
What happens inside a producer? (diagram): a ProducerRecord (topic, optional partition, timestamp, headers, and key, plus the value) passes through the serializer and the partitioner, is buffered into per-topic-partition batches, and is sent to the Kafka broker. On failure, the producer retries if the error is retriable, otherwise it surfaces a non-retriable exception; on success, send() completes with record metadata.
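A minimal producer sketch in Java showing that path end to end; the broker address, topic, key, and value are illustrative assumptions:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("retries", 3); // retriable errors are retried automatically

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      ProducerRecord<String, String> record =
          new ProducerRecord<>("topic1", "account-42", "some value");
      // The callback receives record metadata on success, or an exception on failure
      producer.send(record, (metadata, exception) -> {
        if (exception != null) {
          exception.printStackTrace(); // non-retriable failure, or retries exhausted
        } else {
          System.out.printf("written to %s-%d at offset %d%n",
              metadata.topic(), metadata.partition(), metadata.offset());
        }
      });
      producer.flush();
    }
  }
}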
Make Kafka Widely Accessible to Developers
Enable all developers to leverage Kafka throughout the organization with a wide variety of Confluent clients.
Confluent Clients: battle-tested, high-performing producer and consumer APIs (plus an admin client).
REST Proxy
Connect Any Application to Kafka
REST Proxy provides a RESTful interface to a Kafka cluster: it allows non-Java, third-party applications (including HTTP-connected devices) to produce and consume messages over REST/HTTP, alongside native Kafka Java applications, and it integrates with Schema Registry.
Admin REST APIs
Confluent Platform introduces REST APIs for administrative operations to simplify Kafka management. Admin REST APIs add even greater flexibility in how you manage Kafka:
• Describe, list, and configure brokers
• Create, delete, describe, list, and configure topics
• Delete, describe, and list consumer groups
• Create, delete, describe, and list ACLs
• List partition reassignments
Confluent offers several options to run admin operations, including Control Center, the CLI, and Kafka clients.
REST Proxy: Key Features
API endpoints:
• Produce messages:
  Topic: POST /topics/(string:topic_name)
  Partition: POST /topics/(string:topic_name)/partitions/(int:partition_id)
• Consume messages (note: requires stickiness to a REST Proxy instance):
  Consumer group: GET /consumers/(string:group_name)/instances/(string:instance)/records
• Consumer group management:
  Add/remove instances: POST /consumers/(string:group_name), DELETE /consumers/(string:group_name)/instances/(string:instance)
  Commit/get offsets: POST or GET /consumers/(string:group_name)/instances/(string:instance)/offsets
  Modify subscriptions: POST, GET, or DELETE /consumers/(string:group_name)/instances/(string:instance)/subscription
  Modify assignments: POST or GET /consumers/(string:group_name)/instances/(string:instance)/assignments
  Reposition: POST or GET /consumers/(string:group_name)/instances/(string:instance)/positions
• Get metadata:
  Topic: GET /topics, GET /topics/(string:topic_name)
  Partition: GET /topics/(string:topic_name)/partitions/(int:partition_id)
  Broker: GET /brokers
• Admin functions (preview):
  Create topic: POST /clusters/(string:cluster_id)/topics
  Delete topic: DELETE /clusters/(string:cluster_id)/topics/(string:topic_name)
  List topic configs: GET /clusters/(string:cluster_id)/topics/(string:topic_name)/configs
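For illustration, producing through the topic endpoint above can be done from any HTTP-capable language; here is a Java sketch. The host, port, topic, and payload are assumptions, and the content type follows REST Proxy's documented v2 JSON embedded format:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProduceDemo {
  public static void main(String[] args) throws Exception {
    // POST /topics/(string:topic_name) with a JSON-embedded batch of records
    String body = "{\"records\":[{\"key\":\"account-42\",\"value\":{\"total\":100}}]}";
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8082/topics/topic1")) // assumed REST Proxy address
        .header("Content-Type", "application/vnd.kafka.json.v2+json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    // The response body reports the partition and offset of each written record
    System.out.println(response.statusCode() + " " + response.body());
  }
}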
Confluent Schema Registry: enforce producer/consumer compatibility
The Challenge of Data Compatibility at Scale
Many sources without a policy cause mayhem in a centralized data pipeline: a single incompatibly formatted message (from App 01, App 02, or App 03) can break downstream consumers. Ensuring downstream systems can use the data is key to an operational stream pipeline, and even within a single application, different formats can be presented.
Schema Registry: Enable Application Development Compatibility
(Diagram: each application's serializer checks schemas against Schema Registry before writing to the Kafka topic; incompatible writes are rejected.)
Develop using standard schemas:
• Store and share a versioned history of all standard schemas
• Validate data compatibility at the client level
Reduce operational complexity:
• Avoid time-consuming coordination among developers to standardize on schemas
Schema Registry: Key Features
• Manage schemas and enforce schema policies: define, per Kafka topic, a set of compatible schemas that are "allowed"; schemas can be defined by an admin or by clients at runtime; Avro, Protobuf, and JSON schemas are all supported
• Automatic validation when data is written to a topic: if the data doesn't match the schema, the producer gets an error
• Works transparently when used with Confluent Kafka clients, Kafka REST Proxy, and Kafka Streams
• Integrates with Kafka Connect
• Integrates with Kafka Streams
• Supports high availability (within a datacenter)
• Supports multi-datacenter deployments
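As a sketch of the "works transparently" point: with Confluent's Avro serializer, a producer only needs the registry URL, and serialization then registers and validates schemas automatically. The addresses, topic name, and schema below are assumptions:

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProduceDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");          // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
    props.put("schema.registry.url", "http://localhost:8081"); // assumed Schema Registry address

    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":[" +
        "{\"name\":\"amount\",\"type\":\"double\"}]}");
    GenericRecord payment = new GenericData.Record(schema);
    payment.put("amount", 99.95);

    try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
      // If "payments" already has an incompatible registered schema, this send fails
      producer.send(new ProducerRecord<>("payments", "account-42", payment));
      producer.flush();
    }
  }
}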
Kafka Connect: no/low-code connectivity to many systems
Kafka Connect
A no-code way of connecting known systems (databases, object storage, queues, etc.) to Apache Kafka. Custom transforms and data conversions can be written in code where needed, but many Single Message Transforms and Converters are available out of the box. (Diagram: data sources → Kafka Connect → Kafka → Kafka Connect → data sinks.)
Kafka Connect: Durable Data Pipelines
Integrate upstream and downstream systems with Apache Kafka® (diagram: Kafka Connect workers sit between external systems, the Kafka cluster, and Schema Registry):
• Capture schema from sources, use schema to inform data sinks
• Highly available workers ensure data pipelines aren't interrupted
• Extensible framework API for building custom connectors
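For illustration, a connector is configured declaratively and submitted to a Connect worker's REST API. The sketch below assumes Confluent's JDBC source connector and a hypothetical Postgres database; the connection details, column name, and topic prefix are placeholders:

{
  "name": "jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db:5432/shop",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "jdbc-"
  }
}

POSTing this document to the worker (by default http://localhost:8083/connectors) creates the connector; Single Message Transforms would be added as further keys in the same config block.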
Instantly Connect Popular Data Sources & Sinks
210+ pre-built connectors: 90+ supported by Confluent, 60+ partner-supported and Confluent-verified.
Confluent Hub
Easily browse connectors by:
• Sources vs. sinks
• Confluent- vs. partner-supported
• Commercial vs. free
• Available in Confluent Cloud
confluent.io/hub
Kafka Connect Connectors are reusable components that know how to talk to specific sources and sinks.
Kafka Streams: build apps with stream processing inside
Stream Processing by Analogy
Connect API (in) → stream processing → Connect API (out), over a Kafka cluster, just like a Unix pipeline:
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
Kafka Streams: Scalable Stream Processing
Build scalable, durable stream-processing services with the Kafka Streams Java library:
• Simple functional API
• Powerful Processor API
• No framework needed: it's a library, so use it and deploy it like any other JVM library

StreamsBuilder builder = new StreamsBuilder();
builder.stream(inputTopic)
    // re-key each record by account id, keeping the total value
    .map((k, v) -> new KeyValue<>((String) v.getAccountId(), (Integer) v.getTotalValue()))
    .groupByKey()
    .count()                      // continuously maintained count per account
    .toStream().to(outputTopic);  // write the changelog to the output topic
Where does the processing code run? Not on the brokers: the Streams API runs embedded in your application, and many instances of the same app share the work.
Kafka Streams leverages the consumer group protocol: instances of the same application form a group, partitions are distributed among them, and work rebalances automatically as instances come and go.
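A hedged sketch of that wiring: the application.id below is what groups instances together; the topic names and broker address are assumptions:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsAppDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    // All instances with the same application.id join one consumer group
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "account-totals-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

    StreamsBuilder builder = new StreamsBuilder();
    builder.stream("input-topic").to("output-topic"); // trivial pass-through topology

    KafkaStreams streams = new KafkaStreams(builder.build(), props);
    streams.start(); // start more instances of this same app to scale out
    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
  }
}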
ksqlDB: stream processing using SQL, and much more
Stream Processing in Kafka: from flexibility to simplicity
• Producer/Consumer APIs: subscribe(), poll(), send(), flush()
• Kafka Streams API: filter(), map(), join(), aggregate()
• ksqlDB: SELECT … FROM …, JOIN … WHERE …, GROUP BY …
Simplify Your Stream Processing Architecture
ksqlDB provides one solution for capturing events, stream processing, and serving both push and pull queries (diagram: databases feed ksqlDB through connectors; stream processing runs over state stores; applications consume push subscriptions and issue pull lookups).
Streaming app with 4 SQL statements
1. Capture data:
CREATE SOURCE CONNECTOR jdbcConnector WITH (
  'connector.class' = '...JdbcSourceConnector',
  'connection.url' = '...',
  …);
2. Perform continuous transformations:
CREATE STREAM purchases AS
  SELECT viewtime, userid, pageid,
         TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS')
  FROM pageviews;
3. Create materialized views:
CREATE TABLE orders_by_country AS
  SELECT country,
         COUNT(*) AS order_count,
         SUM(order_total) AS order_total
  FROM purchases
    WINDOW TUMBLING (SIZE 5 MINUTES)
    LEFT JOIN user_profiles ON purchases.customer_id = user_profiles.customer_id
  GROUP BY country
  EMIT CHANGES;
4. Serve lookups against materialized views:
SELECT * FROM orders_by_country WHERE country='usa';
Confluent Control Center
Confluent Control Center
The simplest way to operate and build applications with Apache Kafka.
• For operators: centrally manage and monitor multi-cluster environments and security
• For developers: view messages, topics, and schemas; manage connectors; and build ksqlDB queries
Accelerate Application Development and Integration
• Messages: browse messages, and search offsets or timestamps by partition
• Topics: create, edit, delete, and view all topics in one place
• Schemas: create, edit, and view topic schemas, and compare schema versions
Adhere to Established Event Streaming SLAs
Monitor and optimize system health (cluster and broker overview dashboards):
• Broker and ZooKeeper uptime
• Under-replicated partitions
• Out-of-sync replicas
• Disk usage and distribution
• Alerting
Accelerate Application Development and Integration
Simplify the developer's mental model for ksqlDB (query editor):
• View a summary of all clusters
• Develop and run queries
• Support multiple ksqlDB clusters at a time
Health+
What is Health+?
• Intelligent alerts: manage Health+ intelligent alerts via Confluent Cloud's UI
• Accelerated Confluent Support: Support uses the performance metadata to help you with questions or problems even faster
• Monitoring dashboards: view all of your critical metrics in a single cloud-based dashboard
• Confluent Telemetry Reporter: send performance metadata back to Confluent via the Confluent Telemetry Reporter plugin
Intelligent alerts
There are 50+ alerts available today (with many more to come), including:
• Request handler idle percentage
• Network processor idle percentage
• Active controller count
• Offline partitions
• Unclean leader elections
• Under-replicated partitions
• Under min in-sync replicas
• Disk usage
• Unused topics
• No metrics from cluster in one hour
How does it work?
• Confluent Telemetry Reporter is a plugin that runs inside each Confluent Platform service (only brokers at the moment) and pushes metadata about the service to Confluent
• Data is sent over HTTP using an encrypted connection, once per minute by default
• Metrics are stored in the _confluent-telemetry-metrics topic
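A hedged sketch of enabling the Telemetry Reporter in a broker's server.properties; the property names follow Confluent's documented telemetry configs, and the key values are placeholders:

# Enable the Telemetry Reporter plugin on this broker (assumed config names)
confluent.telemetry.enabled=true
confluent.telemetry.api.key=<CLOUD_API_KEY>
confluent.telemetry.api.secret=<CLOUD_API_SECRET>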
Self-Balancing Clusters
Self-Balancing Clusters
Self-Balancing Clusters automate partition rebalances to improve Kafka's performance, elasticity, and ease of operations. Rebalances are required regularly to optimize cluster performance: after expansion, after shrinkage, and under uneven load.
Self-Balancing Clusters
Manual rebalance process:
$ cat partitions-to-move.json
{
  "partitions": [
    {"topic": "foo", "partition": 1, "replicas": [1, 2, 4]},
    ...
  ],
  "version": 1
}
$ kafka-reassign-partitions ...
With Confluent Platform's Self-Balancing: no complex math, no risk of human error.
Tiered Storage
Tiered Storage
Tiered Storage enables infinite data retention and elastic scalability by decoupling the compute and storage layers in Kafka. Event streaming is storage-intensive (diagram: logs, device logs, mainframes, SFDC, Splunk, Hadoop, object storage, data stores, third-party apps, and custom apps/microservices all flow through Kafka).
Tiered Storage
Tiered Storage allows Kafka to recognize two layers of storage: the brokers' local disks and cost-effective object storage, with old data offloaded to the object store.
Tiered Storage
Tiered Storage delivers three primary benefits that revolutionize the way our customers experience Kafka:
• Infinite data retention: reimagine what event streaming apps can do
• Reduced infrastructure costs: offload data to cost-effective object storage
• Platform elasticity: scale compute and storage independently
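As a sketch, Tiered Storage is enabled through broker configuration; the property names below follow Confluent's documented tiered-storage configs, with an S3 backend assumed and placeholder values:

# Enable tiering and point it at an S3 bucket (assumed backend and values)
confluent.tier.feature=true
confluent.tier.enable=true
confluent.tier.backend=S3
confluent.tier.s3.bucket=<BUCKET_NAME>
confluent.tier.s3.region=<AWS_REGION>
# How long data stays on local disk (the "hotset") before living only in object storage
confluent.tier.local.hotset.ms=3600000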
Multi-Region Clusters
Multi-Region Cluster (MRC)
A single cluster stretched across multiple regions that can replicate synchronously and asynchronously. It requires a minimum of three data centers/regions (at least for ZooKeeper). It is offset-preserving and provides automatic client failover with no custom code.
Note: for a given topic, a rack can hold only synchronous or asynchronous replicas, not both; a DC/zone can, however, contain multiple racks.
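For illustration, the synchronous/asynchronous split per region is expressed with a replica-placement policy. The JSON shape below follows Confluent's documented confluent.placement.constraints topic setting; the rack names and counts are assumptions:

{
  "version": 1,
  "replicas": [
    {"count": 2, "constraints": {"rack": "us-east"}},
    {"count": 2, "constraints": {"rack": "us-west"}}
  ],
  "observers": [
    {"count": 1, "constraints": {"rack": "us-central"}}
  ]
}

Here "replicas" are the synchronous, ISR-eligible copies, while "observers" replicate asynchronously and can be promoted during failover.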
Cluster Linking
Cluster Linking
Cluster Linking allows you to directly connect clusters together and mirror topics from one cluster to another without the need for Connect. It makes it much easier to build multi-datacenter, multi-cluster, and hybrid cloud deployments.
Cluster Linking
Cluster Linking simplifies hybrid cloud and multi-cloud deployments for Kafka. Sharing data between independent clusters or migrating clusters otherwise presents two challenges:
1. It requires deploying a separate Connect cluster
2. Offsets are not preserved, so messages are at risk of being skipped or reread (diagram: offsets 0-4 of Topic 1 in DC 1 become offsets 4-8 after replication to DC 2)
Cluster Linking
Cluster Linking requires no additional infrastructure and preserves offsets, making it straightforward to migrate clusters to Confluent Cloud.
Questions? ableasdale@confluent.io
Set your Data in Motion with Confluent & Apache Kafka | Tech Talk Series | LME