Confluent Platform Components: Introduction
Customer Success Engineering
Agenda
01 Confluent Platform: what makes up the Confluent Platform
02 Basic Concepts: events, distributed commit log, event streaming/processing
03 Components: Brokers, ZooKeeper, Clients, REST Proxy, Schema Registry, Connect, Kafka Streams, ksqlDB, Control Center, Health+
04 Features: Multi-Region Clusters, Tiered Storage, Cluster Linking, Self-Balancing Clusters
Motivation
Destination
What is the Confluent Platform?
An enterprise event streaming platform built around Apache Kafka.
Complete: Confluent completes Apache Kafka
For every role (architect, operator, developer, executive), Confluent adds to Apache Kafka:
• Unrestricted Developer Productivity: Multi-language Development (Non-Java Clients | REST Proxy | Admin REST APIs); Rich Pre-built Ecosystem (Connectors | Hub | Schema Registry); Streaming Database (ksqlDB)
• Efficient Operations at Scale: Dynamic Performance & Elasticity (Elastic Scaling | Infinite Storage | Self-Balancing Clusters | Tiered Storage); Flexible DevOps Automation (Confluent for K8s | Ansible Playbooks | Marketplace Availability); Management & Monitoring (Cloud Data Flow | Metrics API | Control Center | Health+)
• Production-stage Prerequisites: Global Resilience (Multi-AZ Clusters | 99.95% SLA | Replicator | Multi-Region Clusters | Cluster Linking); Data Compatibility (Schema Registry | Schema Validation); Enterprise-grade Security (RBAC | BYOK | Private Networking | Encryption | Audit Logs)
• Partnership for Business Success: TCO / ROI (Revenue / Cost / Risk Impact); Complete Engagement Model (Training | Partners | Enterprise Support | Professional Services)
• Availability Everywhere, with Committer-driven Expertise: Fully Managed Cloud Service | Self-Managed Software
Basic Concepts
Apache Kafka
Confluent Platform Components (reference architecture diagram): applications reach REST Proxy instances through a sticky load balancer; Kafka brokers (each with the rebalancer) coordinate through a ZooKeeper ensemble (leader plus followers); Schema Registry runs as leader and follower instances; Kafka Connect workers host connectors or Replicator for microservices and other systems; ksqlDB servers, Kafka Streams applications, client applications, and Confluent Control Center complete the picture.
https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/
Apache Kafka is a Distributed Commit Log
• Publish and subscribe to streams of events, similar to a message queue
• Store streams of events in a fault-tolerant way
• Process streams of events and produce new ones, in real time, as they occur
Anatomy of a Kafka Topic (diagram): a topic is split into partitions (Partition 0, 1, 2). Producers append writes to the tail of each partition, so records are ordered from old to new and identified by sequential offsets. Consumers read independently, each tracking its own position, e.g. Consumer A at offset 4 and Consumer B at offset 7.
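To make the offset model concrete, here is a minimal consumer sketch in Java; the broker address, topic name, and group id are illustrative assumptions, not values from the deck:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OffsetDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
    props.put("group.id", "consumer-a");              // each group tracks its own offsets
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(List.of("topic1")); // illustrative topic name
      for (int i = 0; i < 10; i++) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> r : records) {
          // Each record carries the partition and offset it was read from
          System.out.printf("partition=%d offset=%d value=%s%n",
              r.partition(), r.offset(), r.value());
        }
      }
    }
  }
}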
Components
Brokers & ZooKeeper
Apache Kafka: scale-out and failover (diagram): Topic1 has four partitions, each replicated three times across Brokers 1-4. Spreading partitions distributes load across the cluster, and replicas on other brokers take over if a broker fails.
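As a sketch, a replicated topic like the one in the diagram could be created programmatically with Kafka's AdminClient; the broker address is an assumption:

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicDemo {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
    try (AdminClient admin = AdminClient.create(props)) {
      // 4 partitions, replication factor 3, as in the diagram
      NewTopic topic1 = new NewTopic("topic1", 4, (short) 3);
      admin.createTopics(List.of(topic1)).all().get(); // block until the topic exists
    }
  }
}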
Apache ZooKeeper: cluster coordination (diagram): a three-node ZooKeeper ensemble (Zookeeper 3 is the leader, and writes go to the leader) coordinates Brokers 1-4, one of which (Broker 2) acts as the controller. ZooKeeper stores metadata: heartbeats, watches, controller elections, cluster/topic configs, and permissions.
Clients: smart clients, dumb pipes
What happens inside a producer? (diagram): a ProducerRecord (topic, optional partition, timestamp, headers, and key, plus the value) passes through the serializer and the partitioner, is buffered into per-topic-partition batches, and is sent to the Kafka broker. On failure, the producer retries if the error is retriable, otherwise it surfaces a non-retriable exception; on success, send() completes with record metadata.
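A minimal producer sketch in Java showing that path end to end; the broker address, topic, key, and value are illustrative assumptions:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("retries", 3); // retriable errors are retried automatically

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      ProducerRecord<String, String> record =
          new ProducerRecord<>("topic1", "account-42", "some value");
      // The callback receives record metadata on success, or an exception on failure
      producer.send(record, (metadata, exception) -> {
        if (exception != null) {
          exception.printStackTrace(); // non-retriable failure, or retries exhausted
        } else {
          System.out.printf("written to %s-%d at offset %d%n",
              metadata.topic(), metadata.partition(), metadata.offset());
        }
      });
      producer.flush();
    }
  }
}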
Make Kafka Widely Accessible to Developers
Enable all developers to leverage Kafka throughout the organization with a wide variety of Confluent clients.
Confluent Clients: battle-tested, high-performing producer and consumer APIs (plus an admin client).
REST Proxy
Connect Any Application to Kafka
REST Proxy provides a RESTful interface to a Kafka cluster: it allows non-Java, third-party applications (including HTTP-connected devices) to produce and consume messages over REST/HTTP, alongside native Kafka Java applications, and it integrates with Schema Registry.
Admin REST APIs
Confluent Platform introduces REST APIs for administrative operations to simplify Kafka management. Admin REST APIs add even greater flexibility in how you manage Kafka:
• Describe, list, and configure brokers
• Create, delete, describe, list, and configure topics
• Delete, describe, and list consumer groups
• Create, delete, describe, and list ACLs
• List partition reassignments
Confluent offers several options to run admin operations, including Control Center, the CLI, and Kafka clients.
REST Proxy: Key Features
API endpoints:
• Produce messages:
  Topic: POST /topics/(string:topic_name)
  Partition: POST /topics/(string:topic_name)/partitions/(int:partition_id)
• Consume messages (note: requires stickiness to a REST Proxy instance):
  Consumer group: GET /consumers/(string:group_name)/instances/(string:instance)/records
• Consumer group management:
  Add/remove instances: POST /consumers/(string:group_name), DELETE /consumers/(string:group_name)/instances/(string:instance)
  Commit/get offsets: POST or GET /consumers/(string:group_name)/instances/(string:instance)/offsets
  Modify subscriptions: POST, GET, or DELETE /consumers/(string:group_name)/instances/(string:instance)/subscription
  Modify assignments: POST or GET /consumers/(string:group_name)/instances/(string:instance)/assignments
  Reposition: POST or GET /consumers/(string:group_name)/instances/(string:instance)/positions
• Get metadata:
  Topic: GET /topics, GET /topics/(string:topic_name)
  Partition: GET /topics/(string:topic_name)/partitions/(int:partition_id)
  Broker: GET /brokers
• Admin functions (preview):
  Create topic: POST /clusters/(string:cluster_id)/topics
  Delete topic: DELETE /clusters/(string:cluster_id)/topics/(string:topic_name)
  List topic configs: GET /clusters/(string:cluster_id)/topics/(string:topic_name)/configs
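For illustration, producing through the topic endpoint above can be done from any HTTP-capable language; here is a Java sketch. The host, port, topic, and payload are assumptions, and the content type follows REST Proxy's documented v2 JSON embedded format:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProduceDemo {
  public static void main(String[] args) throws Exception {
    // POST /topics/(string:topic_name) with a JSON-embedded batch of records
    String body = "{\"records\":[{\"key\":\"account-42\",\"value\":{\"total\":100}}]}";
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8082/topics/topic1")) // assumed REST Proxy address
        .header("Content-Type", "application/vnd.kafka.json.v2+json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    // The response body reports the partition and offset of each written record
    System.out.println(response.statusCode() + " " + response.body());
  }
}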
Confluent Schema Registry: enforce producer/consumer compatibility
The Challenge of Data Compatibility at Scale
Many sources without a policy cause mayhem in a centralized data pipeline: a single incompatibly formatted message (from App 01, App 02, or App 03) can break downstream consumers. Ensuring downstream systems can use the data is key to an operational stream pipeline, and even within a single application, different formats can be presented.
Schema Registry: Enable Application Development Compatibility
(Diagram: each application's serializer checks schemas against Schema Registry before writing to the Kafka topic; incompatible writes are rejected.)
Develop using standard schemas:
• Store and share a versioned history of all standard schemas
• Validate data compatibility at the client level
Reduce operational complexity:
• Avoid time-consuming coordination among developers to standardize on schemas
Schema Registry: Key Features
• Manage schemas and enforce schema policies: define, per Kafka topic, a set of compatible schemas that are "allowed"; schemas can be defined by an admin or by clients at runtime; Avro, Protobuf, and JSON schemas are all supported
• Automatic validation when data is written to a topic: if the data doesn't match the schema, the producer gets an error
• Works transparently when used with Confluent Kafka clients, Kafka REST Proxy, and Kafka Streams
• Integrates with Kafka Connect
• Integrates with Kafka Streams
• Supports high availability (within a datacenter)
• Supports multi-datacenter deployments
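As a sketch of the "works transparently" point: with Confluent's Avro serializer, a producer only needs the registry URL, and serialization then registers and validates schemas automatically. The addresses, topic name, and schema below are assumptions:

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProduceDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");          // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
    props.put("schema.registry.url", "http://localhost:8081"); // assumed Schema Registry address

    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":[" +
        "{\"name\":\"amount\",\"type\":\"double\"}]}");
    GenericRecord payment = new GenericData.Record(schema);
    payment.put("amount", 99.95);

    try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
      // If "payments" already has an incompatible registered schema, this send fails
      producer.send(new ProducerRecord<>("payments", "account-42", payment));
      producer.flush();
    }
  }
}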
Kafka Connect: no/low-code connectivity to many systems
Kafka Connect
A no-code way of connecting known systems (databases, object storage, queues, etc.) to Apache Kafka. Custom transforms and data conversions can be written in code where needed, but many Single Message Transforms and Converters are available out of the box. (Diagram: data sources → Kafka Connect → Kafka → Kafka Connect → data sinks.)
Kafka Connect: Durable Data Pipelines
Integrate upstream and downstream systems with Apache Kafka® (diagram: Kafka Connect workers sit between external systems, the Kafka cluster, and Schema Registry):
• Capture schema from sources, use schema to inform data sinks
• Highly available workers ensure data pipelines aren't interrupted
• Extensible framework API for building custom connectors
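For illustration, a connector is configured declaratively and submitted to a Connect worker's REST API. The sketch below assumes Confluent's JDBC source connector and a hypothetical Postgres database; the connection details, column name, and topic prefix are placeholders:

{
  "name": "jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db:5432/shop",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "jdbc-"
  }
}

POSTing this document to the worker (by default http://localhost:8083/connectors) creates the connector; Single Message Transforms would be added as further keys in the same config block.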
Instantly Connect Popular Data Sources & Sinks
210+ pre-built connectors: 90+ supported by Confluent, 60+ partner-supported and Confluent-verified.
Confluent Hub
Easily browse connectors by:
• Sources vs. sinks
• Confluent- vs. partner-supported
• Commercial vs. free
• Available in Confluent Cloud
confluent.io/hub
Kafka Connect Connectors are reusable components that know how to talk to specific sources and sinks.
Kafka Streams: build apps with stream processing inside
Stream Processing by Analogy
Connect API (in) → stream processing → Connect API (out), over a Kafka cluster, just like a Unix pipeline:
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
Kafka Streams: Scalable Stream Processing
Build scalable, durable stream-processing services with the Kafka Streams Java library:
• Simple functional API
• Powerful Processor API
• No framework needed: it's a library, so use it and deploy it like any other JVM library

StreamsBuilder builder = new StreamsBuilder();
builder.stream(inputTopic)
    // re-key each record by account id, keeping the total value
    .map((k, v) -> new KeyValue<>((String) v.getAccountId(), (Integer) v.getTotalValue()))
    .groupByKey()
    .count()                      // continuously maintained count per account
    .toStream().to(outputTopic);  // write the changelog to the output topic
Where does the processing code run? Not on the brokers: the Streams API runs embedded in your application, and many instances of the same app share the work.
Kafka Streams leverages the consumer group protocol: instances of the same application form a group, partitions are distributed among them, and work rebalances automatically as instances come and go.
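A hedged sketch of that wiring: the application.id below is what groups instances together; the topic names and broker address are assumptions:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsAppDemo {
  public static void main(String[] args) {
    Properties props = new Properties();
    // All instances with the same application.id join one consumer group
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "account-totals-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

    StreamsBuilder builder = new StreamsBuilder();
    builder.stream("input-topic").to("output-topic"); // trivial pass-through topology

    KafkaStreams streams = new KafkaStreams(builder.build(), props);
    streams.start(); // start more instances of this same app to scale out
    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
  }
}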
ksqlDB: stream processing using SQL, and much more
Stream Processing in Kafka: from flexibility to simplicity
• Producer/Consumer APIs: subscribe(), poll(), send(), flush()
• Kafka Streams API: filter(), map(), join(), aggregate()
• ksqlDB: SELECT … FROM …, JOIN … WHERE …, GROUP BY …
Simplify Your Stream Processing Architecture
ksqlDB provides one solution for capturing events, stream processing, and serving both push and pull queries (diagram: databases feed ksqlDB through connectors; stream processing runs over state stores; applications consume push subscriptions and issue pull lookups).
Streaming app with 4 SQL statements
1. Capture data:
CREATE SOURCE CONNECTOR jdbcConnector WITH (
  'connector.class' = '...JdbcSourceConnector',
  'connection.url' = '...',
  …);
2. Perform continuous transformations:
CREATE STREAM purchases AS
  SELECT viewtime, userid, pageid,
         TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS')
  FROM pageviews;
3. Create materialized views:
CREATE TABLE orders_by_country AS
  SELECT country,
         COUNT(*) AS order_count,
         SUM(order_total) AS order_total
  FROM purchases
    WINDOW TUMBLING (SIZE 5 MINUTES)
    LEFT JOIN user_profiles ON purchases.customer_id = user_profiles.customer_id
  GROUP BY country
  EMIT CHANGES;
4. Serve lookups against materialized views:
SELECT * FROM orders_by_country WHERE country='usa';
Confluent Control Center
Confluent Control Center
The simplest way to operate and build applications with Apache Kafka.
• For operators: centrally manage and monitor multi-cluster environments and security
• For developers: view messages, topics, and schemas; manage connectors; and build ksqlDB queries
Accelerate Application Development and Integration
• Messages: browse messages, and search offsets or timestamps by partition
• Topics: create, edit, delete, and view all topics in one place
• Schemas: create, edit, and view topic schemas, and compare schema versions
Adhere to Established Event Streaming SLAs
Monitor and optimize system health (cluster and broker overview dashboards):
• Broker and ZooKeeper uptime
• Under-replicated partitions
• Out-of-sync replicas
• Disk usage and distribution
• Alerting
Accelerate Application Development and Integration
Simplify the developer's mental model for ksqlDB (query editor):
• View a summary of all clusters
• Develop and run queries
• Support multiple ksqlDB clusters at a time
Health+
What is Health+?
• Intelligent alerts: manage Health+ intelligent alerts via Confluent Cloud's UI
• Accelerated Confluent Support: Support uses the performance metadata to help you with questions or problems even faster
• Monitoring dashboards: view all of your critical metrics in a single cloud-based dashboard
• Confluent Telemetry Reporter: send performance metadata back to Confluent via the Confluent Telemetry Reporter plugin
Intelligent alerts
There are 50+ alerts available today (with many more to come), including:
• Request handler idle percentage
• Network processor idle percentage
• Active controller count
• Offline partitions
• Unclean leader elections
• Under-replicated partitions
• Under min in-sync replicas
• Disk usage
• Unused topics
• No metrics from cluster in one hour
How does it work?
• Confluent Telemetry Reporter is a plugin that runs inside each Confluent Platform service (only brokers at the moment) and pushes metadata about the service to Confluent
• Data is sent over HTTP using an encrypted connection, once per minute by default
• Metrics are stored in the _confluent-telemetry-metrics topic
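A hedged sketch of enabling the Telemetry Reporter in a broker's server.properties; the property names follow Confluent's documented telemetry configs, and the key values are placeholders:

# Enable the Telemetry Reporter plugin on this broker (assumed config names)
confluent.telemetry.enabled=true
confluent.telemetry.api.key=<CLOUD_API_KEY>
confluent.telemetry.api.secret=<CLOUD_API_SECRET>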
Self-Balancing Clusters
Self-Balancing Clusters
Self-Balancing Clusters automate partition rebalances to improve Kafka's performance, elasticity, and ease of operations. Rebalances are required regularly to optimize cluster performance: after expansion, after shrinkage, and under uneven load.
Self-Balancing Clusters
Manual rebalance process:
$ cat partitions-to-move.json
{
  "partitions": [
    {"topic": "foo", "partition": 1, "replicas": [1, 2, 4]},
    ...
  ],
  "version": 1
}
$ kafka-reassign-partitions ...
With Confluent Platform's Self-Balancing: no complex math, no risk of human error.
Tiered Storage
Tiered Storage
Tiered Storage enables infinite data retention and elastic scalability by decoupling the compute and storage layers in Kafka. Event streaming is storage-intensive (diagram: logs, device logs, mainframes, SFDC, Splunk, Hadoop, object storage, data stores, third-party apps, and custom apps/microservices all flow through Kafka).
Tiered Storage
Tiered Storage allows Kafka to recognize two layers of storage: the brokers' local disks and cost-effective object storage, with old data offloaded to the object store.
Tiered Storage
Tiered Storage delivers three primary benefits that revolutionize the way our customers experience Kafka:
• Infinite data retention: reimagine what event streaming apps can do
• Reduced infrastructure costs: offload data to cost-effective object storage
• Platform elasticity: scale compute and storage independently
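As a sketch, Tiered Storage is enabled through broker configuration; the property names below follow Confluent's documented tiered-storage configs, with an S3 backend assumed and placeholder values:

# Enable tiering and point it at an S3 bucket (assumed backend and values)
confluent.tier.feature=true
confluent.tier.enable=true
confluent.tier.backend=S3
confluent.tier.s3.bucket=<BUCKET_NAME>
confluent.tier.s3.region=<AWS_REGION>
# How long data stays on local disk (the "hotset") before living only in object storage
confluent.tier.local.hotset.ms=3600000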
Multi-Region Clusters
Multi-Region Cluster (MRC)
A single cluster stretched across multiple regions that can replicate synchronously and asynchronously. It requires a minimum of three data centers/regions (at least for ZooKeeper). It is offset-preserving and provides automatic client failover with no custom code.
Note: for a given topic, a rack can hold only synchronous or asynchronous replicas, not both; a DC/zone can, however, contain multiple racks.
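For illustration, the synchronous/asynchronous split per region is expressed with a replica-placement policy. The JSON shape below follows Confluent's documented confluent.placement.constraints topic setting; the rack names and counts are assumptions:

{
  "version": 1,
  "replicas": [
    {"count": 2, "constraints": {"rack": "us-east"}},
    {"count": 2, "constraints": {"rack": "us-west"}}
  ],
  "observers": [
    {"count": 1, "constraints": {"rack": "us-central"}}
  ]
}

Here "replicas" are the synchronous, ISR-eligible copies, while "observers" replicate asynchronously and can be promoted during failover.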
Cluster Linking
Cluster Linking
Cluster Linking allows you to directly connect clusters together and mirror topics from one cluster to another without the need for Connect. It makes it much easier to build multi-datacenter, multi-cluster, and hybrid cloud deployments.
Cluster Linking
Cluster Linking simplifies hybrid cloud and multi-cloud deployments for Kafka. Sharing data between independent clusters or migrating clusters otherwise presents two challenges:
1. It requires deploying a separate Connect cluster
2. Offsets are not preserved, so messages are at risk of being skipped or reread (diagram: offsets 0-4 of Topic 1 in DC 1 become offsets 4-8 after replication to DC 2)
Cluster Linking
Cluster Linking requires no additional infrastructure and preserves offsets, making it straightforward to migrate clusters to Confluent Cloud.
Questions? ableasdale@confluent.io
Set your Data in Motion with Confluent & Apache Kafka | Tech Talk Series | LME