Intra-cluster Replication for Apache Kafka Jun Rao
About myself • Engineer at LinkedIn since 2010 • Worked on Apache Kafka and Cassandra • Database researcher at IBM
Outline • Overview of Kafka • Kafka architecture • Kafka replication design • Performance • Q/A
What’s Kafka • A distributed pub/sub messaging system • Used in many places – LinkedIn, Twitter, Box, FourSquare … • What do people use it for? – log aggregation – real-time event processing – monitoring – queuing
Example Kafka Apps at LinkedIn
Kafka Deployment at LinkedIn [Diagram: live services in the live data center produce interactive data (human, machine) into Kafka clusters, which feed monitoring and are mirrored into Kafka clusters and Hadoop in the offline data center] Per-day stats • writes: 10+ billion messages (2+ TB compressed data) • reads: 50+ billion messages
Kafka vs. Other Messaging Systems • Scale-out from the ground up • Persistence to disks • High throughput (10s of MB/sec per server) • Multi-subscription
Outline • Overview of Kafka • Kafka architecture • Kafka replication design • Performance • Q/A
Kafka Architecture [Diagram: producers publish to a cluster of brokers coordinated via Zookeeper; consumers pull from the brokers]
Terminology • Topic = message stream • Topic has partitions – partitions distributed to brokers • Partition has a log on disk – messages persisted in the log – messages addressed by offset
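As a toy illustration of the last two bullets (not Kafka's storage code), a partition can be modeled as an append-only log whose index is the message offset:

    // Toy model of a partition log: the offset is just the position in the log.
    import java.util.ArrayList;
    import java.util.List;

    public class PartitionLog {
        private final List<String> messages = new ArrayList<>();

        long append(String msg) {      // returns the offset assigned to the message
            messages.add(msg);
            return messages.size() - 1;
        }

        String read(long offset) {     // messages are addressed by offset, not by key
            return messages.get((int) offset);
        }

        public static void main(String[] args) {
            PartitionLog log = new PartitionLog();
            long off = log.append("msg1");
            System.out.println(off + " -> " + log.read(off)); // 0 -> msg1
        }
    }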
API
• Producer
    List<KeyedMessage<K,V>> messages = new ArrayList<KeyedMessage<K,V>>();
    messages.add(new KeyedMessage<K,V>("topic1", null, "msg1"));
    producer.send(messages);
• Consumer
    streams = consumer.createMessageStreams("topic1", 1);
    for (message : streams[0]) {
        // do something with message
    }
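Fleshed out, the 0.8-era Java API looks roughly like this (a sketch; the broker/Zookeeper addresses, group id, and topic name are illustrative):

    // A fuller version of the slide's snippets using the Kafka 0.8 Java API.
    import java.util.*;
    import kafka.consumer.*;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.javaapi.producer.Producer;
    import kafka.message.MessageAndMetadata;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class ApiExample {
        public static void main(String[] args) {
            // Producer: batch up KeyedMessages and send.
            Properties pprops = new Properties();
            pprops.put("metadata.broker.list", "broker1:9092");  // illustrative host
            pprops.put("serializer.class", "kafka.serializer.StringEncoder");
            Producer<String, String> producer = new Producer<>(new ProducerConfig(pprops));
            List<KeyedMessage<String, String>> messages = new ArrayList<>();
            messages.add(new KeyedMessage<String, String>("topic1", null, "msg1"));
            producer.send(messages);
            producer.close();

            // Consumer: one stream for topic1, iterate over messages as they arrive.
            Properties cprops = new Properties();
            cprops.put("zookeeper.connect", "zk1:2181");         // illustrative host
            cprops.put("group.id", "group1");
            ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(cprops));
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                consumer.createMessageStreams(Collections.singletonMap("topic1", 1));
            for (MessageAndMetadata<byte[], byte[]> message : streams.get("topic1").get(0)) {
                System.out.println(new String(message.message())); // do something with message
            }
        }
    }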
Deliver High Throughput • Simple storage – each partition's log on the broker (topic1:part1, topic2:part1, …) is a sequence of segment files (segment-1 … segment-n); messages (msg-1 … msg-n) are added via append() and served via read() using an offset index • Batched writes and reads • Zero-copy transfer from file to socket • Compression (batched)
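The zero-copy bullet corresponds to Java NIO's FileChannel.transferTo, which lets the kernel move log bytes from the file to the socket without copying them through user space. A minimal sketch (the segment file name and port are illustrative):

    // Zero-copy transfer of a log segment to a consumer socket (a sketch).
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;

    public class ZeroCopySend {
        public static void main(String[] args) throws IOException {
            try (FileChannel segment = new FileInputStream("segment-1.log").getChannel();
                 SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9092))) {
                long pos = 0, remaining = segment.size();
                while (remaining > 0) {
                    // transferTo lets the kernel move bytes file -> socket directly (e.g., sendfile).
                    long sent = segment.transferTo(pos, remaining, socket);
                    pos += sent;
                    remaining -= sent;
                }
            }
        }
    }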
Outline • Overview of Kafka • Kafka architecture • Kafka replication design • Performance • Q/A
Why Replication • Broker can go down – controlled: rolling restart for code/config push – uncontrolled: isolated broker failure • If broker down – some partitions unavailable – could be permanent data loss • Replication → higher availability and durability
CAP Theorem • Pick two of – consistency – availability – partition tolerance
Kafka Replication: Pick CA • Brokers within a datacenter – i.e., network partitioning is rare • Strong consistency – replicas byte-wise identical • Highly available – typical failover time: < 10ms
Replicas and Layout • Partition has replicas • Replicas spread evenly among brokers [Diagram: logs on brokers 1–4, each broker holding three of the replicas of topic1-part1, topic1-part2, topic2-part1, and topic2-part2]
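One simple way to get such an even spread is round-robin placement; a toy sketch (illustrative, not Kafka's exact assignment algorithm):

    // Round-robin replica placement across brokers (illustrative sketch).
    import java.util.*;

    public class ReplicaLayout {
        static Map<String, List<Integer>> assign(List<String> partitions,
                                                 int numBrokers, int replicationFactor) {
            Map<String, List<Integer>> assignment = new LinkedHashMap<>();
            for (int p = 0; p < partitions.size(); p++) {
                List<Integer> replicas = new ArrayList<>();
                for (int r = 0; r < replicationFactor; r++)
                    replicas.add((p + r) % numBrokers + 1); // brokers numbered 1..numBrokers
                assignment.put(partitions.get(p), replicas);
            }
            return assignment;
        }

        public static void main(String[] args) {
            System.out.println(assign(
                List.of("topic1-part1", "topic1-part2", "topic2-part1", "topic2-part2"), 4, 3));
            // {topic1-part1=[1, 2, 3], topic1-part2=[2, 3, 4],
            //  topic2-part1=[3, 4, 1], topic2-part2=[4, 1, 2]}
        }
    }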
Maintain Strongly Consistent Replicas • One of the replicas is the leader • All writes go to the leader • Leader propagates writes to followers in order • Leader decides when to commit a message
Conventional Quorum-based Commit • Wait for majority of replicas (e.g., Zookeeper) • Plus: good latency • Minus: 2f+1 replicas → tolerate f failures – ideally want to tolerate 2f failures
Commit Messages in Kafka • Leader maintains in-sync replicas (ISR) – initially, all replicas in ISR – message committed once received by every replica in ISR – follower fails → dropped from ISR – leader commits using new ISR • Benefit: f replicas → tolerate f-1 failures (e.g., tolerating 2 failures takes 3 replicas here, vs. 5 with a majority quorum) – latency less an issue within a datacenter
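A toy model of the ISR commit rule (illustrative only, not Kafka's actual code): the leader advances the committed offset to the smallest offset acknowledged by every replica still in the ISR, so dropping a failed follower lets the commit point move forward.

    // Toy ISR commit rule (illustrative sketch).
    import java.util.*;

    public class IsrCommit {
        final Set<String> isr = new HashSet<>();          // replicas currently in sync
        final Map<String, Long> acked = new HashMap<>();  // last offset each replica has fetched
        long committed = -1;                              // last committed offset

        IsrCommit(Collection<String> replicas) { isr.addAll(replicas); }

        // A message is committed once every replica in the ISR has received it.
        void onFollowerFetch(String replica, long offset) {
            acked.put(replica, offset);
            maybeCommit();
        }

        // A failed or lagging follower is dropped; the commit point may then advance.
        void dropFromIsr(String replica) {
            isr.remove(replica);
            maybeCommit();
        }

        private void maybeCommit() {
            long min = Long.MAX_VALUE;
            for (String r : isr) min = Math.min(min, acked.getOrDefault(r, -1L));
            if (!isr.isEmpty() && min > committed) committed = min;
        }

        public static void main(String[] args) {
            IsrCommit p = new IsrCommit(List.of("A", "B", "C"));
            p.onFollowerFetch("A", 2); p.onFollowerFetch("B", 2); p.onFollowerFetch("C", 0);
            System.out.println(p.committed); // 0: C has only offset 0 so far
            p.dropFromIsr("C");
            System.out.println(p.committed); // 2: with ISR = {A,B}, offset 2 is committed
        }
    }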
Data Flow in Replication [Diagram: (1) producer sends to the leader of topic1-part1 on broker 1; (2) leader propagates to followers on brokers 2 and 3; (3) leader commits after follower acks; (4) ack returned to the producer; consumer reads committed messages from the leader]

  When producer receives ack | Latency              | Durability on failures
  no ack                     | no network delay     | some data loss
  wait for leader            | 1 network roundtrip  | a few messages may be lost
  wait for committed         | 2 network roundtrips | no data loss

• Only committed messages exposed to consumers – independent of the ack type chosen by the producer
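In the 0.8 producer, these three ack modes are selected with the request.required.acks config (0 = no ack, 1 = wait for leader, -1 = wait for committed). A minimal sketch, with illustrative broker hosts:

    // Selecting the producer ack mode in Kafka 0.8 (a sketch).
    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class AckModes {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // illustrative
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // 0 = no ack, 1 = wait for leader, -1 = wait for committed (all ISR replicas)
            props.put("request.required.acks", "-1");

            Producer<String, String> producer = new Producer<>(new ProducerConfig(props));
            producer.send(new KeyedMessage<String, String>("topic1", null, "msg1"));
            producer.close();
        }
    }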
Extend to Multiple Partitions [Diagram: topic1-part1, topic2-part1, and topic3-part1 each replicated on three of brokers 1–4, with each partition's leader on a different broker] • Leaders are evenly spread among brokers
Handling Follower Failures • Leader maintains last committed offset – propagated to followers – checkpointed to disk • When follower restarts – truncate log to last committed offset – fetch data from leader – fully caught up → added to ISR
Handling Leader Failure • Use an embedded controller (inspired by Helix) – detect broker failure via Zookeeper – on leader failure: elect new leader from ISR – committed messages not lost • Leader and ISR written to Zookeeper – for controller failover – expected to change infrequently
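Because every ISR member already has all committed messages, the controller can safely elect any live ISR replica. A toy sketch (illustrative only; the real controller runs inside a broker and uses the leader/ISR state persisted in Zookeeper):

    // Toy leader election from the ISR (illustrative sketch).
    import java.util.List;
    import java.util.Optional;

    public class ElectLeader {
        // Any surviving ISR member has every committed message, so pick the first live one.
        static Optional<String> elect(List<String> isr, List<String> liveBrokers) {
            return isr.stream().filter(liveBrokers::contains).findFirst();
        }

        public static void main(String[] args) {
            // Leader A fails with ISR = {A, B, C}: B becomes the new leader.
            System.out.println(elect(List.of("A", "B", "C"), List.of("B", "C"))); // Optional[B]
        }
    }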
Example of Replica Recovery
1. ISR = {A,B,C}; leader A commits m1 (last committed)
     L (A): m1, m2, m3   F (B): m1, m2   F (C): m1
2. A fails and B is the new leader; ISR = {B,C}; B commits m2, but not m3
     (A, down): m1, m2, m3   L (B): m1, m2   F (C): m1, m2
3. B commits new messages m4, m5
     (A, down): m1, m2, m3   L (B): m1, m2, m4, m5   F (C): m1, m2, m4, m5
4. A comes back, truncates to m1, and catches up; finally ISR = {A,B,C}
     after truncation: F (A): m1   L (B): m1, m2, m4, m5   F (C): m1, m2, m4, m5
     after catch-up:   F (A): m1, m2, m4, m5   L (B): m1, m2, m4, m5   F (C): m1, m2, m4, m5
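Step 4's truncate-and-catch-up can be sketched directly on this example (a toy, not Kafka's log code): A drops everything past its checkpointed committed offset, then fetches the rest of the log from the new leader B.

    // Toy version of step 4: truncate to the committed offset, then re-fetch from the leader.
    import java.util.ArrayList;
    import java.util.List;

    public class FollowerRecovery {
        static void recover(List<String> followerLog, long lastCommitted, List<String> leaderLog) {
            // 1. Truncate everything past the last committed offset (discards uncommitted m3).
            while (followerLog.size() > lastCommitted + 1)
                followerLog.remove(followerLog.size() - 1);
            // 2. Fetch the missing suffix from the leader until fully caught up.
            for (int i = followerLog.size(); i < leaderLog.size(); i++)
                followerLog.add(leaderLog.get(i));
            // 3. The leader then adds this replica back to the ISR.
        }

        public static void main(String[] args) {
            List<String> a = new ArrayList<>(List.of("m1", "m2", "m3")); // m3 never committed
            List<String> b = List.of("m1", "m2", "m4", "m5");            // new leader B's log
            recover(a, 0, b);       // A's checkpointed last committed offset is 0 (m1)
            System.out.println(a);  // [m1, m2, m4, m5] -- byte-wise identical to B
        }
    }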
Outline • Overview of Kafka • Kafka architecture • Kafka replication design • Performance • Q/A
Setup • 3 brokers • 1 topic with 1 partition • Replication factor = 3 • Message size = 1 KB
Choosing between Latency and Durability

  When producer receives ack | Time to publish a message (ms) | Durability on failures
  no ack                     | 0.29                           | some data loss
  wait for leader            | 1.05                           | a few messages may be lost
  wait for committed         | 2.05                           | no data loss
Producer Throughput [Charts: throughput (MB/s, 0–70) vs. messages per send (1, 10, 100, 1000) and vs. # of concurrent producers (1, 5, 10, 20), each with series for no ack, leader, and committed]
Consumer Throughput [Chart: throughput (MB/s, 0–100) vs. fetch size (1KB, 10KB, 100KB, 1MB)]
Q/A • Kafka 0.8.0 (intra-cluster replication) – expected to be released in March – various performance improvements in the future • Check out more about Kafka – http://kafka.apache.org/ • Kafka meetup tonight