We have 3 Kafka brokers and a topic with 40 partitions and a replication factor of 1. After an uncontrolled shutdown of one Kafka broker, a new leader could not be elected for some partitions (see the logs below), and as a result we can no longer read from the topic. Please advise whether it is possible to survive this kind of crash without raising the replication factor above 1.
We want our target database (which is built from the events in the Kafka topic) to stay in a consistent state, so we have also set the parameter unclean.leader.election.enable to false.
Partition info after crash:
extenr-topic:1:882091242
extenr-topic:19:882091615
extenr-topic:28:882092273
Error: partition 18 does not have a leader. Skip getting offsets
Error: partition 27 does not have a leader. Skip getting offsets
Error: partition 36 does not have a leader. Skip getting offsets

Exception from the Kafka broker:
2017-10-09 05:56:50,302 ERROR state.change.logger: Controller 236 epoch 267 initiated state change for partition [extenr-topic,15] from OfflinePartition to OnlinePartition failed
kafka.common.NoReplicaOnlineException: No broker in ISR for partition [extenr-topic,15] is alive. Live brokers are: [Set(236, 237)], ISR brokers are: [235]
    at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:66)
    at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:342)
    at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:203)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:118)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:115)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)

There are also the following errors in the logs:
2017-10-09 04:11:25,509 ERROR state.change.logger: Broker 235 received LeaderAndIsrRequest with correlation id 1 from controller 236 epoch 267 for partition [extenr-topic,36] but cannot become follower since the new leader -1 is unavailable.
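
For reference, this is how we understand the election rule behind the NoReplicaOnlineException above, written as a small Python sketch. It is a simplified model and not Kafka's actual code: the function name `select_leader`, its parameters, and the `NoReplicaOnline` class are made up for illustration, and the broker ids are taken from the log lines.

```python
from typing import List, Set


class NoReplicaOnline(Exception):
    """Illustrative stand-in for kafka.common.NoReplicaOnlineException."""


def select_leader(assigned: List[int], isr: List[int],
                  live: Set[int], unclean: bool) -> int:
    """Simplified model of leader election for an offline partition.

    assigned: all replicas of the partition (its size is the replication factor)
    isr:      the in-sync replicas recorded before the crash
    live:     brokers that are currently alive
    unclean:  assumed to mirror unclean.leader.election.enable
    """
    # Preferred case: a broker that is both in the ISR and alive.
    live_isr = [b for b in isr if b in live]
    if live_isr:
        return live_isr[0]

    # Unclean election: fall back to any live assigned replica,
    # accepting that it may be missing some messages.
    if unclean:
        live_assigned = [b for b in assigned if b in live]
        if live_assigned:
            return live_assigned[0]

    raise NoReplicaOnline(
        f"No usable replica: live={sorted(live)}, isr={isr}, assigned={assigned}"
    )


if __name__ == "__main__":
    live_brokers = {236, 237}
    # Our situation: replication factor 1, the only replica (235) is down.
    for unclean in (False, True):
        try:
            select_leader([235], [235], live_brokers, unclean)
        except NoReplicaOnline as err:
            print(f"unclean={unclean}: {err}")
```

If this model is right, then with a replication factor of 1 the crashed broker 235 is the only assigned replica of the affected partitions, so there is no other broker to elect regardless of the unclean.leader.election.enable setting, and the partitions would stay offline until broker 235 is back.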