
I use Kafka Streams for data processing.

Kafka 1.1

I am producing data into Kafka at about 35k records per second, and the Kafka consumer metrics show the streams application consuming at the same rate. But very often I see partition-reset errors saying that the fetch offset is out of range. This basically means my consumer is slower than log deletion, which is happening very aggressively.

My log.retention.hours is 168 and log.retention.bytes is 10G. Here are the log messages I get in my consumer very often.
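For reference, the retention settings above correspond to these broker properties (a sketch of server.properties; the exact byte value assumes 10G means 10 GiB):

```properties
log.retention.hours=168          # time-based limit: delete segments older than 7 days
log.retention.bytes=10737418240  # size-based limit per partition (~10 GiB)
```

With cleanup.policy=delete, a segment becomes eligible for deletion when either limit is exceeded; the time check compares the broker clock against the largest record timestamp in the segment.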

I tried to search for this problem online, but I could not find anything related.

[sample-app-deploy-8c4fd5697-4xxbk sample-app] 09:28:24.291 [sample-app-0cf78aad-5faa-4197-853b-bfc08bb38f66-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=sample-app-0cf78aad-5faa-4197-853b-bfc08bb38f66-StreamThread-1-consumer, groupId=sample-app] Fetch offset 116411050 is out of range for partition sample-topic-4, resetting offset
[sample-app-deploy-8c4fd5697-4xxbk sample-app] 09:28:24.292 [sample-app-0cf78aad-5faa-4197-853b-bfc08bb38f66-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=sample-app-0cf78aad-5faa-4197-853b-bfc08bb38f66-StreamThread-1-consumer, groupId=sample-app] Resetting offset for partition sample-topic-4 to offset 116411058.
[sample-app-deploy-8c4fd5697-qmjnd sample-app] 09:28:24.306 [sample-app-e10caa03-b881-47f2-b1ce-e9513c12a98c-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=sample-app-e10caa03-b881-47f2-b1ce-e9513c12a98c-StreamThread-1-consumer, groupId=sample-app] Fetch offset 237000869 is out of range for partition sample-topic-7, resetting offset
[sample-app-deploy-8c4fd5697-qmjnd sample-app] 09:28:24.307 [sample-app-e10caa03-b881-47f2-b1ce-e9513c12a98c-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=sample-app-e10caa03-b881-47f2-b1ce-e9513c12a98c-StreamThread-1-consumer, groupId=sample-app] Resetting offset for partition sample-topic-7 to offset 237000871.
[sample-app-deploy-8c4fd5697-n5pw8 sample-app] 09:29:56.808 [sample-app-1db56df6-1dab-40d2-94c2-e412eff0ee09-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=sample-app-1db56df6-1dab-40d2-94c2-e412eff0ee09-StreamThread-1-consumer, groupId=sample-app] Fetch offset 471945398 is out of range for partition sample-topic-0, resetting offset
[sample-app-deploy-8c4fd5697-n5pw8 sample-app] 09:29:56.810 [sample-app-1db56df6-1dab-40d2-94c2-e412eff0ee09-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=sample-app-1db56df6-1dab-40d2-94c2-e412eff0ee09-StreamThread-1-consumer, groupId=sample-app] Resetting offset for partition sample-topic-0 to offset 471945403.
[sample-app-deploy-8c4fd5697-n5pw8 sample-app] 09:34:56.804 [sample-app-1db56df6-1dab-40d2-94c2-e412eff0ee09-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=sample-app-1db56df6-1dab-40d2-94c2-e412eff0ee09-StreamThread-1-consumer, groupId=sample-app] Fetch offset 474036996 is out of range for partition sample-topic-0, resetting offset
[sample-app-deploy-8c4fd5697-n5pw8 sample-app] 09:34:56.805 [sample-app-1db56df6-1dab-40d2-94c2-e412eff0ee09-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=sample-app-1db56df6-1dab-40d2-94c2-e412eff0ee09-StreamThread-1-consumer, groupId=sample-app] Resetting offset for partition sample-topic-0 to offset 474036997.
[sample-app-deploy-8c4fd5697-cjccm sample-app] 09:39:10.659 [sample-app-a7e7c388-0dd4-45e8-8d5a-3a84effb7dfd-StreamThread-1] INFO org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=sample-app-a7e7c388-0dd4-45e8-8d5a-3a84effb7dfd-StreamThread-1-consumer, groupId=sample-app] Fetch offset 236702790 is out of range for partition sample-topic-5, resetting offset

Can anyone help point out where the problem is happening?

  • What is your clean up policy? Commented Sep 12, 2019 at 14:22
  • @SantoshTulasiram It looks like there is a similar issue, issues.apache.org/jira/browse/KAFKA-6189 — could you check if it's related? Commented Sep 12, 2019 at 14:42
  • The cleanup policy is delete. I found that cleanup was happening aggressively due to the retention time, not the retention size as I expected. It seems to be an issue with the event times we are pushing. I will update here once the problem gets fixed. Commented Sep 13, 2019 at 9:20
  • The problem was due to the timestamp. We were sending the event times in seconds instead of milliseconds, which caused the time-based retention to be aggressive. Commented Sep 13, 2019 at 10:35
  • @SantoshTulasiram Can you answer your question and accept then? Thanks. Commented Sep 13, 2019 at 18:11

1 Answer


The problem was due to the timestamp: we were sending the event times in seconds instead of milliseconds, which caused the time-based retention to be aggressive.
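The fix can be sketched as follows (a minimal illustration; the class name EventTimeFix, the helper toEpochMillis, and the sample epoch value are hypothetical). Kafka record timestamps are epoch milliseconds, so an event time produced in epoch seconds makes every record look roughly 49 years old, and time-based retention deletes the segment almost immediately:

```java
import java.time.Instant;

public class EventTimeFix {
    // Kafka record timestamps (and time-based retention checks) use epoch
    // MILLIseconds. If a producer supplies epoch seconds, each record appears
    // decades in the past, so log.retention.hours=168 expires it right away
    // and consumers hit "Fetch offset ... is out of range".
    static long toEpochMillis(long epochSeconds) {
        return epochSeconds * 1000L;
    }

    public static void main(String[] args) {
        long eventTimeSeconds = 1568280504L;                     // what we were sending
        long eventTimeMillis = toEpochMillis(eventTimeSeconds);  // what Kafka expects
        System.out.println(Instant.ofEpochMilli(eventTimeMillis)); // 2019-09-12T09:28:24Z
    }
}
```

The corrected millisecond value would then be passed as the record's explicit timestamp, e.g. via the producer constructor overload new ProducerRecord<>(topic, partition, eventTimeMillis, key, value), or produced by whatever timestamp extractor the application uses.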
