Retaining consumer offset when the consumer takes an extended hiatus

This is originally from a Slack thread. It is copied here to make it available to non-members of the community Slack and to preserve it permanently.
[You can join the community Slack here]

Hans Jacob Melby
It has just come to my attention that the offset for a client has a retention period, the same way all events have a retention period. Now, the default retention period for our system is 7 days, which is OK, but recently we had to stop polling Kafka for more than 7 days. To my “surprise” we began consuming from the beginning due to the following config:

ConsumerConfig.AUTO_OFFSET_RESET_CONFIG to "earliest",

Question: How should we (consumers) handle the use case of stopping consumption for a longer period of time without losing the offset? I am not able to modify the retention period. Today we use a boolean config flag to indicate whether we should poll or not, but that clearly was a bad idea… Can/should we change ConsumerConfig.MAX_POLL_RECORDS_CONFIG to 0? Will that “solve” our issue?

Alexei Zenin
You could save the offset outside Kafka and then “restore” from it.

Hans Jacob Melby
So how do I restore from that? Is it with the seek method in the code?

Alexei Zenin
Yeah you can seek to the offset

Hans Jacob Melby
So then I need to store the offset for each partition, right? And then seek for each partition?

Alexei Zenin
Yeah, it gets trickier if you don’t get the same partition reassigned. So you will want a global store if possible

Hans Jacob Melby
:thumbsup:
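
The idea from the thread (save an offset per partition in a store that every consumer instance can reach, then seek to it) can be sketched roughly as follows. This is only an illustration under assumptions: ExternalOffsetStore, recordProcessed and resumePosition are hypothetical names, and the in-memory map stands in for a shared database. In a real consumer the returned position would typically be passed to KafkaConsumer.seek(TopicPartition, long), for example from ConsumerRebalanceListener.onPartitionsAssigned.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical external offset store. The map is a stand-in for a database
// shared by all consumer instances, so that a partition can be resumed no
// matter which instance it is assigned to.
class ExternalOffsetStore {
    // Key is "<topic>-<partition>"; value is the next offset to read.
    private final Map<String, Long> nextOffsets = new HashMap<>();

    private static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    // Call after a record has been processed. The position to resume from
    // is the processed offset + 1; Long::max keeps it from moving backwards.
    void recordProcessed(String topic, int partition, long offset) {
        nextOffsets.merge(key(topic, partition), offset + 1, Long::max);
    }

    // Call when a partition is assigned. Empty means nothing was saved for
    // this partition, so the caller has no position to seek to.
    OptionalLong resumePosition(String topic, int partition) {
        Long next = nextOffsets.get(key(topic, partition));
        return next == null ? OptionalLong.empty() : OptionalLong.of(next);
    }
}
```

If resumePosition returns a value, the consumer would seek to it for that partition; if it is empty, the consumer falls back to whatever the auto offset reset setting dictates.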

Some resources on how other systems allow this:

Another way to solve this issue is to do a normal “deduplication”. In our scenario we know that the key of the message is unique, so we solved this issue by storing the key in a database and then doing a deduplication check before each message is processed. That way, when we stop for a longer period of time, we start consuming from the beginning, but most of the messages are then dropped by the dedup logic. It worked just fine, but the ramp-up time after a long stop is longer: we need to “drop” over 1.5 million messages before we start consuming new ones. That can often take a day, or at least some hours.
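
A minimal sketch of such a dedup check, assuming message keys are unique strings. KeyDeduplicator, markIfNew and processNew are hypothetical names, and the in-memory set stands in for the database table of processed keys described above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical key-based deduplicator. The set is a stand-in for the
// database table that holds the keys of already-processed messages.
class KeyDeduplicator {
    private final Set<String> processedKeys = new HashSet<>();

    // Returns true if the key has not been seen before (and records it),
    // false if it is a duplicate that should be dropped.
    boolean markIfNew(String key) {
        return processedKeys.add(key);
    }

    // Runs the dedup check over a batch of message keys and returns only
    // the keys that were actually processed.
    List<String> processNew(List<String> incomingKeys) {
        List<String> processed = new ArrayList<>();
        for (String key : incomingKeys) {
            if (markIfNew(key)) {
                processed.add(key); // real message handling would go here
            }
        }
        return processed;
    }
}
```

When the consumer restarts from the beginning of the topic, every old key fails the check and is dropped, so only messages that arrived during the stop are processed. Note that this sketch records the key before handling the message; a real system would need to decide what happens if handling fails after the key is stored.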