We have encountered an issue in our Apache Kafka setup where a consumer group ends up reprocessing the same offset after a rebalance, even though that offset was already processed, acknowledged, and committed.
Observed Scenario:
A consumer in the group successfully processed and committed offset 2054 for a partition <topic-partition>.
Right after this, the group went into a rebalance state with repeated logs indicating:
Group coordinator not available
Heartbeat failed
Group is rebalancing for around 4 minutes
After the rebalance completed and the consumer rejoined, we observed this log:
[Consumer clientId=<consumer-id>, groupId=<group-id>] Setting offset for partition <topic-partition> to the committed offset FetchPosition{offset=2054, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[<broker-url>:9093 (id: 1 rack: null)], epoch=13}}
Then the same message with offset 2054 was consumed again, instead of the expected next offset 2055.
Current setup details:
Apache Kafka version: 3.8.1
Confluent Platform image version: 7.8.1-1-ubi8
Deployed using Helm charts on Kubernetes (on-prem)
Multi-member consumer group with 24 members consuming a topic with 6 partitions and 0 lag
Consumer Configuration:
enable.auto.commit = false
auto.offset.reset = earliest
heartbeat.interval.ms = 6000
session.timeout.ms = 30000
max.poll.interval.ms = 230000
max.poll.records = 15
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
max.partition.fetch.bytes = 1048576
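For reference, in Java client terms this configuration maps to roughly the sketch below (bootstrap servers, group id, topic name, and deserializers are placeholders, not our actual values):

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholders -- not our real endpoints, group, or topic.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9093");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // The settings listed above.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "6000");
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "230000");
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "15");
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, "52428800");
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500");
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1");
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "1048576");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("example-topic"));
            // poll / process / manual-commit loop goes here
        }
    }
}
```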
Any insights or guidance on how to troubleshoot or prevent this behaviour would be greatly appreciated. Thanks in advance!
The offset that gets committed must be the next offset to be processed, not the offset of the last message that was processed.
So if you processed the message with offset 2054, you would commit offset 2055, not offset 2054. So it seems that what you observe is by design, and the bug is in your commit logic?
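A minimal sketch of that commit pattern with manual commits (the `process()` method stands in for whatever your application does with a record):

```java
import java.time.Duration;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitNextOffset {

    // Poll once and commit the next offset after each record is processed.
    static void pollAndCommit(KafkaConsumer<String, String> consumer) {
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
            process(record);

            // Commit the *next* offset to be consumed: last processed offset + 1.
            // After processing offset 2054, this commits 2055.
            TopicPartition tp = new TopicPartition(record.topic(), record.partition());
            consumer.commitSync(Map.of(tp, new OffsetAndMetadata(record.offset() + 1)));
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        // stand-in for the application's processing logic
    }
}
```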
Hi @mjsax
Thank you for your response and the helpful blog reference.
To clarify our observation with updated findings:
First Poll:
The consumer processed the message at offset 2054, but no commit was attempted before the rebalance began (verified in logs).
Rebalance-trigger log entries (Group coordinator not available, Heartbeat failed) appeared immediately after processing.
Rebalance:
The group remained in a rebalancing state for ~4 minutes.
Post-Rebalance:
Upon rejoining, the consumer resumed from the last committed offset (2054) and reprocessed the same message.
Only after this reprocessing was the offset committed successfully.
Since no commit was attempted for offset 2054 before the rebalance, the consumer correctly resumed from the last stored commit (per auto.offset.reset=earliest and Kafka’s protocol).
However, this led to unintended duplicate processing due to the lack of an intermediate commit.
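For illustration only, one shape such an intermediate commit could take is committing processed offsets from a ConsumerRebalanceListener before partitions are revoked. This is a rough sketch (topic name and offset bookkeeping are simplified, not our actual code), and the commit itself can still fail if the group coordinator is unreachable, as it was in our logs:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitOnRevoke {

    // Next offset to consume per partition, for records processed since the last commit.
    private final Map<TopicPartition, OffsetAndMetadata> pending = new HashMap<>();
    private final KafkaConsumer<String, String> consumer;

    CommitOnRevoke(KafkaConsumer<String, String> consumer) {
        this.consumer = consumer;
        consumer.subscribe(List.of("example-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Commit what has been processed before giving up the partitions, so the
                // next owner does not reprocess it. If the coordinator is unreachable,
                // this commit can still fail and duplicates remain possible.
                if (!pending.isEmpty()) {
                    consumer.commitSync(pending);
                    pending.clear();
                }
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // nothing to do in this sketch
            }
        });
    }

    // Call after each record has been processed.
    void markProcessed(ConsumerRecord<String, String> record) {
        pending.put(new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1));
    }
}
```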
> Since no commit was attempted for offset 2054 before the rebalance, the consumer correctly resumed from the last stored commit (per auto.offset.reset=earliest and Kafka’s protocol).
This statement seems to be off: auto.offset.reset only applies if there is no committed offset. So if 2054 was the last committed offset, auto.offset.reset does not trigger; the committed offset 2054 is used. I guess the end result is the same for you, but I just wanted to clarify this point.
> However, this led to unintended duplicate processing due to the lack of an intermediate commit.
Well, KafkaConsumer provides at-least-once semantics by default, so this behavior is expected. You could also change your manual offset commits to commit offsets before a message is processed; this avoids duplicate processing, but it might also lead to data loss if the offset commit succeeds and processing the message fails afterwards. This is called “at-most-once” semantics.
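As a sketch, the difference is only where the commit happens relative to processing within each poll (`process()` is again a placeholder):

```java
import java.time.Duration;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CommitOrdering {

    // At-least-once: commit only after the whole batch was processed.
    // A failure between processing and commit causes reprocessing (duplicates), never loss.
    static void atLeastOnce(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            process(record);
        }
        consumer.commitSync(); // commits the offsets returned by the last poll()
    }

    // At-most-once: commit before processing.
    // A failure after the commit but before processing finishes loses those records, never duplicates.
    static void atMostOnce(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        consumer.commitSync();
        for (ConsumerRecord<String, String> record : records) {
            process(record);
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        // stand-in for the application's processing logic
    }
}
```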