Hi all,
I am seeing consumer groups disappear when I restart a Kafka broker.
Partitions often become offline/unavailable (lost), and the consumers then recreate their consumer groups. This results in huge lag, since they start consuming from the beginning again.
We have seen this with multiple consumer groups, and we are having a hard time pinpointing where the problem is.
Do you have any idea or suggestions of what we can try or do to resolve this issue?
Our Kafka cluster consists of 5 brokers, each running on a different node.
All kafka brokers are configured with the following (relevant, I hope) configuration for HA:
default.replication.factor: 3
group.initial.rebalance.delay.ms: 0
log.flush.interval.messages: 10000
log.flush.interval.ms: 1 second
min.insync.replicas: 2
offsets.retention.minutes: 1 year and 7 days
offsets.topic.replication.factor: 3
transaction.state.log.min.isr: 2
transaction.state.log.replication.factor: 3
topic: __consumer_offsets
unclean.leader.election.enable: false
Any help or hints would be greatly appreciated.