Kafka Stream co-operative rebalance storm and Processing Blocked

kbhattad · 13 July 2021 22:33

We are using kafka streams in kubernetes deployment and running 10 instance of our pod. All the pod has static consumer group ID configured. Each pod has 4 streaming thread. The topic has 60 partition.

We are seeing rebalance storm and message not getting processed / lag being built. Again this problem is very much reproducible.

From kafka properties point of view
max.poll.inteval.ms is set to Int.MaxValue
session.timeout.ms is set to 60 sec
heartbeat.interval.ms is 20 sec.

Rebalance itself may be fine but we are seeing messages are not processed for long time and huge lag getting built. Please help how to overcome this issue.

mjsax · 14 July 2021 04:43

For static group membership, it’s usually recommended to increase the session timeout to a very large value, like 5 minutes (or even larger) to ensure that no rebalance is triggered if a POD is moved within the Kubernetes cluster. The session timeout should be larger than the maximum expected downtime of a POD.

Also, if you have a stateful application, you want to use “stateful set” to make sure that persistent volumes are re-attached to the same PODs to avoid expensive state store recovery.

Cf. Deploying Kafka Streams Applications with Docker and Kubernetes - Confluent

kbhattad · 14 July 2021 10:10

Thanks. I tried with session.timeout.ms with 5 minutes. Still seeing same issue.

mjsax · 14 July 2021 18:52

I guess you could inspect the log (client and broker side) to investigate why a rebalance is triggered… If a static member re-joins the group, no rebalance should be trigger (ie, if the static member was not removed from the group previously). – Could also be a config issue (ie, the static member ID is not configures correctly)?

kbhattad · 16 July 2021 14:58

Thanks for response. Find below the properties configured for our application for stream

serviceName = k8s appName (Fixed string)

application.id = “serviceName_Kafka-Stream-Application”
client.id = “serviceName_Kafka-Stream-Application”
group.id = “serviceName_Kafka-Stream-Application”
max.poll.interval.ms = Integer.MaxValue
session.timeout.ms = 60 second
heartbeat.interval.ms = 20 second
replication.factor = 3
num.stream.threads = 4
processing.guarantee = at_least_once

Do let me know if any of them need to be tuned or need to be different. The issue is always seen when we do rolling restart of our k8s application.

You mentioned could be static member ID not configured correctly? can you elaborate more.

mjsax · 16 July 2021 17:00

Your config does not contain a static group ID (ie, group.instance.id) that must be unique within the consumer group, ie, different for each POD. Check out the talk I linked above for more details.

Topic		Replies	Views
Kafka streams rebalance storm Kafka Streams	7	6618	16 July 2021
Rebalancing Loop when updating Kafka streams lib Kafka Streams	6	5123	18 May 2021
Kafka consumer group rebalancing frequently Clients	3	126	22 January 2025
Kafka connect with k8s config Kafka Connect	11	3390	7 May 2022
Syncgroup keeps on failing with message "The group began another rebalance" and it never ends Kafka Streams	0	4911	27 May 2022

Kafka Stream co-operative rebalance storm and Processing Blocked

Related topics