Kafka — frequent rebalances on one site with same config as another site

Umut · 7 October 2025 05:42

Hi Kafka community,

I have a 3-node Kafka 3.8.1 cluster and a Kafka Connect cluster, deployed identically in two sites (Site A and Site B).
In Site A, everything works smoothly. In Site B, we observe frequent and long-lasting rebalances, especially during occasional disk latency spikes, even though Site B also uses NVMe disks.

What I have done so far:

- Collected the rebalance logs (with “rebalance delay: 30000 ms” etc.).
- Analyzed the connector config (Humio HEC sink): small buffer, no timeout/backoff/threads settings, errors.tolerance = none, etc.
- Noted that storage in Site B occasionally shows momentary I/O latency increases (although it is the same hardware type as Site A).
- Proposed a patch with humio.hec.buffer_size = 1000, timeout.ms = 10000, threads = 3, backoff settings, and a changed errors.tolerance.
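
To make the proposed patch concrete, here is a minimal sketch of it as a config overlay in Python. The three numeric values come from the list above. Everything else is an assumption: the key names `timeout.ms` and `threads` are written as in this post and may need a connector-specific prefix, the connector name `humio-hec-sink` is hypothetical, and `errors.tolerance = all` is assumed because Kafka Connect accepts only `none` or `all` for that setting. The backoff settings are left out because their values are not listed here.

```python
import json

# Values taken from the proposed patch. Key names "timeout.ms" and "threads"
# are copied from the post and are assumptions; the connector may expect
# different (prefixed) property names.
PROPOSED_PATCH = {
    "humio.hec.buffer_size": "1000",
    "timeout.ms": "10000",
    "threads": "3",
    # Assumption: Kafka Connect accepts only "none" or "all" here, so moving
    # away from "none" means "all".
    "errors.tolerance": "all",
}


def apply_patch(current: dict, patch: dict) -> dict:
    """Return a new config with the patch applied; the input is not modified."""
    merged = dict(current)
    merged.update(patch)
    return merged


def changed_keys(current: dict, patched: dict) -> list:
    """List the keys whose value differs between the two configs."""
    return sorted(k for k in patched if current.get(k) != patched[k])


if __name__ == "__main__":
    # Hypothetical current connector config, reduced to the relevant keys.
    current = {"name": "humio-hec-sink", "errors.tolerance": "none"}
    patched = apply_patch(current, PROPOSED_PATCH)
    print("changed:", changed_keys(current, patched))
    print(json.dumps(patched, indent=2))
```

Printing the changed keys before submitting the config makes it easier to confirm that both sites end up with the same effective settings.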

Questions:

1. Given this context, would the Kafka community consider this a misconfiguration, a bug, or expected behavior in edge scenarios?
2. Could there be bugs in the rebalance logic that exacerbate the effect of such site-specific latency spikes?
3. Are there known best practices or community-backed config suggestions for sink connectors in geographically distinct sites with intermittent latency?
4. Would enabling static membership or tweaking scheduled.rebalance.max.delay.ms help significantly in such cases?
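
To make the last question concrete, below is a rough sketch of the kind of worker-level overrides I have in mind, with a small sanity check. The property names are standard Kafka Connect distributed worker settings, but every value is illustrative and untested, not a recommendation. The only rule checked is the documented guidance that `heartbeat.interval.ms` should typically be no higher than one third of `session.timeout.ms`. As far as I know, `scheduled.rebalance.max.delay.ms` defaults to 300000 ms.

```python
# Candidate overrides for the Connect worker properties file.
# All values are illustrative assumptions.
WORKER_OVERRIDES = {
    "session.timeout.ms": 30000,
    "heartbeat.interval.ms": 10000,
    "rebalance.timeout.ms": 120000,
    "scheduled.rebalance.max.delay.ms": 60000,
}


def check_worker_timing(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    heartbeat = cfg["heartbeat.interval.ms"]
    session = cfg["session.timeout.ms"]
    # Documented guidance: heartbeat interval typically <= 1/3 of session timeout.
    if heartbeat * 3 > session:
        problems.append(
            f"heartbeat.interval.ms={heartbeat} is more than 1/3 of "
            f"session.timeout.ms={session}"
        )
    return problems


def to_properties(cfg: dict) -> str:
    """Render the overrides as key=value lines for a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(cfg.items()))


if __name__ == "__main__":
    for problem in check_worker_timing(WORKER_OVERRIDES):
        print("WARNING:", problem)
    print(to_properties(WORKER_OVERRIDES))
```

The idea is that a longer session timeout would let a worker ride out a short I/O stall without being evicted from the group, at the cost of slower detection of a worker that has really failed.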
