Kafka — frequent rebalances on one site with same config as another site

Hi Kafka community,

I run a 3-node Kafka 3.8.1 cluster and a Kafka Connect cluster, deployed identically at two sites (Site A and Site B).
In Site A, everything works smoothly. In Site B, we observe frequent and long-lasting rebalances, especially under occasional disk latency spikes, even though NVMe disks are used there too.

What I have done so far:

  • Collected rebalance logs (with “rebalance delay: 30000 ms” etc.)
  • Analyzed connector config (Humio HEC sink): small buffer, no timeout/backoff/threads config, errors.tolerance = none, etc.
  • Noted that storage at Site B occasionally shows momentary I/O latency increases, although the hardware type is the same as at Site A.
  • Proposed a patch that sets humio.hec.buffer_size = 1000, timeout.ms = 10000, and threads = 3, adds backoff settings, and changes errors.tolerance.
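
For reference, the proposed overrides written as connector properties. This is only a sketch: the key names are copied from the bullet above, and I have not confirmed that the Humio HEC sink actually recognizes timeout.ms or threads. The backoff settings and the new errors.tolerance value are left out here.

```properties
# Sketch of the proposed sink connector overrides (key names as listed above;
# not confirmed that the connector recognizes timeout.ms or threads)
humio.hec.buffer_size=1000
timeout.ms=10000
threads=3
# backoff settings and the new errors.tolerance value are omitted
```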

Questions:

  1. Given this context, would the Kafka community consider this a misconfiguration, bug, or expected behavior in edge scenarios?
  2. Could there be bugs in the rebalance logic that exacerbate the effect of such site-specific latency spikes?
  3. Are there known best practices or community-backed config suggestions for connector sinks in geographically distinct sites with intermittent latency?
  4. Would enabling static membership or tweaking scheduled.rebalance.max.delay.ms help significantly in such cases?
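
To make question 4 concrete, this is the kind of worker-level change I have in mind for Site B. The values are illustrative assumptions and have not been tested; as far as I understand, scheduled.rebalance.max.delay.ms defaults to 300000 ms in Connect.

```properties
# connect-distributed.properties (worker level); values are illustrative only
# Maximum delay before the tasks of a departed worker are reassigned
scheduled.rebalance.max.delay.ms=60000
# Worker group membership timing, relaxed to ride out short latency spikes
session.timeout.ms=30000
heartbeat.interval.ms=10000
```

I have left static membership (group.instance.id) out of the sketch because I am not sure where it would apply in a Connect deployment.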