We run a high-load Kafka cluster. After some maintenance that required a rolling restart, we are seeing a large imbalance in the request handler pool across brokers: the first broker restarted shows 80% and the last broker restarted shows 15% (calculated from the kafka_server_kafkarequesthandlerpool_requesthandleravgidlepercent_count metric).
We suspect this happens because consumers do not move their connections back to the previous replica. For example, take brokers A, B, C, D, E, F, restarted in that order:
for every partition that has broker A as a replica, consumers connected to B, C, D, E, F move to broker A and do not move back to the previous replica once the other brokers are back up.
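
To make the suspected sequence concrete, here is a toy simulation in Python. It is only a sketch of our hypothesis, not Kafka code, and it rests on assumptions we have not confirmed: replicas are assigned round-robin, the broker serving a partition switches to the next live replica in the replica list when its current broker goes down, and nothing moves back on its own afterwards. The function name and the partition count are made up for illustration.

```python
from collections import Counter


def simulate_rolling_restart(brokers, num_partitions, replication_factor=3):
    """Toy model: count partitions served per broker after a rolling restart."""
    n = len(brokers)
    # Assumed round-robin replica assignment, e.g. partition 0 -> [A, B, C].
    replicas = {
        p: [brokers[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }
    # Before the restart, each partition is served by its first replica.
    serving = {p: r[0] for p, r in replicas.items()}

    for down in brokers:  # restart the brokers one at a time, in order
        for p, r in replicas.items():
            if serving[p] == down:
                # Assumption: traffic moves to the first other replica in the list.
                serving[p] = next(b for b in r if b != down)
        # The broker comes back here; in this model nothing moves back to it.

    counts = Counter(serving.values())
    return {b: counts.get(b, 0) for b in brokers}


if __name__ == "__main__":
    print(simulate_rolling_restart(list("ABCDEF"), 12))
```

With 12 partitions and these assumptions, the model ends with broker A serving 4 partitions, B to E serving 2 each, and F serving none, which is at least the same shape as what we observe.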
The simple solution is a rolling restart of all our consumers, but that is not ideal and is hard to pull off in some cases. Is there another way to get our consumers to rebalance once all brokers are back up?
Thank you.