How to balance partition/disc usage?

roadSurfer · 4 February 2021 09:18

I’m playing around with a simple 3-node cluster. I only have 7 topics, most with 10 partitions (one with 1 and another with 25), and all with a replication factor of 2. I have ensured these partitions are spread amongst the Brokers. Most topics are set to “delete” and one to “compact”. Self-balancing is on and set to improve balance “anytime”. Like I say, simple.

Over time I can see in Control Center that the disc usage goes out of balance. Brokers 1 and 2 are at 4.8GB and 4.4GB respectively, but Broker 3 sits at 2.2GB. So something is up despite the configuration being identical.

I have tried searching for information on how to trace the problem, but it’s like looking for a needle in a haystack. I suspect it’s a problem with the Partition balance (Broker 3 seems to have about a third the partitions of the others in QA), but is this not what self-balancing is supposed to address?

rmoff · 4 February 2021 14:54

Do all the partitions have similar volumes of data going into them?
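One rough way to check this is to look at on-disk size as a proxy for data volume: `kafka-log-dirs --describe` reports per-partition sizes as JSON, and summing them per broker and topic shows where the bytes sit. The Python sketch below assumes you have saved just the JSON line from that output. The sample payload and topic names are made up for illustration, and the field names (`brokers`, `logDirs`, `partitions`, `size`) are as I understand the tool's output, so check them against your version.

```python
import json
from collections import defaultdict


def sizes_by_broker_and_topic(log_dirs_json: str) -> dict:
    """Sum on-disk bytes per broker and per topic from kafka-log-dirs JSON."""
    data = json.loads(log_dirs_json)
    totals = defaultdict(lambda: defaultdict(int))
    for broker in data["brokers"]:
        for log_dir in broker["logDirs"]:
            for part in log_dir["partitions"]:
                # "partition" is "<topic>-<partition number>"; split on the last dash
                topic, _, _ = part["partition"].rpartition("-")
                totals[broker["broker"]][topic] += part["size"]
    return {b: dict(t) for b, t in totals.items()}


# Hypothetical sample data, not taken from a real cluster
SAMPLE = json.dumps({
    "version": 1,
    "brokers": [
        {"broker": 1, "logDirs": [{"logDir": "/data", "error": None, "partitions": [
            {"partition": "orders-0", "size": 100, "offsetLag": 0, "isFuture": False},
            {"partition": "orders-1", "size": 50, "offsetLag": 0, "isFuture": False},
            {"partition": "_internal-example-0", "size": 400, "offsetLag": 0, "isFuture": False},
        ]}]},
        {"broker": 3, "logDirs": [{"logDir": "/data", "error": None, "partitions": [
            {"partition": "orders-0", "size": 100, "offsetLag": 0, "isFuture": False},
        ]}]},
    ],
})

totals = sizes_by_broker_and_topic(SAMPLE)
for broker_id, topics in sorted(totals.items()):
    print(broker_id, sum(topics.values()), topics)
```

If one topic dominates the total on some brokers but is absent on others, that topic is the first place to look.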

roadSurfer · 4 February 2021 14:56

In theory they should, but that is a very basic thing I should certainly go and check.
I guess if one or two are getting battered, there’s not a lot that balancing can do about it!

Edit: A quick mark one eyeball doesn’t show anything amiss; I shall do some proper digging in a bit. Cheers for the tip. (And yes, I’ll come back and mark this as “Solved” if you’re on the money.)

roadSurfer · 4 February 2021 17:20

Thanks for the tip, @rmoff. It doesn’t appear to be data volume in my topics, but while checking that out I spotted that a lot of the topics for Control Center (e.g. _confluent-controlcenter-6-0-1-1-cluster-rekey) are only replicated on the first two nodes, not evenly across all the brokers. Maybe I goosed some config somewhere.
That may not be the actual problem, but it’s the only thing that is popping out at me.
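For anyone wanting to spot this sort of thing without eyeballing it, here is a minimal Python sketch that parses the text printed by `kafka-topics --describe`, counts replicas per broker, and lists topics whose replicas never land on some broker. The helper names and the sample output (including the topic names) are hypothetical, and the regex assumes the usual `Topic: ... Partition: ... Leader: ... Replicas: ...` line layout.

```python
import re
from collections import Counter, defaultdict

# Matches per-partition lines only; summary lines carry "PartitionCount:" instead
PARTITION_LINE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\S+)\s+Replicas:\s*([\d,]+)"
)


def replica_spread(describe_output: str):
    """Return (replica count per broker, set of brokers hosting each topic)."""
    per_broker = Counter()
    brokers_by_topic = defaultdict(set)
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if not match:
            continue
        topic = match.group(1)
        for broker_id in match.group(4).split(","):
            per_broker[int(broker_id)] += 1
            brokers_by_topic[topic].add(int(broker_id))
    return per_broker, brokers_by_topic


def topics_missing_brokers(brokers_by_topic, all_brokers):
    """Map each topic to the brokers that hold none of its replicas."""
    return {
        topic: sorted(set(all_brokers) - hosts)
        for topic, hosts in brokers_by_topic.items()
        if set(all_brokers) - hosts
    }


# Hypothetical sample output, not copied from a real cluster
SAMPLE = """\
Topic: orders  PartitionCount: 2  ReplicationFactor: 2  Configs: cleanup.policy=delete
  Topic: orders  Partition: 0  Leader: 1  Replicas: 1,3  Isr: 1,3
  Topic: orders  Partition: 1  Leader: 2  Replicas: 2,3  Isr: 2,3
Topic: _internal-example  PartitionCount: 2  ReplicationFactor: 2  Configs:
  Topic: _internal-example  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2
  Topic: _internal-example  Partition: 1  Leader: 2  Replicas: 2,1  Isr: 2,1
"""

per_broker, brokers_by_topic = replica_spread(SAMPLE)
missing = topics_missing_brokers(brokers_by_topic, all_brokers=[1, 2, 3])
print(dict(per_broker))
print(missing)
```

Replica counts alone don't tell you about disc usage, so this is best read alongside per-partition sizes.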

roadSurfer · 5 February 2021 13:21

Yes, it was the Control Center topics causing the imbalance. I am not quite sure why the self-balancing isn’t kicking in to move the topics/partitions around to restore balance, but at least I know the root cause now and can see that our own topics are not the source of the problem.

Thanks!
