Number of Kafka Clusters

Hi,

We currently have two systems in our architecture: a new one and a legacy one. Each has its own Kafka cluster. As we merge these systems step by step, they are becoming increasingly similar in both size and load. Both clusters have 5 brokers with ~2,500 topics. Some of our services even use both clusters.

My question is whether this is the recommended approach. Is there any upside to having two clusters? Or would it be strictly better to run only one Kafka cluster, in terms of both load and administrative overhead?

Thanks in advance!

There isn’t really a universal answer as to whether one cluster or two is better in this scenario.

Some upsides of remaining on two clusters:

  1. different administrative teams
  2. different cluster configuration (e.g., security settings)
  3. dev teams desire cluster isolation for housekeeping or risk reduction reasons
  4. mitigate concern about noisy neighbors (service A overloading cluster 1 doesn’t impact service B using cluster 2)
  5. maybe load: it depends on how bursty and correlated the two clusters’ loads currently are. E.g., at the unlikely extreme, the loads never overlap and a single surviving five-broker cluster could handle both
  6. avoid migration cost / risk. Consolidating will require a fresh look at performance testing for the overall system. Will you need to migrate data from the legacy cluster? Are there any topic name collisions? What about offsets: will clients need to pick up on the new cluster exactly where they left off on the legacy cluster? What about client migration: can you afford downtime, or is stopping the world to move clients over not acceptable? In the latter case this gets tricky.
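
On the topic name collision question in point 6, here is a minimal sketch of how you might check. It assumes you have already exported the topic names from each cluster into plain lists (for example from the output of `kafka-topics.sh --list --bootstrap-server <host:port>`). The function name and the sample topic names are made up for illustration, and skipping names that start with `__` (internal topics such as `__consumer_offsets`) is my assumption about what you would want to ignore.

```python
def find_topic_collisions(legacy_topics, new_topics):
    """Return the topic names present on both clusters, sorted.

    Names starting with "__" are skipped on the assumption that they are
    Kafka-internal topics (e.g. __consumer_offsets) that exist everywhere.
    """
    legacy = {t for t in legacy_topics if not t.startswith("__")}
    new = {t for t in new_topics if not t.startswith("__")}
    return sorted(legacy & new)


# Hypothetical topic lists, one per cluster.
legacy_topics = ["orders", "payments", "audit-log", "__consumer_offsets"]
new_topics = ["orders", "shipments", "audit-log", "__consumer_offsets"]

collisions = find_topic_collisions(legacy_topics, new_topics)
print(collisions)
```

Any name that shows up here needs a decision before consolidating: rename one side, or confirm that both sides really mean the same topic.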

I’m not sure how many of these apply to your scenario, since one of the clusters is called “legacy” and you are merging the systems. Do any of 1-5 apply? Leaving things alone carries an administrative overhead cost that may simply need to be weighed against #6.
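
For point 5, one rough way to see how correlated the loads are is to compare the peak of the combined load with the sum of the two individual peaks. This is only a sketch: it assumes you can export throughput samples for both clusters over the same time intervals, and the function name and the numbers below are invented for illustration.

```python
def peak_comparison(load_a, load_b):
    """Compare the peak of the summed load with the sum of individual peaks.

    Both inputs are throughput samples (e.g. MB/s) taken over the same
    intervals. If the combined peak is well below the sum of the peaks,
    the bursts mostly do not coincide.
    """
    if len(load_a) != len(load_b):
        raise ValueError("samples must cover the same intervals")
    combined = [a + b for a, b in zip(load_a, load_b)]
    return {
        "peak_a": max(load_a),
        "peak_b": max(load_b),
        "combined_peak": max(combined),
        "sum_of_peaks": max(load_a) + max(load_b),
    }


# Hypothetical MB/s samples for the same four intervals.
legacy_load = [10, 80, 15, 10]
new_load = [70, 12, 20, 75]

result = peak_comparison(legacy_load, new_load)
print(result)
```

If `combined_peak` is close to `sum_of_peaks`, the clusters burst at the same time and a merged cluster would need roughly the capacity of both. Either way, this does not replace a proper performance test.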

I think point 2 might be more of a downside for us, as we have some services that use both clusters. They have to connect to both, which is unnecessarily complicated if the clusters are set up differently.
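
To illustrate what that looks like in a service, here is a minimal sketch that keeps the per-cluster connection settings in one place. The keys (`bootstrap.servers`, `security.protocol`, `sasl.mechanism`, `group.id`) are standard Kafka client settings, but the host names, the security values, and the helper `client_config` are placeholders and not our actual setup.

```python
# Hypothetical per-cluster settings; hosts and security values are placeholders.
CLUSTERS = {
    "legacy": {
        "bootstrap.servers": "legacy-kafka.example.internal:9092",
        "security.protocol": "PLAINTEXT",
    },
    "new": {
        "bootstrap.servers": "new-kafka.example.internal:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "SCRAM-SHA-512",
    },
}


def client_config(cluster, overrides=None):
    """Return a copy of the named cluster's settings merged with overrides."""
    if cluster not in CLUSTERS:
        raise KeyError(f"unknown cluster: {cluster}")
    config = dict(CLUSTERS[cluster])  # copy so the shared defaults stay untouched
    config.update(overrides or {})
    return config


# A service that talks to both clusters builds one config per cluster.
legacy_cfg = client_config("legacy", {"group.id": "billing-service"})
new_cfg = client_config("new", {"group.id": "billing-service"})
print(legacy_cfg)
print(new_cfg)
```

Every difference between the two entries is something each such service has to carry and keep in sync.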

Points 3 and 4 are valid concerns. Keeping the clusters separate would mean that, in case of heavy load or unexpected downtime, only some services would have problems.

Point 6 is something I would have to talk about with the affected teams. As you said, merging two clusters probably isn’t very straightforward and might require some downtime.

All in all, thank you very much for the help and fast reply! We will take these points into consideration.