Kafka Connect - Cluster Architectures

A common question when running Kafka Connect is how to architect the cluster, or clusters, of workers. Here’s a discussion from Confluent Community Slack on the subject. Feel free to add your thoughts in the thread below.


Hans Jacob Melby asked:

Are there any whitepapers on best practice when using Kafka Connect, ideally in relation to architecture and security?
For example, should each team/product in an organization have their own Kafka Connect cluster, or is it “normal” to have one shared Kafka Connect cluster?
The client I work for believes the Connect cluster might have too much access to the clusters… This sounds strange to me, but I won’t argue too much without knowing how other people use and configure Kafka Connect…
I have skimmed through the documentation and cannot find any good reason for their concern…

Robin Moffatt:
this was the subject of an interesting Ask the Experts that @Gunnar Morling and I did at a recent Kafka Summit. Unfortunately it wasn’t recorded

tl;dr orgs take all of these approaches: a single large cluster, an individual cluster per connector, and in-between options (a cluster per business unit or team)

Hans Jacob Melby

@rmoff Thanks for the info. Did the discussion also cover security issues regarding access control to topics etc., or is this a “non-topic”, as in solved and not an issue? In short, are there any “architecture pitfalls” (from a security perspective) when using Connect?

Robin Moffatt

are there any “architecture pitfalls” (from a security perspective) when using Connect?

from a security perspective, not really that I can think of. My understanding (although I’m not an engineer) is that connectors are independent

however from a broader perspective the reason you’d carve up one big cluster is that you can then enforce stricter isolation from a workload perspective. If you’ve got a “noisy” connector that routinely gobbles up resources it might impact another connector that perhaps has a sensitive SLO, so you’d want to ringfence it for that reason

Gunnar Morling
I’d always recommend to go for a dedicated “cluster” per connector

could be a single node cluster, or a two node cluster e.g., if you want to have some means of failover in case a worker machine dies

the reasoning being that separate worker processes will give you better isolation between connectors
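
As a rough illustration of what a dedicated cluster per connector means in worker-config terms: in distributed mode, workers that share a `group.id` form one Connect cluster, and each cluster needs its own config, offset, and status topics. The sketch below assumes a hypothetical `<cluster>-connect-...` naming scheme and a hypothetical broker address, and it is not a complete worker config (converters, security settings, and so on are omitted).

```python
def worker_properties(cluster: str, bootstrap: str = "broker:9092") -> dict:
    """Distributed-mode worker settings that define one Connect cluster.

    Workers started with the same group.id join the same cluster, so a
    two-node cluster is just two workers started with these same values.
    The naming scheme is an assumption for illustration; the worker group.id
    is kept distinct from the "connect-<connector name>" consumer groups
    that sink connectors use by default.
    """
    return {
        "bootstrap.servers": bootstrap,
        "group.id": f"{cluster}-connect-cluster",
        "config.storage.topic": f"{cluster}-connect-configs",
        "offset.storage.topic": f"{cluster}-connect-offsets",
        "status.storage.topic": f"{cluster}-connect-status",
    }


def to_properties_text(props: dict) -> str:
    """Render the settings in Java .properties style (key=value per line)."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Two isolated clusters, one per connector (names are hypothetical).
s3 = worker_properties("s3-sink")
jdbc = worker_properties("jdbc-source")
print(to_properties_text(s3))
```

Every extra cluster adds another set of internal topics and another worker group to operate, which is the overhead the isolation is traded against.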

Hans Jacob Melby
@Gunnar Morling Thanks for the clarification :slightly_smiling_face: I’ll try this out and hopefully convince the client that Kafka Connect is not a bad thing…

To back up @Gunnar Morling a bit: if you’re using the S3 connector (or any connector that uses the AWS IAM lib for auth), you can only have one login per Connect cluster. It’s a limitation of the library.

On the other hand, there are also pitfalls to using a separate Connect cluster per connector:

  1. It is much harder to monitor, since your Prometheus instance has to scrape many more endpoints (a dynamic number of them, in fact), and you also have to organize your metrics more carefully, and so on…
  2. It is much harder to maintain, since you don’t have a single endpoint, or a reasonable number of endpoints, that you can configure for your CI/CD pipelines.
  3. Each Connect node has to be licensed separately, which may become very costly if you are on Confluent Enterprise :wink:
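
To make the monitoring point concrete: with a cluster per connector, the scrape target list grows with the number of connectors and has to be regenerated whenever one is added or removed. The sketch below assumes each worker exposes metrics over HTTP (for example through a JMX exporter); the port 9404, the host names, and the `connect_cluster` label are all hypothetical. It produces a document in the shape Prometheus expects for file-based service discovery.

```python
import json


def file_sd_targets(clusters: dict, metrics_port: int = 9404) -> list:
    """Build Prometheus file_sd target groups, one group per Connect cluster.

    `clusters` maps a cluster name to the list of its worker host names.
    The label lets dashboards and alerts tell the clusters apart.
    """
    groups = []
    for cluster, hosts in sorted(clusters.items()):
        groups.append(
            {
                "targets": [f"{host}:{metrics_port}" for host in hosts],
                "labels": {"connect_cluster": cluster},
            }
        )
    return groups


# Hypothetical inventory: one small cluster per connector.
clusters = {
    "s3-sink": ["connect-s3-0", "connect-s3-1"],
    "jdbc-source": ["connect-jdbc-0"],
}
targets = file_sd_targets(clusters)
print(json.dumps(targets, indent=2))
```

With a single shared cluster the same file would contain one group with a fairly stable list of workers.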

The “cluster per business unit/team” approach may partially mitigate these things, but I personally still prefer a single Connect cluster with a reasonable number of workers.

It is way easier to “tweak” one cluster config than N such configs. In my opinion, it’s simply not possible to have an optimal configuration for a number of workers that you don’t know from the beginning, when you also don’t know which connectors are going to live on them…

This is the architecture my client has tried out, but the issue is access control to topics. If I understand the architecture correctly, each Connect cluster authenticates to the broker. Sharing the cluster will then cause some access issues. I reckon it is a cost/benefit issue.
Anyway, thanks for sharing your thoughts!

A Kafka Connect cluster authenticates its connection to the brokers, that is right. But it can be configured so that each connector within your Connect cluster uses a separate principal (username and password) to access topic data. See more info here:

Then, you specify the principal for your connector by adding these two lines to your connector config (I don’t know why these parameters are not listed among the connector configuration options, but they were mentioned here, and as it says there, this config is then converted to [producer|consumer].sasl.jaas.config depending on your connector type):

"principal.service.name": "<username>",
"principal.service.password": "<password>"

You can also use the Secret Registry for referencing secrets, or another security provider.

Alternatively, you can specify principals directly by overriding the SASL JAAS config using the [producer|consumer].override.sasl.jaas.config configuration properties. However, this is a rather low-level approach and one should avoid it in Connect.
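
For reference, here is a hedged sketch of what that low-level override looks like, assuming SASL/PLAIN and a worker whose `connector.client.config.override.policy` permits the override. The connector settings, topic, and principal name are placeholders for illustration, and the helper functions are hypothetical, not part of any Connect API.

```python
import json


def jaas_plain(username: str, password: str) -> str:
    """Build a SASL/PLAIN JAAS line for a Kafka client."""
    return (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{username}" password="{password}";'
    )


def with_principal(connector_config: dict, kind: str, username: str, password: str) -> dict:
    """Return a copy of the connector config with a per-connector principal.

    kind is "sink" (the connector consumes from Kafka, so the consumer is
    overridden) or "source" (the connector produces to Kafka, so the
    producer is overridden).
    """
    prefix = {"sink": "consumer", "source": "producer"}[kind]
    updated = dict(connector_config)
    updated[f"{prefix}.override.sasl.jaas.config"] = jaas_plain(username, password)
    return updated


# Placeholder values; in practice the password would come from a secret store.
config = with_principal(
    {"connector.class": "io.confluent.connect.s3.S3SinkConnector", "topics": "orders"},
    kind="sink",
    username="team-a-s3-sink",
    password="<password>",
)
print(json.dumps(config, indent=2))
```

The resulting dictionary is what would go in the `config` section of the connector definition submitted to the Connect REST API.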

That said, you get pretty granular access control by defining different ACLs for the different connectors’ principals. And I do not see any security hiccups here… Simply don’t use a single principal for all the connectors in your cluster. Or am I still missing something?
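
A minimal sketch of the per-principal ACL idea, assuming the standard `kafka-acls` CLI and the default sink consumer group name `connect-<connector name>`. The connector names, principals, topics, and broker address are hypothetical, and a secured cluster would also need admin client credentials passed to the tool, which are omitted here. The script only prints the commands; it does not run them.

```python
import shlex


def acl_commands(name: str, kind: str, principal: str, topics: list,
                 bootstrap: str = "broker:9092") -> list:
    """Return kafka-acls invocations (as argv lists) for one connector.

    A sink connector needs Read on its topics and on its consumer group;
    a source connector needs Write on the topics it produces to.
    """
    base = [
        "kafka-acls", "--bootstrap-server", bootstrap,
        "--add", "--allow-principal", f"User:{principal}",
    ]
    if kind == "sink":
        commands = [base + ["--operation", "Read", "--topic", t] for t in topics]
        commands.append(base + ["--operation", "Read", "--group", f"connect-{name}"])
    elif kind == "source":
        commands = [base + ["--operation", "Write", "--topic", t] for t in topics]
    else:
        raise ValueError(f"unknown connector kind: {kind}")
    return commands


# One principal per connector, each limited to its own topics.
sink_cmds = acl_commands("s3-sink", "sink", "team-a-s3-sink", ["orders"])
source_cmds = acl_commands("jdbc-source", "source", "team-b-jdbc", ["customers"])
for argv in sink_cmds + source_cmds:
    print(shlex.join(argv))
```

Because each connector has its own principal, revoking or narrowing one team’s access does not touch the other connectors on the same cluster.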

I mean, having 100 Connect clusters only because you need granular access control to your Kafka topics doesn’t look right to me at all… It’s a tremendous infrastructure overhead…

This is great news. I am pretty new to Kafka Connect and I don’t think my client knows about this, so I will definitely take this comment back to my customer and see if this might change their minds :smiley:
Thank you so much for your thoughts and comments!

Does this discussion take Kubernetes into consideration? I thought Strimzi or another operator can deploy the cluster and k8s can manage the cluster nodes.

Am I wrong? In fact, we want JDBC source connectors and Elasticsearch sink connectors to be managed by Kubernetes.

And this may be a new thread. The throughput of JDBC source connectors and Elasticsearch sink connectors is what I am after. Should I read about it somewhere? We cannot use DB log tailing as we don’t have those tools. JDBC connectors are the option we have.

Thanks.

I’m not a senior when it comes to Kafka and Kafka Connect, but I think you can set up a Connect cluster using Kubernetes, and Strimzi might be a perfect fit. The question still remains, though, whether one should have one Connect cluster for each domain/team or one common cluster to share. My original thought was that they could/should not share due to security issues (access to topics), but as @whatsupbros stated above, this can be solved for each connector… That leaves it as a purely cost/benefit evaluation. (freely interpreted by me :wink: )