A common question when running Kafka Connect is how to architect the cluster, or clusters, of workers. Here’s a discussion from Confluent Community Slack on the subject. Feel free to add your thoughts in the thread below.
Hans Jacob Melby asked:
Are there any whitepapers on best practice when using Kafka Connect, ideally in relation to architecture and security?
For example, should each team/product in an organization have their own Kafka Connect cluster, or is it “normal” to have one shared Kafka Connect cluster?
The client I work for believes the Connect cluster might have too much access to the clusters… This sounds strange to me, but I won’t argue too much without knowing how other people use and configure Kafka Connect…
I have skimmed through the documentation and cannot find any good reason for their concern…
This was the subject of an interesting Ask the Experts session that @Gunnar Morling and I did at a recent Kafka Summit. Unfortunately it wasn’t recorded.
tl;dr: orgs take all of these approaches: a single large cluster, an individual cluster per connector, and in-between options (a cluster per business unit or team).
Hans Jacob Melby
@rmoff Thanks for the info. Did the discussion also cover security issues regarding access control to topics etc., or is this a “non-topic”, as in solved and not an issue? In short, are there any “architecture pitfalls” (from a security perspective) when using Connect?
are there any “architecture pitfalls” (from a security perspective) when using Connect?
From a security perspective, not really that I can think of. My understanding (although I’m not an engineer) is that connectors are independent.
However, from a broader perspective, the reason you’d carve up one big cluster is that you can then enforce stricter isolation between workloads. If you’ve got a “noisy” connector that routinely gobbles up resources, it might impact another connector that has a sensitive SLO, so you’d want to ringfence it for that reason.
I’d always recommend going for a dedicated “cluster” per connector.
It could be a single-node cluster, or a two-node cluster, e.g. if you want some means of failover in case a worker machine dies.
The reasoning is that separate worker processes give you better isolation between connectors.
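To make the "dedicated cluster per connector" idea concrete, here is a minimal Python sketch that generates distributed-mode worker properties for each connector. The property keys (`bootstrap.servers`, `group.id`, `config.storage.topic`, `offset.storage.topic`, `status.storage.topic`) are standard Kafka Connect worker settings; the helper names, the `connect-<name>` naming scheme, the connector names, and the broker address are assumptions made up for illustration, not anything recommended in the thread.

```python
def worker_config(connector_name, bootstrap_servers):
    """Build worker properties for a Connect cluster dedicated to one connector.

    Workers that share a group.id form one Connect cluster, so giving each
    connector its own group.id and its own internal topics keeps the
    clusters separate. The naming scheme here is hypothetical.
    """
    cluster_id = f"connect-{connector_name}"
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": cluster_id,
        "config.storage.topic": f"{cluster_id}-configs",
        "offset.storage.topic": f"{cluster_id}-offsets",
        "status.storage.topic": f"{cluster_id}-status",
    }


def to_properties(config):
    """Render a config dict in Java .properties format, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in config.items())


if __name__ == "__main__":
    # Hypothetical connector names and broker address.
    for name in ["jdbc-orders", "s3-archive"]:
        print(f"# worker properties for {name}")
        print(to_properties(worker_config(name, "broker:9092")))
        print()
```

Each generated file would be used to start one or two worker processes for that connector only, so a noisy connector cannot take resources from workers running another one.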
Hans Jacob Melby
Gunnar Morling Thanks for the clarification. I’ll try this out and hopefully convince the client that Kafka Connect is not a bad thing…