A common question when running Kafka Connect is how to architect the cluster, or clusters, of workers. Here’s a discussion from the Confluent Community Slack on the subject; feel free to add your thoughts in the thread below.
Hans Jacob Melby asked:
Are there any whitepapers on best practice when using Kafka Connect, ideally in relation to architecture and security?
For example, should each team/product in an organization have their own Kafka Connect cluster, or is it “normal” to have one shared Kafka Connect cluster?
The client I work for believes the Connect cluster might have too much access towards the clusters… This sounds strange to me, but I won’t argue too much without knowledge of how other people use and configure Kafka Connect…
I have skimmed through the documentation and cannot find any good reason for their concern…
Robin Moffatt:
this was the subject of an interesting Ask the Experts that @Gunnar Morling and I did at a recent Kafka Summit. Unfortunately it wasn’t recorded
tl;dr orgs take all of these approaches: a single large cluster, an individual cluster per connector, and in-between (a cluster per business unit or team)
Hans Jacob Melby:
@rmoff Thanks for the info. Did the discussion also cover security issues such as access control to topics, or is this a “non-topic”, as in solved and not an issue? In short, are there any “architecture pitfalls” (from a security perspective) when using Connect?
Robin Moffatt:
are there any “architecture pitfalls” (from a security perspective) when using Connect?
from a security perspective, not really that I can think of. My understanding (although I’m not an engineer) is that connectors are independent
however from a broader perspective the reason you’d carve up one big cluster is that you can then enforce stricter isolation from a workload perspective. If you’ve got a “noisy” connector that routinely gobbles up resources it might impact another connector that perhaps has a sensitive SLO, so you’d want to ringfence it for that reason
Gunnar Morling:
I’d always recommend to go for a dedicated “cluster” per connector
could be a single node cluster, or a two node cluster e.g., if you want to have some means of failover in case a worker machine dies
the reasoning being that separate worker processes will give you better isolation between connectors
Hans Jacob Melby:
@Gunnar Morling Thanks for the clarification. I’ll try this out and hopefully convince the client that Kafka Connect is not a bad thing…
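
To make the cluster-per-connector idea from the thread a bit more concrete, here is a minimal Python sketch that generates distributed-mode worker settings for two separate Connect clusters. It relies on the fact that workers sharing a `group.id` form one Connect cluster, and that each cluster needs its own config, offset, and status topics. The connector names, the `connect-` naming prefix, and the broker address are made-up placeholders, and the output is not a complete worker config (converters, security settings, and so on are left out).

```python
def worker_properties(connector_name, bootstrap_servers):
    """Build core distributed-mode worker settings for a Connect
    cluster dedicated to a single connector.

    The "connect-<name>" naming scheme is an assumption made for this
    sketch, not a Kafka Connect convention.
    """
    prefix = f"connect-{connector_name}"
    return {
        "bootstrap.servers": bootstrap_servers,
        # Workers with the same group.id join the same Connect cluster,
        # so a distinct value per connector keeps the clusters apart.
        "group.id": prefix,
        # Each Connect cluster needs its own internal topics.
        "config.storage.topic": f"{prefix}-configs",
        "offset.storage.topic": f"{prefix}-offsets",
        "status.storage.topic": f"{prefix}-status",
    }


def render(props):
    """Render a settings dict in .properties file syntax."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Hypothetical connector names and broker address.
    for name in ("orders-jdbc-source", "audit-s3-sink"):
        print(f"# worker settings for the {name} cluster")
        print(render(worker_properties(name, "broker1:9092")))
        print()
```

Starting one or two worker processes per generated config would give each connector its own cluster, along the lines Gunnar describes. Since each cluster is then a separate set of processes, it could presumably also run under its own credentials with narrower topic access, which may help with the access concern raised at the start of the thread.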