Running Kafka Connect with multiple connectors

Hey all,

Hope this is the right place.
I’m building a new data pipeline that will move data from Postgres to S3 with Kafka Connect. In the first stage I will have a Postgres source connector and an S3 sink connector. In later iterations I want to add more connectors.
The entire system will be installed in my private AWS account. I’m thinking about running the Kafka Connect process on ECS with Fargate, and I was wondering what the best practice is for managing the Kafka Connect Docker images. Should I have one Dockerfile / image with all the connectors I want, running as one big task in ECS? Or should I have smaller images (one per connector) and run each one as its own task?

Thanks,
Shlomi

Hi Shlomi,

Welcome to the community! Thanks for being part of it :)

So, when we talk about container orchestration, Kafka Connect is always a balance between money, resiliency, and performance. Right off the bat, I will tell you that I’ve seen it both ways: individual pods per connector and giant pods for all connectors. Here are some questions to help you decide which path is best for you:

  1. Can I afford the management of one connector per ECS task? Having to direct your Connect REST API POST requests to a specific endpoint depending on the connector may be a pain. I’ve seen this pattern work well when used with Kubernetes and deployed with an automated pipeline, but I can see it being tougher with Fargate.

  2. Am I comfortable with the durability and processing guarantees? Clearly, running one pod per connector means that to get higher throughput and higher availability, you’d need to scale each connector individually. This might be fine if cost is not a concern, but it is easier to scale a single cluster with all connectors than to maintain a separate scaling strategy for each connector, which might also cost more.
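
To make the endpoint-routing point in question 1 concrete, here is a minimal Python sketch of how a deployment script might pick the Connect REST endpoint in each layout. The hostnames, connector names, and helper functions are hypothetical placeholders (none of them come from this thread); the only parts taken from Kafka Connect itself are the `POST /connectors` call, its `{"name": ..., "config": ...}` body, and the default REST port 8083.

```python
import json
from urllib.request import Request

# Hypothetical endpoints: placeholder hostnames, not real infrastructure.
SHARED_CLUSTER_URL = "http://connect.internal:8083"
PER_CONNECTOR_URLS = {
    "postgres-source": "http://connect-postgres.internal:8083",
    "s3-sink": "http://connect-s3.internal:8083",
}


def base_url_for(connector_name, one_task_per_connector):
    """Return the Connect REST base URL that should receive this connector."""
    if one_task_per_connector:
        # One ECS task per connector: every connector has its own endpoint,
        # so the caller has to maintain this lookup table.
        return PER_CONNECTOR_URLS[connector_name]
    # One shared Connect cluster: every connector goes to the same endpoint.
    return SHARED_CLUSTER_URL


def build_create_request(connector_name, config, one_task_per_connector=False):
    """Build (but do not send) the POST /connectors request for a connector."""
    body = json.dumps({"name": connector_name, "config": config}).encode("utf-8")
    return Request(
        url=base_url_for(connector_name, one_task_per_connector) + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the shared layout, `build_create_request("s3-sink", {"tasks.max": "1"})` always targets the same URL; with `one_task_per_connector=True` the target depends on the connector name, which is the extra bookkeeping mentioned above. A real config would also need at least a `connector.class` entry, and the request could be sent with `urllib.request.urlopen`.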

That’s not to say there aren’t benefits to the one pod per connector mantra. The isolation of connectors makes it easier to debug and easier to keep track of the resources needed for a specific connector. Individual scaling might be a blessing as well, since you don’t have to scale disproportionately to usage when trying to account for the extra connectors that might be housed in there.

I will say that unless you have a solid footing in container orchestration, I commonly recommend going the multiple-connectors-per-pod route. It is easier to manage, understand, and scale. Once you have reached a certain level of usage and experience with the Connect framework, I’d explore going the other route.

I hope this helped! And please let us know if you have any other questions!
