Advice/General guidelines associated with cluster & connector configurations in Kafka Connect

blackcow02 · 12 February 2021 17:24

I am looking for general guidance on best practices for laying out Kafka Connect: how many Connect clusters to run, how isolated they should be from each other, how many connectors to put in each cluster, and so on.

I have been exploring a simple scenario for our use case: we want to use a Debezium source connector to monitor a few tables in a MySQL source database, then use an HTTP sink connector to push that data out to other external systems. For this scenario, I am planning to run two Connect workers in a single Connect cluster, hosting both the source connector and the sink connector.

I have observed a few examples online where the configuration depicted a separate cluster for each connector type.

Are there any guidelines/best practices around configuration of the cluster and the connectors? When would it make sense to split out the different connectors into their own cluster?

bhaveshraheja · 22 February 2021 11:14

Overall, as connector versions have improved, running multiple connectors in one cluster is a great idea.

Earlier, some teams used single-domain clusters: for example, all Debezium source connectors in one cluster and all S3 connectors in another. That split also helped when connectors were added or updated frequently, because every connector in the cluster used to pause and rebalance each time a config was updated. With Incremental Cooperative Rebalancing now in place in Apache Kafka, this is no longer the case.
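
As a rough illustration of where this is controlled: the rebalance protocol is a worker-level setting, `connect.protocol`, which accepts `eager`, `compatible`, or `sessioned` (the last two use incremental cooperative rebalancing). The sketch below just renders a worker properties fragment; the `group.id` value and the `render_properties` helper are made up for the example, and the delay shown is the documented default rather than a recommendation.

```python
def render_properties(props):
    """Render a dict as Java-style .properties lines, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


worker_props = {
    # Hypothetical cluster name; workers sharing a group.id form one Connect cluster.
    "group.id": "connect-cluster-a",
    # "eager" is the old stop-the-world behaviour; "compatible" and "sessioned"
    # enable incremental cooperative rebalancing.
    "connect.protocol": "sessioned",
    # How long the cluster waits for a departed worker to return before
    # reassigning its tasks (300000 ms is the default).
    "scheduled.rebalance.max.delay.ms": 300000,
}

rendered = render_properties(worker_props)
print(rendered)
```

Every worker in the same cluster would need to carry the same protocol setting for it to take effect.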

Another reason to separate Connect clusters per use case was worker-level configuration. Even though you may have 10 source connectors, producer settings are defined once per worker, so if you needed custom producer properties, separate clusters were the way to go. Note: some of this is changing, and certain worker-level properties can now be overridden at the connector level.
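
A minimal sketch of what a connector-level override can look like, assuming a Kafka version that supports client overrides and a worker whose `connector.client.config.override.policy` permits them (for example `All`). Overrides are ordinary connector config keys prefixed with `producer.override.`. The connector name, the base settings, and the `with_producer_overrides` helper are placeholders for illustration, not a complete Debezium configuration.

```python
import json


def with_producer_overrides(connector_config, overrides):
    """Return a copy of connector_config with producer.override.* keys added."""
    merged = dict(connector_config)
    for key, value in overrides.items():
        # Connect expects string values in connector configs.
        merged[f"producer.override.{key}"] = str(value)
    return merged


# Placeholder base config; a real Debezium connector needs many more settings.
base = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
}

config = with_producer_overrides(base, {"compression.type": "lz4", "linger.ms": 50})

# This is the shape of the JSON body accepted by POST /connectors on the Connect REST API.
payload = {"name": "inventory-source", "config": config}
print(json.dumps(payload, indent=2))
```

With this in place, only that one connector's producers pick up the custom settings; other connectors on the same workers keep the worker defaults.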

If you expect similar workloads across the tasks in a cluster, then definitely go for this. It makes it easier to scale the worker nodes up and down as the workload changes.
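
One way to check how evenly tasks are spread before deciding to add or remove workers is to count running tasks per worker from the connector status responses (`GET /connectors/<name>/status` on the Connect REST API). The sketch below works on already-fetched status payloads; the sample data, connector names, and worker addresses are invented for the example.

```python
from collections import Counter


def tasks_per_worker(statuses):
    """Count RUNNING tasks per worker_id across connector status payloads."""
    counts = Counter()
    for status in statuses:
        for task in status.get("tasks", []):
            if task.get("state") == "RUNNING":
                counts[task["worker_id"]] += 1
    return dict(counts)


# Invented sample shaped like the status responses for one source and one sink connector.
sample_statuses = [
    {
        "name": "mysql-source",
        "connector": {"state": "RUNNING", "worker_id": "connect-1:8083"},
        "tasks": [{"id": 0, "state": "RUNNING", "worker_id": "connect-1:8083"}],
    },
    {
        "name": "http-sink",
        "connector": {"state": "RUNNING", "worker_id": "connect-2:8083"},
        "tasks": [
            {"id": 0, "state": "RUNNING", "worker_id": "connect-1:8083"},
            {"id": 1, "state": "RUNNING", "worker_id": "connect-2:8083"},
        ],
    },
]

distribution = tasks_per_worker(sample_statuses)
print(distribution)
```

If one worker consistently carries far more tasks than the others, or the workloads per task differ a lot, that is a hint the connectors may be better off in separate clusters.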
