Best practices / requirements for kafka connect

Hi, I would like to know the sizing requirements when you set up a Kafka Connect cluster (standalone or distributed) with connectors/tasks.

Confluent's system requirements page gives only limited information: Confluent System Requirements | Confluent Documentation

How many connectors can run on the same workers?
Do you have this kind of information so we can size our cluster properly?
Thanks in advance.

First up Welcome to the forum!

Sizing Kafka Connect is tricky, mainly because of the variety of connectors you can run on a Connect cluster. Some connectors, like the S3 sink, can be rather lightweight and consume very few resources. Others, like a JDBC connector, can be rather heavy.

A common starting point is to figure 1-5 MB/s per Connect worker with 4-8 cores and 32 GB of RAM. This is very much a thumb-in-the-wind estimate. Deploy your connectors and scale up the number of tasks until you've reached your desired throughput or you've hit a bottleneck. In my experience the bottleneck is normally CPU, mainly due to the (de)serialization and message format conversion that folks do.
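Scaling tasks doesn't require redeploying workers: the task count is just connector configuration, so you can bump it through the Connect REST API. A hedged sketch, assuming a hypothetical JDBC source connector named `my-jdbc-source` and a worker listening on `localhost:8083` (all names and connection values are illustrative):

```shell
# PUT the full connector config back with a higher tasks.max;
# the cluster rebalances the tasks across workers automatically.
curl -X PUT http://localhost:8083/connectors/my-jdbc-source/config \
  -H "Content-Type: application/json" \
  -d '{
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db:5432/mydb",
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "tasks.max": "8"
      }'
```

Note that `tasks.max` is an upper bound; a connector may create fewer tasks if the source can't be partitioned that finely (e.g. a JDBC source is limited by the number of tables it reads).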

Don't forget to factor in failover. Most commonly that means N+1: one spare worker beyond what your throughput requires, so the cluster can absorb a worker failure while staying highly available.
