Hi, is there a recommendation on the maximum number of workers a Connect cluster can have?
My understanding is that every Connect cluster forms a consumer group, although workers don't strictly follow the one-consumer-per-partition model. However, in distributed mode we do rely on the consumer group's rebalance protocol to rebalance work across workers. Is there a limit on how many workers a Connect cluster can have?
We are looking at O(1000) tasks and would like to understand how many tasks each worker would need to take on in that setup.
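To put numbers on the per-worker load, here is a minimal sketch assuming tasks are spread perfectly evenly (round-robin style) across workers. The function names are hypothetical, and the real distributed-mode assignor also balances connectors and rebalances incrementally, so treat this only as an approximation.

```python
import math


def tasks_per_worker(total_tasks: int, workers: int) -> tuple[int, int]:
    """Return (min, max) tasks per worker under a perfectly even spread.

    Assumption: counts differ by at most one between workers.
    """
    if workers <= 0:
        raise ValueError("need at least one worker")
    low = total_tasks // workers
    high = math.ceil(total_tasks / workers)
    return low, high


def round_robin(task_ids: list[str], workers: list[str]) -> dict[str, list[str]]:
    """Assign each task to the next worker in turn (hypothetical helper)."""
    if not workers:
        raise ValueError("need at least one worker")
    assignment: dict[str, list[str]] = {w: [] for w in workers}
    for i, task in enumerate(task_ids):
        assignment[workers[i % len(workers)]].append(task)
    return assignment
```

For example, `tasks_per_worker(1000, 3)` gives `(333, 334)`, while `tasks_per_worker(1000, 10)` gives `(100, 100)`.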
tasks.max sets the maximum number of tasks the connector will create. Fewer tasks than tasks.max may be created (or some may sit idle), depending on the level of parallelism that can be achieved; for example, a sink connector cannot make use of more tasks than there are partitions in the topics it consumes. How tasks scale really depends on the type of connector being used. The largest workloads I have seen needed only 3 Connect worker nodes, as Connect is typically not CPU-bound. Heap size is typically 0.5 to 4 GB, depending on the connectors.
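The sink-connector case can be sketched as a simple bound, assuming each partition is assigned to at most one task in the connector's consumer group. The helper name is hypothetical; the actual number of tasks is decided by the connector's own task configuration logic, so this only estimates how many tasks can do useful work.

```python
def useful_sink_tasks(tasks_max: int, partitions_per_topic: list[int]) -> int:
    """Estimate how many sink tasks can actually receive partitions."""
    if tasks_max < 1:
        raise ValueError("tasks.max must be at least 1")
    total_partitions = sum(partitions_per_topic)
    # Each partition goes to at most one task in the connector's consumer
    # group, so tasks beyond the partition count get nothing to consume.
    return min(tasks_max, total_partitions)
```

With two topics of 6 partitions each, `tasks.max=20` still yields at most 12 busy tasks, whereas `tasks.max=4` caps parallelism at 4.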
The number of Connect workers normally depends on how fast you want the data to be transferred.
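A back-of-envelope version of that sizing rule might look like the sketch below. It is a hypothetical estimate, not an official formula: it assumes you have measured per-worker throughput for your specific connectors in a load test, since that figure varies widely by connector, converter and target system.

```python
import math


def workers_needed(target_mb_per_s: float, per_worker_mb_per_s: float,
                   spare_workers: int = 0) -> int:
    """Rough worker count for a target throughput (hypothetical rule).

    per_worker_mb_per_s is an assumed, measured figure for your workload.
    """
    if per_worker_mb_per_s <= 0:
        raise ValueError("per-worker throughput must be positive")
    base = math.ceil(target_mb_per_s / per_worker_mb_per_s)
    # Optional headroom, e.g. 1 to tolerate a single worker failing and
    # its tasks being rebalanced onto the remaining workers.
    return base + spare_workers
```

For instance, a 100 MB/s target with a measured 40 MB/s per worker gives 3 workers, or 4 with one spare.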