How to autoscale a Kafka Connect cluster based on load

srihari_kusumanchi · 3 September 2021 09:18

We have a Kafka Connect cluster running in Kubernetes, and we need to deploy multiple connectors that cover both snapshot data and streaming data. We are currently using the Debezium CDC MySQL connector. The data volume is going to be very large. Right now we run Kafka Connect with 3 replicas. What is the best way to autoscale or pipeline the jobs: at the Debezium connector level or at the Kafka Connect cluster level? Please suggest.

mitchell-h · 3 September 2021 13:32

In a very generalized use case, CPU will be your limiting factor when using DBZ + Connect. So you’ll scale out once you’ve been at N% CPU for a sustained period of time, adding a node or two.
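The "N% of CPU for a period of time" rule can be sketched as a small sliding-window check. This is only an illustration: the class name, the threshold, and the window size are hypothetical, and how the CPU samples are collected (metrics server, Prometheus, and so on) is left out.

```python
from collections import deque


class SustainedCpuTrigger:
    """Signals scale-out only when CPU stays at or above a threshold for a full window."""

    def __init__(self, threshold_pct: float, window_samples: int):
        # threshold_pct is the "N%" from the post; window_samples is the
        # "period of time", expressed as a number of consecutive samples.
        self.threshold_pct = threshold_pct
        self.samples = deque(maxlen=window_samples)

    def observe(self, cpu_pct: float) -> bool:
        """Record one CPU sample; return True if the whole window is at or above the threshold."""
        self.samples.append(cpu_pct)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(s >= self.threshold_pct for s in self.samples)
```

A single spike does not fire the trigger; only a full window of high readings does, which is the point of waiting "for a period of time" before adding a node.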

Here’s the fine print, which matters more than when to autoscale: adding a node does not cause the Connect tasks to rebalance. You’ll need to manually call the Connect REST interface (see “Connect REST Interface” in the Confluent documentation) in order for the tasks to rebalance and get the benefit of the additional resources. This will cause the tasks to pause until the rebalance is complete, which takes somewhere between 1 second and 3 minutes. During this time no work is done and you fall further behind.
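The post does not say which REST endpoint to call, so the following is a sketch under assumptions. It uses two endpoints that exist in the Connect REST API: `GET /connectors/{name}/status`, whose response lists each task with its `worker_id`, and `POST /connectors/{name}/restart`. Whether the restart call is the one meant above is an assumption; the base URL and connector name are hypothetical.

```python
import json
import urllib.request
from collections import Counter
from urllib.parse import quote


def status_url(base_url: str, connector: str) -> str:
    return f"{base_url.rstrip('/')}/connectors/{quote(connector, safe='')}/status"


def restart_url(base_url: str, connector: str) -> str:
    return f"{base_url.rstrip('/')}/connectors/{quote(connector, safe='')}/restart"


def tasks_per_worker(status: dict) -> dict:
    """Count tasks per worker from a connector status response."""
    return dict(Counter(task["worker_id"] for task in status.get("tasks", [])))


def fetch_status(base_url: str, connector: str) -> dict:
    """GET the connector status (needs a reachable Connect worker)."""
    with urllib.request.urlopen(status_url(base_url, connector)) as resp:
        return json.load(resp)


def restart_connector(base_url: str, connector: str) -> int:
    """POST a restart request; assumed here to be the manual call from the post."""
    req = urllib.request.Request(restart_url(base_url, connector), method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `tasks_per_worker(fetch_status("http://connect:8083", "mysql-cdc"))` before and after adding a node shows whether any tasks actually landed on the new worker.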

It’s highly suggested that you run a steady-state Connect cluster that is sized to handle your peak load.

waqasdilawar · 14 September 2021 07:51

Can you please elaborate on this a bit further?

mitchell-h · 14 September 2021 19:49

In order to scale, a Connect cluster has to pause and rebalance. Each pause puts you further behind. You can also easily get into a state where you’re “flapping”: spending all your time in a paused state while you constantly rebalance.
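One common way to guard against this kind of flapping is a cooldown between scaling actions. The sketch below is an assumption, not something from the thread: the class name and the cooldown value are hypothetical, and the cooldown should comfortably exceed the worst-case rebalance pause (up to about 3 minutes, per the earlier reply).

```python
from typing import Optional


class CooldownGate:
    """Allows a scaling action only if enough time has passed since the last one."""

    def __init__(self, cooldown_s: float):
        self.cooldown_s = cooldown_s
        self.last_scale_at: Optional[float] = None

    def allow(self, now_s: float) -> bool:
        """Return True and record the time if scaling is permitted at now_s."""
        if self.last_scale_at is not None and now_s - self.last_scale_at < self.cooldown_s:
            return False  # still cooling down; skip this scaling request
        self.last_scale_at = now_s
        return True
```

With a gate like `CooldownGate(cooldown_s=600)`, a second scale request five minutes after the first is rejected, so the cluster gets time to finish rebalancing and catch up before it is disturbed again.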

