How to scale using Kafka Connect

Anu · 5 February 2021 01:38

In my scenario there is a single source table (read by a source connector). A deployed Kafka connector will only run one consumer (worker) to capture data events from the source table (assume I am capturing events via CDC). How do I go about scaling my application while still preserving Kafka message ordering per key?

rmoff · 5 February 2021 09:32

Hi @Anu - to clarify, do you mean that you want to increase the rate at which you ingest changes from the database into Kafka, and that a single Kafka Connect task isn’t able to keep up with the throughput?

Anu · 5 February 2021 16:04

Hello Robin.

Firstly, LOVE the forum idea. Perfect way to capture Kafka topic discussions. Secondly, yes, your statement regarding my scenario is correct. If there is only one consumer, how do I scale for higher throughput?

rmoff · 5 February 2021 16:33

I’m glad you like it!

Each connector defines for itself how it can partition its workload across tasks, up to the limit set by tasks.max. If you hit that limit and still find you’re bottlenecking, you’ll need to analyse where that bottleneck is (for example, the network interface) and vertically scale that resource on the worker node.
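To make the tasks.max setting concrete, here is a minimal sketch of raising it through the Kafka Connect REST API (PUT /connectors/{name}/config). The worker URL, the connector name my-source, and the connector class com.example.MySourceConnector are hypothetical placeholders, and the request is only built, not sent.

```python
import json
from urllib.request import Request


def build_config_request(worker_url, name, config, tasks_max):
    """Build (but do not send) a PUT /connectors/{name}/config request."""
    # tasks.max is only an upper bound: the connector itself decides
    # how many tasks it actually creates, up to this number.
    payload = {**config, "tasks.max": str(tasks_max)}
    return Request(
        url=f"{worker_url}/connectors/{name}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical values, for illustration only.
request = build_config_request(
    "http://localhost:8083",
    "my-source",
    {"connector.class": "com.example.MySourceConnector"},
    tasks_max=4,
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would submit it to a running worker. Whether the connector then actually starts four tasks depends on how that connector splits its work.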

Anu · 5 February 2021 16:36

Robin,
THANK YOU, I always appreciate your mentorship. I will make a note of your feedback in our Kafka design strategy.

shree · 8 February 2021 13:12

Maybe an add-on to this question: how does the JDBC connector scale up on the DB side?
Will it create multiple DB sessions to ingest data faster into Kafka topics?
Is this something one would need to set specifically via a configuration parameter? Or will just ensuring that tasks.max > 1 enable the concurrency?
Following is our setup for context:
tasks.max = 25, the topic has 50-100 partitions, and there are 10 tables on the DB side.
We still see just one session being opened by the JDBC connector.

rmoff · 8 February 2021 13:18

As I understand it, the JDBC source connector will treat tables as units of parallelism. That is, if you have ten tables it could run ten tasks, each polling its own table in parallel. If you’re not seeing this behaviour, perhaps start another topic and we can walk through some debug steps to see what’s going on.
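The idea of tables as units of parallelism can be sketched roughly as follows. This illustrates the behaviour described above and is not the connector's actual code: the function name assign_tables_to_tasks and the contiguous-chunk split are assumptions.

```python
def assign_tables_to_tasks(tables, tasks_max):
    """Split tables into at most tasks_max groups, one group per task."""
    # There can never be more tasks than tables, whatever tasks.max says.
    num_tasks = min(len(tables), tasks_max)
    if num_tasks == 0:
        return []
    per_task, leftover = divmod(len(tables), num_tasks)
    groups, start = [], 0
    for i in range(num_tasks):
        # The first `leftover` tasks take one extra table each.
        size = per_task + (1 if i < leftover else 0)
        groups.append(tables[start:start + size])
        start += size
    return groups


tables = [f"table_{i}" for i in range(10)]
print(len(assign_tables_to_tasks(tables, 25)))                # 10 tables, tasks.max = 25
print([len(g) for g in assign_tables_to_tasks(tables, 4)])    # 10 tables, tasks.max = 4
print(len(assign_tables_to_tasks(["orders"], 25)))            # a single table
```

Under this model, ten tables with tasks.max = 25 yield at most ten tasks, and a single table yields a single task no matter how high tasks.max is set.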
