How to scale using Kafka Connect

In my scenario, a single table exists as the source for the connector. A deployed Kafka connector will only deploy one task (worker) to capture data events from the source table (assume I am capturing events via CDC). How do I go about scaling my application while preserving Kafka key-based message ordering?

Hi @Anu - to clarify, are you meaning you want to increase the rate at which you ingest changes from the database into Kafka, and that a single Kafka Connect task isn’t able to keep up with the throughput?

Hello Robin.

Firstly, LOVE the forum idea. Perfect way to capture Kafka topic discussions. Secondly, yes, your statement regarding my scenario is correct. If there is only one task, how should I go about scaling for higher throughput?

:tada: I’m glad you like it! :slight_smile:

Each connector has its own definition of how it can partition its workload, scaling up to tasks.max tasks. If you hit that limit and still find you're bottlenecked, you'll need to analyse where that bottleneck is (for example, the network interface) and vertically scale that resource on the worker node.
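For illustration, here's a minimal sketch of raising tasks.max in a connector configuration. The connector name, class, and connection details are hypothetical placeholders; the shape is the standard Kafka Connect REST API payload:

```json
{
  "name": "my-source-connector",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db:5432/mydb",
    "mode": "timestamp",
    "timestamp.column.name": "updated_at",
    "tasks.max": "4"
  }
}
```

Bear in mind that tasks.max is only an upper bound: the connector itself decides how many tasks it can actually run based on how it is able to partition the work.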


THANK YOU, always appreciate your mentorship. Will make note of your feedback in our Kafka design strategy :pray:


Maybe an add-on to this question: how does the JDBC connector scale up on the DB side?
Will it create multiple DB sessions to ingest data faster into Kafka topics?
Is this something one would need to set explicitly via a configuration parameter, or does setting tasks.max > 1 enable the concurrency on its own?
For context, our setup is:
tasks.max = 25, topics with 50–100 partitions, and 10 tables on the DB side.
We still see just one session being opened by the JDBC connector.


As I understand it, the JDBC source connector treats tables as its units of parallelism. That is, if you have ten tables it could run up to ten tasks, each polling its assigned table in parallel. If you're not seeing this behaviour, perhaps start another topic and we can walk through some debug steps to see what's going on.
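As an illustration of that table-based parallelism, here's a sketch of a JDBC source configuration (connection URL and table names are placeholders). With three tables whitelisted, the connector would run at most three tasks no matter how high tasks.max is set:

```json
{
  "name": "jdbc-source-example",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db:5432/mydb",
    "table.whitelist": "orders,customers,products",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "tasks.max": "10"
  }
}
```

You can check how many tasks the connector is actually running (and their state) via the Connect REST API with GET /connectors/<name>/status.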