I have a question about JdbcSourceConnector bulk mode

Hi all, this is a beginner question.
I am trying to load 100 million records from MSSQL Server into a Kafka topic through the Kafka JdbcSourceConnector.

I set the incremental query mode to “bulk”,
but when I run the connector in my local environment, it only gets 100 records.

My guess is that it’s a JVM memory issue (current local KAFKA_HEAP_OPTS: -Xms8G -Xmx8G).
I plan to test it by modifying the KAFKA_HEAP_OPTS option on the development server.

So I have a question regarding this.
In “bulk” mode, is there a way to split and import 100 million rows?
I was wondering if there is a way to fetch the data in smaller polls while staying in “bulk” mode.

Here’s my JdbcSourceConnector config:

"tasks.max": "1",
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": "jdbc:sqlserver:// ${datasource}",
            "connection.user": "${user}",
            "connection.password": "${password}",
            "topic.prefix": "mssql-source-",
            "mode": "bulk",
            "table.whitelist": "${table name}",
            "poll.interval.ms": "86400000"

Is there a way to split and import 100 million rows?

I think batch.max.rows is the configuration option you’re looking for here.
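
For example (the value below is just an illustration, not a tuned recommendation), you could add it alongside the config you posted, and the connector will then pull the table in chunks of that size per batch:

"mode": "bulk",
"batch.max.rows": "10000"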


@rmoff Thanks for your reply.

We are also considering the option you suggested.
However, the amount of data could grow to more than 100 million rows.

If there is no other way, we will proceed with the setting below.

batch.max.rows: 1000000000

I’m not sure, is that the right approach?

If I understand it correctly, the option controls how many rows you pull from the database in each batch, until all rows have been fetched. So if you have 1000 rows and set batch.max.rows to 100, it’s going to take 10 iterations of the batch fetch to complete. If there are then 2000 rows, the connector will still work, it’ll just take 20 iterations.
The point is that instead of trying to eat a whole elephant at once (and blowing the JVM memory by trying to hold all the DB records in one go), you eat it one bite (one batch of records) at a time (and thus hold fewer records in JVM memory at once).

That is my understanding of it anyway - I have not looked at the code to verify it.


@rmoff
Yes. That’s what I thought at first.

As you mentioned, if the mode is bulk, can the fetch really happen in iterations?

I tested with the options below, but I couldn’t get the result I wanted.

Am I mistaken?? :sob:

"mode": "bulk",
"poll.interval.ms": "10000",
"batch.max.rows" : "100",
"table.poll.interval.ms": "5000"
  • The result I want (e.g. total messages in the topic: 8000):
  1. add 100 messages
  2. wait poll.interval.ms
  3. add 100 more messages
  4. and so on until it reaches 8000
  • The actual result:
  1. add 8000 messages
  2. wait poll.interval.ms
  3. add 8000 messages again
  4. and it keeps adding messages

As you mentioned, if the mode is bulk, can the fetch really happen in iterations?

So in bulk mode, there is a table querier thread that will continuously read the next set of records from the table. When this fills a batch, we commit those records to Kafka. So we do not wait until the entire result set is loaded into memory before we commit to Kafka; we buffer (at the application level) at most one batch of records.

  1. add 100 messages
  2. wait poll.interval.ms
  3. add 100 more messages
  4. and so on until it reaches 8000

This understanding is incorrect. poll.interval.ms affects when we run the subsequent query, i.e. if we’re still working through a result set, we don’t really care about poll.interval.ms. After we have finished a query, we will wait at most poll.interval.ms before running the next query.
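
So with the settings you tested (same values as above), the expected behaviour is: a single poll cycle reads the whole table, producing it to Kafka in batches of 100 as it goes, and only after the full result set has been read does the connector wait up to 10 seconds before querying the table again.

"mode": "bulk",
"poll.interval.ms": "10000",
"batch.max.rows": "100"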


@SajanaW Thank you. I’ll study Kafka more.
