Hi all, this is a beginner question.
I am trying to load 100 million records from MSSQL Server into a Kafka topic using the Kafka JdbcSourceConnector.
I set the incremental query mode to “bulk”.
But when I run the connector in my local environment, it only gets 100 records.
My guess is that it’s a JVM memory issue (the local setting is currently KAFKA_HEAP_OPTS: -Xms8G -Xmx8G).
I plan to test it by modifying the KAFKA_HEAP_OPTS option on the development server.
So, I have a question regarding this.
In “bulk” mode, is there a way to split up the import of 100 million records?
I was wondering if there is a way to fetch the data in pieces by polling in “bulk” mode.
If I understand it correctly, batch.max.rows sets how many rows you’ll pull from the database in each batch, until all rows are fetched. So if you have 1000 rows and set batch.max.rows to 100, it’s going to take 10 iterations of the batch fetch to complete. If there are 2000 rows instead, the connector will still work; it’ll just take 20 iterations.
The point is that instead of trying to eat a whole elephant at once (and blowing the JVM memory from trying to hold all the DB records in one go) you eat it one bite (batch of records) at a time (and thus fewer in JVM memory at once).
That is my understanding of it anyway - I have not looked at the code to verify it.
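To make the arithmetic concrete, here is a rough sketch in Python. The property names (mode, batch.max.rows, poll.interval.ms and so on) are real JDBC source connector settings, but the connector name, connection URL, table and topic prefix are made-up placeholders, and batch_iterations is just a helper written for this post, not part of any Kafka API.

```python
import json

# Connector config in the shape accepted by the Kafka Connect REST API.
# All values below are placeholders; only the property names are real.
connector = {
    "name": "mssql-bulk-source",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:sqlserver://localhost:1433;databaseName=mydb",  # hypothetical
        "mode": "bulk",
        "table.whitelist": "my_table",  # hypothetical table
        "topic.prefix": "mssql-",       # hypothetical prefix
        "batch.max.rows": "100",
        "poll.interval.ms": "5000",
    },
}


def batch_iterations(total_rows: int, batch_max_rows: int) -> int:
    """Number of batch fetches needed to drain total_rows rows."""
    if batch_max_rows <= 0:
        raise ValueError("batch_max_rows must be positive")
    # Integer ceiling division, so a partial last batch still counts.
    return (total_rows + batch_max_rows - 1) // batch_max_rows


batch_size = int(connector["config"]["batch.max.rows"])
print(json.dumps(connector, indent=2))
print(batch_iterations(1000, batch_size))         # 10
print(batch_iterations(2000, batch_size))         # 20
print(batch_iterations(100_000_000, batch_size))  # 1000000
```

So with a batch size of 100, a 100 million row table would mean one million batch fetches, each holding only 100 rows in memory at a time.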
As you mentioned, if the mode is bulk, does it still fetch the data in iterations?
So in bulk mode, we have a table querier thread that continuously reads the next set of records from the table. Once a batch is full, we commit it to Kafka. So we do not wait until the entire result set is loaded into memory before we commit to Kafka. We buffer (at the application level) at most one batch.
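The batching described above can be sketched roughly like this in Python. This is not the connector's actual code: fake_result_set stands in for a JDBC result set, and send_to_kafka is a hypothetical stand-in for the producer, so treat it only as an illustration of "buffer at most one batch".

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_in_batches(rows: Iterable[int], batch_max_rows: int) -> Iterator[List[int]]:
    """Yield lists of at most batch_max_rows rows, reading lazily from rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_max_rows))  # pulls only the next few rows
        if not batch:
            return  # result set exhausted
        yield batch


def fake_result_set(total_rows: int) -> Iterator[int]:
    """Hypothetical stand-in for a database cursor; rows are produced on demand."""
    for row_id in range(total_rows):
        yield row_id


sent_batch_sizes: List[int] = []


def send_to_kafka(batch: List[int]) -> None:
    """Hypothetical stand-in for handing a batch over to Kafka."""
    sent_batch_sizes.append(len(batch))


for batch in read_in_batches(fake_result_set(8000), batch_max_rows=100):
    send_to_kafka(batch)

print(len(sent_batch_sizes), max(sent_batch_sizes))  # 80 100
```

The full 8000 rows are never held in memory together; only the current batch of 100 is.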
add 100 messages
after poll.interval.ms
add 100 messages
It will continue to 8000.
This understanding is incorrect. poll.interval.ms affects when we run the subsequent query. I.e., if we’re still working on a result set, we don’t really care about poll.interval.ms. After we have finished a query, we wait at most poll.interval.ms before running the next query.
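A small simulated timeline may help show the difference. This is only a sketch under assumptions: the 10 ms per batch is an invented number, the wait between queries is modeled as the full poll.interval.ms (the upper bound mentioned above), and simulate_bulk_polling is a made-up function, not connector code.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (simulated time in ms, what happened)


def simulate_bulk_polling(
    total_rows: int,
    batch_max_rows: int,
    poll_interval_ms: int,
    batch_time_ms: int,
    num_queries: int,
) -> List[Event]:
    """Simulate bulk-mode queries on a fake clock and return the event log."""
    events: List[Event] = []
    now_ms = 0
    for _ in range(num_queries):
        events.append((now_ms, "query_start"))
        remaining = total_rows
        # Batches within one query follow each other back to back;
        # poll.interval.ms plays no role here.
        while remaining > 0:
            size = min(batch_max_rows, remaining)
            now_ms += batch_time_ms  # assumed time to read and send one batch
            remaining -= size
            events.append((now_ms, f"batch:{size}"))
        events.append((now_ms, "query_done"))
        # The wait only happens between queries (modeled as the full interval).
        now_ms += poll_interval_ms
    return events


events = simulate_bulk_polling(
    total_rows=300, batch_max_rows=100,
    poll_interval_ms=5000, batch_time_ms=10, num_queries=2,
)
for time_ms, what in events:
    print(f"{time_ms:>5} ms  {what}")
```

In this model the three batches of the first query arrive at 10, 20 and 30 ms, and the second query only starts at 5030 ms, after the first one has finished and the interval has passed.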