Problem capturing a large volume of CDC data: Kafka is not receiving all the records

Hi all,
I have connected Debezium to our SQL Server. Our table has more than 51M records.
I have encountered several questions and problems:
1 - Will Debezium send all records to Kafka, going back to when the table was created in SQL Server?
2 - The data I receive is completely different from what is in our table. I can't even count how many records have reached the Kafka broker, but I'm sure the amount received does not match what we have in the table. I can also see that new CDC data is not reaching the Kafka consumers.
3 - What Kafka configuration do you think is needed to handle this volume of data?
4 - Should I add more resources, or change the replication and partitioning configuration?
Related to question 1: is there any way to skip those 51M existing records and capture only new ones, for example only the records received today?
Any help would be appreciated.
Server resources: 8-core CPU, 16 GB RAM, 60 GB HDD
Thank you

It would be useful if you could show your config; then we can tell you which settings you might want to change or add.

But if you don't want the existing data, then disable snapshotting.

Thanks @OneCricketeer,
I set snapshot.mode to schema_only, and that is exactly the solution I wanted.
But now I have a problem: when new records are stored in the database, nothing arrives in Kafka. I see this error in the connector log:
ERROR [debezium-task-0] Skipping change ChangeTableResultSet{changeTable=Capture instance "dbo_***" [sourceTableId=***.dbo.***, changeTableId=***.cdc.dbo_***_CT, startLsn=000153af:00011f37:014a, changeTableObjectId=1425635414, stopLsn=NULL], resultSet=SQLServerResultSet:4424, completed=false, currentChangePosition=NULL(NULL)} as its LSN is NULL which is not expected (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource:219)

and the connector configuration is:

{
    "name": "debezium",
    "config": {
        "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
        "database.hostname": "***.**.**.**",
        "database.port": "****",
        "database.user": "**********",
        "database.password": "*********",
        "database.dbname": "*******",
        "database.server.name": "****_******",
        "table.include.list": "dbo.***************",
        "decimal.handling.mode": "double",
        "time.precision.mode": "adaptive",
        "database.history.kafka.bootstrap.servers": "localhost:9092",
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "key.converter.schemas.enable": false,
        "value.converter.schemas.enable": false,
        "database.history.kafka.topic": "test5_dbhistory",
        "snapshot.mode": "schema_only"
    }
}
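In case it helps with diagnosing the NULL LSN: that error can show up when the connector's SQL Server user is not allowed to read the change tables, for example because CDC was enabled on the table with a gating role the user is not a member of. One way to inspect the capture instances and their gating role is SQL Server's CDC helper procedure (a sketch; the database name is a placeholder):

-- Run in the captured database. Lists each CDC capture instance,
-- including its gating role (role_name). If role_name is not NULL,
-- the connector's user must be a member of that role to read changes.
USE MyDatabase;  -- placeholder for the actual database name
EXEC sys.sp_cdc_help_change_data_capture;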

Problem solved.
The database administrator needed to grant my database user the CDC gating role.
Now all records are being captured; a rough sketch of the database-side fix is below.
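For anyone hitting the same thing, a minimal sketch of what the grant can look like, assuming CDC was enabled on the table with a gating role; the login, user, and role names are placeholders:

-- Run in the captured database as a privileged user (e.g. db_owner).
USE MyDatabase;  -- placeholder for the actual database name

-- Map the connector's login to a database user, if not already mapped.
CREATE USER debezium_user FOR LOGIN debezium_login;

-- Add the connector's user to the gating role that was set through
-- @role_name when sys.sp_cdc_enable_table enabled CDC on the table.
ALTER ROLE cdc_reader ADD MEMBER debezium_user;

After the DBA made a grant along these lines, new changes started arriving in Kafka without the NULL-LSN errors.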
Thanks again.
