Problem capturing a large volume of CDC data: Kafka is not receiving all the records

Hi all,
I have connected Debezium to our SQL Server. Our table has more than 51M records.
I have encountered several questions and problems:
1 - Will Debezium send all records to Kafka, going back to when the table was created in SQL Server?
2 - The data I receive is completely different from what is in our table. I can't even count how many records have reached the Kafka broker, but I'm sure the amount received does not match what we have in the table. I can also see that new CDC data is not reaching the Kafka consumers.
3 - What Kafka configuration do you think is needed to handle this volume of data?
4 - Should I add more resources, or change the replication and partitioning configuration?
Related to question 1: is there any way to skip those 51M existing records and capture only new ones, for example only the records received today?
Any help would be appreciated.
Server resources: 8-core CPU, 16 GB RAM, 60 GB HDD
Thank you

It would be useful if you could show your config; then we can tell you which settings you might want to change or add.

But if you don't want the existing data, then disable snapshotting.

Thanks @OneCricketeer,
I set snapshot.mode to schema_only, and that is exactly the solution I wanted.
But now I have a problem: when new records are stored in the database, nothing arrives in Kafka. I see this error in the connector log:
ERROR [debezium-task-0] Skipping change ChangeTableResultSet{changeTable=Capture instance "dbo_***" [sourceTableId=***.dbo.***, changeTableId=***.cdc.dbo_***_CT, startLsn=000153af:00011f37:014a, changeTableObjectId=1425635414, stopLsn=NULL], resultSet=SQLServerResultSet:4424, completed=false, currentChangePosition=NULL(NULL)} as its LSN is NULL which is not expected (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource:219)

and the connector configuration is:

{
    "name": "debezium",
    "config": {
        "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
        "database.hostname": "***.**.**.**",
        "database.port": "****",
        "database.user": "**********",
        "database.password": "*********",
        "database.dbname": "*******",
        "database.server.name": "****_******",
        "table.include.list": "dbo.***************",
        "decimal.handling.mode": "double",
        "time.precision.mode": "adaptive",
        "database.history.kafka.bootstrap.servers": "localhost:9092",
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "key.converter.schemas.enable": false,
        "value.converter.schemas.enable": false,
        "database.history.kafka.topic": "test5_dbhistory",
        "snapshot.mode": "schema_only"
    }
}
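In case it helps with diagnosing the NULL LSN: that error can show up when the connector's SQL Server user is not allowed to read the change tables, for example because CDC was enabled on the table with a gating role the user is not a member of. One way to inspect the capture instances and their gating role is SQL Server's CDC helper procedure (a sketch; the database name is a placeholder):

-- Run in the captured database. Lists each CDC capture instance,
-- including its gating role (role_name). If role_name is not NULL,
-- the connector's user must be a member of that role to read changes.
USE MyDatabase;  -- placeholder for the actual database name
EXEC sys.sp_cdc_help_change_data_capture;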

Problem solved.
The database administrator needed to grant my database user the CDC gating role.
Now all records are being captured; a rough sketch of the database-side fix is below.
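For anyone hitting the same thing, a minimal sketch of what the grant can look like, assuming CDC was enabled on the table with a gating role; the login, user, and role names are placeholders:

-- Run in the captured database as a privileged user (e.g. db_owner).
USE MyDatabase;  -- placeholder for the actual database name

-- Map the connector's login to a database user, if not already mapped.
CREATE USER debezium_user FOR LOGIN debezium_login;

-- Add the connector's user to the gating role that was set through
-- @role_name when sys.sp_cdc_enable_table enabled CDC on the table.
ALTER ROLE cdc_reader ADD MEMBER debezium_user;

After the DBA made a grant along these lines, new changes started arriving in Kafka without the NULL-LSN errors.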
Thanks again.
