SQL Debezium source to topic to JDBC sink target

We are currently running a POC to prove we can replicate MS SQL Server tables from one server to another. We have CDC configured on the source table and a Debezium connector configured, and the data flows into a topic with no issues. Now, using a JDBC sink connector, we want to read from the topic and replicate to the target.

We have this working, but we had to define EVERY COLUMN in the fields.whitelist property for it to work, because the connector seemed to be looking for (or trying to create) other columns. Is this normal? Do we have to specify every column from the source that we want synced to the target, or is there a better way, such as a way to blacklist fields?
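To illustrate, here is a sketch of the sink configuration being described. The connector class and property names are those of the Confluent JDBC sink connector; the connection details, topic, and column names are hypothetical:

```properties
# Hypothetical JDBC sink config (standalone .properties form).
# Every column we want replicated had to be listed in fields.whitelist.
name=jdbc-sink-target
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.url=jdbc:sqlserver://target-host:1433;databaseName=replica
topics=sqlserver.dbo.customers
insert.mode=upsert
pk.mode=record_key
auto.create=true
fields.whitelist=id,first_name,last_name,email
```
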
Thanks for your assistance.

If I remember right, Debezium wraps the source payload in an “envelope”, so there are a number of internal auxiliary fields.
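For reference, a Debezium change event value looks roughly like this (the envelope fields before, after, source, op, and ts_ms are from the Debezium docs; the row columns shown are made up):

```json
{
  "before": { "id": 1001, "first_name": "Anne", "email": "anne@example.com" },
  "after":  { "id": 1001, "first_name": "Anne", "email": "annek@example.com" },
  "source": { "connector": "sqlserver", "db": "testDB", "schema": "dbo", "table": "customers" },
  "op": "u",
  "ts_ms": 1559729995937
}
```

Only the after part holds the row state you actually want to write to the target; everything else is CDC metadata.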

You should be able to manipulate the messages using different SMTs before sinking them to the target. Debezium provides a number of transformations for this, and you would usually pair the JDBC sink with the Unwrap SMT (ExtractNewRecordState).
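A minimal sketch of that pairing, assuming the Debezium jars are available on the sink connector's classpath:

```properties
# Debezium's "unwrap" SMT (ExtractNewRecordState) applied on the sink side:
# it replaces the CDC envelope with just the row state from the "after" field.
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
```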

It should also be possible to use the standard Flatten SMT, but you would probably need to combine it with fields.whitelist or fields.blacklist to achieve the filtering you want; see the sketch below.
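For comparison, a sketch of Kafka Connect's built-in Flatten SMT. Rather than extracting the after state, it flattens the nested envelope into delimited field names, which is why you would still need field filtering on top of it:

```properties
# The standard Flatten SMT (value side).
# Nested fields become e.g. after_id, source_table with delimiter "_".
transforms=flatten
transforms.flatten.type=org.apache.kafka.connect.transforms.Flatten$Value
transforms.flatten.delimiter=_
```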

Hope this helps!

OK, so do I remove the fields before I put the data in the topic, or do I remove them before I insert into the target?

Also, I don’t see a fields.blacklist in the documentation… I hope I am wrong, because that is exactly what I need.

Well, it is up to you when to apply the unwrapping, but the enveloping of the data was implemented for a reason, so I would recommend doing it on the sink connector side.

As for fields.blacklist: sorry for the confusion, what I meant was actually the ReplaceField SMT, which lets you whitelist or blacklist fields…
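A minimal sketch of that SMT on the sink side (older Kafka releases use the blacklist/whitelist property names; newer ones rename them to exclude/include):

```properties
# ReplaceField SMT dropping fields we don't want written to the target.
transforms=dropFields
transforms.dropFields.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.dropFields.blacklist=__deleted
```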


That ReplaceField did the trick!
I also had issues with DateTime columns and found a setting for that too.
Thanks a bunch!


You’re welcome! You may want to mark the answer that helped you as the “Solution” so it’s also useful to others!

One last question: my preference would be to eliminate those fields BEFORE putting them into the topic (for storage reasons). Will that cause any issues? For example, the __deleted column is one that I am removing in the sink connector; if I removed it in the source connector instead, would that create issues?

Technically, it is totally fine to do so; it should not be a problem at all.

But from an architecture perspective this may cause some issues in the future; please refer to the “Outbox pattern” to understand the potential hiccups.

I actually tried to do this and it didn’t work. The __deleted column must be something added after the fact, or internal to Debezium. My ReplaceField SMT did not work on the source side, and I still needed it for the target. This is fine though; all good. Thanks.

I do not know exactly what, but something is wrong.

SMTs should work fine on both the source and sink connector sides; it is up to you where to use them. The field “being internal” is not relevant here: once the connector has added a field to a Connect record, that field is always accessible to SMTs before the record is forwarded to the converter.

You can refer to this course to understand Kafka Connect internals better: How Kafka Connect Works: Integrating Data Between Systems

A potential reason your ReplaceField SMT doesn’t work could be that your record structure is different from what you expect it to be at that point. For example, you may apply some other SMT before it that already changes the record structure, because SMTs are applied as a chain.
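A sketch of why ordering matters in this particular case, assuming the unwrap SMT is configured with delete.handling.mode=rewrite (which is what adds the __deleted field in the first place):

```properties
# SMTs run in the order listed in "transforms", so "unwrap" must come first:
# __deleted only exists after ExtractNewRecordState (in rewrite mode) adds it,
# and a ReplaceField placed before the unwrap would not find it.
transforms=unwrap,dropFields
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
transforms.unwrap.delete.handling.mode=rewrite
transforms.dropFields.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.dropFields.blacklist=__deleted
```

This would also explain the earlier observation: on the source connector the record is still the full envelope, so there is no top-level __deleted field for ReplaceField to remove yet.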

See more info here: Single Message Transforms in Kafka Connect

Anyway, if the __deleted field is the only one you’re not interested in, why don’t you try the Unwrap SMT for this? It is intended exactly for this case and has a delete.handling.mode configuration property that controls it.
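A minimal sketch, assuming __deleted is the only field you want gone (note that drop means delete events are discarded rather than propagated, and that recent Debezium versions deprecate this property in favor of delete.tombstone.handling.mode):

```properties
# With "drop", the unwrap SMT discards delete records instead of rewriting
# them, so the __deleted marker field is never added at all.
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
transforms.unwrap.delete.handling.mode=drop
```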
