Should I use a single sink connector for multiple tables or split them up?

Hello everyone,

I want to migrate a database to a newer Postgres database with the help of Kafka Connect combined with a JDBC source + sink connector.
Instead of syncing every table from the legacy database, I only want to pick some of them, and also limit the columns, since I don’t need every column from those tables.
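
For context, my source connector looks roughly like this (host, credentials, and the mode are placeholders, not my real values):

```properties
# source.properties -- sketch only; connection details and mode
# are placeholders for my actual setup
name=legacy-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:postgresql://legacy-host:5432/legacydb
connection.user=connect_user
connection.password=secret
# only the tables I actually want to migrate
table.whitelist=table1,table2
# bulk assumed here for a one-off copy; incremental modes also exist
mode=bulk
topic.prefix=legacy-
```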

It’s also worth mentioning that these tables share some columns with the same name (e.g. table1.title, table2.title, …).

Now the problem:
There are some cases where I need a column such as “title” and other cases where I don’t. But I haven’t found a way to specify the target column more precisely than adding the column name to the “fields.whitelist” property of my sink connector.
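
To make the ambiguity concrete, here is a stripped-down sketch of my current sink connector (connection details are placeholders):

```properties
# sink.properties -- one connector covering both tables
name=combined-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.url=jdbc:postgresql://new-host:5432/newdb
connection.user=connect_user
connection.password=secret
topics=legacy-table1,legacy-table2
# fields.whitelist applies connector-wide: "title" is either kept for
# records from BOTH topics or dropped for both; there is no
# per-topic variant of this property
fields.whitelist=id,title
auto.create=true
```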

I think one possible solution would be to split up my single sink connector, which currently covers multiple tables via the “topics” property. That way, each connector’s “fields.whitelist” would be unique to its own context and wouldn’t lead to the ambiguity described above.
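
A sketch of what that split could look like (“description” is just a stand-in for whatever columns table2 actually needs):

```properties
# sink-table1.properties -- "title" IS wanted for table1
name=sink-table1
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.url=jdbc:postgresql://new-host:5432/newdb
connection.user=connect_user
connection.password=secret
topics=legacy-table1
fields.whitelist=id,title
# write into the original table name rather than the prefixed topic name
table.name.format=table1
auto.create=true

# sink-table2.properties -- "title" is NOT wanted for table2
name=sink-table2
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.url=jdbc:postgresql://new-host:5432/newdb
connection.user=connect_user
connection.password=secret
topics=legacy-table2
fields.whitelist=id,description
table.name.format=table2
auto.create=true
```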

But would this be efficient? Does Kafka Connect optimize cases like these? What would be the best practice in such a case? Is creating/using multiple sink connectors against the same database a common thing, or should you avoid it?

Thank you very much for your answers.

Best Regards

Hi @Polow,

I think you’re going to have to create multiple connectors here, given the level of configuration granularity you need.

One connector pulling multiple objects from a database can be throttled in its number of database connections by limiting the number of tasks. Be aware that if you create multiple connectors, each one will spawn at least one task, so you’ll potentially increase the number of concurrent connections to your source database.
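
As a rough illustration (the property is the standard Kafka Connect one; the value is just an example), each connector’s footprint is capped like this:

```properties
# Applies per connector: every connector runs at least one task, and
# each JDBC task holds its own database connection. Splitting into N
# connectors therefore means at least N concurrent connections;
# tasks.max only caps the task (and connection) count per connector.
tasks.max=1
```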
