Day 0 scripts to populate Kafka topics

joe_trabajoe · 8 December 2022 17:48

How should the existing GBs or probably TBs of data in a large enterprise be made available for immediate availability in Kafka topics?

While I know that we can use the CDC capabilities of Kafka to build an event log or DB replica, with CDC the Kafka topics will be filled only whenever there is a change in the backend DB.

What is the best approach to build a Kafka layer which replaces the need to hit the real backend databases. How will the millions of records in existing database tables be available in Kafka topics as soon as we deploy our application integrated with Kafka CDC connector?

OneCricketeer · 21 December 2022 14:04

You’ll need to hit the database, one way or the other. That means read the oplog or use a driver to periodically query it in batches. Neither will immediately offer you the full database, and only the oplog option (i.e. Debezium or Oracle Goldengate, as popular options) will give you full information about operations ran on the database rather than just a point in time snapshot of every row.

Topic		Replies	Views
Using CDC fed Kafka topics for replay with new consumers Architecture and Design	4	4253	25 January 2022
CDC from Z/OS to Kafka Architecture and Design	3	2617	4 April 2023
More than one master table , lookup data in KStream for data enrichment Managed Connectors	1	3491	13 October 2021
MSSQL CDC connector in Confluent Cloud Confluent Cloud	1	805	27 March 2024
Does Kafka Connect work if CT is enabled instead of CDC on SQL server Kafka Connect	2	1941	30 July 2023

Day 0 scripts to populate Kafka topics

Related topics