Reddit CDC with Debezium

A quick but interesting read from the Reddit Tech Blog on how they use a Debezium connector on Kafka Connect to snapshot their data more efficiently. Take a look for their reasoning behind the move to this architecture, as well as some pros and cons of their current setup!

One disadvantage of using Debezium is that the initial snapshot can be too slow when the volume of data is large, because Debezium builds the snapshot sequentially, with no concurrency.
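To make the cost of a sequential snapshot concrete, here is a rough back-of-the-envelope sketch. The table names, row counts and read rate are made-up numbers for illustration, not figures from the Reddit post, and the "concurrent" case is a simple greedy model of whole tables spread across workers, not how Debezium does (or will do) it.

```python
import heapq


def sequential_seconds(table_rows, rows_per_second):
    """One reader walks every table in turn, so the times simply add up."""
    return sum(table_rows.values()) / rows_per_second


def concurrent_seconds(table_rows, rows_per_second, workers):
    """Hypothetical model: whole tables are handed out, largest first,
    to whichever worker becomes free earliest."""
    finish_times = [0.0] * workers
    heapq.heapify(finish_times)
    for rows in sorted(table_rows.values(), reverse=True):
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + rows / rows_per_second)
    return max(finish_times)


# Made-up table sizes (rows) and a made-up per-reader throughput.
tables = {
    "table_a": 400_000_000,
    "table_b": 300_000_000,
    "table_c": 200_000_000,
    "table_d": 100_000_000,
}
rate = 50_000  # rows per second per reader (assumption)

print("sequential:", sequential_seconds(tables, rate), "s")
print("4 workers: ", concurrent_seconds(tables, rate, 4), "s")
```

Under these assumptions the sequential run is bounded by the sum of all tables, while the concurrent run is bounded by the largest single table, which is why one very large table still dominates either way.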

It would be interesting to know which snapshot settings the Postgres connector offers and whether they would help with this issue. I find it very confusing to keep track of the differences between databases and how each one implements logging.

https://debezium.io/documentation/reference/connectors/postgresql.html#postgresql-snapshots
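As a starting point for experimenting with those settings, here is a minimal sketch that builds a Kafka Connect payload for the Debezium Postgres connector with a couple of snapshot-related properties set. The host, credentials, database, table names and fetch size are placeholders I made up, and `build_connector_config` is a hypothetical helper; check the property names against the docs linked above for the Debezium version you run.

```python
import json


def build_connector_config(name, tables, fetch_size):
    """Hypothetical helper: returns a Kafka Connect payload as a dict."""
    config = {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",
        # Placeholder connection details (assumptions, not from the post).
        "database.hostname": "localhost",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "change-me",
        "database.dbname": "appdb",
        # Debezium 2.x and later use "topic.prefix" instead of this property.
        "database.server.name": "appdb",
        # Snapshotting fewer tables is the simplest way to shrink the snapshot.
        "table.include.list": ",".join(tables),
        # "initial" snapshots once, then streams changes from the WAL.
        "snapshot.mode": "initial",
        # Rows fetched per round trip while reading a table during the snapshot.
        "snapshot.fetch.size": str(fetch_size),
    }
    return {"name": name, "config": config}


payload = build_connector_config(
    "appdb-connector",
    ["public.orders", "public.customers"],
    fetch_size=20000,
)
print(json.dumps(payload, indent=2))
```

The printed JSON is the shape that Kafka Connect's REST API accepts on a POST to `/connectors`. None of these settings adds concurrency; they only reduce or tune the work a single sequential snapshot has to do.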

Gunnar Morling posted about this on Twitter: https://twitter.com/gunnarmorling/status/1455549791254073352?s=20 Concurrent snapshotting is on the roadmap.