I’ve been reading a lot about Kafka, ksqlDB, streams, log compaction and tombstoning, message replay, and new consumers, but I’m still struggling to match my use case closely enough to know whether Kafka is the right choice for me.
I was wondering if my understanding below is correct, given the following scenario:
- A table of Customers (id, name, other details, etc.) in a database.
- CDC on this table set up using Debezium, which sends CRUD events to a Kafka topic (initial snapshot and ongoing replication).
- A ksqlDB materialized view (table) that queries the topic, which we also expose as a customer ksqlDB stream.
- Log compaction and tombstoning configured on the topic.
- A number of consumers subscribed to the customer ksqlDB stream/view.
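For concreteness, here is a minimal ksqlDB sketch of the scenario above; the topic name, column names, and value format are all assumptions, not anything prescribed by Debezium or ksqlDB:

```sql
-- Assumption: Debezium writes CRUD events, keyed by customer id,
-- to a topic named 'customers.cdc'.
-- Register that changelog topic as a ksqlDB stream (schema is assumed).
CREATE STREAM customer_events (
  id INT KEY,
  name VARCHAR,
  email VARCHAR
) WITH (
  KAFKA_TOPIC  = 'customers.cdc',
  VALUE_FORMAT = 'JSON'
);
```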
Is the following correct?
Log compaction and tombstoning on the underlying topic mean that the ksqlDB view only ever shows the current state of the database table’s data, perhaps by materializing it with LATEST_BY_OFFSET?
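As a sketch of what that materialization might look like (stream and column names are my own assumptions): a `LATEST_BY_OFFSET` aggregation collapses the event stream to one row per key, and it does so regardless of compaction; compaction on the topic is a storage/replay optimization, while the table semantics alone already give you “current state”.

```sql
-- A table holding only the latest value per customer id.
-- Stream and column names are assumptions for illustration.
CREATE TABLE customers_current AS
  SELECT id,
         LATEST_BY_OFFSET(name)  AS name,
         LATEST_BY_OFFSET(email) AS email
  FROM customer_events
  GROUP BY id
  EMIT CHANGES;
```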
If a new consumer comes along and we subscribe them to the customer ksqlDB view/stream, then all current data in the view/stream will be delivered to the new consumer as logical “inserts”, even though compaction might leave the latest surviving event for a row as an “update” event? (I hope I’ve explained this in a way that makes sense; this is where I am struggling.)
And then, after the new consumer’s initial load, any subsequent update events can/will be seen as “updates” by that consumer?
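One way I understand the “initial load as inserts, then ongoing updates” behaviour is via a push query that replays the changelog from the beginning; a sketch, assuming a table named `customers_current` (a made-up name for illustration):

```sql
-- Replay the changelog from the beginning; with compaction enabled this is
-- roughly "the latest surviving event per key", i.e. the current state.
SET 'auto.offset.reset' = 'earliest';

-- Push query: first emits the surviving record per key (the "initial load"),
-- then every subsequent change. The new consumer treats each record as an
-- upsert; whether a surviving event was originally an INSERT or an UPDATE
-- in the source database is not recoverable, and usually does not matter.
SELECT * FROM customers_current EMIT CHANGES;
```

By contrast, a one-shot pull query (`SELECT * FROM customers_current WHERE id = 42;`) returns only the current state once, with no ongoing changes.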