I’m processing messages produced by Debezium from updates to a SQL Server database. Multiple rows can be updated in the same transaction, and I’m trying to capture the latest values for all of them.
For example, when I do a stream-table join against a table that’s updated at the same time as the stream, the table often isn’t up to date yet when the stream record is processed, so I get the old value.
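To make the problem concrete, here is a rough Python simulation of what I believe is happening. This is not ksqlDB code, and the names (`cust-1`, `order_id`, `address`) are invented purely for illustration; the processing order shown is my assumption about how the two change events interleave.

```python
# Toy model of a stream-table join. Not ksqlDB; all names are hypothetical.
table = {"cust-1": "old address"}  # materialized table state before the transaction

# One database transaction changes both rows, but the two change events
# land on different topics and (by assumption) are processed in this order.
events = [
    ("stream", {"order_id": 1, "customer_id": "cust-1"}),
    ("table", {"customer_id": "cust-1", "address": "new address"}),
]

joined = []
for source, record in events:
    if source == "table":
        # Table-side event: update the materialized state only.
        table[record["customer_id"]] = record["address"]
    else:
        # Stream-side event: look up whatever the table holds right now.
        joined.append((record["order_id"], table.get(record["customer_id"])))

print(joined)  # [(1, 'old address')]
```

The join result carries the old address even though the table ends up with the new one a moment later.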
A stream-stream join with the WITHIN clause works, but only if both streams are updated within that period (sometimes only one value is updated).
I’ve also tried a session window (WINDOW SESSION) of 5 seconds, which works when both rows are updated at the same time, but if only one is updated, I get NULL for the other value.
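The NULL case can be sketched the same way. The snippet below is a simplified Python model of a time-bounded join between two change streams, not the actual ksqlDB semantics; `windowed_join`, the timestamps, and the keys are all hypothetical.

```python
# Toy model of a time-bounded join of two change streams. Hypothetical names.
WINDOW_SECONDS = 5

def windowed_join(left, right, window=WINDOW_SECONDS):
    """Pair each left event with the newest right event for the same key
    whose timestamp is within `window` seconds; use None if there is none."""
    out = []
    for l_ts, key, l_val in left:
        matches = [
            (r_ts, r_val)
            for r_ts, r_key, r_val in right
            if r_key == key and abs(r_ts - l_ts) <= window
        ]
        r_val = max(matches, key=lambda m: m[0])[1] if matches else None
        out.append((key, l_val, r_val))
    return out

# Events are (timestamp_seconds, key, value).
# Both rows updated in the same transaction: the join finds a partner.
both_updated = windowed_join([(100, "k1", "a2")], [(101, "k1", "b2")])

# Only the left row updated later: the last right event is outside the window.
only_left = windowed_join([(200, "k1", "a3")], [(101, "k1", "b2")])

print(both_updated)
print(only_left)
```

In the second case, what I actually want is `b2` (the last known value of the other row) instead of a missing value.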
So basically, I want to join a stream to a table and make sure I get the true latest value from that table.
I’m just looking for general ideas on how to handle this kind of use case.
Is it possible to “delay” the processing of a stream by just a few seconds?
I’ve toyed with the LATEST_BY_OFFSET aggregate function, but I’m still having trouble with NULL values when only one of the rows is updated.