GlobalKTable consistency, ensuring latest state for processing

Can’t get my head around something.
Say, one is to deduplicate a raw event stream from redundant sources.
That is by stateful flatTransformValues.
Without going into detail on what the duplicate record criteria is, let’s say one is to store some view or aggregate of all previous values for key in a GlobalKTable and return null from the transformer for duplicates.

How is it ensured, that by the time a record is being processed the table actually has the latest state and won’t allow a duplicate through?
What if there were two apllication nodes, one died right after processing a record, repartition happened, and the other node just got a record to process, but not the latest state?
Is there some internal mechanism that rechekes linked partitions of service topics before giving control to the library user?
Or is there only eventual consistency? Would one need a strongly consistent ACID storage like an SQL database?

GlobalKTables are inherenty async, because they only update by reading from their input topic.

For de-duplication, you would need to use a regular store/KTable and partition the data accordingly. This way, your de-duplication step can read a input record, lookup the store, and update the store sync, avoiding any race condition.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.