Hey.
Can’t get my head around something.
Say one needs to deduplicate a raw event stream coming from redundant sources, using a stateful flatTransformValues.
Without going into detail on what the duplicate criteria are, let’s say one stores some view or aggregate of all previous values for a key in a GlobalKTable and returns null from the transformer for duplicates.
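To make the setup concrete, here is a minimal sketch of the dedup logic in plain Java. It deliberately does not use the Kafka Streams API: a HashMap stands in for the stored state, exact value equality stands in for the real duplicate criterion, and an empty list stands in for returning null from the transformer. The names DedupSketch and Deduplicator are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DedupSketch {

    // Hypothetical stand-in for the stateful transformer. The map plays the
    // role of the per-key state (the "view of all previous values for a key").
    static final class Deduplicator {
        private final Map<String, Set<String>> seenByKey = new HashMap<>();

        // Returns the value wrapped in a list if it is new for this key,
        // or an empty list if it is a duplicate (nothing gets forwarded).
        // Assumption: "duplicate" means an identical value for the same key.
        List<String> transform(String key, String value) {
            Set<String> seen = seenByKey.computeIfAbsent(key, k -> new HashSet<>());
            boolean isNew = seen.add(value);
            return isNew ? List.of(value) : Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        Deduplicator dedup = new Deduplicator();
        System.out.println(dedup.transform("sensor-1", "e1")); // [e1]
        System.out.println(dedup.transform("sensor-1", "e1")); // []
        System.out.println(dedup.transform("sensor-2", "e1")); // [e1]
    }
}
```

The sketch only works because the lookup and the update hit the same, fully up-to-date map. Whether that also holds when the state is distributed across application nodes is what I am asking about.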
Question.
How is it ensured that, by the time a record is processed, the table actually holds the latest state and won’t let a duplicate through?
What if there were two application nodes, one died right after processing a record, the partitions got reassigned (a rebalance), and the other node received a record to process before it had the latest state?
Is there some internal mechanism that re-checks the related partitions of the internal (service) topics before handing control to the library user?
Or is there only eventual consistency? Would one need strongly consistent ACID storage, such as an SQL database?