Process is into two steps. Step 1. For a day there are inflow of messages, where each message has a numeric filed to represent the version of record. I need to continuously find the latest version based on the numeric filed and only keep the latest version. This process will continue till EOD.
Step 2 happened after Step1 EOD , is to only take the message which are identified as latest and write to the file
How I am solving
Step1 – define a topology that read message from topic A and use KV store to only store the latest version. Logic compare the incoming record version with version stored in KV for that key. This process continue till EOD and then I write everything from KV Store to txt files where each file is one to one mapping with partition number like file-0.txt. this file has the latest offset number for the record. This output becomes a input for the Step2
Step 2 – load the output file-0.txt in HashMap onparition assigned call and then start consuming he message. As message comes in do the comparison of the offset I have previous stored in file to the current record. If offset matched then just write the record to files else simply ignore and increment the counter of the offset file to point to the next offset to compare.
I am reading about KStream joins and KTable goodness but not able to figure out how Step1 2nd half where after identifying the latest version till EOD, can be stored as a lookup table and Step 2 can simply join the topic A with the lookup table and only fetch the record that matched by there offset number. I am solving this currently with more of java solution and seems missing all Kafka goodness of Kstream/KTables. Any direction, guidance on how to convert values into a store that can be used for comparison.