Well, if you don’t have a key to join on, the first issue is that you won’t be able to run the program with more than one partition per input topic…
If one partition is good enough, using the Processor API you might be able to pull it off. In the end, it seems you only want to consider the latest message per input for the join. You can read both inputs and attach a state store for each processor. Each state store should contain at most one message. If you get an input record on the left, you can do a lookup into the right store, and the other way around. Using stateStore#all() you can just scan the full store as you don’t care about the keys.
Right now I would need to run the program in one partition as the entire dataset needs to be processed sequentially. I understand the implications of this, I wonder if this suggests Kafka may not be the right tool for this…
When processing I always need to be looking at the “next” message, to ensure the join for the “current” message is with the right timestamp. I will look more into the Processor API with State store and see if I can find a way, thanks for pointing me in the right direction.