Windowed Stream-Stream Join

Hi folks.

I have an architecture question. It is about windowed stream-stream joins.

It is well known that this kind of join needs both a key and a time window, as explained in this great YouTube video:

In this video Tim says we can join, for example, an orders stream and a shipments stream, and he uses a 4-hour window. He then also creates a query to monitor shipments that take longer than 1 hour.

But my question is:

What about a shipment that is made later than 4 hours, say after 5 hours? Will we just miss this event?

Thanks for your time :smiley:

Hi Marcel, good question! Going back to the video, specifically at time 4:21, Tim explains that the events outside of the 4 hour window are ignored. You can view the windowing as part of the join condition, so rows that don’t fall within the window are not included in the result, just like a regular SQL join. Hope that helps!
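
To illustrate the "window as part of the join condition" idea, here is a minimal sketch in plain Python. It is not Kafka Streams or ksqlDB code; the function name, the sample data, and the symmetric 4-hour window are assumptions made only for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=4)  # the join window used in the video example


def windowed_join(orders, shipments, window=WINDOW):
    """Inner join on key, keeping only pairs whose timestamps lie within `window`."""
    results = []
    for order_id, order_ts in orders:
        for ship_order_id, ship_ts in shipments:
            same_key = order_id == ship_order_id
            # The window acts as an extra join predicate next to the key match.
            in_window = abs(ship_ts - order_ts) <= window
            if same_key and in_window:
                results.append((order_id, order_ts, ship_ts))
    return results


t0 = datetime(2024, 1, 1, 8, 0)  # hypothetical order time
orders = [("o1", t0), ("o2", t0)]
shipments = [
    ("o1", t0 + timedelta(hours=2)),  # inside the window: joined
    ("o2", t0 + timedelta(hours=5)),  # outside the window: not joined
]

joined = windowed_join(orders, shipments)
print(joined)
```

Only the `o1` pair comes out of the join; the `o2` shipment at 5 hours fails the window predicate and produces no row, exactly as a non-matching row would in a regular SQL join.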

Hi, Danica. Thank you for your help. I think you are absolutely right.

But now I don’t think I asked my question correctly.

Let me try to redo this and create a scenario:

Imagine the same 4-hour windowed join and the same order and shipment topics; everything is working fine and the world seems to be really nice.

But suddenly something happens and the system stops making shipments, or we have a very lazy employee who falls far behind on his duties.

How could I be prepared to detect, recognize and recover those possible missed shipments? Should I create a bigger windowed join just in case?
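
To make the scenario concrete, here is a rough sketch in plain Python of the kind of check being asked about: listing the orders whose 4-hour window has already closed without a matching shipment. It is only an illustration, not Kafka Streams or ksqlDB code; the function name, the `now` parameter, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=4)  # same window as the join


def orders_missing_shipment(orders, shipments, now, window=WINDOW):
    """Return ids of orders whose window has closed with no shipment inside it."""
    missing = []
    for order_id, order_ts in orders:
        if now - order_ts <= window:
            continue  # window still open, too early to call the shipment missed
        shipped_in_window = any(
            ship_order_id == order_id and abs(ship_ts - order_ts) <= window
            for ship_order_id, ship_ts in shipments
        )
        if not shipped_in_window:
            missing.append(order_id)
    return missing


t0 = datetime(2024, 1, 1, 8, 0)  # hypothetical order time
orders = [("o1", t0), ("o2", t0), ("o3", t0 + timedelta(hours=3))]
shipments = [
    ("o1", t0 + timedelta(hours=2)),  # shipped in time
    ("o2", t0 + timedelta(hours=5)),  # shipped, but after the window closed
]

missing = orders_missing_shipment(orders, shipments, now=t0 + timedelta(hours=6))
print(missing)
```

In this sample, `o2` is the order the windowed join would silently drop, while `o3` is not reported yet because its window is still open at the chosen `now`.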

I think that is an important scenario to think about; otherwise the topology would not work as expected and we would never get the whole job done.

Thanks in advance.