I’m trying to do something like this from traditional SQL:
SELECT *
FROM topic1
WHERE NOT EXISTS (
SELECT 1
FROM topic2
WHERE topic2.key1 = topic1.key1
AND topic2.key2 = topic1.key2
)
but with an “after the time window” kind of logic.
To give a better perspective, consider these records coming into the two topics (the topic is the first column).
The goal of this query is to identify “dangling mismatching ends” per address. So in this case, for example, we want to identify that topic1 is missing the matching value for addressX (after some time period, of course).
I tried doing this with a LEFT JOIN as well as a FULL OUTER JOIN, but these don’t seem to wait for the other topic to get its data within the defined window. As soon as one side has something, the join rows with nulls for the other side are output, so there’s no way to identify a dangling record.
Hmm, we’re on 0.21 right now, but even so this is confusing to me. If the default grace period is 24h, shouldn’t the LEFT/OUTER joins hold on until that time? Did the actual logic change here?
UPDATED: ah, never mind, you’re right! I hadn’t read as far as the section where the logic change is explained. Thank you, this would fix it, but we’ll need to update to 0.23 then.
We tried updating to the latest image released on Docker Hub, but the GRACE keyword is rejected on joins with this error: mismatched input 'GRACE' expecting 'ON'
I noticed, though, that the server version reported when using this Docker image is 0.23.1-rc9, and the image was posted on Dec 15th 2021, almost a month before the announcement of v0.23.1. Is this a mis-release?
The query is now not eager, but it never returns the null cases.
E.g. if I have a single row in topic1 and nothing in topic2 the left join never returns anything. It returns matching records if there are any. I did WITHIN 10 SECONDS GRACE PERIOD 10 SECONDS and waited over a minute to be sure.
Is there something I’m missing about how this is supposed to behave? My understanding is that once the grace period runs out for a “buffered” left-side record, the join outputs it with nulls for the right-side values (in the case of the left join).
The observed behavior is as expected. Note that time is tracked based on event-time, not wall-clock time. Left/Right join results are only emitted once stream-time (the max observed event-time) has passed the window close time (i.e., window-end time plus grace period). If you stop sending data, stream-time cannot advance and thus you won’t observe output.
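To make the stream-time point concrete, here is a rough Python simulation of that rule. It is only a toy model of the semantics described above, not how ksqlDB or Kafka Streams is actually implemented: the class name LeftJoinSim, the record layout, and the addressX/addressY/addressZ keys are all made up for illustration, timestamps are plain integers standing in for event-time seconds, and dropping of late records is not modeled.

```python
class LeftJoinSim:
    """Toy model of a windowed left join driven by event-time (hypothetical, for illustration)."""

    def __init__(self, within, grace):
        self.within = within                  # join window size, in event-time seconds
        self.grace = grace                    # grace period, in event-time seconds
        self.stream_time = float("-inf")      # max event-time observed so far
        self.left = []                        # buffered left records: [key, ts, matched]
        self.right = []                       # right records seen so far: (key, ts)

    def process(self, side, key, ts):
        """Feed one record; return the join results it triggers as (key, left_ts, right_ts)."""
        out = []
        # Stream-time only moves when a record arrives, never with the wall clock.
        self.stream_time = max(self.stream_time, ts)

        if side == "left":
            entry = [key, ts, False]
            self.left.append(entry)
            for rkey, rts in self.right:
                if rkey == key and abs(rts - ts) <= self.within:
                    entry[2] = True
                    out.append((key, ts, rts))
        else:
            self.right.append((key, ts))
            for entry in self.left:
                if entry[0] == key and abs(entry[1] - ts) <= self.within:
                    entry[2] = True
                    out.append((key, entry[1], ts))

        # Emit the null-padded row for an unmatched left record only after
        # stream-time has passed its window close (ts + within + grace).
        still_open = []
        for entry in self.left:
            lkey, lts, matched = entry
            if self.stream_time > lts + self.within + self.grace:
                if not matched:
                    out.append((lkey, lts, None))
            else:
                still_open.append(entry)
        self.left = still_open
        return out


# WITHIN 10 SECONDS GRACE PERIOD 10 SECONDS, as in the question.
sim = LeftJoinSim(within=10, grace=10)
print(sim.process("left", "addressX", 100))   # [] : nothing to join yet, window still open
print(sim.process("left", "addressY", 105))   # []
print(sim.process("right", "addressY", 108))  # [('addressY', 105, 108)]
# No matter how long you wait here, addressX stays buffered.
# A later record (any key) pushes stream-time past 100 + 10 + 10 = 120:
print(sim.process("left", "addressZ", 121))   # [('addressX', 100, None)]
```

In other words, the dangling addressX row only shows up once some newer record arrives and advances stream-time, which matches what you saw when you had a single row in topic1 and then stopped sending data.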