You could use a tumbling window with EMIT FINAL
to accomplish this. E.g., if you have a stream with deduplication key column id
, and another column that might vary called val
, then you can get the last val
per id
over 2-second windows like this:
SELECT id,
LATEST_BY_OFFSET(val),
WINDOWSTART,
WINDOWEND
FROM test
WINDOW TUMBLING (SIZE 2 SECONDS, GRACE PERIOD 0 SECOND)
GROUP BY id
EMIT FINAL;
Note that data has to keep flowing in order for windows to close (see here).
Flink SQL’s window deduplication is also a good fit for this. By ordering by timestamp DESC
and returning the first row, you get the last row per window, e.g.:
SELECT id, val FROM (
SELECT id,
val,
window_start,
window_end,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rownum
FROM TABLE(TUMBLE(TABLE test, DESCRIPTOR(ts), INTERVAL '2' SECONDS))
) WHERE rownum = 1;