Sliding Windows vs. Hopping Windows

In the 2.7.0 release, KIP-450 provided a new windowing strategy called Sliding Windows. With Sliding Windows we can now perform windowed aggregations over small increments of time in a more efficient manner than with Hopping Windows. Now that we have the option between these strategies, when would we choose one over the other?

From my point of view, the core difference is the “triggering” behavior of each window.

For hopping windows, the window is re-evaluated at fixed time intervals, independent of the actual content of the data stream (i.e., independent of the records). You could use a hopping window if you need periodic results, for example a daily business report over the last seven days, or an hourly update over the last 24 hours. Even if no new records are processed, you want a result sent downstream at fixed time intervals.
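
To make the "fixed intervals" point concrete, here is a minimal sketch in plain Python (not the Kafka Streams API) of how a record timestamp maps onto hopping windows. It assumes windows are aligned to the epoch and are start-inclusive, end-exclusive; the function name `hopping_windows_for` and its parameters are made up for illustration.

```python
def hopping_windows_for(timestamp_ms, size_ms, advance_ms):
    """Return the [start, end) bounds of every hopping window containing the timestamp.

    Assumption: window starts are multiples of advance_ms (aligned to the epoch),
    so the boundaries never depend on the records themselves.
    """
    # Earliest aligned start whose window still covers the timestamp.
    start = (max(0, timestamp_ms - size_ms + advance_ms) // advance_ms) * advance_ms
    windows = []
    while start <= timestamp_ms:
        windows.append((start, start + size_ms))
        start += advance_ms
    return windows


# size=5, advance=2: a record at t=7 falls into the windows starting at 4 and 6.
print(hopping_windows_for(7, 5, 2))  # [(4, 9), (6, 11)]
```

The window bounds are fully determined by the two parameters; the record only selects which of the predefined windows it lands in.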

On the other hand, a sliding window is re-evaluated only if the content of the window changes, i.e., each time a record enters or leaves the window. This type of window is a good fit for a “moving average” computation, for example. As long as no new records arrive, the result (the current average) does not change, and thus you don’t want to get the same result sent downstream over and over again. Only if a record enters or leaves the window and the average changes do you want to get an update. Alerting on a threshold may be a good use case: it’s only useful to re-evaluate the threshold if the result changed; there is no advantage in evaluating the same result at fixed time intervals.

In other words, the triggering behavior of sliding windows is based on stream time where the triggering behavior of hopping windows is based on wall clock time.

Also, sliding windows are inclusive on both start and end time. Hopping windows are only inclusive on start time.
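
A tiny sketch of that boundary difference, in plain Python rather than Kafka Streams code. The two helper names are hypothetical; `size_ms` and `time_difference_ms` stand in for the hopping window size and the sliding window time difference.

```python
def in_hopping_window(ts_ms, start_ms, size_ms):
    # Start inclusive, end exclusive: [start, start + size)
    return start_ms <= ts_ms < start_ms + size_ms


def in_sliding_window(ts_ms, start_ms, time_difference_ms):
    # Inclusive on both ends: [start, start + time_difference]
    return start_ms <= ts_ms <= start_ms + time_difference_ms


# A record exactly on the upper bound is only counted by the sliding window.
print(in_hopping_window(10, 5, 5))  # False
print(in_sliding_window(10, 5, 5))  # True
```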

In other words, the triggering behavior of sliding windows is based on stream time where the triggering behavior of hopping windows is based on wall clock time.

Not really. The triggering behavior of hopping/tumbling windows is based on stream-time, while the triggering behavior of sliding windows is data-dependent (similar to session windows). Wall-clock time plays no role: if no new records arrive, no new sliding windows are created (the window only moves into the future when new records arrive).

I see! All windowing strategies are based on stream-time (the notion of time advancing based on events). Whether a window is opened or closed when new events occur depends on the given strategy and parameters. For sliding windows, a new window may be created to contain the event (if it doesn’t exist yet), and a subsequent window is created 1 ms after the event’s timestamp. This makes a sliding window’s start time dependent on the event’s timestamp. This kind of reliance on the event’s timestamp to determine the start/end of the window doesn’t apply to hopping windows. With hopping windows, the parameters (i.e., duration and advanceBy) primarily control whether a window is opened or closed when events occur.
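
Here is a rough batch simulation of that behavior in plain Python (again, not the Kafka Streams API, which works incrementally). It assumes, as I understand the semantics, that the window starting 1 ms after a record is only materialized if some later record actually falls into it, and that every timestamp is at least as large as the time difference so no window starts before 0. The function name `sliding_windows` is made up.

```python
def sliding_windows(timestamps_ms, time_difference_ms):
    """Map (start, end) -> record timestamps for every non-empty sliding window.

    Assumption: each record defines a window ending at its own timestamp and a
    candidate window starting 1 ms after it; empty candidates are dropped.
    """
    ts = sorted(timestamps_ms)
    windows = {}
    for t in ts:
        for start in (t - time_difference_ms, t + 1):
            end = start + time_difference_ms
            members = [x for x in ts if start <= x <= end]  # inclusive on both ends
            if members:
                windows[(start, end)] = members
    return dict(sorted(windows.items()))


for bounds, members in sliding_windows([8000, 9200, 12400], 5000).items():
    print(bounds, members)
```

With records at 8000, 9200 and 12400 and a time difference of 5000 ms, this yields five windows: three ending at a record timestamp, plus [8001, 13001] and [9201, 14201]. The candidate window starting at 12401 is empty and therefore never appears.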

From the great podcast conversation between Leah Thomas and @tlberglund: one of the benefits of using sliding windows over tiny hopping windows is that unnecessary windows are not created/stored in bursty scenarios.
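
A back-of-the-envelope sketch of that storage difference, in plain Python. It counts distinct windows touched by a small burst of records, assuming hopping windows with a 1 ms advance and the sliding-window rule that a window starting 1 ms after a record only exists if another record falls into it. Both function names are hypothetical, and the numbers are an illustration rather than a measurement of Kafka Streams.

```python
def count_hopping_windows(timestamps_ms, size_ms, advance_ms):
    """Count distinct epoch-aligned hopping windows containing at least one record."""
    starts = set()
    for t in timestamps_ms:
        start = (max(0, t - size_ms + advance_ms) // advance_ms) * advance_ms
        while start <= t:
            starts.add(start)
            start += advance_ms
    return len(starts)


def count_sliding_windows(timestamps_ms, time_difference_ms):
    """Count distinct non-empty sliding windows (assumes timestamps >= time difference)."""
    ts = sorted(timestamps_ms)
    starts = set()
    for t in ts:
        starts.add(t - time_difference_ms)  # window ending at the record itself
        if any(t + 1 <= x <= t + 1 + time_difference_ms for x in ts):
            starts.add(t + 1)  # only kept when a later record falls into it
    return len(starts)


burst = [8000, 8001, 8002]
print(count_hopping_windows(burst, 5000, 1))  # 5002
print(count_sliding_windows(burst, 5000))     # 5
```

Three records that are 1 ms apart touch about five thousand 1 ms hopping windows but only five sliding windows, which is the kind of saving the podcast point refers to.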
