Tumbling window and suppress

I have a setup where, as each message is consumed from the source topic, a tumbling window aggregates the messages into a list.

My intention is to group all incoming messages within a window and process them forward at once.

The tumbling window pushes forward the updated list for each incoming record, so we added suppress to get one event per window.

Because of this we see the following behaviour: the suppressed window only closes once a dummy event arrives with a stream time after the window's closing time, and only then are the buffered messages forwarded. Otherwise the window never closes and we lose the messages unless we send a dummy message.
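
A minimal sketch of the topology in question (topic names, window size, grace period, and the list serde are placeholders, not the actual setup):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class TumblingSuppressTopology {

    public static StreamsBuilder build(Serde<List<String>> listSerde) {
        StreamsBuilder builder = new StreamsBuilder();

        builder.<String, String>stream("source-topic")
            .groupByKey()
            // non-overlapping 5-minute tumbling windows with a 30-second grace period
            .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(5), Duration.ofSeconds(30)))
            // collect every record of a window into a list
            .aggregate(
                ArrayList::new,
                (key, value, list) -> { list.add(value); return list; },
                Materialized.with(Serdes.String(), listSerde))
            // emit one final result per window, only after the window has closed
            .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
            .toStream()
            // unwrap the windowed key before writing the final list downstream
            .map((windowedKey, list) -> KeyValue.pair(windowedKey.key(), list))
            .to("final-output-topic", Produced.with(Serdes.String(), listSerde));

        return builder;
    }
}
```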

I looked at sliding windows as well, but they don't give the same effect as tumbling windows of reducing the number of final updates.

Is my understanding/observation correct, and if yes, what can I do to get the desired behavior?

Your observation is correct.

In general, the assumption is that data never stops, so this is usually not a problem. The problem only occurs if there is no new input data. Thus, the question is: why would data stop for your use case? What are you doing exactly?

If your question is about (unit) testing with TopologyTestDriver, sending a dummy event to advance time is just fine.
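
For example, something like the sketch below (the topology, props, topic name, and timestamps are placeholders): piping a dummy record with a timestamp past the window end plus grace flushes the suppression buffer.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Properties;

import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;

public class SuppressWindowTest {

    // 'topology' and 'props' are assumed to be built by the surrounding test setup
    static void closeWindowWithDummyEvent(Topology topology, Properties props) {
        try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
            TestInputTopic<String, String> input =
                driver.createInputTopic("source-topic", new StringSerializer(), new StringSerializer());

            Instant start = Instant.parse("2024-01-01T00:00:00Z");
            input.pipeInput("key-1", "a", start);
            input.pipeInput("key-1", "b", start.plus(Duration.ofMinutes(1)));
            // nothing emitted yet: suppress() is still holding the open window

            // a dummy record with a timestamp past window end + grace advances
            // stream time and flushes the suppressed result for the closed window
            input.pipeInput("dummy", "dummy", start.plus(Duration.ofMinutes(6)));
        }
    }
}
```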

I looked at sliding windows as well, but they don't give the same effect as tumbling windows of reducing the number of final updates.

That's expected, as sliding windows have totally different semantics…

My intention is to group all incoming messages within a window and process them forward at once.

It seems you want to put every message into a single window only. For this case, tumbling windows, which do not overlap, are the right choice. Sliding windows are overlapping and thus don't seem to fit your requirement.


So we would have a burst of data, but the messages arrive at their own pace. I want to group the ones which arrive within a window so I can process them together and publish one final message rather than multiple final messages.

Data would stop if there are no more messages for that key.

Are you saying windowing is not suitable if the data is published sparsely, or if the publishing stops at any time?

So we would have a burst of data, but the messages arrive at their own pace. I want to group the ones which arrive within a window so I can process them together and publish one final message rather than multiple final messages.

Sounds a little bit like session-windows?
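
A rough sketch of that idea (inactivity gap, grace period, and serde are placeholders; the same stream-time caveat for suppress() still applies):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.SessionWindows;
import org.apache.kafka.streams.kstream.Suppressed;

public class SessionWindowSketch {

    public static StreamsBuilder build(Serde<List<String>> listSerde) {
        StreamsBuilder builder = new StreamsBuilder();

        builder.<String, String>stream("source-topic")
            .groupByKey()
            // one session per key and burst, closed after 2 minutes of inactivity
            .windowedBy(SessionWindows.ofInactivityGapAndGrace(Duration.ofMinutes(2), Duration.ofSeconds(30)))
            .aggregate(
                ArrayList::new,
                (key, value, list) -> { list.add(value); return list; },
                // merger: combines two sessions of the same key when they overlap
                (key, left, right) -> { left.addAll(right); return left; },
                Materialized.with(Serdes.String(), listSerde))
            // still one final result per (session) window, after it closes
            .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
            .toStream();

        return builder;
    }
}
```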

Data would stop if there are no more messages for that key.

Per-key is not an issue. Time is advanced based on data in a partition. Thus, as long as there is data flowing, no matter which keys, time does advance. But when you say “burst of data” above, it seems data just stops, and it’s not that just one key “disappears.”

It’s a known problem in Kafka Streams, and we hope to actually fix this in some future release (of course, this does not help you now, and it’s also unclear when we would be able to pick it up).

Are you saying windowing is not suitable if the data is published sparsely, or if the publishing stops at any time?

Depends on your detailed requirements. How long are these pauses? Can you tolerate that the last open windows are only closed when the next burst arrives, etc.?

If the pauses are too long and you cannot tolerate the delay, you can fall back to a custom solution using a Processor with a state store and wall-clock-time punctuation.
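
A rough sketch of that fallback (store name, flush interval, and types are illustrative, not from this thread): buffer records per key in a key-value store and forward them from a wall-clock punctuation, so results get emitted even when no further input arrives.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class BufferAndFlushProcessor implements Processor<String, String, String, List<String>> {

    private ProcessorContext<String, List<String>> context;
    private KeyValueStore<String, List<String>> buffer;

    @Override
    public void init(ProcessorContext<String, List<String>> context) {
        this.context = context;
        // "buffer-store" must be registered on the topology separately
        this.buffer = context.getStateStore("buffer-store");
        // flush based on wall-clock time, independent of whether new records arrive
        context.schedule(Duration.ofMinutes(5), PunctuationType.WALL_CLOCK_TIME, this::flush);
    }

    @Override
    public void process(Record<String, String> record) {
        // append each incoming value to the per-key buffer
        List<String> list = buffer.get(record.key());
        if (list == null) {
            list = new ArrayList<>();
        }
        list.add(record.value());
        buffer.put(record.key(), list);
    }

    private void flush(long timestamp) {
        // forward all buffered lists downstream and clear the store
        List<String> flushedKeys = new ArrayList<>();
        try (KeyValueIterator<String, List<String>> it = buffer.all()) {
            while (it.hasNext()) {
                KeyValue<String, List<String>> entry = it.next();
                context.forward(new Record<>(entry.key, entry.value, timestamp));
                flushedKeys.add(entry.key);
            }
        }
        flushedKeys.forEach(buffer::delete);
    }
}
```

The store would need to be registered on the topology (e.g. via StreamsBuilder#addStateStore) and the processor attached with KStream#process, naming the store so it gets connected.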