I have a setup where, as messages are consumed from the source topic, a tumbling window aggregates them into a list.
My intention is to group all incoming messages within a window and process them forward at once.
The tumbling window emits the updated list for each incoming record, so we added suppress() to get one event per window.
Because of this, we see the behaviour where a dummy event with a stream time after the window closing time is needed to actually close the suppressed window and forward those messages. Otherwise the window never closes and we lose the messages unless we send a dummy message.
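For reference, here is a minimal sketch of the kind of topology I mean (topic names, serdes, and the one-minute window size are just placeholders, not our actual setup):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class WindowedBatchingTopology {

  static StreamsBuilder buildTopology() {
    StreamsBuilder builder = new StreamsBuilder();

    // serde for the aggregated list of values
    Serde<List<String>> listSerde = Serdes.ListSerde(ArrayList.class, Serdes.String());

    builder.stream("source-topic", Consumed.with(Serdes.String(), Serdes.String()))
        .groupByKey()
        // tumbling window: every record falls into exactly one one-minute bucket
        .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
        .aggregate(
            ArrayList::new,
            (key, value, list) -> { list.add(value); return list; },
            Materialized.with(Serdes.String(), listSerde))
        // forward only one final result per window; nothing is emitted until
        // stream time advances past the window end
        .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
        .toStream()
        .map((windowedKey, list) -> KeyValue.pair(windowedKey.key(), list))
        .to("batched-topic", Produced.with(Serdes.String(), listSerde));

    return builder;
  }
}
```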
We looked at sliding windows as well, but they don't give the same effect as tumbling windows of reducing everything to one final update.
Is my understanding/observation correct? If yes, what can I do to get the desired behavior?
In general, the assumption is that data never stops, so usually it's not a problem. The problem only occurs if there is no new input data. Thus, the question is: why would data stop for your use case? What are you doing exactly?
If your question is about (unit) testing with TopologyTestDriver, just sending a dummy event to advance time is fine.
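For example, something along these lines (a sketch only; it assumes a topology like the hypothetical buildTopology() above, with a one-minute tumbling window on "source-topic"):

```java
import java.time.Instant;
import java.util.Properties;

import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TopologyTestDriver;

public class DummyRecordTest {

  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "window-close-test");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");

    try (TopologyTestDriver driver =
             new TopologyTestDriver(WindowedBatchingTopology.buildTopology().build(), props)) {

      TestInputTopic<String, String> input = driver.createInputTopic(
          "source-topic", new StringSerializer(), new StringSerializer());

      Instant start = Instant.parse("2024-01-01T00:00:00Z");
      input.pipeInput("key-1", "a", start);
      input.pipeInput("key-1", "b", start.plusSeconds(30));

      // nothing has been forwarded yet: the suppressed window only closes once
      // stream time passes the window end, so advance it with one dummy record
      input.pipeInput("any-key", "dummy", start.plusSeconds(61));
    }
  }
}
```

The third record is the "dummy event": its timestamp pushes stream time past the window end, which lets suppress() emit the final result.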
We looked at sliding windows as well, but they don't give the same effect as tumbling windows of reducing everything to one final update.
Sliding windows have totally different semantics…
My intention is to group all incoming messages within a window and process them forward at once.
It seems you want to put every message into a single window only. For this case, tumbling windows (which do not overlap) are the right choice. Sliding windows are overlapping and thus don't seem to fit your requirement.
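For comparison, the two window definitions (sizes are just examples):

```java
// tumbling: fixed, non-overlapping one-minute buckets; each record belongs to exactly one window
TimeWindows tumbling = TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1));

// sliding: windows are defined relative to record timestamps and overlap,
// so a single record can contribute to more than one window
SlidingWindows sliding = SlidingWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(1));
```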
So we would have a burst of data, but the messages arrive at their own time. I want to group the ones which arrive within a window so I can process them together and publish one final message rather than multiple final messages.
Data would stop if there are no more messages for that key.
Are you saying windowing is not suitable if the data is published sparsely or if publishing stops at any time?
So we would have a burst of data, but the messages arrive at their own time. I want to group the ones which arrive within a window so I can process them together and publish one final message rather than multiple final messages.
Sounds a little bit like session-windows?
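A rough sketch of that option (builder and listSerde as in your earlier sketch; the five-minute inactivity gap is a placeholder):

```java
builder.stream("source-topic", Consumed.with(Serdes.String(), Serdes.String()))
    .groupByKey()
    // a session closes once no record arrives for the key within the gap
    .windowedBy(SessionWindows.ofInactivityGapWithNoGrace(Duration.ofMinutes(5)))
    .aggregate(
        ArrayList::new,
        (key, value, list) -> { list.add(value); return list; },
        (key, left, right) -> { left.addAll(right); return left; },  // merges two sessions
        Materialized.with(Serdes.String(), listSerde))
    .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
    .toStream();
```

Note that with suppress(), a session still only closes once stream time advances past the session end plus the inactivity gap, so the "data stops" issue applies here as well.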
Data would stop if there are no more messages for that key.
Per-key is not an issue. Time is advanced based on data in a partition. Thus, as long as there is data flowing, no matter what keys, time does advance. – But when you say “burst of data” above, it seems data just stops, and it’s not just that a key “disappears.”
It’s a known problem in Kafka Streams, and we hope to actually fix it in some future release (of course, this does not help you now, and it’s also unclear when we would be able to pick it up).
Are you saying windowing is not suitable if the data is published sparsely or if publishing stops at any time?
Depends on your detailed requirements. How long are these pauses? Can you tolerate that the last open windows are only closed when the next burst arrives, etc.?
If the pauses are too long and you cannot tolerate the delay, you can fall back to a custom solution using the Processor API, with a state store and wall-clock-time punctuations.
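A rough sketch of that fallback (class name, store name, the 60-second flush interval, and the String types are assumptions for illustration):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

// Buffers records per key in a state store and flushes all buffered batches on a
// wall-clock punctuation, so results are emitted even if no further input arrives.
public class WallClockBatcher implements Processor<String, String, String, List<String>> {

  private ProcessorContext<String, List<String>> context;
  private KeyValueStore<String, List<String>> buffer;

  @Override
  public void init(final ProcessorContext<String, List<String>> context) {
    this.context = context;
    this.buffer = context.getStateStore("batch-buffer");  // store name is a placeholder
    // wall-clock punctuation fires regardless of whether new records arrive
    context.schedule(Duration.ofSeconds(60), PunctuationType.WALL_CLOCK_TIME, this::flush);
  }

  @Override
  public void process(final Record<String, String> record) {
    List<String> batch = buffer.get(record.key());
    if (batch == null) {
      batch = new ArrayList<>();
    }
    batch.add(record.value());
    buffer.put(record.key(), batch);
  }

  private void flush(final long wallClockTime) {
    // collect first, then forward and delete, to avoid mutating the store while iterating
    final List<KeyValue<String, List<String>>> batches = new ArrayList<>();
    try (KeyValueIterator<String, List<String>> it = buffer.all()) {
      while (it.hasNext()) {
        batches.add(it.next());
      }
    }
    for (final KeyValue<String, List<String>> entry : batches) {
      context.forward(new Record<>(entry.key, entry.value, wallClockTime));
      buffer.delete(entry.key);
    }
  }
}
```

You would register the store on the topology (e.g. via StreamsBuilder#addStateStore with a list serde) and connect it when calling process(WallClockBatcher::new, "batch-buffer").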
That’s a fair assumption in theory, but in practice, it’s not always true. There are many real-world cases — like overnight lulls or off-peak hours — where data legitimately pauses. And in those moments, you still want accurate, timely aggregation. Imagine the last user of the night performs some critical or even malicious action. If no new data arrives, that window may not flush until the next day. That’s not just a technical quirk — that’s a visibility and risk issue.
Also, the fact that so many developers resort to injecting dummy records just to force windows to close should be seen as a red flag. If an entire community has to implement hacks to compensate for internal stream clock behavior, that’s not a fringe case — it’s a usability gap.
Maybe it’s time to re-express this assumption, or at the very least, provide a more elegant mechanism to handle it.
There are many real-world cases — like overnight lulls or off-peak hours — where data legitimately pauses. And in those moments, you still want accurate, timely aggregation.
and
Also, the fact that so many developers resort to injecting dummy records just to force windows to close should be seen as a red flag.
and
Maybe it’s time to re-express this assumption, or at the very least, provide a more elegant mechanism to handle it.
Totally agree with all of it. It’s a known issue.
It would be helpful to me to know the reason for that, though.
There is no reason. It was implemented this way many years ago (as you cannot do everything at once, we just happened to start there). While it did become apparent over the years that we should improve it, it’s just that nobody has done it yet. Again, a matter of prioritization: other issues just seemed to be more important. We cannot fix everything at once.
On a positive note: it’s actually on our “short list” of things we want to fix. But even our “short list” is actually pretty long… I doubt that Confluent would have capacity to contribute this any time soon – the AK 4.3 release is my best guess for the earliest… We are currently expanding our eng team for Kafka Streams – so maybe 4.3 could work out. We will see.
Of course, it’s an open source project, so anybody could pick it up and help to get it done sooner. So far, nobody from the community has stepped up though, even if we get a lot of contributions. So maybe it’s also not the most important thing to fix for many others.
Last point: this conversation just bumped the priority again.
Honestly, I’m only venting because I’m still pretty new to Kafka Streams, and I think it’s awesome (I’d call it fun, but I’m not that much of a nerd).
The project I’m building has been coming together really well overall, except for this one small hiccup. There are workarounds, for sure…
It was looking so clean—I just didn’t want to mess with that.
None of this applies to the testing side though, hahah.