In my case I’m afraid that topology processing will be slower than the incoming event rate. In my test that slowness is emulated with a sleep.
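For concreteness, the slow step could be emulated roughly like this. This is a hypothetical sketch: `SlowStep`, `processSell`, and the result format are my own names; only the sleep-based emulation comes from the message above.

```java
// Hypothetical stand-in for the slow per-record work inside the topology;
// the processing cost is emulated with Thread.sleep.
public class SlowStep {
    public static String processSell(String sellEvent) throws InterruptedException {
        Thread.sleep(100);               // pretend each record takes ~100 ms to handle
        return "processed:" + sellEvent; // hypothetical downstream result
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processSell("sell1")); // prints "processed:sell1" after ~100 ms
    }
}
```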
Why do you think so? What is the expected throughput? Kafka Streams scales very well, and can process hundreds of thousands of messages per second…
I believe that Kafka Streams is efficient, but I’m afraid that my own code, which is part of the topology, will be slow. Not very slow, but slow enough to lag behind the sell stream.
I want to process (sell1), then (sell2 | sell3).
Your example is still rather high level. Why process sell-1 in the first group and sell-2+sell-3 in the second group, instead of sell-1+sell-2 in the first group and sell-3 in the second, or even three groups of one? What is the actual criterion for splitting into groups?
In my head I have an analogy to Java’s BlockingQueue. You may call poll() and process events one by one, or you may call drainTo() and get all waiting events at once.
I know Kafka Streams has caching and can poll a few events from the topic in advance and merge them into a single update for a KTable.
Going back to the BlockingQueue: if I use drainTo() and the incoming events are [sell1, sell2, sell3, sell4, sell5], then depending on timing I could receive [sell1] [sell2, sell3] [sell4, sell5]. Since Kafka polls the topic in advance, this seems theoretically possible.
But I suspect this is not supported, and I must accept that my topology will process [sell1] [sell2] [sell3] [sell4] [sell5] one by one.
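The poll()/drainTo() contrast described above can be shown with a small runnable example (plain java.util.concurrent, no Kafka involved; the sell event names just mirror the discussion):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainDemo {
    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // Five sell events arrive before the consumer catches up.
        queue.add("sell1");
        queue.add("sell2");
        queue.add("sell3");
        queue.add("sell4");
        queue.add("sell5");

        // poll() hands over events one at a time ...
        String first = queue.poll();
        System.out.println("poll: " + first);     // poll: sell1

        // ... while drainTo() grabs everything already waiting, in one call.
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        System.out.println("drainTo: " + batch);  // drainTo: [sell2, sell3, sell4, sell5]
    }
}
```

The batch size is whatever happens to be queued at the moment of the call, which is exactly the variable grouping ([sell1] [sell2, sell3] [sell4, sell5]) described above.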
I think a “window” is not my case, because I want to process incoming events immediately.
Records are processed immediately. Also, if you say you want to process sell2+sell3 in one group, you actually don’t process sell2 right away, but you wait for sell3 to arrive, too…
I don’t want to wait for sell3. But if Kafka Streams could look ahead in the partition and see that sell3 is already there, I could process both sells at once.
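The desired semantics (never wait for a future event, but greedily pick up anything already available) can be sketched in plain Java, with a BlockingQueue standing in for the record source. This is a hypothetical helper, not Kafka Streams API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class LookAheadBatcher {
    private final BlockingQueue<String> incoming = new LinkedBlockingQueue<>();

    public void offer(String event) {
        incoming.add(event);
    }

    /**
     * Blocks only until the first event is available, then greedily drains
     * whatever else has already arrived. There is no artificial wait for
     * future events, so the batch size varies with the backlog.
     */
    public List<String> nextBatch() throws InterruptedException {
        List<String> batch = new ArrayList<>();
        batch.add(incoming.take()); // wait for the first event only
        incoming.drainTo(batch);    // add anything else already waiting
        return batch;
    }
}
```

With this shape, if sell2 and sell3 arrive while sell1 is being processed, the next call returns [sell2, sell3] in a single batch; if nothing is queued, it returns a batch of one.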
transform() might still be the best way forward. Let us know how it goes.
I tried, but it does not look promising. Even if it were possible, it would require reimplementing the caching that Kafka already has.