TLDR; How did you implement a delay/retry queue in Kafka?
Has anyone solved the problem of implementing retry/delay functionality in Kafka?
Originally I looked at Ubers example ( Uber retry topic ) but I do not want to create n topics and consumers as the overhead would be too much. I essentially want 1 topic to act as a retry queue and 1 topic to be a dead letter queue. I also would like to keep away from using a database.
This is what I designed but I quickly ran into some issues.
To begin with, I created the flow above and had the retry consumer wait a calculated amount of time based on the number of attempts (exponential backoff). If the retry failed it would produce the event back onto the topic. The main reason this was flawed was because when the delay was observed, it blocked the Consumer, this would also block consumption of other events, increasing the observed delay, essentially meaning events would be blocked for much longer than the expected time. (i.e event 1 is forced to wait 16 seconds but event 2, is forced to wait for the processing of event 1 event though it’s only meant to wait 2 seconds. As you can tell, this gets nasty with multiple queued events.)
Window processing in Streams.
The next option I thought of was creating a stream that processes events in a time window, for example, when the consumer puts the event on the topic it will calculate the time it needs to be processed. The stream then calculates what events need to be processed in the window and processes them.
I have not yet finished implementing the window processing in the stream as it feels like an overly complicated way to do something simple.
Has anyone dealt with this in the past? If so, how did you go about it? what technologies did you use? and what did you learn when implementing?
Thanks for taking the time to read and I look forward to reading your comments!