Wondering if anyone has spent time on persisting the messages in the producer buffer to guarantee no data loss. Is there any recommendation for storing each message in a KV store and clearing it once we get a callback from the broker? The record metadata does not have a GUID that we could set during send() and verify in the callback.
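A minimal sketch of the idea, assuming a dict as a stand-in for the KV store and a hypothetical `FakeProducer` in place of a real client. The point it shows: the callback is registered per record (Java's `KafkaProducer.send(record, callback)` and confluent-kafka's `produce(..., on_delivery=...)` both work this way), so the callback can close over an ID you generate yourself; the metadata does not need to carry one.

```python
import uuid
from typing import Callable, Dict, List, Optional, Tuple


class FakeProducer:
    """Hypothetical stand-in for an async producer: send() only buffers,
    and the per-record callbacks fire later when flush() "delivers"."""

    def __init__(self) -> None:
        self._buffer: List[Tuple[bytes, Callable[[Optional[Exception]], None]]] = []

    def send(self, value: bytes,
             on_delivery: Callable[[Optional[Exception]], None]) -> None:
        self._buffer.append((value, on_delivery))

    def flush(self, fail: bool = False) -> None:
        for _value, cb in self._buffer:
            cb(RuntimeError("delivery failed") if fail else None)
        self._buffer.clear()


# Hypothetical KV store; in practice this would have to be durable.
pending: Dict[str, bytes] = {}


def send_tracked(producer: FakeProducer, value: bytes) -> str:
    msg_id = str(uuid.uuid4())  # our own correlation ID
    pending[msg_id] = value     # persist BEFORE handing the record to the producer

    def on_delivery(err: Optional[Exception]) -> None:
        # The closure captures msg_id, so no GUID is needed in the metadata.
        if err is None:
            pending.pop(msg_id, None)  # acked by the broker: evict the entry
        # On error the entry stays, so it can be replayed later.

    producer.send(value, on_delivery)
    return msg_id
```

Note that the write to the store happens on every send, which is exactly the extra moving part the reply below warns about.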
IMO this approach is not going to be very popular, because it means dealing with too many problems when things go wrong. There are lots of variations in how records are produced in Kafka (batching and retries, for example) that can break whatever implementation you come up with. Keeping a cache in sync with the app is usually troublesome, especially if the KV store runs as a remote process.
I think a better design is to rely on the guarantees that Kafka provides (such as retries) and fail-fast if something odd occurs.
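As a rough illustration of "relying on Kafka's guarantees", these are standard producer settings (accepted by the Java client and by librdkafka/confluent-kafka). The values are illustrative, not tuned recommendations, and the broker address is a placeholder.

```python
# Producer settings that lean on Kafka's own delivery guarantees.
PRODUCER_CONFIG = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "acks": "all",                # wait for all in-sync replicas before acking
    "enable.idempotence": True,   # internal retries cannot write duplicates
    "delivery.timeout.ms": 120000,  # bound on send + retries; after this the
                                    # delivery callback reports an error (fail fast)
}
```

With confluent-kafka this dict would be passed as `confluent_kafka.Producer(PRODUCER_CONFIG)`; the delivery callback is then the single place where a failure surfaces.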
Thanks. How do we safeguard against a situation where the client has some messages buffered and the running pod crashes?
Since the records are batched by the producer API, how do we back up the messages, assuming they are received from a REST endpoint (in which case we would already have returned a 202 response to the caller)? The only solution I see is to save the transaction to the DB and use Kafka Connect.
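"Save to the DB and use Connect" is usually called the transactional outbox pattern. A minimal sketch using in-memory SQLite; the table names, `handle_request`, and `relay` are hypothetical, and `relay` stands in for what a CDC connector running on Kafka Connect would do.

```python
import json
import sqlite3
from typing import Callable


def init(db: sqlite3.Connection) -> None:
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    db.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
    )


def handle_request(db: sqlite3.Connection, item: str) -> int:
    # Business row and event row commit atomically: both or neither.
    with db:
        cur = db.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        db.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"order_id": cur.lastrowid, "item": item}),),
        )
    return 202  # safe to acknowledge: the event is durable in the DB


def relay(db: sqlite3.Connection, publish: Callable[[str], None]) -> int:
    # Publish unsent rows in order, marking each one only after publish()
    # returns. If publish() raises, the row stays unpublished for the next run.
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(payload)
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

A crash between `publish()` and the update re-sends that row on the next run, so this is at-least-once and consumers should tolerate duplicates.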
If the message was the result of a user clicking a button that in turn triggered the send, I would adopt the fail-fast pattern and ask the user (or whoever controls the sending of the message) to repeat the operation. If this is a non-user-driven transaction (likely a pipeline), another option is to let the upstream layer of the pipeline that controls the transaction manage the state of failed messages.
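A sketch of the fail-fast variant for the user-driven case: the handler waits for the broker's acknowledgement before answering, so nothing is promised that is still sitting in a buffer. It assumes a hypothetical `send` that returns a `concurrent.futures.Future`; in the Java client the analogue is blocking on the `Future<RecordMetadata>` returned by `send()`.

```python
from concurrent.futures import Future
from typing import Callable


def handle_click(send: Callable[[bytes], Future], payload: bytes,
                 timeout_s: float = 5.0) -> int:
    """Respond only after the broker has acknowledged the event."""
    fut = send(payload)
    try:
        # Blocks until the delivery result arrives; re-raises a delivery
        # error, or raises TimeoutError if no ack arrives in time.
        fut.result(timeout=timeout_s)
    except Exception:
        # Nothing was promised to the caller, so the user can simply retry.
        return 503
    return 202
```

The trade-off is latency: each request now pays for a broker round trip, and a smaller `linger.ms` keeps that wait short.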
Yes, it could be persisted, but I would also want to evict the entry once I receive the callback, to keep the store small. The Kafka RecordMetadata does not return any unique identifier that could be set during the send operation.
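Although RecordMetadata only reports where the record landed, records can carry headers (Java `ProducerRecord` since Kafka 0.11; `headers=` in confluent-kafka's `produce`). A sketch, assuming a hypothetical `msg-id` header: the producer generates the ID (which can also key the KV entry), and the consumer uses it to drop replayed duplicates. Records are modelled as plain `(value, headers)` tuples here.

```python
import uuid
from typing import Iterable, List, Set, Tuple

HEADER = "msg-id"  # hypothetical header name chosen for this sketch
Headers = List[Tuple[str, bytes]]
Record = Tuple[bytes, Headers]  # (value, headers)


def new_record(value: bytes) -> Tuple[str, Record]:
    """Generate our own ID, attach it as a header, and return both."""
    msg_id = str(uuid.uuid4())
    return msg_id, (value, [(HEADER, msg_id.encode())])


def dedupe(records: Iterable[Record], seen: Set[str]) -> List[bytes]:
    """Consumer side: drop records whose ID was already processed."""
    out: List[bytes] = []
    for value, headers in records:
        msg_id = dict(headers).get(HEADER, b"").decode()
        if msg_id and msg_id in seen:
            continue  # a replayed duplicate
        if msg_id:
            seen.add(msg_id)
        out.append(value)
    return out
```

In a real consumer the `seen` set would need to be durable (or bounded by time), otherwise deduplication resets on restart.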
This is a microservice emitting the events. In the fail-fast approach, the service calls producer.send() and assumes the event will be sent (the Kafka producer library should handle retries, as you mentioned earlier). It looks like the only options we have are to keep the batch size small and to call flush() during graceful shutdown so the buffer is drained. But on a hard kill, messages still in the buffer can be lost, so the client would need its own persistent store to replay those events.
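A sketch of the graceful-shutdown part, assuming a hypothetical `BufferingProducer` in place of a real client (confluent-kafka's `flush(timeout)` likewise returns how many messages are still queued; in Java a shutdown hook calling `producer.close()` plays the same role). Kubernetes sends SIGTERM and waits out the grace period before SIGKILL, so this covers orderly pod termination only; SIGKILL, OOM kills, and node loss cannot be caught, which is the gap the persistent store has to cover.

```python
import signal
import sys
from typing import Callable, List, Optional


class BufferingProducer:
    """Hypothetical stand-in for a batching producer (not a real client)."""

    def __init__(self) -> None:
        self.buffer: List[bytes] = []     # records not yet sent to the broker
        self.delivered: List[bytes] = []  # records the "broker" has acked

    def send(self, value: bytes) -> None:
        self.buffer.append(value)

    def flush(self, timeout: Optional[float] = None) -> int:
        # Deliver everything buffered; return how many records remain queued.
        self.delivered.extend(self.buffer)
        self.buffer.clear()
        return len(self.buffer)


def install_shutdown_flush(producer: BufferingProducer,
                           exit_fn: Callable[[int], None] = sys.exit) -> None:
    def on_sigterm(signum, frame) -> None:
        remaining = producer.flush(10.0)  # bounded wait for the buffer to drain
        exit_fn(0 if remaining == 0 else 1)

    signal.signal(signal.SIGTERM, on_sigterm)
```

A non-zero exit code signals that records were still queued when the flush timed out, which is a cue to replay from the store on the next start.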