Hello. I’m a beginner Kafka Streams developer.
I’m using Spring Boot with Kafka Streams.
When I check the topology, there is one input source node and two sink output nodes.
The topology looks roughly like this:
[hello topic] - A processing - A sink
[hello topic] - B processing - B sink
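In Kafka Streams DSL terms, I mean something roughly like the sketch below (the topic names and the transformations are just placeholders, not my real logic):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class HelloTopology {

    // One source topic fanned out to two sink topics.
    public static void build(StreamsBuilder builder) {
        KStream<String, String> source = builder.stream(
                "hello", Consumed.with(Serdes.String(), Serdes.String()));

        // "A processing" -> "A sink"
        source.mapValues(v -> v.toUpperCase())
              .to("a-sink", Produced.with(Serdes.String(), Serdes.String()));

        // "B processing" -> "B sink"
        source.mapValues(v -> v.toLowerCase())
              .to("b-sink", Produced.with(Serdes.String(), Serdes.String()));
    }
}
```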
I understand that the application.id property is used as the consumer group id,
which means the topology is executed by one consumer (if there is only one partition).
So I’m wondering about the situation where the application shuts down and
“A processing and its sink succeed, but B processing and its sink fail.”
In this situation, does Kafka Streams still behave well? (No duplicates? Does it redo correctly?)
Hi @choiwonpyo , welcome to the forum!
It is common to have multiple output topics in a Kafka Streams app, as in your example.
It is also possible that events are written to the “A sink” topic, but a failure / crash prevents the processing results for B from being written to the “B sink” topic. In this case, your application should be halted, the problem fixed, and the application restarted.
Since the consumer group for that application did not commit its offsets, it will reconsume the same batch of events and process them again. This can lead to duplicates: you may end up with duplicate results in your “A sink” output topic, and possibly in “B sink” as well.
No duplicates?
You could turn on exactly-once processing in the Kafka Streams configuration by setting processing.guarantee=exactly_once
or processing.guarantee=exactly_once_v2
(depending on your broker version).
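For example, with a plain Properties-based Streams configuration it could look something like the sketch below (the application id and bootstrap servers are placeholders; with Spring Boot you would set the same property through your Spring configuration instead):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class ExactlyOnceConfig {

    // Builds Kafka Streams properties with exactly-once processing enabled.
    public static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "hello-app");          // placeholder application.id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder brokers
        // exactly_once_v2 needs a recent broker/client version; use EXACTLY_ONCE on older setups.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }
}
```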
Here is a blog that explains it in more detail: Enabling Exactly-Once in Kafka Streams | Confluent
Note that exactly-once semantics may reduce your processing throughput.
Does it redo correctly?
You can also accept that duplicates may occur, and program your consumers of “A sink” and “B sink” topics to handle the data accordingly. In many cases (though not all), it is possible to write idempotent business logic that simply doesn’t care about duplicates.
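As one illustration of that idea, a downstream consumer can deduplicate by keying its writes on a stable event id, so reprocessing the same record has no effect. This is only a sketch; the event id field and the in-memory store are assumptions for illustration (in practice you would use a database upsert or a unique-key constraint):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentSinkHandler {

    // Tracks which event ids have already been applied. In a real system this
    // would be a database upsert or unique-key constraint, not an in-memory map.
    private final Map<String, String> applied = new ConcurrentHashMap<>();

    // Applying the same event twice leaves the state unchanged.
    public void handle(String eventId, String payload) {
        applied.putIfAbsent(eventId, payload);
    }
}
```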
Hopefully this helps!