What are the strategies for deploying a Kafka Streams application with a new topology but the same input topics, so that no records are missed and no duplicates are sent downstream? Bumping the application.id causes a new consumer group to be created, whose starting offsets are determined by the auto.offset.reset config parameter. Assuming a requirement for exactly-once processing, what are the approaches to resume processing from the offsets committed by the previous version of the application?
Currently, the official strategy is to use a new application.id or to reset the application and reprocess historical data. That is needed because the new topology might be incompatible with the old one, and errors involving the state stores and the repartition topics may arise. Additionally, the semantics of the processing may change with the new topology (but that is probably the reason to change the topology in the first place).
However, there are certainly cases where the new topology is compatible with the old one and the topology can be changed without changing the application.id or resetting the application.
So it depends on the changes to the topology.
We have discussed adding support for topology evolution, but we haven’t yet put it on our short-term roadmap.
Thanks @Bruno for your reply. That’s absolutely correct: when the topology changes in a breaking way, internal topics and state stores get new names, so it is required to bump the application.id or perform an application reset.
The problem I am referring to may be somewhat niche, but I have observed that it is sometimes required to start processing from where the former application version stopped (in terms of source topics).
For now, I guess, one solution could be to set offsets manually for the consumer group created for the new application.id, but that is quite a cumbersome task. Another approach may be to create a completely separate stack, with different output topics as well, and then switch the downstream consumers over at some point. That, however, introduces a chain of coordinated deployments, which should be avoided.
Is there any KIP with an ongoing discussion about topology evolution, or is it too early to create one?
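To make the manual-offset idea a bit more concrete, here is a minimal sketch of the selection step only, in plain Java with no Kafka dependency. It assumes the old application has been stopped, that its committed offsets have been read into a map (for example via `Admin.listConsumerGroupOffsets` for the old group), and that the result is then written to the new, still inactive group (for example via `Admin.alterConsumerGroupOffsets`). The class name `OffsetCarryOver`, the method `selectSourceOffsets`, and the topic names are hypothetical and only illustrate the idea.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: decides which committed offsets of the OLD consumer group
// should be carried over to the group of the new application.id.
class OffsetCarryOver {

    // committed:    topic -> (partition -> committed offset) for the old group.
    // sourceTopics: the input topics shared by the old and the new topology.
    static Map<String, Map<Integer, Long>> selectSourceOffsets(
            Map<String, Map<Integer, Long>> committed, Set<String> sourceTopics) {
        Map<String, Map<Integer, Long>> selected = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, Long>> entry : committed.entrySet()) {
            // Internal topics of the old application (e.g. "<old application.id>-...-repartition")
            // are not source topics, so they are dropped here.
            if (sourceTopics.contains(entry.getKey())) {
                selected.put(entry.getKey(), new TreeMap<>(entry.getValue()));
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Long>> committed = new HashMap<>();

        Map<Integer, Long> orders = new HashMap<>();
        orders.put(0, 1500L);
        orders.put(1, 1420L);
        committed.put("orders", orders); // assumed source topic

        Map<Integer, Long> internal = new HashMap<>();
        internal.put(0, 77L);
        // Assumed internal topic name of the old application.
        committed.put("my-app-v1-KSTREAM-AGGREGATE-STATE-STORE-0000000003-repartition", internal);

        // prints {orders={0=1500, 1=1420}}
        System.out.println(selectSourceOffsets(committed, Collections.singleton("orders")));
    }
}
```

Only the source-topic offsets are carried over; state stores and repartition topics of the new application.id still start empty, so this is only appropriate when the new topology does not depend on the state the old one had built up.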
The naming issue can be somewhat alleviated by explicitly naming all your operators. That obviously needs to be done when the topology is first created. But it does not guarantee compatibility in general.
I personally do not think that it is such a niche problem, because a topology needs to be upgraded sooner or later. The simpler such an upgrade is, the better.
No, there hasn’t been a KIP yet. Just some ideas that I discussed with some folks.
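As an illustration of explicit naming, here is a small sketch using the Kafka Streams DSL (it assumes `kafka-streams` 2.4 or later on the classpath). The topic names, operator names, and the class `NamedTopology` are made up for the example. With explicit names, the state store and its changelog topic no longer depend on the auto-generated operator index, so inserting or removing an unrelated operator does not rename them.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Named;
import org.apache.kafka.streams.kstream.Produced;

// Hypothetical example topology in which every operator has an explicit name.
class NamedTopology {

    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder
            // Source processor name instead of KSTREAM-SOURCE-0000000000.
            .stream("orders", Consumed.with(Serdes.String(), Serdes.String()).withName("orders-source"))
            .filter((key, value) -> value != null, Named.as("drop-null-orders"))
            // The Grouped name would be used for a repartition topic if one were needed.
            .groupByKey(Grouped.with("orders-by-key", Serdes.String(), Serdes.String()))
            // The store name also determines the changelog topic:
            // <application.id>-order-counts-store-changelog
            .count(Named.as("count-orders"), Materialized.as("order-counts-store"))
            .toStream(Named.as("order-counts-stream"))
            .to("order-counts", Produced.with(Serdes.String(), Serdes.Long()).withName("order-counts-sink"));
        return builder.build();
    }

    public static void main(String[] args) {
        // Print the topology description to check the operator and store names.
        System.out.println(build().describe());
    }
}
```

Comparing the output of `describe()` before and after a change is a quick way to see whether any store or internal topic name has shifted.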