Looking for advice on updating a production topology without reprocessing all data

pcallahan · 10 August 2022 21:32

We have a stateful kafka-streams topology running as a production system. It is working well but we want to make some changes and migrate to a newer topology.

Resetting the application and replaying all the data from the beginning after updating the topology is not really an option for us because we have a lot of data and it would take a significant amount of time to process all of it from the beginning - we don’t want that much downtime. Not to mention that would be against the notion of processing data “exactly once”.

One possible option would be to run the old topology, v1 and the new topology, v2 side-by-side and using the v1 topology until the v2 topology is sufficiently populated that we could shut off the v1 topology. I feel this isn’t ideal either because it introduces operational complexity and extra overhead of maintaining sets of processors, state stores, etc. It might be ok for updates that happen infrequently but we would like change the topology somewhat regularly as we iterate through our development plans.

I would like to see how others handled this situation.

Topic		Replies	Views
Kafka Streams application - new topology deployment Kafka Streams	10	5323	10 November 2023
Are changelog updates and topic updates batched together within a topology execution? Kafka Streams	2	3308	5 November 2021
Streaming Updates through Complex Operations in Kafka Streams at Scale [Kafka Summit 2022] Summit	0	3033	22 April 2022
Developing Kafka Streams Applications with Upgradability in Mind [Kafka Summit 2022] Summit	0	3583	24 April 2022
Using Modular Topologies in Kafka Streams to Scale ksql’s Persistent Queries Summit	0	3021	22 April 2022

Looking for advice on updating a production topology without reprocessing all data

Related topics