Hey guys! Does anyone have suggestions for performing Kafka Streams disaster recovery in Confluent Cloud? I am using Cluster Linking to mirror my data. I have read that you should fail over your Streams app and re-process the messages from earliest, but my concern is that the retention period on the Disaster Recovery cluster may have expired older input records, so re-processing from earliest could rebuild different state. Meanwhile, running two applications in two different regions can become a bit hectic to manage.
Does anyone have a good way to achieve a low RTO/RPO when managing Kafka Streams?
Depends on your processing logic. If your operations are “windowed”, you might re-create the oldest windows only partially, but newer windows would be complete, and thus state should be fine. In the end, if you re-process from earliest, you produce duplicate output anyway, right?
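To make the windowed case concrete, here's a small Python sketch (just a simulation of the idea, not Kafka code; the window size, timestamps, and retention cutoff are made-up values): after retention expires the oldest input records, replaying what's left rebuilds the oldest window only partially, while windows that start after the cutoff come out identical.

```python
from collections import Counter

WINDOW_MS = 60_000  # 1-minute tumbling windows (illustrative choice)

def windowed_counts(events):
    """Count events per tumbling window, keyed by (key, window start)."""
    counts = Counter()
    for key, ts in events:
        window_start = ts - (ts % WINDOW_MS)
        counts[(key, window_start)] += 1
    return counts

# Full history as seen on the primary cluster: events across three windows.
full_history = [("a", 10_000), ("a", 50_000), ("a", 70_000), ("b", 130_000)]

# DR cluster after retention expired the oldest records (assumed cutoff).
retained = [(k, ts) for k, ts in full_history if ts >= 40_000]

original = windowed_counts(full_history)
rebuilt = windowed_counts(retained)

# The oldest window is rebuilt only partially...
print(original[("a", 0)], rebuilt[("a", 0)])  # 2 1
# ...but windows entirely after the retention cutoff match exactly.
print(original[("a", 60_000)] == rebuilt[("a", 60_000)])  # True
```

Same intuition as above: the further a window lies past the retention cutoff, the more complete its rebuilt state is.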
If you have non-windowed processing, it's difficult overall to achieve failover without potential data loss, because Cluster Linking is asynchronous:
You could use Cluster Linking to mirror the changelog and repartition topics. However, because the data transfer is async, input topic offsets on the source cluster might be committed before the corresponding data is replicated to the target cluster, and thus, if you fail over and resume from the last committed offset, data might be missing. You could try to compensate for this by rewinding and re-processing some more input data, but it would be difficult to know how far back you would need to go.
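Here's a toy Python simulation of that failure mode (the replication lag of 3 and the safety margin of 5 are illustrative assumptions; in practice the lag at failure time is unknown): the mirrored committed offset says all input was processed, but the mirrored changelog lags behind, so resuming at the committed offset leaves state silently stale. Rewinding the input re-applies the missing updates, assuming the updates are idempotent and the guessed margin actually covers the lag.

```python
def apply(state, records):
    """Apply key -> value updates to a state store (an idempotent 'put')."""
    for key, value in records:
        state[key] = value
    return state

# Input topic: 10 records, all processed on the primary before it failed.
input_log = [(f"k{i % 4}", i) for i in range(10)]
committed_offset = 10  # mirrored committed offset claims "all 10 done"

primary_state = apply({}, input_log)

# Async mirroring: the changelog lagged 3 updates behind at failure time
# (a lag of 3 is an illustrative assumption -- in reality it is unknown).
dr_state = apply({}, input_log[: committed_offset - 3])

# Resuming at the committed offset processes nothing new, so the
# restored state is silently missing the last 3 updates:
print(dr_state == primary_state)  # False

# Rewinding the input by a safety margin re-applies updates; with an
# idempotent store this converges -- but only if the guessed margin
# happens to cover the actual replication lag:
safety_margin = 5
recovered = apply(dict(dr_state), input_log[committed_offset - safety_margin:])
print(recovered == primary_state)  # True, since margin 5 >= lag 3
```

With non-idempotent updates (counts, sums), the same rewind would double-apply the overlapping records, which is the duplicate-output trade-off mentioned above.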
Hope this helps.