Dependencies between Stream and Global State Store

Hi Community,

I have been working with Kafka for several months now and have learned how to create a Spring Boot Kafka Streams application. Recently I added a Global State Store to my application.
Now that I know how to implement a Kafka Streams application and a Global State Store, I would like to understand how they depend on each other.

For example, let's say I have an application with a stream and a global state store.

The stream reads from topic A, which holds around 10 million records, transforms the data, and writes the transformed data to topic B.

The Global State Store sources from topic B.

Case one (fresh deploy):
When the application starts for the first time, the stream will process all the data from topic B and the global state store will be built up instantly
→ the app is up and running right away while the stream is still processing the data

When does the state store get filled? Whenever a new record is written to topic B, is it also stored directly in the state store?

Case two (redeploy with a different application-id):
When the application starts, it is NOT up and running right away because
the stream is reprocessing all data from topic A while the state store is getting restored. How long the restore takes depends on the amount of data the store had built up before.

How does restoring the global state store work? Where does the data come from? What about the data in topic B, which it reads from?
When does the store read from topic B: once it has been fully restored, or while it is being restored?

Thank you for taking the time to help.

Best regards
Kafkanaut

When does the state store get filled? Whenever a new record is written to topic B, is it also stored directly in the state store?

The global state store will pick up any write into its (global) input topic to update the content of the state store. So the update is not “direct”, i.e., the update is async: first the write into the topic happens, and at some later point (usually just a few ms later) the record will be read to update the store.
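
To make the async update concrete, here is a toy model in plain Java. It is not Kafka Streams code: the class `GlobalStoreModel`, the list standing in for topic B, and the `poll()` method are all made up for illustration. It only shows that a write lands in the topic first and reaches the store on a later read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (NOT the Kafka Streams API): a list stands in for topic B,
// and a map stands in for the global state store.
class GlobalStoreModel {
    private final List<String[]> topicB = new ArrayList<>(); // append-only log of {key, value}
    private final Map<String, String> store = new HashMap<>();
    private int nextOffset = 0; // next log position the store has not read yet

    // Step 1: a producer writes to the topic. The store is not touched here.
    void writeToTopic(String key, String value) {
        topicB.add(new String[] {key, value});
    }

    // Step 2: some time later the store's consumer reads and applies new records.
    // Returns how many records were applied.
    int poll() {
        int applied = 0;
        while (nextOffset < topicB.size()) {
            String[] record = topicB.get(nextOffset++);
            store.put(record[0], record[1]);
            applied++;
        }
        return applied;
    }

    String get(String key) {
        return store.get(key);
    }

    public static void main(String[] args) {
        GlobalStoreModel model = new GlobalStoreModel();
        model.writeToTopic("k1", "v1");
        System.out.println(model.get("k1")); // null: in the topic, not yet in the store
        model.poll();
        System.out.println(model.get("k1")); // v1: visible once the store has read the topic
    }
}
```

In a real application the gap between the two steps is the consumer's fetch latency, which is why a lookup right after a write can briefly miss the new record.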

the stream is reprocessing all data from topic A while the state store is getting restored.

No. On startup, Kafka Streams will first restore the global state store (i.e., read the full global topic). Only afterwards will processing start. Since you deploy the same topology (just with a different application-id), topic-B will be used to restore the global state store before processing of topic-A starts.
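
The startup order can be sketched the same way in plain Java. Again, this is only a model under simplifying assumptions, not the Kafka Streams API: `StartupOrderModel` and its fields are hypothetical names, and each topic is reduced to a list of record keys.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT the Kafka Streams API) of the startup order:
// the global store is rebuilt from topic B first, then topic A is processed.
class StartupOrderModel {
    final List<String> events = new ArrayList<>();      // what happened, in order
    final List<String> globalStore = new ArrayList<>(); // keys loaded from topic B

    void start(List<String> topicA, List<String> topicB) {
        // Phase 1: restore. Read everything currently in topic B.
        for (String key : topicB) {
            globalStore.add(key);
            events.add("restore " + key);
        }
        // Phase 2: only now does regular processing of topic A begin.
        for (String key : topicA) {
            events.add("process " + key);
        }
    }

    public static void main(String[] args) {
        StartupOrderModel app = new StartupOrderModel();
        app.start(List.of("a1", "a2"), List.of("b1", "b2"));
        System.out.println(app.events); // [restore b1, restore b2, process a1, process a2]
    }
}
```

With a new application-id and topic B already populated, phase 1 is what delays the start; on a fresh deploy topic B is empty, so phase 1 finishes immediately.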


Hi mjsax,

Thank you very much for your answer!
