Change of how global-store is read from kafka in kafka-streams 3.8.1

While upgrading to spring-boot 3.4 we noticed the following change in global-store processor:

Before the upgrade, using kafka-streams 3.7.1, while intializing the global store, the topic would be read directly to the store without invoking/consulting the processor. New records would then be passed to the processor and inserted to the store, if needed.

After the upgrade, using kafka-streams 3.8.1, the topic is consumed from the beginning via the processor, no “direct-copy” to the global store of existing records would take place.

Can you confirm what we noticed? Is this a wanted change? Is there any way to re-activate (configure) the old behaivour?

Yes, it is an intended change: [KAFKA-7663] Custom Processor supplied on addGlobalStore is not used when restoring state from topic - ASF JIRA

Some people use the “global Processor” to actually modify the data, and for this case, it’s necessary to re-apply the Processor during restore.

Atm, it is not possible to disable this behavior, even if I agree, that it would be a good optimization for the “direct-copy” case. It was a little bit unclear, how this optimization could be exposed to end-users and we might need to do a KIP for it, so it’s more complicated.

As a first step, it might be good to file a Jira ticket as feature request. Of course, you are very welcome to pickup this work yourself and contribute it :slight_smile:

@mjsax thanks for clarifying this.

As a side note, for us this is a breaking change and it would have been great if it would have been introduced as either an alternative API or a toggled functionality.

Very sorry to hear that it is a breaking change for you. Can you elaborate why? It should not be a breaking change…

Breaking changes are only allowed in major releases, and need to be backed by a KIP. Seems we did do something wrong. Would be good to understand this better to avoid in the future.

some of our global store processors are used to trigger change notification in the application. In some rare usages, we react to these state changes with some quite “expensive” tasks. This is how we came to discover the change: after passing all test stages, the application would not start in production because the queue for these tasks would get exhausted…

Oh, you trigger external side effects? – That is not really expected at all…

Maybe you can work around by using a KafkaStreams#setGlobalStateRestoreListener(StateRestoreListener) and only enable side effects after restoring finished?

Hi mjdax,

Thanks a lot for the Tipp!

I will evaluate this with the team -

since we are just starting the migration of quite many applications to spring-boot 3.4 this will surly be very useful!

Ron

1 Like