Understanding internal topics

Especially in more complex topologies (some descriptions are several hundred lines long) there are a lot of internal topics. In most cases they don’t need any attention. Sometimes, however, we need to switch from one version of an incoming topic to a new (breaking) version, and with that some internal topics are affected too, for example changelog topics from state stores that hold payloads of the same structure/type as the (changed) incoming topic. So we delete the affected internal topics so that they get created and populated from scratch. This is pretty straightforward for most of the changelog topics, as we can determine the type of the payload and hence whether the type changes along with the incoming topic or not.
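For illustration, here is a minimal sketch of deleting such affected changelog topics with the Kafka AdminClient, assuming the Streams application is stopped first; the broker address and the topic names (based on a hypothetical application.id of my-app) are placeholders:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

import java.util.List;
import java.util.Properties;

public class DeleteChangelogTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder broker address; adjust for your cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Hypothetical changelog topics whose payload type changed along with
            // the incoming topic; delete them while the Streams app is stopped so
            // they are re-created and re-populated from scratch on restart.
            List<String> affected = List.of(
                    "my-app-some-store-changelog",
                    "my-app-another-store-changelog");
            admin.deleteTopics(affected).all().get();
        }
    }
}
```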

However, some internal topics are a complete mystery:

  • what purpose do they serve?
  • what are the types of their payloads (some are mostly empty or of an unknown type, i.e. neither Avro nor JSON)?
  • should they be deleted and then re-created and re-populated?

Examples of such topics are:

  1. <application.id>-KTABLE-FK-JOIN-SUBSCRIPTION-STATE-STORE--changelog
  2. <application.id>-KTABLE-FK-JOIN-SUBSCRIPTION-REGISTRATION--topic
  3. <application.id>-KTABLE-FK-JOIN-SUBSCRIPTION-RESPONSE--topic

I searched for documentation that would explain the types of the internal topics but haven’t found anything so far. Is there anything besides studying the source code itself?

Any links, hints, explanations appreciated.

As the names indicate, these topics are used for FK-joins.

The SUBSCRIPTION-REGISTRATION and SUBSCRIPTION-RESPONSE topics are expected to be “empty”, as they are basically repartition topics and are thus purged periodically, after the data has been processed downstream.

The SUBSCRIPTION-STATE-STORE--changelog topic should not be empty, though. FK-joins use one additional internal “helper” state store, and this topic is the corresponding changelog for that helper store.

This blog post explains the internals of FK-joins in detail and should shed some light (in case you want to dig deeper): Real-Time Data Enrichment with Kafka Streams: Introducing Foreign-Key Joins
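For context, here is a minimal sketch of the kind of DSL foreign-key join that creates these internal topics; the topic names, value types, and joiner are hypothetical, and serde configuration is omitted:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;

public class FkJoinSketch {
    // Minimal placeholder value types for the sketch.
    record Order(String customerId, double amount) {}
    record Customer(String name) {}
    record EnrichedOrder(Order order, Customer customer) {}

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Hypothetical input topics; serde configuration omitted for brevity.
        KTable<String, Order> orders = builder.table("orders");
        KTable<String, Customer> customers = builder.table("customers");

        // Foreign-key join: extract the customerId from each order value and join
        // against the customers table. It is this join that adds the
        // SUBSCRIPTION-REGISTRATION / SUBSCRIPTION-RESPONSE repartition topics and
        // the SUBSCRIPTION-STATE-STORE changelog to the topology.
        KTable<String, EnrichedOrder> enriched = orders.join(
                customers,
                order -> order.customerId(),   // foreign-key extractor
                EnrichedOrder::new);           // value joiner

        enriched.toStream().to("enriched-orders");

        // Print the topology description, which shows where the subscription
        // topics and the helper store sit in the topology.
        System.out.println(builder.build().describe());
    }
}
```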

Thanks Matthias, that was exactly the kind of information I was hoping for :+1:

Very good article, and it’s very interesting to see what (optimizations) happen behind the scenes!

I must say, though, that at least in some of our FK-joins the optimizations are not really needed (e.g. when there’s a 1-to-1 relationship); switching to the FK as key and then doing a PK-join would be sufficient and would result in

  • a simpler topology and
  • intermediate changelog topics whose keys and values we can look at (and understand).

Thanks again for the great and timely response.


I must say, though, that at least in some of our FK-joins the optimizations are not really needed (e.g. when there’s a 1-to-1 relationship); switching to the FK as key and then doing a PK-join would be sufficient

Well, we cannot know if there is an actual n:1 or 1:1 relationship… If you have a 1:1 relationship, you might want to prepare the data accordingly yourself, pre-processing it as KStream.selectKey(...).toTable() and doing a regular 1:1 join afterwards…
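A minimal sketch of that re-keying approach, assuming a strict 1:1 relationship; the topic names and value types are hypothetical and serde configuration is again omitted:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class RekeyedPkJoinSketch {
    // Minimal placeholder value types for the sketch.
    record Order(String customerId, double amount) {}
    record Customer(String name) {}
    record EnrichedOrder(Order order, Customer customer) {}

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Hypothetical input topics; serde configuration omitted for brevity.
        KStream<String, Order> orders = builder.stream("orders");
        KTable<String, Customer> customers = builder.table("customers");

        // Re-key the order stream by the foreign key and materialize it as a table,
        // then do a regular primary-key join. This is only correct for a 1:1
        // relationship: with n:1, orders sharing a customerId would overwrite each
        // other in the re-keyed table.
        KTable<String, Order> ordersByCustomer = orders
                .selectKey((orderId, order) -> order.customerId())
                .toTable();

        KTable<String, EnrichedOrder> enriched =
                ordersByCustomer.join(customers, EnrichedOrder::new);

        enriched.toStream().to("enriched-orders");
    }
}
```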

I fully agree, and I’m not criticizing at all. Considering all the aspects the implementation takes into account, it is amazing how simply an FK-join can be expressed in the DSL. I’m just saying that many developers might not really be aware of how an FK-join changes the topology.

Thanks again for the great and timely support!

