Especially in more complex topologies (some topology descriptions run to several hundred lines), there are a lot of internal topics. In most cases they don't need any attention. Sometimes, however, we need to switch from one version of an incoming topic to a new (breaking) version, and that affects some internal topics too, for example changelog topics of state stores that hold payloads of the same structure/type as the (changed) incoming topic. So we delete the affected internal topics so that they are re-created and re-populated from scratch. This is pretty straightforward for most of the changelog topics, as we can determine the type of the payload and hence whether that type changes along with the incoming topic or not.
However, some internal topics are a complete mystery:
What purpose do they serve?
What types are their payloads? (Some are mostly empty, others are of an unknown type, i.e. neither Avro nor JSON.)
Should they be deleted and then re-created and re-populated?
I searched for documentation explaining the different kinds of internal topics but haven't found anything so far. Is there anything besides studying the source code itself?
As the names indicate, these topics are used for FK-joins.
The SUBSCRIPTION-REGISTRATION and SUBSCRIPTION-RESPONSE topics are expected to be "empty", as they are essentially repartition topics: they are purged periodically, once the data has been processed downstream.
The SUBSCRIPTION-STATE-STORE--changelog topic should not be empty, though. FK-joins use one additional internal "helper" state store, and this topic is the changelog of that helper store.
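To make the explanation above concrete, here is a small plain-Java sketch that sorts internal topic names into the categories described in this thread. It only matches name fragments (SUBSCRIPTION-REGISTRATION, SUBSCRIPTION-RESPONSE, SUBSCRIPTION-STATE-STORE, and the usual -changelog/-repartition suffixes) and assumes default, generated operator and store names; explicitly named joins or stores may not match. The class, the enum and the example topic names are hypothetical, for illustration only.

```java
import java.util.List;
import java.util.Locale;

// Sketch: classify Kafka Streams internal topic names by naming pattern.
// Assumption: default generated names; explicitly named operators/stores may not match.
class InternalTopicClassifier {

    // Categories follow the explanations given in this thread.
    enum Kind {
        FK_SUBSCRIPTION_REGISTRATION,    // repartition-like; expected to be (mostly) empty
        FK_SUBSCRIPTION_RESPONSE,        // repartition-like; expected to be (mostly) empty
        FK_SUBSCRIPTION_STORE_CHANGELOG, // changelog of the FK-join helper store; not empty
        CHANGELOG,                       // changelog of any other state store
        REPARTITION,                     // regular repartition topic
        UNKNOWN
    }

    static Kind classify(String topic) {
        String t = topic.toUpperCase(Locale.ROOT);
        // FK-join checks run first: the helper store's changelog also ends in "-CHANGELOG".
        if (t.contains("SUBSCRIPTION-REGISTRATION")) {
            return Kind.FK_SUBSCRIPTION_REGISTRATION;
        }
        if (t.contains("SUBSCRIPTION-RESPONSE")) {
            return Kind.FK_SUBSCRIPTION_RESPONSE;
        }
        if (t.contains("SUBSCRIPTION-STATE-STORE") && t.endsWith("-CHANGELOG")) {
            return Kind.FK_SUBSCRIPTION_STORE_CHANGELOG;
        }
        if (t.endsWith("-CHANGELOG")) {
            return Kind.CHANGELOG;
        }
        if (t.endsWith("-REPARTITION")) {
            return Kind.REPARTITION;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // Hypothetical topic names for an application with application.id "my-app".
        List<String> topics = List.of(
            "my-app-KTABLE-FK-JOIN-SUBSCRIPTION-REGISTRATION-0000000006-topic",
            "my-app-KTABLE-FK-JOIN-SUBSCRIPTION-RESPONSE-0000000014-topic",
            "my-app-KTABLE-FK-JOIN-SUBSCRIPTION-STATE-STORE-0000000008-changelog",
            "my-app-customers-store-changelog",
            "my-app-KSTREAM-AGGREGATE-STATE-STORE-0000000003-repartition");
        for (String topic : topics) {
            System.out.println(classify(topic) + "\t" + topic);
        }
    }
}
```

Matching on fragments rather than full names keeps the sketch independent of the generated numeric suffixes, which change whenever the topology changes.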
That was exactly the kind of information I was hoping for.
Very good article; it is very interesting to see what optimizations happen behind the scenes!
I must say, though, that for at least some of our FK-joins the optimizations are not really needed (e.g. when there's a 1-to-1 relationship): re-keying by the FK and then doing a PK-join would be sufficient and would result in
a simpler topology and
intermediate changelog topics whose keys and values we can inspect (and understand).
Well, we cannot know whether there is an actual n:1 or a 1:1 relationship… If you have a 1:1 relationship, you might want to prepare the data yourself, pre-processing it via KStream.selectKey(...).toTable(), and do a regular 1:1 join afterwards…
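The reason the DSL cannot simply re-key for you can be sketched in plain Java. This is a simulation of the idea, not the Kafka Streams API: `rekeyToTable` stands in for `selectKey(...).toTable()` (latest value per key wins) and `pkJoin` for a regular table-table join. The `Order`/`Customer` records and both helper methods are hypothetical. With a 1:1 relationship nothing is lost, but with n:1 several left-side records collapse onto the same foreign key, which is exactly the case the FK-join machinery handles.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java simulation (hypothetical names), not the Kafka Streams API.
class RekeyThenJoinSketch {

    record Order(String orderId, String customerId, String item) {}
    record Customer(String customerId, String name) {}

    // Stand-in for stream.selectKey(fk).toTable(): re-key every record by its
    // foreign key; a later record with the same key overwrites an earlier one.
    static <V> Map<String, V> rekeyToTable(List<V> stream, Function<V, String> keySelector) {
        Map<String, V> table = new LinkedHashMap<>();
        for (V value : stream) {
            table.put(keySelector.apply(value), value);
        }
        return table;
    }

    // Stand-in for a regular primary-key inner join of two tables.
    static Map<String, String> pkJoin(Map<String, Order> left, Map<String, Customer> right) {
        Map<String, String> joined = new LinkedHashMap<>();
        for (Map.Entry<String, Order> e : left.entrySet()) {
            Customer c = right.get(e.getKey());
            if (c != null) {
                Order o = e.getValue();
                joined.put(e.getKey(), o.orderId() + ":" + o.item() + " -> " + c.name());
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, Customer> customers = Map.of(
            "c1", new Customer("c1", "Alice"),
            "c2", new Customer("c2", "Bob"));

        // 1:1: each customer is referenced by exactly one order, so nothing is lost.
        List<Order> oneToOne = List.of(
            new Order("o1", "c1", "book"), new Order("o2", "c2", "pen"));
        System.out.println(pkJoin(rekeyToTable(oneToOne, Order::customerId), customers));

        // n:1: o1 and o3 both reference c1; after re-keying, o3 overwrites o1.
        List<Order> manyToOne = List.of(
            new Order("o1", "c1", "book"), new Order("o2", "c2", "pen"),
            new Order("o3", "c1", "lamp"));
        System.out.println(pkJoin(rekeyToTable(manyToOne, Order::customerId), customers));
    }
}
```

Only when the data is known to be 1:1 is the re-key-then-join shortcut safe; that knowledge comes from the domain, not from the topology.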
I fully agree, and I'm not criticizing at all. Considering everything the implementation takes into account, it is amazing how simple an FK-join is to express in the DSL. I'm just saying that many developers might not be aware of how an FK-join changes the topology.