What you observe is by-design. The join computes the result record timestamp as max[left.ts, right.ts] what in general is semantically correct (the join has built-in event-time semantics). The result, semantically, cannot exist before both left and right record exist.
In your example, if left.ts = 1000 and right.ts=1500 at ts=1000 there is no join result, because right does not exist yet. The join can only happen at 1500 when both record exist. Does this make sense?
Given your business requirements, it seems you either don’t want proper event-time semantics, or your right input data is not timestamped correctly.
I guess you have a few options. The simples one might be, to modify the ts of the right input (maybe even set them to zero; depends on you business requirement), before the data got into the table:
However, after tracing our case through the Kafka Streams (v4.1.1) FK-join internals, I think there is an important distinction between the semantic “join result” and the internal …subscription-response-topic.
In our case we force the right-side table record timestamp to 0, and then we observe records with ts=0 on the internal FK topic, for example:
but ForeignTableJoinProcessorSupplier (the path triggered by a right-table update) forwards with:
record.withKey(…).withValue(…)
and does not recompute the timestamp
then ResponseJoinProcessorSupplier also forwards with:
record.withValue(result)
so it preserves the incoming response-record timestamp
So it seems that for FK joins there are two paths:
left/subscription-triggered path:
timestamp is computed as max(left.ts, right.ts)
right-triggered path:
the internal subscription-response-topic keeps the right-record timestamp
This would explain why, if we force right.ts = 0, the internal …subscription-response-topic also gets ts=0.
So my question is: is this understanding correct?
In other words, does the max(left.ts, right.ts) rule apply to the left-triggered FK path, while right-triggered FK responses preserve the right timestamp in the internal response topic?
What you say is correct, but that’s just an implementation detail you don’t need to worry about. What’s in the response topic is internal by definition, so not sure why you care about it?
In other words, does the max(left.ts, right.ts) rule apply to the left-triggered FK path, while right-triggered FK responses preserve the right timestamp in the internal response topic?
It applies to both, but it’s only computed on the left hand side.
When a left hand side update happens, the FK join first send a subscription request to the right-hand side. The right hand side would process the subscription update to send a subscription response. Afterwards the left side would process the response to compute the join result. – Of course, the response must encode to right hand side’s record ts, to allow the left hand side to compute max(l.ts, r.ts).
When a right hand side update happens, the FK join checks existing subscription and also sends a response to the left hand side to trigger a join result computation. Again, left hand side would pickup the response to compute the join result; including max(l.ts, r.ts) As a matter of fact, the left hand side does not even know, or care, if a response is triggered by a left or right hand side update, and it always include the ts computation, which is part of the join result computation.
Note that the response topic does not contain join results. It only contains the right hand side record which must be joined, and the left side picks-up the right hand side record to compute the join result.