Join Guarantees in Stream-Table Join

I have the following tables in postgres:

  1. clients : Columns - client_id, tenant_id
  2. divisions : Columns - division_id, client_id
  3. contacts : Columns - contact_id, division_id

I need a Kafka topic with denormalized data joining the above tables. With ksqlDB and using Kafka-connect (Postgres CDC):

  1. I created a clients stream and started materializing it into the ksql Table clients_table
  2. Then I created a divisions stream, joined it with the clients_table and materialized it again into a divisions_table (which now has the tenant_id info from clients_table )
  3. And created the final contacts stream and joined it with divisions_table and loaded it into Elasticsearch (again using Kafka-connect)

The above seems to satisfy my use-case, but:
When the queries in the Postgres tables are run in 10-100 millisecond gaps, I do not get the enriched data.
I am setting KSQL_KSQL_STREAMS_MAX_TASK_IDLE_MS to 2,500 milliseconds, only after setting the above to 25,000 ms, I see the data enriched properly. But it has slowed down the process as well.

If I only consider joining the divisions stream with client_table, I could easily get away with a 2,500ms setting, but for another layer of materialization (i.e with contacts, it requires more than 25,000ms to guarantee joins.
Is there a way to improve this?


Could make it work this way, with the same 2,500 idle ms.:

CREATE STREAM enriched_division_contacts AS
SELECT * FROM "contacts"
"divisions_table" on "contacts".division_id = "divisions_table"."division_id"
INNER JOIN "clients_table" on "divisions_table"."client_id" = "clients_table"."client_id" ;

Thanks @Aneel for the help.

Slack Thread:

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.