I have the following tables in Postgres:

- `clients`: columns `client_id`, `tenant_id`
- `divisions`: columns `division_id`, `client_id`
- `contacts`: columns `contact_id`, `division_id`
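For reference, a minimal sketch of the schema (the column types and foreign keys are assumptions; the post only lists column names):

```sql
CREATE TABLE clients (
  client_id BIGINT PRIMARY KEY,
  tenant_id BIGINT NOT NULL
);

CREATE TABLE divisions (
  division_id BIGINT PRIMARY KEY,
  client_id   BIGINT NOT NULL REFERENCES clients (client_id)
);

CREATE TABLE contacts (
  contact_id  BIGINT PRIMARY KEY,
  division_id BIGINT NOT NULL REFERENCES divisions (division_id)
);
```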
I need a Kafka topic with denormalized data joining the above tables. With ksqlDB and Kafka Connect (Postgres CDC):
- I created a `clients` stream and materialized it into the ksqlDB table `clients_table`.
- Then I created a `divisions` stream, joined it with `clients_table`, and materialized the result into `divisions_table` (which now carries the `tenant_id` from `clients_table`).
- Finally, I created a `contacts` stream, joined it with `divisions_table`, and loaded the result into Elasticsearch (again via Kafka Connect).
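The pipeline above can be sketched in ksqlDB roughly as follows. The topic names, key/value formats, and `LATEST_BY_OFFSET` materialization are my assumptions, not taken from the original setup:

```sql
-- Assumes Debezium-style topics named pg.public.<table> with JSON values;
-- all names here are illustrative.
CREATE STREAM clients_stream (client_id BIGINT KEY, tenant_id BIGINT)
  WITH (KAFKA_TOPIC = 'pg.public.clients', VALUE_FORMAT = 'JSON');

-- Materialize the latest row per client.
CREATE TABLE clients_table AS
  SELECT client_id, LATEST_BY_OFFSET(tenant_id) AS tenant_id
  FROM clients_stream
  GROUP BY client_id;

CREATE STREAM divisions_stream (division_id BIGINT KEY, client_id BIGINT)
  WITH (KAFKA_TOPIC = 'pg.public.divisions', VALUE_FORMAT = 'JSON');

-- Stream-table join: enrich each division with its tenant_id.
CREATE STREAM divisions_enriched AS
  SELECT d.division_id, d.client_id, c.tenant_id
  FROM divisions_stream d
  JOIN clients_table c ON d.client_id = c.client_id;

CREATE TABLE divisions_table AS
  SELECT division_id,
         LATEST_BY_OFFSET(client_id) AS client_id,
         LATEST_BY_OFFSET(tenant_id) AS tenant_id
  FROM divisions_enriched
  GROUP BY division_id;

CREATE STREAM contacts_stream (contact_id BIGINT KEY, division_id BIGINT)
  WITH (KAFKA_TOPIC = 'pg.public.contacts', VALUE_FORMAT = 'JSON');

-- Second layer: enrich contacts; the sink connector reads this topic.
CREATE STREAM contacts_enriched AS
  SELECT ct.contact_id, ct.division_id, d.tenant_id
  FROM contacts_stream ct
  JOIN divisions_table d ON ct.division_id = d.division_id;
```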
The above satisfies my use case, but:
When the inserts into the Postgres tables happen within 10-100 ms of each other, the joins miss and I do not get the enriched data.
I am setting `KSQL_KSQL_STREAMS_MAX_TASK_IDLE_MS` to 2,500 ms, but the data is enriched reliably only after I raise it to 25,000 ms, and that slows the whole pipeline down as well.
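For context, the `KSQL_KSQL_STREAMS_MAX_TASK_IDLE_MS` environment variable maps to the following server property, which ksqlDB passes through to Kafka Streams (the value shown is the one from my setup):

```properties
# ksql-server.properties equivalent of the env var
# KSQL_KSQL_STREAMS_MAX_TASK_IDLE_MS (Kafka Streams max.task.idle.ms):
ksql.streams.max.task.idle.ms=2500
```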
If I only join the `divisions` stream with `clients_table`, I can easily get away with the 2,500 ms setting, but with the additional layer of materialization (i.e., the `contacts` join) it takes more than 25,000 ms to guarantee the joins.
Is there a way to improve this?