I have the following tables in postgres:
clients: Columns - client_id, tenant_id
divisions: Columns - division_id, client_id
contacts: Columns - contact_id, division_id
I need a Kafka topic with denormalized data joining the above tables. Using ksqlDB and Kafka Connect (Postgres CDC):

- I created a `clients` stream and materialized it into a ksqlDB table (`clients_table`).
- Then I created a `divisions` stream, joined it with `clients_table`, and materialized the result into `divisions_table` (which now has the `tenant_id` info from `clients`).
- Finally I created a `contacts` stream, joined it with `divisions_table`, and loaded the result into Elasticsearch (again using Kafka Connect).
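For reference, the pipeline above roughly corresponds to statements like the following. This is a sketch: topic names, key/value formats, and column types are assumptions, and a Debezium source typically also needs the `ExtractNewRecordState` SMT so values arrive flattened like this.

```sql
-- Layer 1: clients stream -> clients_table
CREATE STREAM clients_stream (client_id INT KEY, tenant_id INT)
  WITH (KAFKA_TOPIC='pg.public.clients', VALUE_FORMAT='JSON');

CREATE TABLE clients_table AS
  SELECT client_id, LATEST_BY_OFFSET(tenant_id) AS tenant_id
  FROM clients_stream
  GROUP BY client_id;

-- Layer 2: divisions stream joined with clients_table -> divisions_table
CREATE STREAM divisions_stream (division_id INT KEY, client_id INT)
  WITH (KAFKA_TOPIC='pg.public.divisions', VALUE_FORMAT='JSON');

CREATE STREAM divisions_enriched AS
  SELECT d.division_id AS division_id,
         d.client_id   AS client_id,
         c.tenant_id   AS tenant_id
  FROM divisions_stream d
  JOIN clients_table c ON d.client_id = c.client_id;

CREATE TABLE divisions_table AS
  SELECT division_id,
         LATEST_BY_OFFSET(client_id) AS client_id,
         LATEST_BY_OFFSET(tenant_id) AS tenant_id
  FROM divisions_enriched
  GROUP BY division_id;

-- Layer 3: contacts stream joined with divisions_table; the resulting
-- topic is what the Elasticsearch sink connector consumes
CREATE STREAM contacts_stream (contact_id INT KEY, division_id INT)
  WITH (KAFKA_TOPIC='pg.public.contacts', VALUE_FORMAT='JSON');

CREATE STREAM contacts_enriched AS
  SELECT ct.contact_id, ct.division_id, dv.client_id, dv.tenant_id
  FROM contacts_stream ct
  JOIN divisions_table dv ON ct.division_id = dv.division_id;
```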
The above seems to satisfy my use case, but: when the writes to the Postgres tables happen within 10-100 ms of each other, I do not get the enriched data (the joins miss).
I am setting `KSQL_KSQL_STREAMS_MAX_TASK_IDLE_MS` to 2,500 ms; only after raising it to 25,000 ms do I see the data enriched properly, but that has also slowed the whole pipeline down.
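For context, that environment variable is the Docker-style mapping of the `ksql.streams.max.task.idle.ms` server property, which ksqlDB forwards to Kafka Streams as `max.task.idle.ms` (how long a task waits for data on all of its input partitions before processing one side of a join):

```properties
# ksql-server.properties equivalent of KSQL_KSQL_STREAMS_MAX_TASK_IDLE_MS=2500
ksql.streams.max.task.idle.ms=2500
```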
If I only had to join the `divisions` stream with `clients_table`, I could easily get away with the 2,500 ms setting; but with another layer of materialization (i.e. joining `contacts` with `divisions_table`), it requires more than 25,000 ms to guarantee the joins.
Is there a way to improve this?