I have the following tables in Postgres:

- `clients`: columns `client_id`, `tenant_id`
- `divisions`: columns `division_id`, `client_id`
- `contacts`: columns `contact_id`, `division_id`
I need a Kafka topic with denormalized data joining the above tables. With ksqlDB and Kafka Connect (Postgres CDC):

- I created a `clients` stream and started materializing it into a ksqlDB table `clients_table`.
- Then I created a `divisions` stream, joined it with `clients_table`, and materialized the result into a `divisions_table` (which now carries the `tenant_id` info from `clients_table`).
- Finally, I created a `contacts` stream, joined it with `divisions_table`, and loaded the result into Elasticsearch (again using Kafka Connect).
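For reference, the pipeline looks roughly like this (topic names, value format, and column types below are illustrative, not my exact DDL):

```sql
-- 1. Stream over the clients CDC topic, materialized as a table
CREATE STREAM clients_stream (client_id INT KEY, tenant_id INT)
  WITH (KAFKA_TOPIC='pg.public.clients', VALUE_FORMAT='AVRO');

CREATE TABLE clients_table AS
  SELECT client_id, LATEST_BY_OFFSET(tenant_id) AS tenant_id
  FROM clients_stream
  GROUP BY client_id;

-- 2. divisions stream joined to clients_table, materialized again
CREATE STREAM divisions_stream (division_id INT KEY, client_id INT)
  WITH (KAFKA_TOPIC='pg.public.divisions', VALUE_FORMAT='AVRO');

CREATE STREAM divisions_enriched AS
  SELECT d.division_id, d.client_id, c.tenant_id
  FROM divisions_stream d
  JOIN clients_table c ON d.client_id = c.client_id;

CREATE TABLE divisions_table AS
  SELECT division_id,
         LATEST_BY_OFFSET(client_id) AS client_id,
         LATEST_BY_OFFSET(tenant_id) AS tenant_id
  FROM divisions_enriched
  GROUP BY division_id;

-- 3. contacts stream joined to divisions_table; the output topic
--    of this query feeds the Elasticsearch sink connector
CREATE STREAM contacts_stream (contact_id INT KEY, division_id INT)
  WITH (KAFKA_TOPIC='pg.public.contacts', VALUE_FORMAT='AVRO');

CREATE STREAM contacts_enriched AS
  SELECT ct.contact_id, ct.division_id, d.client_id, d.tenant_id
  FROM contacts_stream ct
  JOIN divisions_table d ON ct.division_id = d.division_id;
```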
The above seems to satisfy my use case, but:

When the writes to the Postgres tables happen within 10-100 millisecond gaps of each other, I do not get the enriched data. I am setting `KSQL_KSQL_STREAMS_MAX_TASK_IDLE_MS` to 2,500 milliseconds; only after raising it to 25,000 ms do I see the data enriched properly, but that has slowed down the pipeline as well.
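For context, I am setting this via the server environment (assuming a Docker deployment here; the env var maps to the `ksql.streams.max.task.idle.ms` server property):

```yaml
# ksqlDB Server service in docker-compose (illustrative snippet)
environment:
  KSQL_KSQL_STREAMS_MAX_TASK_IDLE_MS: "25000"
```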
If I only consider joining the `divisions` stream with `clients_table`, I could easily get away with a 2,500 ms setting; but with another layer of materialization (i.e. with `contacts`), it requires more than 25,000 ms to guarantee the joins.
Is there a way to improve this?