Hey KSQLDB community:
I have several streams of events from Oracle Golden Gate that I
wish to join in ksqldb (v0.18.0).
There are “Plan” events that we get via CDC from Oracle Golden Gate:
plan_id | Name |
---|---|
123 | Foo |
As you might be aware, CDC events from OGG have a field called op_ts
to denote the timestamp which the change was made in Oracle. This “Plan” data is manifested into a ksqldb stream
The other is Plan-Agency “reference data” - this table cross references a “Plan” to an “Agency” for specific date range. This is also from OGG, and I have captured it into a ksqldb table for use
in a join:
plan_id | agency_id | Start | End |
---|---|---|---|
123 | 987 | 01-Jan-2021 | 30-Apr-2021 |
123 | 456 | 01-Apr-2021 | 30-Dec-2021 |
Ideally, I would want my join to allow me to get the value of agency_id
where the plan
values are equal and the op_ts
is between the start
and end
of the join table:
select plan.plan, plan.name, plan_agency.agency from
plan left join plan_agency on
plan.plan_id = plan_agency.plan_id and
plan.op_ts between plan_agency.start and plan_agency.end;
However, it appears that ksqldb does not (yet) support multiple conditions in the on
clause of a join.
I have attempted to rekey the data in agency
by plan_id
, start
and end
. I have used the scalar function PARSE_TIMESTAMP
to convert the op_ts
from plan
into a TIMESTAMP
, as well as on start
and end
, in the hopes I could use the between
clause on those.
We are also exploring upstream how the data might be simplified.
In the meantime, I thought it might be useful to reach out to you good people. I’m open to any good ideas.