Multi-column join conditions and date ranges

Hey KSQLDB community:

I have several streams of events from Oracle Golden Gate that I
wish to join in ksqldb (v0.18.0).

There are “Plan” events that we get via CDC from Oracle Golden Gate:

plan_id Name
123 Foo

As you might be aware, CDC events from OGG have a field called op_ts to denote the timestamp which the change was made in Oracle. This “Plan” data is manifested into a ksqldb stream

The other is Plan-Agency “reference data” - this table cross references a “Plan” to an “Agency” for specific date range. This is also from OGG, and I have captured it into a ksqldb table for use
in a join:

plan_id agency_id Start End
123 987 01-Jan-2021 30-Apr-2021
123 456 01-Apr-2021 30-Dec-2021

Ideally, I would want my join to allow me to get the value of agency_id where the plan values are equal and the op_ts
is between the start and end of the join table:

select plan.plan, plan.name, plan_agency.agency from
    plan left join plan_agency on
        plan.plan_id = plan_agency.plan_id and
        plan.op_ts between plan_agency.start and plan_agency.end;

However, it appears that ksqldb does not (yet) support multiple conditions in the on clause of a join.

I have attempted to rekey the data in agency by plan_id, start and end. I have used the scalar function PARSE_TIMESTAMP to convert the op_ts from plan into a TIMESTAMP, as well as on start and end, in the hopes I could use the between clause on those.

We are also exploring upstream how the data might be simplified.

In the meantime, I thought it might be useful to reach out to you good people. I’m open to any good ideas.

@sandonjacobs Right now complex join conditions are not allowed. Ie, no ON x AND y, only “simple” join conditions ON a = b or some expressions ON a+1 = b-1 are supported. There is an effort to improve the documentation on this: https://github.com/confluentinc/ksql/pull/7699

2 Likes

Thanks @rick. Do you think it would be feasible to use a lambda to do the date comparisons? That is something I have not tried yet, thinking that might be a candidate. If not, pretty sure I can gen up something in a Kafka Streams app…

This topic was automatically closed after 30 days. New replies are no longer allowed.