I am working on DB CDC to Kafka use case. Initially table A is getting populated with multiple records. After some time (varies from days to weeks) Table B is getting populated with single record. Both table has joining columns request_id.
I am creating KTable on TableA to hold data until TableB is populated. And Planning to join TableB KStream with TableA KTable.
How can I join KStream one record with KTable multiple records?
@masayyed you’ve posted this in #ksqldb , but mention KStreams and KTables - can you just confirm if you’re looking for an answer in Kafka Streams or with ksqlDB? thanks
@masayyed Sounds like what you want to do is a normal stream to table join, where your CDC table A goes to Topic A, and your table B goes to Topic B. You would want to stream your data via CDC (Debezium?) into your two Kafka topics, then create a Table in KSQLDB over Topic A and a Stream over Topic B. Then use a stream to table join which is neatly documented over here: Join Event Streams - ksqlDB Documentation . Take note of the partitioning requirements for your topics.
@Charlla Thanks for the reply. I Am using third party tool on Oracle database for CDC.
As I mentioned Table A has multiple records for same request_id and Table B has single record.
How can I achieve 1 (kstream-topicB) to many (ktable-topicA) join ?
If the join attribute is the key in TableB, you could mimic the join with a stream-table join by processing the changelog of TableA as STREAM, and processing TableB as TABLE.
Otherwise, you cannot do the join atm with ksqlDB, because non-key FK table-table joins are not supported yet. There is a ticket for it though: https://github.com/confluentinc/ksql/issues/4424
As mentioned in given link Support non-key joins. #4424, I have fixed my issue by using collect_set.
But I think we need non-key join support to fix this issue properly.
Thanks.