Join Stream-Table (ORDER-PROVIDER) when the stream does not have the PROVIDER'S ID but others informations

Hi, folks.

To introduce to my problem, first I will show how I could solve this on a relacional database:

But first: I know I just can join things (on KSQLDB) by its PRIMARY KEYS. And that’s my point.

SELECT
	*
FROM
	ORDERS O
INNER JOIN PROVIDER P ON
	O.PROV = P.PROVIDER_COD
	AND O.SUB_COD = P.SUB_COD;

Now let me explain:

I’m getting those data from MS SQL SERVER CDC drive. And according to this database design there’s no primary key any where. So, if I want to join those two tables I have to use two codes, the provider code and provider sub code .

I’ve done a lot, but I can’t figure out how to solve this. I’m almost thinking KSQLD can’t deal with data like this.

One thing I think would work out is setting a composite key on Kafka topic key, but I do not know how to do that.

Could, please, someone help me?

Thanks.

Not sure if I full understand. The title talks about stream-table join, but in the question you mention table-table joins (and that those only support PK-joins: btw: upcoming 0.19 release will include FK-table-table joins).

What is true is, that TABLES in ksqlDB always require a PK that must be stored in the message key in the topic. If your input data has not message key, you would pre-process it, by reading the topic as STREAM and set a key using PARTITION_BY. You can than read the result topic back as TABLE.

Hey… thanks for replying @mjsax .

Sorry for not myself clear.

In fact ORDER and PROVIDER are tables in MS SQL, but once they are in Kafka topics I’m creating a STREAM for ORDER and a TABLE for PROVIDER. (The query I’ve shown before is how I’d do it in MS SERVER).

If you wanna know how I get these data from MS SERVER to Kafka, check a this post I’ve created on stackoverflow apache kafka - KSQLDB - Getting data from debezium cdc source connector and joining Stream with Table - Stack Overflow .

I’m gonna try to explain better my question:

The PROVIDER TABLE does not have a proper PRIMARY KEY, but it has two important fields: provider code and provider sub code . Let’s say both fields could compose a composite primary key, but they are not. (But I’d like to know how to make them a composite primary key on a kafka’s topic row)

The ORDER STREAM also has both fields: provider code and provider sub code.

So, what do I want to?

I want to enrich the ORDER stream with data from PROVIDER table.

What is my problem?

I can’t join data from ORDER stream and PROVIDER table because it does not have a single key which is also present in ORDER stream, instead I have two fields (provider code and provider sub code).

My question:

There’s a way to join ORDER stream and PROVIDER table since I have to join it by using two fields? Can I turn this two fields into a single composite key and then perform the join?

Thanks

Thanks for the details! Atm ksqlDB does not allow to join a multiple columns. It’s a known gap that is already on our roadmap (the issue it, that ksqlDB in the past did not even support composite keys… now that we do support composite keys, we need to extend joins accordingly).

One workaround would be, to “merge” both column into a single column, eg, a VARCHAR: "<provider_code>-<producer_sub_code>" and use that as the PK of the table.

For the stream you don’t need to do anything, as you can specify the required key directly in the join: ON CONCAT(CAST(producer_code AS VARCHAR), '-', CAST(provider_sub_code AS VARCHAR)) = table.primary_key.

1 Like

Thanks, @mjsax . That sounds pretty nice.

Really happy for it. I’m gonna implement like that and wait for future improvements on this area.

Thank you.

1 Like

Hi, again @mjsax

I’m having a problem with the idea you gave me, I can create a concatenated field in the table, but I can’t make it the primary key of the table like you suggested.

I also tried to ‘merge’ the columns using SMT, but I wasn’t successful either.

Could you help me with this, please?

Not sure what you exactly tried?

SMT’s might actually work, but I am not familiar with the details.

If you want to do it with ksqlDB, you would need a “pre processing” step, by first reading the data as a STREAM to produce an output STREAM with the desired key. You can then create the TABLE over the output topic of the second STREAM.

1 Like

Great!!! Finally I did it.

For the purpose of documentation for others:

To rekey a STREAM like suggested by mjsax, you can follow this link:

https://kafka-tutorials.confluent.io/rekey-a-stream/ksql.html

1 Like