Foreign Key join - schema change

As I understand from the article "Real-Time Data Enrichment with Kafka Streams: Introducing Foreign-Key Joins", the foreign-key join algorithm hashes the left table's value.
My question is: what happens if the value schema of the left table changes? Can that influence the join?

We are using a foreign-key join between two tables in our code, and it has stopped working properly. When we update the right table (the accounts table), nothing is triggered, although the customer table contains a customer matching that account.
The problem goes away once we update the left table (the customer table).
I noticed that there was a schema change in the left table's value, and I wonder whether it may be the source of the problem.

fun joinCustomerAndAccount(
    customersTable: KTable<CustomerKey, CustomerAddressPhoneDrivingLicenseJoin>,
    accountsTable: KTable<AccountKey, Account>
): KTable<CustomerKey, CustomerAccountJoin> =
    customersTable.leftJoin(
        accountsTable,
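        // Foreign-key extractor: maps each customer to an AccountKey; customers without a userId fall back to AccountKey(-1).
        { customer -> AccountKey(customer.customer.userId.orElse(-1)) },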
        { customer, account ->
            CustomerAccountJoin.newBuilder().apply {
                this.customer = customer.customer
                this.address = customer.address.orElse(null)
                this.phone = customer.phone.orElse(null)
                this.drivingLicense = customer.drivingLicense.orElse(null)
                this.account = account
            }.build()
        },
        materializedAs(Stores.CUSTOMER_ACCOUNTS)
    )

Hi omritzem,

Apologies for the delayed response. I think that’s possible, but I’ll have to look into it a little bit to be sure.

-Bill


Hi @omritzem, Bill flagged me to check this out.

The hash function is based solely on the foreign key value extracted from the value of the left record. If this hasn’t changed, then the partition mapping shouldn’t change.

Can you elaborate on what you mean by the value schema of the left table changing, in particular for KTable<CustomerKey, CustomerAddressPhoneDrivingLicenseJoin>? Is it the primary key (CustomerKey) that changes? Did anything change around the AccountKey extracted from the customer?

I also take it that the right KTable<AccountKey, Account> remains completely unchanged in this scenario.

So long as the foreign key value AND the primary key (and primary key schema) haven’t changed, I don’t think that a left-hand value schema change could affect the join. Any more information you have would be helpful in figuring this out, thanks.


I think it’s about the hashing? See [KAFKA-13386] Foreign Key Join filtering out valid records after a code change / schema evolved (ASF JIRA).
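
For anyone landing here later, here is a minimal sketch of why a hash over the serialized left value can stop matching after a schema change. It assumes, as described in the question and in the ticket above, that the join compares a hash of the left value's serialized bytes taken earlier with a hash of the current value. Everything in it is hypothetical: serializeV1 and serializeV2 stand in for the old and new value serializers, and SHA-256 is only a stand-in hash, not necessarily the one Kafka Streams uses.

```kotlin
import java.security.MessageDigest

// Hypothetical "old schema" serializer for a customer value.
fun serializeV1(userId: Long, name: String): ByteArray =
    "$userId|$name".toByteArray(Charsets.UTF_8)

// Hypothetical "new schema" serializer: an optional field was added.
// The logical record is the same, but the serialized bytes differ.
fun serializeV2(userId: Long, name: String, nickname: String? = null): ByteArray =
    "$userId|$name|${nickname ?: ""}".toByteArray(Charsets.UTF_8)

// Stand-in value hash (assumption: any hash over the serialized bytes behaves the same way here).
fun valueHash(bytes: ByteArray): List<Byte> =
    MessageDigest.getInstance("SHA-256").digest(bytes).toList()

fun main() {
    // Hash recorded while the old serializer was in use.
    val hashRecordedEarlier = valueHash(serializeV1(42, "Ann"))
    // Hash recomputed from the same logical record with the new serializer.
    val hashOfCurrentValue = valueHash(serializeV2(42, "Ann"))

    // The comparison fails even though the customer itself did not change,
    // so a join result guarded by this check would be dropped.
    println(hashRecordedEarlier == hashOfCurrentValue) // false
}
```

If this is the mechanism at play, it would also be consistent with the join recovering once the left table is updated, since that update would record a fresh hash computed from the new bytes.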

Thanks @mjsax for referencing this issue.

Ah, there it is. Thanks for the save, @mjsax!