Hi folks,
I’m successfully pulling CDC data from SQL Server using Debezium, but when I print the topic in ksqlDB, I can see that ksqlDB is guessing at the key’s format:
print 'prismdata_cdc_prismdata.dbo.Affiliation' from beginning limit 1;
Key format: AVRO or HOPPING(KAFKA_STRING) or TUMBLING(KAFKA_STRING) or KAFKA_STRING
Value format: AVRO
rowtime: 2021/09/17 15:31:01.064 Z, key: {"Id": "25D87CEE-47AD-4612-91FD-0001A8DA43AB"}, value: {"Id": "25D87CEE-47AD-4612-91FD-0001A8DA43AB",
...
The problem is that when I create a table over this topic, like so:
CREATE TABLE IF NOT EXISTS AffiliationTest (RowKey VARCHAR PRIMARY KEY)
  WITH (VALUE_FORMAT='AVRO', KEY_FORMAT='KAFKA', KAFKA_TOPIC='prismdata_cdc_prismdata.dbo.Affiliation');
The RowKey and the Id are not identical, which breaks any joins that use this table.
select RowKey, Id from AffiliationTest emit changes limit 1;
+----------------------------------------------------------------+----------------------------------------------------------------+
|ROWKEY |ID |
+----------------------------------------------------------------+----------------------------------------------------------------+
|H25D87CEE-47AD-4612-91FD-0001A8DA43AB |25D87CEE-47AD-4612-91FD-0001A8DA43AB
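For context, the joins in question look roughly like the sketch below. EmployeeStream and its AffiliationId column are made-up names standing in for the other side of the join; only AffiliationTest comes from the example above. The join can't match because RowKey carries the extra leading character(s) while the other side holds the plain GUID.
-- Hypothetical stream-table join; EmployeeStream / AffiliationId are placeholders
SELECT e.AffiliationId, a.Id
FROM EmployeeStream e
JOIN AffiliationTest a ON e.AffiliationId = a.RowKey
EMIT CHANGES;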
I’m assuming this is a serialization issue that should be fixed with a key converter on the connector. If I add "key.converter": "org.apache.kafka.connect.storage.StringConverter" to the connector (a sketch of where that property sits is below the output), the key becomes a struct wrapping the Id:
+----------------------------------------------------------------+----------------------------------------------------------------+
|ROWKEY                                                          |ID                                                              |
+----------------------------------------------------------------+----------------------------------------------------------------+
|Struct{Id=25D87CEE-47AD-4612-91FD-0001A8DA43AB}                 |25D87CEE-47AD-4612-91FD-0001A8DA43AB
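For reference, here's roughly where that property sits. This is only a sketch in ksqlDB's CREATE SOURCE CONNECTOR form with a made-up connector name, not my actual config, and everything else is omitted:
-- Sketch only: connector name is a placeholder; the database.* and other
-- converter properties are left out
CREATE SOURCE CONNECTOR prismdata_cdc WITH (
  'connector.class' = 'io.debezium.connector.sqlserver.SqlServerConnector',
  -- ...connection and value converter properties as before...
  'key.converter'   = 'org.apache.kafka.connect.storage.StringConverter'
);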
Hoping that someone can explain what is happening and how to get the key in a state where I can do joins successfully.
Thanks in advance!