Hi, How can I remove duplicate records in a kstream or ktable based on the values of two fields in the value portion of key, value pair. The value is a deserialized json record. I want to use two fields from this record to do a group by operation. I did try it with one of the fields but it does not seem to work. The resulting ktable still has all the duplicate rows.
Do I get it right.
Messages:
msg1 - value: {“fld1”:“val1”, “fld2”:“val2”,“fld3”:“foo”}
msg2 - value: {“fld1”:“val1”, “fld2”:“val2”,“fld3”:“bar”}
Goal: ignore msg2, because fld1 and fld2 are identical to msg1?
Did I understand that correct?
Do both records share the same key?
No something like this,
key1, value1
key1, value2
key2, value1
key1, value2
key3, value1
key2, value1
and I want
key1, value1
key1, value2
key2, value1
key3, value1
Figured it out. I implemented a key, value state store to do the deduplication. Works like a charm.
There is actually an example for de-duplication on GitHub: kafka-streams-examples/EventDeduplicationLambdaIntegrationTest.java at a2962a8e9d01477390c60ec6ba5cd1ee9ae8d880 · confluentinc/kafka-streams-examples · GitHub
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.