GitHub Issue: Tombstone message in Table when filtering duplicate events · Issue #8145 · confluentinc/ksql · GitHub
Provide details of the setup you’re running
ksql> version
Version: 0.21.0-rc6
Outline your question
I am trying to filter duplicate events and end up seeing TOMBSTONE messages - something I was not expecting to see in the output.
Below is my setup
## create a stream so store vehicle location
CREATE STREAM VEHICLE_LOCATION (
VIN VARCHAR,
LOCATION_NAME VARCHAR
) WITH (
KAFKA_TOPIC = 'vehicle-location',
FORMAT = 'JSON',
PARTITIONS = 3
);
## insert few values into the stream
INSERT INTO VEHICLE_LOCATION(VIN, LOCATION_NAME) VALUES ('2G1WL54T4R9165225', 'DALLAS');
INSERT INTO VEHICLE_LOCATION(VIN, LOCATION_NAME) VALUES ('2G1WL54T4R9165225', 'DALLAS');
INSERT INTO VEHICLE_LOCATION(VIN, LOCATION_NAME) VALUES ('2G1WL54T4R9165225', 'DALLAS');
INSERT INTO VEHICLE_LOCATION(VIN, LOCATION_NAME) VALUES ('2G1WL54T4R9165225', 'HOUSTON');
INSERT INTO VEHICLE_LOCATION(VIN, LOCATION_NAME) VALUES ('2G1WL54T4R9165225', 'HOUSTON');
INSERT INTO VEHICLE_LOCATION(VIN, LOCATION_NAME) VALUES ('2G1WL54T4R9165225', 'HOUSTON');
## create a table to filter the duplicates
CREATE TABLE DETECTED_VEHICLE_LOCATION AS
SELECT
VIN AS KEY1,
LOCATION_NAME AS KEY2,
AS_VALUE(VIN) AS VIN,
AS_VALUE(LOCATION_NAME) AS LOCATION_NAME
FROM VEHICLE_LOCATION
GROUP BY VIN, LOCATION_NAME
HAVING COUNT(VIN) = 1;
## query for distinct values from the table
SELECT * FROM DETECTED_VEHICLE_LOCATION
WHERE VIN IS NOT NULL
EMIT CHANGES;
But the output contains TOMBSTONE messages. I am confused as to why we see “one” TOMBSTONE and who/what part of query is deleting the messages and why ? Also shouldn’t there be
- 2 TOMBSTONE messages per location
- Also, why does “WHERE VIN IS NOT NULL” not filter the output to not show the tombstone messages?
+------------------------------+------------------------------+------------------------------+------------------------------+
|KEY1 |KEY2 |VIN |LOCATION_NAME |
+------------------------------+------------------------------+------------------------------+------------------------------+
|2G1WL54T4R9165225 |DALLAS |2G1WL54T4R9165225 |DALLAS |
|2G1WL54T4R9165225 |DALLAS |<TOMBSTONE> |<TOMBSTONE> |
|2G1WL54T4R9165225 |HOUSTON |2G1WL54T4R9165225 |HOUSTON |
|2G1WL54T4R9165225 |HOUSTON |<TOMBSTONE> |<TOMBSTONE> |
And below is the topic that contains the messages including TOMBSTONE messages
PRINT 'DETECTED_VEHICLE_LOCATION' FROM BEGINNING LIMIT 6;
Key format: JSON or HOPPING(KAFKA_STRING) or TUMBLING(KAFKA_STRING) or KAFKA_STRING
Value format: JSON or KAFKA_STRING
rowtime: 2021/09/17 18:09:16.087 Z, key: {"KEY1":"2G1WL54T4R9165225","KEY2":"HOUSTON"}, value: {"VIN":"2G1WL54T4R9165225","LOCATION_NAME":"HOUSTON"}, partition: 0
rowtime: 2021/09/17 18:09:16.276 Z, key: {"KEY1":"2G1WL54T4R9165225","KEY2":"HOUSTON"}, value: <null>, partition: 0
rowtime: 2021/09/17 18:09:15.420 Z, key: {"KEY1":"2G1WL54T4R9165225","KEY2":"DALLAS"}, value: {"VIN":"2G1WL54T4R9165225","LOCATION_NAME":"DALLAS"}, partition: 1
rowtime: 2021/09/17 18:09:15.687 Z, key: {"KEY1":"2G1WL54T4R9165225","KEY2":"DALLAS"}, value: <null>, partition: 1
Much appreciated.