This question is about the observation that delete marker records (tombstones) in a compacted topic are not removed as expected.
I have the following initial situation (using Kafka 2.6):
1 topic with 1 partition
Offsets: Partition: 0; low: 2; high: 947; offset: 947; #(high-low): 945
Number of non-null records: 5
Number of all records: 270
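As a quick sanity check on the numbers above (pure arithmetic, no Kafka involved): the offset range spans 945 positions while only 270 records physically remain, so most offsets have already been compacted away, yet 265 of the remaining records are tombstones.

```shell
low=2
high=947
remaining=270
non_null=5

# Offsets covered by the partition (high - low):
span=$((high - low))
echo "$span"                      # 945

# Offsets already removed by compaction:
echo $((span - remaining))        # 675

# Tombstones still present among the remaining records:
echo $((remaining - non_null))    # 265
```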
Topic configuration is this:
cleanup.policy=compact
min.compaction.lag.ms=3600000
min.cleanable.dirty.ratio=0.5
segment.ms=3600000
delete.retention.ms=86400000
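For reference, the effective topic configuration can be double-checked with the kafka-configs tool that ships with the Kafka distribution (broker address, installation path, and topic name below are placeholders, not my actual values):

```shell
# Show the overridden configs for the topic; adjust path, broker, and topic name.
/opt/kafka/bin/kafka-configs.sh \
  --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --describe
```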
The log files of this partition are
43029 Nov 21 20:55 00000000000000000000.log
  326 Nov 22 12:14 00000000000000000937.log
  772 Nov 24 15:11 00000000000000000939.log
  561 Nov 24 16:17 00000000000000000941.log
  174 Nov 24 17:17 00000000000000000943.log
  772 Dec 17 11:38 00000000000000000944.log
  174 Dec 17 12:38 00000000000000000946.log
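In case it matters for the diagnosis: the per-offset record types listed further below were determined by dumping the segment files with the kafka-dump-log tool from the Kafka distribution, roughly like this (installation and data paths are placeholders):

```shell
# Print every record in a segment, including key, value (empty for tombstones),
# and timestamp; adjust the paths to your installation and log directory.
/opt/kafka/bin/kafka-dump-log.sh \
  --files /var/lib/kafka/data/my-topic-0/00000000000000000937.log \
  --print-data-log
```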
The oldest record (tombstone) is from 2021-06-21. The last offsets are (distinguished between tombstone and data record):
936 tombstone (delete marker)
937 tombstone (delete marker)
938 tombstone (delete marker)
939 data
940 data
941 tombstone (delete marker)
942 data
943 tombstone (delete marker)
944 data
945 data
946 tombstone (delete marker) 2021-12-17
All older records are tombstones. So there is a large number of tombstone records that have not been removed, although delete.retention.ms is one day. The *937.log files contain only tombstones, and the timestamps of these files are also very old.
My question is: why are the old tombstones not deleted? Trying to find an answer led me to the blog post “Kafka quirks: tombstones that refuse to disappear” and to Kafka issue KAFKA-8522.
But I am not sure whether my observation matches the post and the issue. In particular, changing
delete.retention.ms to 0 did not change anything on my system.
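For completeness, the change was applied as a dynamic topic config override, roughly like this (broker address, path, and topic name are again placeholders):

```shell
# Override delete.retention.ms at the topic level without restarting brokers.
/opt/kafka/bin/kafka-configs.sh \
  --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config delete.retention.ms=0
```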
Any explanation is very welcome, and hints on how to overcome this situation even more so, as we have high-throughput topics with lots of tombstone messages that seem to accumulate at the beginning of the log.
Thanks and kind regards