Topic cleanup delete policy

I created a topic with the following configurations -

cleanup.policy = delete
segment.bytes = 16328
retention.ms = 300000

After inserting 500 messages I expected them to be deleted after 5 minutes; however, this was not the case. The messages were deleted after 8 minutes.
I then created another topic with retention.ms = 120000 and observed that the same 500 messages were deleted after 5 minutes.

Could anyone please clarify where the additional 3 minutes are being added on top of the retention.ms config specified at topic creation?

Great question! And one that gets asked fairly often.

It’s important to remember with the log.cleaner that the cleaning (deletes) doesn’t happen at a deterministic time. It won’t happen exactly when you expect: retention.ms is a minimum, so messages will stay at least that long and only then become eligible for deletion. For example, with retention.ms=300000 a record is guaranteed to live for 5 minutes, but it may live noticeably longer before it is actually removed.

Another factor is that the active segment, the segment currently being written to, will not be cleaned. A segment roll has to happen for the log.cleaner to begin the cleanup/deletion process.
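If you want closed segments to become eligible sooner without shrinking segment.bytes, a time-based roll can help. A minimal sketch with hypothetical values (segment.ms is the standard topic-level setting for time-based rolls):

```properties
# Hypothetical topic configs: force the active segment to roll on a timer,
# so closed segments become eligible for deletion even under low write volume.
segment.ms=60000        # roll the active segment after at most 1 minute
retention.ms=300000     # records are then kept for at least 5 minutes
```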

Given that you set segment.bytes artificially low (small note: with segments that small you can easily cause performance issues), you should check your logs for when the segment rolled relative to when the data was cleaned up. That probably explains the extra 3 minutes.

In addition to this, the log.cleaner could be slower because of load; log.cleaner.io.max.bytes.per.second is the broker setting that governs the throughput of the log cleaner threads. log.cleaner.backoff.ms is another suspect; you would need to check your broker configuration to see whether it has been set higher.
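For reference, these are broker-level settings (server.properties). The values below are my understanding of the usual Kafka defaults, shown only as a sketch of what to check, not as recommendations:

```properties
# Broker-side settings that affect cleanup timing (values shown are the defaults).
log.cleaner.backoff.ms=15000                                 # how long cleaner threads sleep when there is nothing to clean
log.cleaner.io.max.bytes.per.second=1.7976931348623157E308   # effectively unbounded cleaner I/O
log.retention.check.interval.ms=300000                       # how often the retention (delete) check runs
```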


Thanks @mitchell-h for your prompt and detailed response. It clarifies some of my doubts. I do have a couple of follow-up questions:

  1. The delay of +3 minutes (or some other non-deterministic amount) in deletion is not observed when cleanup.policy = compact,delete and I send the same 500 messages.

  2. The segment.bytes=16328 config was just for testing purposes, to observe the inner workings. Regarding your point, “Another factor is that the active segment, the segment currently being written to, will not be cleaned”: even when there was only one segment file (.log file), it got cleaned up, but only after a certain duration of inactivity (4-6 minutes in our case). When segment.bytes was reduced so that multiple log segments were created and rollovers kept occurring, the log.cleaner became active and cleaned up more frequently than with a single segment file. Is this a correct interpretation of what you mentioned in your reply?

To reiterate, I am doing these tests just to understand better and not because of having any doubts about the internal workings! :slight_smile:

  1. You can make compaction fairly deterministic using the max.compaction.lag.ms configuration from KIP-354: Add a Maximum Log Compaction Lag - Apache Kafka - Apache Software Foundation.

note: one of the cooler features of KIP-354 is that it will force the active segment to roll so it can be compacted.
  2. Yes.
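As a sketch, the KIP-354 bound from point 1 is a per-topic setting. The values below are hypothetical, just to show the shape:

```properties
# Hypothetical topic configs: bound how long a record can stay uncompacted.
cleanup.policy=compact
max.compaction.lag.ms=60000   # a record must become eligible for compaction within ~1 minute
min.compaction.lag.ms=0       # no lower bound before a record may be compacted
```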


@mitchell-h Thanks a lot for your response. The info in KIP-354 was helpful. I do have a last couple of queries:

  1. You mentioned that KIP-354 has a feature that compacts active segments; however, in my testing I observed that this was not the case. Here are my topic configurations:
    cleanup.policy=compact
    segment.bytes=16328000
    min.compaction.lag.ms=10000
    max.compaction.lag.ms=15000
    min.cleanable.dirty.ratio=0.01
    delete.retention.ms=5000
    I sent 500 messages, waited for 5 seconds and sent the same 500 messages again. I also observed that all the messages were stored in the same .log file. Is there anything I’m missing?

  2. I’m still not sure why the hybrid cleanup policy deletes messages instantly while the normal delete cleanup policy does not. Here are the topic configs I set for the hybrid policy:
    cleanup.policy=compact,delete
    segment.bytes=16328
    retention.ms=5000
    I sent 500 messages, which got deleted within a couple of seconds. Is there any config causing this that I’m not aware of?