When data retention for a Kafka topic is configured both by time and by size, Kafka will purge old data* as soon as either of the two limits is exceeded.
- If the size limit is hit but the time limit is not, purging of old data* begins based on the size limit.
- If the time limit is hit but the size limit is not, purging of old data* begins based on the time limit.
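As an illustration, both limits can be set per topic; the values below are example numbers (1 day and 50 MB), not defaults. Note that `retention.bytes` applies per partition, not per topic:

```properties
# Example topic-level retention settings:
# keep data for at most 1 day (86,400,000 ms)...
retention.ms=86400000
# ...and at most 50 MB (52,428,800 bytes) per partition
retention.bytes=52428800
```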
So suppose the size limit is 50 MB and the time limit is 1 day, and we ingest roughly a gigabyte of data into the topic within a matter of minutes — how is that scenario handled?
In that case the broker will begin purging data from inactive segments, because 1,000 MB of incoming data far exceeds the configured size limit of 50 MB. The time limit of 1 day does not factor into the equation here, because the size limit is hit first.
*Data is purged only from inactive (older) segments; the currently active segment is never purged. In Kafka, a topic-partition is actually composed of a sequence of so-called segment files. The active segment of a topic-partition is the one segment file to which incoming data is currently being written (which is also the newest segment for that partition).
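Because only inactive segments are eligible for purging, the segment-rolling settings determine how quickly data becomes purgeable at all. A sketch with example (non-default) values:

```properties
# Roll to a new segment once the active one reaches 10 MB...
segment.bytes=10485760
# ...or once it is 1 hour old, whichever comes first
segment.ms=3600000
```

Smaller segments roll (and thus become inactive) sooner, so retention limits are enforced with finer granularity, at the cost of more files per partition.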