Is it possible to create a new active segment without writing to a partition?

Hi,

For GDPR reasons, we remove records from many of our long-lived topics by configuring the topics to compact and delete (cleanup.policy=compact,delete) and producing tombstone records for the relevant keys. The configuration and mechanics of compaction are clear to me, and this setup works in most cases.

However, compaction is never allowed to run on the active segment. And regardless of how segment roll is configured, it appears that a new segment will not become active unless a record is written to the partition. For topics/partitions with low or perhaps even no traffic, this means that compaction will in fact never run on the latest records, and they linger indefinitely.

Is it in fact impossible to create a new, empty active segment? If so, I guess we need to set up some sort of monitoring of all topics and generate dummy records where appropriate.
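
A minimal sketch of the selection step for such monitoring, in Python. Everything here is hypothetical: the function name, the (topic, partition) tuples, and the assumption that the timestamp of the newest record per partition has already been collected somehow. It only decides which partitions look idle; it does not talk to Kafka.

```python
from typing import Dict, List, Tuple

# (topic, partition) pairs; a hypothetical representation for this sketch.
TopicPartition = Tuple[str, int]


def partitions_needing_dummy(
    last_record_ts_ms: Dict[TopicPartition, int],
    now_ms: int,
    segment_ms: int,
) -> List[TopicPartition]:
    """Return partitions whose newest record is older than segment_ms."""
    return sorted(
        tp for tp, ts in last_record_ts_ms.items() if now_ms - ts > segment_ms
    )


last_seen = {("foo", 0): 1_000, ("foo", 1): 9_500, ("bar", 0): 2_000}
idle = partitions_needing_dummy(last_seen, now_ms=10_000, segment_ms=5_000)
print(idle)  # [('bar', 0), ('foo', 0)]
```

Note that a dummy record would have to be sent to the specific partition, and that this naive check would flag an idle partition again once the dummy record itself is older than segment.ms.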

Thankful for any input.

What configuration setting are you using to control segment roll? AFAIK, you can force a log to roll after a given period of time using segment.ms, which it seems to me should satisfy your requirement.

Hi,

Thanks for your reply. This setting is indeed used to control how often a segment should roll. However, after those milliseconds pass and a new segment is created, it won’t become active until a record is written to it. Or perhaps whether a new segment should be created (i.e. whether enough ms have passed) is only tested when a new record is written. I’m not familiar enough with the internals to know. Behaviorally it is all the same. When I'm back at the computer I can post an example of what I mean to clarify further.

Consider the following log:

$ kcat -b kafka:9092 -t foo -C -K:
1:hello
2:world
1:

The old record for key 1 (1:hello) won’t be compacted away unless I publish a dummy message after the tombstone (1:null):

# `segment.ms` has passed
$ echo "3:compact" | kcat -b kafka:9092 -t foo -P -K:
$ kcat -b kafka:9092 -t foo -C -K:
2:world
1:
3:compact

I investigated further and learned that Kafka rolls a segment when a new record’s timestamp differs from the timestamp of the first record in the segment by more than segment.ms. So, if no new records are appended, the active segment won’t be rolled.
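
To illustrate that behaviour, here is a minimal Python sketch of the time-based roll check. It is a simplified model, not Kafka's actual code: the names Segment, Log and append are made up for illustration, and size-based rolling and jitter are ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """Hypothetical stand-in for a log segment: (timestamp_ms, key, value) records."""
    records: List[Tuple[int, str, Optional[str]]] = field(default_factory=list)


class Log:
    """Toy model of one partition; NOT Kafka's real implementation."""

    def __init__(self, segment_ms: int) -> None:
        self.segment_ms = segment_ms
        self.segments = [Segment()]  # the last segment is the active one

    @property
    def active(self) -> Segment:
        return self.segments[-1]

    def append(self, timestamp_ms: int, key: str, value: Optional[str]) -> None:
        # The roll check happens only here, on append: compare the incoming
        # record's timestamp with the first record of the active segment.
        if self.active.records:
            first_ts = self.active.records[0][0]
            if timestamp_ms - first_ts > self.segment_ms:
                self.segments.append(Segment())
        self.active.records.append((timestamp_ms, key, value))


log = Log(segment_ms=1000)
log.append(0, "1", "hello")
log.append(10, "2", "world")
log.append(20, "1", None)         # tombstone for key 1
# However much wall-clock time passes, nothing rolls without an append.
print(len(log.segments))          # 1
log.append(5000, "3", "compact")  # dummy record, sent after segment.ms
print(len(log.segments))          # 2
```

In this model the tombstone only ends up in a closed segment, and so becomes eligible for compaction, once the dummy record arrives.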

Thanks for the help, Dave! A somewhat unfortunate conclusion, but now I know and can plan accordingly.

Thanks again!
