What happens if I increase `log.segment.bytes`

Hi,

We have a topic with a few months of retention and around 20 TB of data in Kafka. Because the default log.segment.bytes is roughly 1 GB and Kafka keeps a file descriptor for every segment, active or closed, this boils down to roughly 20,000 open file descriptors.

We noticed that things like broker restart times scale with the number of file descriptors. We were therefore thinking about increasing log.segment.bytes to something like 4 GB. However, searching online and in “Kafka: The Definitive Guide”, we only found the opposite case: lowering log.segment.bytes on low-volume topics.
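For reference, here is a rough sketch of the file-count arithmetic behind those numbers (values are approximate; each segment is actually backed by a .log file plus offset/time index files, so the real descriptor count per broker is a small multiple of the segment count):

```java
// Back-of-the-envelope estimate of segment counts for this topic.
public class SegmentCountEstimate {
    public static void main(String[] args) {
        long topicBytes    = 20_000_000_000_000L; // ~20 TB retained in the topic
        long defaultBytes  = 1_073_741_824L;      // default log.segment.bytes (~1 GB)
        long proposedBytes = 4L * 1_073_741_824L; // the proposed ~4 GB segment size

        System.out.println("segments at 1 GB: " + topicBytes / defaultBytes);  // ~18,600
        System.out.println("segments at 4 GB: " + topicBytes / proposedBytes); // ~4,700
    }
}
```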

Before doing this in production, I would therefore like to ask whether someone here already has experience with this or knows how Kafka would react to that.

We know from experience with lowering log.segment.bytes that this has no effect on existing, closed segments: they will remain 1 GB in size but eventually fall out of retention. We can also leave log.index.size.max.bytes untouched, as it should be large enough even for 5 GB segments according to the Strimzi blog post “Deep dive into Apache Kafka storage internals: segments, rolling and retention”.

However:

  • Is our reasoning correct that this will reduce the number of file descriptors, and that this will in turn improve cluster performance?
  • Will this heavily influence the memory footprint?
  • Are there other consequences or side effects we are not aware of?

Hi @maow

From my understanding and from the docs it should, yes.
I would expect at least a slightly smaller number of open files.

I think so; there will be more memory needed to keep all the files open.

One thing that came to my mind is Confluent Tiered Storage:
if you’d like to use that feature, 4 GB segments would take a bit more time to copy to a remote location.

In general, I think a good way to try this is to change log.segment.bytes on a “test topic”.
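For example, a minimal sketch with the Java AdminClient that creates such a test topic with a larger segment.bytes (broker address, topic name, partition and replica counts are placeholders):

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateSegmentSizeTestTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // The topic-level segment.bytes overrides the broker-wide log.segment.bytes
            // for this topic only, so the rest of the cluster stays untouched.
            NewTopic testTopic = new NewTopic("segment-size-test", 6, (short) 3)
                    .configs(Map.of(TopicConfig.SEGMENT_BYTES_CONFIG, "2000000000")); // ~2 GB
            admin.createTopics(List.of(testTopic)).all().get();
        }
    }
}
```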

hth,
michael


I know it’s an old thread,
but… I was wondering how I can set it higher than 1 GB.
The documentation says it’s an int value,
so 2 GB is the max.
What am I missing here?

You’re looking for how to increase log.segment.bytes?

best, michael

Correct.
I have a rate of more than 100 MB/s (much more), which causes too many open files.
I thought that if I increase segment.bytes to a higher value, I would end up with fewer open files.

Would love to hear about other ways to decrease the number of open files.

I decreased the retention to 1 day, which helped a lot, but this is only a temporary solution because I would like the retention to be longer.

You could increase log.segment.bytes at the broker level in server.properties, or set segment.bytes at the topic level.
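As a minimal sketch, the topic-level route could look like this with the Java AdminClient (broker address and topic name are placeholders); the broker-level alternative is a log.segment.bytes entry in server.properties:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class RaiseTopicSegmentBytes {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-topic"); // placeholder
            // segment.bytes is an int config, so Integer.MAX_VALUE (~2 GiB) is the ceiling.
            AlterConfigOp setSegmentBytes = new AlterConfigOp(
                    new ConfigEntry(TopicConfig.SEGMENT_BYTES_CONFIG,
                                    String.valueOf(Integer.MAX_VALUE)),
                    AlterConfigOp.OpType.SET);
            // Only segments rolled after this change use the new size; existing closed
            // segments keep their current size until retention removes them.
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setSegmentBytes))).all().get();
        }
    }
}
```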

how many partitions exist in your cluster?

best,
michael

Right, but segment.bytes is an int, so 2 GB is the max.

I am running a cluster of 10 machines, each with 5 SSD disks.
Therefore, each topic was defined with 50 partitions, matching the number of disks in the cluster.

OK, I see.
Might be worth a try,
though keep in mind that larger segments can lead to slower deletion of old data, since retention only removes whole, closed segments.

How are your systems configured, especially RAM, CPU, …?

I am running 10 m5.4xlarge instances (16 vCPUs, 64 GB RAM; each machine uses gp3 as its primary volume plus 5 st1 volumes, for a total of 15 TB).

Regarding my original question: how can I increase segment.bytes to a value higher than 2 GB?

Not sure if it’s possible.

Let me check and I’ll keep you posted.

best,
michael