We have a topic with a few months of retention and around 20 TB of data in Kafka. Because the default log.segment.bytes is 1 GiB and Kafka keeps a file descriptor for every segment (open or closed), this boils down to roughly 20k open file descriptors.
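As a back-of-envelope check, using the figures from above (the 3-files-per-segment factor is my assumption, since each segment also carries offset and time index files next to the .log file):

```shell
# Rough estimate: retained data divided by segment size gives the segment count.
total_bytes=$((20 * 1000 ** 4))       # ~20 TB of retained data
segment_bytes=$((1024 ** 3))          # default log.segment.bytes, 1 GiB
segments=$((total_bytes / segment_bytes))
# Each segment also has .index and .timeindex files, so the real open-file
# count can be roughly 3x the segment count.
fds=$((segments * 3))
echo "segments=$segments open_files~=$fds"
```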
We noticed things like broker restart times scaling with the number of file descriptors, so we are thinking about increasing log.segment.bytes to something like 4 GB. However, looking for sources online and in “Kafka: The Definitive Guide”, we only found the opposite case: lowering log.segment.bytes on low-volume topics.
Before doing this in production, I would therefore like to ask whether someone here already has experience with this or knows how Kafka would react to that.
We know from experience with lowering log.segment.bytes that this has no effect on existing, closed segments: they remain 1 GiB large but eventually fall out of retention. Also, we can leave log.index.size.max.bytes untouched, as it should be large enough for 5 GB segments according to the Strimzi blog post “Deep dive into Apache Kafka storage internals: segments, rolling and retention”.
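For what it's worth, the index sizing can be sanity-checked from the defaults (one 8-byte offset-index entry per log.index.interval.bytes of appended data; the default interval of 4096 and default log.index.size.max.bytes of 10485760 are assumptions on my side, so check your broker config):

```shell
# Offset-index space needed for one segment of the proposed size.
segment_bytes=$((4 * 1024 ** 3))          # proposed 4 GiB segment
index_interval=4096                       # default log.index.interval.bytes
entry_bytes=8                             # relative offset (4 B) + position (4 B)
index_bytes=$((segment_bytes / index_interval * entry_bytes))
echo "offset index for a 4 GiB segment: $index_bytes bytes"
# Comfortably under the log.index.size.max.bytes default of 10485760.
```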
However:
1. Is our reasoning correct that this will reduce the number of file descriptors, and will that in turn improve cluster performance?
2. Will this heavily influence the memory footprint?
3. Are there more consequences / side effects we are not aware of?
From my understanding and the docs: yes, it should. I would expect at least a slightly smaller number of open files.
I think so, since memory is needed to keep all those files open.
One thing that came to my mind is Confluent Tiered Storage: if you'd like to use that feature, 4 GB segments would take a bit more time to copy to a remote location.
In general, I think a good way to try this is to set the topic-level segment.bytes override on a “test topic”.
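A sketch of that per-topic override (topic name and bootstrap address are placeholders; note the topic-level property is segment.bytes, and since it is an int the value caps out at 2147483647):

```shell
# Override segment.bytes on a single test topic, leaving the broker-wide
# log.segment.bytes default untouched.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name segment-size-test \
  --add-config segment.bytes=2147483647
```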
I know it’s an old thread, but… I was wondering how I can set it higher than 1 GB.
The documentation says it’s an int value, so roughly 2 GB is the max.
What am I missing here?
correct.
I have a rate of more than 100 MB/s (much more), which causes too many open files. I thought that if I increase segment.bytes to a higher value I would get fewer open files.
Right, but segment.bytes is an int, so roughly 2 GB is the max.
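The cap comes from segment.bytes being a 32-bit signed int, so the hard limit is Integer.MAX_VALUE bytes:

```shell
# Largest value a signed 32-bit int can hold, and hence the segment.bytes cap.
max_segment_bytes=$((2 ** 31 - 1))
echo "$max_segment_bytes"     # just under 2 GiB
```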
I am running a cluster of 10 machines, each with 5 SSD disks. Therefore each topic was defined with 50 partitions, matching the number of disks in the cluster.