Slow performance on Kafka

How is everyone's performance on Kafka?

I sometimes see very high producer latency in our apps, like 5-10 seconds!

This is an internal benchmark.

I'm running Kafka 3.10 in KRaft mode on a 3-node k8s cluster.

kafka-producer-perf-test.sh --topic test --num-records 1000000 \
--throughput -1 --producer-props bootstrap.servers=localhost:9092 \
batch.size=1000 acks=1 linger.ms=100000 buffer.memory=4294967296 \
compression.type=text request.timeout.ms=300000 --record-size 1000
15825 records sent, 3163.7 records/sec (3.02 MB/sec), 2167.5 ms avg latency, 3632.0 ms max latency.
15872 records sent, 3159.9 records/sec (3.01 MB/sec), 5220.2 ms avg latency, 8348.0 ms max latency.
50704 records sent, 10130.7 records/sec (9.66 MB/sec), 10231.6 ms avg latency, 12277.0 ms max latency.
38640 records sent, 7728.0 records/sec (7.37 MB/sec), 14150.6 ms avg latency, 16483.0 ms max latency.
53936 records sent, 10761.4 records/sec (10.26 MB/sec), 17935.6 ms avg latency, 20375.0 ms max latency.
59504 records sent, 11877.0 records/sec (11.33 MB/sec), 21704.8 ms avg latency, 24171.0 ms max latency.
48016 records sent, 8287.2 records/sec (7.90 MB/sec), 25574.3 ms avg latency, 28991.0 ms max latency.
66736 records sent, 13347.2 records/sec (12.73 MB/sec), 31376.3 ms avg latency, 32609.0 ms max latency.
108944 records sent, 17183.6 records/sec (16.39 MB/sec), 33180.1 ms avg latency, 36541.0 ms max latency.
110240 records sent, 22048.0 records/sec (21.03 MB/sec), 38206.1 ms avg latency, 39271.0 ms max latency.
229248 records sent, 45849.6 records/sec (43.73 MB/sec), 39345.1 ms avg latency, 39520.0 ms max latency.
63360 records sent, 11212.2 records/sec (10.69 MB/sec), 39541.4 ms avg latency, 43723.0 ms max latency.
1000000 records sent, 15269.040493 records/sec (14.56 MB/sec), 32191.74 ms avg latency, 43900.00 ms max latency, 37720 ms 50th, 43774 ms 95th, 43889 ms 99th, 43897 ms 99.9th.

Wow, this is a very low batch.size, a very high buffer.memory, and a ludicrous linger.ms. The linger.ms setting basically allows the producer to wait up to 100 seconds for a batch to fill, and buffer.memory allows up to 4 GB of RAM for buffering messages in the meantime. To reduce the maximum latency, lowering linger.ms to something like 50 should help a lot.
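As a rough sketch of what a rerun with calmer settings could look like: the snippet below only prints the command unless RUN=1 is set. The linger.ms=50 value is the suggestion above. batch.size=16384 and buffer.memory=33554432 are the producer defaults, and compression.type=none is my assumption for a valid codec (as far as I know, "text" is not one of the accepted values). None of these numbers come from the original post.

```shell
# Dry run by default: prints the command. Set RUN=1 to run it against a real cluster.
# Assumed values (not from the original post): the producer defaults for batch.size
# and buffer.memory, plus compression.type=none.
PROPS="bootstrap.servers=localhost:9092 acks=1 linger.ms=50 batch.size=16384 buffer.memory=33554432 compression.type=none"
PERF_CMD="kafka-producer-perf-test.sh --topic test --num-records 1000000 --record-size 1000 --throughput -1 --producer-props $PROPS"

if [ "${RUN:-0}" = "1" ]; then
  # Unquoted on purpose so the string splits into separate arguments.
  $PERF_CMD
else
  echo "$PERF_CMD"
fi
```

Comparing the summary line of such a run with the one above should show whether linger.ms was the main contributor.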

The thing that still puzzles me a bit, however, is how it can take up to 40 seconds to produce a batch with a batch.size of only 1000 bytes. How much RAM does your machine have? Is it swapping, maybe?

I would expect much better performance with the default parameters, and then even more throughput but higher latency with the ones recommended in "How to optimize your Kafka producer for throughput using Confluent".
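One possible explanation, offered as a guess rather than a diagnosis: as far as I understand the perf tool, latency is measured from the send() call until the acknowledgement callback fires. With --throughput -1 and a buffer larger than the whole test payload, records can simply queue in the producer's memory for most of the run. The arithmetic below uses only numbers from the command and its summary line.

```shell
# Numbers taken from the benchmark command and its summary line above.
NUM_RECORDS=1000000
RECORD_SIZE=1000            # bytes of payload per record
BATCH_SIZE=1000             # bytes
BUFFER_MEMORY=4294967296    # bytes
RECORDS_PER_SEC=15269       # overall rate from the summary line, rounded down

TOTAL_BYTES=$((NUM_RECORDS * RECORD_SIZE))
RUN_SECONDS=$((NUM_RECORDS / RECORDS_PER_SEC))
# Ignores per-record overhead, so this is an upper bound on payloads per batch.
MAX_RECORDS_PER_BATCH=$((BATCH_SIZE / RECORD_SIZE))

echo "total payload: $TOTAL_BYTES bytes, buffer.memory: $BUFFER_MEMORY bytes"
if [ "$TOTAL_BYTES" -lt "$BUFFER_MEMORY" ]; then
  echo "the whole test fits in the buffer, so send() never has to block"
fi
echo "approximate run time: ${RUN_SECONDS}s"
echo "payloads per batch (upper bound): $MAX_RECORDS_PER_BATCH"
```

If that reading is right, each record travels in its own batch, and a record enqueued early can wait behind hundreds of thousands of single-record requests, which would fit the latency climbing steadily over a run of roughly a minute.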

If you try it, I would be happy about an update. :slight_smile:

I'll try setting linger.ms to 50. Each node has about 2 GB of RAM, so 6 GB for the cluster.

This is my current config (I run Kafka in KRaft mode on k8s).

What would be the best thing to do or change?
We don't send many bulk messages, mostly single small messages. Our current latency/duration is around 150 ms per job. Could this be better, or is this good performance?

sed -e "s+^node.id=.*+node.id=$NODE_ID+" \
-e "s+^controller.quorum.voters=.*+controller.quorum.voters=$CONTROLLER_QUORUM_VOTERS+" \
-e "s+^listeners=.*+listeners=$LISTENERS+" \
-e "s+^advertised.listeners=.*+advertised.listeners=$ADVERTISED_LISTENERS+" \
-e "s+^listener.security.protocol.map=.*+listener.security.protocol.map=$LISTENER_SECRUITY_PROTOCOOL_MAP+" \
-e "s+^log.dirs=.*+log.dirs=$SHARE_DIR/$NODE_ID+" \
-e "s+^num.replica.fetchers=.*+num.replica.fetchers=3+" \
-e "s+^replication.factor=.*+replication.factor=2+" \
-e "s+^min.insync.replicas=.*+min.insync.replicas=2+" \
-e "s+^num.partitions=.*+num.partitions=3+" \
-e "s+^offsets.topic.replication.factor=.*+offsets.topic.replication.factor=2+" \
-e "s+^linger.ms=.*+linger.ms=20+" \
-e "s+^batch.size=.*+batch.size=150000+" \
-e "s+^num.partitions=.*+num.partitions=3+" \
-e "s+^num.io.threads=.*+num.io.threads=12+" \
-e "s+^num.network.threads=.*+num.network.threads=8+" \
-e "s+^auto.create.topics.enable.*+auto.create.topics.enable=false+" \
-e "s+^acks=.*+acks=1+" \
/opt/kafka/config/kraft/server.properties > server.properties.updated \
&& mv server.properties.updated /opt/kafka/config/kraft/server.properties
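A side note on the script above, with a small demo. Substitutions of the form s+^key=.*+key=value+ only rewrite lines that already exist in the file, so a key that is missing from server.properties is silently skipped. As far as I know, linger.ms, batch.size and acks are producer (client) settings rather than broker settings, and the broker-side default for replication is named default.replication.factor, so those lines are unlikely to change anything on the broker. The scratch file contents below are made up for the demo.

```shell
# Scratch file standing in for server.properties (contents invented for this demo).
TMP_FILE=$(mktemp)
printf 'node.id=1\nnum.partitions=1\n' > "$TMP_FILE"

# Same substitution style as the script above:
# num.partitions exists and is rewritten, linger.ms does not exist and nothing is added.
RESULT=$(sed -e "s+^num.partitions=.*+num.partitions=3+" \
             -e "s+^linger.ms=.*+linger.ms=20+" "$TMP_FILE")

echo "$RESULT"
rm -f "$TMP_FILE"
```

Appending missing keys explicitly (for example with echo "key=value" >> file) would be one way around this for settings that really are broker settings.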

Can someone elaborate on what the average latency means here?

Is that the average time between the createTime and the logAppendTime of a message?

this might be helpful

So I tweaked and tuned some random settings to see if it would help, and for some it did. As you can see, we are getting some latencies below 100 ms, but also some VERY HIGH ones, above 1 second.

Any ideas, tips, or solutions for how to trace/debug this and find out what's happening?

@mmuehlbeyer I tried to look into how to set it up, but I couldn't figure it out …

We are running a local k8s with a 3-node Kafka 3.30 cluster.

Hmm, I would also recommend this excellent blog post by @danicafine and Nikoleta Verbeck.


That was a REALLY nice blog post, I'm going to go through it. My current issue is that I can't seem to get JMX working inside my k8s Kafka cluster … I used to have it working but disabled it, and now I can't figure out the right config to make it work again. As soon as it's up, I can start fine-tuning and see what is working for Kafka and what is not.

I'm trying to configure the producer config and the consumer config. I'm running Kafka in KRaft mode on k8s, and wherever I try to set the configs, they don't seem to apply. Is it not possible to define them when running Kafka in k8s? I tried in the env and in the server.properties file when creating the cluster, but with no effect.

Another thing: when my app connects to the Kafka cluster for the first time, latency is around 1.2 seconds, and then the rest of the requests are around 128 ms.

I've attached the log from Kafka.
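For what it's worth, producer and consumer settings are read by the client that connects to the cluster, not by the broker, so they would normally go into the application's client configuration or into a properties file passed to the CLI tools. A minimal sketch for the perf tool, reusing the linger.ms, batch.size and acks values from the sed script earlier in the thread; the temp file, topic name and record counts are placeholders, and the command is only printed here, not run.

```shell
# Client-side settings live in a file the CLI tool reads, not in server.properties.
# The temp file is left in place so the printed command can be run afterwards.
PRODUCER_CONFIG=$(mktemp)
cat > "$PRODUCER_CONFIG" <<'EOF'
bootstrap.servers=localhost:9092
acks=1
linger.ms=20
batch.size=150000
EOF

# Printed only; topic name and record counts are placeholders.
CMD="kafka-producer-perf-test.sh --topic test --num-records 100000 --record-size 1000 --throughput -1 --producer.config $PRODUCER_CONFIG"
echo "$CMD"
```

For the application itself, the same keys would be set wherever its Kafka client is constructed.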

Some questions:

  • Where do your producer and consumer run?

  • How do you start them?

  • How do you deploy Kafka on k8s?

best,
michael

I've uploaded my config for k8s, which should answer all the questions.

Hi,

Thanks. One open question:
How do you start your producer and consumer, and where?

I didn’t get that from your deploy.sh.

How I start them? As in, how I produce and consume messages from the application using Kafka?

Basically, you start with the CLI tools.

How did you gather the last benchmarks you’ve posted above, though?

I used the CLI tools from the Kafka binary distribution.

OK, so the Kafka CLI tools are running on your local machine and the Kafka cluster is somewhere in k8s, right?