Kafka: error publishing request: Failed to allocate memory within the configured max blocking time 100 ms

guneetbh · 16 February 2021 08:29

I am new to Kafka and I am getting this error in my producer: kafka: error publishing request: Failed to allocate memory within the configured max blocking time 100 ms.
What settings do I need to check?
Why do I get this error? I would like to understand whether it is a configuration issue on the producer (so fine-tuning is needed) or a resource issue (JVM).
If this error is related to the network, what steps should I take or which logs should I check to confirm that the network is the cause?

Thanks in advance!!

dave · 22 February 2021 15:35

Producer config settings that specifically relate to this error are max.block.ms and buffer.memory.

The error occurs when a producer send blocks for longer than max.block.ms because the additional buffer memory it needs would exceed buffer.memory. That memory is being held by existing message batches that have not yet been sent and acknowledged. The backlog can come from producer batching behavior, controlled by linger.ms and batch.size, or from producer request behavior, controlled by max.in.flight.requests.per.connection. It can also come from the brokers' ability to process the requests they receive. These requests include those from producers and consumers, as well as replica fetch requests from other brokers.
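To make the mechanism concrete, here is a minimal sketch of a bounded buffer with a timed wait. It is a toy model, not Kafka's actual buffer pool: the class and method names (BoundedBufferSketch, tryAllocate, release) are made up for illustration, a semaphore stands in for buffer.memory, and the wait limit stands in for max.block.ms.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Toy model of a bounded producer buffer. This is NOT Kafka's implementation;
 * it only illustrates "wait up to maxBlockMs for free bytes, then fail".
 */
final class BoundedBufferSketch {
    private final Semaphore freeBytes; // plays the role of buffer.memory
    private final long maxBlockMs;     // plays the role of max.block.ms

    BoundedBufferSketch(int bufferMemoryBytes, long maxBlockMs) {
        this.freeBytes = new Semaphore(bufferMemoryBytes, true);
        this.maxBlockMs = maxBlockMs;
    }

    /** Reserve space for one record; returns false if the wait times out. */
    boolean tryAllocate(int recordBytes) {
        try {
            return freeBytes.tryAcquire(recordBytes, maxBlockMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    /** Called when an earlier batch completes and its bytes become free again. */
    void release(int bytes) {
        freeBytes.release(bytes);
    }

    public static void main(String[] args) {
        BoundedBufferSketch buffer = new BoundedBufferSketch(2_000, 100);
        System.out.println(buffer.tryAllocate(900)); // fits
        System.out.println(buffer.tryAllocate(900)); // fits
        // Nothing has been released yet (the "broker" is slow),
        // so this waits about 100 ms and then fails.
        System.out.println(buffer.tryAllocate(900));
        buffer.release(900);                         // an earlier batch completes
        System.out.println(buffer.tryAllocate(900)); // fits again
    }
}
```

In this model the third allocation fails only because nothing was released in time, which is the same shape as the real error: either sends are arriving faster than batches drain, or the wait limit is very short (100 ms in the reported message).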

Bottom line, there is not a simple answer. More detail is needed. Confluent Control Center is one source for this detail. JMX metrics are another.
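As a rough sketch of the JMX route, the snippet below reads a few buffer-related producer metrics from the platform MBean server. It assumes it runs inside the same JVM as the Java producer; reading from a remote process would need a JMX connector instead. The class name ProducerBufferMetrics is hypothetical, and the attribute list is only a starting selection.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

final class ProducerBufferMetrics {
    // Producer-level metrics that relate to buffer exhaustion.
    static final String[] ATTRIBUTES = {
        "buffer-available-bytes",
        "buffer-total-bytes",
        "bufferpool-wait-ratio",
        "record-queue-time-avg"
    };

    /** Pattern matching the producer metrics MBean of every client.id in this JVM. */
    static ObjectName pattern() {
        try {
            return new ObjectName("kafka.producer:type=producer-metrics,client-id=*");
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Prints the selected attributes and returns how many producer MBeans were found. */
    static int printBufferMetrics() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<ObjectName> names = server.queryNames(pattern(), null);
        for (ObjectName name : names) {
            for (String attribute : ATTRIBUTES) {
                try {
                    Object value = server.getAttribute(name, attribute);
                    System.out.println(name.getKeyProperty("client-id") + " " + attribute + " = " + value);
                } catch (JMException e) {
                    // The attribute may be missing on some client versions.
                    System.out.println(name + " " + attribute + " unavailable: " + e);
                }
            }
        }
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println("producers found: " + printBufferMetrics());
    }
}
```

A buffer-available-bytes value that stays near zero, or a bufferpool-wait-ratio well above zero, would suggest that sends are regularly waiting for buffer space.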

Additional info:

Producer configs - Kafka producer configuration reference | Confluent Documentation

Broker configs - Kafka broker and controller configuration reference | Confluent Documentation

Monitoring Kafka metrics - Monitoring Kafka with JMX | Confluent Documentation

guneetbh · 22 February 2021 18:27

Thanks Dave for your explanation. I will go through the references.

I was wondering about ‘producer send is blocked longer than max.block.ms because the additional buffer memory needed would exceed buffer.memory’.
Could this problem on the producer be caused by one of the brokers in the cluster having a problem, so that messages are not being processed, which results in the ‘Failed to allocate memory…’ error on the producer side?
Please suggest.

dave · 22 February 2021 19:11

If the topic replication factor is greater than 1, then if the broker on which the leader replica is located fails, one of the follower replicas will be elected the new leader. This might mean a short interruption for the producer but the default producer and broker configuration settings should be fine with a properly sized cluster.

If the topic replication factor is equal to 1, then the partition cannot be produced to until the broker is recovered. In this case though, I believe a different exception would occur, related to the partition leader being unavailable.

guneetbh · 23 February 2021 05:53

Actually, the error that occurred initially, and the major failure, was:
kafka: error publishing request: Expiring 2 record(s) for topic-8: 5461 ms has passed since last append
[This message comes from the Kafka code. It says two records expired while writing to partition 8 of the topic.]

After request.timeout.ms was increased to the default of 30000 ms, the ‘Failed to allocate memory…’ error started showing up.
So what if the broker hosting partition 8 has some issue writing messages, which in turn causes the buffer on the producer side to fill up? By the way, the replication factor is 3 with acks=all.

dave · 23 February 2021 17:32

Have a look at this thread on Stack Overflow. It might help.

dave · 23 February 2021 18:30

A co-worker of mine … who is somewhat of a Kafka wizard … pointed out to me that it seems you are running on a Kafka release prior to AK 2.1.0. Kafka now has a delivery.timeout.ms setting that controls the producer retry timeout behavior.
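For anyone on 2.1.0 or later, here is a small sketch of how the three timeout-related settings fit together. It relies on the rule in the Kafka producer documentation that delivery.timeout.ms should be equal to or larger than linger.ms plus request.timeout.ms. The class and method names are hypothetical, the values are illustrative rather than recommendations, and none of this applies to a pre-2.1.0 client, which has no delivery.timeout.ms.

```java
import java.util.Properties;

final class DeliveryTimeoutSketch {
    /**
     * Builds the timeout-related producer properties and rejects combinations
     * where delivery.timeout.ms is smaller than linger.ms + request.timeout.ms.
     */
    static Properties producerTimeouts(int lingerMs, int requestTimeoutMs, int deliveryTimeoutMs) {
        if (deliveryTimeoutMs < lingerMs + requestTimeoutMs) {
            throw new IllegalArgumentException(
                "delivery.timeout.ms (" + deliveryTimeoutMs + ") must be >= linger.ms + request.timeout.ms ("
                    + (lingerMs + requestTimeoutMs) + ")");
        }
        Properties props = new Properties();
        props.setProperty("linger.ms", Integer.toString(lingerMs));
        props.setProperty("request.timeout.ms", Integer.toString(requestTimeoutMs));
        props.setProperty("delivery.timeout.ms", Integer.toString(deliveryTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only: 5 ms linger, 30 s request timeout, 120 s delivery timeout.
        Properties props = producerTimeouts(5, 30_000, 120_000);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting Properties object could be merged into the rest of the producer configuration (bootstrap servers, serializers, and so on), which is omitted here.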

guneetbh · 3 March 2021 11:36

Thanks @dave, yes I am using AK version 1.0.1.

I need guidance on tuning the following producer properties, or any recommendations. I am sharing my understanding below. Please advise further.

batch.size → The message size is 400-900 bytes, so batch.size=1000 is low and not optimal for compression. Increasing it to 80000 seems too high and could waste memory. Should we start with the default batch.size, test, and increase it from there?
linger.ms → It would increase latency. To start, should I set it to 5 seconds and test?
max.block.ms → The default is 1 minute. It can have an impact on memory in case of retries. What is the suggested value?
buffer.memory → Will increasing it add to heap usage?

Thanks in advance

dave · 3 March 2021 13:53

I suggest running kafka-producer-perf-test with various values for the settings you mention, to identify which combination comes closest to meeting your throughput and latency goals.
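Before testing, some back-of-the-envelope arithmetic can narrow the range of values worth trying. The sketch below uses the message sizes and batch.size candidates mentioned above, plus the default values of batch.size (16384) and buffer.memory (33554432). It deliberately ignores record and batch header overhead and compression, so the numbers are only an upper bound, and the class and method names are hypothetical.

```java
final class BatchSizingSketch {
    /** Approximate records per batch; a record larger than batch.size still goes out in a batch of its own. */
    static int recordsPerBatch(int batchSizeBytes, int recordBytes) {
        return Math.max(1, batchSizeBytes / recordBytes);
    }

    /** Approximate number of full batches that fit in buffer.memory. */
    static long batchesInBuffer(long bufferMemoryBytes, int batchSizeBytes) {
        return bufferMemoryBytes / batchSizeBytes;
    }

    public static void main(String[] args) {
        long bufferMemory = 33_554_432L;           // default buffer.memory (32 MiB)
        int[] batchSizes = {1_000, 16_384, 80_000}; // 16384 is the default batch.size
        for (int batchSize : batchSizes) {
            System.out.println("batch.size=" + batchSize
                + " holds about " + recordsPerBatch(batchSize, 900)
                + " to " + recordsPerBatch(batchSize, 400) + " records;"
                + " buffer.memory fits about " + batchesInBuffer(bufferMemory, batchSize) + " batches");
        }
    }
}
```

With 400-900 byte messages, batch.size=1000 holds only one or two records per batch, which leaves little to compress, while the default holds roughly 18 to 40.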

guneetbh · 3 March 2021 15:12

Can I run kafka-producer-perf-test in production?

dave · 3 March 2021 16:09

Personally, I wouldn’t do so. As its name implies, it is a testing tool, and I think most would agree it is only suitable to run in a test or development environment.
