Kafka: error publishing request: Failed to allocate memory within the configured max blocking time 100 ms

I am new to Kafka and I am getting this error in the producer: kafka: error publishing request: Failed to allocate memory within the configured max blocking time 100 ms.
What settings do I need to check?
Why do I get this error? I would like to understand whether it is a configuration issue on the producer (needing fine-tuning) or a resource issue (JVM).
If this error is related to the network, what steps should I take or which logs should I check to confirm that the network is the cause?

Thanks in advance!!

Producer config settings that specifically relate to this error are max.block.ms and buffer.memory.

The error occurs when a producer send is blocked for longer than max.block.ms because the additional buffer memory it needs would exceed buffer.memory. Existing message batches that have not yet been sent and acknowledged are still holding that memory. This could be due to producer batching behavior, controlled by linger.ms and batch.size, and to producer request behavior, related to max.in.flight.requests.per.connection. It could also be related to the broker's ability to process the requests it receives. These include requests from producers and consumers, as well as replica fetch requests from other brokers.
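To make the mechanism concrete, here is a toy model of a bounded producer buffer, written as a sketch and not as Kafka's actual implementation (the real Java BufferPool also recycles batch.size-sized buffers and frees memory only when a batch completes). The class and exception names ToyBufferPool and BufferExhausted are hypothetical. A send that needs more memory than is free waits up to max_block_ms for in-flight batches to release memory, then fails with the same message as in the question.

```python
import threading
import time


class BufferExhausted(Exception):
    """Hypothetical exception: memory could not be obtained within max_block_ms."""


class ToyBufferPool:
    """Toy model: buffer_memory bytes shared by every batch that has not completed yet."""

    def __init__(self, buffer_memory: int):
        self.total = buffer_memory
        self.available = buffer_memory
        self._cond = threading.Condition()

    def allocate(self, size: int, max_block_ms: int) -> None:
        if size > self.total:
            raise ValueError("record is larger than buffer_memory")
        deadline = time.monotonic() + max_block_ms / 1000.0
        with self._cond:
            # Block until completed batches return enough memory, or give up at the deadline.
            while self.available < size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise BufferExhausted(
                        "Failed to allocate memory within the configured "
                        f"max blocking time {max_block_ms} ms."
                    )
                self._cond.wait(remaining)
            self.available -= size

    def deallocate(self, size: int) -> None:
        # Called when a batch is acknowledged (or fails) and its memory is returned.
        with self._cond:
            self.available += size
            self._cond.notify_all()
```

In this model, a slow broker simply means deallocate is called late, so the buffer stays full and senders hit the deadline, which is why the error can be a symptom of broker-side or network slowness rather than of the producer settings alone.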

Bottom line, there is not a simple answer; more detail is needed. Confluent Control Center is one source for this detail. JMX metrics are another.
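As one way to use the JMX route, the Java producer exposes buffer-related metrics in the producer-metrics group (MBean kafka.producer:type=producer-metrics,client-id=<your client id>), including buffer-total-bytes, buffer-available-bytes, waiting-threads and bufferpool-wait-ratio. The sketch below assumes you have already collected a snapshot of those values into a dict; the function name buffer_pressure and the 0.9 threshold are hypothetical choices, not a Kafka recommendation.

```python
def buffer_pressure(snapshot: dict, used_threshold: float = 0.9) -> dict:
    """Summarise buffer pressure from a snapshot of producer-metrics values."""
    total = snapshot["buffer-total-bytes"]          # configured buffer.memory
    available = snapshot["buffer-available-bytes"]  # memory not currently used
    used_fraction = (total - available) / total if total > 0 else 0.0
    threads_blocked = snapshot.get("waiting-threads", 0) > 0
    wait_ratio = snapshot.get("bufferpool-wait-ratio", 0.0)
    return {
        "used_fraction": used_fraction,
        "threads_blocked": threads_blocked,
        "wait_ratio": wait_ratio,
        # Arbitrary heuristic: the buffer is nearly full AND senders are actually waiting on it.
        "likely_exhausted": used_fraction >= used_threshold and (threads_blocked or wait_ratio > 0),
    }
```

If the buffer is full while request latency on the same producer is high, that points toward the broker or network side; if the buffer is full while requests complete quickly, the send rate may simply exceed what the current batching settings can drain.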

Additional info:

Producer configs: Producer Configurations (Confluent Documentation)

Broker configs: Broker Configurations (Confluent Documentation)

Monitoring Kafka metrics: Monitoring Kafka (Confluent Documentation)


Thanks Dave for your explanation. I will go through the references.

I was wondering about ‘producer send is blocked longer than max.block.ms because the additional buffer memory needed would exceed buffer.memory’.
Could this problem on the producer be caused by one of the brokers in the cluster having a problem, so that messages are not being processed, which then results in ‘Failed to allocate memory…’ on the producer side?
Please suggest…

If the topic replication factor is greater than 1 and the broker hosting the leader replica fails, one of the follower replicas will be elected the new leader. This might mean a short interruption for the producer, but the default producer and broker configuration settings should be fine with a properly sized cluster.

If the topic replication factor is 1, the partition cannot be produced to until the broker recovers. In this case, though, I believe a different exception would occur, related to the partition leader being unavailable.

Actually, the error that occurred initially, and is still the main failure, was:
kafka: error publishing request: Expiring 2 record(s) for topic-8: 5461 ms has passed since last append [This error comes from the Kafka producer code; it means two records destined for partition 8 of the topic expired before they could be sent.]

When request.timeout.ms was increased to the default of 30000 ms, the ‘Failed to allocate memory…’ error started showing up.
So what if the broker hosting partition 8 has some issue while writing messages, which in turn causes the buffer on the producer side to fill up? By the way, the replication factor is 3 with acks=all.

Have a look at this thread on Stack Overflow. It might help.

A co-worker of mine, who is somewhat of a Kafka wizard, pointed out to me that you seem to be running a Kafka release prior to AK 2.1.0. Starting with 2.1.0, Kafka has a delivery.timeout.ms setting that controls the producer retry timeout behavior.
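For readers on 2.1.0 or later: the Java producer expects delivery.timeout.ms to be at least linger.ms + request.timeout.ms, and delivery.timeout.ms (not request.timeout.ms) then bounds how long a record may wait before it is expired. The sketch below checks that relationship for a set of overrides. The defaults shown are the 2.1-era Java producer defaults (linger.ms changed in later releases), and timeout_problems is a hypothetical helper, so verify against the docs for your version.

```python
# Java producer defaults as of Kafka 2.1 (check the documentation for your exact version).
DEFAULTS = {
    "delivery.timeout.ms": 120_000,
    "linger.ms": 0,
    "request.timeout.ms": 30_000,
}


def timeout_problems(overrides: dict) -> list:
    """Return human-readable problems with the delivery timeout settings (empty if OK)."""
    cfg = {**DEFAULTS, **overrides}
    minimum = cfg["linger.ms"] + cfg["request.timeout.ms"]
    problems = []
    if cfg["delivery.timeout.ms"] < minimum:
        problems.append(
            f"delivery.timeout.ms={cfg['delivery.timeout.ms']} is below "
            f"linger.ms + request.timeout.ms = {minimum}"
        )
    return problems
```

On 1.0.1 there is no delivery.timeout.ms, so batch expiry such as "Expiring 2 record(s)…" is governed by request.timeout.ms instead, which matches what was observed when that value was raised.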

Thanks @dave, yes I am using AK version 1.0.1.

I need guidance on tuning the following producer properties, or any other recommendations. Here is my current understanding; please advise further.

  • batch.size → Given that the message size is 400-900 bytes, batch.size=1000 is low and not optimal for compression. If it is increased to 80000, it may be too high and waste memory. Should the default value of batch.size be used and tested first, and then increased?
  • linger.ms → It would increase latency, so to start, should I set it to 5 seconds and test?
  • max.block.ms → The default is 1 min. It can have an impact on memory in case of retries; what is the suggested value?
  • buffer.memory → Will increasing it add to heap usage?
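Some rough arithmetic helps with the batch.size question. The sketch below ignores record and batch header overhead and compression (so real batches hold somewhat fewer records) and uses the default buffer.memory of 33554432 bytes; the helper names are hypothetical. As far as I know, the Java client allocates this buffer on the JVM heap, so raising buffer.memory does raise heap demand.

```python
def records_per_batch(batch_size: int, record_size: int) -> int:
    # Ignores per-record/per-batch headers and compression: an upper bound.
    return batch_size // record_size


def batches_in_buffer(buffer_memory: int, batch_size: int) -> int:
    # How many full batch.size buffers fit into buffer.memory.
    return buffer_memory // batch_size


BUFFER_MEMORY = 33_554_432  # default buffer.memory (32 MiB)

for batch_size in (1000, 16384, 80000):
    print(
        f"batch.size={batch_size}: "
        f"{records_per_batch(batch_size, 900)}-{records_per_batch(batch_size, 400)} records/batch, "
        f"{batches_in_buffer(BUFFER_MEMORY, batch_size)} full batches fit in buffer.memory"
    )
```

With the default batch.size of 16384, each batch holds roughly 18 to 40 records of this size and the default buffer fits 2048 full batches, while batch.size=1000 gives only one or two records per batch, which leaves little for compression to work with.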

Thanks in advance

I suggest using kafka-producer-perf-test with various values for the settings you mention to identify which combination of values provides a result that comes closest to answering your throughput and latency goal.
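A sweep like that can be scripted. This sketch only builds and prints the command lines; it uses the tool's --topic, --num-records, --record-size, --throughput and --producer-props options (a throughput of -1 disables throttling). The topic name, bootstrap address, value grids and the helper perf_test_commands are hypothetical placeholders to adapt to a test cluster.

```python
import itertools
import shlex


def perf_test_commands(topic, bootstrap, batch_sizes, linger_values,
                       num_records=100_000, record_size=900):
    """Build one kafka-producer-perf-test argv per (batch.size, linger.ms) combination."""
    commands = []
    for batch_size, linger_ms in itertools.product(batch_sizes, linger_values):
        commands.append([
            "kafka-producer-perf-test",
            "--topic", topic,
            "--num-records", str(num_records),
            "--record-size", str(record_size),
            "--throughput", "-1",  # no throttling: measure maximum throughput
            "--producer-props",
            f"bootstrap.servers={bootstrap}",
            f"batch.size={batch_size}",
            f"linger.ms={linger_ms}",
            "acks=all",  # match the acks setting used in production
        ])
    return commands


if __name__ == "__main__":
    for argv in perf_test_commands("perf-test", "localhost:9092", [16384, 65536], [0, 5, 20]):
        print(shlex.join(argv))
```

Running each printed command against a test cluster and comparing the reported records/sec and latency percentiles shows which combination comes closest to the throughput and latency goal.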

Can I run kafka-producer-perf-test in production?

Personally, I wouldn’t. As its name implies, it is a testing tool, and I think most would agree it is only suitable to run in a test or development environment.