java.io.IOException: Packet len 9592748 is out of range! (ZooKeeper and broker not communicating)

Hi Team,

We recently encountered an issue where communication between the Kafka brokers and ZooKeeper broke down, causing internal operations to be aborted.

Broker Log Observed:
Image: confluentinc/cp-kafka:7.8.1-1-ubi8
java.io.IOException: Packet len 9592748 is out of range!

Zookeeper Log Observed:
Image: confluentinc/cp-zookeeper:7.8.1-1-ubi8
java.io.IOException: Broken pipe

This indicated that communication was failing because packets exceeded the buffer size.
Upon investigation, we identified that the default ZooKeeper buffer size (jute.maxbuffer) was too small for the packets being exchanged.

We have temporarily mitigated the issue by adding the following parameter to the Zookeeper configuration:
-Djute.maxbuffer=49107800
After this change, communication between the brokers and ZooKeeper resumed and I/O operations are functioning normally.
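For reference, a rough sketch of how the flag can be wired into a Helm/Kubernetes deployment like ours (sketch only: it assumes the cp images forward the KAFKA_OPTS environment variable to the JVM the way the stock kafka-run-class script does, and the exact Helm values keys depend on the charts):

# Sketch only: container environment entries, not our exact chart values.
# cp-zookeeper pods (server side):
KAFKA_OPTS="-Djute.maxbuffer=49107800"
# cp-kafka pods: the "Packet len ... is out of range" check in the broker
# stack trace is client-side (ClientCnxnSocket.readLength), so the broker
# JVM may need the same override as well:
KAFKA_OPTS="-Djute.maxbuffer=49107800"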

Also, can you please tell us what the default value of jute.maxbuffer is for Confluent Kafka, since we are using the Confluent images?

Following are the logs:
ZooKeeper:
WARN Close of session 0x30876fc0a100002 (org.apache.zookeeper.server.NIOServerCnxn)
java.io.IOException: Broken pipe
at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.write(Unknown Source)
at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
at java.base/sun.nio.ch.IOUtil.write(Unknown Source)
at java.base/sun.nio.ch.IOUtil.write(Unknown Source)
at java.base/sun.nio.ch.SocketChannelImpl.write(Unknown Source)
at org.apache.zookeeper.server.NIOServerCnxn.handleWrite(NIOServerCnxn.java:289)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:366)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:508)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:153)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)

Broker:
WARN Session 0x30876fc0a100003 for server kafka-east-uat-cp-zookeeper-headless/<{IP}>:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. (org.apache.zookeeper.ClientCnxn)
java.io.IOException: Packet len 9592748 is out of range!
at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:121)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:84)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1289)
[2025-04-03 04:38:50,620] WARN [GroupCoordinator 0]: Failed to write empty metadata for group : The group is rebalancing, so a rejoin is needed. (kafka.coordinator.group.GroupCoordinator)

Can you please help us investigate what caused the sudden increase in packet size that exceeded the ZooKeeper buffer limit? Understanding the root cause will help us apply a more permanent fix and avoid such issues in the future.
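In case it helps narrow this down, here is a rough sketch of how one could look for the oversized data from the ZooKeeper side. Oversized packets typically come either from a single very large znode (for example a big partition-reassignment znode or a topic with a huge partition assignment) or from a getChildren response on a path with very many children. The zookeeper-shell wrapper name and the paths below are assumptions based on the stock Kafka tooling, not taken from our actual environment:

# Sketch only: "stat" prints dataLength (znode size in bytes) and numChildren,
# which hint at what could exceed jute.maxbuffer.
zookeeper-shell <zk-host>:2181 ls /brokers/topics
zookeeper-shell <zk-host>:2181 stat /admin/reassign_partitions
zookeeper-shell <zk-host>:2181 stat /brokers/topics/<suspect-topic>
zookeeper-shell <zk-host>:2181 stat /config/topics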

hey @SaiKrishnaNeeli

could you share some config and details about your setup?
is this docker based?

why did you start with zookeeper? the recommended approach would be to switch to kraft instead of zookeeper.

Thanks for replying, @mmuehlbeyer.
We are using default configurations for both broker and zookeeper.

Yes, we’re currently using the Docker-based Confluent Platform images, deployed on Kubernetes using Helm charts.

We have 5 brokers and 3 zookeeper pods.
Images:
confluentinc/cp-zookeeper:7.8.1-1-ubi8
confluentinc/cp-kafka:7.8.1-1-ubi8

As part of our roadmap, we’re planning to migrate from ZooKeeper mode to KRaft mode with Apache Kafka 4.0 in the upcoming release. At the moment, we’re actively testing this migration in our development environment to ensure a seamless transition without any data loss.

I see. Are you deploying with Confluent for Kubernetes?

and could you check the topic.config.sync.interval.ms parameter?

We’re not using the CFK (Confluent for Kubernetes) operator. Instead, we deploy Confluent Platform components—Kafka, ZooKeeper, Schema Registry, Kafka REST, KSQL, and MirrorMaker 2—directly using Helm with custom configurations. For cross-cluster topic replication, we’re not using Kafka Replicator; rather, we rely on MirrorMaker 2 with the following configuration.

"sync.topic.configs.enabled": "true",
"sync.topic.configs.interval.seconds": 60,
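For context, these settings sit in a standard MirrorMaker 2 properties file alongside the cluster and replication-flow definitions, roughly like the sketch below (the cluster aliases and bootstrap servers are placeholders, not our actual values):

# Sketch only: placeholder cluster aliases and addresses.
clusters = primary, backup
primary.bootstrap.servers = <primary-broker>:9092
backup.bootstrap.servers = <backup-broker>:9092
primary->backup.enabled = true
primary->backup.topics = .*
sync.topic.configs.enabled = true
sync.topic.configs.interval.seconds = 60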

ok I see
hmm need to dig around a bit

Thanks for the update, @mmuehlbeyer. Please let me know if you need any further information or details from my side.