Hi Team,
We recently hit an issue where communication between the Kafka brokers and Zookeeper broke down, causing internal operations to abort.
Broker Log Observed:
Image: confluentinc/cp-kafka:7.8.1-1-ubi8
java.io.IOException: Packet len 9592748 is out of range!
Zookeeper Log Observed:
Image: confluentinc/cp-zookeeper:7.8.1-1-ubi8
java.io.IOException: Broken pipe
These errors indicate that communication failed because a packet exceeded the maximum buffer size.
Upon investigation, we identified that the default buffer size (jute.maxbuffer) in Zookeeper was insufficient for the size of packets being exchanged.
We have temporarily mitigated the issue by adding the following parameter to the Zookeeper configuration:
-Djute.maxbuffer=49107800
After this change, communication between the brokers and Zookeeper resumed normally and I/O operations are functioning.
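To put the numbers side by side, here is a minimal Python sketch that loosely models the length check behind the broker error. It assumes the upstream Apache ZooKeeper documented default for jute.maxbuffer (0xfffff bytes, just under 1 MiB); we have not confirmed what the Confluent images actually use. The names check_packet_len and mib are illustrative only and are not ZooKeeper APIs.

```python
# Loose model of the packet length check seen in the broker stack trace.
# Assumption: the limit is the upstream Apache ZooKeeper default (0xfffff bytes).
# The effective default in the Confluent images is NOT confirmed here.

UPSTREAM_DEFAULT_MAX_BUFFER = 0xFFFFF  # 1,048,575 bytes, just under 1 MiB
CONFIGURED_MAX_BUFFER = 49_107_800     # value we passed via -Djute.maxbuffer
OBSERVED_PACKET_LEN = 9_592_748        # value reported in the broker log


def check_packet_len(packet_len: int, max_buffer: int) -> None:
    """Raise IOError when the packet length is negative or above the limit."""
    if packet_len < 0 or packet_len > max_buffer:
        raise IOError(f"Packet len {packet_len} is out of range!")


def mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


# The observed packet passes with the raised limit...
check_packet_len(OBSERVED_PACKET_LEN, CONFIGURED_MAX_BUFFER)

# ...but is rejected under the assumed upstream default.
try:
    check_packet_len(OBSERVED_PACKET_LEN, UPSTREAM_DEFAULT_MAX_BUFFER)
except IOError as err:
    print(f"rejected under assumed default: {err}")

print(f"observed packet: {mib(OBSERVED_PACKET_LEN):.2f} MiB")
print(f"configured limit: {mib(CONFIGURED_MAX_BUFFER):.2f} MiB")
```

Under that assumption, the observed packet is roughly nine times larger than the default limit, while the value we configured leaves several times more headroom than the observed packet needs.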
Could you also tell us the default value of jute.maxbuffer in Confluent Kafka? We ask because we are using the Confluent images.
The full logs are below:
Zookeeper:
WARN Close of session 0x30876fc0a100002 (org.apache.zookeeper.server.NIOServerCnxn)
java.io.IOException: Broken pipe
at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.write(Unknown Source)
at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
at java.base/sun.nio.ch.IOUtil.write(Unknown Source)
at java.base/sun.nio.ch.IOUtil.write(Unknown Source)
at java.base/sun.nio.ch.SocketChannelImpl.write(Unknown Source)
at org.apache.zookeeper.server.NIOServerCnxn.handleWrite(NIOServerCnxn.java:289)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:366)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:508)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:153)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Broker:
WARN Session 0x30876fc0a100003 for server kafka-east-uat-cp-zookeeper-headless/<{IP}>:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. (org.apache.zookeeper.ClientCnxn)
java.io.IOException: Packet len 9592748 is out of range!
at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:121)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:84)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1289)
[2025-04-03 04:38:50,620] WARN [GroupCoordinator 0]: Failed to write empty metadata for group : The group is rebalancing, so a rejoin is needed. (kafka.coordinator.group.GroupCoordinator)
Could you please help us investigate what caused the sudden increase in packet size that pushed it past the Zookeeper buffer limit? Understanding the root cause will help us apply a more permanent fix and avoid such issues in the future.
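To gather evidence on when the oversized packets started and how large they became, a script along these lines could pull the rejected packet lengths out of the broker logs. This is only a sketch: the function name rejected_packet_lengths is hypothetical, and the pattern is based solely on the message text shown in the broker log above.

```python
import re
from typing import Iterable, List

# Matches the client-side error text seen in the broker log above.
PACKET_LEN_RE = re.compile(r"Packet len (\d+) is out of range!")


def rejected_packet_lengths(lines: Iterable[str]) -> List[int]:
    """Return every rejected packet length found in the given log lines."""
    lengths = []
    for line in lines:
        match = PACKET_LEN_RE.search(line)
        if match:
            lengths.append(int(match.group(1)))
    return lengths


# Sample input copied from the broker log in this post.
sample_lines = [
    "java.io.IOException: Packet len 9592748 is out of range!",
    "at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:121)",
]

lengths = rejected_packet_lengths(sample_lines)
print(lengths)
if lengths:
    print(f"largest rejected packet: {max(lengths)} bytes")
```

Against a real log, the same function can be given an open file handle instead of the sample list, since a file object iterates line by line.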