Using Docker Compose, I am trying to deploy a 3-node Kafka cluster. Two of the nodes randomly come up fine, while the third always fails with the following error:
[2023-11-03 04:12:59,865] INFO [broker-1-to-controller-forwarding-channel-manager]: Starting (kafka.server.BrokerToControllerRequestThread)
[2023-11-03 04:12:59,959] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
[2023-11-03 04:13:00,026] INFO [RaftManager id=1] Registered the listener org.apache.kafka.image.loader.MetadataLoader@351741309 (org.apache.kafka.raft.KafkaRaftClient)
[2023-11-03 04:13:00,061] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
[2023-11-03 04:13:00,162] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don't know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)
[2023-11-03 04:13:00,218] ERROR Encountered fatal fault: Unexpected error in raft IO thread (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
java.lang.IllegalStateException: Received request or response with leader OptionalInt[1] and epoch 18 which is inconsistent with current leader OptionalInt.empty and epoch 0
at org.apache.kafka.raft.KafkaRaftClient.maybeTransition(KafkaRaftClient.java:1513)
at org.apache.kafka.raft.KafkaRaftClient.maybeHandleCommonResponse(KafkaRaftClient.java:1473)
at org.apache.kafka.raft.KafkaRaftClient.handleFetchResponse(KafkaRaftClient.java:1071)
at org.apache.kafka.raft.KafkaRaftClient.handleResponse(KafkaRaftClient.java:1550)
at org.apache.kafka.raft.KafkaRaftClient.handleInboundMessage(KafkaRaftClient.java:1676)
at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:2251)
at kafka.raft.KafkaRaftManager$RaftIoThread.doWork(RaftManager.scala:64)
at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:127)
I have 3 dedicated controller nodes running separately, plus 3 broker nodes. All 6 nodes use the same cluster ID, and each node has its own docker-compose file.
These are the steps I am following:

1. Deploy all the controller nodes first.
2. Then deploy the broker nodes. Sometimes node2 fails to start, sometimes node1; out of the 3 brokers, only two ever show as working and the third does not.
3. For broker node1, the controller quorum voters are controller nodes 2 and 3, and likewise (shifted) for the other two brokers.
Here is the config I am currently using for the brokers. When I specify all 3 controller nodes in the quorum voters, it fails with the following error:
# docker logs kafka-broker-1
===> User
uid=1000(appuser) gid=1000(appuser) groups=1000(appuser)
===> Configuring ...
Running in KRaft mode...
===> Running preflight checks ...
===> Check if /var/lib/kafka/data is writable ...
===> Running in KRaft mode, skipping Zookeeper health check...
===> Using provided cluster id mX-qLvc-T2y2OPeJ3AMRXg ...
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: If process.roles contains just the 'broker' role, the node id 1 must not be included in the set of voters controller.quorum.voters=Set(1, 2, 3)
	at scala.Predef$.require(Predef.scala:337)
	at kafka.server.KafkaConfig.validateValues(KafkaConfig.scala:2246)
	at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:2160)
	at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:1568)
	at kafka.tools.StorageTool$.$anonfun$main$1(StorageTool.scala:50)
	at scala.Option.flatMap(Option.scala:283)
	at kafka.tools.StorageTool$.main(StorageTool.scala:50)
	at kafka.tools.StorageTool.main(StorageTool.scala)
Say I am deploying broker1 with node id 1: then I cannot include controller 1 (which also has id 1) in its quorum voters. The same logic applies to the remaining two brokers.
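In properties terms, the combination that the storage tool rejects looks like this (hostnames are placeholders for my controller containers):

```
# broker-only node with id 1 -- rejected, because id 1 also
# appears in the controller voter set below
process.roles=broker
node.id=1
controller.quorum.voters=1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
```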
To fix this issue, I changed the IDs of all brokers to 101, 102, and 103, keeping the controllers at 1, 2, and 3. Then I added all 3 controllers to the quorum voter list for every broker and started the services like:
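The renumbered setup looks roughly like this (a minimal docker-compose sketch for broker node1, assuming the Confluent cp-kafka image and controller hostnames `controller-1`/`controller-2`/`controller-3` on a shared Docker network; image tag, listeners, and ports are illustrative, not my exact file):

```yaml
services:
  kafka-broker-1:
    image: confluentinc/cp-kafka:7.5.0
    hostname: kafka-broker-1
    environment:
      CLUSTER_ID: "mX-qLvc-T2y2OPeJ3AMRXg"   # same cluster ID on all 6 nodes
      KAFKA_NODE_ID: 101                     # brokers renumbered to 101-103
      KAFKA_PROCESS_ROLES: "broker"          # dedicated broker, no controller role
      # all 3 controllers (ids 1-3) no longer collide with any broker id
      KAFKA_CONTROLLER_QUORUM_VOTERS: "1@controller-1:9093,2@controller-2:9093,3@controller-3:9093"
      KAFKA_CONTROLLER_LISTENER_NAMES: "CONTROLLER"
      KAFKA_LISTENERS: "PLAINTEXT://0.0.0.0:9092"
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka-broker-1:9092"
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT"
```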