Problem with cluster IDs when restarting Kafka brokers

Hi all! I run into this issue frequently: when I reboot or update my Kafka servers, some of them fail to start due to the cluster ID.

  1. I have the cluster IDs set the same between all versions
  2. Each broker has a unique ID and I have 3 brokers (0, 1, 2)
  3. I am NOT using the /tmp directory to store my logs. They are in /opt/kafka/kafka-logs
  4. I am running ZooKeeper on all 3 of my brokers.
  5. All server.properties files have the zookeeper.connect pointing to all 3 ZooKeeper services (a rough sketch of my config follows this list)
  6. Only 1 broker is failing to start properly, with the error "The Cluster ID Pwzs3pUqSZCJm79gXzJqGw doesn't match stored clusterId"
  7. The above cluster ID was the old one, which I blew out to get all of the brokers working together again. Listed below is the new one.
  8. My servers all have the new setting cluster.id=3eS38X7oT0aHo10ZJfo9UQ
  9. Kafka is kafka_2.13-3.5.0
  10. I have no important data here yet. I don't mind blowing everything away and restarting from scratch if needed. I just want this issue to go away and never occur again between upgrades and reboots of my servers. FYI, my kafka-logs dir is NOT underneath my install dir, so I can upgrade Kafka with no issues.
  11. One broker is refusing to come up, and it's reporting the error listed above.
  12. All three brokers are running on different servers and there are no Docker containers involved.
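For reference, here is roughly what server.properties looks like on each server (the hostnames below are placeholders, not my real ones; only broker.id differs per server):

broker.id=0
log.dirs=/opt/kafka/kafka-logs
zookeeper.connect=kafka1:2181,kafka2:2181,kafka3:2181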

What am I doing wrong here?

From step 7:

What was broken in the cluster that caused you to want to change the cluster ID? Were you hitting the InconsistentClusterIdException or something else?

How did you blow away the old one? Bear in mind that the cluster ID lives in a file called meta.properties on the broker in the log.dirs directory, e.g.:

[appuser@broker data]$ cat meta.properties
#
#Fri Jul 12 18:00:16 UTC 2024
broker.id=1
version=0
cluster.id=vM8Y5EE-SZycFGARoLmVAA

And the cluster ID is also stored in ZooKeeper at the /cluster/id node:

get /cluster/id
{"version":"1","id":"vM8Y5EE-SZycFGARoLmVAA"}

Do you see different values for your deployment? This would cause that error. If they are different, you can either make them match to get things working again, or start over with a clean slate (fresh ZooKeeper install).
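If you go the "make them match" route, a minimal sketch on the broker that won't start could look like the following. It assumes the value ZooKeeper reports is the one you want to keep (here the example ID from above) and uses your /opt/kafka/kafka-logs directory; stop the broker first and keep a backup:

# on the failing broker, with the broker stopped
cd /opt/kafka/kafka-logs
cp meta.properties meta.properties.bak
# make cluster.id match the value stored in ZooKeeper (example ID from above)
sed -i 's/^cluster.id=.*/cluster.id=vM8Y5EE-SZycFGARoLmVAA/' meta.properties
# then start the broker again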

Thank you! This is some good insight. I will post again once I try to reinstall and get it all working.

So it's still not working. I blew away the directories and reinstalled and configured both ZooKeeper and Kafka. I am using kafka_2.13-3.7.1 now. I am pointing to the two other brokers in my cluster, which now have the correct IDs.
Why isn't it picking up the correct cluster ID from the other brokers? Do I need to list the other brokers first in the server.properties file?

OK, I got it working by eliminating the ZooKeeper instance running on my 3rd broker. I updated it to no longer point to localhost but to the other two brokers, and Kafka came up fine.
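Concretely, what I changed was the zookeeper.connect line in that broker's server.properties, roughly like this (hostnames are placeholders for my other two servers):

zookeeper.connect=kafka1:2181,kafka2:2181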

I am still interested in knowing why this one failed. You mentioned some sort of cluster ID setting in ZooKeeper, but I had no clue which file to edit to change it. Can you be more explicit?

Thanks!

You would need to use the CLI to edit it. E.g., the get /cluster/id command above was in the zookeeper-shell tool. The corresponding command to set the ID in ZooKeeper is set:

set /cluster/id {"version":"1","id":"<CLUSTER ID>"}

I’m getting this error:
./zookeeper-shell.sh set /cluster/id {"version":"1","id":"3eS38X7oT0aHo10ZJfo9UQ"}

Connecting to set
ZooKeeper -server host:port [-zk-tls-config-file ] cmd args
[2024-07-17 11:34:48,071] ERROR Unable to resolve address: set/:2181 (org.apache.zookeeper.client.StaticHostProvider)
java.net.UnknownHostException: set: Temporary failure in name resolution
Command not found: Command not found /cluster/id

To get into the shell, run:

zookeeper-shell.sh <ZK host>:<port>

<ZK host>:<port> would probably be localhost:2181.

Then run the set command inside the shell.
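Putting it together with your new ID from earlier in the thread, a session would look roughly like this (assuming ZooKeeper on localhost:2181; the connection log lines are omitted):

zookeeper-shell.sh localhost:2181
set /cluster/id {"version":"1","id":"3eS38X7oT0aHo10ZJfo9UQ"}
get /cluster/id
{"version":"1","id":"3eS38X7oT0aHo10ZJfo9UQ"}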

Ok I will try that. Thanks so much for this help! It has been a great learning experience!
