Hi,
I just had a(nother) weird issue in our test cluster.
I was trying to look at the broker/topic default values to find reasons for our performance issue.
When connecting to the broker and running any command (kafka-configs, kafka-topics, anything), I was getting a Java NullPointerException:
sh-4.4$ ./kafka-topics --list
java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Unknown Source)
at java.base/sun.nio.fs.UnixFileSystem.getPath(Unknown Source)
at java.base/java.nio.file.Path.of(Unknown Source)
at java.base/java.nio.file.Paths.get(Unknown Source)
at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$1(Unknown Source)
at java.base/java.security.AccessController.doPrivileged(Unknown Source)
at java.base/jdk.internal.platform.CgroupUtil.readStringValue(Unknown Source)
at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(Unknown Source)
at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(Unknown Source)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(Unknown Source)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(Unknown Source)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(Unknown Source)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getInstance(Unknown Source)
at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(Unknown Source)
at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(Unknown Source)
at java.base/jdk.internal.platform.CgroupMetrics.getInstance(Unknown Source)
at java.base/jdk.internal.platform.SystemMetrics.instance(Unknown Source)
at java.base/jdk.internal.platform.Metrics.systemMetrics(Unknown Source)
at java.base/jdk.internal.platform.Container.metrics(Unknown Source)
at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(Unknown Source)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(Unknown Source)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(Unknown Source)
at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$0(Unknown Source)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(Unknown Source)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.base/java.util.stream.ReferencePipeline.forEach(Unknown Source)
at java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(Unknown Source)
at jdk.management.agent/sun.management.jmxremote.ConnectorBootstrap.startLocalConnectorServer(Unknown Source)
at jdk.management.agent/jdk.internal.agent.Agent.startLocalManagementAgent(Unknown Source)
at jdk.management.agent/jdk.internal.agent.Agent.startAgent(Unknown Source)
at jdk.management.agent/jdk.internal.agent.Agent.startAgent(Unknown Source)
Exception thrown by the agent : java.lang.NullPointerException
I thought it might be a networking problem or some other error, so I checked that and the broker logs, but found nothing. I didn't see any other issue: the client app was running fine and metrics were being gathered fine.
I started searching the internet, but no really simple solution came up except a “restarted everything and it worked again”, so that's what I did: I restarted the broker on the affected box.
And it helped; everything is working fine again.
So the question now is: what the *** is the problem here?
I mean, a central component like Kafka should not become unstable by itself; that's not leaving a good impression.
I upgraded to “release”: “7.8.1-37” sometime last week to see if it had any fixes for the perf issue (it didn't).
Any idea what might have happened here?
Thanks