Null pointer exception on any command in broker

Hi,

I had a(nother) weird issue in our test cluster just now.
I was trying to look at broker/topic default values to find reasons for our performance issue.

When connecting to the broker and running any command (kafka-configs, kafka-topics, anything), I was getting a Java NullPointerException:

sh-4.4$ ./kafka-topics --list
java.lang.NullPointerException
        at java.base/java.util.Objects.requireNonNull(Unknown Source)
        at java.base/sun.nio.fs.UnixFileSystem.getPath(Unknown Source)
        at java.base/java.nio.file.Path.of(Unknown Source)
        at java.base/java.nio.file.Paths.get(Unknown Source)
        at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$1(Unknown Source)
        at java.base/java.security.AccessController.doPrivileged(Unknown Source)
        at java.base/jdk.internal.platform.CgroupUtil.readStringValue(Unknown Source)
        at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(Unknown Source)
        at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(Unknown Source)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(Unknown Source)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(Unknown Source)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(Unknown Source)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getInstance(Unknown Source)
        at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(Unknown Source)
        at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(Unknown Source)
        at java.base/jdk.internal.platform.CgroupMetrics.getInstance(Unknown Source)
        at java.base/jdk.internal.platform.SystemMetrics.instance(Unknown Source)
        at java.base/jdk.internal.platform.Metrics.systemMetrics(Unknown Source)
        at java.base/jdk.internal.platform.Container.metrics(Unknown Source)
        at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(Unknown Source)
        at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(Unknown Source)
        at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(Unknown Source)
        at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$0(Unknown Source)
        at java.base/java.util.stream.ReferencePipeline$7$1.accept(Unknown Source)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
        at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(Unknown Source)
        at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
        at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
        at java.base/java.util.stream.ReferencePipeline.forEach(Unknown Source)
        at java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(Unknown Source)
        at jdk.management.agent/sun.management.jmxremote.ConnectorBootstrap.startLocalConnectorServer(Unknown Source)
        at jdk.management.agent/jdk.internal.agent.Agent.startLocalManagementAgent(Unknown Source)
        at jdk.management.agent/jdk.internal.agent.Agent.startAgent(Unknown Source)
        at jdk.management.agent/jdk.internal.agent.Agent.startAgent(Unknown Source)
Exception thrown by the agent : java.lang.NullPointerException

I thought it might be a networking problem or some other error, so I checked that and the broker logs, but found nothing. I didn't see any other issue: the client app was running fine and metrics were being gathered fine.

I started searching the internet, but no really simple solution came up except “restarted everything and it worked again”, so that's what I did: I restarted the broker on the affected box.
And it helped, it's working fine again.
So the question now is: what the *** is the problem here?
I mean, a central component like Kafka should not become unstable by itself; that's not leaving a good impression.

I upgraded to “release”: “7.8.1-37” sometime last week to see if it had any fixes for the perf issue (it didn't).

Any idea what might have happened here?
Thanks

Hi @Rand,
anything in the broker or controller logs?

best,
michael

Hi,

no,
the controller only says this upon broker restart (btw, how that can happen with identical images is still an unanswered question):

[2025-02-14 14:00:33,764] WARN [QuorumController id=1] Broker 6 registered with feature metadata.version that is unknown to the controller (org.apache.kafka.controller.ClusterControlManager)
[2025-02-17 09:21:58,400] WARN [QuorumController id=1] Broker 4 registered with feature metadata.version that is unknown to the controller (org.apache.kafka.controller.ClusterControlManager)

Broker

[2025-02-14 14:00:05,909] WARN [ReplicaFetcher replicaId=4, leaderId=6, fetcherId=0] Partition transientChatter-events-5 marked as failed (kafka.server.ReplicaFetcherThread)
[2025-02-14 14:00:05,909] WARN [ReplicaFetcher replicaId=4, leaderId=6, fetcherId=0] Partition transientChatter-events-0 marked as failed (kafka.server.ReplicaFetcherThread)
===> User
uid=1000(appuser) gid=1000(appuser) groups=1000(appuser)
===> Configuring ...
Running in KRaft mode...
SSL is enabled.
===> Running preflight checks ...
===> Check if /var/lib/kafka/data is writable ...
===> Running in KRaft mode, skipping Zookeeper health check...
===> Using provided cluster id <id> ...
2025-02-17 09:21:50.754 | main | INFO | io.prometheus.jmx.JavaAgent | Starting ...
2025-02-17 09:21:51.071 | main | INFO | io.prometheus.jmx.JavaAgent | HTTP enabled [true]
2025-02-17 09:21:51.071 | main | INFO | io.prometheus.jmx.JavaAgent | HTTP host:port [0.0.0.0:8091]
2025-02-17 09:21:51.071 | main | INFO | io.prometheus.jmx.JavaAgent | OpenTelemetry enabled [false]
2025-02-17 09:21:51.120 | main | INFO | io.prometheus.jmx.JavaAgent | Running ...
Log directory /data/cpkafka-data is already formatted. Use --ignore-formatted to ignore this directory and format the others.
===> Launching ...
===> Launching kafka ...
2025-02-17 09:21:52.230 | main | INFO | io.prometheus.jmx.JavaAgent | Starting ...
2025-02-17 09:21:52.540 | main | INFO | io.prometheus.jmx.JavaAgent | HTTP enabled [true]
2025-02-17 09:21:52.540 | main | INFO | io.prometheus.jmx.JavaAgent | HTTP host:port [0.0.0.0:8091]
2025-02-17 09:21:52.540 | main | INFO | io.prometheus.jmx.JavaAgent | OpenTelemetry enabled [false]
2025-02-17 09:21:52.589 | main | INFO | io.prometheus.jmx.JavaAgent | Running ...

Nothing indicating any issue at all…

@Rand, hope you're doing well.

I'm currently facing the same issue.

I'm running plain Kafka 3.8.1 in a podman container.
I'm wondering if you found a solution for it?

Hi,

as I said, I restarted and I think it never happened again, or I would have prodded @mmuehlbeyer more ;)

Sorry I can’t be of more help, good luck.

Regards

Hello @Rand,
thanks for your feedback. FYI, I did restart as well and it worked for the first session; unfortunately, the next day/next session it appeared again.

Seems similar to this issue…

What distribution and version of Java does your image contain? I’m reaching here but maybe this OpenJDK issue in Java 8 and 11 is relevant (fixed in 17).
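
Besides the Java version, it might be worth comparing the container's cgroup view between a session where the tools work and one where they fail, since the trace dies inside the JDK's cgroup v1 detection (CgroupV1Subsystem.initSubSystem) while building a file path. Below is a minimal Python sketch for that comparison. It only relies on the documented layout of /proc/self/cgroup (hierarchy-ID:controller-list:path) and /proc/self/mountinfo. The function names are made up for this post, and the idea that a controller listed without a matching cgroup v1 mount is what triggers the NPE is purely my assumption, not a confirmed diagnosis.

```python
def parse_proc_cgroup(text):
    """Parse /proc/self/cgroup content into a list of dicts.

    Each line is 'hierarchy-ID:controller-list:cgroup-path'. Split at most
    twice so that a cgroup path containing ':' stays intact.
    """
    entries = []
    for line in text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        hierarchy, controllers, path = parts
        entries.append({
            "hierarchy": hierarchy,
            "controllers": [c for c in controllers.split(",") if c],
            "path": path,
        })
    return entries


def v1_mount_options(mountinfo_text):
    """Map each super option of a cgroup v1 mount to its mount point.

    The super options of a v1 mount include the controller names,
    alongside generic flags such as 'rw'.
    """
    mounted = {}
    for line in mountinfo_text.splitlines():
        left, sep, right = line.partition(" - ")
        left_fields, right_fields = left.split(), right.split()
        if not sep or len(left_fields) < 5 or len(right_fields) < 3:
            continue
        if right_fields[0] != "cgroup":  # cgroup v2 mounts are 'cgroup2'
            continue
        for option in right_fields[2].split(","):
            mounted.setdefault(option, left_fields[4])
    return mounted


def unmounted_controllers(cgroup_text, mountinfo_text):
    """Controllers listed in /proc/self/cgroup without a cgroup v1 mount."""
    mounted = v1_mount_options(mountinfo_text)
    missing = set()
    for entry in parse_proc_cgroup(cgroup_text):
        for controller in entry["controllers"]:
            if controller not in mounted:
                missing.add(controller)
    return sorted(missing)


if __name__ == "__main__":
    with open("/proc/self/cgroup") as f:
        cgroup_text = f.read()
    with open("/proc/self/mountinfo") as f:
        mountinfo_text = f.read()
    for entry in parse_proc_cgroup(cgroup_text):
        print(entry)
    print("controllers without a v1 mount:",
          unmounted_controllers(cgroup_text, mountinfo_text))
```

If the image has no Python, you could copy the two /proc files out of the container in a failing session and feed their contents to the functions elsewhere. A difference between a good and a bad session would at least narrow down whether the container runtime or the JVM is the moving part.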