Hi,
We had a critical error in our Kafka cluster yesterday. As a result, the cluster shut down and we had to restart the Kafka service immediately. Everything has been working fine since then.
After the restart we tried to investigate the problem, but we couldn't figure it out.
I simplified the log file and attached it. As you can see, it reports: java.nio.file.NoSuchFileException: /opt/data/kafka/__consumer_offsets-40/00000000000000000000.log
We didn't see any problem with disk storage. After the Kafka service was restarted, it showed: [2022-03-18 17:49:17,573] INFO [Log partition=__consumer_offsets-40, dir=/opt/data/kafka] Found file /opt/data/kafka/__consumer_offsets-40/00000000000000000000.log.swap from interrupted swap operation. (kafka.log.Log)
So it seems that, for some reason, the broker couldn't write the data, but after the restart it found the interrupted swap operation and was able to repair it. We still couldn't find what caused this problem in the first place.
Our Kafka broker runs on Debian 11 in a VM, using temurin-11-jdk-amd64. The Kafka version is 2.2.1.
Any ideas?
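For reference, this is a rough sketch of how we could check a log directory for leftover intermediate segment files (`.swap` from an interrupted log-cleaner swap, `.deleted` from a pending async delete) before restarting a broker. The script is demonstrated on a throwaway temp directory; `KAFKA_LOG_DIR` would be /opt/data/kafka in our setup, and the partition and file names below are just placeholders mimicking the log output above.

```shell
# Simulate a log dir left behind by an interrupted swap (placeholder names).
KAFKA_LOG_DIR="$(mktemp -d)"
mkdir -p "$KAFKA_LOG_DIR/__consumer_offsets-40"
touch "$KAFKA_LOG_DIR/__consumer_offsets-40/00000000000000000000.log.swap"
touch "$KAFKA_LOG_DIR/__consumer_offsets-40/00000000000000000000.index.deleted"

# List any intermediate files the cleaner may have left behind.
find "$KAFKA_LOG_DIR" -name '*.swap' -o -name '*.deleted'
```

On a healthy, cleanly shut-down broker this should print nothing; any hits point at a partition where a swap or delete was interrupted, like __consumer_offsets-40 in our logs.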
[2022-03-18 17:32:29,482] ERROR Failed to clean up log for __consumer_offsets-40 in dir /opt/data/kafka due to IOException (kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: /opt/data/kafka/__consumer_offsets-40/00000000000000000000.log
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:430)
at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:267)
at java.base/java.nio.file.Files.move(Files.java:1422)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:806)
at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:489)
at kafka.log.Log.asyncDeleteSegment(Log.scala:1960)
at kafka.log.Log.$anonfun$replaceSegments$6(Log.scala:2023)
at kafka.log.Log.$anonfun$replaceSegments$6$adapted(Log.scala:2018)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.Log.replaceSegments(Log.scala:2018)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:582)
at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:512)
at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:511)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.Cleaner.doClean(LogCleaner.scala:511)
at kafka.log.Cleaner.clean(LogCleaner.scala:489)
at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:350)
at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:319)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
Suppressed: java.nio.file.NoSuchFileException: /opt/data/kafka/__consumer_offsets-40/00000000000000000000.log -> /opt/data/kafka/__consumer_offsets-40/00000000000000000000.log.deleted
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:417)
at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:267)
at java.base/java.nio.file.Files.move(Files.java:1422)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:803)
... 17 more
[2022-03-18 17:32:29,534] INFO [ReplicaManager broker=24] Stopping serving replicas in dir /opt/data/kafka (kafka.server.ReplicaManager)
[2022-03-18 17:32:29,617] INFO [ReplicaFetcherManager on broker 24] Removed fetcher for partitions Set(__consumer_offsets-22, and many more )
[2022-03-18 17:32:29,643] ERROR Failed to clean up log for __consumer_offsets-40 in dir /opt/data/kafka due to IOException (kafka.server.LogDirFailureChannel)
[2022-03-18 17:32:29,643] ERROR Failed to clean up log for __consumer_offsets-40 in dir /opt/data/kafka due to IOException (kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: /opt/data/kafka/__consumer_offsets-40/00000000000000000000.log.swap
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:430)
at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:267)
at java.base/java.nio.file.Files.move(Files.java:1422)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:806)
at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:489)
at kafka.log.Log.$anonfun$replaceSegments$7(Log.scala:2026)
at kafka.log.Log.$anonfun$replaceSegments$7$adapted(Log.scala:2026)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.Log.replaceSegments(Log.scala:2026)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:582)
at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:512)
at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:511)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.Cleaner.doClean(LogCleaner.scala:511)
at kafka.log.Cleaner.clean(LogCleaner.scala:489)
at kafka.log.LogCleaner$CleanerThread.cleanLog(LogCleaner.scala:350)
at kafka.log.LogCleaner$CleanerThread.cleanFilthiestLog(LogCleaner.scala:319)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:300)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
Suppressed: java.nio.file.NoSuchFileException: /opt/data/kafka/__consumer_offsets-40/00000000000000000000.log.swap -> /opt/data/kafka/__consumer_offsets-40/00000000000000000000.log
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:417)
at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:267)
at java.base/java.nio.file.Files.move(Files.java:1422)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:803)
... 16 more
---------
[2022-03-18 17:49:17,573] INFO [Log partition=__consumer_offsets-40, dir=/opt/data/kafka] Found file /opt/data/kafka/__consumer_offsets-40/00000000000000000000.log.swap from interrupted swap operation. (kafka.log.Log)
----------
[2022-03-18 17:50:45,024] INFO Registered broker 24 at path /brokers/ids/24 with addresses: ArrayBuffer(EndPoint(10.xxx.xxx.24,9092,ListenerName(PLAINTEXT),PLAINTEXT)), czxid (broker epoch): 25769845019 (kafka.zk.KafkaZkClient)
[2022-03-18 17:50:45,026] WARN No meta.properties file under dir /opt/data/kafka/meta.properties (kafka.server.BrokerMetadataCheckpoint)
[2022-03-18 17:50:45,087] INFO [ExpirationReaper-24-topic]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2022-03-18 17:50:45,095] INFO [ExpirationReaper-24-Heartbeat]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2022-03-18 17:50:45,095] INFO [ExpirationReaper-24-Rebalance]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2022-03-18 17:50:45,130] INFO [GroupCoordinator 24]: Starting up. (kafka.coordinator.group.GroupCoordinator)
[2022-03-18 17:50:45,131] INFO [GroupCoordinator 24]: Startup complete. (kafka.coordinator.group.GroupCoordinator)
[2022-03-18 17:50:45,135] INFO [GroupMetadataManager brokerId=24] Removed 0 expired offsets in 4 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2022-03-18 17:50:45,154] INFO [ProducerId Manager 24]: Acquired new producerId block (brokerId:24,blockStartProducerId:22000,blockEndProducerId:22999) by writing to Zk with path version 23 (kafka.coordinator.transaction.ProducerIdManager)
[2022-03-18 17:50:45,181] INFO [TransactionCoordinator id=24] Starting up. (kafka.coordinator.transaction.TransactionCoordinator)
[2022-03-18 17:50:45,183] INFO [TransactionCoordinator id=24] Startup complete. (kafka.coordinator.transaction.TransactionCoordinator)
[2022-03-18 17:50:45,184] INFO [Transaction Marker Channel Manager 24]: Starting (kafka.coordinator.transaction.TransactionMarkerChannelManager)
[2022-03-18 17:50:45,228] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2022-03-18 17:50:45,238] WARN [Log partition=xxxxxxx-0, dir=/opt/data/kafka] retention.ms for topic xxxxxxx is set to 432000000. It is smaller than message.timestamp.difference.max.ms's value 9223372036854775807. This may result in frequent log rolling. (kafka.log.Log)
[2022-03-18 17:50:45,252] INFO [SocketServer brokerId=24] Started data-plane processors for 1 acceptors (kafka.network.SocketServer)
[2022-03-18 17:50:45,265] INFO Kafka version: 2.2.1 (org.apache.kafka.common.utils.AppInfoParser)
[2022-03-18 17:50:45,265] INFO Kafka commitId: 55783d3133a5a49a (org.apache.kafka.common.utils.AppInfoParser)
[2022-03-18 17:50:45,267] INFO [KafkaServer id=24] started (kafka.server.KafkaServer)
---------
[2022-03-18 17:51:44,785] INFO [Log partition=__consumer_offsets-40, dir=/opt/data/kafka] Deleting segment 0 (kafka.log.Log)
[2022-03-18 17:51:44,787] INFO Deleted log /opt/data/kafka/__consumer_offsets-40/00000000000000000000.log.deleted. (kafka.log.LogSegment)
[2022-03-18 17:51:44,792] INFO Deleted offset index /opt/data/kafka/__consumer_offsets-40/00000000000000000000.index.deleted. (kafka.log.LogSegment)
[2022-03-18 17:51:44,793] INFO Deleted time index /opt/data/kafka/__consumer_offsets-40/00000000000000000000.timeindex.deleted. (kafka.log.LogSegment)