I set up a three-broker Kafka 3.6.1 cluster in KRaft (Kafka Raft) mode, and it seemed to work without any issues. However, a user of the cluster reported a Kafka-related problem. When I checked server.log on broker 1, I found the following INFO-level messages:
[2024-04-22 13:07:13,785] INFO [SnapshotGenerator id=1] Creating new KRaft snapshot file snapshot 00000000000007077187-0000000044 because we have waited at least 60 minute(s). (org.apache.kafka.image.publisher.SnapshotGenerator)
[2024-04-22 13:07:13,794] INFO [SnapshotEmitter id=1] Successfully wrote snapshot 00000000000007077187-0000000044 (org.apache.kafka.image.publisher.SnapshotEmitter)
[2024-04-22 13:07:39,551] INFO [UnifiedLog partition=__cluster_metadata-0, dir=/mnt/app/kafka_logs] Incremented log start offset to 6157538 due to snapshot generated (kafka.log.UnifiedLog)
[2024-04-22 13:07:39,555] INFO [MetadataLog partition=__cluster_metadata-0, nodeId=1] Marking snapshot OffsetAndEpoch(offset=6150339, epoch=36) for deletion (kafka.raft.KafkaMetadataLog)
[2024-04-22 13:08:39,555] INFO Deleted snapshot files for snapshot OffsetAndEpoch(offset=6150339, epoch=36). (org.apache.kafka.snapshot.Snapshots)
[2024-04-22 13:12:39,550] INFO [UnifiedLog partition=__cluster_metadata-0, dir=/mnt/app/kafka_logs] Incremented log start offset to 6164737 due to snapshot generated (kafka.log.UnifiedLog)
[2024-04-22 13:12:39,554] INFO [MetadataLog partition=__cluster_metadata-0, nodeId=1] Marking snapshot OffsetAndEpoch(offset=6157538, epoch=36) for deletion (kafka.raft.KafkaMetadataLog)
[2024-04-22 13:13:39,554] INFO Deleted snapshot files for snapshot OffsetAndEpoch(offset=6157538, epoch=36). (org.apache.kafka.snapshot.Snapshots)
[2024-04-22 13:13:44,792] INFO [AddPartitionsToTxnManager broker=1]Node 1 disconnected. (org.apache.kafka.clients.NetworkClient)
[2024-04-22 13:13:44,842] INFO [TransactionCoordinator id=1] Node 1 disconnected. (org.apache.kafka.clients.NetworkClient)
[2024-04-22 13:16:44,106] INFO [GroupCoordinator 1]: Preparing to rebalance group subs_1613957521038137715-baas-trace-service-prd in state PreparingRebalance with old generation 777 (__consumer_offsets-48) (reason: Removing member pid:0,hostname:baas-trace-service-job-tx-7464cfbb46-lc5bn-63e18bef-d4ff-47d2-adb6-7830279ee3af on LeaveGroup; client reason: not provided) (kafka.coordinator.group.GroupCoordinator)
[2024-04-22 13:16:44,106] INFO [GroupCoordinator 1]: Group subs_1613957521038137715-baas-trace-service-prd with generation 778 is now empty (__consumer_offsets-48) (kafka.coordinator.group.GroupCoordinator)
[2024-04-22 13:16:44,106] INFO [GroupCoordinator 1]: Member MemberMetadata(memberId=pid:0,hostname:baas-trace-service-job-tx-7464cfbb46-lc5bn-63e18bef-d4ff-47d2-adb6-7830279ee3af, groupInstanceId=None, clientId=pid:0,hostname:baas-trace-service-job-tx-7464cfbb46-lc5bn, clientHost=/10.20.54.128, sessionTimeoutMs=30000, rebalanceTimeoutMs=60000, supportedProtocols=List(PartitionAssignerByPartitionId)) has left group subs_1613957521038137715-baas-trace-service-prd through explicit `LeaveGroup`; client reason: not provided (kafka.coordinator.group.GroupCoordinator)
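To see whether these disconnects are isolated or recurring, a minimal Python sketch like the one below could tally the "Node N disconnected" lines per logging component. It assumes the log line format shown above; the script name and the path passed on the command line are placeholders, not anything Kafka ships.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches NetworkClient lines in the format shown above, e.g.
# [2024-04-22 13:13:44,842] INFO [TransactionCoordinator id=1] Node 1 disconnected. (org.apache.kafka.clients.NetworkClient)
# The AddPartitionsToTxnManager line has no space before "Node", so the whitespace is optional.
DISCONNECT_RE = re.compile(
    r"^\[[^\]]+\] \w+ \[(?P<component>[^\]]+)\]\s*Node (?P<node>-?\d+) disconnected\."
)


def count_disconnects(lines: Iterable[str]) -> Counter:
    """Count 'Node N disconnected' lines per (component, node id) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = DISCONNECT_RE.match(line)
        if match:
            counts[(match.group("component"), int(match.group("node")))] += 1
    return counts


if __name__ == "__main__":
    # Usage (script name and path are placeholders): python count_disconnects.py /path/to/server.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as log_file:
            for (component, node), n in count_disconnects(log_file).most_common():
                print(f"{component}: node {node} disconnected {n} time(s)")
```

Running it over a longer stretch of server.log would show whether the disconnects occur at a regular interval or cluster around the time of the reported user issue.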
Although these messages are logged at INFO level, the "Node 1 disconnected" lines make it look as if the broker is being disconnected. Did I misconfigure something, or is this normal behavior? Here is the server.properties from one of the brokers (node.id=2):
############################# Server Basics #############################
# The role of this server. Setting this puts us in KRaft mode
process.roles=broker,controller
# The node id associated with this instance's roles
node.id=2
# The connect string for the controller quorum
controller.quorum.voters=1@baas-kafka-1.luniverse.com:9093,2@baas-kafka-2.luniverse.com:9093,3@baas-kafka-3.luniverse.com:9093
############################# Socket Server Settings #############################
# The address the socket server listens on.
# Combined nodes (i.e. those with `process.roles=broker,controller`) must list the controller listener here at a minimum.
# If the broker listener is not defined, the default listener will use a host name that is equal to the value of java.net.InetAddress.getCanonicalHostName(),
# with PLAINTEXT listener name, and port 9092.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://baas-kafka-2.luniverse.com:9092,CONTROLLER://baas-kafka-2.luniverse.com:9093
# Name of listener used for communication between brokers.
inter.broker.listener.name=PLAINTEXT
# Listener name, hostname and port the broker will advertise to clients.
# If not set, it uses the value for "listeners".
#advertised.listeners=PLAINTEXT://localhost:9092
# A comma-separated list of the names of the listeners used by the controller.
# If no explicit mapping is set in `listener.security.protocol.map`, the default is the PLAINTEXT protocol
# This is required if running in KRaft mode.
controller.listener.names=CONTROLLER
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3
# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
log.dirs=/mnt/app/kafka_logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria is met. Deletion always happens
# from the oldest end (the start) of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=268435456
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=600000
default.replication.factor=3
min.insync.replicas=2
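To rule out a few common KRaft cross-setting mistakes in a file like this, a rough Python sketch could check that related settings agree with each other. It uses a simplified properties parser (plain key=value lines only, no escapes, ':' separators, or line continuations), and the checks are examples I picked, not an exhaustive list; it is not a substitute for the broker's own startup validation.

```python
import sys
from typing import Dict, List, Set


def parse_properties(text: str) -> Dict[str, str]:
    """Simplified .properties reader: 'key=value' lines only; '#'/'!' comments skipped."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def _csv(value: str) -> List[str]:
    return [item.strip() for item in value.split(",") if item.strip()]


def _listener_names(value: str) -> Set[str]:
    # Listener entries look like "<NAME>://<host>:<port>".
    return {entry.split("://", 1)[0] for entry in _csv(value) if "://" in entry}


def check_kraft_config(props: Dict[str, str]) -> List[str]:
    """Return warnings for a few cross-setting mistakes (not exhaustive)."""
    problems: List[str] = []
    roles = set(_csv(props.get("process.roles", "")))
    node_id = props.get("node.id", "")

    # controller.quorum.voters entries look like "<id>@<host>:<port>".
    voter_ids = {v.split("@", 1)[0] for v in _csv(props.get("controller.quorum.voters", "")) if "@" in v}
    if "controller" in roles and node_id not in voter_ids:
        problems.append(f"node.id={node_id} has the controller role but is not in controller.quorum.voters")

    listeners = _listener_names(props.get("listeners", ""))
    if "controller" in roles:
        for name in _csv(props.get("controller.listener.names", "")):
            if name not in listeners:
                problems.append(f"controller listener {name} is not defined in listeners")

    # advertised.listeners falls back to listeners when it is not set.
    advertised = _listener_names(props.get("advertised.listeners") or props.get("listeners", ""))
    inter = props.get("inter.broker.listener.name")
    if "broker" in roles and inter and inter not in advertised:
        problems.append(f"inter.broker.listener.name={inter} is not an advertised listener")

    # Writes that need min-ISR acknowledgements cannot succeed if min ISR exceeds the replication factor.
    for isr_key, rf_key in [
        ("min.insync.replicas", "default.replication.factor"),
        ("transaction.state.log.min.isr", "transaction.state.log.replication.factor"),
    ]:
        if isr_key in props and rf_key in props and int(props[isr_key]) > int(props[rf_key]):
            problems.append(f"{isr_key}={props[isr_key]} exceeds {rf_key}={props[rf_key]}")
    return problems


if __name__ == "__main__":
    # Usage (script name and path are placeholders): python check_kraft.py /path/to/server.properties
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            found = check_kraft_config(parse_properties(f.read()))
            print("\n".join(found) if found else "no issues found by these checks")
```

Against the settings above, these particular checks would not flag anything, which suggests the disconnect messages are not caused by one of these specific mismatches.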
Thanks in advance for your advice!