I’m deploying Kafka in KRaft mode and using the default log storage paths under the `/tmp` directory for the various node configurations (`/tmp/kraft-combined-logs/`, `/tmp/kraft-broker-logs/`, `/tmp/kraft-controller-logs/`). Given the ephemeral nature of `/tmp` in Linux environments—where it’s cleared on reboot or periodically—I’m concerned about the resilience of Kafka’s data, particularly in a production setting.
Context: My Kafka setup consists of three separate controllers and brokers. Despite systemd unit configurations designed to mitigate risks (e.g., setting `KAFKA_CLUSTER_ID` and pre-formatting storage to ensure node identification even if `/tmp` is cleared), I’m uncertain about the implications of potential data loss for logs stored in `/tmp`.
Specifically, my questions are:
- What impact does clearing the `/tmp` directory have on Kafka’s operation in KRaft mode?
- Are there Kafka mechanisms or best practices to recover or rebuild logs if they are lost due to `/tmp` clearance?
Attempts to Solve:
- Kept the default log directory paths to assess the implications and observe what happens.
- Applied systemd unit configurations for resilience, including pre-formatting storage with `KAFKA_CLUSTER_ID` to safeguard against `meta.properties` file loss. Example below:
```
Environment=KAFKA_CLUSTER_ID="XXXXXXXXXX"
ExecStartPre=/bin/bash -c '/opt/kafka/bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c /opt/kafka/config/kraft/controller.properties --ignore-formatted'
```
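One mitigation I’m considering (the directory paths here are hypothetical, not what I currently run) is to point `log.dirs` at persistent storage instead of relying on `/tmp` surviving:

```ini
# /opt/kafka/config/kraft/controller.properties (excerpt)
# Hypothetical persistent location instead of the /tmp default:
log.dirs=/var/lib/kafka/kraft-controller-logs
```

```ini
# /etc/tmpfiles.d/kafka.conf — have systemd create the directory at boot.
# 'd' creates it if missing; the trailing '-' means no age field, so
# systemd-tmpfiles never cleans it up:
d /var/lib/kafka 0750 kafka kafka -
```

But I’d still like to understand the failure mode before changing the defaults.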
What I added to the systemd unit might recover `meta.properties`, which contains critical information such as `node.id`, `version`, and `cluster.id`. But what about the other files and logs in that directory? What happens if they are lost?
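For concreteness, this is my understanding of the `meta.properties` key=value layout (the IDs below are placeholders, not my real values); the sketch just writes and reads back the expected structure:

```shell
# Reproduce the key=value layout that kafka-storage.sh format writes to
# meta.properties (placeholder IDs, not a real cluster):
cat > /tmp/demo-meta.properties <<'EOF'
node.id=1
version=1
cluster.id=XXXXXXXXXX
EOF
# Losing this file means the node can no longer prove its identity,
# which is why the ExecStartPre re-format guard targets it:
grep 'cluster.id' /tmp/demo-meta.properties
# -> cluster.id=XXXXXXXXXX
```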
Goal: I aim to understand the potential risks and establish a recovery strategy for Kafka data stored in `/tmp`, ensuring the system’s stability and integrity in production.
Thank you for your guidance.