How to find out how much disk space is actually used on each broker?

Today I had a case at work where I needed to see how much physical disk space was used on each broker, in order to project future data volumes.

I quickly googled and found no compelling answer.

And then I discovered the kafka-log-dirs tool. This tool, available in Kafka's bin folder, lets you query the size of each partition on each broker.

You can run it to see size information for every topic like this:
kafka-log-dirs --bootstrap-server my-kafka:9092 --describe

You can also restrict the query to specific topics by passing a comma-separated list to the --topic-list option.

The result is structured like this:

Querying brokers for log directories information
Received log directory information from brokers 1,2,3,4
{
  "version": 1,
  "brokers": [
    {
      "broker": 1,
      "logDirs": [
        {
          "logDir": "/opt/kafka_data/data",
          "error": null,
          "partitions": [
            {
              "partition": "__consumer_offsets-13",
              "size": 8832,
              "offsetLag": 0,
              "isFuture": false
            }
          ]
        }
      ]
    }
  ]
}

The size field is the space the partition occupies on that broker, in bytes. A small Groovy script allowed me to easily calculate the total size for a specific topic.
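For anyone who wants to do the same aggregation, here is a minimal sketch in Python (not the Groovy script mentioned above, which was not posted). It assumes the output format shown earlier: a couple of status lines followed by one JSON document. The sample data is partly made up for illustration: the `my-topic` partitions, their sizes and the second broker are hypothetical, and the function names are my own.

```python
import json
from collections import defaultdict

# Sample kafka-log-dirs output. The __consumer_offsets-13 entry comes from the
# post above; the "my-topic" entries and broker 2 are hypothetical.
SAMPLE_OUTPUT = """Querying brokers for log directories information
Received log directory information from brokers 1,2
{"version": 1, "brokers": [
  {"broker": 1, "logDirs": [{"logDir": "/opt/kafka_data/data", "error": null, "partitions": [
    {"partition": "__consumer_offsets-13", "size": 8832, "offsetLag": 0, "isFuture": false},
    {"partition": "my-topic-0", "size": 1000, "offsetLag": 0, "isFuture": false}]}]},
  {"broker": 2, "logDirs": [{"logDir": "/opt/kafka_data/data", "error": null, "partitions": [
    {"partition": "my-topic-0", "size": 1000, "offsetLag": 0, "isFuture": false}]}]}
]}"""


def parse_log_dirs_output(raw):
    """Skip the leading status lines and parse the JSON document."""
    return json.loads(raw[raw.index("{"):])


def size_by_topic(report):
    """Sum partition sizes per topic, across all brokers (replicas included)."""
    totals = defaultdict(int)
    for broker in report["brokers"]:
        for log_dir in broker["logDirs"]:
            for part in log_dir["partitions"]:
                # "my-topic-0" -> "my-topic": the partition number follows the
                # last dash, and topic names may themselves contain dashes.
                topic = part["partition"].rpartition("-")[0]
                totals[topic] += part["size"]
    return dict(totals)


def size_by_broker(report):
    """Sum partition sizes per broker id, across all of its log directories."""
    totals = defaultdict(int)
    for broker in report["brokers"]:
        for log_dir in broker["logDirs"]:
            for part in log_dir["partitions"]:
                totals[broker["broker"]] += part["size"]
    return dict(totals)


report = parse_log_dirs_output(SAMPLE_OUTPUT)

if __name__ == "__main__":
    print("bytes per topic: ", size_by_topic(report))
    print("bytes per broker:", size_by_broker(report))
```

To use it on a real cluster, one option is to replace `SAMPLE_OUTPUT` with `sys.stdin.read()` and pipe the output of the kafka-log-dirs command into the script. Note that the per-topic total counts every replica, so it reflects disk usage rather than the logical size of the topic.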


Hey,
I wrote two tools that allow you to monitor the disk usage either by topic (which includes all replicas) or by broker.

The first is a Prometheus exporter: KMinion (GitHub: redpanda-data/kminion), a feature-rich Prometheus exporter for Apache Kafka written in Go. It is lightweight and highly configurable.
The second is a Kafka UI: Redpanda Console (GitHub: redpanda-data/console), a developer-friendly UI for managing your Kafka/Redpanda workloads. It gives you visibility into your topics and lets you mask data, manage consumer groups, and explore real-time data with time-travel debugging.

Both are open source and free to use :-). Hope that helps. By the way, the actual disk utilization (in percent) cannot be exported, because the Kafka API does not expose the total size or available bytes of the disk :-/
