Topic disk size utilised

Hi all

Is there a way to find out how much disk space a topic occupies and how many messages it holds in total, given its retention setting?

G

Anyone?

I’m curious to throw 100,000 records structured as JSON at one topic and then the same records as Protobuf at a second topic, to see the difference in space used. With a decent number of records, I can then also compare the produce time.

G

Still hoping someone can point me to how to calculate the space consumed by a topic.

Performance-wise, I’ve found my Protobuf version did 8,000 records/sec, while the JSON version only got to 500. So space is definitely not going to be the decider between the two; it’s performance vs. the upstream sources.

This tutorial shows a kcat-based approach for counting messages.
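Counting by reading every message can be slow on a large topic. A cheaper estimate is to subtract each partition’s earliest offset from its latest offset and add up the results. This is a different technique, not necessarily what the tutorial does. Treat the result as an upper bound: compacted topics leave gaps in their offsets, and transactional producers write control markers that use up offsets without being messages. The sketch below assumes you have already fetched the offsets, for example with the Java consumer’s beginningOffsets and endOffsets. The function name and the offset values are hypothetical.

```python
from typing import Dict


def approx_message_count(earliest: Dict[int, int], latest: Dict[int, int]) -> int:
    """Upper-bound message count: sum of (latest - earliest) over partitions.

    Both dicts map partition number -> offset. How you fetch the offsets
    (consumer API, CLI tool, etc.) is left to the caller.
    """
    # Refuse to guess if a partition is missing from either side.
    missing = earliest.keys() ^ latest.keys()
    if missing:
        raise ValueError(f"partitions missing offsets: {sorted(missing)}")
    return sum(latest[p] - earliest[p] for p in earliest)


# Hypothetical offsets for a 3-partition topic; partition 1 has had
# its first 120 records removed by retention.
earliest = {0: 0, 1: 120, 2: 0}
latest = {0: 500, 1: 620, 2: 250}
print(approx_message_count(earliest, latest))  # 1250
```

For a non-compacted topic written without transactions, this should match the number of messages currently retained.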

To determine the space that a topic uses, take a look at Apache Kafka’s kafka-log-dirs utility. You would need to sum up the byte sizes returned for each partition, on every broker that reports it:

$ kafka-log-dirs --describe --bootstrap-server broker:9092 --topic-list foo
Querying brokers for log directories information
Received log directory information from brokers 1
{
  "brokers": [
    {
      "broker": 1,
      "logDirs": [
        {
          "partitions": [
            {
              "partition": "foo-4",
              "size": 588,
              "offsetLag": 0,
              "isFuture": false
            },
            {
              "partition": "foo-5",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            },
            {
              "partition": "foo-2",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            },
            {
              "partition": "foo-3",
              "size": 0,
              "offsetLag": 0,
              "isFuture": false
            },
            {
              "partition": "foo-0",
              "size": 392,
              "offsetLag": 0,
              "isFuture": false
            },
            {
              "partition": "foo-1",
              "size": 196,
              "offsetLag": 0,
              "isFuture": false
            }
          ],
          "error": null,
          "logDir": "/var/lib/kafka/data"
        }
      ]
    }
  ],
  "version": 1
}
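Summing by hand gets tedious with many partitions and brokers. Here is a minimal Python sketch that assumes the kafka-log-dirs output has been captured as a string; the function name topic_log_bytes is hypothetical. It skips the status lines printed before the JSON. It keeps only partitions named exactly <topic>-<n>, which is a safeguard if you omit --topic-list, and totals the sizes per broker. The sample reuses the sizes from the output above.

```python
import json
from collections import defaultdict
from typing import Dict


def topic_log_bytes(output: str, topic: str) -> Dict[int, int]:
    """Total .log bytes per broker for `topic`, from kafka-log-dirs output."""
    # The tool prints status lines before the JSON, so parse from the first '{'.
    report = json.loads(output[output.index("{"):])
    prefix = topic + "-"
    per_broker: Dict[int, int] = defaultdict(int)
    for broker in report["brokers"]:
        for log_dir in broker["logDirs"]:
            for part in log_dir["partitions"]:
                name = part["partition"]
                # Match "<topic>-<n>" exactly so "foo" does not pick up "foo-bar-0".
                if name.startswith(prefix) and name[len(prefix):].isdigit():
                    per_broker[broker["broker"]] += part["size"]
    return dict(per_broker)


# Sample built from the sizes in the output above.
sample = "Querying brokers for log directories information\n" + json.dumps({
    "brokers": [{"broker": 1, "logDirs": [{"partitions": [
        {"partition": f"foo-{p}", "size": s, "offsetLag": 0, "isFuture": False}
        for p, s in [(4, 588), (5, 0), (2, 0), (3, 0), (0, 392), (1, 196)]
    ], "error": None, "logDir": "/var/lib/kafka/data"}]}],
    "version": 1,
})
print(topic_log_bytes(sample, "foo"))  # {1: 1176}
```

Each broker reports the replicas it hosts, so the grand total across brokers counts every replica. With a replication factor of 3, expect roughly three times the size of a single copy.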

Note that this is what the Kafka admin client’s describeLogDirs method returns: the on-disk byte size of a partition’s .log segment files. It doesn’t include the other supporting data on disk, e.g., the .index and .timeindex files. To get the total size needed to support a topic, you’d need shell access to the brokers so you can run du. For example, for a topic foo, you’d go to the log directory (/var/lib/kafka/data above) on every broker and run:

$ du -ch foo-*
12K	foo-0
12K	foo-1
8.0K	foo-2
8.0K	foo-3
12K	foo-4
8.0K	foo-5
60K	total
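A rough Python equivalent of that du run can help if you want to script it across brokers. This is a sketch that assumes a POSIX system (st_blocks is not available on Windows) and shell access to each broker; the function name is hypothetical. It matches partition directories as exactly <topic>-<n>. That avoids a pitfall of the foo-* glob, which would also pick up partitions of a topic named, say, foo-bar.

```python
import os
import re
from typing import Tuple


def topic_disk_usage(log_dir: str, topic: str) -> Tuple[int, int]:
    """Return (apparent_bytes, allocated_bytes) for `topic` in one log dir.

    apparent_bytes sums file lengths. allocated_bytes sums st_blocks * 512
    (st_blocks counts 512-byte units on Linux/macOS), which is close to what
    du reports, minus the directories' own blocks.
    """
    pattern = re.compile(re.escape(topic) + r"-\d+")
    apparent = allocated = 0
    with os.scandir(log_dir) as entries:
        for entry in entries:
            if not (entry.is_dir() and pattern.fullmatch(entry.name)):
                continue
            # Include every file in the partition dir: .log, .index, .timeindex, etc.
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    st = os.stat(os.path.join(root, name))
                    apparent += st.st_size
                    allocated += st.st_blocks * 512
    return apparent, allocated
```

Run it on each broker against its log directory, e.g. topic_disk_usage("/var/lib/kafka/data", "foo"), and add the results. The allocated figure is the one roughly comparable to du. The apparent figure can be much larger, because Kafka preallocates the active segment’s index files (sized by segment.index.bytes), and these are typically sparse.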