Rdkafka metadata cache not used when producing messages?

gpou · 5 October 2023 07:59

Hello there!
We have a producer implemented in Ruby, using the rdkafka-ruby gem.
We produce messages to a topic with 12 partitions, using a partition key (a string). We use the default partitioner.
We observed that for each message produced, the underlying librdkafka library sends one request to fetch the topic metadata from the Kafka cluster. The partitioner needs this metadata to know the number of partitions.
Before observing this behaviour, we assumed that the metadata cache (governed by the setting topic.metadata.refresh.interval.ms) was used when producing messages. We expected that, with that setting left at its default of 5 minutes, we would see only one metadata request to Kafka per refresh interval. That is not the case.
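For context on why the partition count matters: with a partition key, the target partition is derived by hashing the key and reducing it modulo the number of partitions, so the producer cannot place a keyed message without knowing that count. The snippet below is a simplified illustration, not the gem's actual code path. It assumes a CRC32-modulo mapping similar in spirit to librdkafka's consistent partitioners, and the `partition_for` helper is hypothetical.

```ruby
require "zlib"

# Hypothetical helper: maps a string key to a partition index.
# Assumption: CRC32 of the key modulo the partition count, similar in
# spirit to librdkafka's consistent partitioners (not rdkafka-ruby's code).
def partition_for(key, partition_count)
  raise ArgumentError, "partition_count must be positive" unless partition_count.positive?

  Zlib.crc32(key) % partition_count
end

# The same key can land on a different partition when the count changes,
# which is why the producer needs topic metadata before producing.
puts partition_for("customer-42", 12)
puts partition_for("customer-42", 24)
```

The mapping is deterministic for a fixed partition count, so caching the count (rather than re-fetching metadata for every message) does not change where keys land, as long as the topic is not repartitioned.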
Is this the expected behaviour of rdkafka, or are we doing something wrong when setting up the client?
As far as we understand, this metadata cache is provided not by the Ruby gem but by the librdkafka library, so we suspect the same behaviour occurs with other, non-Ruby clients.

gpou · 5 October 2023 08:04

To work around this problem, we have for now added a monkey patch to the rdkafka-ruby producer class, very similar to the one Karafka had for a while: Introduce partition count cache and improve retries by mensfeld · Pull Request #286 · karafka/waterdrop · GitHub
However, this patch was removed from the Karafka code after it upgraded to newer rdkafka versions.
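The workaround described above (caching the partition count per topic and refreshing it only periodically) can be sketched roughly as follows. This is not the code from the linked PR: the class name `PartitionCountCache`, the 30-second default TTL and the block-based fetch are all assumptions for illustration. In a real monkey patch, the block would wrap whatever partition-count lookup the producer performs.

```ruby
# Hypothetical per-topic partition-count cache with a time-to-live (TTL).
# Sketch only: the block passed to #get stands in for the producer's
# metadata lookup (an assumption; not rdkafka-ruby's actual API).
class PartitionCountCache
  def initialize(ttl_seconds: 30, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {} # topic => [partition_count, fetched_at]
    @lock = Mutex.new
  end

  # Returns the cached count for +topic+, calling the block only when the
  # entry is missing or older than the TTL.
  def get(topic)
    @lock.synchronize do
      count, fetched_at = @entries[topic]
      now = @clock.call
      if count.nil? || now - fetched_at > @ttl
        count = yield(topic)
        @entries[topic] = [count, now]
      end
      count
    end
  end
end

# Usage: three lookups within the TTL trigger a single fetch.
fetches = 0
cache = PartitionCountCache.new(ttl_seconds: 30)
3.times { cache.get("events") { fetches += 1; 12 } }
puts fetches
```

Holding the mutex during the fetch keeps the sketch simple and prevents duplicate concurrent lookups, at the cost of one slow metadata call briefly blocking other producing threads. The injectable `clock` makes the expiry logic easy to test.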
We are using rdkafka-ruby 0.13.0, which uses librdkafka 2.0.2.

maciejmensfeld · 6 October 2023 18:11

Hey,

Maciej here, maintainer of karafka-rdkafka, karafka, waterdrop and rdkafka-ruby.

For now, please use karafka-rdkafka, which is API-compatible with rdkafka-ruby and already has the needed cache in place.

I am working on porting and merging both karafka-rdkafka and rdkafka-ruby together.

The cache was removed from waterdrop because it had been moved into karafka-rdkafka.

maciejmensfeld · 29 October 2023 14:46

Backported to rdkafka-ruby here:

ref: Introduce partition count cache key for `partition_key` usage by mensfeld · Pull Request #309 · karafka/rdkafka-ruby · GitHub
