Using Kafka as configuration storage

I would like confirmation that the approach we want to take makes sense.
We also want to use Kafka as configuration storage.

We have topics such as images-jpg and images-png, each with many partitions. Consumers subscribe to these topics and let Kafka decide how to share the load.

Then we have a configuration topic, where messages (keyed by extension) contain configuration params that tell us how to handle a particular image format.
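To make that concrete, here is a minimal sketch of how a worker could fold such messages into an in-memory table and derive the topic names from it. The JSON value format, the `max_width` parameter, the use of a null value to retire a format, and the helper names `apply_config` and `image_topics` are all assumptions for illustration; only the extension keys and the images-* naming come from the setup described above.

```python
import json
from typing import Dict, List, Optional


def apply_config(state: Dict[str, dict], key: str, value: Optional[bytes]) -> None:
    """Fold one configuration message into the in-memory state."""
    if value is None:
        # Assumption: a null value (tombstone) retires the image format.
        state.pop(key, None)
    else:
        # Assumption: values are JSON objects; the latest message per key wins.
        state[key] = json.loads(value)


def image_topics(state: Dict[str, dict], prefix: str = "images-") -> List[str]:
    """Derive the images-* topic names from the configured extensions."""
    return sorted(prefix + ext for ext in state)


# Hypothetical configuration messages as (key, value) pairs, oldest first.
messages = [
    ("jpg", b'{"max_width": 4096}'),
    ("png", b'{"max_width": 2048}'),
    ("jpg", b'{"max_width": 8192}'),  # newer value replaces the older one
    ("png", None),                    # tombstone removes png
    ("gif", b'{"max_width": 1024}'),
]

state: Dict[str, dict] = {}
for key, value in messages:
    apply_config(state, key, value)

print(image_topics(state))  # ['images-gif', 'images-jpg']
```

Because the latest message per key wins, this style of state would also fit a compacted topic, though whether compaction suits us is a separate decision.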

We want each worker process to consume all configuration messages from all partitions, without ever storing or committing offsets.
We then want each consumer to subscribe to all possible images-* topics, based on the list of accumulated configuration messages, and let Kafka assign partitions.

The approach we want to take is the following:

  • we have one configuration consumer that queries Kafka (using list_topics) to find out how many partitions the configuration topic has
  • we then call configuration_consumer.assign passing the list of all topic partitions
  • once all configuration messages are consumed, we build the list of images topics
  • we then call images_consumer.subscribe with the list of images topics
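
The steps above could be sketched roughly as follows with the confluent-kafka Python client. This is a sketch under assumptions, not a tested production setup: the topic name `image-configuration`, the group ids, the JSON value format, and the helper names `load_configuration` and `run_worker` are hypothetical, and "all configuration messages are consumed" is interpreted here as "we have read up to the high watermark each partition had at startup".

```python
import json


def load_configuration(consumer, topic, poll_timeout=1.0):
    """Read every partition of `topic` from the start and return {key: params}."""
    # Imported lazily so the helper can be imported without the client installed.
    from confluent_kafka import OFFSET_BEGINNING, TopicPartition

    metadata = consumer.list_topics(topic, timeout=10)
    partition_ids = sorted(metadata.topics[topic].partitions)

    # Manual assignment: no group rebalancing, every worker reads every partition.
    consumer.assign([TopicPartition(topic, p, OFFSET_BEGINNING) for p in partition_ids])

    # A partition stays pending until we reach its high watermark at startup.
    pending = {}
    for p in partition_ids:
        offsets = consumer.get_watermark_offsets(TopicPartition(topic, p), timeout=10)
        if offsets is None:
            raise TimeoutError(f"no watermark offsets for {topic}[{p}]")
        low, high = offsets
        if high > low:
            pending[p] = high

    state = {}
    while pending:
        msg = consumer.poll(poll_timeout)
        if msg is None:
            continue
        if msg.error():
            raise RuntimeError(str(msg.error()))
        key = msg.key().decode("utf-8")  # assumption: every message is keyed
        if msg.value() is None:
            state.pop(key, None)         # assumption: null value retires a format
        else:
            state[key] = json.loads(msg.value())
        if msg.partition() in pending and msg.offset() + 1 >= pending[msg.partition()]:
            del pending[msg.partition()]
    return state


def run_worker(bootstrap_servers, config_topic="image-configuration", group_id="image-workers"):
    """Create both consumers; names and settings here are illustrative."""
    from confluent_kafka import Consumer

    config_consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        # The client expects a group.id even when assign() is used.
        "group.id": group_id + "-config",
        "enable.auto.commit": False,  # never store offsets for configuration
    })
    state = load_configuration(config_consumer, config_topic)

    topics = sorted("images-" + ext for ext in state)
    if not topics:
        raise RuntimeError("no image formats configured")

    images_consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,  # shared by all workers so Kafka balances partitions
        "auto.offset.reset": "earliest",
    })
    images_consumer.subscribe(topics)
    return config_consumer, images_consumer, state
```

A worker would call something like `run_worker("localhost:9092")` at startup and then keep polling both consumers, re-subscribing the images consumer whenever a later configuration message changes the topic list.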

So overall:

  • 1 configuration consumer that is manually assigned all topic partitions of the configuration topic
  • 1 images consumer that subscribes to the resulting list of images-* topics

This would be a major change for us, as we currently do this stuff based on configuration files, which are getting harder and harder to maintain.
So, before we hit prod, do you see anything wrong with this approach?

There’s nothing inherently wrong with your approach. In fact, this is how topics like connect-configs work to control Kafka Connect configurations. I would suggest you take a look at the code used there as an example of how to do this.

Do you have a link to share? I must admit I’m pretty new, and I also use Kafka in a Python shop, so I miss out on lots of cool Java-only stuff.

Also, do you think this mixed usage of dynamic assignment (via subscribe) and manual assignment (via assign) could be troublesome?

kafka/ at cd1ce49bdbbd08eff255fdb5795e1fbd647a13c2 in the apache/kafka repository on GitHub would be a good place to start.