I would like confirmation that the approach we want to take makes sense.
We also want to use Kafka as configuration storage.
We have topics such as `images-png` with many partitions: consumers subscribe to these topics and let Kafka decide how to share the load.
Then we have a `configuration` topic, where messages (keyed by extension) contain configuration params that tell us how to handle a particular image format.
We want each worker process to consume all configuration messages from all partitions, without ever storing or committing offsets.
We then want each consumer to subscribe to all possible `images-*` topics, based on the list of accumulated configuration messages, and let Kafka assign partitions.
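
To make the "list of accumulated configuration messages" part concrete, here is a minimal sketch of how the messages could be folded into a topic list. It rests on several assumptions that are not settled yet: the key is the extension as UTF-8 bytes, a later message for the same key replaces an earlier one, a message with a null value removes the format, and topics are named `images-<extension>`. `ConfigMessage`, `fold_configuration` and `images_topics` are hypothetical names used only for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class ConfigMessage(NamedTuple):
    """Hypothetical stand-in for one consumed configuration record."""
    key: bytes                # image extension, e.g. b"png" (assumed encoding)
    value: Optional[bytes]    # configuration payload; None is treated as a delete


def fold_configuration(messages: Iterable[ConfigMessage]) -> Dict[str, bytes]:
    """Keep only the latest configuration per extension (last write wins)."""
    latest: Dict[str, bytes] = {}
    for msg in messages:
        extension = msg.key.decode("utf-8")
        if msg.value is None:
            # Assumption: a null value means the format is no longer handled.
            latest.pop(extension, None)
        else:
            latest[extension] = msg.value
    return latest


def images_topics(config: Dict[str, bytes]) -> List[str]:
    """Derive topic names, assuming the images-<extension> convention."""
    return sorted(f"images-{extension}" for extension in config)
```

Last write wins only makes sense if all messages for one extension arrive in order, which should hold as long as messages with the same key keep landing in the same partition.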
The approach we want to take is the following:
- we have one `configuration` consumer that queries Kafka (using `list_topics`) to find out how many partitions the `configuration` topic is made of
- we then call `configuration_consumer.assign`, passing the list of all topic partitions
- once all configuration messages are consumed, we build the list of `images-*` topics
- we then call `images_consumer.subscribe` with the list of images topics
In short: one `configuration` consumer that is manually assigned all topic partitions of the `configuration` topic, and one images consumer that subscribes to the resulting list of `images-*` topics.
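
Roughly, the flow would look like the sketch below. It assumes the confluent-kafka Python client, and it assumes that "all configuration messages are consumed" can be decided by taking a snapshot of each partition's high watermark before reading. That is one possible way to detect the end, not something we have validated. `drain_until_end`, `run_worker`, the group ids and the timeouts are hypothetical, and the topic naming `images-<extension>` is an assumption too.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (partition, offset, key, value)
Record = Tuple[int, int, bytes, Optional[bytes]]


def drain_until_end(
    poll: Callable[[], Optional[Record]],
    watermarks: Dict[int, Tuple[int, int]],
    max_empty_polls: int = 10,
) -> List[Record]:
    """Read until every partition has reached the high watermark snapshot.

    `watermarks` maps partition id to (low, high); `high` is the offset of the
    next message to be written, so the last existing message is `high - 1`.
    """
    # Partitions with low == high are empty and need no reading.
    pending = {p for p, (low, high) in watermarks.items() if high > low}
    records: List[Record] = []
    empty_polls = 0
    while pending:
        record = poll()
        if record is None:
            empty_polls += 1
            if empty_polls >= max_empty_polls:
                raise TimeoutError(f"partitions not fully read: {sorted(pending)}")
            continue
        empty_polls = 0
        records.append(record)
        partition, offset = record[0], record[1]
        if partition in watermarks and offset >= watermarks[partition][1] - 1:
            pending.discard(partition)
    return records


def run_worker(bootstrap_servers: str, group_id: str) -> None:
    # Imported here so drain_until_end can be tried without the client installed.
    from confluent_kafka import (
        OFFSET_BEGINNING, Consumer, KafkaException, TopicPartition,
    )

    configuration_consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": f"{group_id}-configuration",  # required by the client, never committed to
        "enable.auto.commit": False,
    })
    metadata = configuration_consumer.list_topics("configuration", timeout=10)
    partition_ids = list(metadata.topics["configuration"].partitions)
    # Snapshot (low, high) per partition; a timeout handling branch is omitted here.
    watermarks = {
        p: configuration_consumer.get_watermark_offsets(
            TopicPartition("configuration", p), timeout=10
        )
        for p in partition_ids
    }
    # Manual assignment from the beginning: no group rebalancing, no stored offsets.
    configuration_consumer.assign(
        [TopicPartition("configuration", p, OFFSET_BEGINNING) for p in partition_ids]
    )

    def poll() -> Optional[Record]:
        msg = configuration_consumer.poll(1.0)
        if msg is None:
            return None
        if msg.error():
            raise KafkaException(msg.error())
        return (msg.partition(), msg.offset(), msg.key(), msg.value())

    records = drain_until_end(poll, watermarks)
    # Assumption: every key seen so far maps to an images-<extension> topic.
    topics = sorted({f"images-{key.decode('utf-8')}" for _, _, key, _ in records})

    images_consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
    })
    images_consumer.subscribe(topics)  # Kafka assigns the partitions from here on
```

The sketch only covers startup. Configuration messages that arrive after the snapshot would need the `configuration` consumer to keep polling and the images consumer to call `subscribe` again with the updated list.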
This would be a major change for us, as we currently do this stuff based on configuration files, which are getting harder and harder to maintain.
So, before we hit prod, do you see anything wrong with this approach?